Git for Data Science: Working with Git on Oracle Data Science - Part 1

Author: Philip Godfrey 

What is Git?

Git is a version control system that tracks changes made to a set of files. This makes it well suited to collaboration between teams, and it lets you revert to a previous version of your files when needed.

Putting your code under version control is essential, so you can keep track of changes as you work through your various data science projects.

In this section, we will work through the steps to establish a simple connection between Oracle Data Science Cloud Service and a remote Git repository.

 

=========================================================

Prerequisite - Authenticating to GitHub

You must create a personal access token to use as a password when authenticating to GitHub on the command line using HTTPS URLs.

Personal Access Tokens

Personal access tokens are intended to access GitHub resources on behalf of yourself. To access resources on behalf of an organization, or for long-lived integrations, you should use a GitHub App.

Generating a Personal Access Token

Once logged in to GitHub, click your profile photo in the upper-right corner, then click Settings.

Navigate to Developer Settings -> Personal Access Tokens -> Tokens (classic) -> Generate New Token (classic)




Provide the following information:

1.    In the "Note" field, give your token a descriptive name.

2.    To give your token an expiration, select Expiration, then choose a default option or click Custom to enter a date.

3.    Select the scopes you'd like to grant this token. To use your token to access repositories from the command line, select repo. A token with no assigned scopes can only access public information. For more information, see "Scopes for OAuth Apps".

4.    Click Generate token.

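5.    Optionally, to copy the new token to your clipboard, click the copy icon next to it. Make sure you copy the token now, as GitHub will not show it to you again.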
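Once generated, the token is effectively a password, so it is best kept out of notebook cells and out of your Git history. The minimal sketch below assumes you have exported the token in your session as an environment variable named `GITHUB_TOKEN`. That name is chosen for this example and is not something Oracle Data Science sets for you. The sketch reads the variable back and runs a cheap format check based on the `ghp_` prefix GitHub uses for classic tokens. The check says nothing about whether the token is valid, unexpired, or has the `repo` scope.

```python
import os

# Prefix GitHub uses for classic personal access tokens.
CLASSIC_TOKEN_PREFIX = "ghp_"


def load_github_token(var_name: str = "GITHUB_TOKEN") -> str:
    """Return the token stored in the (hypothetical) environment variable `var_name`."""
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return token


def looks_like_classic_token(token: str) -> bool:
    """Format check only: classic tokens start with 'ghp_'. Does not contact GitHub."""
    return token.startswith(CLASSIC_TOKEN_PREFIX)
```

For example, `looks_like_classic_token(load_github_token())` returns `True` for a classic token and `False` for anything else, such as a fine-grained token.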

 

============================================================

Oracle Data Science

 

Using the Git Extension in JupyterLab Notebook Sessions

You can use the file browser in JupyterLab to view the Git repository and a terminal window to execute Git commands as you would with any Git repository.

Alternatively, click Git in the navigation panel to open the Git interface, which makes cloning, authenticating, creating branches, committing, and pushing changes easier.


Clone Repository

For the purposes of testing, I’ve created a repository named “DataScience” within my personal GitHub account; however, the same process applies to any other repository you may be working with.


In this example, we will clone this existing repository over HTTPS, although there are other ways to clone repositories, including SSH and the recently introduced GitHub CLI.

To clone, open the repository on GitHub, click Code, select the HTTPS tab, and copy the URL shown.
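The HTTPS URL GitHub shows has a predictable shape, `https://github.com/<owner>/<repo>.git`, and by default `git clone` names the new folder after the last part of that URL with `.git` removed. That is why the repository ends up in a folder called DataScience. The sketch below reproduces both. The function names are hypothetical helpers for illustration, and `default_clone_dir` only approximates Git's own naming rules for typical GitHub URLs.

```python
from urllib.parse import urlparse


def https_clone_url(owner: str, repo: str, host: str = "github.com") -> str:
    """Build an HTTPS clone URL in the form shown under Code -> HTTPS."""
    return f"https://{host}/{owner}/{repo}.git"


def default_clone_dir(url: str) -> str:
    """Approximate the folder `git clone` creates: last path segment minus '.git'."""
    path = urlparse(url).path.rstrip("/")
    name = path.rsplit("/", 1)[-1]
    return name[: -len(".git")] if name.endswith(".git") else name
```

For instance, `default_clone_dir(https_clone_url("your-username", "DataScience"))` gives `"DataScience"`, where `your-username` is a placeholder for your own GitHub account.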


In Oracle Data Science, paste in the URL and click Clone.


You will then be prompted for your credentials: your GitHub username and, in place of a password, the personal access token you generated earlier. Once these are accepted, the repository is cloned.
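If you prefer the terminal window, the same clone can be scripted. The sketch below wraps `git clone` with Python's `subprocess` and leaves the credential prompt to Git itself. Because Git needs a terminal to ask for your username and token, it is best run from the JupyterLab terminal (for example as a small script) rather than from a notebook cell. The token is deliberately not embedded in the URL: Git records the clone URL as the `origin` remote in `.git/config`, so an embedded token would be saved there in plain text.

```python
import subprocess
from typing import List, Optional


def git_clone_command(url: str, dest: Optional[str] = None) -> List[str]:
    """Build the argument list for `git clone`, optionally into a named folder."""
    cmd = ["git", "clone", url]
    if dest is not None:
        cmd.append(dest)
    return cmd


def clone(url: str, dest: Optional[str] = None) -> None:
    """Run `git clone`; Git itself prompts for the username and token."""
    # check=True raises CalledProcessError if the clone fails (e.g. a bad token).
    subprocess.run(git_clone_command(url, dest), check=True)
```

A call such as `clone("https://github.com/your-username/DataScience.git")` (with your own account in place of the placeholder) behaves like typing the command by hand. If you would rather not re-enter the token on every push, Git's built-in cache helper, set with `git config --global credential.helper "cache --timeout=3600"`, keeps it in memory for an hour.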


You can confirm that the clone succeeded in the left-hand file browser, which should now contain a folder for the cloned repository (DataScience) and its files (Readme.md).
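The same check can be done programmatically. This minimal sketch, built around a hypothetical `check_clone` helper, confirms that a folder is a Git working tree, meaning it contains a `.git` directory, and that expected files are present. It assumes the readme is named `README.md`, which is what GitHub creates when you add a README. Pass your own file names if yours differ, since file names are case-sensitive on Linux.

```python
from pathlib import Path
from typing import List, Sequence


def check_clone(repo_dir: str, expected_files: Sequence[str] = ("README.md",)) -> List[str]:
    """Return a list of problems; an empty list means the clone looks complete."""
    root = Path(repo_dir)
    problems = []
    # A cloned working tree always has a .git directory at its root.
    if not (root / ".git").is_dir():
        problems.append(f"{root} is not a Git working tree (no .git folder)")
    for name in expected_files:
        if not (root / name).is_file():
            problems.append(f"missing {name}")
    return problems
```

Running `check_clone("DataScience")` from the folder that contains the clone should return an empty list.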


In the next blog post, we will focus on some common Git practices, including creating branches, staging changes in Data Science, and pushing those changes back to the Git repository.






 

