Data Science Toolkit: Part 4 - Top 10 tips using Oracle Data Science Notebooks

Author: Philip Godfrey

In my years as a Data Scientist, and working with data in general, I’ve picked up a lot of tips and tricks along the way. My latest blog series is looking to share some of those I’ve found particularly helpful, with the hope that you could apply some of these to make your day-to-day work life a little easier.

The earlier blogs in this series looked at using Oracle ADS to help speed up exploratory data analysis (EDA), data preparation and enrichments.

My fourth and final blog in the series looks to provide tips, shortcuts and general best practice that I’ve picked up and found particularly helpful within Oracle Data Science Notebooks.

What is Oracle Data Science?

For those who don’t know already, Oracle Data Science is a fully managed platform for teams of data scientists to build, train, deploy, and manage machine learning models using Python and open-source tools.

It provides a JupyterLab-based environment for experimenting with and developing models, with a wide range of libraries available pre-packaged as conda environments.

My top 10 tips when using Oracle Data Science Notebooks

1. Organize your code: The JupyterLab environment allows the use of headings, comments, and markdown cells to structure your notebook and make it more readable. This will help you and others understand the purpose and flow of your code.

2. Use code cells wisely: Code cells allow you to break your code into logical segments. This makes it easier to run specific parts of the code or make changes without rerunning the entire notebook. Separate cells for loading libraries, importing data, data cleaning and enrichments are all helpful, as each of these may be run once or many times, depending on requirements.
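
As a rough sketch of that layout, the example below marks cell boundaries with `# %%` comments; in a notebook each section would simply be its own cell. The sample data and the helper names `load_data`, `clean_data` and `enrich_data` are made up for illustration and are not part of any Oracle library.

```python
# %% Cell 1: load libraries (run once per session)
import csv
import io

# %% Cell 2: import data (an in-memory CSV stands in for a real file here)
RAW_CSV = """name,age,city
Alice,34,London
Bob,,Leeds
Cara,29,
"""

def load_data(text):
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

rows = load_data(RAW_CSV)

# %% Cell 3: data cleaning (re-run whenever the cleaning rules change)
def clean_data(records):
    """Keep only rows where every field has a value."""
    return [r for r in records if all(v.strip() for v in r.values())]

clean_rows = clean_data(rows)

# %% Cell 4: enrichment (re-run without reloading the data)
def enrich_data(records):
    """Add a derived 'age_group' column to each row."""
    return [
        {**r, "age_group": "30+" if int(r["age"]) >= 30 else "under 30"}
        for r in records
    ]

enriched_rows = enrich_data(clean_rows)
print(enriched_rows)
```

With this split, changing the rule in the enrichment cell only requires re-running that one cell, because `clean_rows` is still held in memory from the earlier cells.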

3. Document your thought process: Include descriptive text, images, and graphs throughout your notebook to document your thought process, explain your methodology, and showcase your results.

4. Utilize conda environments: Oracle Data Science provides several pre-built “conda” environments. Each one is exposed as a notebook kernel with its own set of Python libraries, so you can run different notebook sessions against different kernels as your work requires.

To install a conda environment in the terminal, run:

odsc conda install -s generalml_p37_cpu_v1

To activate a conda environment in the terminal, run:

conda activate /home/datascience/conda/generalml_p37_cpu_v1
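
Once a notebook is running, it can be handy to confirm which environment the kernel is actually using. The snippet below is a minimal sketch using only the Python standard library; the helper name `describe_environment` is my own, and `CONDA_PREFIX` is an environment variable set by `conda activate`, which may not be present inside every kernel.

```python
import os
import sys

def describe_environment():
    """Return basic facts about the Python environment running this code."""
    return {
        "python_version": sys.version.split()[0],
        "executable": sys.executable,
        "prefix": sys.prefix,
        # CONDA_PREFIX is set by `conda activate`; it may be absent in a kernel
        "conda_prefix": os.environ.get("CONDA_PREFIX", "(not set)"),
    }

for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

If the `executable` or `prefix` path does not point at the conda environment you expected, the notebook is probably attached to a different kernel.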

5. Utilize Notebook Session Metrics within OCI: The OCI console shows resource metrics for each notebook session.

Available metrics include:

· CPU Utilisation

· Memory Utilisation

· Network Receive/Transmit Bytes

6. Metrics within Data Science Notebook Terminal: It is also possible to view these metrics within the Notebook Session Terminal. Run the ‘top’ command in the terminal and it will display live performance metrics for the notebook session, such as CPU and memory usage per process.
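
If you would rather check resource usage from a notebook cell than from the terminal, something along these lines may work. It is a sketch under assumptions: it relies on the Linux `/proc/meminfo` file and `os.getloadavg()`, the function names are my own, and the numbers it reports are not guaranteed to match what the OCI console or `top` shows.

```python
import os

def parse_meminfo(text):
    """Parse the 'Key:   value kB' lines of /proc/meminfo into a dict of kB values."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values

def memory_used_percent(meminfo):
    """Percentage of memory in use, based on MemTotal and MemAvailable."""
    total = meminfo["MemTotal"]
    available = meminfo["MemAvailable"]
    return 100.0 * (total - available) / total

def session_snapshot(meminfo_path="/proc/meminfo"):
    """Collect CPU count, 1-minute load average and memory use (Linux only)."""
    with open(meminfo_path) as handle:
        meminfo = parse_meminfo(handle.read())
    load_1min, _, _ = os.getloadavg()
    return {
        "cpu_count": os.cpu_count(),
        "load_1min": load_1min,
        "memory_used_percent": round(memory_used_percent(meminfo), 1),
    }

# Demonstration on made-up sample text, so the result is predictable
SAMPLE = "MemTotal:       16000000 kB\nMemAvailable:    4000000 kB\n"
print(memory_used_percent(parse_meminfo(SAMPLE)))  # prints 75.0
```

In a notebook session, calling `session_snapshot()` reads the real values instead of the sample text.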

7. Version control of Data Science projects: Data Science Notebooks provide built-in support for Git integration, allowing you to clone, commit, push, and pull code repositories directly from your notebook interface. This makes it easier to collaborate with teammates, keep track of changes, and maintain a version history of your notebook code.
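
One optional habit that pairs well with Git, though it is not a built-in feature of the service, is clearing cell outputs before committing so that diffs show code changes only. Notebook files are JSON, so this can be sketched with the standard library alone. The function name `strip_outputs` and the sample notebook below are illustrative assumptions.

```python
import json

def strip_outputs(notebook):
    """Return a copy of a notebook dict with code-cell outputs cleared."""
    cleaned = json.loads(json.dumps(notebook))  # deep copy via JSON round trip
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned

# A tiny hand-written notebook (nbformat 4 layout) used only for this demo
sample = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "metadata": {},
            "source": ["print(1 + 1)"],
            "execution_count": 3,
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["2\n"]}],
        },
    ],
}

cleaned = strip_outputs(sample)
print(json.dumps(cleaned["cells"][1], indent=2))
```

To apply this to a real file, load it with `json.load`, pass the result through `strip_outputs`, and write it back with `json.dump` before committing.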

8. Full integration with OCI Data Science Components: In addition to being a powerful tool for building AI/ML models in the cloud, Data Science Notebook Sessions have first-class integration with other OCI Data Science components like Jobs, Models and Deployments.

9. AutoML (Automated Machine Learning): The Data Science service also includes automated machine learning (AutoML) functionality, which helps select and optimize a machine learning model for your specific task without the need for extensive manual tuning. This saves the time and effort of the traditional trial-and-error approach, enabling data scientists to focus on higher-level tasks and insights.

Source: https://blogs.oracle.com/coretec/post/oracle-auto-ml-use-inbuilt-expertise-to-develop-effective-machine-learning-model

10. Leverage keyboard shortcuts: The JupyterLab interface in Oracle Data Science includes several keyboard shortcuts that can greatly enhance your productivity. Below are a few of my favourites that you might try:

· Ctrl+Enter: Run the current cell.

· Shift+Enter: Run the current cell and move to the next cell.

· Alt+Enter: Run the current cell and insert a new cell below.

· Ctrl+/: Comment or uncomment the selected code.

· Esc, then M: Convert the current cell to a Markdown cell.

· Esc, then Y: Convert the current cell to a code cell.

Hopefully you have found some of the above useful, and if you have any tips of your own that aren’t included above, I’d love to hear about them in the comments!

For more information on Oracle Data Science Service, you can take a look at the Oracle documentation here.

The Oracle Alchemist: Turning Data into Insights
