Data Driven Series: A Comprehensive Guide to Oracle Data Labelling Service: Part 1


Author: Phil Godfrey


What is Data Labelling?

Data Labelling is the process of “identifying properties (labels) of documents, text, and images, and annotating them (labelling)”.

What are some examples?

What you can label is almost endless: the data could be text, documents, or even images. Some examples of labels:

  • The topic of a news article
  • The sentiment of a tweet
  • Objects identified within an image
  • and many more
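To make the examples above concrete, here is a minimal sketch of how labelled records could be represented in code. The `LabelledRecord` class, its field names, and the file names are all hypothetical and chosen purely for illustration; this is not the format the Data Labelling Service itself uses.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LabelledRecord:
    """A piece of data plus the labels a person has attached to it (hypothetical shape)."""
    source: str        # the item being labelled, e.g. a file name
    labels: List[str]  # one or more labels describing the item


# One record per example in the list above (illustrative values only)
examples = [
    LabelledRecord("news_article_001.txt", ["politics"]),     # topic of a news article
    LabelledRecord("tweet_042.txt", ["positive"]),            # sentiment of a tweet
    LabelledRecord("street_scene.jpg", ["car", "bicycle"]),   # objects within an image
]

if __name__ == "__main__":
    for record in examples:
        print(f"{record.source}: {', '.join(record.labels)}")
```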

 

Data Labelling Service

The OCI Data Labelling service is an OCI-native service that lets customers and business users label their data. It provides built-in functionality to:

  • Create and browse datasets
  • View data records (text, images)
  • Apply labels for the purposes of building AI/ML models.

The service also provides interactive user interfaces to aid the labelling process, including one for drawing the bounding boxes used for object detection within images.

 

Pre-Requisites

Administrators with appropriate privileges must create:

  • A compartment to be used by the Data Labelling Service
  • Object storage buckets (used to store data labelling outputs)
  • Associated IAM policies for the Data Labelling Service
  • A dynamic group to control access

Example policy statements for the dynamic group are included below:

allow dynamic-group <group_name> to read buckets in compartment <compartment_name>

allow dynamic-group <group_name> to read objects in compartment <compartment_name>

allow dynamic-group <group_name> to manage objects in compartment <compartment_name> where any {request.permission='OBJECT_CREATE'}
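If you need to produce these statements for several compartments or dynamic groups, filling in the placeholders by hand gets error-prone. The following is a minimal sketch, assuming you only want to substitute `<group_name>` and `<compartment_name>` in the three statements above. The function name `render_policies` and the example names are hypothetical; the script only builds strings and does not call OCI.

```python
from string import Template

# The three policy statements from above, with $group and $compartment
# standing in for <group_name> and <compartment_name>.
POLICY_TEMPLATES = [
    "allow dynamic-group $group to read buckets in compartment $compartment",
    "allow dynamic-group $group to read objects in compartment $compartment",
    "allow dynamic-group $group to manage objects in compartment $compartment "
    "where any {request.permission='OBJECT_CREATE'}",
]


def render_policies(group_name: str, compartment_name: str) -> list:
    """Return the policy statements with both placeholders filled in."""
    return [
        Template(template).substitute(group=group_name, compartment=compartment_name)
        for template in POLICY_TEMPLATES
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only
    for statement in render_policies("data-labelling-dg", "data-labelling-compartment"):
        print(statement)
```

The rendered statements can then be pasted into a policy in the OCI console.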


Accessing the Data Labelling Service

In OCI, navigate to Analytics & AI, and under the Machine Learning subheading, you will find Data Labeling.


Once you click here, provided the necessary pre-requisites are in place, you should see the overview of the Data Labelling service.


The overview page offers plenty of useful resources, from video tutorials that introduce the service in simple, easy-to-consume terms to more detailed release notes and documentation.

On the left-hand side, you have access to three areas:

  • Overview – the landing page described above
  • Datasets – area where datasets are stored
  • Work Requests – view any work requests initiated by the service


Datasets

Datasets are critical components of the Data Labelling service: they hold the data you want to label, whether that’s a set of documents, images, or text.



There are two options available to you when working with datasets. You can either:

  • Create a dataset from scratch
  • Import a previously annotated dataset


Supported file formats

Dataset Type: Supported File Types

  • Images: JPEG, JPG and PNG
  • Text: CSV, TEXT and TXT
  • Documents: PDF, TIF, TIFF, JPEG, JPG and PNG
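Before uploading files to a bucket, it can help to check them against the list above. The following is a minimal sketch, assuming the listed file types correspond to file extensions and that matching is case-insensitive. The `SUPPORTED_FILE_TYPES` mapping and the `is_supported` helper are hypothetical names, not part of any OCI SDK.

```python
from pathlib import Path

# Supported file types per dataset type, copied from the list above.
# Assumption: each entry corresponds to a file extension.
SUPPORTED_FILE_TYPES = {
    "images": {"jpeg", "jpg", "png"},
    "text": {"csv", "text", "txt"},
    "documents": {"pdf", "tif", "tiff", "jpeg", "jpg", "png"},
}


def is_supported(filename: str, dataset_type: str) -> bool:
    """Return True if the file's extension is listed for the given dataset type."""
    extension = Path(filename).suffix.lower().lstrip(".")
    return extension in SUPPORTED_FILE_TYPES[dataset_type.lower()]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only
    for name, kind in [("invoice.PDF", "documents"), ("invoice.pdf", "images")]:
        print(name, kind, is_supported(name, kind))
```

An unknown dataset type raises a `KeyError`, which is deliberate here: a typo in the dataset type should fail loudly rather than silently report a file as unsupported.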

Join us for the next post in this series, where we will create a dataset and begin to work with the Data Labelling Service.

