Data Driven Series: A Comprehensive Guide to Oracle Data Labelling Service: Part 2

Author: Phil Godfrey

In the previous blog, we introduced the Data Labelling service, explaining what it is, covering the pre-requisites needed to enable the service, and exploring the key terms.

In this blog, we will work through creating a dataset in Oracle Data Labelling.

What is Data Labelling?

Data Labelling is the process of “identifying properties (labels) of documents, text, and images, and annotating them (labelling)”.

What are some examples?

What you can label is almost endless, whether it’s text, documents, or even images. For example:

The topic of a news article
The sentiment of a tweet
Objects identified within an image
and many more

Data Labelling Service

The OCI Data Labelling service is an OCI-native service that allows customers and business users to label their data. It includes built-in functionality to:

Create and browse datasets
View data records (text, images)
Apply labels for the purposes of building AI/ML models.

The service also provides interactive user interfaces to aid the labelling process, including a tool for drawing bounding boxes around objects in images for object detection.
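To make the idea of a bounding box concrete, here is a minimal Python sketch of how a single object-detection annotation could be represented. The `BoundingBox` class and `normalise_box` helper are hypothetical and not part of the service, and storing coordinates as fractions of the image width and height (rather than pixels) is an assumption made so the box does not depend on image resolution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical object-detection annotation: one label plus a box.

    Coordinates are fractions (0..1) of the image size; this is an
    assumption of the sketch, not documented service behaviour."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def normalise_box(label, px_box, width, height):
    """Convert a pixel box (left, top, right, bottom) to fractional coordinates."""
    left, top, right, bottom = px_box
    # The box must have positive size and sit entirely inside the image.
    if not (0 <= left < right <= width and 0 <= top < bottom <= height):
        raise ValueError(f"box {px_box} does not fit a {width}x{height} image")
    return BoundingBox(label, left / width, top / height, right / width, bottom / height)


# A car drawn from (100, 50) to (300, 150) on a 1000x500 car park image.
car = normalise_box("car", (100, 50, 300, 150), width=1000, height=500)
print(car)
```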

Accessing the Data Labelling Service

In the OCI Console, navigate to Analytics & AI; under the Machine Learning subheading, you will find Data Labeling.

Once you click here, provided the necessary pre-requisites have been granted, you should see the overview of the Data Labelling service.

The overview page has plenty of useful resources, ranging from video tutorials that introduce the service in simple, easy-to-consume terms through to more detailed release notes and documentation.

On the left-hand side, you have access to three areas:

Overview – the landing page described above
Datasets – area where datasets are stored
Work Requests – view any work requests initiated by the service

Datasets

Datasets are the core component of the Data Labelling service: they hold the data you want to label, whether that’s a set of documents, images, or text.

When working with datasets, you have two options:

Create a dataset from scratch
Import a previously annotated dataset

Supported file formats

Dataset Type	Supported File Types
Images	JPEG, JPG and PNG
Text	CSV, TEXT and TXT
Documents	PDF, TIF, TIFF, JPEG, JPG and PNG
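If you are preparing files in bulk, a quick pre-check against the table above can save a failed upload. The following Python sketch is a hypothetical helper, not part of the service; its extension lists are copied from the table, and matching on the file extension alone (ignoring case) is an assumption.

```python
from pathlib import Path

# Supported extensions per dataset type, copied from the table above.
SUPPORTED_EXTENSIONS = {
    "Images": {"jpeg", "jpg", "png"},
    "Text": {"csv", "text", "txt"},
    "Documents": {"pdf", "tif", "tiff", "jpeg", "jpg", "png"},
}


def is_supported(filename: str, dataset_type: str) -> bool:
    """Return True if the file's extension is listed for this dataset type."""
    if dataset_type not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unknown dataset type: {dataset_type!r}")
    # ".JPG" -> "jpg"; a file with no extension gives "" and is rejected.
    suffix = Path(filename).suffix.lower().lstrip(".")
    return suffix in SUPPORTED_EXTENSIONS[dataset_type]
```

For example, `is_supported("bay_01.JPG", "Images")` returns `True`, while `is_supported("report.pdf", "Images")` returns `False` because PDFs are only listed for Documents datasets.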

Create Dataset
For the purposes of demonstration, we will use Data Labelling to create a dataset for the Data Science department, who are working on a model to predict available car park spaces in near real time for a live-traffic app.

For this, we can create a custom dataset of car park images and annotate each record to identify:
Cars
Spaces

Dataset Details

Name: a suitable name for the dataset
Description: a description for the dataset (optional)
Labelling Instructions: any instructions you’d like the labellers to be aware of (useful for collaboration with others on large datasets)
Dataset Format: Images, Text, or Documents
Annotation Class: Single Label, Multiple Label, or Object Detection
Tags: apply tags to the dataset (optional)
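The form fields above can be mirrored in a small data structure, which is handy if you keep dataset definitions alongside your project code. This is a hypothetical Python sketch, not an OCI SDK class: the class and field names are illustrative, the allowed values come from the lists above, treating the labelling instructions as optional is an assumption, and choosing Object Detection for the car park example is our own choice for bounding-box work.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Allowed values taken from the Dataset Details list above.
DATASET_FORMATS = {"Images", "Text", "Documents"}
ANNOTATION_CLASSES = {"Single Label", "Multiple Label", "Object Detection"}


@dataclass
class DatasetDetails:
    """Hypothetical mirror of the Dataset Details form (not an OCI SDK class)."""
    name: str
    dataset_format: str
    annotation_class: str
    description: Optional[str] = None             # optional on the form
    labelling_instructions: Optional[str] = None  # assumed optional in this sketch
    tags: Dict[str, str] = field(default_factory=dict)  # optional on the form

    def __post_init__(self) -> None:
        # Reject a blank name and any value outside the options listed above.
        if not self.name.strip():
            raise ValueError("a dataset name is required")
        if self.dataset_format not in DATASET_FORMATS:
            raise ValueError(f"unknown dataset format: {self.dataset_format!r}")
        if self.annotation_class not in ANNOTATION_CLASSES:
            raise ValueError(f"unknown annotation class: {self.annotation_class!r}")


# Example values for the car park dataset.
car_park = DatasetDetails(
    name="car-park-spaces",
    dataset_format="Images",
    annotation_class="Object Detection",
    description="Car park images for the live-traffic space prediction model",
    labelling_instructions="Draw a box around every car and every space.",
)
```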

In the next stage, we add files and labels.

Files can be uploaded from a local directory or selected from Object Storage, depending on where your images reside.

Note: if you’re selecting Object Storage, you will need to provide the relevant compartment, namespace, and bucket.
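Since the console needs all three Object Storage values before it can find your images, a sketch that refuses to proceed without them might look like this. `ObjectStorageSource` is a hypothetical class, and the values in the example are placeholders, not real OCI identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectStorageSource:
    """Hypothetical holder for the Object Storage details the console asks for."""
    compartment: str
    namespace: str
    bucket: str

    def __post_init__(self) -> None:
        # All three values are needed to locate the images; report any blanks.
        missing = [name for name in ("compartment", "namespace", "bucket")
                   if not getattr(self, name).strip()]
        if missing:
            raise ValueError("missing Object Storage details: " + ", ".join(missing))


# Placeholder values only.
source = ObjectStorageSource(compartment="my-compartment",
                             namespace="my-namespace",
                             bucket="car-park-images")
```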

Below this, you can provide the images to be included in the dataset. In this example, we drag and drop the car park images into the drop zone.
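Before dragging files into the drop zone, it can help to check which files in a local folder will actually be accepted. This hypothetical Python helper keeps only the image extensions listed in the supported file formats table; skipping sub-folders is an assumption of this sketch.

```python
from pathlib import Path
from typing import List

# Image extensions from the supported file formats table.
IMAGE_EXTENSIONS = {".jpeg", ".jpg", ".png"}


def collect_images(directory: str) -> List[Path]:
    """Return the supported image files in a local folder, sorted by path.

    Sub-folders and files with other extensions are skipped."""
    folder = Path(directory)
    return sorted(p for p in folder.iterdir()
                  if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS)
```

For example, `collect_images("car_park_images")` would list the JPEG, JPG, and PNG files in a hypothetical `car_park_images` folder, leaving out anything the upload would reject.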

We also need to provide the labels that will be applied to objects in the images. In our case, we will have two labels: one for cars and another for spaces.
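Label names are easy to get slightly wrong (a trailing space, or “Car” alongside “car”), which could split annotations across two labels. A minimal Python sketch that tidies a label list before you enter it is below; the function is hypothetical, and treating labels that differ only in case as duplicates is a choice of this sketch rather than documented service behaviour.

```python
from typing import List


def build_label_set(labels: List[str]) -> List[str]:
    """Trim whitespace, drop blanks and case-insensitive duplicates,
    keeping the first spelling seen (hypothetical helper)."""
    seen = set()
    label_set = []
    for raw in labels:
        label = raw.strip()
        key = label.lower()
        if label and key not in seen:
            seen.add(key)
            label_set.append(label)
    if not label_set:
        # A dataset needs at least one label to annotate with.
        raise ValueError("at least one label is required")
    return label_set


print(build_label_set(["car", " space", "Car", ""]))  # prints ['car', 'space']
```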

Review

The final step of dataset creation is the review stage. At this point, you can still edit the dataset details or the files and labels; if you’re happy, click Create and the dataset will be created in the labelling service.

The dataset will then enter a Creating state, during which the dataset records (images) are generated and added to the dataset.

Once all records have been created successfully, the dataset will show in an “Active” state.
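If you script around the service, you may want to wait for the dataset to reach the Active state before starting to label. The following Python sketch polls a state-reading function until it sees Active. `wait_until_active` is hypothetical, `get_state` stands in for however you read the dataset’s state, and the upper-case state names (`CREATING`, `ACTIVE`, `FAILED`) are assumptions for illustration.

```python
import time
from typing import Callable


def wait_until_active(get_state: Callable[[], str],
                      timeout_s: float = 600.0,
                      poll_every_s: float = 10.0) -> str:
    """Poll get_state() until it reports ACTIVE (state names are assumed).

    Raises RuntimeError if the state becomes FAILED, and TimeoutError if
    ACTIVE is not reached within timeout_s seconds."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state().upper()
        if state == "ACTIVE":
            return state
        if state == "FAILED":
            raise RuntimeError("dataset creation failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"dataset still {state} after {timeout_s} seconds")
        # Still CREATING while records are generated: wait, then check again.
        time.sleep(poll_every_s)


# Demo with a fake sequence of states instead of a real dataset.
fake_states = iter(["CREATING", "CREATING", "ACTIVE"])
print(wait_until_active(lambda: next(fake_states), poll_every_s=0))
```

Polling with a timeout keeps a script from hanging indefinitely if record generation stalls; in a real script the poll interval would be seconds rather than zero.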

Join us in the next blog as we explore the labelling service in more detail, annotating our images to create our labelled dataset.

The Oracle Alchemist: Turning Data into Insights
