Author: Phil Godfrey
What is Data Labelling?
Data Labelling is the process of “identifying properties (labels) of documents, text, and images, and annotating them (labelling)”.
What are some examples?
Almost anything can be labelled, whether it is text, documents or images. For example:
- The topic of a news article
- The sentiment of a tweet
- Objects identified within an image
- and many more
Data Labelling Service
The OCI Data Labelling service is an OCI-native service that lets customers and business users label their data. Its built-in functionality lets you:
- Create and browse datasets
- View data records (text, images)
- Apply labels for the purposes of building AI/ML models.
The service also provides interactive user interfaces designed to aid the labelling process, including a tool for drawing the bounding boxes used for object detection within images.
Pre-Requisites
Administrators with appropriate privileges must create:
- A compartment to be used by the Data Labelling service
- Object storage buckets (used to store data labelling outputs)
- Associated IAM policies for the Data Labelling service
- A dynamic group to control access
An example set of policy statements is included below:
allow dynamic-group <group_name> to read buckets in compartment <compartment_name>
allow dynamic-group <group_name> to read objects in compartment <compartment_name>
allow dynamic-group <group_name> to manage objects in compartment <compartment_name> where any {request.permission='OBJECT_CREATE'}
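If you need the same statements for several dynamic groups or compartments, filling in the placeholders by hand gets error-prone. The following is a minimal sketch, in plain Python with no OCI SDK calls, that substitutes a group name and compartment name into the three statements above. The function name `build_policy_statements` and the example names are hypothetical, used only for illustration; the output still has to be added to a policy by an administrator.

```python
from typing import List

# Templates copied from the policy statements in this post.
# Doubled braces ({{ }}) produce the literal braces that OCI expects.
POLICY_TEMPLATES = [
    "allow dynamic-group {group} to read buckets in compartment {compartment}",
    "allow dynamic-group {group} to read objects in compartment {compartment}",
    "allow dynamic-group {group} to manage objects in compartment {compartment} "
    "where any {{request.permission='OBJECT_CREATE'}}",
]


def build_policy_statements(group_name: str, compartment_name: str) -> List[str]:
    """Return the three policy statements with the placeholders filled in.

    Hypothetical helper: it only formats text and does not contact OCI.
    """
    if not group_name or not compartment_name:
        raise ValueError("group_name and compartment_name must be non-empty")
    return [
        template.format(group=group_name, compartment=compartment_name)
        for template in POLICY_TEMPLATES
    ]


# Example usage with made-up names (assumptions, not values from this post).
for statement in build_policy_statements("labelling_dg", "labelling_compartment"):
    print(statement)
```

Keeping the templates in one list means the wording of each statement lives in a single place, so a later change to a statement only has to be made once.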
Accessing the Data Labelling Service
In OCI, navigate to Analytics & AI and, under the Machine Learning subheading, you will find Data Labeling.
Once you click here, provided the necessary pre-requisites are in place, you should see the overview of the Data Labelling service.
This has lots of useful resources, from video tutorials that introduce the service in simple, easy-to-consume terms through to more detailed release notes and documentation.
On the left-hand side, you have access to three areas:
- Overview – the page above
- Datasets – area where datasets are stored
- Work Requests – view any work requests initiated by the service
Datasets
Datasets are the critical component of the Data Labelling service: they hold the data you want to label, whether that’s a set of documents, images, or text.
There are two options available to you when working with datasets. You can either:
- Create a dataset from scratch
- Import a previously annotated dataset
Supported file formats
Dataset Type | Supported File Types |
Images | JPEG, JPG and PNG |
Text | CSV, TEXT and TXT |
Documents | PDF, TIF, TIFF, JPEG, JPG and PNG |
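Before uploading files to a bucket, it can be handy to check locally that they match the dataset type you plan to create. The following is a rough sketch of such a check, based only on the table above. The function name `is_supported` is hypothetical, and the sketch assumes each listed format corresponds to a file extension of the same name (so "TEXT" is treated as a `.text` extension); the service itself may validate files differently.

```python
from pathlib import Path

# Supported formats per dataset type, taken from the table in this post.
# Assumption: each format name maps to a file extension of the same name.
SUPPORTED_EXTENSIONS = {
    "images": {"jpeg", "jpg", "png"},
    "text": {"csv", "text", "txt"},
    "documents": {"pdf", "tif", "tiff", "jpeg", "jpg", "png"},
}


def is_supported(filename: str, dataset_type: str) -> bool:
    """Return True if the file's extension is listed for the dataset type."""
    key = dataset_type.lower()
    if key not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unknown dataset type: {dataset_type!r}")
    # Path.suffix includes the leading dot, e.g. ".PNG"; normalise to "png".
    extension = Path(filename).suffix.lstrip(".").lower()
    return extension in SUPPORTED_EXTENSIONS[key]


# Example usage with made-up file names.
for name, kind in [("cat.PNG", "Images"), ("notes.pdf", "Text"), ("scan.tiff", "Documents")]:
    print(name, kind, is_supported(name, kind))
```

The check is case-insensitive on both the dataset type and the extension, and a file with no extension is reported as unsupported.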
Join us for the next blog in this series, where we will create a dataset and begin to work with the Data Labelling service.