Author: Phil Godfrey
In the previous blog, we introduced the Data Labelling service, explaining what it is and the prerequisites needed to enable it, and exploring its key terms.
This blog, the next in the series, works through creating a dataset in Oracle Data Labelling.
What is Data Labelling?
Data Labelling is the process of “identifying properties (labels) of documents, text, and images, and annotating them (labelling)”.
What are some examples?
What you can label is almost endless: the data could be text, documents or even images. For example:
- The topic of a news article
- The sentiment of a tweet
- Objects identified within an image
- and many more
Data Labelling Service
The OCI Data Labelling service is an OCI-native service that gives customers and business users labelling functionality, including built-in features to:
- Create and browse datasets
- View data records (text, images)
- Apply labels for building AI/ML models
The service also provides interactive user interfaces to aid the labelling process, including one for drawing the bounding boxes used for object detection within images.
Accessing the Data Labelling Service
In OCI, navigate to Analytics & AI, and under the Machine Learning subheading, you will find Data Labeling.
Click it and, provided the necessary prerequisites have been granted, you should see the overview page of the Data Labelling service.
The overview page offers plenty of useful resources, ranging from video tutorials that introduce the service in simple, easy-to-consume terms through to more detailed release notes and documentation.
On the left-hand side, you have access to three areas:
- Overview – the page above
- Datasets – area where datasets are stored
- Work Requests – view any work requests initiated by the service
Datasets
Datasets are a critical component of the Data Labelling service: they hold the data you want to label, whether that’s a set of documents, images, or text.
When working with datasets, you have two options:
- Create a dataset from scratch
- Import a previously annotated dataset
Supported file formats
| Dataset Type | Supported File Types |
|---|---|
| Images | JPEG, JPG and PNG |
| Text | CSV, TEXT and TXT |
| Documents | PDF, TIF, TIFF, JPEG, JPG and PNG |
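Before uploading a large batch of files, it can help to pre-check them against the table above. The sketch below simply transcribes the table into a mapping and tests a filename's extension against it. The names `SUPPORTED_EXTENSIONS` and `is_supported_file` are hypothetical and do not call OCI; the service performs its own validation.

```python
from pathlib import Path

# Hypothetical mapping transcribed from the supported file formats table.
SUPPORTED_EXTENSIONS = {
    "Images": {"jpeg", "jpg", "png"},
    "Text": {"csv", "text", "txt"},
    "Documents": {"pdf", "tif", "tiff", "jpeg", "jpg", "png"},
}


def is_supported_file(dataset_type: str, filename: str) -> bool:
    """Return True if the file's extension is listed for the given dataset type.

    Illustrative local pre-check only; it does not contact the Data Labelling service.
    """
    allowed = SUPPORTED_EXTENSIONS.get(dataset_type)
    if allowed is None:
        raise ValueError(f"Unknown dataset type: {dataset_type!r}")
    # Compare the extension case-insensitively, without its leading dot.
    extension = Path(filename).suffix.lower().lstrip(".")
    return extension in allowed


if __name__ == "__main__":
    print(is_supported_file("Images", "car_park_01.JPG"))  # True
    print(is_supported_file("Images", "report.pdf"))       # False
    print(is_supported_file("Documents", "scan.tiff"))     # True
```

Matching on the extension alone is a deliberate simplification: it catches obvious mistakes, such as a PDF in an image dataset, but says nothing about whether the file content is valid.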
For the purposes of demonstration, we will use Data Labelling to create a dataset for the Data Science department, who are working on a model to predict available car park spaces in near real time for a live-traffic app.
For this, we can create a custom dataset from images of a car park and annotate those records to identify:
- Cars
- Spaces
Dataset Details
Name: a suitable name for the dataset
Description: a description for the dataset (optional)
Labelling Instructions: any instructions you’d like the labellers to be aware of (useful for collaboration with others on large datasets)
Dataset Format: Images, Text or Documents
Annotation Class: Single Label, Multiple Label or Object Detection
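To make the fields above concrete, the sketch below models the Dataset Details form as a small Python dataclass that rejects an empty name or an unknown format or annotation class. The class and constant names are hypothetical, and treating the labelling instructions as optional is our assumption (the walkthrough only marks the description as optional); this is not the OCI SDK's own model.

```python
from dataclasses import dataclass
from typing import Optional

# Options as listed in the Dataset Details stage.
DATASET_FORMATS = {"Images", "Text", "Documents"}
ANNOTATION_CLASSES = {"Single Label", "Multiple Label", "Object Detection"}


@dataclass
class DatasetDetails:
    """Hypothetical model of the Dataset Details form (not an OCI SDK type)."""

    name: str
    dataset_format: str
    annotation_class: str
    description: Optional[str] = None             # optional, per the walkthrough
    labelling_instructions: Optional[str] = None  # assumed optional

    def __post_init__(self) -> None:
        # Mirror the form's basic rules before anything is submitted.
        if not self.name.strip():
            raise ValueError("A dataset name is required")
        if self.dataset_format not in DATASET_FORMATS:
            raise ValueError(f"Unsupported dataset format: {self.dataset_format!r}")
        if self.annotation_class not in ANNOTATION_CLASSES:
            raise ValueError(f"Unsupported annotation class: {self.annotation_class!r}")


if __name__ == "__main__":
    details = DatasetDetails(
        name="car-park-spaces",  # hypothetical dataset name
        dataset_format="Images",
        annotation_class="Object Detection",
        labelling_instructions="Draw a box around every car and every empty space.",
    )
    print(details.name, details.annotation_class)
```

Validating in `__post_init__` means an invalid set of details can never be constructed, which keeps later steps from having to re-check them.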
In the next stage, we add files and labels.
Files can be uploaded from a local directory or selected from Object Storage, depending on where your images reside.
Note: if you’re selecting from Object Storage, you will need to provide the relevant compartment, namespace and bucket.
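The choice between a local upload and Object Storage decides which extra details are needed. The sketch below captures that rule: a local source needs nothing more, while an Object Storage source needs a compartment, namespace and bucket. `FileSource` and `missing_fields` are hypothetical names for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FileSource:
    """Hypothetical description of where a dataset's files come from."""

    kind: str  # "local" or "object_storage"
    compartment: Optional[str] = None
    namespace: Optional[str] = None
    bucket: Optional[str] = None

    def missing_fields(self) -> List[str]:
        """Return the Object Storage details still needed before files can be selected."""
        if self.kind == "local":
            return []  # local uploads need no compartment, namespace or bucket
        if self.kind != "object_storage":
            raise ValueError(f"Unknown source kind: {self.kind!r}")
        required = {
            "compartment": self.compartment,
            "namespace": self.namespace,
            "bucket": self.bucket,
        }
        # Keep the order compartment, namespace, bucket for readable messages.
        return [field for field, value in required.items() if not value]


if __name__ == "__main__":
    source = FileSource("object_storage", compartment="analytics", namespace="mytenancy")
    print(source.missing_fields())  # ['bucket']
```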
Below this, you provide the images to include in the dataset. As we’re working with images in this example, we can drag and drop these files into the drop zone.
We also need to provide labels for the objects identified in the images; in our case we will have two labels, one for cars and another for spaces.
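The label set defined here constrains the annotations made later: every object detection box must carry one of the dataset's labels. The sketch below illustrates that relationship for the car park example. `BoundingBox`, `count_by_label` and the normalised 0 to 1 coordinate convention are assumptions for illustration, not the format the service stores annotations in.

```python
from dataclasses import dataclass
from typing import Dict, Iterable

# The two labels defined for the car park dataset.
LABELS = {"car", "space"}


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical object detection annotation on one image.

    Coordinates are assumed to be normalised to 0..1, with (x_min, y_min) the top-left corner.
    """

    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def __post_init__(self) -> None:
        if self.label not in LABELS:
            raise ValueError(f"Label {self.label!r} is not in the dataset's label set")
        if not (0.0 <= self.x_min < self.x_max <= 1.0 and 0.0 <= self.y_min < self.y_max <= 1.0):
            raise ValueError("Box coordinates must satisfy 0 <= min < max <= 1")


def count_by_label(boxes: Iterable[BoundingBox]) -> Dict[str, int]:
    """Count annotations per label, e.g. cars versus spaces in one image."""
    counts = {label: 0 for label in sorted(LABELS)}
    for box in boxes:
        counts[box.label] += 1
    return counts


if __name__ == "__main__":
    boxes = [
        BoundingBox("car", 0.10, 0.20, 0.25, 0.40),
        BoundingBox("car", 0.30, 0.20, 0.45, 0.40),
        BoundingBox("space", 0.50, 0.20, 0.65, 0.40),
    ]
    print(count_by_label(boxes))  # {'car': 2, 'space': 1}
```

Keeping the label set small and fixed up front is what lets a downstream model treat "car" and "space" as its only output classes.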
Review
The final step of dataset creation is the review stage. At this point you can edit the dataset details or the files and labels; if you’re happy, click Create and the dataset will be created in the Data Labelling service.
The dataset will then be in a “Creating” state, during which the dataset records (images) are generated and added to the dataset.
Once all records have been created successfully, the dataset will show an “Active” state.
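If you script dataset creation rather than watching the console, you would typically wait for the “Creating” state to finish before labelling begins. The sketch below polls a state-reading function until the dataset reaches a final state or a timeout expires. `wait_for_dataset` and `get_state` are hypothetical; `get_state` stands in for whatever client call you use to read the dataset's state, and treating “Failed” as a possible final state is our assumption, as this walkthrough only shows “Creating” and “Active”.

```python
import time
from typing import Callable

# "Active" comes from the walkthrough; "Failed" is an assumed terminal state.
TERMINAL_STATES = {"Active", "Failed"}


def wait_for_dataset(
    get_state: Callable[[], str],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
) -> str:
    """Poll a dataset's state until it reaches a terminal state, and return that state.

    get_state is a hypothetical callable that returns the current state as a string.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Dataset still {state!r} after {timeout_seconds} seconds")
        time.sleep(poll_seconds)  # wait before asking again


if __name__ == "__main__":
    # Simulate a dataset that is still creating its records for two polls.
    simulated = iter(["Creating", "Creating", "Active"])
    print(wait_for_dataset(lambda: next(simulated), poll_seconds=0))  # Active
```

Passing the state reader in as a function keeps the waiting logic independent of any particular SDK and easy to test with simulated states.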
Join us in the next blog as we explore the labelling service in more detail by annotating our image data to create our labelled dataset.