Author: Philip Godfrey
What is Earth Observation data?
Earth Observation (EO) data is defined by the EU Science Hub as data that is “used to monitor and assess the status of, and changes in, the natural and manmade environment”.
With human civilization having an increasingly powerful influence on the Earth system, now seems like the perfect time to explore what can be done with EO data in Oracle Cloud.
How is Earth Observation data captured?
Earth Observation data is captured by gathering observations of the Earth's surface and atmosphere via remote sensing instruments. The data is typically in the form of digital imagery.
There are many ways to gather this type of information, through various remote sensing platforms. With Earth Observation we instantly think of space, but satellites are not the only option: the data can also be captured from drones and other aerial platforms.
Using Earth Observation data in Oracle Cloud
In this blog, we will take a tour of Oracle technology, utilising a number of Oracle platforms:
• Our journey begins in the Autonomous Data Warehouse (ADW) - to store the data.
• We then move onto Oracle Data Science – to explore the data and utilise Machine Learning within the Earth Observation satellite imagery data.
• We then come back into ADW – to store the results in the database.
• To visualize results, we can present them in Oracle Analytics Cloud (OAC).
Our aim is to create a Machine Learning model that, given a geospatial image, returns a prediction to the end user, classifying the image into one of ten land-use classes.
Load the data into Autonomous Data Warehouse
For the purposes of the blog, we will be working with readily available data from Kaggle, via the EuroSAT dataset.
This contains 27,000 images from the Sentinel-2 satellite, covering 10 different land-use classes.
To load the data, we downloaded the dataset in its raw format (2 GB) and loaded it from CSV using the “Data Load” utility in ADW. An alternative would have been to use the Kaggle API within Oracle Data Science, but for ease we've used the Data Load functionality.
Analytics in Oracle Data Science
Oracle Data Science is a fully managed and serverless platform for Data Science teams to build, train, and manage machine learning models in Oracle Cloud Infrastructure.
Within Data Science, we create a notebook session and can write and execute Python code using the machine learning libraries in the JupyterLab interface.
We start by importing packages that we’re going to use:
• cx_Oracle to connect to the ADW database.
• Pandas for data manipulation.
• TensorFlow for machine learning.
Connect to ADW
We then need to connect to the ADW database, which we can do using a connection string.
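A minimal sketch of what that connection code might look like; the schema name, password placeholder, and TNS alias below are all assumptions for illustration, and ADW connections normally rely on a wallet downloaded into the notebook session:

```python
import cx_Oracle

# Hypothetical connection details -- substitute your own ADW credentials.
# We assume the ADW wallet files are in place and TNS_ADMIN points at them.
connection = cx_Oracle.connect(
    user="ML_USER",                 # assumed schema name
    password="your_password_here",  # never hard-code credentials in real notebooks
    dsn="adwdb_high",               # assumed TNS alias from the wallet's tnsnames.ora
)
cursor = connection.cursor()
```

In practice, credentials are better pulled from OCI Vault or environment variables than typed into the notebook.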
Now we’re connected to the database, we want to select an image to pass through our machine learning model.
Each image is stored as a "BLOB" in the database and can then be read and presented on screen as an image.
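The decode step can be sketched as follows. Since there is no live database here, the raw bytes are fabricated with Pillow, and the commented-out query (with assumed table and column names) shows where they would come from in the notebook:

```python
import io

import numpy as np
from PIL import Image

# In the notebook, the bytes would come from the BLOB column, e.g.:
#   cursor.execute("SELECT image_blob FROM eurosat_images WHERE id = :1", [42])
#   blob, = cursor.fetchone()
#   image_bytes = blob.read()
# (the table and column names above are assumptions for illustration)

# Here we fabricate a small 64x64 PNG so the decode step is self-contained.
fake_pixels = np.zeros((64, 64, 3), dtype=np.uint8)
buffer = io.BytesIO()
Image.fromarray(fake_pixels).save(buffer, format="PNG")
image_bytes = buffer.getvalue()

# Decode the raw bytes back into an image, as we would for a real BLOB,
# after which it can be displayed in the notebook or converted to an array.
image = Image.open(io.BytesIO(image_bytes))
print(image.size)  # (64, 64)
```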
Machine Learning Model
As this is a supervised Machine Learning model, which learns from historic labelled data, we need to split our data into train, validation, and test datasets.
We will use these datasets to help evaluate the performance of our model.
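One way to produce such a split is to shuffle and slice index arrays with NumPy; the 70/15/15 ratio and the sample count below are assumptions, as the blog does not state the actual proportions:

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the split is reproducible

# Stand-in for the full EuroSAT dataset: 1000 samples here for illustration
n_samples = 1000
indices = rng.permutation(n_samples)

# 70% train, 15% validation, 15% test -- an assumed split ratio
train_end = int(0.70 * n_samples)
val_end = int(0.85 * n_samples)
train_idx = indices[:train_end]
val_idx = indices[train_end:val_end]
test_idx = indices[val_end:]

print(len(train_idx), len(val_idx), len(test_idx))  # 700 150 150
```

Shuffling before slicing matters: it prevents any ordering in the source data (for example, images grouped by class) from leaking into a single split.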
We then compile and fit the machine learning model, which learns from our training and validation datasets.
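A compile-and-fit step along these lines might look as follows; the tiny CNN and the random stand-in data are illustrative only, as the blog does not show its actual architecture or hyperparameters:

```python
import numpy as np
import tensorflow as tf

# Random stand-ins for 64x64 RGB EuroSAT tiles and their 10 class labels
x_train = np.random.rand(32, 64, 64, 3).astype("float32")
y_train = np.random.randint(0, 10, size=32)
x_val = np.random.rand(8, 64, 64, 3).astype("float32")
y_val = np.random.randint(0, 10, size=8)

# A deliberately small CNN sketch, not the blog's actual model
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, 3, activation="relu", input_shape=(64, 64, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Sparse categorical cross-entropy suits integer class labels (0-9)
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Fit against the training set, tracking validation loss each epoch
history = model.fit(
    x_train, y_train,
    validation_data=(x_val, y_val),
    epochs=1, verbose=0,
)
```

The returned `history` object records per-epoch loss and accuracy for both datasets, which is what we inspect next when evaluating the model.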
We can then use our test set to evaluate the performance of the model, comparing its predicted labels against the true labels of previously unseen data to measure accuracy.
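The evaluation step can be sketched like this; the untrained stand-in model and random test data are placeholders, so the accuracy printed here will be near chance rather than the figure reported below:

```python
import numpy as np
import tensorflow as tf

# Minimal stand-in model so the evaluation step is self-contained;
# in the notebook this would be the CNN trained in the previous step.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(64, 64, 3)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Random stand-in for the held-out test split
x_test = np.random.rand(16, 64, 64, 3).astype("float32")
y_test = np.random.randint(0, 10, size=16)

# evaluate returns [loss, accuracy] given the metrics configured above
loss, accuracy = model.evaluate(x_test, y_test, verbose=0)
print(f"Test accuracy: {accuracy:.2%}")
```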
Accuracy returns at almost 97% - which is pretty good!
Make sure to keep an eye out for Part 2 of this blog, where we will explore understanding model performance, applying our model against unseen data, and outputting the results into Oracle Analytics Cloud.