After a home, a car is likely to be the second-most expensive purchase you will ever make. With that in mind, it’s a crucial decision to get right!
This series is based on a presentation I created last year, in which I wanted
to see if I could use my day-to-day work (Data Science) to understand whether data
and machine learning could help me decide which car to buy next.
It seems the perfect fit for a blog series on Data Science
and Machine Learning, as it covers a range of analytics techniques and thought
processes you could apply to several use cases.
If you missed any of the previous blogs, don’t worry!
· Part 1 of the series covered identifying data and data quality checks; you can find it here.
· Part 2 of the series covered Data Enrichments and Exploratory Data Analysis; you can find it here.
For this blog, we’re moving on to Machine Learning,
using the built-in Oracle Data Mining in an Oracle Machine Learning Notebook.
Step 5: Machine Learning
Machine Learning can be confusing, so it is helpful to
begin by clearly defining the term. As defined by IBM, machine learning is:
“a branch of artificial intelligence (AI) and computer science which focuses on the use of data and algorithms to imitate the way that humans learn, gradually improving its accuracy.”
Machine learning is an
important component of the growing field of data science. Using statistical
methods, algorithms can be trained to make classifications or predictions to
uncover key insights in data mining projects.
Many services we use daily rely heavily on Machine Learning,
from personalised recommendations on websites such as Netflix
to chatbots that act as a first port of call for resolving customer queries.
The insights that machine learning uncovers subsequently
drive decision-making within applications and businesses, ideally impacting key
growth metrics.
Oracle Database has over 30 fully scalable algorithms that
are commonly used by Data Scientists, covering techniques including:
- regression
- classification
- time series
- clustering
- feature extraction
- anomaly detection
For the used car data, there are two use cases we might want to explore.
· Feature Extraction (Attribute Importance) – to determine which attributes (or fields) are most likely to be predictors of the price of a used car
· Time Series – to predict the prices of these used cars in the future, to try and determine if they are likely to hold their value
Feature Extraction - Attribute Importance
Oracle Data Mining supports the attribute importance mining function,
which ranks attributes according to their importance in predicting a
target. In this case our target will be Price.
We provide the model with several settings, including:
· a MINING_FUNCTION – the type of mining problem we want to solve, in this case ATTRIBUTE_IMPORTANCE
· a CASE_ID_COLUMN_NAME – the column that uniquely identifies each record; in this case we use one of our data enrichments (RECORD_ID)
· a TARGET_COLUMN_NAME – the column we want to determine attribute importance for (i.e., the value being predicted), which is Price
The most important attributes determined by the algorithm are:
• Model
• Engine Size
• Age of Car (data enrichment)
• MPG Enriched (data enrichment)
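To make the idea of ranking attributes against a target more concrete, here is a minimal Python sketch. It is not how Oracle Data Mining computes attribute importance (the notebook relies on the built-in ATTRIBUTE_IMPORTANCE function). Instead it assumes a much simpler stand-in score, the correlation ratio: the share of the variation in price that is explained by grouping the cars on one attribute. The rows, the `colour` column and the function names are all made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Made-up toy rows for illustration only; not taken from the used car dataset.
cars = [
    {"record_id": 1, "model": "Fiesta", "engine_size": 1.0, "colour": "red", "price": 12000},
    {"record_id": 2, "model": "Fiesta", "engine_size": 1.0, "colour": "blue", "price": 12500},
    {"record_id": 3, "model": "Focus", "engine_size": 1.5, "colour": "red", "price": 16000},
    {"record_id": 4, "model": "Focus", "engine_size": 1.5, "colour": "blue", "price": 15500},
    {"record_id": 5, "model": "KA", "engine_size": 1.0, "colour": "red", "price": 9000},
    {"record_id": 6, "model": "KA", "engine_size": 1.0, "colour": "blue", "price": 9500},
]


def correlation_ratio(rows, attribute, target="price"):
    """Share of the target's variance explained by grouping rows on `attribute` (0 to 1)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    overall = mean(row[target] for row in rows)
    total = sum((row[target] - overall) ** 2 for row in rows)
    between = sum(len(values) * (mean(values) - overall) ** 2 for values in groups.values())
    return between / total if total else 0.0


def rank_attributes(rows, attributes, target="price"):
    """Return (attribute, score) pairs, highest score first."""
    scores = {name: correlation_ratio(rows, name, target) for name in attributes}
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


ranking = rank_attributes(cars, ["model", "engine_size", "colour"])
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

On this toy data, model and engine size explain most of the price variation while colour explains almost none. One caveat of this simple score is that an attribute with a unique value per row (such as RECORD_ID) would score a perfect 1, which is one reason the case ID is kept separate from the candidate attributes.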
What if we considered a newer car?
From our Exploratory Data Analysis, we know that 2020 is
the latest year in our dataset, so we can use 2020 data to narrow down
our options for which car to select.
The Attribute Importance model identified the key
attributes as Model, Engine Size, Age of Car and MPG Enriched.
Let us look at Ford cars and compare each of these attributes with Price to see what we can get for our £15k budget:
Model vs Price
Our choices look to be limited to Fiesta, Focus and KA,
based on my £15k budget for a newer car (2020).
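The same narrowing-down can be sketched in a few lines of Python. This is only an illustration of the filter behind the chart, assuming a list of listings with `model`, `year` and `price` fields; the listings and prices below are hypothetical and are not rows from the dataset.

```python
BUDGET = 15_000
TARGET_YEAR = 2020

# Hypothetical listings for illustration only; not rows from the used car dataset.
listings = [
    {"model": "Fiesta", "year": 2020, "price": 13_500},
    {"model": "Focus", "year": 2020, "price": 14_900},
    {"model": "KA", "year": 2020, "price": 9_800},
    {"model": "Kuga", "year": 2020, "price": 21_000},
    {"model": "Fiesta", "year": 2018, "price": 9_000},
    {"model": "Focus", "year": 2020, "price": 17_500},
]


def affordable_models(rows, year, budget):
    """Return {model: cheapest price} for cars from `year` priced at or under `budget`."""
    cheapest = {}
    for row in rows:
        if row["year"] == year and row["price"] <= budget:
            current = cheapest.get(row["model"], row["price"])
            cheapest[row["model"]] = min(current, row["price"])
    return cheapest


options = affordable_models(listings, TARGET_YEAR, BUDGET)
for model, price in sorted(options.items()):
    print(f"{model}: from £{price:,}")
```

Any model that has no 2020 listing at or under the budget simply drops out of the result, which is the same effect as reading the chart against a £15k line.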
Engine Size vs Price
Our choices look to be limited to smaller engines – between 1L and 1.1L.
Age vs Price
There are plenty of choices that come within our budget,
but in this dataset, options have been limited to Ford Fiesta or Focus.
MPG vs Price
MPG for the most part seems to be
similar (around 55 mpg). It appears that MPG generally decreases as the price
increases.
This is much better than my current car's MPG of around 37 mpg!
What if we predicted the future price of a newer Ford Fiesta/Focus?
We could use a TIME SERIES Forecast to predict the
prices of Focus and Fiestas, to determine if they are likely to hold value in
the future.
To do this, we need to use historic data to learn from:
• Limit the dataset to cars £15k and under, to match our budget
• Include 5 years of historic data to learn from (2015-20)
• Predict 4 years into the future of what the value could be
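As a rough illustration of what a time series forecast does with those steps, here is a minimal Python sketch of Holt's linear exponential smoothing, which tracks a level and a trend and projects them forward. It is an assumption made for illustration, not the model or settings used in the notebook, and the yearly average prices, the smoothing parameters and the function name are all hypothetical.

```python
def holt_forecast(series, horizon, alpha=0.5, beta=0.3):
    """Forecast `horizon` steps ahead with Holt's linear (level + trend) smoothing."""
    if len(series) < 2:
        raise ValueError("need at least two observations to estimate a trend")
    level = series[0]
    trend = series[1] - series[0]
    for value in series[1:]:
        previous_level = level
        # Blend the new observation with the previous one-step-ahead forecast.
        level = alpha * value + (1 - alpha) * (level + trend)
        # Blend the latest change in level with the previous trend estimate.
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return [level + step * trend for step in range(1, horizon + 1)]


# Hypothetical average prices per year (2015 to 2020) for cars at or under £15k.
history_years = [2015, 2016, 2017, 2018, 2019, 2020]
average_prices = [9_800, 10_400, 11_100, 11_900, 12_600, 13_400]

forecast = holt_forecast(average_prices, horizon=4)
for offset, price in enumerate(forecast, start=1):
    print(history_years[-1] + offset, round(price))
```

Running the same function separately on Fiesta and Focus prices would give two forecast lines to compare, which is the comparison the charts in this section are making.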
The Ford Fiesta looks to be slightly cheaper than the Focus and may give us
more options to choose from, and the forecast suggests it will hold its value,
so this will be our choice!
Actions
Now that we have undertaken some EDA and Machine Learning, we
want to action these insights.
Considering my budget is £15k, the analysis suggests I will
be looking for:
• a MAKE in the more economical car ranges - this will be Ford
• an ENGINE SIZE of 1L or 1.1L
• a MILEAGE under 10k miles
• a MODEL of Fiesta, based on the forecast of future prices
Outcome
In the final stage of this project, the data-driven actions we have
identified are plugged into AutoTrader to help me pick out a used car.
After inputting those attributes, AutoTrader returns a Ford Fiesta at £14,200, which leaves us with £800 left over.
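The search itself happens on the AutoTrader website, but the way the actions combine into a single filter can be sketched in Python. The listings below are hypothetical stand-ins for search results (only the £14,200 Fiesta price comes from the result above), and the field names and function name are assumptions made for the example.

```python
BUDGET = 15_000

# Hypothetical search results for illustration; not real adverts.
listings = [
    {"make": "Ford", "model": "Fiesta", "engine_size": 1.0, "mileage": 6_500, "price": 14_200},
    {"make": "Ford", "model": "Focus", "engine_size": 1.0, "mileage": 8_000, "price": 14_900},
    {"make": "Ford", "model": "Fiesta", "engine_size": 1.5, "mileage": 4_000, "price": 16_500},
    {"make": "Ford", "model": "Fiesta", "engine_size": 1.1, "mileage": 22_000, "price": 11_000},
]


def matches_actions(car, budget=BUDGET):
    """True when a listing satisfies every action from the analysis."""
    return (
        car["make"] == "Ford"
        and car["model"] == "Fiesta"
        and car["engine_size"] in (1.0, 1.1)
        and car["mileage"] < 10_000
        and car["price"] <= budget
    )


shortlist = [car for car in listings if matches_actions(car)]
for car in shortlist:
    print(car["make"], car["model"], f"£{car['price']:,}", "left over:", f"£{BUDGET - car['price']:,}")
```

Each condition maps to one action in the list, so relaxing a single criterion (for example the mileage cap) is a one-line change.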
This is just one example of a data-driven result using Oracle Machine Learning Notebooks and Data Science to help with a real-life problem.