Data Identification and Classification using Machine Learning

A fancy data science example on how to classify flowers

Michael Whittle
Published in Level Up Coding
12 min read · Apr 14, 2021


I have been studying Applied Machine Learning with Columbia Engineering through Emeritus. It’s a five-month course which I have really enjoyed and would recommend.

The instructors are all really good and helpful, especially Robert Manriquez and Puneet Saraswat. They run the course “Office Hours” webinars to assist with complex assignments but also to provide practical walkthroughs of what we have been learning.

In my previous two articles, “Predicting Titanic Survivors using ML” and “Predicting House Sale Prices using ML”, I used Kaggle, a great data science and machine learning resource. Kaggle provides free datasets for data scientists to practice with, and runs competitions where participants can compare their analysis and modelling. For this tutorial I will use the “Iris dataset” that ships with “scikit-learn”.

In a few of the “Office Hours” webinars, Robert walked us through several datasets with different objectives/problems to solve. In the “Titanic - Machine Learning from Disaster” dataset, the objective was to predict who survived the Titanic disaster. In the “House Prices - Advanced Regression Techniques” dataset, the objective was to predict house sale prices in Ames, Iowa. This tutorial will have a classification objective/problem to solve.

I’m going to provide a practical introduction to data science and machine learning without delving into the maths behind the scenes (and the maths is complex!). Although the names “data scientist” and “machine learning” may sound fancy, they are really just modern names for statistician and statistics respectively. Let’s not kid ourselves here, it’s mostly complex maths.

Step 1: Scope the project

The Iris dataset contains four measurements (sepal length, sepal width, petal length and petal width, all in centimetres) taken from three species of Iris flower: Setosa, Versicolor and Virginica. The scope of this project is to identify which of these measurements best allow us to classify a flower by species.

Step 2: Gather the data

In this specific case gathering the data is very easy. It’s all bundled with the “scikit-learn” library, so nothing needs to be downloaded separately. We will load the “Iris dataset” from there.
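
As a minimal sketch of that loading step, assuming “scikit-learn” is installed, something like the following should work. The variable names (X, y, feature_names, species_names) are my own choice for illustration and not part of the library.

```python
from sklearn.datasets import load_iris

# Load the Iris dataset that is bundled with scikit-learn (no download needed).
iris = load_iris()

# X holds the measurements: one row per flower, one column per measurement.
X = iris.data
# y holds the species of each flower encoded as an integer (0, 1 or 2).
y = iris.target

# Human-readable names for the columns of X and the integer labels in y.
feature_names = list(iris.feature_names)
species_names = [str(name) for name in iris.target_names]

print("Rows and columns:", X.shape)
print("Measurements:", feature_names)
print("Species:", species_names)
```

Keeping the measurements (X) separate from the species labels (y) is the usual convention in “scikit-learn”, and it is the shape most of its classifiers expect later on.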


Solution Architect — CCIE R&S #24223 | Full-Stack / Blockchain / Web3 Developer | Security Specialist | PyCryptoBot Creator