A Guide to EDA in Python

Important questions to ask about your data during exploratory data analysis

Megan Dibble
Published in Level Up Coding
8 min read · Apr 7, 2020

Creating machine learning models is cool. It’s tempting as a beginner (I know from experience) to jump straight to the cool part. After all, it’s the most important part too, right?

What if you skipped straight to the climax of a movie that you’ve never seen before? Would you be confused? Would it even be enjoyable?

Just as the first hour of character development is foundational in a movie, exploratory data analysis (EDA) is a crucial first step towards a good data science project.

Photo by True Agency on Unsplash

It’s time to grab your favorite blanket, snacks, and sweats and cozy up with your data. By the end of EDA, you should know your dataset as well as you know the characters of your favorite movies.

For this article, I made a list of the questions I try to answer when I conduct EDA at the beginning of a data science project. I’ve included code snippets (all in Python, using the pandas library) that you can run to answer these questions.
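To give a flavor of what such a snippet can look like, here is a minimal sketch of a first look at a dataset. It is not taken from the sections that follow. The DataFrame and its column names (`title`, `runtime_min`, `genre`) are made up purely for illustration; in practice you would load your own data instead.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only
df = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "runtime_min": [120, 95, None, 142],
    "genre": ["drama", "comedy", "drama", None],
})

n_rows, n_cols = df.shape   # how big is the dataset?
dtypes = df.dtypes          # what type is each column?
missing = df.isna().sum()   # how many missing values per column?
summary = df.describe()     # summary statistics for numeric columns

print(n_rows, n_cols)
print(dtypes)
print(missing)
print(summary)
```

Each of these calls answers one basic question about the data (size, types, missing values, rough distribution), which is usually enough to decide where to dig deeper.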

Action! (Photo by Jakob Owens on Unsplash)

0. Questions To Ask Before You…


Data Journalist @ Alteryx. I mostly write about data science and career advice. Occasionally I’m funny. Find me on LinkedIn!