A Guide to EDA in Python
Important questions to ask about your data during exploratory data analysis
Creating machine learning models is cool. As a beginner, it’s tempting (I know from experience) to jump straight to the cool part. After all, it’s the most important part too, right?
What if you skipped straight to the climax of a movie that you’ve never seen before? Would you be confused? Would it even be enjoyable?
Just as the first hour of character development is foundational in a movie, exploratory data analysis (EDA) is a crucial first step towards a good data science project.
It’s time to grab your favorite blanket, snacks, and sweats and cozy up with your data. By the end of EDA, you should know your dataset as well as you know the characters of your favorite movies.
For this article, I made a list of the questions I try to answer when I conduct EDA at the beginning of a data science project. Each question comes with a code snippet (all in Python, using the pandas library) that you can run to answer it.
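To make that concrete, here is a minimal sketch of a first look at a dataset. The tiny DataFrame is invented purely for illustration (the column names and values are my assumptions, not real data); in practice you would load your own file, for example with `pd.read_csv`.

```python
import pandas as pd

# Hypothetical toy dataset, made up for illustration only.
df = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "runtime_min": [120, 95, None],  # None becomes NaN in a numeric column
    "genre": ["drama", "comedy", "drama"],
})

n_rows, n_cols = df.shape              # how many rows and columns?
missing_per_column = df.isna().sum()   # how many missing values per column?

print(df.head())           # first few rows
print(df.dtypes)           # data type of each column
print(missing_per_column)
```

Even on a toy table like this, shape, data types, and missing-value counts give a quick sense of what you are working with before any modeling starts.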