Data Science with Python: Getting Started

The bare essentials to get up and running + Helpful Resources

Jason Dsouza
Towards Data Science


Photo by M. B. M. on Unsplash

Data Science has become a revolutionary technology that everyone seems to be talking about. Hailed as the ‘sexiest job of the 21st century’, it is also a buzzword that few people understand in its true sense. While many people want to become Data Scientists, it is essential to see the real picture first.

“Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from many structural and unstructured data” (Wikipedia)

To put it simply, Data Science is a field of study and practice that’s focused on obtaining insights from data.

Data Scientists use math, statistics, and machine learning techniques to mine large data sets for patterns that can be used to analyze the past or even predict the future.
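As a toy illustration of "find a pattern, then use it to predict", the sketch below fits a straight-line trend to five made-up monthly sales figures with NumPy's `polyfit` and extrapolates one month ahead. The numbers are invented for illustration; real projects use far more data and validate a model before trusting its forecasts.

```python
import numpy as np

# Made-up sales for months 0..4 (purely illustrative numbers)
months = np.arange(5)
sales = np.array([10.0, 12.0, 14.0, 16.0, 18.0])

# "Analyze the past": fit a degree-1 polynomial (a straight line) to the history.
# polyfit returns coefficients from the highest power down: [slope, intercept].
slope, intercept = np.polyfit(months, sales, deg=1)

# "Predict the future": evaluate the fitted line at the next month
forecast = slope * 5 + intercept

print(f"Trend: {slope:.2f} units/month, forecast for month 5: {forecast:.2f}")
# prints: Trend: 2.00 units/month, forecast for month 5: 20.00
```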

If you’re new to all this, don’t worry: I’ll take you through it step by step. If you’re an old hand, you might want to skip ahead a few posts. I do, however, assume that you’ve been coding for at least a year and that, if you haven’t used Python before, you’ll put in the extra time to learn whatever Python you need as you go.

If you have a computer, an internet connection, and the will to put in the work, that’s about all you require. You don’t need much data, you don’t need university-level math, and you don’t need a giant data centre.

You’ll be surprised how easy it is to get started!

Do you need a GPU?

I’ve written a post on “What is a GPU and do you need one in Deep Learning?”, and it’s worth a read.

GPUs (Graphics Processing Units) are specialized computer hardware created to render images at high frame rates. Since graphics texturing and shading require more matrix and vector operations executed in parallel than a CPU (Central Processing Unit) can reasonably handle, GPUs were made to perform these calculations more efficiently.

It so happens that Deep Learning also requires super-fast matrix computations. So researchers put two and two together and started training models on GPUs, and the rest is history. Deep Learning workloads are dominated by floating-point arithmetic, so what matters most is how many floating-point operations a chip can perform per second (FLOPS), and GPUs are highly optimized for exactly that.
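To make "floating-point operations per second" a little more concrete, the sketch below uses the standard estimate that multiplying an m×k matrix by a k×n matrix takes about 2·m·k·n operations, then times NumPy's matrix multiplication on your CPU to estimate its throughput. `matmul_flops` and `measure_gflops` are hypothetical helper names for this post, and the number you get will vary with your hardware and the linear-algebra library NumPy was built against.

```python
import time

import numpy as np


def matmul_flops(m: int, k: int, n: int) -> int:
    """Approximate floating-point operations for an (m x k) @ (k x n) product.

    Each of the m*n output entries needs k multiplications and k - 1 additions,
    which is conventionally rounded to 2*k operations per entry.
    """
    return 2 * m * k * n


def measure_gflops(size: int = 1024, repeats: int = 5) -> float:
    """Time a square float32 matrix product and return the best observed GFLOPS."""
    rng = np.random.default_rng(0)
    a = rng.standard_normal((size, size), dtype=np.float32)
    b = rng.standard_normal((size, size), dtype=np.float32)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        a @ b  # result discarded; only the elapsed time matters here
        best = min(best, time.perf_counter() - start)
    best = max(best, 1e-9)  # guard against a zero timing on very small inputs
    return matmul_flops(size, size, size) / best / 1e9


print(f"Estimated CPU throughput: {measure_gflops():.1f} GFLOPS")
```

Taking the best of several runs reduces noise from whatever else your machine is doing; running the same idea on a GPU framework is how the CPU/GPU gap is usually measured.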

Source: Karlrupp

In the chart above, you can see that GPUs (red/green) can theoretically perform 10–15x as many operations per second as CPUs (in blue). A large part of that speedup holds in practice too, especially for the highly parallel matrix math used in deep learning.

If you want to train anything meaningful in deep learning, a GPU is what you need, specifically an NVIDIA GPU, since the major deep learning frameworks have the most mature support for NVIDIA’s CUDA platform.

But despite how appealing GPUs seem, you DON’T need one when you’re getting started. Unless your project is advanced enough to require a huge amount of computation, your CPU will handle it just fine. If you do want a GPU and your computer doesn’t have one, I suggest renting access to a cloud machine that already has everything you need pre-installed and ready to go. Costs can be as little as US$0.25 per hour while you’re using it.
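When you do reach the point of using a GPU, a common pattern is to check for one at runtime and fall back to the CPU otherwise. The sketch below assumes PyTorch as the framework (this post doesn’t prescribe one): `torch.cuda.is_available()` is PyTorch’s own check, while `pick_device` is a hypothetical helper name. If PyTorch isn’t installed, it simply reports the CPU.

```python
def pick_device() -> str:
    """Return "cuda" if PyTorch can see an NVIDIA GPU, otherwise "cpu"."""
    try:
        import torch  # optional: only needed once you actually train on a GPU
    except ImportError:
        return "cpu"  # PyTorch isn't installed, so stick with the CPU
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Training would run on: {device}")
```

In PyTorch, that string can then be passed to calls such as `tensor.to(device)`, so the same script runs on a laptop CPU or a rented GPU machine without changes.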

Code Editors & Environments

Visual Studio Code is my go-to code editor

In Data Science, the general advice (especially for beginners) is to use a beginner-friendly environment such as Jupyter Notebook or the Anaconda distribution, but I use VS Code, which I’ve configured to support my Data Science projects.

Here’s an article to get started with Jupyter Notebook
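One way to get a notebook-like workflow inside VS Code, assuming the Python and Jupyter extensions are installed, is to split an ordinary `.py` file into cells with `# %%` comments; VS Code then lets you run each cell on its own in an interactive window. The file is still plain Python, so it also runs top to bottom with `python`. The temperature values are made up.

```python
# %%
# Cell 1: build a small dataset (made-up numbers, purely illustrative)
import numpy as np

temperatures = np.array([21.5, 23.0, 19.8, 25.1, 22.4])

# %%
# Cell 2: summarize it; in VS Code each "# %%" block can be run separately
mean_temp = temperatures.mean()
print(f"Mean temperature: {mean_temp:.2f}")
# prints: Mean temperature: 22.36
```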

Prior Knowledge of Python

Source: python.org

This mini-series on Data Science assumes you’ve been coding for at least a year. It doesn’t matter in which language: as long as you have solid programming experience, you should be fine. If you aren’t familiar with Python at all, don’t fret! I’ll link helpful resources along the way.

If you don’t have any coding experience, I’d recommend learning Python. It’s (really) easy to pick up, and it’s the programming language we’ll be using throughout this Data Science mini-series.

Helpful Resources

Photo by Ed Robertson on Unsplash

Quick Resources to gain (or refresh) your Python knowledge

Complete Beginners:

  1. Whirlwind Tour of Python
  2. Real Python
  3. Learn Python the Hard Way

Intermediate Programmers:

  1. The Hitchhiker’s Guide to Python
  2. Derek Banas — Python in one video
  3. Design of Computer Programs

Advanced Programmers (but maybe new to Python):

  1. Learn x in y minutes
  2. David Beazley’s courses, tutorials, and books
  3. Raymond Hettinger

Python Numeric Programming:

These are worth a read whether you’re a beginner or an advanced programmer. We’ll be doing a lot of numeric programming throughout this series.

  1. Stanford NumPy tutorial
  2. Scipy Lecture Notes
  3. Python Data Science Handbook
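As a small taste of the numeric programming these resources cover, the sketch below standardizes each column of a tiny made-up data table (subtract the column mean, divide by the column standard deviation) using NumPy’s vectorized operations and broadcasting instead of Python loops.

```python
import numpy as np

# Made-up data: 4 samples (rows) x 3 features (columns)
data = np.array([
    [1.0, 200.0, 3.0],
    [2.0, 180.0, 5.0],
    [3.0, 220.0, 4.0],
    [4.0, 200.0, 8.0],
])

# Column-wise statistics, computed without explicit loops (axis=0 = down each column)
col_mean = data.mean(axis=0)
col_std = data.std(axis=0)

# Broadcasting: the shape-(3,) vectors are applied to every one of the 4 rows
standardized = (data - col_mean) / col_std

print(standardized.mean(axis=0).round(6))  # each column now has mean ~0
print(standardized.std(axis=0).round(6))   # and standard deviation ~1
```

The same two lines of arithmetic work unchanged for a table with millions of rows, which is a big part of why NumPy underpins most of the Python data science stack.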

As always, thanks so much for reading! Please tell me in the comments what you think, or what you’d like me to write about next. I’m open to criticism as well!

See you in the next post! 😄


I write libraries and sometimes blog about them | Top Writer | Creator of Caer, the Vision library for Python