Principal Component Analysis for Dimensionality Reduction in Python
This article walks through principal component analysis (PCA) in Python.
Table of Contents:
- Introduction
- Principal component analysis (Overview)
- Principal component analysis in Python
- Conclusion
Introduction
One of my main reasons for writing this article was my obsession with understanding the details, logic, and mathematics behind Principal Component Analysis (PCA). Most online tutorials and articles about principal component analysis in Python focus on showing learners how to apply the technique and visualize the results, rather than starting from the very beginning: why do we even need it in the first place? What is it about our data that makes us want to shrink the number of features or group them?
Let’s start from the beginning. What are you going to do with your dataset, even if you don’t do any dimensionality reduction? I guess you are going to feed it to a machine learning algorithm, right?
So that’s our first step. Our goal is to have an algorithm-friendly dataset. What do we mean by that?
When you have a lot of features, there are a few potential drawbacks:
- Your model will have a high degree of complexity
- The extra features may introduce a significant amount of noise
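
To make these drawbacks concrete, here is a minimal sketch on synthetic data (not the dataset used in this article). The helper name fit_and_score, the sample sizes, and the coefficient 3.0 are assumptions chosen only for illustration. It fits ordinary least squares with NumPy on one informative feature, then on the same feature plus a batch of pure-noise features, and compares the error on training data with the error on held-out data.

```python
import numpy as np


def fit_and_score(n_noise, n_train=40, n_test=1000, seed=0):
    """Fit least squares on 1 informative feature plus `n_noise`
    pure-noise features and return (train_mse, test_mse).

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    n = n_train + n_test

    # One feature that truly drives the target, plus observation noise.
    signal = rng.normal(size=(n, 1))
    y = 3.0 * signal[:, 0] + rng.normal(size=n)

    # Extra columns that carry no information about y.
    noise = rng.normal(size=(n, n_noise))

    # Design matrix: intercept, informative feature, noise features.
    X = np.hstack([np.ones((n, 1)), signal, noise])
    X_train, X_test = X[:n_train], X[n_train:]
    y_train, y_test = y[:n_train], y[n_train:]

    coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    train_mse = np.mean((X_train @ coef - y_train) ** 2)
    test_mse = np.mean((X_test @ coef - y_test) ** 2)
    return train_mse, test_mse


for n_noise in (0, 30):
    train_mse, test_mse = fit_and_score(n_noise)
    print(f"noise features: {n_noise:2d}  "
          f"train MSE: {train_mse:.2f}  test MSE: {test_mse:.2f}")
```

With the noise columns included, the model has many more coefficients to fit, so it can match the training data more closely while predicting unseen data worse. That gap is the kind of problem dimensionality reduction is meant to ease.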