The Role of Mathematics in Machine Learning

Balaka Biswas
Level Up Coding
Published in
6 min read · Mar 13, 2020


“I have my interview tomorrow. Hopefully, I’ll get this internship.”

“What is this internship on?” I asked. “Machine Learning.”

The next day, I learned that he had been rejected within two hours of his interview. Why? Because the interviewer had asked him to derive how Gini impurity or information gain decides the best split in a decision tree, and he couldn’t. Why? Because in the four months he had dedicated to Machine Learning, he had never bothered to understand the mathematics behind the algorithms he used so fluently in his code.
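That interview question is not even hard once you see the formula. Gini impurity is 1 minus the sum of squared class proportions, and the best split is the one that minimizes the weighted impurity of the two child nodes. A minimal sketch in plain Python (the function names are mine):

```python
from collections import Counter

def gini_impurity(labels):
    # Gini = 1 - sum over classes of p_k^2, where p_k is the class proportion
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_gini(left, right):
    # Weighted average impurity of the two child nodes of a candidate split
    n = len(left) + len(right)
    return len(left) / n * gini_impurity(left) + len(right) / n * gini_impurity(right)

print(gini_impurity([0, 0, 1, 1]))   # 0.5 — a perfectly mixed node
print(split_gini([0, 0], [1, 1]))    # 0.0 — a perfectly pure split
```

A pure node scores 0, a 50/50 binary node scores 0.5, and the decision tree simply picks the split with the lowest weighted score.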

Most of the time, people who are new to Machine Learning are intimidated by the mathematics. After all, this is not primary-school mathematics. But it isn’t rocket science either.

“What’s the use of learning the mathematics behind machine learning algorithms? Why not just use the libraries available in Python and R to build models?”

If this question has crossed your mind, then trust me, you’re not alone. As mentioned, a vast array of libraries exists to perform various machine learning tasks, so it’s very common to skip the mathematical side of the field. People coming from non-science, non-economics or non-technical backgrounds tend to face this dilemma even more.


So, why worry about the Math?

Talking from the aspect of Machine Learning, there are many reasons why the mathematics behind this subject is vital. Here, I’ll talk about a few scenarios where mathematics is an integral part of the topic:

  1. First and foremost, a model/algorithm is the heart of any Machine Learning project. If you know your math, selecting the right algorithm, weighing accuracy, training time, model complexity, the number of parameters and the number of features, becomes a cakewalk.
  2. For a particular algorithm, you can choose sensible parameter settings and validation strategies, and understand why the values you chose produce a different accuracy than your friend’s model.
  3. Understanding the behavior of a Machine Learning model on varying data is one of the chief foundations one should be aware of. Identifying underfitting and overfitting by understanding the Bias-Variance trade-off is impossible without knowing mathematics.
  4. No machine learning model is perfect, because machine learning is an iterative, ever-evolving process. So it is not possible for an ML model to reach 100% accuracy (much like a Carnot engine, you know). If it does, we go back and check whether our model or dataset is faulty. But how will you measure and reduce the error? Once again, you must have a good grasp of Mathematics, especially Calculus.
  5. Let us now talk about that aspect of Data Science without which Machine Learning wouldn’t be possible: data. Must everything present in your data be fed to your Machine Learning model? How do we know if certain features have no relation to, or say in, your result/prediction? Most of the time, when you work on real-life industrial projects, the data you receive won’t be clean enough to use as-is.
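To make point 4 concrete: every “how wrong is my model?” question starts with a loss function, and calculus is what lets you minimize it. A minimal mean-squared-error sketch in plain Python (the function name and numbers are mine):

```python
def mean_squared_error(y_true, y_pred):
    # Average of the squared residuals - the quantity calculus helps us minimize
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

print(mean_squared_error([1, 2, 3], [1, 2, 4]))  # one miss of size 1 over 3 points → 0.333...
```

Once the error is a differentiable function like this, “eradicating” it becomes a calculus problem: follow the derivative downhill.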

Suppose you have circulated the feedback form of a workshop you conducted. Isn’t it a bit weird to expect that everyone will fill in all the entries?

If your data is missing values, do you expect your model to run fine? How will you impute those values? How will you find data that doesn’t match the convention (say, a string in an ‘integer only’ column)? How will you spot data points that deviate heavily from all the others (outliers)?
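These questions can all be answered with elementary statistics. A toy sketch (the data and the 2-standard-deviation threshold are made up for illustration): impute missing entries with the mean of the observed values, and flag as outliers any points far from the mean in standard-deviation terms:

```python
import statistics

data = [12.0, 15.0, None, 14.0, 13.0, 98.0, None, 16.0]  # toy feedback scores with gaps

# Impute: fill missing entries with the mean of the observed values
observed = [x for x in data if x is not None]
mean = statistics.mean(observed)
filled = [x if x is not None else mean for x in data]

# Outliers: flag points more than 2 sample standard deviations from the mean
sd = statistics.stdev(observed)
outliers = [x for x in observed if abs(x - mean) > 2 * sd]
print(outliers)  # the 98.0 stands out from the rest
```

Mean imputation and z-score cut-offs are the crudest possible tools, but even they are statistics; real projects just use sharper versions of the same ideas.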

So many questions. One answer. MATHEMATICS. (Specifically statistics in this case)

I am utterly confused. Where do I start?

I feel you. Been there, felt that :-)

But just because I said Machine Learning is incomplete without mathematics doesn’t mean you should start working through every mathematics book you can lay your hands on. A few spheres of Mathematics matter far more for your Machine Learning journey.

Let us look at exactly how much each sphere of mathematics matters:

(Figure: a rough approximation of how much each sphere of mathematics matters in ML)

You can see that Linear Algebra and Statistics are the most influential topics.

Let us explore why we must know these topics and what role they play in Machine Learning.

LINEAR ALGEBRA: “Linear Algebra is the mathematics of the 21st century.” I will talk about the foundational topics of Machine Learning that specifically require Linear Algebra. In ML, Linear Algebra comes up everywhere. Topics such as Principal Component Analysis (PCA), Singular Value Decomposition (SVD), eigendecomposition of a matrix, symmetric matrices, matrix operations, projections, eigenvalues and eigenvectors, vector spaces and norms are needed for understanding the optimization methods used in Machine Learning.
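Several of those topics meet in one place in PCA: centre the data, eigendecompose the (symmetric) covariance matrix, and project onto the leading eigenvectors. A rough sketch with NumPy on random toy data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))      # toy data: 100 samples, 3 features

# 1. Centre each feature at zero
Xc = X - X.mean(axis=0)

# 2. Covariance matrix (symmetric, so eigh applies)
cov = np.cov(Xc, rowvar=False)

# 3. Eigendecomposition: eigenvectors are the principal directions
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]  # sort by explained variance, descending

# 4. Project onto the top-2 principal components
X_pca = Xc @ eigvecs[:, order[:2]]
print(X_pca.shape)                 # (100, 2)
```

Every step here is a Linear Algebra idea from the list above: symmetric matrices, eigendecomposition, projection.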

Principal Component Analysis (PCA)

STATISTICS AND PROBABILITY: The power of statistics in Machine Learning, especially in Data Analytics, can’t be overstated. How will you understand how well your data will perform when fed to an algorithm? That is exactly where statistics helps you. What about probability? Suppose, from a dataset, you want to know: given that you earn more than 20 lakhs per annum, what is the probability that you will buy a car? Does the framing of the question look familiar? Correct. It is a question based on Bayes’ theorem, the underlying mathematics behind the Naive Bayes classification algorithm. This simple question will change the way your classifier looks at your data. And it is simple probability, nothing over the top. To cite some more use cases, some of the fundamental Statistics and Probability Theory needed for Machine Learning are: combinatorics, probability rules and axioms, conditional and joint distributions, and standard distributions (Bernoulli, Binomial, Multinomial, Uniform and Gaussian).
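The car question maps directly onto Bayes’ theorem. With made-up numbers (all three probabilities below are hypothetical, chosen only to show the arithmetic):

```python
# Hypothetical numbers for the "high earner buys a car" question
p_high_income = 0.10      # P(income > 20 lakhs)
p_buys_car = 0.15         # P(buys a car)
p_high_given_car = 0.40   # P(income > 20 lakhs | buys a car)

# Bayes' theorem: P(car | high income) = P(high | car) * P(car) / P(high)
p_car_given_high = p_high_given_car * p_buys_car / p_high_income
print(p_car_given_high)   # 0.6
```

Three numbers and one line of arithmetic flip the conditioning around, and that flip is the whole trick behind Naive Bayes.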

Naive Bayes algorithm

CALCULUS: Honestly, when I was in school, calculus was my favourite topic. It still continues to be my #1. Because once you look at calculus as something to understand and implement, rather than just mug up, you’ll find a different solace in it. The most famous example one could cite is the wide application of differentiation in regression problems. Let us talk about Linear Regression. The main goal in regression is to find the best-fit line by minimizing the cost function, so that the hypothesis h(x) is as close as possible to y. To achieve this, we keep reducing the value of the cost function J(theta) until we end up at a local minimum. This is called Gradient Descent.

Gradient descent algorithm

The derivative term in the gradient descent update is nothing but the slope of the curve formed by the cost function. The sign of this slope tells us in which direction to move theta on the next step, and watching it shrink lets us verify whether our algorithm (for minimizing J(theta)) is behaving correctly.
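Putting the formula to work, here is a bare-bones gradient descent for simple linear regression in plain Python (the toy data, learning rate and iteration count are my own choices; the data is generated by y = 2x + 1, so the thetas should converge near 1 and 2):

```python
# Gradient descent for simple linear regression: h(x) = theta0 + theta1 * x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]   # generated by y = 2x + 1
theta0, theta1 = 0.0, 0.0
alpha = 0.05                # learning rate
m = len(xs)

for _ in range(5000):
    # Partial derivatives of J(theta) = (1/2m) * sum of (h(x) - y)^2
    grad0 = sum(theta0 + theta1 * x - y for x, y in zip(xs, ys)) / m
    grad1 = sum((theta0 + theta1 * x - y) * x for x, y in zip(xs, ys)) / m
    # Step downhill: subtract the gradient, scaled by the learning rate
    theta0 -= alpha * grad0
    theta1 -= alpha * grad1

print(round(theta0, 2), round(theta1, 2))  # ≈ 1.0 2.0
```

Note how each update subtracts the derivative: when the slope is positive we step left, when negative we step right, exactly the sign argument above.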

Other major subdomains under Calculus are Lagrange multipliers (very, very important for understanding Support Vector Machines), directional derivatives, the Hessian, the Jacobian, etc. Laplace smoothing, often mentioned alongside these, is more of a probability technique but comes up just as often in ML.

Trust me, Mathematics is not as tough as it seems. Once you understand the ultimate potential it holds, there’s no turning back.

Why not prove the above scenario wrong and book a seat in the empty stands? ;-)


Software Engineer with an unending passion for Data Science and Deep Learning