Essential Techniques to Fine-Tune Your Deep Learning Models

Squeeze Every Drop of Performance: It does not stop after finding the best model…

Rahul Gite
Level Up Coding


When we train a machine learning model on a dataset, we want the model to perform well on test data it has never seen before.


But just finding the best model for your use case does not guarantee that you get the best performance out of it.

Here are some of the techniques you can use to ensure that you extract the best out of your models.

Handling Bias and Variance

Suppose you divide your dataset into training and validation sets. In simple terms…

Bias is the training error, i.e. the amount of error the model makes when predicting the outputs of its own training data.

Variance can be thought of as the difference between the training error and the validation error.

Now, based on the errors on the training and validation sets, the following cases can arise:

High Bias: It means that the model has underfit the data as it has a high training error.

High Variance: If the model has low bias but a large gap between training and validation error, it has likely overfitted your training data and is not generalising well to new examples in the validation set.
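As a rough illustration, here is a minimal Python sketch (the error values and thresholds are made up for this example) of how you might diagnose the two cases from the training and validation errors:

def diagnose(train_error, val_error, bias_threshold=0.1, variance_threshold=0.05):
    # Rough bias/variance check; the thresholds depend on the error
    # that is actually achievable on your task.
    if train_error > bias_threshold:
        print("High bias: the model underfits the training data.")
    if val_error - train_error > variance_threshold:
        print("High variance: the model is not generalising to the validation set.")

diagnose(train_error=0.20, val_error=0.22)   # hypothetical numbers -> high bias
diagnose(train_error=0.02, val_error=0.15)   # hypothetical numbers -> high variance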

Resolving High Bias

Using bigger or appropriate models: As your current model is underfitting the training data, you might want to look at more complex models that can fit the decision boundary better.

You can also try to use models appropriate for the task; for example, Convolutional Neural Networks are proven to perform much better than vanilla fully connected neural networks on image-related tasks.
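As a rough sketch of the second point, a small Convolutional Neural Network and a plain fully connected network for 28x28 grayscale images might be defined like this in PyTorch (the layer sizes are arbitrary placeholders):

import torch.nn as nn

# Plain fully connected network on flattened 28x28 images
mlp = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 128), nn.ReLU(),
    nn.Linear(128, 10),
)

# Small CNN that exploits the spatial structure of the same images
cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                       # 28x28 -> 14x14
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                       # 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 10),
)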

Resolving High Variance

Using Regularisation: Regularisation is a family of techniques that limit how much the model overfits the training data. There are many different regularisation techniques, such as L1 & L2 regularisation, Dropout, etc.; a short sketch follows after this list.

Training on more data: As your current model is overfitting on the current data, you can train it on more data if possible; by seeing more examples, the model will be able to generalise better.
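As a minimal sketch of the regularisation point above (the model and hyperparameters are placeholders, not recommendations), in PyTorch L2 regularisation can be added through the optimiser's weight_decay argument and dropout through nn.Dropout:

import torch
import torch.nn as nn

# Dropout randomly zeroes activations during training, which limits overfitting
model = nn.Sequential(
    nn.Linear(100, 64), nn.ReLU(),
    nn.Dropout(p=0.5),          # placeholder dropout rate
    nn.Linear(64, 10),
)

# weight_decay adds an L2 penalty on the weights to every update
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)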

As a rule, always try to resolve high bias before resolving high variance.

Train, Validation and Test Datasets

Whenever we train a machine learning model, we always try to divide our dataset into train, validation and test sets, with…

train — Mainly used for training the model.

validation — Used to validate the model while it is being trained on the training set.

test — Used once the model has finished training to evaluate its performance on unseen data.

On small datasets (~10K examples) it's all right to use a split like 70/15/15 for training/validation/test.

But on larger datasets (~1M examples) we would want a ratio more like 95/2.5/2.5, because we would like to fail fast: building good ML models always involves experimenting with different parameters and models to find the best fit.

With excessively large validation and test sets, this experimentation slows down; 2.5% (~25K examples) is good enough to get an idea of the model's performance while keeping our time to experiment short.
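A minimal sketch of such a split using scikit-learn (the dummy data, variable names and 95/2.5/2.5 ratio are just for illustration):

import numpy as np
from sklearn.model_selection import train_test_split

# Dummy data standing in for a large dataset
X = np.random.randn(100_000, 20)
y = np.random.randint(0, 2, size=100_000)

# First carve out 5% of the data, then split it half-and-half into validation and test
X_train, X_hold, y_train, y_hold = train_test_split(X, y, test_size=0.05, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_hold, y_hold, test_size=0.5, random_state=42)
# -> roughly 95% train, 2.5% validation, 2.5% test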

Normalising the Inputs

It is advised to normalise the inputs to the machine learning model.

One of the most popular ways to normalise data is to subtract the mean from each feature and divide by its standard deviation:

x_normalised = (x - mu) / sigma, where mu is the mean and sigma is the standard deviation

This is done because, if the scales of the different input features vary a lot, our loss function can look like this:

[Elongated loss contours when input features are on very different scales — image from DeepLearning.AI]

This surface is elongated, which slows down gradient descent. Instead, when we normalise the data, our loss function can look like:

[More symmetric loss contours after normalising the inputs — image from DeepLearning.AI]

As we can see, the second loss surface is much more symmetric, which lets gradient descent take a more direct path and reach the minimum faster.

Note: Please standardise the test set using the same mean and standard deviation computed on the training set, as we want the data to go through the same transformation during training and testing.
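A minimal numpy sketch of this (the dummy data is just for illustration), computing the statistics on the training set only and reusing them on the test set:

import numpy as np

# Dummy train/test features on a large, uneven scale
X_train = np.random.rand(1000, 3) * 100
X_test = np.random.rand(200, 3) * 100

# Compute mu and sigma per feature on the training set only
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)

X_train_norm = (X_train - mu) / sigma
X_test_norm = (X_test - mu) / sigma   # same mu and sigma, no peeking at test statistics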

Initialisation of Weights

I have talked about why we should initialise the weights randomly instead of zeros in the below article:

But the right weight initialisation can also partially help with the problem of vanishing and exploding gradients!

The intuition behind this follows from the fact that if we consider n inputs coming from the previous layer into a node in the current layer, the pre-activation at that node becomes:

Z = sum(w_i * x_i) for i = 1 to n

where the x_i are the inputs and the w_i are the corresponding weights.

The more inputs n there are, the larger Z can grow, so we would want smaller weights to keep Z small. A trick to achieve this is to set the variance of W to:

variance(W) = 1 / (number of neurons in the previous layer)

For example, in the case of the ReLU activation function, a variance of 2/n works better in practice (He initialisation), so we initialise the weights like:

W = numpy.random.randn(shape) * numpy.sqrt(2 / n_prev)   # n_prev = neurons in the previous layer
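As a quick check, here is a small numpy sketch (the layer sizes and dummy activations are arbitrary) showing that with this initialisation the activations keep a sensible scale:

import numpy as np

n_prev, n_curr = 512, 256               # arbitrary layer sizes for illustration
a_prev = np.random.randn(1000, n_prev)  # dummy activations from the previous layer

# He initialisation: scale the random weights by sqrt(2 / n_prev)
W = np.random.randn(n_prev, n_curr) * np.sqrt(2.0 / n_prev)
Z = a_prev @ W
A = np.maximum(0, Z)                    # ReLU

# The ReLU outputs stay roughly unit-scale, so activations neither
# vanish nor explode as more layers are stacked with this initialisation
print(np.sqrt((A ** 2).mean()))         # ~1.0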

In this article, I have gone over some of the techniques that will help you get the best results out of your deep learning models. Hope this helps you build better models!

If you like this content, please give a clap. I will be writing about different things I learn and posting regularly. You can even comment on what you would like to see in the coming weeks. Happy coding…


I love to write about anything new that I learn from my work and in general. Let's connect on LinkedIn: www.linkedin.com/in/rahul-gite-1a829b177