De-Blurring Images Using Convolutional Neural Networks (with Code)

Anurag Pendyala · Published in Level Up Coding · Jun 19, 2020


CNNs can be used to extract hidden features from an image and “de-blur” it.

Note that this article assumes the reader is familiar with how CNNs work.

Machine Learning algorithms can be used to convert a blurred image (left) into a better-quality one (right). Image by Nathalia Segato, taken from https://unsplash.com/photos/c168jRbeIEM.

Theory

Convolutional Neural Networks are a special type of neural network designed for images. These networks extract hidden features from images that may or may not be visible to the human eye. They are therefore widely used in many Computer Vision applications such as object detection, recognition, tracking, and localization.

Now that we know how and where CNNs are used, it is worth seeing how useful they are for de-blurring an image. Auto-Encoders use CNNs effectively to solve this problem; more on them below.

Auto-Encoders take an input image and store the details of that image in a different representation whose size is smaller than that of the image. These stored details can later be used to recreate either the same image or a different image based on the input. Auto-Encoders are the fundamental concept behind image recreation as well as generating new images, and hence are used in many Generative Adversarial Networks as well.

An Auto-Encoder, as a whole, has three sections: an encoder, a hidden layer, and a decoder. The encoder takes the input, processes it, extracts features, and stores the result in the hidden layer. A decoder does the opposite of what an encoder does: it takes the data from the hidden layer and recreates an image from it.

A basic outline of an Auto-Encoder

For this article, the encoder consists only of convolutional layers with different numbers of filters. These layers extract features from the input image, which is a blurred image, and pass these features on to a hidden layer.

Again, in this case, the hidden layer is a single convolution layer whose input is the feature map output by the encoder.

The data from the hidden layer is passed on to the decoder. Since the encoder involves convolution layers, it makes sense to deconvolve the features to get back an image similar to the input; in this case, the de-blurred image. In a deconvolution, you take a kernel of weights, just as in a convolution layer, and multiply it by the intensity of a single pixel of the feature map; the resulting matrix is accumulated into the output in place of that pixel. The kernel weights of each layer are learnt while training the overall model.

This GIF gives a hand-waving illustration of how deconvolution works: the feature map is deconvolved to produce the larger matrix shown above.
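To make the mechanics concrete, here is a minimal NumPy sketch of a stride-1 transposed convolution. This is a toy illustration of the idea, not the implementation Keras uses internally:

import numpy as np

def conv_transpose_2d(feature_map, kernel, stride=1):
    # each input pixel scales the whole kernel, and the scaled kernels
    # are accumulated into an output that is larger than the input
    h, w = feature_map.shape
    kh, kw = kernel.shape
    out = np.zeros(((h - 1) * stride + kh, (w - 1) * stride + kw))
    for i in range(h):
        for j in range(w):
            out[i*stride:i*stride + kh, j*stride:j*stride + kw] += feature_map[i, j] * kernel
    return out

# a 2 x 2 feature map deconvolved with a 2 x 2 kernel yields a 3 x 3 output
print(conv_transpose_2d(np.array([[1., 2.], [3., 4.]]),
                        np.array([[1., 0.], [0., 1.]])))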

Therefore the decoder, at least in this case, consists only of deconvolution layers, also called Convolution Transpose layers. The deconvolution layers in the decoder mirror the parameters and attributes chosen for the encoder.

Now we have the architecture of the autoencoder ready, but to train the model we also need to choose a loss function. Quite a few loss functions could achieve the goal; here we shall concentrate on Mean Squared Error and Binary Cross-Entropy.

Mean Squared Error is the average of the squared pixel-wise differences between the predicted image and the ground truth. It is represented by:

MSE = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2

where y_i is a ground-truth pixel value, \hat{y}_i the corresponding predicted value, and N the number of pixels.

Binary Cross-Entropy loss is the sum of the per-pixel cross-entropies between the predicted image and the ground truth. It is represented by:

BCE = -\sum_{i=1}^{N} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]

Both of these loss functions give a rough idea of how different the ground truth is from the predicted image. Minimizing either of them drives the adjustment of the weights and biases in the CNN designed above.
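As a concrete illustration, here is a minimal NumPy sketch computing both losses on a handful of made-up pixel values (the arrays are invented purely for this example):

import numpy as np

y_true = np.array([0.0, 0.2, 0.9, 1.0])  # ground-truth pixel values in [0, 1]
y_pred = np.array([0.1, 0.3, 0.8, 0.9])  # predicted pixel values

# Mean Squared Error: average of the squared pixel-wise differences
mse = np.mean((y_true - y_pred) ** 2)

# Binary Cross-Entropy: clip predictions to avoid log(0)
eps = 1e-7
p = np.clip(y_pred, eps, 1 - eps)
bce = -np.sum(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

print('MSE:', mse, 'BCE:', bce)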

Code

Now that the theory is clear, we can dig into the implementation of the aforementioned concepts and finally build a Deep Learning model to de-blur an image. The code is implemented using the Keras package available in Python. You can refer to the complete code at this link: https://github.com/done-n-dusted/deblur-fashionmnist.

First, we need to load the dataset. I have used the Fashion-MNIST dataset, which can be loaded from the Keras package quite easily.

from keras import datasets

# load Fashion-MNIST and scale pixel values to [0, 1]
(X_train, y_train), (X_test, y_test) = datasets.fashion_mnist.load_data()
X_train, X_test = X_train / 255, X_test / 255

Since none of the images in the dataset are blurred, we need to create a new dataset from the existing images by blurring them. To blur them, we can use the GaussianBlur function available in the OpenCV package. The kernel size chosen is 3 x 3.

import cv2
import numpy as np

# blur every image with a 3 x 3 Gaussian kernel
def add_noise(X):
    result = []
    for img in X:
        noisy = cv2.GaussianBlur(img, (3, 3), 0)
        noisy = np.clip(noisy, 0, 1)
        result.append(noisy)
    return np.array(result)

noise_train = add_noise(X_train)
noise_test = add_noise(X_test)
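As a quick sanity check (an optional addition to the code above), the blurred copies should have exactly the same shapes as the original splits:

# Fashion-MNIST has 60,000 training and 10,000 test images of 28 x 28 pixels
print(noise_train.shape, X_train.shape)  # (60000, 28, 28) (60000, 28, 28)
print(noise_test.shape, X_test.shape)    # (10000, 28, 28) (10000, 28, 28)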

Now it is time to build our architecture. The architecture that I’ve built looks like the following:

The first three convolution layers (conv2d_1, conv2d_2, conv2d_3) form the encoder, conv2d_4 represents the hidden layer, and the Conv2DTranspose layers form the decoder. Remember that, in this case, the output dimensions must be the same as the input dimensions, so choose the kernel sizes carefully. This architecture can be coded as follows:

from keras import models, layers

model = models.Sequential()

# encoder
model.add(layers.Conv2D(64, (2, 2), strides=1, padding='same', input_shape=(28, 28, 1)))
model.add(layers.Conv2D(32, (2, 2), strides=1, padding='same'))
model.add(layers.Conv2D(16, (2, 2), strides=1, padding='same'))

# latent (hidden) layer
model.add(layers.Conv2D(8, (2, 2), strides=1, padding='same'))

# decoder
model.add(layers.Conv2DTranspose(16, (2, 2), strides=1, padding='same'))
model.add(layers.Conv2DTranspose(32, (2, 2), strides=1, padding='same'))
model.add(layers.Conv2DTranspose(64, (2, 2), strides=1, padding='same'))
model.add(layers.Conv2DTranspose(1, (1, 1), strides=1, activation='sigmoid', padding='same'))
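Before compiling, it is worth confirming that the decoder really restores the 28 x 28 x 1 input shape; a model.summary() call prints the output shape of every layer:

# verify that the last layer outputs 28 x 28 x 1, matching the input
model.summary()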

Either of the loss functions explained above can be used. I personally found better results with Mean Squared Error for this dataset, but feel free to choose whichever you prefer. Now we need to compile the model and fit the data.

model.compile(loss='mse', optimizer='adam')
model.fit(noise_train.reshape(-1, 28, 28, 1),
          X_train.reshape(-1, 28, 28, 1),
          epochs=100,
          batch_size=2000,
          validation_data=(noise_test.reshape(-1, 28, 28, 1),
                           X_test.reshape(-1, 28, 28, 1)))

After the model is fit to the input data, it is now time to predict the de-blurred images.

import random
import matplotlib.pyplot as plt

# utility function to pick n random samples to be tested
def get_samples(arr, n):
    temp = random.sample(range(len(arr)), n)
    result = arr[temp]
    return result, temp

num = 15
org, temp = get_samples(X_test, num)
blur = noise_test[temp]
preds = model.predict(blur.reshape(-1, 28, 28, 1))
preds = preds.reshape(-1, 28, 28)

# plotting results
plt.figure(figsize=(15, 15))
print('Original Images')
for i in range(num):
    plt.subplot(1, num, i + 1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(org[i], cmap=plt.cm.binary)
plt.show()

plt.figure(figsize=(15, 15))
print('Blurred Images')
for i in range(num):
    plt.subplot(1, num, i + 1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(blur[i], cmap=plt.cm.binary)
plt.show()

plt.figure(figsize=(15, 15))
print('Predicted Images')
for i in range(num):
    plt.subplot(1, num, i + 1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(preds[i], cmap=plt.cm.binary)
plt.show()

The output of the code shown above looks like the image below. It can be observed that the predicted images are very close to the original images.
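For a number to go with the visual check, you can also evaluate the compiled loss on the whole blurred test set (a quick extra step, not part of the original walkthrough):

# average MSE between the de-blurred predictions and the ground truth
test_loss = model.evaluate(noise_test.reshape(-1, 28, 28, 1),
                           X_test.reshape(-1, 28, 28, 1))
print('Test MSE:', test_loss)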

The predictions of the model trained to “de-blur” the image.

If you are trying to implement this on your own, I would recommend tweaking some of the code: try a different dataset, different optimizers for model compilation, a different architecture, different activation functions, etc. You might get better results than I did. All the best!


A student at Penn State University interested in solving Machine Learning problems, biased towards Computer Vision.