Create your own ‘CamScanner’ using Python & OpenCV

Have you ever wondered how ‘CamScanner’ converts your mobile camera’s fuzzy document picture into a well-defined, properly lit, scanned image? I have, and until recently I thought it was a very difficult task. But it isn’t, and we can build our own ‘CamScanner’ with surprisingly few lines of code.

Shirish Gupta
Level Up Coding


Thanks to Soham Mhatre for contributing significantly towards this article.

Computer Vision and why the buzz?

Computer vision is an interdisciplinary scientific field that deals with how computers can gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to understand and automate tasks that the human visual system can do. Basically, it’s a scientific field that aims to make computers interpret a photo or video much as a human being would.

So why the buzz?

Advances in AI and machine learning have accelerated developments in computer vision. Earlier these were two separate fields, with different techniques, coding languages and academic researchers in each. But now the gap has narrowed significantly, and more and more data scientists are working in computer vision and vice versa. The reason is the simple common denominator of both fields: data.

At the end of the day, a computer learns by consuming data. And AI helps computers not only process data, but also improve their understanding and interpretation of it by trial and error. So if we can combine the data from images and run complex machine learning algorithms on it, what we get is an actual AI.

One modern company that has pioneered the technology of computer vision is Tesla Motors.

Tesla Motors is known for pioneering the self-driving vehicle revolution and for achieving high reliability in autonomous vehicles. Tesla cars rely heavily upon computer vision.

What are we gonna achieve today?

For this article we will concentrate only on computer vision and leave machine learning for some later time. Also, we will use just one library, OpenCV, to create the whole thing.

Index

  1. What is OpenCV?
  2. Preprocess the image using different concepts such as blurring, thresholding, denoising (Non-Local Means).
  3. Canny Edge detection & Extraction of biggest contour
  4. Finally — Sharpening & Brightness correction

What is OpenCV?

OpenCV is a library of programming functions mainly aimed at real-time computer vision. Originally developed by Intel, it was later supported by Willow Garage and then Itseez. The library is cross-platform and free for use under the open-source BSD license. It was initially developed in C++ but is now available across multiple languages such as Python, Java, etc.

Start with Preprocessing

BLURRING

The goal of blurring is to reduce the noise in the image. It removes high-frequency content (e.g. noise, edges) from the image, resulting in blurred edges. There are multiple blurring techniques (filters) in OpenCV, and the most common are:

Averaging — Simply takes the average of all the pixels under the kernel area and replaces the central element with this average

Gaussian Filter — Instead of a box filter consisting of equal filter coefficients, a Gaussian kernel is used

Median Filter — Computes the median of all the pixels under the kernel window and the central pixel is replaced with this median value

Bilateral Filter — An advanced version of Gaussian blurring. It not only removes noise, but does so while keeping edges sharp.

Original Vs Gaussian Blurred

THRESHOLDING

In image processing, thresholding is the simplest method of segmenting images. From a grayscale image, thresholding can be used to create binary images. This is generally done to clearly separate different ranges of pixel intensity. The most common thresholding techniques in OpenCV are:

Simple Thresholding — If a pixel value is greater than a threshold value, it is assigned one value (say, white); otherwise it is assigned another value (say, black)

Adaptive Thresholding — The algorithm calculates the threshold for small regions of the image. So we get different thresholds for different regions of the same image, which gives better results for images with varying illumination.

Note: Remember to convert the image to grayscale before thresholding

GreyScaled on Original Vs Adaptive Gaussian

DENOISING

There is another kind of denoising that we apply: Non-Local Means Denoising. The principle of the initial denoising methods was to replace the colour of a pixel with an average of the colours of nearby pixels. The variance law in probability theory ensures that if nine pixels are averaged, the noise standard deviation of the average is divided by three, giving us a denoised picture.

But what if there is an edge or an elongated pattern, where denoising by averaging won't work? In that case, we scan a vast portion of the image in search of all the pixels that really resemble the pixel we want to denoise, and compute the average colour of these most-resembling pixels. This is called Non-Local Means Denoising.

Use cv2.fastNlMeansDenoising() for this.

Original vs Gaussian Blurred vs Non-Local Means Denoised

Canny Edge detection & Extraction of biggest contour

After image blurring and thresholding, the next step is to find the biggest contour (biggest bounding box) and crop out the image. This is done using Canny Edge Detection followed by extraction of the biggest contour with a four-point transformation.

CANNY EDGE

Canny edge detection is a multi-step algorithm that detects edges. We should feed it a denoised image so that it detects only the relevant edges.

FIND CONTOURS

After finding the edges, pass the image through cv2.findContours(). It joins all the continuous points (along the edges) having the same colour or intensity. After this we get all the contours: rectangles, circles, etc.

Use cv2.convexHull() and cv2.approxPolyDP() to find the biggest approximately rectangular contour in the photo.

Original vs Original with biggest bounding box

EXTRACTING THE BIGGEST CONTOUR

Although we have found the biggest contour, which looks like a rectangle, we still need to find its corners so as to get the exact coordinates for cropping the image.

For this, first pass the coordinates of the approximated rectangle (the biggest contour) through an order-points transformation. The result is the exact (x, y) coordinates of the biggest contour's corners.

Four Point Transformation — Using the above (x, y) coordinates, calculate the width and height of the contour. Pass it through cv2.warpPerspective() to crop the contour. Voila, you have successfully cropped out the relevant data from the input image

Original vs Cropped Image

Notice how well the image is cropped out even though it's a poorly lit, poorly clicked image

Finally — Sharpening & Brightness correction

Now that we have cropped out the relevant info (the biggest contour) from the image, the last step is to sharpen the picture so that we get a well-illuminated, readable document.

Brightness — For this we use the hue, saturation, value (HSV) representation, where value represents brightness. We can play around with this channel to increase the brightness of the document

Kernel Sharpening — A kernel (convolution matrix, or mask) is a small matrix. It is used for blurring, sharpening, embossing, edge detection, and more. Sharpening is accomplished by performing a convolution between the kernel and the image

Resultant

Original Vs Final Resultant (Cropped, Brightened & Sharpened)

Complete Code

Here is the final code


The end for now. Have any ideas to improve this or want me to try any new ideas? Please give your suggestions in the comments. Adios.


I am an Economist by training, a Data Scientist by profession and a Traveler at heart. Here to write simple, interesting blogs that everyone can read!