Artificial Intelligence: How does the Viola-Jones Algorithm help in object detection?

Nivan Gujral
Published in Level Up Coding · May 15, 2020


All humans detect many objects on a daily basis, such as cars, trees, and even other humans. Detecting objects is second nature to us, but have you ever thought about computers detecting objects the way we do? That might sound crazy, but from phones to cars, some form of object detection is built into these devices to help them understand the real world. Object detection is helping AI reach its full potential, and the Viola-Jones Algorithm is one way of detecting objects.

How does the Viola-Jones Algorithm work?

The Viola-Jones Algorithm’s main objective is to detect a face. To do so, it starts with the frontal view of a face. The algorithm chunks the image into sections and examines each one to see whether it contains any features of a face. If it does not see all of the features, it moves a little further over to another section and tries to find the features again. The features that the Viola-Jones Algorithm mainly looks for in a face are the eyes, mouth, nose, cheeks, and so on.

The face can be thought of as a map, and the facial features as landmarks on that map. The algorithm is like a person trying to find the one area that has all of the landmarks in a single location.

Every time the algorithm detects a face, it marks where it found it on the picture and keeps repeating the process until it has checked the entire picture. When the algorithm has finished scanning, there is a bunch of boxes marking where the face is located, and the algorithm averages them into a single location for the face. So how does the algorithm detect the features of a face in the first place?
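
Before answering that, here is a minimal sketch of the scan-and-merge loop described above, using OpenCV’s pretrained frontal-face Haar cascade. OpenCV, the file names, and the parameter values are my assumptions; the article does not tie itself to any particular library.

```python
# A minimal face-detection sketch using OpenCV's pretrained Haar cascade.
# Assumes OpenCV (cv2) is installed and "photo.jpg" exists.
import cv2

# Load OpenCV's bundled frontal-face Viola-Jones classifier.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

image = cv2.imread("photo.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Slide windows over the image at several scales; overlapping hits on the
# same face are merged (minNeighbors controls how many hits are required).
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("faces.jpg", image)
```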

What are the Haar-like features?

Haar-like features are the method that allows the algorithm to detect the features of a face. There are two main types of Haar-like features: Edge features and Line features.

Edge features look for regions that are light on one side and dark on the other. An eyebrow is an example of an edge feature because the eyebrow is darker than the skin above it.

Line features look for regions that have a dark line in the middle surrounded by lighter areas on both sides. A mouth is an example of a line feature because the line where the lips meet is darker than the lips themselves.
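
As a rough sketch of the idea, a Haar-like feature can be scored by comparing how dark the “dark” part of a patch is against the “light” part. The function names and the toy patch below are my own illustrative assumptions; real Viola-Jones features compare raw pixel sums over adjacent, equally sized rectangles.

```python
import numpy as np

def edge_feature(patch):
    """Score a horizontal edge feature: dark top half versus light bottom half.
    A large positive value means the top is darker than the bottom, which is
    the pattern an eyebrow above lighter skin would produce."""
    h = patch.shape[0] // 2
    top, bottom = patch[:h], patch[h:]
    # Treat "darkness" as 255 minus the grayscale intensity.
    return (255 - top).sum() - (255 - bottom).sum()

def line_feature(patch):
    """Score a horizontal line feature: a dark band in the middle with
    lighter regions above and below it, like the gap between two lips."""
    h = patch.shape[0] // 3
    middle = patch[h:2 * h]
    outside = np.concatenate([patch[:h], patch[2 * h:]])
    return (255 - middle).sum() - (255 - outside).sum()

# Toy 9x9 grayscale patch: light everywhere except a dark band in the middle.
patch = np.full((9, 9), 200, dtype=int)
patch[3:6, :] = 40
print(edge_feature(patch), line_feature(patch))  # the line feature scores higher
```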

Detecting the features

When the algorithm looks at an area, it first maps out the darkness of the area on a scale from 1 to 10, where 1 is the lightest and 10 is the darkest. To determine which areas are dark and which are light, it finds the average darkness of each area: if the average darkness is over 5, the area is dark, and if it is less than 5, the area is light. For example, with Edge features the algorithm looks for one dark area with a light area next to it. This is how it finds each of the features on the face.
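
Here is a toy illustration of that darkness check. The 1-to-10 scale and the threshold of 5 are the article’s simplification (real Viola-Jones works directly with pixel sums), and the numbers below are made up for the example.

```python
import numpy as np

def darkness_scale(region):
    """Map 0-255 grayscale intensities onto a 1-10 darkness scale
    (1 = lightest, 10 = darkest)."""
    return 1 + (255 - region.astype(float)) * 9 / 255

def is_dark(region):
    """A region counts as dark when its average darkness is over 5."""
    return darkness_scale(region).mean() > 5

# Toy edge-feature check: a dark strip (like an eyebrow) above lighter skin.
eyebrow = np.full((4, 12), 50, dtype=np.uint8)   # dark pixels
skin = np.full((4, 12), 210, dtype=np.uint8)     # light pixels
print(is_dark(eyebrow), is_dark(skin))           # True False -> edge pattern found
```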

Finding the average of every single area would take a lot of time and computing power. To make this more efficient, a technique called the integral image is used. In an integral image, each pixel stores the sum of all pixel values above and to the left of it, so the total value of any rectangular area can be found from just four of these precomputed sums. The integral image creates a faster and more efficient way to find the features of a face. For the algorithm to identify the various features of a face, it needs to know which features have what mix of darkness and lightness. This is accomplished by training the algorithm.
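
A minimal sketch of an integral image and a constant-time rectangle sum might look like this; the function names and the 4x4 example are my own.

```python
import numpy as np

def integral_image(img):
    """Cumulative sums down and across, so each entry holds the sum of all
    pixels above and to the left of it (inclusive)."""
    return img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, bottom, right):
    """Sum of the rectangle [top:bottom, left:right) using only four lookups."""
    total = ii[bottom - 1, right - 1]
    if top > 0:
        total -= ii[top - 1, right - 1]
    if left > 0:
        total -= ii[bottom - 1, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total

img = np.arange(16).reshape(4, 4)
ii = integral_image(img)
# Sum of the 2x2 block in the lower-right corner: 10 + 11 + 14 + 15 = 50.
print(rect_sum(ii, 2, 2, 4, 4), img[2:4, 2:4].sum())
```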

How to train the Viola-Jones Algorithm?

The Viola-Jones Algorithm is trained on pictures of both faces and non-faces. The algorithm first takes the images of faces and looks for what is the same across all of them. For example, if the person in the first image has freckles while the person in the second does not, the algorithm learns that not all faces have freckles and therefore will not look for them. The algorithm detects these shared traits using Haar-like features and records which Haar-like feature appears at which part of the face. The non-face pictures teach the algorithm what is not a face, so it can compare against what a face is and find more of the differences that make a face distinct from anything else.
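
As a very rough sketch of what this training could look like, the snippet below scores candidate Haar-like feature values by how well a simple threshold on them separates face windows from non-face windows, and keeps the best ones. The original algorithm does this with a boosting procedure over weighted training windows, so this is only a simplified illustration with made-up data.

```python
import numpy as np

def best_threshold(feature_values, labels):
    """Find the threshold on one feature's values that best separates
    face windows (label 1) from non-face windows (label 0). The real
    algorithm also tries the flipped inequality and weights each window."""
    best_acc, best_thr = 0.0, None
    for thr in np.unique(feature_values):
        predictions = (feature_values >= thr).astype(int)
        acc = (predictions == labels).mean()
        if acc > best_acc:
            best_acc, best_thr = acc, thr
    return best_thr, best_acc

def select_features(feature_matrix, labels, how_many=3):
    """Keep the candidate Haar-like features whose thresholded values
    separate faces from non-faces most accurately."""
    scored = []
    for i in range(feature_matrix.shape[1]):
        thr, acc = best_threshold(feature_matrix[:, i], labels)
        scored.append((acc, i, thr))
    scored.sort(reverse=True)
    return scored[:how_many]

# Toy data: feature_matrix[w, f] is the value of candidate feature f measured
# on training window w; labels[w] is 1 for a face window, 0 for a non-face one.
rng = np.random.default_rng(0)
feature_matrix = rng.normal(size=(200, 50))
labels = rng.integers(0, 2, size=200)
feature_matrix[:, 0] += labels * 2.0   # make feature 0 genuinely informative
print(select_features(feature_matrix, labels))
```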

What is cascading?

If the algorithm looked for every one of the hundreds of features in every box, it would take a lot of time and computing power. This is where cascading comes in. Cascading works by looking for one feature first: if that feature is not found within the box, the algorithm moves on to the next box, but if the feature is found, the algorithm looks for the next feature in the same box. Suppose the algorithm is looking for a nose. In the first box it does not find a nose, so it moves on to the next box. In the second box it finds a nose, so it starts looking for a mouth. Cascading is helpful because instead of checking every feature even when one is missing, the algorithm abandons a box as soon as any feature is not present.
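
A minimal sketch of a cascade might look like the snippet below. The stage names are purely illustrative stand-ins; in the real algorithm, each stage is a classifier built from many Haar-like features rather than a single named check.

```python
def run_cascade(window, stages):
    """Run a window through the cascade stages in order; reject as soon as
    one stage fails so cheap early checks filter out most non-face windows."""
    for stage in stages:
        if not stage(window):
            return False      # early reject: stop spending time on this window
    return True               # survived every stage -> report a face here

# Toy stages standing in for real feature checks (names are illustrative only).
def has_nose(window):  return window.get("nose", False)
def has_mouth(window): return window.get("mouth", False)
def has_eyes(window):  return window.get("eyes", False)

stages = [has_nose, has_mouth, has_eyes]
print(run_cascade({"nose": True, "mouth": True, "eyes": True}, stages))  # True
print(run_cascade({"nose": False}, stages))   # False, rejected at the first stage
```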

Growing applications of Object Detection

Object detection provides intelligence to applications that most of us use every day and is transforming the way computers interact with the world. Object detection already helps people and companies in many ways: it is used in self-driving cars and drones to help them move without hitting objects, and it is now in widespread use in phones for facial recognition. There is a lot more that object detection can help with in the future. Although the Viola-Jones Algorithm can be applied to many different fields, it is not the most powerful object detection model we have today. Unlike modern deep learning models, it cannot figure out an object it has never seen before. Even with this limitation, the Viola-Jones Algorithm is one of the foundational tools for unlocking object detection’s full potential.

Hi, I am Nivan Gujral! I am a 13-year-old who is passionate about the intersections between AI and aerospace. Send me an email at nivangujral@gmail.com if you would like to further discuss this article or just talk.
