You Only Look Once (YOLO): Implementing YOLO in Less Than 30 Lines of Python Code

Garima Nishad · Published in Analytics Vidhya · 5 min read · Mar 1, 2019


You Only Look Once (YOLO) is a real-time object detection algorithm that avoids spending too much time generating region proposals. Instead of locating objects perfectly, it prioritises speed and recognition.

Architectures like Faster R-CNN are accurate, but the model itself is quite complex, with multiple outputs that are each a potential source of error. Once trained, they're still not fast enough to run in real time.

Consider a self-driving car that sees an image of a street. It's essential for a self-driving car to be able to detect the location of objects all around it, such as pedestrians, cars, and traffic lights. On top of that, this detection has to happen in near real time, so that the car can safely navigate the street. The car doesn't always need to know what all these objects are; it mostly needs to know not to crash into them. But it does need to recognize traffic lights, bikes, and pedestrians to be able to correctly follow the rules of the road.
In the image below, I've used the YOLO algorithm to locate and classify different objects; each object gets a bounding box that locates it and a corresponding class label.

YOLO in action

So the next obvious question is: how does YOLO work?

Say we have a CNN that's been trained to recognize several classes, including traffic lights, cars, people, and trucks. We give it two types of anchor boxes, a tall one and a wide one, so that it can handle overlapping objects of different shapes. Once the CNN has been trained, we can detect objects in new test images by feeding them into the network.

Setting Up The Neural Network
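The notebook linked at the end of this post uses a small Darknet helper library; as a stand-in, here is a minimal sketch of the same setup using OpenCV's dnn module. The yolov3.cfg, yolov3.weights, and test_image.jpg file names are placeholder assumptions, not files from the original post.

```python
import cv2
import numpy as np

# Load the pretrained network. The file names here are placeholders:
# point them at your own Darknet config and weights.
net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")

# YOLO makes its predictions at the unconnected output layers.
layer_names = net.getLayerNames()
output_layers = [layer_names[i - 1] for i in net.getUnconnectedOutLayers().flatten()]

# Preprocess a test image: scale pixels to [0, 1], resize to the
# network's input resolution, and swap BGR -> RGB.
image = cv2.imread("test_image.jpg")
blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416), swapRB=True, crop=False)

net.setInput(blob)
outputs = net.forward(output_layers)  # one detection array per output layer
```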

What are anchor boxes?
YOLO works well for multiple objects when each object is associated with one grid cell. But in the case of overlap, where one grid cell actually contains the centre points of two different objects, we can use something called anchor boxes to allow one grid cell to detect multiple objects.

Anchor Boxes in action

In the image above, we see a person and a car overlapping in the image, so part of the car is obscured. We can also see that the centres of both bounding boxes, the car's and the pedestrian's, fall in the same grid cell. Since the output vector of each grid cell can only have one class, it will be forced to pick either the car or the person. But by defining anchor boxes, we can create a longer grid cell vector and associate multiple classes with each grid cell.
Anchor boxes have a defined aspect ratio, and they try to detect objects that nicely fit into a box with that ratio. For example, since we're detecting a wide car and a standing person, we'll define one anchor box that is roughly the shape of a car; this box will be wider than it is tall. And we'll define another anchor box that can fit a standing person inside of it, which will be taller than it is wide.
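As a rough sketch of what this means for the network's output, here is how two such anchors might be written down and how they lengthen each grid cell's output vector. The numbers below are illustrative assumptions, not YOLO's trained anchor values (which are learned by clustering ground-truth boxes).

```python
# Two illustrative anchors as (width, height) in grid-cell units.
anchors = [
    (3.5, 1.5),  # wider than tall: roughly car-shaped
    (1.0, 3.0),  # taller than wide: roughly person-shaped
]

num_classes = 4  # e.g. traffic light, car, person, truck

# Each anchor predicts (PC, x, y, w, h) plus one score per class,
# so adding anchors lengthens every grid cell's output vector.
cell_vector_length = len(anchors) * (5 + num_classes)
print(cell_vector_length)  # 2 * (5 + 4) = 18
```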

The test image is first broken up into a grid, and the network then produces output vectors, one for each grid cell. These vectors tell us if a cell has an object in it, what class the object is, and the bounding boxes for the object. Since we're using two anchor boxes, we'll get two predicted anchor boxes for each grid cell. Some, in fact most, of the predicted anchor boxes will have a very low PC value (the probability that an object is present in the box).
After producing these output vectors, we use non-maximal suppression to get rid of unlikely bounding boxes. For each class, non-maximal suppression gets rid of the bounding boxes that have a PC value lower than some given threshold.
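In the OpenCV sketch above, each output layer returns these per-cell predictions as one flattened array, so a quick way to see the grid structure is to print the shapes. The shapes in the comments assume a 416×416 input and the standard 80-class COCO weights, which use three anchors per scale rather than the two in our toy example.

```python
# Each entry in `outputs` is a flattened (cells * anchors, 5 + classes) array.
for output in outputs:
    print(output.shape)
# (507, 85)   -> 13*13 cells * 3 anchors
# (2028, 85)  -> 26*26 cells * 3 anchors
# (8112, 85)  -> 52*52 cells * 3 anchors
# Each row holds (x, y, w, h, PC, per-class scores...).
```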

What is Non-Maximal Suppression (NMS)?
YOLO uses Non-Maximal Suppression (NMS) to keep only the best bounding boxes. The first step in NMS is to remove all the predicted bounding boxes that have a detection probability less than a given NMS threshold. In the code below, we set this NMS threshold to 0.6, meaning that all predicted bounding boxes with a detection probability below 0.6 will be removed.
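Continuing the OpenCV sketch from above, this first filtering step could look as follows, with the same 0.6 threshold:

```python
NMS_THRESHOLD = 0.6  # detection-probability cutoff, as in the post

boxes, confidences, class_ids = [], [], []
img_h, img_w = image.shape[:2]

for output in outputs:
    for detection in output:
        scores = detection[5:]                 # per-class scores
        class_id = int(np.argmax(scores))
        # PC: objectness times the best class score.
        confidence = float(detection[4] * scores[class_id])
        if confidence < NMS_THRESHOLD:
            continue  # step 1: drop low-probability boxes
        # Convert normalized (center x, center y, w, h) to pixel coords.
        cx, cy, bw, bh = detection[:4] * np.array([img_w, img_h, img_w, img_h])
        boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
        confidences.append(confidence)
        class_ids.append(class_id)
```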

What is the Intersection Over Union (IOU) Threshold?
After removing all the predicted bounding boxes that have a low detection probability, the second step in NMS is to select the bounding box with the highest detection probability and eliminate all the bounding boxes whose Intersection Over Union (IOU) value with it is higher than a given IOU threshold. In the code below, we set this IOU threshold to 0.4, meaning that all predicted bounding boxes with an IOU value greater than 0.4 with respect to the best bounding box will be removed.
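IOU itself is just the area of overlap between two boxes divided by the area of their union. A minimal helper, assuming the (x, y, width, height) box format collected above:

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b

    # Corners of the intersection rectangle.
    x1 = max(ax, bx)
    y1 = max(ay, by)
    x2 = min(ax + aw, bx + bw)
    y2 = min(ay + ah, by + bh)

    intersection = max(0, x2 - x1) * max(0, y2 - y1)
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0
```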

NMS then selects the bounding box with the highest PC value and removes the bounding boxes that are too similar to it, repeating until all of the non-maximal bounding boxes have been removed for every class. The end result will look like the image below, where we can see that YOLO has effectively detected many objects in the image, such as cars and people.

YOLO Object Detection
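To make the loop just described concrete, here is a sketch of greedy per-class NMS built on the iou helper above. (OpenCV also provides this as cv2.dnn.NMSBoxes, which is the quicker route in practice.)

```python
IOU_THRESHOLD = 0.4  # overlap cutoff, as in the post

def non_max_suppression(boxes, confidences, class_ids):
    """Greedy per-class NMS: keep the highest-PC box in each cluster."""
    order = sorted(range(len(boxes)), key=lambda i: confidences[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # highest remaining PC value
        keep.append(best)
        # Drop same-class boxes that overlap the winner too much.
        order = [i for i in order
                 if class_ids[i] != class_ids[best]
                 or iou(boxes[i], boxes[best]) <= IOU_THRESHOLD]
    return keep

# Draw the surviving boxes on the image.
for i in non_max_suppression(boxes, confidences, class_ids):
    x, y, bw, bh = boxes[i]
    cv2.rectangle(image, (x, y), (x + bw, y + bh), (0, 255, 255), 2)
```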

Now that you know how YOLO works, you can see why it’s one of the most widely used object detection algorithms today!

Check out the code here: YOLO, for a full implementation of the YOLO algorithm, and really see how it detects objects in different scenes and with varying levels of confidence.


Garima Nishad is a Machine Learning Research scholar who loves to moonlight as a blogger.