YOLOv1: You Only Look Once: Unified, Real-Time Object Detection
YOLOv2: YOLO9000: Better, Faster, Stronger
YOLOv3: YOLOv3: An Incremental Improvement
YOLO is one of the most widely used real-time object detection networks. This post explains how it works and how it evolved across the three papers listed above.
1 YOLO v1
The name YOLO is an abbreviation of “You Only Look Once”, which indicates that the network looks at the input image only once.
1.1 Previous works
Previous networks first propose potential bounding boxes and then run a classifier on each of them. After that, they post-process the results to refine the boxes, eliminate duplicate detections, rescore them, and so on.
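The "eliminate duplicate detections" step is commonly done with non-maximum suppression (NMS). The sketch below is only an illustration of that idea, not the exact procedure of any particular earlier detector. The corner format `(x1, y1, x2, y2)`, the helper names `iou` and `nms`, and the 0.5 threshold are all assumptions made for this example.

```python
from typing import List, Tuple

# Assumed box format for this sketch: (x1, y1, x2, y2) corners.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept one.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two near-identical detections of one object, plus a separate object.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)
```

Here the second box overlaps the first heavily, so it is dropped as a duplicate, while the third box is far away and survives.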
1.2 YOLO’s method
The authors reframe object detection as a regression task, in which bounding box coordinates and class probabilities are predicted straight from image pixels. Here is how it is designed:
- Divides the input image into an $S \times S$ grid. If the center of an object falls into a grid cell, that grid cell is responsible for detecting that object.
- Each grid cell predicts $B$ bounding boxes and confidence scores for those boxes. Each bounding box consists of 5 predictions: $x, y, w, h$