YOLOv1
First, YOLO is extremely fast. Since we frame detection as a regression problem we don’t need a complex pipeline. Second, YOLO reasons globally about the image when making predictions. Unlike sliding window and region proposal-based techniques, YOLO sees the entire image during training and test time so it implicitly encodes contextual information about classes as well as their appearance. Third, YOLO learns generalizable representations of objects. Since YOLO is highly generalizable it is less likely to break down when applied to new domains or unexpected inputs.
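- YOLO is fast because it treats detection as a regression problem
- YOLO makes its predictions from the entire image
- The image features YOLO learns are more general and adapt better to new domains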
Network architecture
- Input image size: $448 \times 448$
- 24 convolutional layers followed by 2 fully connected layers
- Leaky ReLU activations, with a linear activation on the final layer
- Output after the convolutional layers: $[N, 1024, 7, 7]$, where $N$ is the batch size
- Output after the fully connected layers: $[N, 7 \times 7 \times 30]$
- After reshape: $[N, 7, 7, 30]$
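The shape bookkeeping in the list above can be checked with a few lines of plain Python. This is only a sketch of the arithmetic, not a model definition: the function name `yolo_v1_head_shapes` and its parameters are made up for illustration, and the intermediate `flat` shape assumes the convolutional output is simply flattened before the fully connected layers.

```python
from math import prod

def yolo_v1_head_shapes(batch_size, grid=7, channels=1024, per_cell=30):
    """Trace tensor shapes through the final stages listed above (hypothetical helper)."""
    conv_out = (batch_size, channels, grid, grid)   # after the convolutional layers
    flat = (batch_size, prod(conv_out[1:]))         # assumed: flattened before the FC layers
    fc_out = (batch_size, grid * grid * per_cell)   # after the last fully connected layer
    reshaped = (batch_size, grid, grid, per_cell)   # one prediction vector per grid cell
    # A reshape never changes the number of elements.
    assert prod(fc_out) == prod(reshaped)
    return {"conv_out": conv_out, "flat": flat, "fc_out": fc_out, "reshaped": reshaped}

shapes = yolo_v1_head_shapes(batch_size=2)
for name, shape in shapes.items():
    print(name, shape)
```

With a batch of 2, the fully connected output has 7 × 7 × 30 = 1470 values per image, which is exactly what the final reshape redistributes into a 7 × 7 grid of 30-value vectors.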
Interpreting the output:
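The 7x7 means the image is divided into a 7x7 grid, and each grid cell is responsible for two predicted boxes. The 30 comes from $(4+1) \times 2 + 20$. The 4 stands for $(x_{center}, y_{center}, w, h)$. The 1 is the confidence that the box contains a detected object: it is 0 if no object is in the box, and if an object is in the box it represents the IoU between the predicted box and the ground-truth (gt) box. The 20 stands for the confidences of the 20 classes, $P(class_i \mid object)$, which each grid cell predicts only once.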
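To make the 30-value layout concrete, here is a minimal sketch that splits one grid cell's vector into its two boxes and 20 class probabilities. The ordering used here (two `[x, y, w, h, conf]` groups first, then the class probabilities) is an assumption for illustration, since implementations differ; the helper names `split_cell` and `class_scores` are hypothetical, and the numbers are made up. The final multiplication follows the YOLOv1 paper's test-time rule of scaling each class probability by the box confidence.

```python
NUM_BOXES = 2      # boxes predicted per grid cell
NUM_CLASSES = 20   # class probabilities predicted once per grid cell

def split_cell(cell):
    """Split one cell vector into boxes and class probabilities.

    Assumed layout: [x, y, w, h, conf] repeated NUM_BOXES times,
    followed by NUM_CLASSES class probabilities.
    """
    expected = NUM_BOXES * (4 + 1) + NUM_CLASSES   # (4 + 1) * 2 + 20 = 30
    if len(cell) != expected:
        raise ValueError(f"expected {expected} values, got {len(cell)}")
    boxes = []
    for b in range(NUM_BOXES):
        x, y, w, h, conf = cell[b * 5:(b + 1) * 5]
        boxes.append({"x": x, "y": y, "w": w, "h": h, "conf": conf})
    class_probs = list(cell[NUM_BOXES * 5:])
    return boxes, class_probs

def class_scores(boxes, class_probs):
    """Per-box class scores: box confidence times P(class_i | object)."""
    return [[box["conf"] * p for p in class_probs] for box in boxes]

# Made-up values for a single cell: the first box is confident, the second is empty.
cell = [0.5, 0.5, 0.2, 0.3, 0.8,
        0.4, 0.6, 0.1, 0.1, 0.0] + [0.0] * NUM_CLASSES
cell[NUM_BOXES * 5 + 7] = 1.0   # put all class probability on class index 7

boxes, probs = split_cell(cell)
scores = class_scores(boxes, probs)
print(len(cell), scores[0][7], scores[1][7])
```

Because the class probabilities are shared by both boxes in a cell, the two boxes differ only in their geometry and confidence; a box with confidence 0 ends up with a score of 0 for every class.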