[Deep Learning] [Paper Reading] YOLOv1 and v2

Latest recommended article published on 2023-05-17 11:55:17

Hanawh


Category: Deep Learning. Tags: deep learning

Article link: https://blog.csdn.net/qq_36530992/article/details/102776359


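YOLO (You Only Look Once) is a real-time object detection system known for its speed and for reasoning about the whole image at once. YOLOv1 treats detection as a regression problem and predicts objects from a global view of the image, which gives it good generalization. YOLOv2 raises detection accuracy with improvements such as Batch Normalization, a high-resolution classifier, Anchor Boxes, and multi-scale training. YOLOv2 also uses k-means clustering to choose its prior boxes, further improving how efficiently the model learns.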

Summary generated automatically by CSDN

YOLOv1

First, YOLO is extremely fast. Since we frame detection as a regression problem we don’t need a complex pipeline.
Second, YOLO reasons globally about the image when making predictions. Unlike sliding window and region proposal-based techniques, YOLO sees the entire image during training and test time so it implicitly encodes contextual information about classes as well as their appearance.
Third, YOLO learns generalizable representations of objects. Since YOLO is highly generalizable it is less likely to break down when applied to new domains or unexpected inputs.

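YOLO is fast because it treats detection as a regression problem
YOLO makes its predictions from the whole image
The image features YOLO learns are more general and adapt better to new domains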

Network architecture

[Figure: YOLOv1 network architecture]

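Input image size: $448\times 448$
24 convolutional layers + 2 fully connected layers
Leaky ReLU activations throughout, with a linear activation on the final layer
Output after the convolutional layers: $[N, 1024, 7, 7]$
Output after the fully connected layers: $[N, 7 * 7 * 30]$
After reshape: $[N, 7, 7, 30]$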
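
The shape bookkeeping in this list can be checked with a small sketch. It uses plain Python lists instead of a tensor library and handles a single image (batch size $N = 1$). The row-major layout of the reshape is an assumption, since the post does not say which layout the original implementation uses, and the names `GRID`, `DEPTH`, and `reshape_flat` are illustrative only.

```python
GRID = 7    # the network output is a 7 x 7 grid
DEPTH = 30  # values predicted per grid cell


def reshape_flat(flat, grid, depth):
    """Reshape a flat list of grid*grid*depth values into [grid][grid][depth].

    Assumes row-major order (an assumption, not confirmed by the post):
    the cell at (row, col) starts at flat index (row * grid + col) * depth.
    """
    if len(flat) != grid * grid * depth:
        raise ValueError("flat vector has the wrong length")
    cells = []
    for row in range(grid):
        row_cells = []
        for col in range(grid):
            start = (row * grid + col) * depth
            row_cells.append(flat[start:start + depth])
        cells.append(row_cells)
    return cells


# Stand-in for one image's fully connected output: 7 * 7 * 30 = 1470 values.
fc_output = list(range(GRID * GRID * DEPTH))
cells = reshape_flat(fc_output, GRID, DEPTH)

print(len(fc_output))                                # 1470
print(len(cells), len(cells[0]), len(cells[0][0]))   # 7 7 30
```

In a real framework this is a single reshape call on the batch tensor; the loop form is only meant to show where each cell's 30 values come from.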

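Interpreting the output:
The 7x7 means the image is divided into a 7x7 grid, and each grid cell is responsible for two predicted boxes. The 30 comes from $(4 + 1) * 2 + 20$. The 4 stands for the box coordinates $(x_{center}, y_{center}, w, h)$. The 1 is the confidence that the box contains an object: its target value is 0 if no object falls in the box, and otherwise it is the IoU between the predicted box and the ground-truth (gt) box. The 20 stands for the 20 class confidences $P(class_i|object)$, which each grid cell predicts only once and shares between its two boxes.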
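
To make the split of the 30 values concrete, here is a minimal sketch that decodes the prediction vector of one grid cell. The ordering inside the vector (two boxes first, then the 20 class values) is an assumption, since implementations differ, and the helper names `decode_cell` and `class_scores` are hypothetical. The final multiplication follows the test-time scoring described in the YOLOv1 paper, where the conditional class probability is multiplied by the box confidence.

```python
def decode_cell(vec, num_boxes=2, num_classes=20):
    """Split one grid cell's vector into per-box fields and class probabilities.

    Assumed layout (not confirmed by the post):
    [x, y, w, h, conf] for each box, followed by num_classes probabilities.
    """
    if len(vec) != num_boxes * 5 + num_classes:
        raise ValueError("unexpected vector length")
    boxes = []
    for b in range(num_boxes):
        x, y, w, h, conf = vec[b * 5:(b + 1) * 5]
        boxes.append({"x": x, "y": y, "w": w, "h": h, "conf": conf})
    class_probs = vec[num_boxes * 5:]
    return boxes, class_probs


def class_scores(box_conf, class_probs):
    """Class-specific confidence for one box: P(class_i | object) * confidence."""
    return [p * box_conf for p in class_probs]


# Made-up example values for a single cell: two boxes plus 20 class values.
cell = [0.5, 0.5, 0.2, 0.3, 0.8,
        0.4, 0.6, 0.1, 0.1, 0.1] + [0.0] * 20
cell[10 + 3] = 0.5  # pretend the cell assigns probability 0.5 to class 3

boxes, class_probs = decode_cell(cell)
scores_box0 = class_scores(boxes[0]["conf"], class_probs)
best_class = max(range(len(scores_box0)), key=lambda i: scores_box0[i])
```

Because the class values are predicted once per cell, both boxes in a cell reuse the same `class_probs` and differ only in their coordinates and confidence.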
