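A walkthrough of the YOLO source code. Fairly useful, though unfortunately it is written in TensorFlow; I would still like more exposure to PyTorch code.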

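YOLO is a deep-learning-based, end-to-end, real-time object detection system. Most object detection and recognition methods (Fast R-CNN, for example) split the task into several stages, such as region prediction and class prediction. YOLO instead integrates region prediction and class prediction into a single neural network, which allows fast detection and recognition at relatively high accuracy and makes it better suited to on-site deployment. For details, see "YOLO: Real-Time Fast Object Detection" and "YOLO Upgraded: An Analysis of YOLOv2 and YOLO9000". This article walks through a TensorFlow implementation of YOLO in detail. The source code used here comes from hizhangp/yolo_tensorflow.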

This article is organized as follows: 1. YOLO code overview; 2. train walkthrough; 3. test overview; 4. summary.

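1 YOLO code overview

The layout of the source files is shown in Figure 1-1. train.py is the training code and test.py is the test code. The code in the other folders is auxiliary: setting parameters, building the network, reading data, and so on.

Figure 1-1 YOLO source code folder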

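2 train walkthrough

Starting from main(), the code first reads the parameters, then builds YOLONet, then loads the training data, and finally runs training.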


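2.1 Building YOLONet

YOLONet is built by the code in yolo_net.py in the yolo folder. yolo_net.py defines the YOLONet class, which contains methods for network initialization (__init__()), network construction (build_networks()), and the loss function (loss_layer()).

All of the network's initialization parameters are set in __init__().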

    def __init__(self, phase):
        self.weights_file = cfg.WEIGHTS_FILE  # weights file
        self.classes = cfg.CLASSES  # class names
        self.num_class = len(self.classes)  # number of classes, 20
        self.image_size = cfg.IMAGE_SIZE  # input image size, 448
        self.cell_size = cfg.CELL_SIZE  # grid size (cells per side), 7
        self.boxes_per_cell = cfg.BOXES_PER_CELL  # boxes predicted by each grid cell, default 2
        self.output_size = (self.cell_size * self.cell_size) * \
            (self.num_class + self.boxes_per_cell * 5)  # output size, 7*7*(20+2*5) = 1470
        self.scale = 1.0 * self.image_size / self.cell_size  # pixels per grid cell, 448/7 = 64
        self.boundary1 = self.cell_size * self.cell_size * self.num_class  # 7*7*20 = 980
        self.boundary2 = self.boundary1 + self.cell_size * \
            self.cell_size * self.boxes_per_cell  # 7*7*20 + 7*7*2 = 1078
        self.object_scale = cfg.OBJECT_SCALE  # 1
        self.noobject_scale = cfg.NOOBJECT_SCALE  # 1
        self.class_scale = cfg.CLASS_SCALE  # 2.0
        self.coord_scale = cfg.COORD_SCALE  # 5.0
        self.learning_rate = cfg.LEARNING_RATE  # learning rate, LEARNING_RATE = 0.0001
        self.batch_size = cfg.BATCH_SIZE  # BATCH_SIZE = 45
        self.alpha = cfg.ALPHA  # ALPHA = 0.1
        self.disp_console = cfg.DISP_CONSOLE  # DISP_CONSOLE = False
        self.phase = phase  # train or test
        self.collection = []  # stores the network parameters
        self.offset = np.transpose(np.reshape(np.array(
            [np.arange(self.cell_size)] * self.cell_size * self.boxes_per_cell),
            (self.boxes_per_cell, self.cell_size, self.cell_size)), (1, 2, 0))  # per-cell grid offset, shape (7, 7, 2)

        self.build_networks()
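
The self.offset expression is dense, so it may help to evaluate it on its own. The sketch below reuses the same NumPy expression outside the class (the standalone variable names and the 3-D transpose for offset_y are choices made here, not repository code) to show that every entry holds the column index of its grid cell, and that swapping the two grid axes yields the row index.

```python
import numpy as np

cell_size = 7
boxes_per_cell = 2

# Same expression as self.offset in __init__(), written with local variables.
offset = np.transpose(
    np.reshape(
        np.array([np.arange(cell_size)] * cell_size * boxes_per_cell),
        (boxes_per_cell, cell_size, cell_size)),
    (1, 2, 0))

print(offset.shape)     # (7, 7, 2)
print(offset[0, :, 0])  # [0 1 2 3 4 5 6]: the value is the column index
print(offset[:, 3, 1])  # [3 3 3 3 3 3 3]: constant down a column

# Swapping the two grid axes gives the row index instead. The loss code does the
# same on a 4-D tensor with tf.transpose(offset, (0, 2, 1, 3)).
offset_y = np.transpose(offset, (1, 0, 2))
print(offset_y[2, :, 0])  # [2 2 2 2 2 2 2]
```

In other words, offset supplies the x position of each cell in grid units, and its transpose supplies the y position.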

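The network is built by build_networks(). It consists of convolutional layers, pooling layers, and fully connected layers; for the detailed structure, see the source code and "YOLO: Real-Time Fast Object Detection". The network takes an input of shape [None, 448, 448, 3] and produces an output of shape [None, 1470].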
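
The 1470 in the output shape follows directly from the values set in __init__(). The plain-Python sketch below (not code from the repository) checks the arithmetic and lists the three contiguous segments delimited by boundary1 and boundary2. The segment names are descriptive labels chosen here to match how loss_layer() slices the vector.

```python
CELL_SIZE = 7        # cfg.CELL_SIZE
NUM_CLASS = 20       # len(cfg.CLASSES)
BOXES_PER_CELL = 2   # cfg.BOXES_PER_CELL

# Each cell predicts 20 class scores plus, per box, 1 confidence and 4 coordinates.
output_size = CELL_SIZE * CELL_SIZE * (NUM_CLASS + BOXES_PER_CELL * 5)
boundary1 = CELL_SIZE * CELL_SIZE * NUM_CLASS
boundary2 = boundary1 + CELL_SIZE * CELL_SIZE * BOXES_PER_CELL

segments = {
    "class scores": (0, boundary1),
    "box confidences": (boundary1, boundary2),
    "box coordinates": (boundary2, output_size),
}

for name, (start, stop) in segments.items():
    print(f"{name:16s} [{start:4d}:{stop:4d}]  length {stop - start}")

print("total:", output_size)  # total: 1470
```

The three lengths are 980, 98, and 392, which sum to 1470.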

The loss function is the key part of the code. It is defined as:

(See: "YOLO: Real-Time Fast Object Detection")
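
For reference, the loss in the original YOLO paper is usually written as below. The notation follows the paper rather than this post: S is the grid size (cell_size in the code), B is the number of boxes per cell (boxes_per_cell), hatted symbols are the targets, and the two lambda weights correspond to coord_scale and noobject_scale. The implementation discussed here additionally exposes object_scale and class_scale as weights on the third and fifth terms.

```latex
\[
\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{obj}}
      \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
 &+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{obj}}
      \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2
           + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
 &+ \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2 \\
 &+ \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2 \\
 &+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}}
      \sum_{c \,\in\, \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
\]
```

The five terms map onto coord_loss (first two terms together), object_loss, noobject_loss, and class_loss in loss_layer().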

The loss is implemented in loss_layer(). Comments in the code annotate the shape of each variable, as shown below.

# compute IoU
def calc_iou(self, boxes1, boxes2):
    """calculate ious
    Args:
      boxes1: 5-D tensor [BATCH_SIZE, CELL_SIZE, CELL_SIZE, BOXES_PER_CELL, 4]  ====> (x_center, y_center, w, h)
      boxes2: 5-D tensor [BATCH_SIZE, CELL_SIZE, CELL_SIZE, BOXES_PER_CELL, 4]  ====> (x_center, y_center, w, h)
    Return:
      iou: 4-D tensor [BATCH_SIZE, CELL_SIZE, CELL_SIZE, BOXES_PER_CELL]
    """
    # convert (x_center, y_center, w, h) to corners (x_min, y_min, x_max, y_max)
    # note: tf.pack was renamed tf.stack in TensorFlow 1.0
    boxes1 = tf.pack([boxes1[:, :, :, :, 0] - boxes1[:, :, :, :, 2] / 2.0,
                      boxes1[:, :, :, :, 1] - boxes1[:, :, :, :, 3] / 2.0,
                      boxes1[:, :, :, :, 0] + boxes1[:, :, :, :, 2] / 2.0,
                      boxes1[:, :, :, :, 1] + boxes1[:, :, :, :, 3] / 2.0])
    boxes1 = tf.transpose(boxes1, [1, 2, 3, 4, 0])

    boxes2 = tf.pack([boxes2[:, :, :, :, 0] - boxes2[:, :, :, :, 2] / 2.0,
                      boxes2[:, :, :, :, 1] - boxes2[:, :, :, :, 3] / 2.0,
                      boxes2[:, :, :, :, 0] + boxes2[:, :, :, :, 2] / 2.0,
                      boxes2[:, :, :, :, 1] + boxes2[:, :, :, :, 3] / 2.0])
    boxes2 = tf.transpose(boxes2, [1, 2, 3, 4, 0])

    # calculate the left up point & right down point
    lu = tf.maximum(boxes1[:, :, :, :, :2], boxes2[:, :, :, :, :2])
    rd = tf.minimum(boxes1[:, :, :, :, 2:], boxes2[:, :, :, :, 2:])

    # intersection
    intersection = tf.maximum(0.0, rd - lu)
    inter_square = intersection[:, :, :, :, 0] * intersection[:, :, :, :, 1]

    # calculate the boxs1 square and boxs2 square
    square1 = (boxes1[:, :, :, :, 2] - boxes1[:, :, :, :, 0]) * \
        (boxes1[:, :, :, :, 3] - boxes1[:, :, :, :, 1])
    square2 = (boxes2[:, :, :, :, 2] - boxes2[:, :, :, :, 0]) * \
        (boxes2[:, :, :, :, 3] - boxes2[:, :, :, :, 1])

    union_square = tf.maximum(square1 + square2 - inter_square, 1e-10)

    return tf.clip_by_value(inter_square / union_square, 0.0, 1.0)
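
To make the arithmetic in calc_iou() easier to check by hand, here is a scalar, plain-Python sketch of the same steps for a single pair of boxes. It is an illustration only: the helper names to_corners and iou are chosen here and do not exist in the repository, and the batched tensor indexing is dropped.

```python
def to_corners(box):
    """Convert (x_center, y_center, w, h) to (x_min, y_min, x_max, y_max)."""
    x, y, w, h = box
    return (x - w / 2.0, y - h / 2.0, x + w / 2.0, y + h / 2.0)


def iou(box1, box2):
    """IoU of two boxes given as (x_center, y_center, w, h)."""
    ax1, ay1, ax2, ay2 = to_corners(box1)
    bx1, by1, bx2, by2 = to_corners(box2)

    # Overlap extents; clamp at zero so disjoint boxes give zero intersection.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area1 = (ax2 - ax1) * (ay2 - ay1)
    area2 = (bx2 - bx1) * (by2 - by1)
    # Same guard as the TensorFlow code: avoid dividing by zero.
    union = max(area1 + area2 - inter, 1e-10)

    # Clip to [0, 1], mirroring tf.clip_by_value.
    return min(max(inter / union, 0.0), 1.0)


a = (0.5, 0.5, 1.0, 1.0)   # unit square with corners (0, 0) and (1, 1)
b = (1.0, 0.5, 1.0, 1.0)   # same square shifted right by 0.5
c = (3.0, 3.0, 1.0, 1.0)   # far away, no overlap

print(iou(a, a))  # 1.0
print(iou(a, b))  # roughly 0.333: intersection 0.5, union 1.5
print(iou(a, c))  # 0.0
```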

# loss function
# idx=33, predicts is fc_32, labels has shape (45, 7, 7, 25)
# self.loss = self.loss_layer(33, self.fc_32, self.labels)
def loss_layer(self, idx, predicts, labels):

    # split the network output into class scores, box confidences and box coordinates;
    # the output dimension is 7*7*20 + 7*7*2 + 7*7*2*4 = 1470
    # class scores, shape (45, 7, 7, 20)
    predict_classes = tf.reshape(predicts[:, :self.boundary1],
        [self.batch_size, self.cell_size, self.cell_size, self.num_class])
    # box confidences, shape (45, 7, 7, 2)
    predict_scales = tf.reshape(predicts[:, self.boundary1:self.boundary2],
        [self.batch_size, self.cell_size, self.cell_size, self.boxes_per_cell])
    # box coordinates (x, y, w, h), shape (45, 7, 7, 2, 4)
    predict_boxes = tf.reshape(predicts[:, self.boundary2:],
        [self.batch_size, self.cell_size, self.cell_size, self.boxes_per_cell, 4])

    # label: whether the cell contains an object, shape (45, 7, 7, 1)
    response = tf.reshape(labels[:, :, :, 0],
        [self.batch_size, self.cell_size, self.cell_size, 1])
    # label: ground-truth box, shape (45, 7, 7, 1, 4)
    boxes = tf.reshape(labels[:, :, :, 1:5],
        [self.batch_size, self.cell_size, self.cell_size, 1, 4])
    # repeat the ground-truth box for each predictor and normalize by the image size,
    # shape (45, 7, 7, 2, 4)
    boxes = tf.tile(boxes, [1, 1, 1, self.boxes_per_cell, 1]) / self.image_size

    # label: class targets, shape (45, 7, 7, 20)
    classes = labels[:, :, :, 5:]

    # offset, shape (7, 7, 2)
    offset = tf.constant(self.offset, dtype=tf.float32)
    # shape (1, 7, 7, 2)
    offset = tf.reshape(offset,
        [1, self.cell_size, self.cell_size, self.boxes_per_cell])
    # shape (45, 7, 7, 2)
    offset = tf.tile(offset, [self.batch_size, 1, 1, 1])
    # convert cell-relative predictions to image-normalized (x, y, w, h);
    # w and h are predicted as square roots, hence tf.square
    # shape (4, 45, 7, 7, 2)
    predict_boxes_tran = tf.pack([(predict_boxes[:, :, :, :, 0] + offset) / self.cell_size,
                                  (predict_boxes[:, :, :, :, 1] + tf.transpose(offset, (0, 2, 1, 3))) / self.cell_size,
                                  tf.square(predict_boxes[:, :, :, :, 2]),
                                  tf.square(predict_boxes[:, :, :, :, 3])])
    # shape (45, 7, 7, 2, 4)
    predict_boxes_tran = tf.transpose(predict_boxes_tran, [1, 2, 3, 4, 0])

    # IoU between each predicted box and the ground truth, shape (45, 7, 7, 2)
    iou_predict_truth = self.calc_iou(predict_boxes_tran, boxes)

    # calculate I tensor [BATCH_SIZE, CELL_SIZE, CELL_SIZE, BOXES_PER_CELL]
    # best IoU in each cell, shape (45, 7, 7, 1)
    object_mask = tf.reduce_max(iou_predict_truth, 3, keep_dims=True)
    # 1 for the box with the best IoU in a cell that contains an object, shape (45, 7, 7, 2)
    object_mask = tf.cast((iou_predict_truth >= object_mask), tf.float32) * response
    # mask = tf.tile(response, [1, 1, 1, self.boxes_per_cell])

    # calculate no_I tensor [BATCH_SIZE, CELL_SIZE, CELL_SIZE, BOXES_PER_CELL]
    # shape (45, 7, 7, 2)
    noobject_mask = tf.ones_like(object_mask, dtype=tf.float32) - object_mask
    

    # encode the ground-truth boxes in the same cell-relative form as the predictions
    # shape (4, 45, 7, 7, 2)
    boxes_tran = tf.pack([boxes[:, :, :, :, 0] * self.cell_size - offset,
                          boxes[:, :, :, :, 1] * self.cell_size - tf.transpose(offset, (0, 2, 1, 3)),
                          tf.sqrt(boxes[:, :, :, :, 2]),
                          tf.sqrt(boxes[:, :, :, :, 3])])
    # shape (45, 7, 7, 2, 4)
    boxes_tran = tf.transpose(boxes_tran, [1, 2, 3, 4, 0])

    # class_loss
    class_loss = tf.reduce_mean(tf.reduce_sum(tf.square(response * (predict_classes - classes)),
        reduction_indices=[1, 2, 3]), name='class_loss') * self.class_scale

    # object_loss
    object_loss = tf.reduce_mean(tf.reduce_sum(tf.square(object_mask * (predict_scales - iou_predict_truth)),
        reduction_indices=[1, 2, 3]), name='object_loss') * self.object_scale

    # noobject_loss
    noobject_loss = tf.reduce_mean(tf.reduce_sum(tf.square(noobject_mask * predict_scales),
        reduction_indices=[1, 2, 3]), name='noobject_loss') * self.noobject_scale

    # coord_loss
    # shape (45, 7, 7, 2, 1)
    coord_mask = tf.expand_dims(object_mask, 4)

    # shape (45, 7, 7, 2, 4)
    boxes_delta = coord_mask * (predict_boxes - boxes_tran)
    coord_loss = tf.reduce_mean(tf.reduce_sum(tf.square(boxes_delta),
        reduction_indices=[1, 2, 3, 4]), name='coord_loss') * self.coord_scale

    tf.summary.scalar(self.phase + '/class_loss', class_loss)
    tf.summary.scalar(self.phase + '/object_loss', object_loss)
    tf.summary.scalar(self.phase + '/noobject_loss', noobject_loss)
    tf.summary.scalar(self.phase + '/coord_loss', coord_loss)

    tf.summary.histogram(self.phase + '/boxes_delta_x', boxes_delta[:, :, :, :, 0])
    tf.summary.histogram(self.phase + '/boxes_delta_y', boxes_delta[:, :, :, :, 1])
    tf.summary.histogram(self.phase + '/boxes_delta_w', boxes_delta[:, :, :, :, 2])
    tf.summary.histogram(self.phase + '/boxes_delta_h', boxes_delta[:, :, :, :, 3])
    <span class="n">tf</span><span class="o">.</span><span class="n">summary</span><span class="o">.</span><span class="n">histogram</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">phase</span> <span class="o">+</span> <span class="s1">'/iou'</span><span class="p">,</span> <span class="n">iou_predict_truth</span><span class="p">)</span>

    <span class="k">return</span> <span class="n">class_loss</span> <span class="o">+</span> <span class="n">object_loss</span> <span class="o">+</span> <span class="n">noobject_loss</span> <span class="o">+</span> <span class="n">coord_loss</span></code></pre></div><br><p>2.2 Reading the data</p><p>The data is read by pascal_voc.py in the utils folder.<br></p><p>2.3 Training</p><p>Model training is implemented in the train() method. Once the initialization parameters are understood, the overall structure of the training code is straightforward. One point worth noting is that an exponential moving average (EMA) of the variables is maintained during training to improve results; see the code comments for details. Also, when running train.py, consider lowering batch_size: the default batch size is 45, and my machine froze the first time I ran it without changing this.</p><div class="highlight"><pre><code class="language-python"><span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">net</span><span class="p">,</span> <span class="n">data</span><span class="p">):</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">net</span> <span class="o">=</span> <span class="n">net</span>  <span class="c1"># network</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">data</span> <span class="o">=</span> <span class="n">data</span>  <span class="c1"># data</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">weights_file</span> <span class="o">=</span> <span class="n">cfg</span><span class="o">.</span><span class="n">WEIGHTS_FILE</span>  <span class="c1"># network weights file</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">max_iter</span> <span class="o">=</span> <span class="n">cfg</span><span class="o">.</span><span class="n">MAX_ITER</span>  <span class="c1"># maximum number of iterations</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">initial_learning_rate</span> <span class="o">=</span> <span class="n">cfg</span><span class="o">.</span><span class="n">LEARNING_RATE</span>  <span class="c1"># initial learning rate, LEARNING_RATE = 0.0001</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">decay_steps</span> <span class="o">=</span> <span class="n">cfg</span><span class="o">.</span><span class="n">DECAY_STEPS</span>  <span class="c1"># learning-rate decay steps, DECAY_STEPS = 30000</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">decay_rate</span> <span class="o">=</span> <span class="n">cfg</span><span class="o">.</span><span class="n">DECAY_RATE</span>  <span class="c1"># learning-rate decay rate, DECAY_RATE = 0.1</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">staircase</span> <span class="o">=</span> <span class="n">cfg</span><span class="o">.</span><span class="n">STAIRCASE</span>  <span class="c1"># STAIRCASE = True: decay in discrete steps (global_step / decay_steps is an integer division)</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">summary_iter</span> <span class="o">=</span> <span class="n">cfg</span><span class="o">.</span><span class="n">SUMMARY_ITER</span>  <span class="c1"># summary interval, SUMMARY_ITER = 10</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">save_iter</span> <span class="o">=</span> <span class="n">cfg</span><span class="o">.</span><span class="n">SAVE_ITER</span>  <span class="c1"># checkpoint interval in iterations, SAVE_ITER = 1000</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">output_dir</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">cfg</span><span class="o">.</span><span class="n">OUTPUT_DIR</span><span class="p">,</span>
        <span class="n">datetime</span><span class="o">.</span><span class="n">datetime</span><span class="o">.</span><span class="n">now</span><span class="p">()</span><span class="o">.</span><span class="n">strftime</span><span class="p">(</span><span class="s1">'%Y_%m_</span><span class="si">%d</span><span class="s1">_%H_%M'</span><span class="p">))</span>  <span class="c1"># output directory</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">exists</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">output_dir</span><span class="p">):</span>
        <span class="n">os</span><span class="o">.</span><span class="n">makedirs</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">output_dir</span><span class="p">)</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">save_cfg</span><span class="p">()</span>

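    <span class="c1"># Unlike tf.Variable, tf.get_variable has a variable-checking mechanism:</span>
    <span class="c1"># it checks whether a variable with the same name already exists and is marked as shared.</span>
    <span class="c1"># If it exists but is not shared, TensorFlow raises an error on the second variable with that name.</span>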

    self.global_step = tf.get_variable('global_step', [],
        initializer=tf.constant_initializer(0), trainable=False)

    <span class="c1"># learning-rate decay</span>
    <span class="c1">#tf.train.exponential_decay(learning_rate, global_step, </span>
    <span class="c1">#                            decay_steps, decay_rate, staircase=False, name=None)</span>
    <span class="c1">#Applies exponential decay to the learning rate.</span>
    <span class="c1">#decayed_learning_rate = learning_rate *decay_rate ^ (global_step / decay_steps)</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">learning_rate</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">train</span><span class="o">.</span><span class="n">exponential_decay</span><span class="p">(</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">initial_learning_rate</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">global_step</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">decay_steps</span><span class="p">,</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">decay_rate</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">staircase</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s1">'learning_rate'</span><span class="p">)</span>
    
    <span class="c1"># optimizer</span>
    <span class="c1">#self.optimizer = tf.train.AdamOptimizer(learning_rate=self.learning_rate).minimize(self.net.loss, global_step=self.global_step)</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">optimizer</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">train</span><span class="o">.</span><span class="n">GradientDescentOptimizer</span><span class="p">(</span>
        <span class="n">learning_rate</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">learning_rate</span><span class="p">)</span><span class="o">.</span><span class="n">minimize</span><span class="p">(</span>
          <span class="bp">self</span><span class="o">.</span><span class="n">net</span><span class="o">.</span><span class="n">loss</span><span class="p">,</span><span class="n">global_step</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">global_step</span><span class="p">)</span>
    <span class="c1">#</span>

    
    
    <span class="c1"># Exponential moving average (EMA): some training algorithms, such as GradientDescent and Momentum,</span>
    <span class="c1">#often benefit from maintaining a moving average of variables during optimization.</span>
    <span class="c1">#Using the moving averages for evaluations often improves results significantly.</span>
    <span class="c1">#An exponential moving average (EMA) is a type of moving average that is similar to </span>
    <span class="c1">#a simple moving average, except that more weight is given to the latest data</span>
    <span class="c1">#class tf.train.ExponentialMovingAverage</span>
    <span class="c1">#Maintains moving averages of variables by employing an exponential decay.</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">ema</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">train</span><span class="o">.</span><span class="n">ExponentialMovingAverage</span><span class="p">(</span><span class="n">decay</span><span class="o">=</span><span class="mf">0.9999</span><span class="p">)</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">averages_op</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">ema</span><span class="o">.</span><span class="nb">apply</span><span class="p">(</span><span class="n">tf</span><span class="o">.</span><span class="n">trainable_variables</span><span class="p">())</span>
    
    <span class="c1"># Create an op that will update the moving averages after each training</span>
    <span class="c1"># step.  This is what we will use in place of the usual training op.</span>
    <span class="k">with</span> <span class="n">tf</span><span class="o">.</span><span class="n">control_dependencies</span><span class="p">([</span><span class="bp">self</span><span class="o">.</span><span class="n">optimizer</span><span class="p">]):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">train_op</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">averages_op</span><span class="p">)</span>

    <span class="c1">#summary</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">summary_op</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">summary</span><span class="o">.</span><span class="n">merge_all</span><span class="p">()</span>
    <span class="c1">#saver</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">saver</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">train</span><span class="o">.</span><span class="n">Saver</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">net</span><span class="o">.</span><span class="n">collection</span><span class="p">,</span> <span class="n">max_to_keep</span><span class="o">=</span><span class="bp">None</span><span class="p">)</span>
    <span class="c1">#writer</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">writer</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">summary</span><span class="o">.</span><span class="n">FileWriter</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">output_dir</span><span class="p">,</span> <span class="n">flush_secs</span><span class="o">=</span><span class="mi">60</span><span class="p">)</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">ckpt_file</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">output_dir</span><span class="p">,</span> <span class="s1">'save.ckpt'</span><span class="p">)</span>

    <span class="bp">self</span><span class="o">.</span><span class="n">sess</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">Session</span><span class="p">()</span>
    <span class="c1"># On older TensorFlow versions the call below raises an error; use tf.initialize_all_variables() instead</span>
    <span class="c1">#self.sess.run(tf.initialize_all_variables())  # initialize variables</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">sess</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">tf</span><span class="o">.</span><span class="n">global_variables_initializer</span><span class="p">())</span>

    <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">weights_file</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">:</span>
        <span class="k">print</span><span class="p">(</span><span class="s1">'Restoring weights from: '</span> <span class="o">+</span> <span class="bp">self</span><span class="o">.</span><span class="n">weights_file</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">saver</span><span class="o">.</span><span class="n">restore</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">sess</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">weights_file</span><span class="p">)</span>

    <span class="bp">self</span><span class="o">.</span><span class="n">writer</span><span class="o">.</span><span class="n">add_graph</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">sess</span><span class="o">.</span><span class="n">graph</span><span class="p">)</span></code></pre></div><p>3 test overview</p><p>test.py loads the trained network weights, detects objects, and draws their locations on the image. The code is similar to the training code, so it is skipped here. To run the test, first download the <a href="https://link.zhihu.com/?target=https%3A//drive.google.com/file/d/0B2JbaJSrWLpza08yS2FSUnV2dlE/view%3Fusp%3Dsharing" class=" wrap external" target="_blank" rel="nofollow noreferrer" data-za-detail-view-id="1043">YOLO_small</a> model trained by the original author (the link may not be reachable without a proxy). Second, the source code contains a small bug, and running it as-is raises an error:</p><div class="highlight"><pre><code class="language-python"><span class="n">net_output</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">sess</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">net</span><span class="o">.</span><span class="n">fc_32</span><span class="p">,</span> <span class="n">feed_dict</span><span class="o">=</span><span class="p">{</span><span class="bp">self</span><span class="o">.</span><span class="n">net</span><span class="o">.</span><span class="n">images</span><span class="p">:</span> <span class="n">inputs</span><span class="p">})</span></code></pre></div><p>This line needs to be changed to:</p><div class="highlight"><pre><code class="language-python">net_output = self.sess.run(self.net.fc_32, feed_dict={self.net.x: inputs})</code></pre></div>

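The result is shown in Figure 3-1. YOLO successfully detects the person and the dog but fails to detect the horse. In follow-up work the authors improved YOLO so that it can recognize more categories; see the article "YOLO升级版:YOLOv2和YOLO9000解析".

Figure 3-1 YOLO detection result 1
Other images work as well, as shown in Figure 3-2:

Figure 3-2 YOLO detection result 2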

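4 Summary

YOLO is an end-to-end, real-time object detection system based on deep learning. Its main strength is speed, and there is still room to improve its accuracy. This article walked through a TensorFlow implementation of YOLO; the code is easy to follow once the paper is understood. The TensorFlow features it touches on are the following: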

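First, the difference between tf.get_variable and tf.Variable. The former has a variable-checking mechanism: it checks whether a variable with the same name already exists and whether it has been marked as shared. If an existing variable is not marked as shared, TensorFlow raises an error when a second variable with the same name is requested.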
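The difference can be illustrated with a toy, pure-Python model of a variable store. This is only a sketch of the naming behaviour and is not TensorFlow code: `ToyVariableStore`, `variable()` and `get_variable()` are hypothetical names, and the `reuse` flag stands in for TensorFlow's variable-scope reuse setting.

```python
class ToyVariableStore:
    """Toy model of TF1-style variable naming (hypothetical, not TensorFlow)."""

    def __init__(self):
        self.vars = {}

    def variable(self, name, value):
        # Like tf.Variable: always creates a new variable and makes the
        # name unique by appending a numeric suffix when it is taken.
        unique, i = name, 0
        while unique in self.vars:
            i += 1
            unique = "%s_%d" % (name, i)
        self.vars[unique] = value
        return unique

    def get_variable(self, name, value=None, reuse=False):
        # Like tf.get_variable: checks for an existing variable first.
        if name in self.vars:
            if not reuse:
                raise ValueError("Variable %s already exists" % name)
            return name  # shared: hand back the existing variable
        if reuse:
            raise ValueError("Variable %s does not exist" % name)
        self.vars[name] = value
        return name


store = ToyVariableStore()
first = store.variable("w", 0.0)
second = store.variable("w", 1.0)        # silently becomes a second variable
step = store.get_variable("global_step", 0)
shared = store.get_variable("global_step", reuse=True)
try:
    store.get_variable("global_step", 0)  # same name, not marked as shared
    duplicate_error = False
except ValueError:
    duplicate_error = True
```

In this toy model the second `variable("w", ...)` call quietly yields a distinct name, while the second non-shared `get_variable("global_step", ...)` call fails, which is the check the paragraph above describes.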

Second, implementing learning-rate decay.

tf.train.exponential_decay(learning_rate, global_step, decay_steps, decay_rate, staircase=False, name=None) #decayed_learning_rate = learning_rate *decay_rate ^ (global_step / decay_steps)
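The decay formula in the comment above can be checked with a few lines of plain Python. This is a minimal sketch of the arithmetic only, not TensorFlow's implementation; the function below is a hypothetical stand-in that reuses the parameter names of the signature, and the numbers are the config values quoted in this article (0.0001, 30000, 0.1, staircase=True).

```python
def exponential_decay(learning_rate, global_step, decay_steps, decay_rate,
                      staircase=False):
    """Pure-Python version of the decay formula quoted above."""
    exponent = global_step / float(decay_steps)
    if staircase:
        # Integer division: the rate drops in discrete steps.
        exponent = global_step // decay_steps
    return learning_rate * decay_rate ** exponent


# Values taken from the config comments in this article.
lr_start = exponential_decay(0.0001, 0, 30000, 0.1, staircase=True)
lr_before = exponential_decay(0.0001, 29999, 30000, 0.1, staircase=True)
lr_after = exponential_decay(0.0001, 30000, 30000, 0.1, staircase=True)
lr_smooth = exponential_decay(0.0001, 15000, 30000, 0.1, staircase=False)
```

With staircase=True the rate stays at its initial value until step 30000 and then drops by a factor of ten; with staircase=False it shrinks continuously between those points.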

Third, using an exponential moving average (EMA) of the variables to improve the results of gradient-descent training.

self.ema = tf.train.ExponentialMovingAverage(decay=0.9999)
self.averages_op = self.ema.apply(tf.trainable_variables())

with tf.control_dependencies([self.optimizer]):
    self.train_op = tf.group(self.averages_op)
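What the moving average computes can be sketched in plain Python. This is an illustration under assumptions rather than the TensorFlow code path: `ema_update` is a hypothetical helper, the "optimizer step" is faked by adding 1.0 to a single weight, and decay=0.9 is used instead of 0.9999 so the effect is visible after a few steps. As in the snippet above, the optimizer update runs first and the average is updated afterwards.

```python
def ema_update(shadow, value, decay=0.9999):
    """One EMA step: shadow = decay * shadow + (1 - decay) * value."""
    return decay * shadow + (1.0 - decay) * value


# The shadow value starts at the variable's initial value.
weight = 1.0
shadow = weight
history = []
for _ in range(3):
    weight += 1.0                      # stand-in for one optimizer step
    shadow = ema_update(shadow, weight, decay=0.9)
    history.append(shadow)
```

The shadow value trails the raw weight and smooths out step-to-step changes, which is why the averaged variables are often preferred for evaluation.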

Fourth, the tf.pack() function (renamed tf.stack() in TensorFlow 1.0).

tf.pack(values, name='pack')
# The NumPy equivalent of this function is np.asarray:
# tf.pack([x, y, z]) = np.asarray([x, y, z])
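The effect of packing can be shown without TensorFlow. The sketch below is a hypothetical pure-Python `pack` for 1-D lists only; it is meant to show the new leading axis, not to reproduce the library function.

```python
def pack(values):
    """Stack equal-length sequences along a new leading axis."""
    length = len(values[0])
    if any(len(v) != length for v in values):
        raise ValueError("all inputs must have the same shape")
    return [list(v) for v in values]


x, y, z = [1, 4], [2, 5], [3, 6]
packed = pack([x, y, z])
shape = (len(packed), len(packed[0]))
```

Three inputs of length 2 become one nested list of shape (3, 2): the number of packed inputs becomes the first dimension.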

Fifth, the tf.tile() function, which replicates a tensor along the specified dimensions.

tf.tile(input, multiples, name=None)
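A small pure-Python sketch shows what tiling does to a 2-D input. `tile_2d` is a hypothetical helper restricted to nested lists; the only property it is meant to demonstrate is that output dimension i equals input dimension i times multiples[i].

```python
def tile_2d(matrix, multiples):
    """Repeat a 2-D list multiples[0] times along rows, multiples[1] along columns."""
    reps_rows, reps_cols = multiples
    return [list(row) * reps_cols
            for _ in range(reps_rows)
            for row in matrix]


grid = [[1, 2],
        [3, 4]]
tiled = tile_2d(grid, (2, 3))
tiled_shape = (len(tiled), len(tiled[0]))
```

Here a 2×2 input tiled with multiples (2, 3) gives a 4×6 result in which the original block is repeated twice vertically and three times horizontally.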