助教的班级有人问我YOLOv3的boundingbox 和 anchor区别在哪,我大概知道,但不能系统的解释,下去查了点资料和那个学生讨论了下,记录一下。
因为Faster-RCNN中也引入了anchor,然后很自然的想到yolov3的anchor是不是和Faster-RCNN中是一样的用途,结果发现并不是。
看代码是最直观的,我先search了Faster-RCNN关于产生anchor的代码,下载地址:https://github.com/rbgirshick/py-faster-rcnn/blob/781a917b378dbfdedb45b6a56189a31982da1b43/lib/rpn/generate_anchors.py
Faster-RCNN关于产生anchor的代码贴出来如下:
# --------------------------------------------------------
# Faster R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick and Sean Bell
# --------------------------------------------------------
import numpy as np
def generate_anchors(base_size=16, ratios=[0.5, 1, 2],
scales=2**np.arange(3, 6)):
"""
Generate anchor (reference) windows by enumerating aspect ratios X
scales wrt a reference (0, 0, 15, 15) window.
"""
base_anchor = np.array([1, 1, base_size, base_size]) - 1
ratio_anchors = _ratio_enum(base_anchor, ratios)
anchors = np.vstack([_scale_enum(ratio_anchors[i, :], scales)
for i in xrange(ratio_anchors.shape[0])])
return anchors
def _whctrs(anchor):
"""
Return width, height, x center, and y center for an anchor (window).
"""
w = anchor[2] - anchor[0] + 1
h = anchor[3] - anchor[1] + 1
x_ctr = anchor[0] + 0.5 * (w - 1)
y_ctr = anchor[1] + 0.5 * (h - 1)
return w, h, x_ctr, y_ctr
def _mkanchors(ws, hs, x_ctr, y_ctr):
"""
Given a vector of widths (ws) and heights (hs) around a center
(x_ctr, y_ctr), output a set of anchors (windows).
"""
ws = ws[:, np.newaxis]
hs = hs[:, np.newaxis]
anchors = np.hstack((x_ctr - 0.5 * (ws - 1),
y_ctr - 0.5 * (hs - 1),
x_ctr + 0.5 * (ws - 1),
y_ctr + 0.5 * (hs - 1)))
return anchors
def _ratio_enum(anchor, ratios):
"""
Enumerate a set of anchors for each aspect ratio wrt an anchor.
"""
w, h, x_ctr, y_ctr = _whctrs(anchor)
size = w * h
size_ratios = size / ratios
ws = np.round(np.sqrt(size_ratios))
hs = np.round(ws * ratios)
anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
return anchors
def _scale_enum(anchor, scales):
"""
Enumerate a set of anchors for each scale wrt an anchor.
"""
w, h, x_ctr, y_ctr = _whctrs(anchor)
ws = w * scales
hs = h * scales
anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
return anchors
if __name__ == '__main__':
import time
t = time.time()
a = generate_anchors()
print time.time() - t
print a
from IPython import embed; embed()
代码的意思简单来说就是实现: 1)初始的框是(0,0,15,15),之后根据_ratio_enum生成三个中心坐标不变(只改变长宽比,分别为0.5,1,2),但是ratio改变的anchors. 如下图:
2)三个不同ratio的anchors再分别_scale_enum生成不同尺寸的anchors,最后产生三种尺度三种比例的9个anchors。如下图:
Faster-RCNN产生的anchor作用用一张图对照原论文可以很容易的看懂,如下图:
16是网络下采样的倍数,把9个anchor放在原图(600x800)上移动,stride为16(x,y方向都是16),最后得到右边红色的图,一共17100个anchor.
YOLOv3的anchor代码(官网:https://pjreddie.com):贴出来如下
import glob
import os
import sys
import xml.etree.ElementTree as ET
import numpy as np
from kmeans import kmeans, avg_iou
# 根文件夹
ROOT_PATH = '/data/DataBase/YOLO_Data/V3_DATA/'
# 聚类的数目
CLUSTERS = 6
# 模型中图像的输入尺寸,默认是一样的
SIZE = 640
# 需要加载yolo训练数据和lable
def load_dataset(path):
jpegimages = os.path.join(path, 'JPEGImages')
if not os.path.exists(jpegimages):
print('no JPEGImages folders, program abort')
sys.exit(0)
labels_txt = os.path.join(path, 'labels')
if not os.path.exists(labels_txt):
print('no labels folders, program abort')
sys.exit(0)
label_file = os.listdir(labels_txt)
print('label count: {}'.format(len(label_file)))
dataset = []
for label in label_file:
with open(os.path.join(labels_txt, label), 'r') as f:
txt_content = f.readlines()
for line in txt_content:
line_split = line.split(' ')
roi_with = float(line_split[len(line_split)-2])
roi_height = float(line_split[len(line_split)-1])
if roi_with == 0 or roi_height == 0:
continue
dataset.append([roi_with, roi_height])
# print([roi_with, roi_height])
return np.array(dataset)
data = load_dataset(ROOT_PATH)
out = kmeans(data, k=CLUSTERS) #对训练样本聚类
print(out)
print("Accuracy: {:.2f}%".format(avg_iou(data, out) * 100))
print("Boxes:\n {}-{}".format(out[:, 0] * SIZE, out[:, 1] * SIZE))
ratios = np.around(out[:, 0] / out[:, 1], decimals=2).tolist()
print("Ratios:\n {}".format(sorted(ratios)))
可以看到yolov3是直接对你的训练样本进行k-means聚类,由训练样本得来的先验框(anchor),也就是对样本聚类的结果。