Faster R-CNN: A Step-by-Step Implementation

AsEver_

Tags: python, computer vision, deep learning

Article link: https://blog.csdn.net/weixin_45464312/article/details/123359088

1. Configuration

voc_data_dir = './'   # path to the dataset

2. Getting the image and bounding-box information

2.1 The original image:

[Figure: the original image]

Image shape: (3, 375, 500)

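# anno is the parsed XML annotation of one image
bbox, label, difficult = [], [], []
for obj in anno.findall('object'):
    difficult.append(int(obj.find('difficult').text))
    bbox_anno = obj.find('bndbox')
    # subtract 1 so that pixel coordinates start at 0
    bbox.append([int(bbox_anno.find(tag).text) - 1 for tag in ('ymin', 'xmin', 'ymax', 'xmax')])
    name = obj.find('name').text.lower().strip()
    # the label is the index of the class name in VOC_BBOX_LABEL_NAMES
    label.append(VOC_BBOX_LABEL_NAMES.index(name))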

difficult [0, 0, 1, 0, 1]
bbox [[210, 262, 338, 323], [263, 164, 371, 252], [243, 4, 373, 66], [193, 240, 298, 294], [185, 276, 219, 311]]
label [8, 8, 8, 8, 8]
name chair

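Bounding box: top-left corner (xmin, ymin), bottom-right corner (xmax, ymax). Note that the code above stores each box in (ymin, xmin, ymax, xmax) order.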
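The parsing loop above can be tried without the dataset. The following is a minimal sketch that assumes a hand-written annotation string in Pascal VOC layout (`VOC_XML`) and a shortened, hypothetical label tuple (`LABEL_NAMES`); neither comes from the original post.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written annotation in Pascal VOC layout (not a dataset file).
VOC_XML = """
<annotation>
  <object>
    <name>Chair </name>
    <difficult>0</difficult>
    <bndbox><xmin>263</xmin><ymin>211</ymin><xmax>324</xmax><ymax>339</ymax></bndbox>
  </object>
  <object>
    <name>chair</name>
    <difficult>1</difficult>
    <bndbox><xmin>5</xmin><ymin>244</ymin><xmax>67</xmax><ymax>374</ymax></bndbox>
  </object>
</annotation>
"""

# Shortened label tuple for this sketch; the real VOC list has 20 classes.
LABEL_NAMES = ("bottle", "chair")


def parse_objects(xml_text):
    anno = ET.fromstring(xml_text)
    bbox, label, difficult = [], [], []
    for obj in anno.findall("object"):
        difficult.append(int(obj.find("difficult").text))
        box = obj.find("bndbox")
        # Subtract 1 to make the coordinates 0-based, and store them
        # in (ymin, xmin, ymax, xmax) order.
        bbox.append([int(box.find(tag).text) - 1
                     for tag in ("ymin", "xmin", "ymax", "xmax")])
        # lower().strip() makes "Chair " match the label "chair".
        name = obj.find("name").text.lower().strip()
        label.append(LABEL_NAMES.index(name))
    return bbox, label, difficult


bbox, label, difficult = parse_objects(VOC_XML)
print(bbox)        # [[210, 262, 338, 323], [243, 4, 373, 66]]
print(label)       # [1, 1]
print(difficult)   # [0, 1]
```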

[Figure: the initial bounding boxes]

2.2 Resize, normalize, and horizontally flip the image; the resulting image shape is (3, 600, 800)

# Resize the image (sktsf = skimage.transform, tvtsf = torchvision.transforms, t = torch)
C, H, W = img.shape
min_size = 600
max_size = 1000
scale1 = min_size / min(H, W)
scale2 = max_size / max(H, W)
scale = min(scale1, scale2)
img = img / 255.
img = sktsf.resize(img, (C, H * scale, W * scale), mode='reflect', anti_aliasing=False)
# Normalize each channel with the ImageNet mean and standard deviation
normalize = tvtsf.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
img = normalize(t.from_numpy(img))
img = img.numpy()
_, o_H, o_W = img.shape
# Record the scale factor
scale = o_H / H
# Rescale the bounding boxes to the new image size
bbox = bbox.copy()
y_scale = float(o_H) / H    # scale factor along the height
x_scale = float(o_W) / W    # scale factor along the width
bbox[:, 0] = y_scale * bbox[:, 0]
bbox[:, 2] = y_scale * bbox[:, 2]
bbox[:, 1] = x_scale * bbox[:, 1]
bbox[:, 3] = x_scale * bbox[:, 3]
# Horizontal flip (described as random in the original notes; applied unconditionally here)
img = img[:, :, ::-1]
# Flip the bounding boxes to match
bbox = bbox.copy()
x_max = o_W - bbox[:, 1]
x_min = o_W - bbox[:, 3]
bbox[:, 1] = x_min
bbox[:, 3] = x_max
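To check the arithmetic, here is a small sketch that applies the same scale, resize, and flip steps to the first box from section 2.1. The helper names (`compute_scale`, `resize_bbox`, `flip_bbox_x`) are made up for this example and are not taken from the original code.

```python
import numpy as np


def compute_scale(H, W, min_size=600, max_size=1000):
    # Bring the short side up to min_size unless that would push
    # the long side past max_size.
    return min(min_size / min(H, W), max_size / max(H, W))


def resize_bbox(bbox, in_size, out_size):
    # Boxes are (ymin, xmin, ymax, xmax); scale y and x separately.
    bbox = bbox.copy()
    y_scale = out_size[0] / in_size[0]
    x_scale = out_size[1] / in_size[1]
    bbox[:, [0, 2]] *= y_scale
    bbox[:, [1, 3]] *= x_scale
    return bbox


def flip_bbox_x(bbox, width):
    # Mirroring swaps the roles of xmin and xmax.
    bbox = bbox.copy()
    x_max = width - bbox[:, 1]
    x_min = width - bbox[:, 3]
    bbox[:, 1] = x_min
    bbox[:, 3] = x_max
    return bbox


H, W = 375, 500
scale = compute_scale(H, W)                        # min(600/375, 1000/500) = 1.6
out_size = (round(H * scale), round(W * scale))    # (600, 800)

box = np.array([[210, 262, 338, 323]], dtype=np.float32)
resized = resize_bbox(box, (H, W), out_size)       # approx. [336.0, 419.2, 540.8, 516.8]
flipped = flip_bbox_x(resized, out_size[1])        # approx. [336.0, 283.2, 540.8, 380.8]
print(out_size, resized, flipped)
```

The flip leaves the box width unchanged (516.8 - 419.2 = 380.8 - 283.2 = 97.6), which is a quick sanity check for this kind of transform.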

Image preprocessing is now complete.

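3. Building the model

VGG16

3.1 Remove the dropout layers, freeze the first four convolutional layers, and obtain the feature extractor and the classifier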

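from torch import nn
from torchvision.models import vgg16

# The original call was vgg16(not None), i.e. vgg16(True): load the pretrained weights
model = vgg16(pretrained=True)
# Keep layers 0..29, which drops the last max-pooling layer
features = list(model.features)[:30]
classifier = list(model.classifier)
# Remove the final 1000-way linear layer (index 6) and the two dropout layers (indices 5 and 2)
del classifier[6]
del classifier[5]
del classifier[2]
# Freeze the first four convolutional layers
for layer in features[:10]:
    for p in layer.parameters():
        p.requires_grad = False
# Feature extractor
extractor = nn.Sequential(*features)
# Classifier (the original line, classifier = classifier, was a no-op)
classifier = nn.Sequential(*classifier)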

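3.2 Extracting region proposals with the RPN

3.2.1 Generate a base set of 9 anchor boxes, anchor_base, all placed at the top-left corner of the image

Open question: what do ratios and anchor_scales mean, and what is the area of each anchor?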

# Generate the base anchors
base_size = 16
ratios = [0.5, 1, 2]          # aspect ratios (height / width)
anchor_scales = [8, 16, 32]   # anchor side length, in multiples of base_size
py = base_size / 2.           # centre of the base anchors
px = base_size / 2.
anchor_base = np.zeros((len(ratios) * len(anchor_scales), 4), dtype=np.float32)
# six.moves.range works on both Python 2 and Python 3; on Python 3 it is the built-in range
for i in six.moves.range(len(ratios)):
    for j in six.moves.range(len(anchor_scales)):
        h = base_size * anchor_scales[j] * np.sqrt(ratios[i])       # height grows with sqrt(ratio)
        w = base_size * anchor_scales[j] * np.sqrt(1. / ratios[i])  # width shrinks by the same factor

        index = i * len(anchor_scales) + j
        # Corner coordinates (ymin, xmin, ymax, xmax) of the anchor
        anchor_base[index, 0] = py - h / 2.
        anchor_base[index, 1] = px - w / 2.
        anchor_base[index, 2] = py + h / 2.
        anchor_base[index, 3] = px + w / 2.

array([[ -37.254833,  -82.50967 ,   53.254833,   98.50967 ],
       [ -82.50967 , -173.01933 ,   98.50967 ,  189.01933 ],
       [-173.01933 , -354.03867 ,  189.01933 ,  370.03867 ],
       [ -56.      ,  -56.      ,   72.      ,   72.      ],
       [-120.      , -120.      ,  136.      ,  136.      ],
       [-248.      , -248.      ,  264.      ,  264.      ],
       [ -82.50967 ,  -37.254833,   98.50967 ,   53.254833],
       [-173.01933 ,  -82.50967 ,  189.01933 ,   98.50967 ],
       [-354.03867 , -173.01933 ,  370.03867 ,  189.01933 ]],
      dtype=float32)
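One way to read the table above: for every anchor, the ratio is height / width and the area is (base_size * scale)^2, whatever the ratio. The sketch below wraps the same loop in a function (the name `generate_anchor_base` is chosen here for illustration) and checks both properties.

```python
import numpy as np


def generate_anchor_base(base_size=16, ratios=(0.5, 1, 2), anchor_scales=(8, 16, 32)):
    py = px = base_size / 2.0
    anchor_base = np.zeros((len(ratios) * len(anchor_scales), 4), dtype=np.float32)
    for i, ratio in enumerate(ratios):
        for j, scale in enumerate(anchor_scales):
            h = base_size * scale * np.sqrt(ratio)
            w = base_size * scale * np.sqrt(1.0 / ratio)
            k = i * len(anchor_scales) + j
            anchor_base[k] = (py - h / 2, px - w / 2, py + h / 2, px + w / 2)
    return anchor_base


anchors = generate_anchor_base()
heights = anchors[:, 2] - anchors[:, 0]
widths = anchors[:, 3] - anchors[:, 1]

# h / w reproduces the ratio; sqrt(h * w) reproduces base_size * scale.
for hgt, wid in zip(heights, widths):
    print(f"h={hgt:7.2f}  w={wid:7.2f}  h/w={hgt / wid:4.2f}  sqrt(area)={np.sqrt(hgt * wid):6.1f}")
```

With the default arguments, sqrt(area) cycles through 128, 256, and 512 pixels, so the three scales correspond to 128x128, 256x256, and 512x512 boxes before the aspect ratio is applied.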


3.2.2 The input image has shape torch.Size([1, 3, 600, 800]); passing it through the feature extractor gives a feature map of shape torch.Size([1, 512, 37, 50])

features = extractor(img_)

There are 512 feature maps (channels) in total, each of size 37x50.

3.2.3 Generate the full set of anchors

Open question: why is feat_stride equal to 16?
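A likely explanation, assuming the extractor is the first 30 layers of torchvision's VGG16 as in section 3.1: those layers contain four 2x2 max-pooling layers with stride 2, while the 3x3 convolutions (padding 1) keep the spatial size. Each pooling halves the size, so one feature-map cell covers 2^4 = 16 input pixels. A minimal sketch of that arithmetic (`feature_map_size` is a name made up here):

```python
def feature_map_size(size, n_pools=4):
    # Each 2x2, stride-2 max-pool halves the size, rounding down;
    # the 3x3 convolutions with padding 1 leave it unchanged.
    for _ in range(n_pools):
        size //= 2
    return size


hh, ww = feature_map_size(600), feature_map_size(800)
feat_stride = 2 ** 4
print(hh, ww, feat_stride)   # 37 50 16
```

This reproduces the 37x50 feature map from section 3.2.2: 600 -> 300 -> 150 -> 75 -> 37 and 800 -> 400 -> 200 -> 100 -> 50.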

One set of 9 base anchors is placed at every position of the 37x50 feature map, so the total number of anchors is 37 x 50 x 9 = 16650.

feat_stride = 16
n, _, hh, ww = features.shape   # 1, 512, 37, 50
shift_y = np.arange(0, hh * feat_stride, feat_stride) # array([  0,  16,  32,  48,  64,  80,  96, ...])
shift_x = np.arange(0, ww * feat_stride, feat_stride)
shift_x, shift_y = np.meshgrid(shift_x, shift_y)    # expand into 2-D coordinate grids, each of shape (37, 50)
# ravel flattens an array (like flatten); stacking along axis=1 gives one (y, x, y, x) row per cell
shift = np.stack((shift_y.ravel(), shift_x.ravel(), shift_y.ravel(), shift_x.ravel()), axis=1)  # (1850, 4)
A = anchor_base.shape[0]    # 9
K = shift.shape[0]      # 1850
anchor = anchor_base.reshape((1, A, 4)) + shift.reshape((1, K, 4)).transpose((1, 0, 2)) # (1850, 9, 4)
anchor = anchor.reshape((K * A), 4).astype(np.float32)      # (16650, 4)

We now have 16650 anchors in total.
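The broadcasting step is easier to follow on a tiny grid. The sketch below builds the shifts with explicit loops instead of meshgrid and uses two made-up base boxes; `enumerate_anchors` is a hypothetical helper name, not part of the original code.

```python
import numpy as np


def enumerate_anchors(anchor_base, feat_stride, hh, ww):
    # One (y, x, y, x) shift per feature-map cell, in row-major order.
    shift = np.array([(y, x, y, x)
                      for y in range(0, hh * feat_stride, feat_stride)
                      for x in range(0, ww * feat_stride, feat_stride)],
                     dtype=np.float32)                      # (K, 4)
    # (1, A, 4) + (K, 1, 4) broadcasts to (K, A, 4): every base anchor
    # is copied to every cell.
    anchor = anchor_base[np.newaxis, :, :] + shift[:, np.newaxis, :]
    return anchor.reshape(-1, 4)                            # (K * A, 4)


# Two made-up base boxes on a 2x3 feature map keep the output readable.
base = np.array([[-8, -8, 8, 8], [-16, -16, 16, 16]], dtype=np.float32)
anchors = enumerate_anchors(base, feat_stride=16, hh=2, ww=3)
print(anchors.shape)   # (12, 4)
print(anchors[2])      # base[0] moved 16 pixels to the right (cell y=0, x=16)
```

Row `k * A + a` of the result is base anchor `a` shifted to cell `k`, which is the ordering the later reshapes of rpn_locs and rpn_scores rely on.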

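3.2.4 The RPN has two heads: location regression, and binary classification of whether an anchor is a positive (foreground) sample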

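n_anchor = anchor_base.shape[0]   # 9 anchors per feature-map position
# RPN architecture
# The feature map first passes through a 3x3 convolution
conv1 = nn.Conv2d(512, 512, 3, 1, 1)
# Classification head: two scores (background, foreground) per anchor
score = nn.Conv2d(512, n_anchor * 2, 1, 1, 0)
# Location head: four offsets per anchor
loc = nn.Conv2d(512, n_anchor * 4, 1, 1, 0)
# normal_init is a helper that is not shown here; judging from the call, it
# initializes the weights from a normal distribution with mean 0 and std 0.01
normal_init(conv1, 0, 0.01)
normal_init(score, 0, 0.01)
normal_init(loc, 0, 0.01)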

# F is torch.nn.functional
n_anchor = anchor.shape[0] // (hh * ww)     # 9
h = F.relu(conv1(features))       # torch.Size([1, 512, 37, 50])
# Location offsets
rpn_locs = loc(h)      # torch.Size([1, 36, 37, 50])
rpn_locs = rpn_locs.permute(0, 2, 3, 1).contiguous().view(n, -1, 4) # torch.Size([1, 16650, 4])
# Foreground / background scores
rpn_scores = score(h)   # torch.Size([1, 18, 37, 50])
rpn_scores = rpn_scores.permute(0, 2, 3, 1).contiguous()  # torch.Size([1, 37, 50, 18])
# Softmax turns each pair of scores into probabilities that sum to 1
rpn_softmax_scores = F.softmax(rpn_scores.view(n, hh, ww, n_anchor, 2), dim=4)  # torch.Size([1, 37, 50, 9, 2])
rpn_fg_scores = rpn_softmax_scores[:, :, :, :, 1].contiguous()  # torch.Size([1, 37, 50, 9])
rpn_fg_scores = rpn_fg_scores.view(n, -1)   # torch.Size([1, 16650])
rpn_scores = rpn_scores.view(n, -1, 2)      # torch.Size([1, 16650, 2])

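At this point, the rpn_locs produced by the loc convolution and then reshaped have shape [1, 16650, 4]: one set of four offsets for each of the 16650 anchors.

After reshaping, rpn_scores has shape [1, 16650, 2]: the two raw scores (background, foreground) for each of the 16650 anchors. The foreground probability after the softmax is stored in rpn_fg_scores.

Question: why does the second value after the softmax correspond to the foreground probability?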
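Nothing in the softmax itself makes index 1 special; it is a labelling convention. Assuming, as is common in Faster R-CNN implementations, that training labels mark positive anchors as 1 and negative anchors as 0, the cross-entropy loss pushes channel 1 up for foreground anchors, so channel 1 ends up holding the foreground probability. The sketch below uses made-up scores and a plain NumPy softmax to show what the indexing does.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


# Made-up raw scores for three anchors: column 0 = background, column 1 = foreground.
rpn_scores = np.array([[2.0, 0.0],
                       [0.0, 0.0],
                       [-1.0, 3.0]])
probs = softmax(rpn_scores, axis=1)   # each row now sums to 1
fg_prob = probs[:, 1]                 # same indexing as rpn_softmax_scores[..., 1]
print(probs)
print(fg_prob)
```

The first anchor gets a low foreground probability, the second exactly 0.5 (equal scores), and the third a high one.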

Topic to review: how permute relates to contiguous and view
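In PyTorch, permute only changes how the existing memory is indexed, and view needs a layout it can reinterpret without copying, which is why contiguous() sits between them in the RPN code. The sketch below shows the same idea with the closest NumPy analogs (transpose, ascontiguousarray, reshape); it is an analogy on tiny made-up sizes, not the PyTorch code itself.

```python
import numpy as np

n, A, hh, ww = 1, 2, 2, 3            # tiny made-up sizes; A anchors per cell
# Layout produced by the loc head: (n, A * 4, hh, ww)
locs = np.arange(n * A * 4 * hh * ww, dtype=np.float32).reshape(n, A * 4, hh, ww)

# Analog of tensor.permute(0, 2, 3, 1): channels last -> (n, hh, ww, A * 4).
moved = locs.transpose(0, 2, 3, 1)
print(moved.flags["C_CONTIGUOUS"])    # False: only the strides changed, not the memory

# Analog of .contiguous(): copy into row-major memory that matches the new shape.
packed = np.ascontiguousarray(moved)

# Analog of .view(n, -1, 4): regroup so each row holds one anchor's 4 offsets.
per_anchor = packed.reshape(n, -1, 4)
print(per_anchor.shape)               # (1, 12, 4)
```

After the regrouping, `per_anchor[0, 0]` holds channels 0..3 at cell (0, 0), i.e. the four offsets of the first anchor at the first position, and `per_anchor[0, 1]` holds channels 4..7 at the same cell.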

3.2.5 Extracting the proposals

Topic to review: the double-colon (extended slice) syntax for Python sequences
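A slice has the form `start:stop:step`; leaving `stop` empty gives the double colon. A short example on a made-up list, mirroring how `loc[:, 0::4]` picks out every fourth column:

```python
loc = [10, 11, 12, 13, 20, 21, 22, 23]   # two made-up boxes, flattened as dy, dx, dh, dw

dy = loc[0::4]      # start at index 0, take every 4th element
dx = loc[1::4]      # start at index 1, take every 4th element
rev = loc[::-1]     # a negative step walks the sequence backwards
print(dy, dx, rev[:2])   # [10, 20] [11, 21] [23, 22]
```

The same `[::-1]` form is what flips the image along its width axis and what turns an ascending argsort into a descending one.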

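The four values of loc are offsets predicted by the convolution layer. The network is trained so that applying loc to an anchor yields the box that best matches the ground truth (gt).

The anchors are the initial, unrefined candidate boxes.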

# loc2bbox(anchor, loc) => loc2bbox(src_bbox, loc)
# Here loc is the array of predicted offsets, shape (16650, 4), not the conv layer defined earlier
src_bbox = anchor.astype(anchor.dtype, copy=False)

# Height, width, and centre coordinates of each source box
src_height = src_bbox[:, 2] - src_bbox[:, 0]
src_width = src_bbox[:, 3] - src_bbox[:, 1]
src_ctr_y = src_bbox[:, 0] + 0.5 * src_height
src_ctr_x = src_bbox[:, 1] + 0.5 * src_width

# The four predicted offsets: centre shifts (dy, dx) and log-scale factors (dh, dw)
dy = loc[:, 0::4]
dx = loc[:, 1::4]
dh = loc[:, 2::4]
dw = loc[:, 3::4]
# np.newaxis inserts a new dimension so that the shapes broadcast
# Shift the centre and rescale the size to get the new box
ctr_y = dy * src_height[:, np.newaxis] + src_ctr_y[:, np.newaxis]
ctr_x = dx * src_width[:, np.newaxis] + src_ctr_x[:, np.newaxis]
h = np.exp(dh) * src_height[:, np.newaxis]
w = np.exp(dw) * src_width[:, np.newaxis]

# Convert back to corner coordinates (ymin, xmin, ymax, xmax)
dst_bbox = np.zeros(loc.shape, dtype=loc.dtype)
dst_bbox[:, 0::4] = ctr_y - 0.5 * h
dst_bbox[:, 1::4] = ctr_x - 0.5 * w
dst_bbox[:, 2::4] = ctr_y + 0.5 * h
dst_bbox[:, 3::4] = ctr_x + 0.5 * w
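Two hand-checkable cases make the decoding concrete. The function below is a compact restatement of the steps above for a loc array with exactly four columns (an assumption made to keep the sketch short); the anchor and offsets are made up.

```python
import numpy as np


def loc2bbox(src_bbox, loc):
    h = src_bbox[:, 2] - src_bbox[:, 0]
    w = src_bbox[:, 3] - src_bbox[:, 1]
    cy = src_bbox[:, 0] + 0.5 * h
    cx = src_bbox[:, 1] + 0.5 * w
    dy, dx, dh, dw = loc[:, 0], loc[:, 1], loc[:, 2], loc[:, 3]
    new_cy = dy * h + cy          # shift the centre by a fraction of the box size
    new_cx = dx * w + cx
    new_h = np.exp(dh) * h        # rescale the size by exp(offset)
    new_w = np.exp(dw) * w
    return np.stack((new_cy - 0.5 * new_h, new_cx - 0.5 * new_w,
                     new_cy + 0.5 * new_h, new_cx + 0.5 * new_w), axis=1)


anchor = np.array([[0.0, 0.0, 100.0, 200.0]])

# Zero offsets leave the anchor unchanged.
same = loc2bbox(anchor, np.array([[0.0, 0.0, 0.0, 0.0]]))

# dy = 0.1 moves the centre down by 10% of the height (50 -> 60);
# dh = log(2) doubles the height (100 -> 200); x is untouched.
moved = loc2bbox(anchor, np.array([[0.1, 0.0, np.log(2.0), 0.0]]))
print(same)    # approx. [[0, 0, 100, 200]]
print(moved)   # approx. [[-40, 0, 160, 200]]
```

The second result extends above the image (ymin = -40), which is why a clipping step is needed afterwards.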

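With the regressed proposals in hand, clip them to the image bounds and then prune the surplus: drop boxes that are too small, keep only the top-scoring ones, and apply non-maximum suppression.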

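roi = dst_bbox
# img_size is (600, 800)
# slice(0, 4, 2) selects columns 0 and 2, the y coordinates; clip them to [0, 600]
roi[:, slice(0, 4, 2)] = np.clip(roi[:, slice(0, 4, 2)], 0, img_size[0])
# slice(1, 4, 2) selects columns 1 and 3, the x coordinates; clip them to [0, 800]
roi[:, slice(1, 4, 2)] = np.clip(roi[:, slice(1, 4, 2)], 0, img_size[1])

min_size = 16 * scale
hs = roi[:, 2] - roi[:, 0]
ws = roi[:, 3] - roi[:, 1]
# Drop proposals whose height or width is smaller than min_size
keep = np.where((hs >= min_size) & (ws >= min_size))[0]
roi = roi[keep, :]
score = score[keep]   # score here is the array of foreground scores, not the conv layer
# Sort the proposals by score in descending order and keep the top n_pre_nms (12000)
order = score.ravel().argsort()[::-1]
order = order[:n_pre_nms]
roi = roi[order, :]
score = score[order]
# Non-maximum suppression with an IoU threshold of 0.7
keep = nms(torch.from_numpy(roi).cuda(), torch.from_numpy(score).cuda(), 0.7)
keep = keep[:n_post_nms]   # keep at most n_post_nms proposals
roi = roi[keep.cpu().numpy()]
# Batch index of each proposal (a single image here, so all zeros)
batch_index = 0 * np.ones((len(roi), ), dtype=np.int32)
# rois and roi_indices are lists that collect the results per image
rois.append(roi)
roi_indices.append(batch_index)
roi = np.concatenate(rois, axis=0)
roi_indices = np.concatenate(roi_indices, axis=0)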

This completes the RPN stage, which yields 2000 proposals.
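The `nms` call above is a library routine whose internals the post does not show. As a rough illustration of what greedy non-maximum suppression does, here is a plain NumPy sketch (`nms_numpy` is a name invented here; it is not the implementation used in the post): keep the highest-scoring box, discard every remaining box whose IoU with it exceeds the threshold, and repeat.

```python
import numpy as np


def nms_numpy(boxes, scores, iou_thresh):
    """Greedy NMS on (ymin, xmin, ymax, xmax) boxes; returns kept indices, best first."""
    order = scores.argsort()[::-1]
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of the current best box with every remaining box.
        tl = np.maximum(boxes[i, :2], boxes[rest, :2])
        br = np.minimum(boxes[i, 2:], boxes[rest, 2:])
        inter = np.prod(np.clip(br - tl, 0, None), axis=1)
        iou = inter / (areas[i] + areas[rest] - inter)
        # Only boxes that overlap little enough survive to the next round.
        order = rest[iou <= iou_thresh]
    return keep


boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], dtype=np.float32)
scores = np.array([0.9, 0.8, 0.7], dtype=np.float32)
print(nms_numpy(boxes, scores, 0.5))   # [0, 2]
print(nms_numpy(boxes, scores, 0.7))   # [0, 1, 2]
```

The first two boxes overlap with IoU = 81 / 119, about 0.68, so the second is suppressed at a threshold of 0.5 but survives at 0.7, the threshold used in the RPN code.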


Faster R-CNN: questions about the code

##### Loading the configuration: Config.py

What does this code mean? What does items() do?

def _state_dict(self):
    return {k: getattr(self, k) for k, _ in Config.__dict__.items() \
            if not k.startswith('_')}
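A small, self-contained version of this pattern may help. The `Config` class below is a hypothetical stand-in with three made-up options; only the `_state_dict` method follows the snippet above.

```python
class Config:
    # Hypothetical options, for illustration only.
    voc_data_dir = './'
    min_size = 600
    max_size = 1000

    def _state_dict(self):
        # Config.__dict__ maps every class-level name to its value, and
        # .items() yields (name, value) pairs. getattr(self, k) is used
        # instead of the class value so that overrides set on the
        # instance are picked up. Names starting with '_' (methods such
        # as _state_dict and Python's own __dunder__ entries) are skipped.
        return {k: getattr(self, k) for k, _ in Config.__dict__.items()
                if not k.startswith('_')}


opt = Config()
opt.min_size = 800                 # instance-level override
print(opt._state_dict())
# {'voc_data_dir': './', 'min_size': 800, 'max_size': 1000}
```

So the method returns the current value of every public option as a plain dictionary, which is convenient for printing or saving the configuration.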

Dataset: dataset.py

After img / 255 the image looks completely black. What is normalization for?

skimage.transform.resize(image, output_shape)

image: the image to resize

output_shape: the shape of the resized image

C, H, W = img.shape
scale1 = min_size / min(H, W)
scale2 = max_size / max(H, W)
scale = min(scale1, scale2)
img = img / 255.
img = sktsf.resize(img, (C, H * scale, W * scale),mode='reflect',anti_aliasing=False)

What is the purpose of normalization?

How is the normalization done, and why does the image turn completely white afterwards?
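Part of this can be checked numerically. After `img / 255` the values lie in [0, 1], so casting them straight to uint8 (as in the `np.uint8(img)` save snippet) truncates almost everything to 0, which looks black. After normalization the values are no longer in [0, 1] at all, so how they are displayed depends on how the viewer clips or wraps them; the source does not say which tool produced the white image. A minimal sketch, assuming the mean and std from section 2.2 and made-up pixel data, that undoes the normalization before viewing:

```python
import numpy as np

MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32).reshape(3, 1, 1)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32).reshape(3, 1, 1)


def normalize(img01):
    """img01: float image in [0, 1], shape (C, H, W)."""
    return (img01 - MEAN) / STD


def to_uint8(normed):
    """Undo the normalization and rescale to 0..255 for viewing or saving."""
    img01 = normed * STD + MEAN
    return np.clip(img01 * 255.0, 0, 255).round().astype(np.uint8)


# Made-up 3x4x5 "image" with pixel values 0, 4, 8, ..., 236.
raw = (np.arange(60, dtype=np.float32) * 4).reshape(3, 4, 5)
img01 = raw / 255.0
normed = normalize(img01)

print(np.uint8(img01).max())                 # 0: values in [0, 1) truncate to 0
print(normed.min() < 0, normed.max() > 1)    # True True: outside the [0, 1] range
restored = to_uint8(normed)                  # back to the original 0..255 pixels
```

Normalization itself just subtracts a per-channel mean and divides by a per-channel standard deviation, so that the inputs have roughly the statistics the pretrained VGG16 weights were trained with.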

How are the anchors generated?

What exactly does this code mean?

py = base_size / 2.
px = base_size / 2.

anchor_base = np.zeros((len(ratios) * len(anchor_scales), 4),
                       dtype=np.float32)
for i in six.moves.range(len(ratios)):
    for j in six.moves.range(len(anchor_scales)):
        h = base_size * anchor_scales[j] * np.sqrt(ratios[i])
        w = base_size * anchor_scales[j] * np.sqrt(1. / ratios[i])

        index = i * len(anchor_scales) + j
        anchor_base[index, 0] = py - h / 2.
        anchor_base[index, 1] = px - w / 2.
        anchor_base[index, 2] = py + h / 2.
        anchor_base[index, 3] = px + w / 2.

What does meshgrid do? What does .ravel() do? What does np.stack() do?
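Running the three calls on a 2x3 grid shows what each one contributes. The grid size here is made up to keep the arrays small; the calls are the same ones used in section 3.2.3.

```python
import numpy as np

shift_y = np.arange(0, 2 * 16, 16)    # [0, 16]        two rows
shift_x = np.arange(0, 3 * 16, 16)    # [0, 16, 32]    three columns

# meshgrid expands the two 1-D axes into 2-D coordinate grids of shape (2, 3):
# grid_x repeats shift_x on every row, grid_y repeats shift_y down every column.
grid_x, grid_y = np.meshgrid(shift_x, shift_y)

# ravel flattens each grid row by row into a 1-D array of length 6.
flat_y, flat_x = grid_y.ravel(), grid_x.ravel()

# stack joins the 1-D arrays as columns (axis=1): one (y, x, y, x) row per cell.
shift = np.stack((flat_y, flat_x, flat_y, flat_x), axis=1)
print(shift.shape)      # (6, 4)
print(shift.tolist())
# [[0, 0, 0, 0], [0, 16, 0, 16], [0, 32, 0, 32],
#  [16, 0, 16, 0], [16, 16, 16, 16], [16, 32, 16, 32]]
```

Each row is the offset that moves a base anchor (ymin, xmin, ymax, xmax) to one cell of the feature map.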

What does permute do? What does contiguous() do?

rpn_locs = rpn_locs.permute(0, 2, 3, 1).contiguous().view(n, -1, 4)

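What does softmax do?
What does view do?

Common Python functions

strip(): removes the given characters (whitespace by default) from both ends of a string.
astype: converts an array to another data type.
hasattr(f, 'close'): checks whether f has an attribute named close.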
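A quick check of strip and hasattr on made-up values:

```python
import io

name = "  Chair \n".lower().strip()      # default: strips surrounding whitespace
tagged = "##chair##".strip("#")          # strips the given characters from both ends

f = io.StringIO("data")                  # a file-like object
print(name, tagged, hasattr(f, "close"), hasattr(42, "close"))   # chair chair True False
```

This is the same `lower().strip()` clean-up applied to class names in the annotation loop.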

Saving an image stored as an array:

img1 = Image.fromarray(np.uint8(img))
img1.save('1.png')
