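For code implementations, see the following two articles:
RCNN代码简单实现 (a simple R-CNN code implementation)
RCNN算法(github代码复现理解)–学习记录2 (the R-CNN algorithm: understanding a GitHub reproduction, study notes 2)
Advantages:
- Much higher accuracy (mAP) than traditional algorithms
Disadvantages:
- Very long training time (84 hours)
- Slow at test time: 47 s per image with VGG16
- Complex multi-stage training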
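2. SPP-Net
SPP-Net comes from a paper published in an IEEE journal (TPAMI) in 2015. Before it, CNNs required input images of a fixed size, such as 224×224 (ImageNet), 32×32 (LeNet), or 96×96. To handle images of arbitrary size, each image therefore had to be cropped or warped first, which loses or distorts image information to some degree and limits recognition accuracy. From a physiological point of view, moreover, when the human eye sees an image the brain first treats it as a whole rather than cropping or warping it, so it is more plausible that the brain gathers shallow information first and only recognizes objects of arbitrary shape at deeper levels.
Paper: "Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition"
Two main improvements over R-CNN:
- The whole image is fed to the network once, all regions share the convolutional computation, and the features of every region are extracted from the Conv5 output
- Spatial pyramid pooling (SPP) is introduced
The SPP-Net pipeline is as follows:
- First, selective search proposes about 2000 candidate windows for the image to be detected. This step is the same as in R-CNN.
- Feature extraction. This is the biggest difference from R-CNN. The whole image is fed to the CNN for a single feature-extraction pass that produces the feature maps; the region of each candidate window is then located on the feature maps, and spatial pyramid pooling is applied to each region to produce a fixed-length feature vector. R-CNN instead feeds every candidate window through the CNN separately. Because SPP-Net extracts features from the whole image only once, it is much faster.
- The last step is again the same as in R-CNN: an SVM classifies the feature vectors.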
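
The fixed-length property of spatial pyramid pooling described above can be illustrated with a minimal pure-Python sketch. It is not the paper's implementation: it pools a single-channel map, and the pyramid levels `(1, 2, 4)` and the function name `spatial_pyramid_pool` are assumptions chosen for illustration.

```python
def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool one H x W channel into a vector of fixed length.

    feature_map: list of rows (lists of numbers), any H >= 1 and W >= 1.
    levels: assumed pyramid; level n divides the map into n x n bins.
    """
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Bin rows: floor for the start, ceil for the end, so the bins
            # cover the whole map even when H is not divisible by n.
            r0 = (i * height) // n
            r1 = ((i + 1) * height + n - 1) // n
            for j in range(n):
                c0 = (j * width) // n
                c1 = ((j + 1) * width + n - 1) // n
                pooled.append(max(feature_map[r][c]
                                  for r in range(r0, r1)
                                  for c in range(c0, c1)))
    return pooled


# Two regions of different size give vectors of the same length.
small = [[r * 6 + c for c in range(6)] for r in range(4)]      # 4 x 6
large = [[r * 13 + c for c in range(13)] for r in range(9)]    # 9 x 13
print(len(spatial_pyramid_pool(small)), len(spatial_pyramid_pool(large)))
```

With levels 1, 2 and 4 every region yields 1 + 4 + 16 = 21 values per channel, regardless of its height and width, which is what lets the fully connected layers accept candidate windows of any size.
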
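Disadvantages:
- A large number of features must be stored
- Long training time (25.5 hours)
- The convolutional layers before the SPP layer cannot be fine-tuned
- Complex multi-stage training
For a code implementation, see the following article:
SPP-Net代码实现 (SPP-Net code implementation)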
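3. Fast R-CNN
Inspired by SPPnet, Ross Girshick (rbg) published Fast R-CNN in 2015. Its design is elegant and its pipeline more compact, and it greatly speeds up object detection. On the same largest network (VGG16), Fast R-CNN cuts training time from 84 hours to 9.5 hours and test time from 47 seconds to 0.32 seconds per image compared with R-CNN. Accuracy on PASCAL VOC 2007 is nearly the same, at about 66%-67%.
Paper: Fast R-CNN
Improvements over R-CNN and SPP-Net:
- Faster training and testing
- Higher mAP
- End-to-end, single-stage training
- The parameters of all layers can be fine-tuned
- No feature files need to be stored offline
Two new techniques are introduced on top of SPP-Net:
- Region of interest (RoI) pooling
- A multi-task loss function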
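
For reference, the multi-task loss named above is defined per RoI in the Fast R-CNN paper as follows. Here p is the predicted class distribution, u the ground-truth class (0 is background), t^u the predicted box offsets for class u, v the regression target, and λ a balancing weight; the bracket [u ≥ 1] is 1 for foreground RoIs and 0 for background ones, so background RoIs contribute no localization loss.

```latex
\[
\begin{aligned}
L(p, u, t^{u}, v) &= L_{\mathrm{cls}}(p, u) + \lambda \, [u \ge 1] \, L_{\mathrm{loc}}(t^{u}, v) \\
L_{\mathrm{cls}}(p, u) &= -\log p_{u} \\
L_{\mathrm{loc}}(t^{u}, v) &= \sum_{i \in \{x, y, w, h\}} \mathrm{smooth}_{L_1}\!\left(t^{u}_{i} - v_{i}\right) \\
\mathrm{smooth}_{L_1}(x) &=
\begin{cases}
0.5\,x^{2} & \text{if } |x| < 1 \\
|x| - 0.5 & \text{otherwise}
\end{cases}
\end{aligned}
\]
```
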
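The Fast R-CNN pipeline is as follows:
- Input the image.
- Extract features from the image with the convolutional layers of a deep network (those of VGG, AlexNet, ResNet, and so on) to obtain the feature map of the image.
- Run selective search to obtain the regions of interest (RoIs) of the image (usually 2000).
- Apply RoI pooling to the RoIs: project each RoI's coordinates from the input image onto the feature map to find the corresponding feature region, then max-pool that region. This yields the features of the RoI and brings them to a uniform size.
- Take the output of the RoI pooling layer (that is, the max-pooled features of the region corresponding to each RoI) as the feature vector of that RoI.
- Feed the RoI feature vectors into fully connected layers and define a multi-task loss; the layers branch into a softmax classifier and a bounding-box regressor, which output the class and the bounding-box coordinates of the current RoI.
- Apply non-maximum suppression (NMS) to all resulting boxes to obtain the final detections.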
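
Two steps of the pipeline above, RoI pooling and NMS, can be sketched in pure Python as follows. This is a simplified illustration rather than the paper's code: it works on a single-channel feature map, boxes are assumed to be `(y1, x1, y2, x2)`, and the function names, the stride of 16 and the 2×2 output size are assumptions.

```python
def roi_max_pool(feature_map, roi, stride=16, out_size=2):
    """Project an image-space RoI onto the feature map and max-pool it
    into an out_size x out_size grid."""
    fh, fw = len(feature_map), len(feature_map[0])
    # Coordinate projection: divide by the backbone stride, clamp to the map.
    y1 = min(max(int(roi[0] // stride), 0), fh - 1)
    x1 = min(max(int(roi[1] // stride), 0), fw - 1)
    y2 = min(max(int(roi[2] // stride), y1), fh - 1)
    x2 = min(max(int(roi[3] // stride), x1), fw - 1)
    h, w = y2 - y1 + 1, x2 - x1 + 1
    pooled = []
    for i in range(out_size):
        r0 = y1 + (i * h) // out_size
        r1 = y1 + ((i + 1) * h + out_size - 1) // out_size
        row = []
        for j in range(out_size):
            c0 = x1 + (j * w) // out_size
            c1 = x1 + ((j + 1) * w + out_size - 1) // out_size
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


def iou(a, b):
    """Intersection over union of two (y1, x1, y2, x2) boxes."""
    ih = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iw = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ih * iw
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.3):
    """Greedy NMS: visit boxes by descending score and keep a box only if
    it does not overlap an already kept box by more than the threshold."""
    order = sorted(range(len(boxes)), key=lambda k: scores[k], reverse=True)
    keep = []
    for k in order:
        if all(iou(boxes[k], boxes[j]) <= iou_threshold for j in keep):
            keep.append(k)
    return keep


feature_map = [[r * 4 + c for c in range(4)] for r in range(4)]
print(roi_max_pool(feature_map, (0, 0, 63, 63)))
print(nms([(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)], [0.9, 0.8, 0.7]))
```

Whatever the size of the RoI, `roi_max_pool` returns a grid of the same shape, which is the "uniform feature size" mentioned in the RoI pooling step.
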
Fast R-CNN performance improvement:
For a code implementation, see the following article:
fast rcnn 代码解析 (Fast R-CNN code walkthrough)
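4. Faster R-CNN
Building on R-CNN and Fast R-CNN, Ross B. Girshick and his co-authors Shaoqing Ren, Kaiming He, and Jian Sun proposed Faster R-CNN in 2015. Structurally, Faster R-CNN integrates feature extraction, proposal extraction, bounding-box regression (rect refine), and classification into a single network, which improves overall performance considerably, most visibly in detection speed.
Paper: Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks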
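Improvements:
- Integrates a Region Proposal Network (RPN)
- Faster R-CNN = Fast R-CNN + RPN
- Replaces the offline Selective Search module
- Shares even more of the convolutional computation
- Based on an attention mechanism
- Fewer but better region proposals (about 300)
Faster R-CNN can be divided into four main parts:
- Conv layers. As a CNN-based object detection method, Faster R-CNN first uses a stack of basic conv + relu + pooling layers to extract the feature maps of the image. These feature maps are shared by the subsequent RPN and the fully connected layers.
- Region Proposal Network. The RPN generates region proposals. It uses softmax to decide whether each anchor is positive or negative, then uses bounding-box regression to refine the anchors into accurate proposals.
- RoI pooling. This layer takes the feature maps and the proposals as input, combines them to extract the proposal feature maps, and passes those to the subsequent fully connected layers to determine the object class.
- Classification. The proposal feature maps are used to compute the class of each proposal, and bounding-box regression is applied once more to obtain the final, accurate position of each detection box.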
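
Both the RPN and the classification stage refine boxes through bounding-box regression. A minimal sketch of how predicted offsets are applied to a box is given below, assuming the common R-CNN parameterization with offsets `(dy, dx, dh, dw)` and boxes `(y1, x1, y2, x2)`. The name `decode_box` is hypothetical; the `loc2bbox` helper imported in the code further down is not shown in this article and may differ in detail.

```python
import math


def decode_box(anchor, loc):
    """Apply regression offsets (dy, dx, dh, dw) to a box (y1, x1, y2, x2)."""
    h = anchor[2] - anchor[0]
    w = anchor[3] - anchor[1]
    cy = anchor[0] + 0.5 * h
    cx = anchor[1] + 0.5 * w
    dy, dx, dh, dw = loc
    # The centre moves by a fraction of the box size; the size is scaled
    # exponentially, so any real-valued offset gives a positive size.
    new_cy = cy + dy * h
    new_cx = cx + dx * w
    new_h = h * math.exp(dh)
    new_w = w * math.exp(dw)
    return (new_cy - 0.5 * new_h, new_cx - 0.5 * new_w,
            new_cy + 0.5 * new_h, new_cx + 0.5 * new_w)


# Zero offsets leave the box unchanged; dy = 0.5 shifts it down by half its height.
print(decode_box((0, 0, 16, 16), (0, 0, 0, 0)))
print(decode_box((0, 0, 16, 16), (0.5, 0, 0, 0)))
```
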
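For the overall architecture of the algorithm, see: Faster RCNN 实现思路详解 (a detailed explanation of the Faster R-CNN implementation approach)
Faster R-CNN performance improvement: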
Partial code implementation:

FasterRCNN.py:

    import tensorflow as tf
    import numpy as np

    from model.rpn import RegionProposalNetwork, Extractor
    from model.roi import RoIHead
    from utils.anchor import loc2bbox, AnchorTargetCreator, ProposalTargetCreator


    def _smooth_l1_loss(pred_loc, gt_loc, in_weight, sigma):
        # Smooth L1: quadratic where |diff| < 1 / sigma^2, linear elsewhere.
        sigma2 = tf.constant(sigma ** 2, dtype=tf.float32)
        diff = tf.cast(in_weight, tf.float32) * (pred_loc - gt_loc)
        abs_diff = tf.math.abs(diff)
        flag = tf.cast(abs_diff < (1. / sigma2), dtype=tf.float32)
        y = flag * (sigma2 / 2.) * (diff ** 2) + (1 - flag) * (abs_diff - 0.5 / sigma2)
        return tf.reduce_sum(y)


    def _fast_rcnn_loc_loss(pred_loc, gt_loc, gt_label, sigma):
        """
        :param pred_loc: 1,38,50,36 (reshaped to 17100,4 before this call)
        :param gt_loc: 17100,4
        :param gt_label: 17100
        """
        # Only positive samples (label > 0) contribute to the localization loss.
        idx = gt_label > 0
        idx = tf.stack([idx, idx, idx, idx], axis=1)
        idx = tf.reshape(idx, [-1, 4])
        in_weight = tf.cast(idx, dtype=tf.float32)
        loc_loss = _smooth_l1_loss(pred_loc, gt_loc, in_weight, sigma)
        # Normalize by the total number of negative and positive rois;
        # gt_label == -1 is ignored for rpn_loss.
        loc_loss /= tf.reduce_sum(tf.cast(gt_label >= 0, dtype=tf.float32))
        return loc_loss


    class FasterRCNN(tf.keras.Model):
        def __init__(self, n_class, pool_size):
            super(FasterRCNN, self).__init__()
            self.n_class = n_class
            self.extractor = Extractor()
            self.rpn = RegionProposalNetwork()
            self.head = RoIHead(n_class, pool_size)
            self.score_thresh = 0.7
            self.nms_thresh = 0.3

        def __call__(self, x, scale=1.):
            img_size = x.shape[1:3]
            # Call the extractor and the RPN with the same signatures that
            # FasterRCNNTrainer uses below; scale=1. is an assumed default.
            feature_map = self.extractor(x)
            rpn_locs, rpn_scores, rois, anchor = self.rpn(feature_map, img_size, scale)
            roi_cls_locs, roi_scores = self.head(feature_map, rois, img_size)
            return roi_cls_locs, roi_scores, rois

        def predict(self, imgs):
            bboxes = []
            labels = []
            scores = []
            img_size = imgs.shape[1:3]
            # Shapes: (2000, 84), (2000, 21), (2000, 4)
            roi_cls_loc, roi_score, rois = self(imgs)
            prob = tf.nn.softmax(roi_score, axis=-1).numpy()
            roi_cls_loc = roi_cls_loc.numpy().reshape(-1, self.n_class, 4)  # 2000, 21, 4

            for label_index in range(1, self.n_class):  # class 0 is background
                cls_bbox = loc2bbox(rois, roi_cls_loc[:, label_index, :])
                # Clip the boxes to the image: even columns are y, odd columns are x.
                cls_bbox[:, 0::2] = np.clip(cls_bbox[:, 0::2], 0, img_size[0])
                cls_bbox[:, 1::2] = np.clip(cls_bbox[:, 1::2], 0, img_size[1])
                cls_prob = prob[:, label_index]

                # Drop low-confidence boxes before NMS.
                mask = cls_prob > 0.05
                cls_bbox = cls_bbox[mask]
                cls_prob = cls_prob[mask]
                # max_output_size must be non-negative, so allow every candidate.
                keep = tf.image.non_max_suppression(
                    cls_bbox, cls_prob, max_output_size=len(cls_bbox),
                    iou_threshold=self.nms_thresh).numpy()

                if len(keep) > 0:
                    bboxes.append(cls_bbox[keep])
                    # The labels are in [0, self.n_class - 2].
                    labels.append((label_index - 1) * np.ones((len(keep),)))
                    scores.append(cls_prob[keep])

            if len(bboxes) > 0:
                bboxes = np.concatenate(bboxes, axis=0).astype(np.float32)
                labels = np.concatenate(labels, axis=0).astype(np.float32)
                scores = np.concatenate(scores, axis=0).astype(np.float32)

            return bboxes, labels, scores


    class FasterRCNNTrainer(tf.keras.Model):
        def __init__(self, faster_rcnn):
            super(FasterRCNNTrainer, self).__init__()
            self.faster_rcnn = faster_rcnn
            self.rpn_sigma = 3.0
            self.roi_sigma = 1.0
            # Target creators build gt_bbox, gt_label, etc. as training targets.
            self.anchor_target_creator = AnchorTargetCreator()
            self.proposal_target_creator = ProposalTargetCreator()

        def __call__(self, imgs, bbox, label, scale, training=None):
            _, H, W, _ = imgs.shape
            img_size = (H, W)

            features = self.faster_rcnn.extractor(imgs, training=training)
            rpn_locs, rpn_scores, roi, anchor = self.faster_rcnn.rpn(features, img_size, scale, training=training)
            # The batch size is 1, so take the first (only) element.
            rpn_score = rpn_scores[0]
            rpn_loc = rpn_locs[0]

            sample_roi, gt_roi_loc, gt_roi_label = self.proposal_target_creator(roi, bbox.numpy(), label.numpy())
            roi_cls_loc, roi_score = self.faster_rcnn.head(features, sample_roi, img_size, training=training)

            # RPN losses
            gt_rpn_loc, gt_rpn_label = self.anchor_target_creator(bbox.numpy(), anchor, img_size)
            gt_rpn_label = tf.constant(gt_rpn_label, dtype=tf.int32)
            gt_rpn_loc = tf.constant(gt_rpn_loc, dtype=tf.float32)
            rpn_loc_loss = _fast_rcnn_loc_loss(rpn_loc, gt_rpn_loc, gt_rpn_label, self.rpn_sigma)
            idx_ = gt_rpn_label != -1  # ignore anchors labelled -1
            rpn_cls_loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)(gt_rpn_label[idx_], rpn_score[idx_])

            # ROI losses
            n_sample = roi_cls_loc.shape[0]
            roi_cls_loc = tf.reshape(roi_cls_loc, [n_sample, -1, 4])
            gt_roi_label = tf.constant(gt_roi_label, dtype=tf.int32)
            gt_roi_loc = tf.constant(gt_roi_loc, dtype=tf.float32)
            # For each sampled RoI, pick the offsets predicted for its ground-truth class.
            idx_ = tf.stack([tf.range(n_sample), gt_roi_label], axis=1)
            roi_loc = tf.gather_nd(roi_cls_loc, idx_)
            roi_loc_loss = _fast_rcnn_loc_loss(roi_loc, gt_roi_loc, gt_roi_label, self.roi_sigma)
            idx_ = gt_roi_label != 0
            # NOTE: from_logits=False treats roi_score as probabilities,
            # although predict() applies softmax to it.
            roi_cls_loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)(gt_roi_label[idx_], roi_score[idx_])

            return rpn_loc_loss, rpn_cls_loss, roi_loc_loss, roi_cls_loss

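
The piecewise form used by `_smooth_l1_loss` above can be checked numerically with a small pure-Python sketch. It assumes the same definition as that function: quadratic for |d| < 1/σ², linear otherwise. The name `smooth_l1` is hypothetical, and the sample values are for illustration only.

```python
def smooth_l1(diff, sigma=1.0):
    """Sum of the smooth L1 loss over a list of residuals."""
    sigma2 = sigma ** 2
    total = 0.0
    for d in diff:
        a = abs(d)
        if a < 1.0 / sigma2:
            total += 0.5 * sigma2 * d * d   # quadratic near zero
        else:
            total += a - 0.5 / sigma2       # linear for large errors
    return total


# sigma = 3 (the RPN setting above) moves the switch point from 1 to 1/9.
for sigma in (1.0, 3.0):
    print(sigma, smooth_l1([0.1], sigma), smooth_l1([1.0], sigma))
```

Both branches equal 0.5/σ² at |d| = 1/σ², so the loss is continuous; a larger σ narrows the quadratic region and makes the loss behave like L1 for all but very small residuals.
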
RPN network:

    import tensorflow as tf
    import numpy as np

    from utils.anchor import generate_anchor_base, ProposalCreator, _enumerate_shifted_anchor


    class Extractor(tf.keras.Model):
        def __init__(self):
            super(Extractor, self).__init__()
            # conv1
            self.conv1_1 = tf.keras.layers.Conv2D(32, 3, activation='relu', padding='same')
            self.conv1_2 = tf.keras.layers.Conv2D(32, 3, activation='relu', padding='same')
            self.pool1 = tf.keras.layers.MaxPooling2D(2, strides=2, padding='same')
            # conv2
            self.conv2_1 = tf.keras.layers.Conv2D(64, 3, activation='relu', padding='same')
            self.conv2_2 = tf.keras.layers.Conv2D(64, 3, activation='relu', padding='same')
            self.pool2 = tf.keras.layers.MaxPooling2D(2, strides=2, padding='same')
            # conv3
            self.conv3_1 = tf.keras.layers.Conv2D(128, 3, activation='relu', padding='same')
            self.conv3_2 = tf.keras.layers.Conv2D(128, 3, activation='relu', padding='same')
            self.conv3_3 = tf.keras.layers.Conv2D(128, 3, activation='relu', padding='same')
            self.pool3 = tf.keras.layers.MaxPooling2D(2, strides=2, padding='same')
            # conv4
            self.conv4_1 = tf.keras.layers.Conv2D(256, 3, activation='relu', padding='same')
            self.conv4_2 = tf.keras.layers.Conv2D(256, 3, activation='relu', padding='same')
            self.conv4_3 = tf.keras.layers.Conv2D(256, 3, activation='relu', padding='same')
            self.pool4 = tf.keras.layers.MaxPooling2D(2, strides=2, padding='same')
            # conv5
            self.conv5_1 = tf.keras.layers.Conv2D(512, 3, activation='relu', padding='same')
            self.conv5_2 = tf.keras.layers.Conv2D(512, 3, activation='relu', padding='same')
            self.conv5_3 = tf.keras.layers.Conv2D(512, 3, activation='relu', padding='same')

        def __call__(self, imgs, training=None):
            # Four 2x2 poolings give a total stride of 16.
            h = self.pool1(self.conv1_2(self.conv1_1(imgs)))
            h = self.pool2(self.conv2_2(self.conv2_1(h)))
            h = self.pool3(self.conv3_3(self.conv3_2(self.conv3_1(h))))
            h = self.pool4(self.conv4_3(self.conv4_2(self.conv4_1(h))))
            h = self.conv5_3(self.conv5_2(self.conv5_1(h)))
            return h


    class RegionProposalNetwork(tf.keras.Model):
        def __init__(self, ratios=[0.5, 1, 2], anchor_scales=[8, 16, 32]):
            super(RegionProposalNetwork, self).__init__()
            # region_proposal_conv
            self.region_proposal_conv = tf.keras.layers.Conv2D(512, kernel_size=3, activation=tf.nn.relu, padding='same')
            # Bounding-box regression layer: 9 anchors x 4 offsets
            self.loc = tf.keras.layers.Conv2D(36, kernel_size=1, padding='same')
            # Output scores layer: 9 anchors x 2 classes (background / foreground)
            self.score = tf.keras.layers.Conv2D(18, kernel_size=1, padding='same')
            self.anchor = generate_anchor_base(anchor_scales=anchor_scales, ratios=ratios)
            self.proposal_layer = ProposalCreator()

        def __call__(self, x, img_size, scale, training=None):
            n, hh, ww, _ = x.shape
            # Place the base anchors at every feature-map cell (stride 16).
            anchor = _enumerate_shifted_anchor(np.array(self.anchor), 16, hh, ww)
            n_anchor = anchor.shape[0] // (hh * ww)
            h = self.region_proposal_conv(x)
            rpn_loc = self.loc(h)  # [1, 38, 50, 36]
            rpn_loc = tf.reshape(rpn_loc, [n, -1, 4])
            rpn_score = self.score(h)  # [1, 38, 50, 18]
            # [1, 38, 50, 9, 2]
            rpn_softmax_score = tf.nn.softmax(tf.reshape(rpn_score, [n, hh, ww, n_anchor, 2]), axis=-1)
            rpn_fg_score = rpn_softmax_score[:, :, :, :, 1]
            rpn_fg_score = tf.reshape(rpn_fg_score, [n, -1])
            rpn_score = tf.reshape(rpn_score, [n, -1, 2])
            # Batch size is 1: build the proposals from the first image only.
            roi = self.proposal_layer(rpn_loc[0].numpy(), rpn_fg_score[0].numpy(), anchor, img_size, scale)
            return rpn_loc, rpn_score, roi, anchor

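
The shapes in the comments above (38 × 50 cells, 9 anchors per cell, 17100 anchors in total) follow from how anchors are enumerated. The sketch below reproduces that arithmetic in pure Python. The helpers `generate_anchor_base` and `_enumerate_shifted_anchor` from `utils.anchor` are not shown in this article, so `make_anchor_base` and `shift_anchors` are hypothetical stand-ins that assume the common convention where a ratio is height divided by width.

```python
def make_anchor_base(base_size=16, ratios=(0.5, 1, 2), scales=(8, 16, 32)):
    """Return len(ratios) * len(scales) anchors (y1, x1, y2, x2),
    all centred on the first feature-map cell."""
    c = base_size / 2.0
    anchors = []
    for r in ratios:
        for s in scales:
            # The area stays (base_size * s)^2 while h / w equals the ratio.
            h = base_size * s * r ** 0.5
            w = base_size * s * (1.0 / r) ** 0.5
            anchors.append((c - h / 2, c - w / 2, c + h / 2, c + w / 2))
    return anchors


def shift_anchors(anchor_base, feat_stride, height, width):
    """Copy the base anchors to every cell of a height x width feature map."""
    anchors = []
    for y in range(height):
        for x in range(width):
            sy, sx = y * feat_stride, x * feat_stride
            for (y1, x1, y2, x2) in anchor_base:
                anchors.append((y1 + sy, x1 + sx, y2 + sy, x2 + sx))
    return anchors


base = make_anchor_base()
all_anchors = shift_anchors(base, 16, 38, 50)
print(len(base), len(all_anchors))
```

Nine base anchors on a 38 × 50 feature map give 38 × 50 × 9 = 17100 anchors, which matches the 36 (= 9 × 4) and 18 (= 9 × 2) output channels of the `loc` and `score` layers.
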
ROI.py:

    import tensorflow as tf


    def roi_pooling(feature, rois, img_size, pool_size):
        """
        Implement roi_align with tf.image.crop_and_resize.
        :param feature: feature map [1, hh, ww, c]
        :param rois: rois in original-image coordinates
        :param img_size: size of the original image
        :param pool_size: output size after align
        """
        # Index of the image in the batch that each box belongs to. The
        # batch size is 1, so every value in box_ind is 0.
        box_ind = tf.zeros(rois.shape[0], dtype=tf.int32)
        # ROI box coordinates. Must be normalized and ordered to [y1, x1, y2, x2].
        # Image dimensions used to normalize the box coordinates.
        normalization = tf.cast(tf.stack([img_size[0], img_size[1], img_size[0], img_size[1]], axis=0), dtype=tf.float32)
        # Normalized box coordinates lie in the range 0 to 1 of the original image.
        boxes = rois / normalization
        # Apply ROI pooling; the box coordinates are normalized because the TF API requires it.
