Here, "the C++ version of the code" refers to: https://github.com/galian123/cpp_faster_rcnn_detect .
"The demo.py code in py-faster-rcnn" refers to https://github.com/rbgirshick/py-faster-rcnn/blob/master/tools/demo.py together with
some of the code under the https://github.com/rbgirshick/py-faster-rcnn/tree/master/lib directory.
All .py files mentioned here come from https://github.com/rbgirshick/py-faster-rcnn/.
★ Read the image and run detection
♦ Python code
demo.py
import os
import cv2
from fast_rcnn.test import im_detect
im_file = os.path.join(imagedir, image_name)
im = cv2.imread(im_file)
scores, boxes = im_detect(net, im)
im_detect is defined in py-faster-rcnn/lib/fast_rcnn/test.py.
♦ C++ code
cv::Mat im = cv::imread(fullname);
vector<vector<int> > ans;  // matches the return type of Detector::Detect
ans = det.Detect(im);
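The detection function is encapsulated, so to compare the two versions we have to look at what happens inside it.
★ Subtract the mean from the color values
This is the first internal step of detect; the following subsections all describe what detect does.
♦ Python code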
py-faster-rcnn/lib/fast_rcnn/test.py
def im_detect(net, im, boxes=None):
    blobs, im_scales = _get_blobs(im, boxes)
def _get_blobs(im, rois):
    blobs['data'], im_scale_factors = _get_image_blob(im)
def _get_image_blob(im):
    # im is a 3-D array in BGR order: dimension 0 is the height, dimension 1 the width,
    # and dimension 2 has size 3, holding the Blue, Green and Red values.
    # copy=True copies the original pixel values into a new array, im_orig, converted to float32,
    # so the mean subtraction below does not modify im.
    im_orig = im.astype(np.float32, copy=True)
    # cfg.PIXEL_MEANS is defined in py-faster-rcnn/lib/fast_rcnn/config.py:
    # __C = edict()
    # cfg = __C
    # __C.PIXEL_MEANS = np.array([[[102.9801, 115.9465, 122.7717]]])
    # For every pixel, subtract 102.9801 from blue, 115.9465 from green and 122.7717 from red
    # (the 1x1x3 array is broadcast over the H x W x 3 image).
    im_orig -= cfg.PIXEL_MEANS
♦ C++ code
vector<vector<int> > Detector::Detect(cv::Mat & cv_img)
{
    // Subtract the per-channel means (B, G, R) and store the result in a new float image
    cv::Mat cv_new(cv_img.rows, cv_img.cols, CV_32FC3, cv::Scalar(0, 0, 0));
    for (int h = 0; h < cv_img.rows; ++h) {
        for (int w = 0; w < cv_img.cols; ++w) {
            // cv::Point takes (x, y) = (column, row)
            cv_new.at<cv::Vec3f>(cv::Point(w, h))[0] = float(cv_img.at<cv::Vec3b>(cv::Point(w, h))[0]) - float(102.9801);
            cv_new.at<cv::Vec3f>(cv::Point(w, h))[1] = float(cv_img.at<cv::Vec3b>(cv::Point(w, h))[1]) - float(115.9465);
            cv_new.at<cv::Vec3f>(cv::Point(w, h))[2] = float(cv_img.at<cv::Vec3b>(cv::Point(w, h))[2]) - float(122.7717);
        }
    }
}
Mat constructor: https://docs.opencv.org/2.4/modules/core/doc/basic_structures.html#mat-mat
C++: Mat::Mat(int rows, int cols, int type, const Scalar& s)
Parameters:
* rows – Number of rows in a 2D array.
* cols – Number of columns in a 2D array.
* type – Array type. Use CV_8UC1, …, CV_64FC4 to create 1-4 channel matrices, or CV_8UC(n), …, CV_64FC(n) to create multi-channel (up to CV_CN_MAX channels) matrices.
* s – An optional value to initialize each matrix element with. To set all the matrix elements to the particular value after the construction, use the assignment operator Mat::operator=(const Scalar& value) .
Meaning of CV_32FC3: https://docs.opencv.org/2.4/modules/core/doc/basic_structures.html?highlight=cv_32fc3#datatype
Any primitive type from the list can be defined by an identifier in the form CV_<bit-depth>{U|S|F}C(<number_of_channels>), for example: uchar ~ CV_8UC1, 3-element floating-point tuple ~ CV_32FC3, and so on.
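To make the naming scheme concrete, here is a small Python sketch of how OpenCV packs a bit depth and a channel count into one integer type code. It mirrors the C macro CV_MAKETYPE(depth, cn) = depth + ((cn - 1) << CV_CN_SHIFT) with CV_CN_SHIFT = 3, and the depth codes CV_8U = 0 through CV_64F = 6 from OpenCV's core headers. The Python names (DEPTHS, make_type, split_type) are illustrative only, not an OpenCV API.

```python
# Depth codes as defined in OpenCV's core type headers (CV_8U = 0 ... CV_64F = 6).
DEPTHS = {"8U": 0, "8S": 1, "16U": 2, "16S": 3, "32S": 4, "32F": 5, "64F": 6}
CN_SHIFT = 3  # CV_CN_SHIFT: the channel count sits above the low 3 depth bits


def make_type(depth_name, channels):
    """Python re-implementation of the CV_MAKETYPE(depth, cn) macro."""
    return DEPTHS[depth_name] + ((channels - 1) << CN_SHIFT)


def split_type(type_code):
    """Recover (depth_code, channels) from a packed type code."""
    depth = type_code & ((1 << CN_SHIFT) - 1)
    channels = (type_code >> CN_SHIFT) + 1
    return depth, channels


if __name__ == "__main__":
    print(make_type("8U", 3))   # 16: CV_8UC3, the Vec3b image returned by cv::imread
    print(make_type("32F", 3))  # 21: CV_32FC3, the float image cv_new
    print(split_type(21))       # (5, 3): depth CV_32F, 3 channels
```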
- Addendum: in Caffe's examples, cpp_classification/classification.cpp performs the mean subtraction with cv::subtract():
cv::Mat sample_normalized;
cv::subtract(sample_float, mean_, sample_normalized);
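As a library-free illustration, the sketch below applies the same per-channel subtraction to a tiny BGR image stored as nested lists (height x width x 3). It does pixel by pixel what the C++ double loop does, and what `im_orig -= cfg.PIXEL_MEANS` achieves through NumPy broadcasting: each pixel's B, G and R values lose 102.9801, 115.9465 and 122.7717 respectively. Like `astype(np.float32, copy=True)`, it writes into a new float array and leaves the input untouched. The function name `subtract_means` is hypothetical.

```python
# Per-channel means in B, G, R order, copied from cfg.PIXEL_MEANS.
PIXEL_MEANS = (102.9801, 115.9465, 122.7717)


def subtract_means(img_bgr, means=PIXEL_MEANS):
    """Return a new float image (H x W x 3 nested lists) with the means removed.

    img_bgr holds integer BGR values (like a CV_8UC3 image); it is not modified.
    """
    out = []
    for row in img_bgr:              # outer loop over rows, like `for (int h ...)`
        new_row = []
        for pixel in row:            # inner loop over columns, like `for (int w ...)`
            new_row.append([float(v) - m for v, m in zip(pixel, means)])
        out.append(new_row)
    return out


if __name__ == "__main__":
    img = [[[200, 100, 50], [102, 116, 123]]]  # 1 row, 2 pixels
    res = subtract_means(img)
    print(res[0][0])  # approximately [97.0199, -15.9465, -72.7717]
    print(img[0][0])  # [200, 100, 50]: the input is unchanged
```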
★ Compute the larger and the smaller of the image's width and height
♦ Python code
def im_detect(net, im, boxes=None):
    blobs, im_scales = _get_blobs(im, boxes)
def _get_blobs(im, rois):
    blobs['data'], im_scale_factors = _get_image_blob(im)
def _get_image_blob(im):
    # im_shape is the shape of im_orig, a tuple: (height, width, 3)
    im_shape = im_orig.shape
    # The smaller of height and width goes into im_size_min, the larger into im_size_max
    im_size_min = np.min(im_shape[0:2])
    im_size_max = np.max(im_shape[0:2])
♦ C++ code
src/util/faster_rcnn.cpp
vector<vector<int> > Detector::Detect(cv::Mat & cv_img)
{
    int max_side = max(cv_img.rows, cv_img.cols);
    int min_side = min(cv_img.rows, cv_img.cols);
}
include/faster_rcnn.hpp
#define max(a, b) (((a)>(b)) ? (a):(b))
#define min(a, b) (((a)<(b)) ? (a):(b))
★ Resize the image
♦ Python code
def im_detect(net, im, boxes=None):
    blobs, im_scales = _get_blobs(im, boxes)
def _get_blobs(im, rois):
    blobs['data'], im_scale_factors = _get_image_blob(im)
def _get_image_blob(im):
    # Holds the resized image data
    processed_ims = []
    # Holds the scale factors applied to the image, one per target scale; only one
    # scale is currently used, so the list ends up with a single value
    im_scale_factors = []
    # cfg.TEST.SCALES is defined in py-faster-rcnn/lib/fast_rcnn/config.py:
    # __C.TEST.SCALES = (600,)
    # cfg.TEST.MAX_SIZE is 1000 (__C.TEST.MAX_SIZE = 1000)
    for target_size in cfg.TEST.SCALES:  # cfg.TEST.SCALES is a tuple with a single element
        im_scale = float(target_size) / float(im_size_min)  # try to scale the shorter side to 600
        # If scaling the longer side by the same factor would exceed 1000,
        # use the factor that scales the longer side (im_size_max) to 1000 instead
        if np.round(im_scale * im_size_max) > cfg.TEST.MAX_SIZE:
            im_scale = float(cfg.TEST.MAX_SIZE) / float(im_size_max)
        # The resulting factor keeps the resized image within 1000*600 (or 600*1000).
        # Example 1: width 800, height 300. Scaling the height to 600 needs 600/300=2, which would
        # push the width past 1000, so the factor is recomputed as 1000/800=1.25;
        # with the width at 1000, the height becomes 375.
        # Example 2: width 500, height 800. The factor is 600/500=1.2, and the height becomes 960.
        # Example 3: width 1000, height 800. The shrink factor is 600/800=0.75; the height
        # becomes 600 and the width 750.
        im = cv2.resize(im_orig, None, None, fx=im_scale, fy=im_scale,
                        interpolation=cv2.INTER_LINEAR)
        im_scale_factors.append(im_scale)  # save the scale factor
        processed_ims.append(im)  # save the resized image data
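The scale rule and the three worked examples in the comments can be checked with a plain-Python sketch. `compute_scale` and `resized_shape` are hypothetical helpers that take height and width and use the 600 / 1000 values from cfg.TEST.SCALES and cfg.TEST.MAX_SIZE. The output size is computed with round(), following the cv2.resize documentation for the case where dsize is None and fx/fy are given (dsize = Size(round(fx*src.cols), round(fy*src.rows))).

```python
def compute_scale(height, width, scale_size=600, max_size=1000):
    """Scale factor chosen by _get_image_blob for one target size (sketch)."""
    size_min = min(height, width)
    size_max = max(height, width)
    # Try to bring the shorter side to scale_size ...
    scale = float(scale_size) / float(size_min)
    # ... unless that would push the longer side past max_size.
    if round(scale * size_max) > max_size:
        scale = float(max_size) / float(size_max)
    return scale


def resized_shape(height, width, scale):
    """(height, width) after cv2.resize(..., fx=scale, fy=scale), per the docs."""
    return round(height * scale), round(width * scale)


if __name__ == "__main__":
    # (height, width) pairs from examples 1-3 in the comments above
    for h, w in [(300, 800), (800, 500), (800, 1000)]:
        s = compute_scale(h, w)
        print((h, w), "->", s, resized_shape(h, w, s))
    # (300, 800) -> 1.25 (375, 1000)
    # (800, 500) -> 1.2 (960, 600)
    # (800, 1000) -> 0.75 (600, 750)
```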
- Notes on resize
https://docs.opencv.org/2.4/modules/imgproc/doc/geometric_transformations.html?highlight=resize#cv2.resize
Python: cv2.resize(src, dsize[, dst[, fx[, fy[, interpolation]]]]) → dst
Parameters:
* src – input image.
* dst – output image; it has the size dsize (when it is non-zero) or the size computed from src.size(), fx, and fy; the type of dst is the same as of src.
* dsize – output image size; if it equals zero, it is computed as:
dsize=Size(round(fx∗src.cols),round(fy∗src.rows))
Either dsize or both fx and fy must be non-zero.
* fx – scale factor along the horizontal axis; when it equals 0, it is computed as
(double)dsize.width/src.cols
* fy – scale factor along the vertical axis; when it equals 0, it is computed as
(double)dsize.height/src.rows
* interpolation – interpolation method:
* INTER_NEAREST - a nearest-neighbor interpolation
* INTER_LINEAR - a bilinear interpolation (used by default)
* INTER_AREA - resampling using pixel area relation. It may be a preferred method for image decimation, as it gives moire’-free results. But when the image is zoomed, it is similar to the INTER_NEAREST method.
* INTER_CUBIC - a bicubic interpolation over 4x4 pixel neighborhood
* INTER_LANCZOS4 - a Lanczos interpolation over 8x8 pixel neighborhood
To shrink an image, it will generally look best with CV_INTER_AREA interpolation, whereas to enlarge an image, it will generally look best with CV_INTER_CUBIC (slow) or CV_INTER_LINEAR (faster but still looks OK).
♦ C++ code
vector<vector<int> > Detector::Detect(cv::Mat & cv_img)
{
    const int MAX_SIZE = 1000;   // same value as cfg.TEST.MAX_SIZE
    const int SCALE_SIZE = 600;  // same value as cfg.TEST.SCALES[0]
    int max_side = max(cv_img.rows, cv_img.cols);
    int min_side = min(cv_img.rows, cv_img.cols);
    float img_scale = float(SCALE_SIZE) / float(min_side);
    if (round(float(max_side) * img_scale) > MAX_SIZE) {
        img_scale = float(MAX_SIZE) / float(max_side);
    }
    int height = int(cv_img.rows * img_scale);
    int width = int(cv_img.cols * img_scale);
    cv::Mat cv_resized;
    // cv_new is the mean-subtracted CV_32FC3 image; interpolation defaults to INTER_LINEAR
    cv::resize(cv_new, cv_resized, cv::Size(width, height));
}
The resize documentation is at https://docs.opencv.org/2.4/modules/imgproc/doc/geometric_transformations.html?highlight=resize#cv2.resize
and the parameters are the same as those of the Python version.
C++: void resize(InputArray src, OutputArray dst, Size dsize, double fx=0, double fy=0, int interpolation=INTER_LINEAR )
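One detail is worth checking when comparing the two versions. When dsize is left empty, the documented output size is round(fx*src.cols) x round(fy*src.rows), whereas the C++ code computes the size itself with int(...), which truncates. The sketch below uses plain Python in double precision (the C++ code uses float), and its helper names are hypothetical. It shows that for a 500 x 333 (width x height) image, the two rules give widths that differ by one pixel, so the resized images may not match exactly.

```python
def python_style_size(height, width, scale):
    """Size cv2.resize derives from fx=fy=scale (documented as round)."""
    return round(height * scale), round(width * scale)


def cpp_port_size(height, width, scale):
    """Size as computed in the C++ Detect(): int(rows * scale), int(cols * scale)."""
    return int(height * scale), int(width * scale)


if __name__ == "__main__":
    height, width = 333, 500
    # Shorter side -> 600; 500 * scale is about 900.9, so MAX_SIZE is not hit.
    scale = 600.0 / 333.0
    print(python_style_size(height, width, scale)[1])  # 901
    print(cpp_port_size(height, width, scale)[1])      # 900
```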
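------------------------------
Articles in this series:
- (1) Comparing py-faster-rcnn's demo.py with the C++ version: part01 Groundwork, modules imported by demo.py
- (2) Comparing py-faster-rcnn's demo.py with the C++ version: part02 Initialization, creating the Net
- (3) Comparing py-faster-rcnn's demo.py with the C++ version: part03 Image processing: mean subtraction, resize
- (4) Comparing py-faster-rcnn's demo.py with the C++ version: part04 Storing the image as a blob
- (5) Comparing py-faster-rcnn's demo.py with the C++ version: part05 Reshape
- (6) Comparing py-faster-rcnn's demo.py with the C++ version: part06 forward, rois boxes transform
- (7) Comparing py-faster-rcnn's demo.py with the C++ version: part07 nms, getting the boxes that meet the criteria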