Comparing demo.py in py-faster-rcnn with the C++ version: part03 Image preprocessing: mean subtraction and resize

Here, "the C++ version" refers to https://github.com/galian123/cpp_faster_rcnn_detect .

"demo.py in py-faster-rcnn" refers to https://github.com/rbgirshick/py-faster-rcnn/blob/master/tools/demo.py together with
some of the code under the https://github.com/rbgirshick/py-faster-rcnn/tree/master/lib directory.

All .py files mentioned below come from https://github.com/rbgirshick/py-faster-rcnn/ .

★ Read the image and run detection

♦ Python code

demo.py

from fast_rcnn.test import im_detect

    im_file = os.path.join(imagedir, image_name)
    im = cv2.imread(im_file)
    scores, boxes = im_detect(net, im)    

im_detect is defined in py-faster-rcnn/lib/fast_rcnn/test.py.

♦ C++ code

    cv::Mat im = cv::imread(fullname);
    vector<vector<float> > ans;
    ans = det.Detect(im);

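The detection function is a wrapper, so the comparison has to continue inside it.

★ Subtract the mean from the color values

This is one of the steps inside detect; the following sections all describe what detect does.

♦ Python code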

py-faster-rcnn/lib/fast_rcnn/test.py

def im_detect(net, im, boxes=None):
    blobs, im_scales = _get_blobs(im, boxes)

def _get_blobs(im, rois):
    blobs['data'], im_scale_factors = _get_image_blob(im)

def _get_image_blob(im):
    # im is a 3-D array in BGR order: axis 0 is height, axis 1 is width, axis 2 has size 3 (the Blue, Green, Red values).
    # copy=True copies the original pixel values into a new array im_orig and converts them to float32.
    # Note that the mean subtraction below therefore does not modify im.
    im_orig = im.astype(np.float32, copy=True)

    # cfg.PIXEL_MEANS is defined in py-faster-rcnn/lib/fast_rcnn/config.py
    # __C = edict()
    # cfg = __C
    # __C.PIXEL_MEANS = np.array([[[102.9801, 115.9465, 122.7717]]])
    # Each channel has its mean subtracted: blue minus 102.9801, green minus 115.9465, red minus 122.7717
    im_orig -= cfg.PIXEL_MEANS
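
The subtraction above relies on NumPy broadcasting: cfg.PIXEL_MEANS has shape (1, 1, 3), so it is stretched across the height and width axes and each channel gets its own mean. A minimal sketch follows, assuming a tiny synthetic array in place of the cv2.imread result. The names `im`, `PIXEL_MEANS` and `looped` are local stand-ins for illustration, not the real cfg object. The explicit loop is included only to mirror the structure of the per-pixel C++ version.

```python
import numpy as np

# Same values as __C.PIXEL_MEANS in config.py, in BGR order; shape (1, 1, 3).
PIXEL_MEANS = np.array([[[102.9801, 115.9465, 122.7717]]])

# Synthetic 2x2 BGR image standing in for cv2.imread output (uint8, H x W x 3).
im = np.full((2, 2, 3), 128, dtype=np.uint8)

# astype(..., copy=True) allocates a new float32 array, so `im` stays untouched.
im_orig = im.astype(np.float32, copy=True)
# Broadcasting subtracts one mean per channel at every pixel.
im_orig -= PIXEL_MEANS

# Equivalent per-pixel loop, written out the way the C++ code does it.
looped = np.zeros(im.shape, dtype=np.float32)
for h in range(im.shape[0]):
    for w in range(im.shape[1]):
        for c in range(3):
            looped[h, w, c] = np.float32(im[h, w, c]) - np.float32(PIXEL_MEANS[0, 0, c])

print(np.allclose(im_orig, looped, atol=1e-4))
```

The two results are compared with a tolerance rather than for exact equality, since the broadcasted version and the loop may round intermediate values slightly differently.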

♦ C++ code

vector<vector<int> > Detector::Detect(cv::Mat & cv_img)
{
    // minus means
    cv::Mat cv_new(cv_img.rows, cv_img.cols, CV_32FC3, cv::Scalar(0,0,0));
    for (int h = 0; h < cv_img.rows; ++h ){
        for (int w = 0; w < cv_img.cols; ++w){
            cv_new.at<cv::Vec3f>(cv::Point(w, h))[0] = float(cv_img.at<cv::Vec3b>(cv::Point(w, h))[0])-float(102.9801);
            cv_new.at<cv::Vec3f>(cv::Point(w, h))[1] = float(cv_img.at<cv::Vec3b>(cv::Point(w, h))[1])-float(115.9465);
            cv_new.at<cv::Vec3f>(cv::Point(w, h))[2] = float(cv_img.at<cv::Vec3b>(cv::Point(w, h))[2])-float(122.7717);
        }
    }
}

The Mat constructor: https://docs.opencv.org/2.4/modules/core/doc/basic_structures.html#mat-mat

 C++: Mat::Mat(int rows, int cols, int type, const Scalar& s)

Parameters:
* rows – Number of rows in a 2D array.
* cols – Number of columns in a 2D array.
* type – Array type. Use CV_8UC1, …, CV_64FC4 to create 1-4 channel matrices, or CV_8UC(n), …, CV_64FC(n) to create multi-channel (up to CV_CN_MAX channels) matrices.
* s – An optional value to initialize each matrix element with. To set all the matrix elements to the particular value after the construction, use the assignment operator Mat::operator=(const Scalar& value) .

Meaning of CV_32FC3: https://docs.opencv.org/2.4/modules/core/doc/basic_structures.html?highlight=cv_32fc3#datatype

Any primitive type from the list can be defined by an identifier in the form CV_<bit-depth>{U|S|F}C(<number_of_channels>), for example: uchar ~ CV_8UC1, 3-element floating-point tuple ~ CV_32FC3, and so on.

  • Note: in Caffe's cpp_classification/classification.cpp example, the mean is subtracted with cv::subtract().
  cv::Mat sample_normalized;
  cv::subtract(sample_float, mean_, sample_normalized);

★ Compute the larger and smaller of the image's width and height

♦ Python code

def im_detect(net, im, boxes=None):
    blobs, im_scales = _get_blobs(im, boxes)

def _get_blobs(im, rois):
    blobs['data'], im_scale_factors = _get_image_blob(im)

def _get_image_blob(im):
    # im_shape is the shape of im_orig, a tuple: (height, width, 3)
    im_shape = im_orig.shape
    # Store the smaller of height and width in im_size_min and the larger in im_size_max
    im_size_min = np.min(im_shape[0:2])
    im_size_max = np.max(im_shape[0:2])

♦ C++ code

src/util/faster_rcnn.cpp

vector<vector<int> > Detector::Detect(cv::Mat & cv_img)
{
    int max_side = max(cv_img.rows, cv_img.cols);
    int min_side = min(cv_img.rows, cv_img.cols);
}

include/faster_rcnn.hpp

#define max(a, b) (((a)>(b)) ? (a):(b))
#define min(a, b) (((a)<(b)) ? (a):(b))

★ Resize the image

♦ Python code

def im_detect(net, im, boxes=None):
    blobs, im_scales = _get_blobs(im, boxes)

def _get_blobs(im, rois):
    blobs['data'], im_scale_factors = _get_image_blob(im)

def _get_image_blob(im):
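    # Holds the resized image data
    processed_ims = []
    # Holds the scale factor for each target size; only one scale is used for now, so the list ends up with a single value
    im_scale_factors = []

    # cfg.TEST.SCALES is defined in py-faster-rcnn/lib/fast_rcnn/config.py
    # __C.TEST.SCALES = (600,)
    # cfg.TEST.MAX_SIZE is 1000 (__C.TEST.MAX_SIZE = 1000)
    for target_size in cfg.TEST.SCALES: # cfg.TEST.SCALES is a tuple with a single element
        im_scale = float(target_size) / float(im_size_min)      # try scaling the shorter side to 600
        # If the longer side, scaled by the same factor, would exceed 1000, use the factor that scales the longer side (im_size_max) to 1000 instead
        if np.round(im_scale * im_size_max) > cfg.TEST.MAX_SIZE:
            im_scale = float(cfg.TEST.MAX_SIZE) / float(im_size_max)
        # With this factor, the resized image never exceeds 1000*600 (or 600*1000)
        # Example 1: width 800, height 300. Scaling the height to 600 gives a factor of 600/300=2, but then the width exceeds 1000, so the factor is recomputed
        # as 1000/800=1.25; with the width scaled to 1000, the height becomes 375
        # Example 2: width 500, height 800. The factor is 600/500=1.2, and the height becomes 960
        # Example 3: width 1000, height 800. The factor is 600/800=0.75; with the height at 600, the width becomes 750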
        im = cv2.resize(im_orig, None, None, fx=im_scale, fy=im_scale,
                        interpolation=cv2.INTER_LINEAR)
        im_scale_factors.append(im_scale)  # 保存缩放倍率
        processed_ims.append(im) # 保存resize之后的图片数据

Python: cv2.resize(src, dsize[, dst[, fx[, fy[, interpolation]]]]) → dst

Parameters:
* src – input image.
* dst – output image; it has the size dsize (when it is non-zero) or the size computed from src.size(), fx, and fy; the type of dst is the same as of src.
* dsize – output image size; if it equals zero, it is computed as:
  dsize = Size(round(fx*src.cols), round(fy*src.rows))
  Either dsize or both fx and fy must be non-zero.
* fx – scale factor along the horizontal axis; when it equals 0, it is computed as (double)dsize.width/src.cols
* fy – scale factor along the vertical axis; when it equals 0, it is computed as (double)dsize.height/src.rows
* interpolation – interpolation method:
    * INTER_NEAREST - a nearest-neighbor interpolation
    * INTER_LINEAR - a bilinear interpolation (used by default)
    * INTER_AREA - resampling using pixel area relation. It may be a preferred method for image decimation, as it gives moire'-free results. But when the image is zoomed, it is similar to the INTER_NEAREST method.
    * INTER_CUBIC - a bicubic interpolation over 4x4 pixel neighborhood
    * INTER_LANCZOS4 - a Lanczos interpolation over 8x8 pixel neighborhood

To shrink an image, it will generally look best with CV_INTER_AREA interpolation, whereas to enlarge an image, it will generally look best with CV_INTER_CUBIC (slow) or CV_INTER_LINEAR (faster but still looks OK).
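
The scale-factor rule in _get_image_blob can be tried out without OpenCV. The sketch below reproduces only the arithmetic and runs it on the three width/height examples from the comments in the Python code. The function name `compute_im_scale` and its default arguments are hypothetical helpers for illustration, with the defaults taken from cfg.TEST.SCALES and cfg.TEST.MAX_SIZE as quoted above.

```python
def compute_im_scale(height, width, target_size=600, max_size=1000):
    """Scale the shorter side to target_size unless that pushes the longer side past max_size."""
    im_size_min = min(height, width)
    im_size_max = max(height, width)
    im_scale = float(target_size) / float(im_size_min)
    # Fall back to fitting the longer side when it would overshoot max_size.
    if round(im_scale * im_size_max) > max_size:
        im_scale = float(max_size) / float(im_size_max)
    return im_scale


# (height, width) pairs for examples 1 to 3 in the comments above.
examples = [(300, 800), (800, 500), (800, 1000)]
for height, width in examples:
    s = compute_im_scale(height, width)
    print(height, width, s, round(height * s), round(width * s))
```

Only example 1 takes the fallback branch; in the other two the shorter side reaches 600 while the longer side stays within 1000.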

♦ C++ code

vector<vector<int> > Detector::Detect(cv::Mat & cv_img)
{
    const int  MAX_SIZE = 1000;
    const int  SCALE_SIZE = 600;
    int max_side = max(cv_img.rows, cv_img.cols);
    int min_side = min(cv_img.rows, cv_img.cols);

    float img_scale = float(SCALE_SIZE) / float(min_side);
    if (round(float(max_side) * img_scale) > MAX_SIZE) {
        img_scale = float(MAX_SIZE) / float(max_side);
    }

    int height = int(cv_img.rows * img_scale);
    int width = int(cv_img.cols * img_scale);
    cv::Mat cv_resized;
    cv::resize(cv_new, cv_resized, cv::Size(width, height));
}

resize is documented at https://docs.opencv.org/2.4/modules/imgproc/doc/geometric_transformations.html?highlight=resize#cv2.resize ;
the parameters are the same as in the Python description.

C++: void resize(InputArray src, OutputArray dst, Size dsize, double fx=0, double fy=0, int interpolation=INTER_LINEAR )
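
One detail worth checking when comparing the two versions: the Python call passes fx/fy and leaves dsize empty, so by the dsize rule in the resize documentation the output size is obtained by rounding, whereas the C++ code truncates with int(...) before building cv::Size. The sketch below does only the size arithmetic (no OpenCV calls) and suggests the two can differ by one pixel for some inputs. The helper names are hypothetical, and whether such a one-pixel difference affects detection results is not examined here.

```python
def size_via_round(rows, cols, scale):
    # Documented rule when dsize is zero: Size(round(fx*src.cols), round(fy*src.rows)).
    return int(round(scale * cols)), int(round(scale * rows))


def size_via_truncation(rows, cols, scale):
    # Mirrors int(cv_img.cols * img_scale) and int(cv_img.rows * img_scale) in the C++ code.
    return int(cols * scale), int(rows * scale)


# Hypothetical input: 333 rows, 500 columns. The longer side stays under 1000,
# so the scale is simply 600 / shorter side.
rows, cols = 333, 500
scale = 600.0 / min(rows, cols)

rounded = size_via_round(rows, cols, scale)
truncated = size_via_truncation(rows, cols, scale)
print("width with rounding:", rounded[0], "width with truncation:", truncated[0])
```

If matching the Python output size exactly matters, one option would be to round in the C++ code before constructing cv::Size, or to pass fx and fy with an empty dsize as the Python code does.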
