<1> Background:
First, why I wrote this post. I'm mostly a GitHub code scavenger, and when I searched for code doing what the title says, it turned out to be very hard to find. After a long search through a lot of other people's code, plus my own shoddy additions (I can barely look at them myself), I finally got the feature in the title running end to end. My code is rough, but it's fine if you just want something that runs. This is my first contact with TensorFlow (I'm a PyTorch/Caffe fan, and with Keras I only know how to call things), and I haven't read many papers, so some of my understanding may be off -- please bear with me. Since I only got here by reusing a lot of other people's work, I hope this write-up helps someone else in turn; stepping on all these pitfalls gets exhausting...
<2> First, an overview of the steps involved:
1. Keras model -> TensorFlow. The model was trained with Keras (GitHub - matterport/Mask_RCNN: Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow). Some people do call Keras models from C++ directly, but when I tried, that route looked full of pitfalls, so I preferred converting the Keras model to TensorFlow and calling that from C++.
2. Calling the Mask R-CNN TensorFlow model from C++. This mainly involves building the input tensors, reading the output tensors, and working with the Eigen3 matrix library (getting comfortable with this library pays off later).
3. Caveats: whether your TensorFlow C++ library is the GPU or CPU build, and the model's batch size, must match what was used when the model was trained and saved with Python TensorFlow. For example, if the saved model's batch size is 32, use batch size 32 for C++ inference too.
<3> Environment setup:
1. OS: 64-bit Windows (I rarely program on Windows). Use 64-bit: TensorFlow 1.8, the version used here, apparently only supports 64-bit systems; I don't know about other versions. The compiler is MSVC 2015 x64 (which I find painful -- I used to write mostly on Linux and haven't adjusted), and the IDE is Qt 4.8.
2. Install the protobuf library. This is needed to configure the GPU, e.g. selecting the GPU index or controlling how VRAM is used; without it you get an error about a missing tensorflow::**protobuf** symbol (I forget the exact name). If you don't use those features you can skip it. (I installed protobuf 3.6.1; several other versions failed to build for me, which was quite a pain. If my prebuilt library doesn't work for you, search for how to build it yourself.) A sketch of the GPU configuration this enables appears after this list.
3. Install the GPU driver and CUDA (mine is 9.1). As I just said, I'm a car mechanic, so of course I keep a hammer handy... wrong movie set. The point is that I use the GPU build of the TensorFlow library and need to configure the GPU; if you use the CPU build of TensorFlow, skip steps 2 and 3.
4. The GPU build of the TensorFlow library itself, prebuilt by someone generous (GitHub - fo40225/tensorflow-windows-wheel: Tensorflow prebuilt binary for Windows); mine is 1.8.0 avx2 gpu.
5. OpenCV, version 3.3.0 (installing this yourself should be fine). It's used for reading test images and for the Mat-to-tensor conversion. If you need to read very large images, e.g. digital pathology slides, I recommend libvips (a very powerful library) or OpenSlide (nice to use from Python; I haven't tried the C++ side, since building it looks pitfall-heavy).
6. A Linux machine with TensorFlow and the Keras frontend installed, since training usually happens on a Linux server. This is used for the Keras-to-TensorFlow model conversion.
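For reference, here's the kind of GPU setup from step 2 that needs protobuf. This is only a minimal sketch based on the TF 1.8 C++ API (SessionOptions.config is a tensorflow::ConfigProto protobuf message), not my exact code:

#include <tensorflow/core/public/session.h>

tensorflow::SessionOptions options;
options.config.mutable_gpu_options()->set_visible_device_list("0");             // pick GPU #0
options.config.mutable_gpu_options()->set_allow_growth(true);                   // grab VRAM on demand
options.config.mutable_gpu_options()->set_per_process_gpu_memory_fraction(0.5); // cap VRAM at 50%
tensorflow::Session *session = nullptr;
tensorflow::Status status = tensorflow::NewSession(options, &session);

Without protobuf installed, the ConfigProto-related symbols above are exactly the ones that fail to resolve.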
<4> Into the pitfalls
- Convert the Keras model to a TensorFlow model on the machine used for training:
- Download and install, following their instructions, the Keras-frontend Mask R-CNN (GitHub - matterport/Mask_RCNN: Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow) and, for converting the Keras model to TensorFlow, GitHub - parai/Mask_RCNN: Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow
- Save the trained model: first modify matterport's Mask_RCNN/samples/coco/coco.py, in a few steps:
- 1
- 2 Modify the inference config. This step matters: GPU_COUNT and IMAGES_PER_GPU must match the CocoConfig in parai's Mask_RCNN-master/samples/demo.py, or the conversion breaks. IMAGES_PER_GPU is the number of images predicted at once; mine is 32 (batch prediction) -- on a 1080 Ti I measured that 32 images of 512*512 fit.
- 3 Modify the number of classes
- 4 Run coco.py; this saves the Keras model (architecture + weights). My file is mask_rcnn_whole_batch32_new20.h5
- Below is my CocoConfig. It's mostly unchanged; the main edits are IMAGE_MIN_DIM = 512 and IMAGE_MAX_DIM = 512
class CocoConfig(Config):
    """Configuration for training on MS COCO.
    Derives from the base Config class and overrides values specific
    to the COCO dataset.
    """
    # Give the configuration a recognizable name
    NAME = "coco"

    # # We use a GPU with 12GB memory, which can fit two images.
    # # Adjust down if you use a smaller GPU.
    # IMAGES_PER_GPU = 2

    # Uncomment to train on 8 GPUs (default is 1)
    # GPU_COUNT = 8

    # Number of classes (including background)
    # NUM_CLASSES = 1 + 6  # COCO has 80 classes
    NUM_CLASSES = 1 + 20  # COCO has 80 classes

    # NUMBER OF GPUs to use. For CPU training, use 1
    GPU_COUNT = 1

    # Number of images to train with on each GPU. A 12GB GPU can typically
    # handle 2 images of 1024x1024px.
    # Adjust based on your GPU memory and image sizes. Use the highest
    # number that your GPU can handle for best performance.
    IMAGES_PER_GPU = 2

    # Number of training steps per epoch
    # This doesn't need to match the size of the training set. Tensorboard
    # updates are saved at the end of each epoch, so setting this to a
    # smaller number means getting more frequent TensorBoard updates.
    # Validation stats are also calculated at each epoch end and they
    # might take a while, so don't set this too small to avoid spending
    # a lot of time on validation stats.
    STEPS_PER_EPOCH = 2000  # 16962

    # Number of validation steps to run at the end of every training epoch.
    # A bigger number improves accuracy of validation stats, but slows
    # down the training.
    VALIDATION_STEPS = 4241

    # Backbone network architecture
    # Supported values are: resnet50, resnet101.
    # You can also provide a callable that should have the signature
    # of model.resnet_graph. If you do so, you need to supply a callable
    # to COMPUTE_BACKBONE_SHAPE as well
    BACKBONE = "resnet101"

    # Only useful if you supply a callable to BACKBONE. Should compute
    # the shape of each layer of the FPN Pyramid.
    # See model.compute_backbone_shapes
    COMPUTE_BACKBONE_SHAPE = None

    # The strides of each layer of the FPN Pyramid. These values
    # are based on a Resnet101 backbone.
    BACKBONE_STRIDES = [4, 8, 16, 32, 64]

    # Size of the fully-connected layers in the classification graph
    FPN_CLASSIF_FC_LAYERS_SIZE = 1024

    # Size of the top-down layers used to build the feature pyramid
    TOP_DOWN_PYRAMID_SIZE = 256

    # Length of square anchor side in pixels
    RPN_ANCHOR_SCALES = (32, 64, 128, 256, 512)

    # Ratios of anchors at each cell (width/height)
    # A value of 1 represents a square anchor, and 0.5 is a wide anchor
    RPN_ANCHOR_RATIOS = [0.5, 1, 2]

    # Anchor stride
    # If 1 then anchors are created for each cell in the backbone feature map.
    # If 2, then anchors are created for every other cell, and so on.
    RPN_ANCHOR_STRIDE = 1

    # Non-max suppression threshold to filter RPN proposals.
    # You can increase this during training to generate more proposals.
    RPN_NMS_THRESHOLD = 0.8

    # How many anchors per image to use for RPN training
    RPN_TRAIN_ANCHORS_PER_IMAGE = 256

    # ROIs kept after tf.nn.top_k and before non-maximum suppression
    PRE_NMS_LIMIT = 2000

    # ROIs kept after non-maximum suppression (training and inference)
    POST_NMS_ROIS_TRAINING = 2000
    POST_NMS_ROIS_INFERENCE = 1000

    # If enabled, resizes instance masks to a smaller size to reduce
    # memory load. Recommended when using high-resolution images.
    USE_MINI_MASK = True
    MINI_MASK_SHAPE = (56, 56)  # (height, width) of the mini-mask

    # Input image resizing
    # Generally, use the "square" resizing mode for training and predicting
    # and it should work well in most cases. In this mode, images are scaled
    # up such that the small side is = IMAGE_MIN_DIM, but ensuring that the
    # scaling doesn't make the long side > IMAGE_MAX_DIM. Then the image is
    # padded with zeros to make it a square so multiple images can be put
    # in one batch.
    # Available resizing modes:
    # none:   No resizing or padding. Return the image unchanged.
    # square: Resize and pad with zeros to get a square image
    #         of size [max_dim, max_dim].
    # pad64:  Pads width and height with zeros to make them multiples of 64.
    #         If IMAGE_MIN_DIM or IMAGE_MIN_SCALE are not None, then it scales
    #         up before padding. IMAGE_MAX_DIM is ignored in this mode.
    #         The multiple of 64 is needed to ensure smooth scaling of feature
    #         maps up and down the 6 levels of the FPN pyramid (2**6=64).
    # crop:   Picks random crops from the image. First, scales the image based
    #         on IMAGE_MIN_DIM and IMAGE_MIN_SCALE, then picks a random crop of
    #         size IMAGE_MIN_DIM x IMAGE_MIN_DIM. Can be used in training only.
    #         IMAGE_MAX_DIM is not used in this mode.
    IMAGE_RESIZE_MODE = "square"
    IMAGE_MIN_DIM = 512  # my images are 512*512
    IMAGE_MAX_DIM = 512

    # Minimum scaling ratio. Checked after MIN_IMAGE_DIM and can force further
    # up scaling. For example, if set to 2 then images are scaled up to double
    # the width and height, or more, even if MIN_IMAGE_DIM doesn't require it.
    # However, in 'square' mode, it can be overruled by IMAGE_MAX_DIM.
    IMAGE_MIN_SCALE = 0

    # Image mean (RGB)
    MEAN_PIXEL = np.array([123.7, 116.8, 103.9])

    # Number of ROIs per image to feed to classifier/mask heads
    # The Mask RCNN paper uses 512 but often the RPN doesn't generate
    # enough positive proposals to fill this and keep a positive:negative
    # ratio of 1:3. You can increase the number of proposals by adjusting
    # the RPN NMS threshold.
    TRAIN_ROIS_PER_IMAGE = 200

    # Percent of positive ROIs used to train classifier/mask heads
    ROI_POSITIVE_RATIO = 0.33

    # Pooled ROIs
    POOL_SIZE = 7
    MASK_POOL_SIZE = 14

    # Shape of output mask
    # To change this you also need to change the neural network mask branch
    MASK_SHAPE = [28, 28]

    # Maximum number of ground truth instances to use in one image
    MAX_GT_INSTANCES = 100

    # Bounding box refinement standard deviation for RPN and final detections.
    RPN_BBOX_STD_DEV = np.array([0.1, 0.1, 0.2, 0.2])
    BBOX_STD_DEV = np.array([0.1, 0.1, 0.2, 0.2])

    # Max number of final detections
    DETECTION_MAX_INSTANCES = 100

    # Minimum probability value to accept a detected instance
    # ROIs below this threshold are skipped
    DETECTION_MIN_CONFIDENCE = 0.8

    # Non-maximum suppression threshold for detection
    DETECTION_NMS_THRESHOLD = 0.3

    # Learning rate and momentum
    # The Mask RCNN paper uses lr=0.02, but on TensorFlow it causes
    # weights to explode. Likely due to differences in optimizer
    # implementation.
    LEARNING_RATE = 0.001
    LEARNING_MOMENTUM = 0.9

    # Weight decay regularization
    WEIGHT_DECAY = 0.0001

    # Loss weights for more precise optimization.
    # Can be used for R-CNN training setup.
    LOSS_WEIGHTS = {
        "rpn_class_loss": 1.,
        "rpn_bbox_loss": 1.,
        "mrcnn_class_loss": 1.,
        "mrcnn_bbox_loss": 1.,
        "mrcnn_mask_loss": 1.
    }

    # Use RPN ROIs or externally generated ROIs for training
    # Keep this True for most situations. Set to False if you want to train
    # the head branches on ROI generated by code rather than the ROIs from
    # the RPN. For example, to debug the classifier head without having to
    # train the RPN.
    USE_RPN_ROIS = True

    # Train or freeze batch normalization layers
    #     None: Train BN layers. This is the normal mode
    #     False: Freeze BN layers. Good when using a small batch size
    #     True: (don't use). Set layer in training mode even when predicting
    TRAIN_BN = False  # Defaulting to False since batch size is often small

    # Gradient norm clipping
    GRADIENT_CLIP_NORM = 5.0
- Convert the Keras model to a TensorFlow model. The main task is making a few parameters in parai's Mask_RCNN-master/samples/demo.py match those in matterport's Mask_RCNN/samples/coco/coco.py:
- Modify the class count and image size in the CocoConfig of parai's Mask_RCNN-master/samples/demo.py
- Modify the InferenceConfig and some paths in parai's Mask_RCNN-master/samples/demo.py
- Modify parai's Mask_RCNN-master/scripts/export_model.py
- Finally, run parai's Mask_RCNN-master/samples/demo.py to get the converted TensorFlow model; mine is mask_rcnn_batch32_new20.pb. You can test the pb model with parai's Mask_RCNN-master/infere_from_pb.py. That completes the first stage.
- Call the Mask R-CNN TensorFlow model from C++. This step is mostly a series of operations on tensorflow::Tensor, Eigen::Tensor (Eigen3), and friends.
- First, look at the input tensors Mask R-CNN needs. In parai's Mask_RCNN-master/infere_from_pb.py you can see three inputs, the keys img_ph, img_meta_ph and img_anchors_ph, whose values are molded_images, image_metas and image_anchors respectively; the corresponding tensorflow::Tensor names are input_image_1, input_image_meta_1 and input_anchors_1. So we need to build these three tensorflow::Tensor objects in C++ (a minimal load-and-run sketch follows below). See def mold_inputs(images) in infere_from_pb.py for where molded_images and image_metas come from, and def get_anchors(image_shape, config) for image_anchors. Note: from here on, qualify tensor operations with the 'tensorflow::' scope operator, because Eigen3 also has a Tensor type and they are easy to confuse.
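Before building those tensors, here's a minimal sketch (my own assumptions, not the repo's exact code) of loading the converted .pb and feeding the three inputs. The output node names here are placeholders -- verify them in your exported graph with a graph viewer:

#include <tensorflow/core/public/session.h>
#include <tensorflow/core/platform/env.h>
#include <tensorflow/core/framework/tensor.h>

tensorflow::GraphDef graph_def;
// read the frozen graph from disk
tensorflow::Status s = tensorflow::ReadBinaryProto(tensorflow::Env::Default(),
                                                   "mask_rcnn_batch32_new20.pb", &graph_def);
tensorflow::Session *session = nullptr;
s = tensorflow::NewSession(tensorflow::SessionOptions(), &session);
s = session->Create(graph_def);

std::vector<tensorflow::Tensor> output_tensors;
s = session->Run({{"input_image_1",      resized_tensor},       // molded_images
                  {"input_image_meta_1", inputMetadataTensor},  // image_metas
                  {"input_anchors_1",    inputAnchorsTensor}},  // image_anchors
                 {"output_detections", "output_mrcnn_mask"},    // assumed output names -- check your .pb
                 {}, &output_tensors);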
- Build the tensorflow::Tensor for input_image_1. The input_image_1 key corresponds to the value molded_images, which is simply the images converted into a tensorflow::Tensor; this step is cv::Mat -> tensor. Since I run batch prediction, the images live in a std::vector of cv::Mat, but not every batch is full, so there is an extra imgNum_actual parameter controlling how many cv::Mat entries are actually converted:
void detectBatch::CVMats_to_Tensor(std::vector<cv::Mat> &imgs, tensorflow::Tensor *input_tensor, size_t &imgNum_actual)
{
    /*
    *Function: CVMats_to_Tensor
    *Description: copy a vector of cv::Mat images into a tensorflow::Tensor
    *Calls:     1. ****
    *Called By: 1. ****
    *InputList:
    *   1. imgs           vector holding the cv::Mat images     std::vector<cv::Mat>&
    *   2. input_tensor   tensor that receives the data         tensorflow::Tensor*
    *   3. imgNum_actual  number of cv::Mat images to convert   size_t&
    *OutPut: 1. NULL
    */
    auto outputMap = input_tensor->tensor<float,4>(); // get the tensor map; note outputMap is of Eigen::Tensor type
    for(size_t b = 0; b < imgNum_actual; b++)           // iterate over the images
    {
        for(int r = 0; r < outputMap.dimension(1); r++) // iterate over rows
        {
            for(int c = 0; c < outputMap.dimension(2); c++) // iterate over columns
            {
                // note that an OpenCV Mat image's channel order is B G R
                // subtract the mean pixel value
                outputMap(b,r,c,0) = imgs[b].at<cv::Vec3b>(r,c)[2] - MEAN_PIXEL[0]; // R
                outputMap(b,r,c,1) = imgs[b].at<cv::Vec3b>(r,c)[1] - MEAN_PIXEL[1]; // G
                outputMap(b,r,c,2) = imgs[b].at<cv::Vec3b>(r,c)[0] - MEAN_PIXEL[2]; // B
            }
        }
    }
}
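A usage sketch for the function above (member names assumed to match the rest of this post): allocate the NHWC float tensor once with the model's fixed batch size, then fill only the slots that are actually occupied; when the last batch is short, imgNum_actual is smaller than batch_size and the remaining slots simply keep stale/zero data:

// assumed members: batch_size, input_height, input_width, input_channels
tensorflow::Tensor resized_tensor(tensorflow::DT_FLOAT,
    tensorflow::TensorShape({batch_size, input_height, input_width, input_channels}));
size_t imgNum_actual = batchImgs.size(); // batchImgs: std::vector<cv::Mat>, each 512*512 BGR
CVMats_to_Tensor(batchImgs, &resized_tensor, imgNum_actual);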
- Build the tensorflow::Tensor for input_image_meta_1. First, what is input_image_meta_1? It corresponds to the img_meta_ph key in infere_from_pb.py, whose value is image_metas. From def mold_inputs(images) you can see that image_metas is just a wrapper around per-image info such as height, width and channel count. The image_metas data is a 2-D list of shape N x (length of meta data): the first dimension N is the prediction batch_size, and the second is one image's meta data. Since every image fed into a batch has the same size, we only need to build one meta row and copy it to form the N x (length of meta data) list:
def mold_inputs(images):
    """Takes a list of images and modifies them to the format expected
    as an input to the neural network.
    images: List of image matrices [height,width,depth]. Images can have
        different sizes.

    Returns 3 Numpy matrices:
    molded_images: [N, h, w, 3]. Images resized and normalized.
    image_metas: [N, length of meta data]. Details about each image.
        # so image_metas is an N x (length of meta data) list, where N is the prediction batch_size
    windows: [N, (y1, x1, y2, x2)]. The portion of the image that has the
        original image (padding excluded).
    """
    molded_images = []
    image_metas = []
    windows = []
    for image in images:
        # Resize image to fit the model expected size
        # TODO: move resizing to mold_image()
        molded_image, window, scale, padding, corp = utils.resize_image(
            image,
            min_dim=inference_config.IMAGE_MIN_DIM,
            min_scale=inference_config.IMAGE_MIN_SCALE,
            max_dim=inference_config.IMAGE_MAX_DIM,
            mode=inference_config.IMAGE_RESIZE_MODE)
        print(image.shape)
        print('Image resized at: ', molded_image.shape)
        print(window)
        print(scale)
        # Takes RGB images with 0-255 values and subtracts the mean pixel
        # and converts it to float. Expects image colors in RGB order.
        molded_image = mold_image(molded_image, inference_config)
        print('Image molded')
        #print(a)
        # Takes attributes of an image and puts them in one 1D array.
        inference_config.NUM_CLASSES = 81
        # the function below builds image_meta -- we just need to imitate it
        image_meta = compose_image_meta(
            0, image.shape, molded_image.shape, window, scale,
            np.zeros([inference_config.NUM_CLASSES], dtype=np.int32))
        print('Meta of image prepared')
        image_anchor = []
        # TODO
        # Append
        molded_images.append(molded_image)
        windows.append(window)
        image_metas.append(image_meta)
    # Pack into arrays
    molded_images = np.stack(molded_images)
    image_metas = np.stack(image_metas)
    windows = np.stack(windows)
    return molded_images, image_metas, windows
The real work in the function above is done by compose_image_meta; here is its implementation:
def compose_image_meta(image_id, original_image_shape, image_shape,
                       window, scale, active_class_ids):
    """Takes attributes of an image and puts them in one 1D array.

    image_id: An int ID of the image. Useful for debugging.
    original_image_shape: [H, W, C] before resizing or padding.
    image_shape: [H, W, C] after resizing and padding
    window: (y1, x1, y2, x2) in pixels. The area of the image where the real
            image is (excluding the padding)
    scale: The scaling factor applied to the original image (float32)
    active_class_ids: List of class_ids available in the dataset from which
        the image came. Useful if training on images from multiple datasets
        where not all classes are present in all datasets.
    """
    meta = np.array(
        [image_id] +                  # size=1
        list(original_image_shape) +  # size=3
        list(image_shape) +           # size=3
        list(window) +                # size=4 (y1, x1, y2, x2) in image coordinates
        [scale] +                     # size=1
        list(active_class_ids)        # size=num_classes
    )
    return meta
So the meta data layout is:
[image_id] +                  # size=1; length 1, can simply be set to 0
list(original_image_shape) +  # size=3; the original image's h, w, c
list(image_shape) +           # size=3; the resized image's h, w, c. Since I crop my images beforehand, original_image_shape and image_shape are actually identical here
list(window) +                # size=4; (y1, x1, y2, x2) in image coordinates, the display window. x1, y1 are 0 and x2, y2 are the window size; since the images are pre-cropped, x2, y2 equal the image's h, w. I took a shortcut here -- I didn't want to map predicted coordinates back onto a window or the original image afterwards
[scale] +                     # size=1; the scaling ratio: long side after resize / long side before resize. As stated above, my images are unchanged by resizing, so this is in fact 1
list(active_class_ids)        # list over classes (background included); per the logic of def mold_inputs(images), all elements are set to 0
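Adding this up (my own arithmetic, as a sanity check): the meta length is 1 + 3 + 3 + 4 + 1 + num_classes = 12 + num_classes. With NUM_CLASSES = 1 + 20 = 21 as in my config, the TF_MASKRCNN_IMAGE_METADATA_LENGTH used in the C++ code below should be 33.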
The corresponding C++ code:
void detectBatch::compose_image_meta()
{
    /*
    *Function: compose_image_meta
    *Description: compute the image meta data
    *Calls:     1. ****
    *Called By: 1. ****
    *InputList: 1. NULL
    *OutPut:    1. NULL
    */
    int imglongSide, inputlongSide;
    image_meta[0] = 0;
    // original_image_shape: [H, W, C] before resizing or padding.
    image_meta[1] = inputImg_h;
    image_meta[2] = inputImg_w;
    image_meta[3] = inputImg_c;
    imglongSide = image_meta[1] >= image_meta[2] ? image_meta[1] : image_meta[2];
    // image_shape: [H, W, C] after resizing and padding
    image_meta[4] = input_height;
    image_meta[5] = input_width;
    image_meta[6] = input_channels;
    inputlongSide = image_meta[4] >= image_meta[5] ? image_meta[4] : image_meta[5];
    // window: (y1, x1, y2, x2) in pixels. The area of the image where the real image is (excluding the padding)
    image_meta[7] = 0;
    image_meta[8] = 0;
    image_meta[9] = input_height; // my images are cropped before being fed in, so the window size equals the actual image size
    image_meta[10] = input_width;
    // scale: The scaling factor applied to the original image (float32)
    image_meta[11] = float(inputlongSide) / imglongSide; // cast to float to avoid integer division (it's 1 in my case anyway)
    // active_class_ids: List of class_ids available in the dataset from which the image came.
    for(int i = TF_MASKRCNN_IMAGE_METADATA_LENGTH - num_classes; i < TF_MASKRCNN_IMAGE_METADATA_LENGTH; i++)
    {
        image_meta[i] = 0;
    }
    inputMetadataTensor = tensorflow::Tensor(tensorflow::DT_FLOAT, {batch_size, TF_MASKRCNN_IMAGE_METADATA_LENGTH});
    auto inputMetadataTensorMap = inputMetadataTensor.tensor<float,2>();
    // all images in the batch have the same size, so one meta row is copied batch_size times
    for(int j = 0; j < batch_size; j++)
    {
        for(int i = 0; i < TF_MASKRCNN_IMAGE_METADATA_LENGTH; i++)
        {
            //std::cout << "image_meta[" << i << "] is " << image_meta[i] << std::endl;
            inputMetadataTensorMap(j, i) = image_meta[i];
        }
    }
}
- Build the tensorflow::Tensor for input_anchors_1. First, what is input_anchors_1? It corresponds to the img_anchors_ph key in infere_from_pb.py, whose value is image_anchors; def get_anchors(image_shape, config) shows where image_anchors comes from. This step is the most involved and also the most important: understanding this code helps a lot with understanding Mask R-CNN itself.
From infere_from_pb.py, look at the def get_anchors(image_shape, config) function:

def get_anchors(image_shape, config):
    """Returns anchor pyramid for the given image size."""
    # compute the shape (height, width) of the feature map produced by each
    # backbone stage (the map shrinks through pooling/conv down-sampling;
    # I did not look into the details of that part)
    backbone_shapes = compute_backbone_shapes(config, image_shape)
    # Cache anchors and reuse if image shape is the same
    _anchor_cache = {}
    if not tuple(image_shape) in _anchor_cache:
        # If the anchors for an image of the same size were computed before,
        # skip the recomputation and take the cached ones from _anchor_cache.
        # A nice touch by the author that I'd never thought of. Since my
        # inputs are pre-cropped 512*512 images, I compute the anchors once
        # and never need the check or a recomputation afterwards; this should
        # save time on very large images, though I haven't benchmarked it.
        # Generate Anchors
        a = utils.generate_pyramid_anchors(
            config.RPN_ANCHOR_SCALES,
            config.RPN_ANCHOR_RATIOS,
            backbone_shapes,
            config.BACKBONE_STRIDES,
            config.RPN_ANCHOR_STRIDE)
        # Keep a copy of the latest anchors in pixel coordinates because
        # it's used in inspect_model notebooks.
        # TODO: Remove this after the notebook are refactored to not use it
        anchors = a
        # Normalize coordinates
        _anchor_cache[tuple(image_shape)] = utils.norm_boxes(a, image_shape[:2])
    return _anchor_cache[tuple(image_shape)]
- From the code above, the anchors are produced mainly by utils.generate_pyramid_anchors and its five parameters. Focus on the backbone_shapes parameter and on utils.generate_pyramid_anchors itself; the other four parameters are just plain arrays/lists, as you can see in class CocoConfig(Config).
- First, where backbone_shapes comes from: it is returned by compute_backbone_shapes(config, image_shape), whose code appears below.
Looking at the code below: the image_shape parameter is the size of the (resized) input image, mainly its row and column counts (height and width). The return value is an [N, (height, width)] array, where N is the number of backbone stages (a stage here means a feature-map level from which anchors are extracted, one level per stage; as the Mask R-CNN paper shows, anchors are taken from several feature-map levels of the backbone -- five in the paper's setup, which matches the five entries of BACKBONE_STRIDES = [4, 8, 16, 32, 64] in class CocoConfig(Config)). The divisions image_shape[0] / stride etc. simply compute the feature map's height and width at that stage, so BACKBONE_STRIDES = [4, 8, 16, 32, 64] is the factor by which the image has shrunk by the time it reaches each stage. For example, a 512*512 image becomes 128*128 after the first stage, then 64*64, 32*32, 16*16, 8*8 after the later ones, so the returned [N, (height, width)] array is [[128,128], [64,64], [32,32], [16,16], [8,8]].

def compute_backbone_shapes(config, image_shape):
    """Computes the width and height of each stage of the backbone network.

    Returns:
        [N, (height, width)]. Where N is the number of stages
    """
    # Currently supports ResNet only
    assert config.BACKBONE in ["resnet50", "resnet101"]
    return np.array(
        [[int(math.ceil(image_shape[0] / stride)),
          int(math.ceil(image_shape[1] / stride))]
         for stride in config.BACKBONE_STRIDES])
The C++ code that produces the [N, (height, width)] array:

float BACKBONE_STRIDES[5] = {4, 8, 16, 32, 64}; // used to compute the feature map size after each backbone stage (the map shrinks via pooling/conv down-sampling; I did not look into the details)
int backbone_strides_num = 5;
int backbone_shape[5][2]; // for backbone_shape
for(int i = 0; i < backbone_strides_num; i++)
{
    backbone_shape[i][0] = ceil(inputImg_h / BACKBONE_STRIDES[i]);
    backbone_shape[i][1] = ceil(inputImg_w / BACKBONE_STRIDES[i]);
}
- Second, what utils.generate_pyramid_anchors does. The function exists in both matterport's and parai's Mask_RCNN/mrcnn/utils.py, and the two are identical; since we installed matterport's version first, read that one:
def generate_pyramid_anchors(scales, ratios, feature_shapes, feature_strides,
                             anchor_stride):
    """Generate anchors at different levels of a feature pyramid. Each scale
    is associated with a level of the pyramid, but each ratio is used in
    all levels of the pyramid.

    Returns:
    anchors: [N, (y1, x1, y2, x2)]. All generated anchors in one array. Sorted
        with the same order of the given scales. So, anchors of scale[0] come
        first, then anchors of scale[1], and so on.
    """
    # Anchors
    # [anchor_count, (y1, x1, y2, x2)]
    anchors = []
    for i in range(len(scales)):
        anchors.append(generate_anchors(scales[i], ratios, feature_shapes[i],
                                        feature_strides[i], anchor_stride))
    return np.concatenate(anchors, axis=0)
The real work inside it is done by generate_anchors, also in the utils file:

def generate_anchors(scales, ratios, shape, feature_stride, anchor_stride):
    """
    scales: 1D array of anchor sizes in pixels. Example: [32, 64, 128]
    ratios: 1D array of anchor ratios of width/height. Example: [0.5, 1, 2]
    shape: [height, width] spatial shape of the feature map over which
            to generate anchors.
    feature_stride: Stride of the feature map relative to the image in pixels.
    anchor_stride: Stride of anchors on the feature map. For example, if the
        value is 2 then generate anchors for every other feature map pixel.
    """
    # Get all combinations of scales and ratios
    # In the config they are:
    #   RPN_ANCHOR_SCALES = (32, 64, 128, 256, 512)
    #   RPN_ANCHOR_RATIOS = [0.5, 1, 2]
    # Look up np.meshgrid(x1, x2): it pairs the elements of x1 and x2
    # (position-wise) and returns a pair of matrices. Suppose the outer loop
    # passed in scales = RPN_ANCHOR_SCALES[0]:
    scales, ratios = np.meshgrid(np.array(scales), np.array(ratios))
    # After np.meshgrid:
    #   scales = array([[32], [32], [32]])
    #   ratios = array([[0.5], [1], [2]])

    # Flatten scales and ratios; e.g. afterwards scales = [32, 32, 32]
    scales = scales.flatten()
    ratios = ratios.flatten()

    # Scale the anchors by the ratios to get the widths and heights of every
    # scale/ratio combination.
    # Enumerate heights and widths from scales and ratios
    heights = scales / np.sqrt(ratios)
    widths = scales * np.sqrt(ratios)

    # Compute the offsets (y, x), i.e. the anchor center positions.
    # shape[0], shape[1] are the feature map height and width (after the
    # down-scaling by config.BACKBONE_STRIDES = {4, 8, 16, 32, 64} above).
    # anchor_stride is 1 in the config, i.e. the anchors are spaced 1 apart.
    # feature_stride iterates over config.BACKBONE_STRIDES = {4, 8, 16, 32, 64};
    # multiplying by feature_stride maps np.arange(0, shape[0], anchor_stride)
    # back to coordinates in the original image (512*512 in my case).
    # Enumerate shifts in feature space
    shifts_y = np.arange(0, shape[0], anchor_stride) * feature_stride
    shifts_x = np.arange(0, shape[1], anchor_stride) * feature_stride
    # Pair up the center positions to get every combination of bbox centers
    # (positions only; the real pairing happens below).
    shifts_x, shifts_y = np.meshgrid(shifts_x, shifts_y)

    # Enumerate combinations of shifts, widths, and heights
    # Pairing widths/heights with the center x/y coordinates gives every
    # combination of bbox center and size. A sketch of the principle,
    # following the meshgrid behaviour above: widths is a 1-D array, say
    # widths = [e, f, g], and shifts_x = [[0,1], [2,3], [4,5]]. Then after
    # np.meshgrid(widths, shifts_x):
    #   box_widths = [[e,f,g],
    #                 [e,f,g],
    #                 [e,f,g],
    #                 [e,f,g],
    #                 [e,f,g],
    #                 [e,f,g]]           # shape 6x3
    #   box_centers_x = [[0,0,0],
    #                    [1,1,1],
    #                    [2,2,2],
    #                    [3,3,3],
    #                    [4,4,4],
    #                    [5,5,5]]
    # i.e. every x is paired with every width, and likewise every y with
    # every height (positions only).
    box_widths, box_centers_x = np.meshgrid(widths, shifts_x)
    box_heights, box_centers_y = np.meshgrid(heights, shifts_y)

    # Next comes np.stack (see the NumPy docs, or
    # https://blog.csdn.net/wgx571859177/article/details/80987459 -- trying
    # it yourself makes its behaviour much clearer).
    # Suppose box_centers_x has the 6x3 shape above and box_centers_y has the
    # same shape (say with elements 6,7,8,9,10,11). After the np.stack below,
    # box_centers has shape 6x3x2 -- this is the real pairing; read it as
    # 6x(3x2):
    # box_centers = [ [[0,6],[0,6],[0,6]],
    #                 [[1,7],[1,7],[1,7]],
    #                 [[2,8],[2,8],[2,8]],
    #                 [[3,9],[3,9],[3,9]],
    #                 [[4,10],[4,10],[4,10]],
    #                 [[5,11],[5,11],[5,11]] ]
    # After reshape([-1, 2]) it becomes [[0,6],[0,6],[0,6],[1,7],...] with
    # shape 18x2. box_sizes is built the same way, holding (height, width).
    # Reshape to get a list of (y, x) and a list of (h, w)
    box_centers = np.stack(
        [box_centers_y, box_centers_x], axis=2).reshape([-1, 2])
    box_sizes = np.stack([box_heights, box_widths], axis=2).reshape([-1, 2])

    # box_centers - 0.5 * box_sizes and box_centers + 0.5 * box_sizes give the
    # top-left and bottom-right corner coordinates; after
    # np.concatenate(axis=1) boxes has shape 18x4, in the form
    # 18x(y1, x1, y2, x2), where 18 is the anchor count (a made-up number for
    # this walkthrough; the real count follows from the config and the math
    # above) and (y1, x1, y2, x2) describes each anchor.
    # Convert to corner coordinates (y1, x1, y2, x2)
    boxes = np.concatenate([box_centers - 0.5 * box_sizes,
                            box_centers + 0.5 * box_sizes], axis=1)
    return boxes
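A sanity check I find useful (my own arithmetic, not from the repo): with RPN_ANCHOR_STRIDE = 1 each pyramid level contributes shape[0] * shape[1] * 3 anchors (3 ratios per cell), so for a 512*512 input the total is 3 * (128² + 64² + 32² + 16² + 8²) = 3 * 21824 = 65472 anchors. The finalBox matrix built in the C++ code below should therefore end up with 65472 rows.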
The corresponding C++ code for this part:
int finalBoxesRows = 0; // total row count of the boxes over all five RPN_ANCHOR_SCALES levels; you can ignore this for now

// generate_pyramid_anchors
// generate the anchors for each scale (5 scales in the config)
for(int j = 0; j < rpn_anchor_scales_num; j++)
{
    // generate_anchors
    // Get all combinations of scales and ratios
    Eigen::RowVectorXf scalesVec(1); // holds the current element of RPN_ANCHOR_SCALES[5]={32, 64, 128, 256, 512}, used to fill scalesMat
    Eigen::VectorXf ratiosVec(rpn_anchor_ratios_num);
    Eigen::MatrixXf scalesMat = Eigen::MatrixXf(rpn_anchor_ratios_num, 1);
    Eigen::MatrixXf ratiosMat = Eigen::MatrixXf(rpn_anchor_ratios_num, 1);
    Eigen::MatrixXf heightsMat;
    Eigen::MatrixXf widthsMat;

    // the following implements the Python line:
    /*
    scales, ratios = np.meshgrid(np.array(scales), np.array(ratios))
    */
    scalesVec(0) = (RPN_ANCHOR_SCALES[j]);
    // build np.array(ratios)
    for(int i = 0; i < rpn_anchor_ratios_num; i++)
    {
        ratiosVec(i) = RPN_ANCHOR_RATIOS[i];
    }
    for(int i = 0; i < ratiosMat.cols(); i++)
    {
        ratiosMat.col(i) << ratiosVec;
    }
    // build np.array(scales)
    for(int i = 0; i < scalesMat.rows(); i++)
    {
        scalesMat.row(i) << scalesVec;
    }

    // build heights and widths. In Python these are vectors of length 3, but
    // here they become 3x1 matrices for the element-wise operations below.
    // Python:
    /*
    heights = scales / np.sqrt(ratios)
    widths = scales * np.sqrt(ratios)
    */
    // Enumerate heights and widths from scales and ratios
    heightsMat = scalesMat.cwiseQuotient(ratiosMat.cwiseSqrt());
    widthsMat = scalesMat.cwiseProduct(ratiosMat.cwiseSqrt());

    // build shifts_x, shifts_y. Python:
    /*
    shifts_y = np.arange(0, shape[0], anchor_stride) * feature_stride
    shifts_x = np.arange(0, shape[1], anchor_stride) * feature_stride
    shifts_x, shifts_y = np.meshgrid(shifts_x, shifts_y)
    */
    // Enumerate shifts in feature space
    // first: shifts_y = np.arange(0, shape[0], anchor_stride) * feature_stride
    //        shifts_x = np.arange(0, shape[1], anchor_stride) * feature_stride
    int step = RPN_ANCHOR_STRIDE, low = 0, hight_y = backbone_shape[j][0], hight_x = backbone_shape[j][1]; // shape[0], shape[1], anchor_stride
    Eigen::RowVectorXf shifts_y; // row vector
    Eigen::RowVectorXf shifts_x;
    int realsize_y = ((hight_y - low) / step);
    int realsize_x = ((hight_x - low) / step);
    shifts_y.setLinSpaced(realsize_y, low, low + step * (realsize_y - 1));
    shifts_x.setLinSpaced(realsize_x, low, low + step * (realsize_x - 1));
    shifts_y *= BACKBONE_STRIDES[j]; // feature_stride, i.e. the BACKBONE_STRIDES[j] passed in by the outer loop of the Python code
    shifts_x *= BACKBONE_STRIDES[j]; // same

    /* then shifts_x, shifts_y = np.meshgrid(shifts_x, shifts_y):
       build the final shifts_x, shifts_y matrices; note that after
       np.meshgrid they are 2-D matrices */
    Eigen::MatrixXf shifts_xMat(shifts_y.cols(), shifts_x.cols()), shifts_yMat(shifts_y.cols(), shifts_x.cols());
    for(int i = 0; i < shifts_xMat.rows(); i++)
    {
        shifts_xMat.row(i) = shifts_x;
    }
    for(int i = 0; i < shifts_yMat.cols(); i++)
    {
        shifts_yMat.col(i) = shifts_y;
    }

    // now the Python lines:
    /*
    box_widths, box_centers_x = np.meshgrid(widths, shifts_x)
    box_heights, box_centers_y = np.meshgrid(heights, shifts_y)
    # Reshape to get a list of (y, x) and a list of (h, w)
    box_centers = np.stack([box_centers_y, box_centers_x], axis=2).reshape([-1, 2])
    box_sizes = np.stack([box_heights, box_widths], axis=2).reshape([-1, 2])
    # Convert to corner coordinates (y1, x1, y2, x2)
    boxes = np.concatenate([box_centers - 0.5 * box_sizes,
                            box_centers + 0.5 * box_sizes], axis=1)
    return boxes
    */
    // Enumerate combinations of shifts, widths, and heights
    // first: box_widths, box_centers_x = np.meshgrid(widths, shifts_x)
    //        box_heights, box_centers_y = np.meshgrid(heights, shifts_y)

    // flatten heightsMat, widthsMat into row vectors for easier assignment
    Eigen::RowVectorXf heightsMatFlat(Eigen::Map<Eigen::VectorXf>(heightsMat.data(), heightsMat.rows() * heightsMat.cols()));
    Eigen::RowVectorXf widthsMatFlat(Eigen::Map<Eigen::VectorXf>(widthsMat.data(), widthsMat.rows() * widthsMat.cols()));
    /* In np.meshgrid(widths, shifts_x) above, widths is a length-3 vector and
       shifts_x is a 2-D matrix, so the resulting matrix has as many columns
       as widths has elements, and as many rows as shifts_x has elements when
       flattened row-wise (6 for a 2*3 shifts_x). In
       box_widths, box_centers_x = np.meshgrid(widths, shifts_x),
       box_centers_x has rows(shifts_x)*cols(shifts_x) rows, and each of its
       columns is the row-wise flattening of shifts_x. But Eigen matrices are
       column-major, so the C++ code transposes shifts_xMat first; mapping it
       through Eigen::Map then yields exactly the row-wise flattening of
       shifts_x. shifts_yMat is treated the same way to get shifts_yMatFlat.
       box_widths and box_heights can be filled directly from widthsMatFlat
       and heightsMatFlat, which are already 1-D vectors. */
    shifts_xMat.transposeInPlace();
    shifts_yMat.transposeInPlace();
    Eigen::RowVectorXf shifts_yMatFlat(Eigen::Map<Eigen::VectorXf>(shifts_yMat.data(), shifts_yMat.rows() * shifts_yMat.cols()));
    Eigen::RowVectorXf shifts_xMatFlat(Eigen::Map<Eigen::VectorXf>(shifts_xMat.data(), shifts_xMat.rows() * shifts_xMat.cols()));
    Eigen::MatrixXf box_widthsMat = Eigen::MatrixXf(shifts_xMatFlat.cols(), widthsMatFlat.cols());
    Eigen::MatrixXf box_center_xMat = Eigen::MatrixXf(shifts_xMatFlat.cols(), widthsMatFlat.cols());
    Eigen::MatrixXf box_heightsMat = Eigen::MatrixXf(shifts_yMatFlat.cols(), heightsMatFlat.cols());
    Eigen::MatrixXf box_center_yMat = Eigen::MatrixXf(shifts_yMatFlat.cols(), heightsMatFlat.cols());
    for(int i = 0; i < box_widthsMat.rows(); i++)
    {
        box_widthsMat.row(i) = widthsMatFlat;
        box_heightsMat.row(i) = heightsMatFlat;
    }
    for(int i = 0; i < box_heightsMat.cols(); i++)
    {
        box_center_xMat.col(i) = shifts_xMatFlat;
        box_center_yMat.col(i) = shifts_yMatFlat;
    }

    // Convert to corner coordinates (y1, x1, y2, x2)
    // ('e is an abbreviation for 's element)
    // note that, below, the matrix elements being added or subtracted are in corresponding positions.
    // Python method: stack box_centers_y and box_centers_x into mat A whose unit format is (box_center_y'e, box_center_x'e),
    // then reshape to [-1,2], so the result is a mat whose col format is (box_center_y'e, box_center_x'e);
    // box_sizes mat B is the same, col format (box_height'e, box_width'e).
    // Then A-B, A+B give mats C, D whose col formats are respectively
    // (box_center_y'e-box_height'e, box_center_x'e-box_width'e) and
    // (box_center_y'e+box_height'e, box_center_x'e+box_width'e);
    // concatenating C and D gives mat E whose col format is
    // (box_center_y'e-box_height'e, box_center_x'e-box_width'e, box_center_y'e+box_height'e, box_center_x'e+box_width'e),
    // and that is (y1, x1, y2, x2).
    // In Eigen3 it is done differently: we already have the matrices
    // box_center_yMat, box_center_xMat, box_heightsMat, box_widthsMat
    // (abbreviated center_yMat, center_xMat, heightMat, widthMat):
    //   center_yMat - 0.5*heightMat = y1Mat
    //   center_yMat + 0.5*heightMat = y2Mat
    //   center_xMat - 0.5*widthMat  = x1Mat
    //   center_xMat + 0.5*widthMat  = x2Mat
    // then generate the matrix boxes whose col format is (y1Mat's e, x1Mat's e, y2Mat's e, x2Mat's e).
    // This performs:
    //   boxes = np.concatenate([box_centers - 0.5 * box_sizes,
    //                           box_centers + 0.5 * box_sizes], axis=1)
    // boxes has the form [(y1, x1, y2, x2), ..., ...]
    Eigen::MatrixXf y1Mat = box_center_yMat - box_heightsMat * 0.5;
    Eigen::MatrixXf x1Mat = box_center_xMat - box_widthsMat * 0.5;
    Eigen::MatrixXf y2Mat = box_center_yMat + box_heightsMat * 0.5;
    Eigen::MatrixXf x2Mat = box_center_xMat + box_widthsMat * 0.5;
    y1Mat.transposeInPlace();
    x1Mat.transposeInPlace();
    y2Mat.transposeInPlace();
    x2Mat.transposeInPlace();
    Eigen::RowVectorXf y1MatFlat(Eigen::Map<Eigen::VectorXf>(y1Mat.data(), y1Mat.rows() * y1Mat.cols()));
    Eigen::RowVectorXf x1MatFlat(Eigen::Map<Eigen::VectorXf>(x1Mat.data(), x1Mat.rows() * x1Mat.cols()));
    Eigen::RowVectorXf y2MatFlat(Eigen::Map<Eigen::VectorXf>(y2Mat.data(), y2Mat.rows() * y2Mat.cols()));
    Eigen::RowVectorXf x2MatFlat(Eigen::Map<Eigen::VectorXf>(x2Mat.data(), x2Mat.rows() * x2Mat.cols()));
    Eigen::MatrixXf boxes(y1Mat.rows() * y1Mat.cols(), 4); // note: this boxes is not yet the boxes of the Python code
    boxes.col(0) = y1MatFlat;
    boxes.col(1) = x1MatFlat;
    boxes.col(2) = y2MatFlat;
    boxes.col(3) = x2MatFlat;
    // at this point the boxes for a single RPN_ANCHOR_SCALES[j] scale are done;
    // put them into the container
    boxesVec.push_back(boxes);
    finalBoxesRows += boxes.rows(); // accumulate the total row count over all five RPN_ANCHOR_SCALES scales
}

// create the 2-D matrix finalBox (the Python code's boxes) with finalBoxesRows
// rows and 4 columns: it stacks all the per-scale boxes into the form
// [(y1, x1, y2, x2), ..., ...]
finalBox = Eigen::MatrixXf(finalBoxesRows, 4);
// copy each boxes matrix out of boxesVec to build the final finalBox matrix
// (corresponding to boxes in the Python code)
int beginX = 0;
for(int i = 0; i < boxesVec.size(); i++)
{
    // block assignment: mat.block(i, j, rows, cols)
    finalBox.block(beginX, 0, boxesVec[i].rows(), boxesVec[i].cols()) = boxesVec[i];
    beginX += boxesVec[i].rows();
}
- Third, normalize the coordinates:
# Normalize coordinates
_anchor_cache[tuple(image_shape)] = utils.norm_boxes(a, image_shape[:2])
return _anchor_cache[tuple(image_shape)]
# Python code
# Roughly: take the boxes from the previous step (shaped [(y1,x1,y2,x2), ...]),
# subtract the shift [0,0,1,1] element-wise, then divide by [h-1, w-1, h-1, w-1]
def norm_boxes(boxes, shape):
    """Converts boxes from pixel coordinates to normalized coordinates.
    boxes: [N, (y1, x1, y2, x2)] in pixel coordinates
    shape: [..., (height, width)] in pixels

    Note: In pixel coordinates (y2, x2) is outside the box. But in normalized
    coordinates it's inside the box.

    Returns:
        [N, (y1, x1, y2, x2)] in normalized coordinates
    """
    h, w = shape
    scale = np.array([h - 1, w - 1, h - 1, w - 1])
    shift = np.array([0, 0, 1, 1])
    return np.divide((boxes - shift), scale).astype(np.float32)
The C++ code:
/* get the normalized finalBox. Python:
    scale = np.array([h - 1, w - 1, h - 1, w - 1])
    shift = np.array([0, 0, 1, 1])
    return np.divide((boxes - shift), scale).astype(np.float32)
*/
// first create the scale and shift vectors
Eigen::MatrixXf scaleMat_1r(1, finalBox.cols());
Eigen::MatrixXf shiftMat_1r(1, finalBox.cols());
scaleMat_1r << float(inputImg_h - 1), float(inputImg_w - 1), float(inputImg_h - 1), float(inputImg_w - 1);
shiftMat_1r << 0.f, 0.f, 1.f, 1.f;
// scaleMat_1r and shiftMat_1r are just row vectors; now build matching
// matrices with the same shape as finalBox
Eigen::MatrixXf scaleMat = scaleMat_1r.colwise().replicate(finalBox.rows()); // replicate to finalBox's row count
Eigen::MatrixXf shiftMat = shiftMat_1r.colwise().replicate(finalBox.rows()); // same
Eigen::MatrixXf tmpMat = finalBox - shiftMat;   // subtract the shift element-wise
finalBox_norm = tmpMat.cwiseQuotient(scaleMat); // divide by the scale element-wise
// finalBox_norm now corresponds to boxes in the Python code. Next, copy the
// finalBox_norm matrix into the Eigen::Tensor inputAnchorsTensor_temp, then
// fill the tensorflow::Tensor inputAnchorsTensor that is finally fed to the
// model as the anchor boxes.
inputAnchorsTensor = tensorflow::Tensor(tensorflow::DT_FLOAT, {batch_size, finalBox_norm.rows(), finalBox_norm.cols()}); // init inputAnchorsTensor
// build the Eigen::Tensor inputAnchorsTensor_temp from the finalBox_norm matrix
Eigen::Tensor<float,3> inputAnchorsTensor_temp(1, finalBox_norm.rows(), finalBox_norm.cols());
for(int i = 0; i < finalBox_norm.rows(); i++){
    Eigen::Tensor<float,1> eachrow(finalBox_norm.cols()); // temporary storage for one row of finalBox_norm
    // put one row of finalBox_norm into eachrow
    eachrow.setValues({finalBox_norm.row(i)[0], finalBox_norm.row(i)[1], finalBox_norm.row(i)[2], finalBox_norm.row(i)[3]});
    // put eachrow into the corresponding row of inputAnchorsTensor_temp
    inputAnchorsTensor_temp.chip(i, 1) = eachrow;
}
// copy inputAnchorsTensor_temp into inputAnchorsTensor; note the two are of
// different types
auto showMap = inputAnchorsTensor.tensor<float,3>();
for(int b = 0; b < showMap.dimension(0); b++)
{
    for(int r = 0; r < showMap.dimension(1); r++)
    {
        for(int c = 0; c < showMap.dimension(2); c++)
        {
            showMap(b, r, c) = inputAnchorsTensor_temp(0, r, c); // index 0 because
            // all images in my batch have the same size, so their final anchor
            // boxes are identical and one copy is enough. I recommend keeping all
            // images in a batch the same size -- it makes things much easier.
        }
    }
}
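Consequently, for a 512*512 input, inputAnchorsTensor should come out with shape {batch_size, 65472, 4}, matching the anchor count computed earlier; printing inputAnchorsTensor.shape() is a quick way to confirm the construction before feeding the model.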
- At this point all the tensors fed into the model are ready. After the network's forward pass, the results also come back as tensors -- so how do we extract what we want? Look first at parai/Mask_RCNN's Python infere_from_pb.py:
detections, mrcnn_class, mrcnn_bbox, mrcnn_mask, rois = \
    sess.run([detectionsT, mrcnn_classT, mrcnn_bboxT, mrcnn_maskT, roisT],
             feed_dict={img_ph: molded_images,
                        img_meta_ph: image_metas,
                        img_anchors_ph: image_anchors})
# the above is the inference call
# the box coordinates, class ids and scores are stored in detections;
# the segmentation results are stored in mrcnn_mask
# we only need to see how the unmold_detections function works
results = []
for i, image in enumerate(images):
    final_rois, final_class_ids, final_scores, final_masks =\
        unmold_detections(detections[i], mrcnn_mask[i],
                          image.shape, molded_images[i].shape,
                          windows[i])
    results.append({
        "rois": final_rois,
        "class_ids": final_class_ids,
        "scores": final_scores,
        "masks": final_masks,
    })
The main work is done by unmold_detections(...); the function appears, identically, in both infere_from_pb.py and mrcnn/model.py.
Here is its Python code:
def unmold_detections(detections, mrcnn_mask, original_image_shape,
                      image_shape, window):
    """Reformats the detections of one image from the format of the neural
    network output to a format suitable for use in the rest of the
    application.

    detections: [N, (y1, x1, y2, x2, class_id, score)] in normalized coordinates
        # the network's box output: N is the number of detected objects, and
        # each entry is four coordinates + class id + score
    mrcnn_mask: [N, height, width, num_classes]
        # the network's segmentation output
    original_image_shape: [H, W, C] Original image shape before resizing
    image_shape: [H, W, C] Shape of the image after resizing and padding
        # the shape of the image fed to the network
    window: [y1, x1, y2, x2] Pixel coordinates of box in the image where the real
        image is excluding the padding.
        # the display window

    Returns:  # boxes + per-box class + per-box score + per-object mask
    boxes: [N, (y1, x1, y2, x2)] Bounding boxes in pixels
    class_ids: [N] Integer class IDs for each bounding box
    scores: [N] Float probability scores of the class_id
    masks: [height, width, num_instances] Instance masks
    """
    # How many detections do we have?
    # Detections array is padded with zeros. Find the first class_id == 0.
    # class 0 is the background; the trailing [0] is needed because np.where
    # returns its result as the first element of a tuple
    zero_ix = np.where(detections[:, 4] == 0)[0]
    # index of the first zero-class entry
    N = zero_ix[0] if zero_ix.shape[0] > 0 else detections.shape[0]
    # Take all entries before the N-th, i.e. the boxes whose class is not 0.
    # Presumably this works because the network pads the detections array with
    # zero entries at the end, so everything before the first zero-class entry
    # is a real detection -- I did not dig into it. In the C++ code you can
    # simply walk the detections array and keep/drop entries by class id.

    # Extract boxes, class_ids, scores, and class-specific masks
    boxes = detections[:N, :4]
    class_ids = detections[:N, 4].astype(np.int32)
    scores = detections[:N, 5]
    masks = mrcnn_mask[np.arange(N), :, :, class_ids]

    # normalization; since my images are cropped to size before being fed to
    # the network, this step can be skipped in my case
    # Translate normalized coordinates in the resized image to pixel
    # coordinates in the original image before resizing
    window = utils.norm_boxes(window, image_shape[:2])
    wy1, wx1, wy2, wx2 = window
    shift = np.array([wy1, wx1, wy1, wx1])
    wh = wy2 - wy1  # window height
    ww = wx2 - wx1  # window width
    scale = np.array([wh, ww, wh, ww])
    # Convert boxes to normalized coordinates on the window
    boxes = np.divide(boxes - shift, scale)
    # Convert boxes to pixel coordinates on the original image
    boxes = utils.denorm_boxes(boxes, original_image_shape[:2])

    # Filter out detections with zero area. Happens in early training when
    # network weights are still random
    # find the indices of boxes whose area is <= 0
    exclude_ix = np.where(
        (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1]) <= 0)[0]
    # if there are such indices, delete those entries
    if exclude_ix.shape[0] > 0:
        boxes = np.delete(boxes, exclude_ix, axis=0)
        class_ids = np.delete(class_ids, exclude_ix, axis=0)
        scores = np.delete(scores, exclude_ix, axis=0)
        masks = np.delete(masks, exclude_ix, axis=0)
        N = class_ids.shape[0]

    # At this point we have the entries whose class is non-zero (not
    # background) and whose size is positive. The next step computes their
    # masks. I only need the boxes, so my C++ code does not compute masks;
    # if you need them, reproduce the Python below in C++.
    # Resize masks to original image size and set boundary threshold.
    full_masks = []
    for i in range(N):
        # Convert neural network mask to full size mask
        full_mask = utils.unmold_mask(masks[i], boxes[i], original_image_shape)
        full_masks.append(full_mask)
    full_masks = np.stack(full_masks, axis=-1)\
        if full_masks else np.empty(masks.shape[1:3] + (0,))

    return boxes, class_ids, scores, full_masks
The two main helpers inside are utils.norm_boxes and utils.denorm_boxes:
def norm_boxes(boxes, shape):
    """Converts boxes from pixel coordinates to normalized coordinates.
    boxes: [N, (y1, x1, y2, x2)] in pixel coordinates  # box data format
    shape: [..., (height, width)] in pixels  # image size, fixed in my case

    Note: In pixel coordinates (y2, x2) is outside the box. But in normalized
    coordinates it's inside the box.

    Returns:
        [N, (y1, x1, y2, x2)] in normalized coordinates
    """
    h, w = shape
    scale = np.array([h - 1, w - 1, h - 1, w - 1])
    shift = np.array([0, 0, 1, 1])
    return np.divide((boxes - shift), scale).astype(np.float32)


def denorm_boxes(boxes, shape):
    """Converts boxes from normalized coordinates to pixel coordinates.
    boxes: [N, (y1, x1, y2, x2)] in normalized coordinates  # box data format
    shape: [..., (height, width)] in pixels  # image size, fixed in my case

    Note: In pixel coordinates (y2, x2) is outside the box. But in normalized
    coordinates it's inside the box.

    Returns:
        [N, (y1, x1, y2, x2)] in pixel coordinates
    """
    h, w = shape
    scale = np.array([h - 1, w - 1, h - 1, w - 1])
    shift = np.array([0, 0, 1, 1])
    return np.around(np.multiply(boxes, scale) + shift).astype(np.int32)
The C++ code for this part:
struct boxInfo{
    int y1, x1, y2, x2;
    int classId = 0;
    float scores = 0.f;
    int boxNum = -1;
};

struct imageDetectInfo{
    int imageWidth = 0;  // not yet
    int imageHeight = 0; // not yet
    int imageNum = -1;
    std::vector<boxInfo> detectInfo;
};

// std::vector<tensorflow::Tensor> &output_tensors: the network's final output
// std::vector<imageDetectInfo> &output_vec: holds the final results; this is
// my own format -- change it to whatever format you need
void detectBatch::unmold_detections(std::vector<tensorflow::Tensor> &output_tensors,
                                    std::vector<imageDetectInfo> &output_vec)
{
    // the network's detection boxes are the first element of output_tensors
    tensorflow::Tensor &detections_tensor = output_tensors[0];
    // get the Eigen-style tensor of detections_tensor; boxes_tensor and
    // detections_tensor point at the same memory
    auto boxes_tensor = detections_tensor.tensor<float,3>();
    // Extract boxes, class_ids, scores, and class-specific masks
    // whose classId is not 0, because 0 is the background
    //std::cout << "resized_tensor is " << resized_tensor.shape() << std::endl;
    //std::cout << "inputAnchorsTensor is " << inputAnchorsTensor.shape() << std::endl;
    //std::cout << "inputMetadataTensor is " << inputMetadataTensor.shape() << std::endl;
    //std::cout << "detections_tensor is " << detections_tensor.shape() << std::endl;
    // boxes_tensor / detections_tensor format: [N, (y1, x1, y2, x2, class_id, score)]
    // iterate over the detections; the first dimension of boxes_tensor is the
    // image count, so this iterates over one batch
    for(int imgNum = 0; imgNum < boxes_tensor.dimension(0); imgNum++)
    {
        std::vector<Eigen::RowVectorXf> noZeroRow; // row-vector container for the non-background boxes
        // iterate over the second dimension, i.e. the number of boxes per image
        for(int boxNum = 0; boxNum < boxes_tensor.dimension(1); boxNum++)
        {
            // class id > 0 means the box is not background
            if (boxes_tensor(imgNum, boxNum, 4) > 0)
            {
                // put the box data into a row vector
                Eigen::RowVectorXf eachrow(boxes_tensor.dimension(2));
                eachrow << boxes_tensor(imgNum, boxNum, 0),
                           boxes_tensor(imgNum, boxNum, 1),
                           boxes_tensor(imgNum, boxNum, 2),
                           boxes_tensor(imgNum, boxNum, 3),
                           boxes_tensor(imgNum, boxNum, 4),
                           boxes_tensor(imgNum, boxNum, 5);
                noZeroRow.push_back(eachrow);
            }
        }
        // create a matrix noZeroMat holding the noZeroRow vectors above
        Eigen::MatrixXf noZeroMat(noZeroRow.size(), 6);
        for(int r = 0; r < noZeroRow.size(); r++)
        {
            noZeroMat.row(r) = noZeroRow[r];
        }
        // noZeroMat is not the final box data yet; it still needs
        // de-normalization, clipping, etc.
        Eigen::MatrixXf boxMat(noZeroMat.rows(), 4);
        Eigen::MatrixXf classSoresMat(noZeroMat.rows(), 2);
        // noZeroMat format: [(y1, x1, y2, x2, class_id, score), ..., ...]
        // take the (y1, x1, y2, x2) part of noZeroMat
        boxMat.block(0, 0, boxMat.rows(), 4) = noZeroMat.block(0, 0, noZeroMat.rows(), 4);
        // take the (class_id, score) part of noZeroMat
        classSoresMat.block(0, 0, classSoresMat.rows(), 2) = noZeroMat.block(0, 4, classSoresMat.rows(), 2);

        // get the window from the image meta
        // fetch the image meta data computed earlier
        auto metaTensor = inputMetadataTensor.tensor<float,2>();

        /* this part mimics Python's norm_boxes(boxes, shape):
           derive windowMat and the scale matrix scale_rMat from the display
           window. This step is actually unnecessary here, since the images
           are pre-cropped and I don't display them in a window. */
        Eigen::MatrixXf windowMat(1, 4);
        Eigen::MatrixXf scale_rMat(1, 4);
        windowMat << metaTensor(0,7), metaTensor(0,8),
                     metaTensor(0,7), metaTensor(0,8);
        scale_rMat << metaTensor(0,9)  - metaTensor(0,7),
                      metaTensor(0,10) - metaTensor(0,8),
                      metaTensor(0,9)  - metaTensor(0,7),
                      metaTensor(0,10) - metaTensor(0,8);
        // get shiftmat
        // boxMat = tmpMat.cwiseQuotient(scaleMat); // unnecessary, because in
        // my case shiftmat is [0,0,0,0] and scale is [1,1,1,1]

        // de-normalize
        // denorm_boxes: mimics Python's denorm_boxes(boxes, shape)
        Eigen::MatrixXf shiftNorm_rMat(1, 4); // empty shift matrix
        Eigen::MatrixXf scaleNorm_rMat(1, 4); // empty scale matrix
        shiftNorm_rMat << 0, 0, 1, 1;         // fill it; shiftNorm_rMat is really a row vector
        scaleNorm_rMat << metaTensor(0,1) - 1, // likewise a row vector
                          metaTensor(0,2) - 1,
                          metaTensor(0,1) - 1,
                          metaTensor(0,2) - 1;
        // replicate the shiftNorm_rMat and scaleNorm_rMat vectors along the
        // column direction to match boxMat's row count
        Eigen::MatrixXf shiftNormMat = shiftNorm_rMat.colwise().replicate(boxMat.rows());
        Eigen::MatrixXf scaleNormMat = scaleNorm_rMat.colwise().replicate(boxMat.rows());
        boxMat = boxMat.cwiseProduct(scaleNormMat); // element-wise multiply
        boxMat = boxMat + shiftNormMat;
        finalboxMat = boxMat; // the final boxMat
        //std::cout << "final box mat is " << finalboxMat << std::endl;

        // put the final data into my own structured format; adapt it to your needs
        struct imageDetectInfo imageDetectInfoTmp;
        for(int i = 0; i < finalboxMat.rows(); i++)
        {
            struct boxInfo boxInfoTmp;
            boxInfoTmp.y1 = (int)(finalboxMat(i,0));
            boxInfoTmp.x1 = (int)(finalboxMat(i,1));
            boxInfoTmp.y2 = (int)(finalboxMat(i,2));
            boxInfoTmp.x2 = (int)(finalboxMat(i,3));
            boxInfoTmp.classId = (int)(classSoresMat(i,0));
            boxInfoTmp.scores = classSoresMat(i,1);
            boxInfoTmp.boxNum = i;
            imageDetectInfoTmp.detectInfo.push_back(boxInfoTmp);
        }
        imageDetectInfoTmp.imageNum = imgNum;
        output_vec[imgNum] = imageDetectInfoTmp;
        //outputsInfo.push_back(imageDetectInfoTmp);
    }
}
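To visualize the results, here's a small usage sketch (my own hypothetical helper, not part of the project's code) that draws the output_vec boxes back onto the batch images with OpenCV:

#include <opencv2/imgproc.hpp>
#include <algorithm>
#include <string>
#include <vector>

// draw every box above a score threshold onto its source image
void drawDetections(std::vector<cv::Mat> &imgs,
                    const std::vector<imageDetectInfo> &output_vec,
                    float scoreThresh = 0.8f)
{
    for (const imageDetectInfo &info : output_vec) {
        if (info.imageNum < 0) continue; // slot of a short batch that was never filled
        cv::Mat &img = imgs[info.imageNum];
        for (const boxInfo &box : info.detectInfo) {
            if (box.scores < scoreThresh) continue;
            cv::rectangle(img, cv::Point(box.x1, box.y1), cv::Point(box.x2, box.y2),
                          cv::Scalar(0, 255, 0), 2);
            cv::putText(img, std::to_string(box.classId) + " " + std::to_string(box.scores),
                        cv::Point(box.x1, std::max(box.y1 - 5, 0)),
                        cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(0, 0, 255), 1);
        }
    }
}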
That's all the pieces. Here are some detection results (the UI was built with Qt). The model isn't fully tuned yet -- mainly the data augmentation and network hyperparameters still need work.
That completes the whole pipeline.
If you want the source code, it's at this link:
If you don't have download credits, you can contact me on WeChat: anuntilforever1314. When adding me, please note your industry + name so we can talk -.- happy to learn together.
I also have datasets for vehicles, license plates, reflective vests, safety helmets, etc. -- link below for anyone interested:
链接:https://pan.baidu.com/s/1mG7X71rngtWqP2tsfFm26A 提取码:5555
My code is messy and I haven't read the papers very carefully; if any experts spot problems, please don't hesitate to point them out.
Summary: Mask R-CNN is a few years old now, and plenty of new detection and segmentation papers keep appearing, but in practice it still performs well across several datasets. I think one reason is its dense anchors. Still, the future looks anchor-free, so anchor-free methods are my next research step...
Next up: converting SSD from PyTorch to TorchScript and calling it from C++ with libtorch.
Appendix: the Qt .pro file
#-------------------------------------------------
#
# Project created by QtCreator 2018-12-18T13:01:09
#
#-------------------------------------------------
QT += core gui
greaterThan(QT_MAJOR_VERSION, 4): QT += widgets
TARGET = codeShow
TEMPLATE = app
qtHaveModule(opengl): QT += opengl
# The following define makes your compiler emit warnings if you use
# any feature of Qt which has been marked as deprecated (the exact warnings
# depend on your compiler). Please consult the documentation of the
# deprecated API in order to know how to port your code away from it.
#DEFINES += QT_DEPRECATED_WARNINGS
# You can also make your code fail to compile if you use deprecated APIs.
# In order to do so, uncomment the following line.
# You can also select to disable deprecated APIs only up to a certain version of Qt.
#DEFINES += QT_DISABLE_DEPRECATED_BEFORE=0x060000 # disables all the APIs deprecated before Qt 6.0.0
DEFINES += COMPILER_MSVC NOMINMAX QT_DEPRECATED_WARNINGS
CONFIG += c++11 thread
SOURCES += \
main.cpp \
mainwindow.cpp \
detectbatch.cpp
HEADERS += \
mainwindow.h \
data_format.h \
detectbatch.h
FORMS += \
mainwindow.ui
# Default rules for deployment.
#qnx: target.path = /tmp/$${TARGET}/bin
#else: unix:!android: target.path = /opt/$${TARGET}/bin
#!isEmpty(target.path): INSTALLS += target
##cuda
#CUDA_DIR = "E:\thirdParty_lib\cuda\install" # Path to cuda toolkit install
#SYSTEM_NAME = x64 # Depending on your system either 'Win32', 'x64', or 'Win64'
#SYSTEM_TYPE = 64 # '32' or '64', depending on your system
#CUDA_ARCH = compute_61 # Type of CUDA architecture
#CUDA_CODE = sm_61
#NVCC_OPTIONS = --use_fast_math
## include paths
#INCLUDEPATH += "$$CUDA_DIR/include" \
#"D:\software\cuda_install\common\inc"
## library directories
#QMAKE_LIBDIR += "$$CUDA_DIR/lib/x64"
## The following makes sure all path names (which often include spaces) are put between quotation marks
#CUDA_INC = $$join(INCLUDEPATH,'" -I"','-I"','"')
## Add the necessary libraries
#CUDA_LIB_NAMES += \
#cuda \
#cudart \
#MSVCRT
##CUDA_LIB_NAMES += \
##cublas \
##cublas_device \
##cuda \
##cudadevrt \
##cudart \
##cudart_static \
##cufft \
##cufftw \
##curand \
##cusolver \
##cusparse \
##nppc \
##nppial \
##nppicc \
##nppicom \
##nppidei \
##nppif \
##nppig \
##nppim \
##nppist \
##nppisu \
##nppitc \
##npps \
##nvblas \
##nvcuvid \
##nvgraph \
##nvml \
##nvrtc \
##OpenCL \
##kernel32 \
##user32 \
##gdi32 \
##winspool \
##comdlg32 \
##advapi32 \
##shell32 \
##ole32 \
##oleaut32 \
##uuid \
##odbc32 \
##odbccp32 \
##ucrt \
##MSVCRT
#for(lib, CUDA_LIB_NAMES) {
# CUDA_LIBS += $$lib.lib
#}
#for(lib, CUDA_LIB_NAMES) {
# NVCC_LIBS += -l$$lib
#}
#LIBS += $$NVCC_LIBS
## The following library conflicts with something in Cuda
#QMAKE_LFLAGS_RELEASE = /NODEFAULTLIB:msvcrt.lib
#QMAKE_LFLAGS_DEBUG = /NODEFAULTLIB:msvcrtd.lib
## MSVCRT link option (static or dynamic, it must be the same with your Qt SDK link option)
#MSVCRT_LINK_FLAG_DEBUG = "/MDd"
#MSVCRT_LINK_FLAG_RELEASE = "/MD"
##MSVCRT_LINK_FLAG_DEBUG = "/MTd"
##MSVCRT_LINK_FLAG_RELEASE = "/MT"
## Configuration of the Cuda compiler
#CONFIG(debug, debug|release) {
# # Debug mode
# DESTDIR = debug
# OBJECTS_DIR = debug/obj
# CUDA_OBJECTS_DIR = debug/cuda
# cuda_d.input = CUDA_SOURCES
# cuda_d.output = $$CUDA_OBJECTS_DIR/${QMAKE_FILE_BASE}_cuda.o
# cuda_d.commands = $$CUDA_DIR/bin/nvcc.exe -D_DEBUG $$NVCC_OPTIONS $$CUDA_INC $$LIBS \
# --machine $$SYSTEM_TYPE -arch=$$CUDA_ARCH -code=$$CUDA_CODE \
# --compile -cudart static -g -DWIN32 -D_MBCS \
# -Xcompiler "/wd4819,/EHsc,/W3,/nologo,/Od,/Zi,/RTC1" \
# -Xcompiler $$MSVCRT_LINK_FLAG_DEBUG \
# -c -o ${QMAKE_FILE_OUT} ${QMAKE_FILE_NAME}
# cuda_d.dependency_type = TYPE_C
# QMAKE_EXTRA_COMPILERS += cuda_d
#}
#else {
# # Release mode
# DESTDIR = release
# OBJECTS_DIR = release/obj
# CUDA_OBJECTS_DIR = release/cuda
# cuda.input = CUDA_SOURCES
# cuda.output = $$CUDA_OBJECTS_DIR/${QMAKE_FILE_BASE}_cuda.o
# cuda.commands = $$CUDA_DIR/bin/nvcc.exe $$NVCC_OPTIONS $$CUDA_INC $$LIBS \
# --machine $$SYSTEM_TYPE -arch=$$CUDA_ARCH -code=$$CUDA_CODE \
# --compile -cudart static -D_MBCS \
# -Xcompiler "/wd4819,/EHsc,/W3,/nologo,/O2,/Zi" \
# -Xcompiler $$MSVCRT_LINK_FLAG_RELEASE \
# -c -o ${QMAKE_FILE_OUT} ${QMAKE_FILE_NAME}
# cuda.dependency_type = TYPE_C
# QMAKE_EXTRA_COMPILERS += cuda
#}
LIBS += -LE:\Maidipu\code\tensorflow_1_8_gpu\lib -ltensorflow
INCLUDEPATH +=D:\Code-software\opencv_3_3_0\build\install\include \
D:\Code-software\opencv_3_3_0\build\install\include\opencv2 \
D:\Code-software\opencv_3_3_0\build\install\include\opencv \
E:\Maidipu\code\tensorflow_1_8_gpu\include
LIBS += -LD:\Code-software\opencv_3_3_0\build\install\x64\vc12\lib -lopencv_core330 -lopencv_imgproc330 \
-lopencv_imgcodecs330 -lopencv_highgui330