[Repost] Summary of YOLOv5 TensorRT precision alignment

I recently needed TensorRT acceleration for a YOLO project. Testing showed that C++ TensorRT inference with the engine file gave results that differed noticeably from pt inference inside YOLO: some images matched, some had large confidence gaps, and some objects were simply not detected on the TRT side.
My engine was generated via pt --> wts --> engine.


Note: I only changed two things: making the C++ parameters match torch, and changing the BN layer eps from 1e-3 to 1e-5.
My input images are all square, so I did not touch the other coordinate handling. The original author used rectangular images, so they had to change more places.


The author below is really impressive; many thanks to this article for solving my problem.
Blog link: https://blog.csdn.net/qq_35756383/article/details/126787282



/// ---------- dividing line: original article below ---------- ///

This article aligns the precision of the C++ TensorRT inference code with YOLOv5 v6.1, using yolov5-l as the example.

yolov5: https://github.com/ultralytics/yolov5

tensorrtx: https://github.com/wang-xinyu/tensorrtx (implementation of popular deep learning networks with the TensorRT network definition API)

Code for this article: yolov5-tenssort (TensorRT C++ inference precision alignment for yolov5 v6.1)

Experimental environment

  • Ubuntu20.04
  • TensorRT-7.2.3.4
  • OpenCV 3.4.8 (C++), 4.6.0 (torch)
  • CUDA11.1
  • RTX3060

Getting TensorRT running


   
   
git clone https://github.com/wang-xinyu/tensorrtx.git
cd tensorrtx/yolov5
mkdir build
cd build

Edit CMakeLists.txt to point to your CUDA and TensorRT paths and to the correct OpenCV version:

Run cmake:

cmake ..
   
   

Modify the corresponding parameters in the network to fit your own dataset:

yolov5.cpp:

yololayer.h:
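The original post shows the two files as screenshots. As a rough sketch of what to look for (names follow the tensorrtx repo; the exact values depend on your dataset and input size, and the two thresholds are changed later in this article to match torch):

// yololayer.h
static constexpr int CLASS_NUM = 80;   // set to the number of classes in your dataset
static constexpr int INPUT_H = 640;    // network input height
static constexpr int INPUT_W = 640;    // network input width

// yolov5.cpp
#define NMS_THRESH 0.4     // NMS IoU threshold
#define CONF_THRESH 0.5    // confidence threshold
#define BATCH_SIZE 1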

Compile; this creates a libmyplugins.so file and a yolov5 executable under the build directory:

make
   
   

Following README.md, in the yolov5 repo convert the trained weight file best.pt into best.wts with gen_wts.py, then place it under tensorrtx/yolov5/build:


   
   
git clone https://github.com/ultralytics/yolov5
cd yolov5
# change the device in gen_wts.py (around line 28) from cpu to gpu:
device = select_device('0')
cp <path>/tensorrtx/yolov5/gen_wts.py ./
python gen_wts.py -w best.pt -o best.wts
cp best.wts <dir_path>/tensorrtx/yolov5/build/
cd <path>/tensorrtx/yolov5/build

Build the engine; this produces the TensorRT model best.engine under the build directory:

./yolov5 -s best.wts best.engine l
   
   

Load the .engine file and run inference on the images in the given directory:

./yolov5 -d best.engine <imgs_dir>
   
   

The inference results are written to the build directory and the inference time is printed.

Precision comparison between torch and TensorRT

1. TensorRT inference results

Add txt output to the C++ code:


   
   
// -------yolov5.cpp  main(~)
std::string out_path;
// below cv::putText(~)
out_path = "_" + file_names[f - fcount + 1 + b];
write2txt(out_path.replace(out_path.find("."), 4, ".txt"), std::to_string((int)res[j].class_id), std::to_string(res[j].conf), r);

// -------common.hpp
void write2txt(std::string txtpath, std::string cls, std::string conf, cv::Rect r) {
    std::ofstream ofs;
    ofs.open(txtpath, std::ios::app);  // std::ios::app appends instead of overwriting
    // convert the rect to corner coordinates
    int xmin, xmax, ymin, ymax;
    xmin = (int)r.x;
    ymin = (int)r.y;
    xmax = (int)(r.x + r.width);
    ymax = (int)(r.y + r.height);
    ofs << cls << " " << conf << " " << xmin << " " << ymin << " " << xmax << " " << ymax << std::endl;  // endl writes a newline
    ofs.close();
}

Change the C++ parameter values to match torch:


   
   
// yolov5.cpp
#define NMS_THRESH 0.45
#define CONF_THRESH 0.25
// yololayer.h
static constexpr float IGNORE_THRESH = 0.25f;

Run inference on the image; the output is:

0 0.926520 52 408 214 874
0 0.906347 214 412 321 860
0 0.870304 676 483 810 872
0 0.863786 0 621 63 868
45 0.950376 -50 101 883 817
55 0.904248 1 253 34 327

2. torch inference results

Run inference with yolov5/detect.py:

python detect.py --weights best.pt --source bus.jpg --save-txt --save-conf
   
   

The results are saved under runs/detect/exp/:

3 0.832716 0.618981 0.0382716 0.0824074 0.291247
0 0.041358 0.687963 0.082716 0.246296 0.602841
0 0.0240741 0.386111 0.045679 0.109259 0.658574
0 0.919136 0.618056 0.159259 0.369444 0.77239
55 0.0209877 0.268056 0.0419753 0.0694444 0.893587
0 0.327778 0.588426 0.166667 0.417593 0.907808
0 0.164815 0.592593 0.196296 0.431481 0.932
45 0.5 0.418519 1 0.681481 0.981999

To make comparison easier, change the txt format that detect.py saves:


   
   
# Write results
for *xyxy, conf, cls in reversed(det):
    c = int(cls)  # integer class
    if save_txt:  # Write to file
        line = (c, conf, *xyxy) if save_conf else (cls, *xyxy)
        with open(f'{txt_path}.txt', 'a') as f:
            f.write(('%s ') % line[0])
            f.write(('%g ' * (len(line) - 1)).rstrip() % line[1:] + '\n')
    if save_img or save_crop or view_img:  # Add bbox to image
        label = None if hide_labels else (names[c] if hide_conf else f'{names[c]} {conf:.2f}')
        annotator.box_label(xyxy, label, color=colors(c, True))
        if save_crop:
            save_one_box(xyxy, imc, file=save_dir / 'crops' / names[c] / f'{p.stem}.jpg', BGR=True)

The torch output in the new format (class conf xmin ymin xmax ymax):
3 0.291247 659 624 690 713
0 0.602841 0 610 67 876
0 0.658574 1 358 38 476
0 0.77239 680 468 809 867
55 0.893587 0 252 34 327
0 0.907808 198 410 333 861
0 0.932 54 407 213 873
45 0.981999 0 84 810 820

3. Comparing the results

For the same image, the C++ and torch results differ both in the number of detected objects and in every value, so the discrepancy needs to be tracked down.

Troubleshooting and fixes

1. Image preprocessing

From the code, torch uses 640x* rectangular inference with a padding value of 114:


   
   
# utils/augmentations.py
def letterbox(im, new_shape=(640, 640), color=(114, 114, 114), auto=True, scaleFill=False, scaleup=True, stride=32):
    # Resize and pad image while meeting stride-multiple constraints
    shape = im.shape[:2]  # current shape [height, width]
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)

    # Scale ratio (new / old)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
    if not scaleup:  # only scale down, do not scale up (for better val mAP)
        r = min(r, 1.0)

    # Compute padding
    ratio = r, r  # width, height ratios
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding
    if auto:  # minimum rectangle
        dw, dh = np.mod(dw, stride), np.mod(dh, stride)  # wh padding
    elif scaleFill:  # stretch
        dw, dh = 0.0, 0.0
        new_unpad = (new_shape[1], new_shape[0])
        ratio = new_shape[1] / shape[1], new_shape[0] / shape[0]  # width, height ratios

    dw /= 2  # divide padding into 2 sides
    dh /= 2

    if shape[::-1] != new_unpad:  # resize
        im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border
    return im, ratio, (dw, dh)

The input image is first resized, using bilinear interpolation, so that its longer side becomes 640; the shorter side is then padded up to the nearest multiple of 32. For the 810x1080 bus.jpg, for example, this gives a 480x640 input, and since 480 is already a multiple of 32 no extra padding is added.

The C++ code, by contrast, uses a 640x640 letterbox with a padding value of 128:


   
   
// preprocess.cu
__global__ void warpaffine_kernel(~) {
    ...
    float src_x = m_x1 * dx + m_y1 * dy + m_z1 + 0.5f;
    float src_y = m_x2 * dx + m_y2 * dy + m_z2 + 0.5f;
    ...
}

void preprocess_kernel_img(~) {
    ...
    warpaffine_kernel<<<blocks, threads, 0, stream>>>(
        src, src_width * 3, src_width,
        src_height, dst, dst_width,
        dst_height, 128, d2s, jobs);
}

Since switching the C++ side to dynamic input shapes would be fairly involved, here the two sides are only aligned on a fixed 640x640 input.

Disable rectangular inference on the torch side:


   
   
# utils/augmentations.py --> letterbox(~)
if auto:  # minimum rectangle
    pass
    # dw, dh = np.mod(dw, stride), np.mod(dh, stride)  # wh padding

Change the C++ padding value to 114:


   
   
// preprocess.cu
void preprocess_kernel_img(~) {
    ...
    warpaffine_kernel<<<blocks, threads, 0, stream>>>(
        src, src_width * 3, src_width,
        src_height, dst, dst_width,
        dst_height, 114, d2s, jobs);
}

Add code on both sides to print the preprocessed image values for inspection:


   
   
# utils/datasets.py
class LoadImages:
    ...
    # Padded resize
    img = letterbox(img0, self.img_size, stride=self.stride, auto=self.auto)[0]
    # print the RGB values of the 10x10 patch starting at (400, 400)
    for i in range(400, 410):
        for j in range(400, 410):
            print("{}, {}, {}; ".format(img[i][j][0], img[i][j][1], img[i][j][2]), end='')
        print()
    ...

   
   
// yolov5.cpp
// image preprocessing
preprocess_kernel_img(img_device, img.cols, img.rows, buffer_idx, INPUT_W, INPUT_H, stream);
// copy the preprocessed result back to the CPU
float* recvCPU = (float*)malloc(size_image_dst * sizeof(float));
CUDA_CHECK(cudaMemcpy(recvCPU, buffer_idx, size_image_dst * sizeof(float), cudaMemcpyDeviceToHost));
cv::Mat resize_img(INPUT_H, INPUT_W, CV_8UC3);
for (int i = 0; i < INPUT_H; ++i) {
    cv::Vec3b *p2 = resize_img.ptr<cv::Vec3b>(i);
    for (int j = 0; j < INPUT_W; ++j) {
        p2[j][2] = round(recvCPU[i * INPUT_W + j] * 255);
        p2[j][1] = round(recvCPU[INPUT_W * INPUT_H + i * INPUT_W + j] * 255);
        p2[j][0] = round(recvCPU[2 * INPUT_W * INPUT_H + i * INPUT_W + j] * 255);
    }
}
for (int i = 400; i < 410; i++) {
    uchar *data = resize_img.ptr<uchar>(i);  // ptr returns the address of row i, convenient for row-by-row access
    for (int j = 400 * 3; j < 400 * 3 + 10 * 3; j++) {  // ideally precompute cols*channels instead of recomputing inside the loop
        std::cout << (int)data[j] << ", ";
    }
    std::cout << "" << std::endl;
}

Compare the preprocessed outputs from the two sides:


   
   
# torch
25, 1, 0; 25, 1, 0; 24, 1, 0; 25, 2, 0; 25, 2, 0; 24, 1, 0; 24, 1, 0; 25, 1, 0; 25, 2, 0; 26, 2, 1;
26, 0, 0; 27, 0, 3; 26, 1, 2; 25, 2, 0; 24, 1, 0; 24, 1, 0; 24, 1, 0; 27, 2, 0; 26, 0, 0; 26, 0, 0;
27, 0, 3; 26, 2, 4; 26, 0, 1; 26, 0, 0; 28, 2, 2; 27, 1, 1; 27, 1, 1; 27, 1, 1; 28, 2, 2; 28, 2, 2;
24, 0, 0; 25, 1, 1; 27, 1, 1; 28, 2, 2; 28, 2, 2; 27, 1, 1; 27, 1, 1; 27, 2, 1; 27, 2, 0; 27, 2, 0;
23, 2, 0; 23, 2, 0; 24, 1, 1; 25, 1, 1; 26, 2, 2; 25, 1, 1; 25, 2, 0; 24, 1, 0; 25, 2, 0; 26, 3, 1;
25, 1, 1; 25, 1, 1; 24, 0, 0; 25, 1, 1; 25, 2, 0; 25, 2, 0; 25, 2, 0; 26, 3, 1; 25, 2, 0; 25, 2, 0;
25, 1, 2; 26, 1, 2; 25, 1, 1; 24, 1, 0; 24, 2, 0; 24, 2, 0; 24, 2, 0; 24, 2, 0; 25, 3, 0; 26, 5, 0;
24, 0, 0; 25, 1, 2; 23, 1, 0; 23, 2, 0; 23, 2, 0; 23, 2, 0; 23, 2, 0; 24, 4, 2; 24, 4, 0; 24, 4, 0;
24, 3, 1; 22, 1, 0; 24, 3, 1; 23, 2, 1; 22, 1, 0; 23, 2, 0; 23, 3, 0; 24, 4, 0; 22, 2, 0; 25, 5, 1;
25, 3, 2; 23, 2, 1; 26, 5, 1; 26, 6, 2; 25, 4, 2; 28, 7, 5; 24, 3, 1; 29, 8, 6; 27, 6, 4; 28, 7, 4;
// c++
26, 1, 0, 26, 1, 0, 25, 1, 0, 25, 2, 0, 24, 1, 0, 24, 1, 0, 24, 1, 0, 25, 1, 0, 26, 2, 1, 27, 2, 1,
26, 0, 0, 27, 0, 3, 26, 2, 2, 25, 1, 0, 25, 1, 0, 24, 1, 0, 25, 1, 0, 27, 2, 0, 26, 0, 0, 26, 0, 0,
27, 0, 3, 26, 2, 4, 26, 0, 0, 26, 0, 0, 28, 2, 2, 27, 1, 1, 28, 2, 2, 27, 1, 1, 28, 2, 2, 28, 2, 2,
24, 0, 0, 25, 1, 1, 27, 1, 1, 29, 3, 3, 28, 2, 2, 27, 1, 1, 27, 1, 1, 27, 2, 0, 28, 3, 1, 28, 3, 1,
23, 2, 0, 23, 2, 0, 25, 1, 1, 26, 2, 2, 26, 2, 2, 25, 1, 1, 25, 2, 0, 25, 2, 0, 25, 2, 0, 26, 3, 1,
25, 1, 1, 25, 1, 1, 24, 0, 0, 25, 2, 1, 25, 2, 0, 25, 2, 0, 25, 2, 0, 26, 3, 1, 25, 2, 0, 25, 3, 0,
25, 1, 2, 26, 0, 2, 25, 1, 1, 24, 2, 0, 24, 2, 0, 24, 2, 0, 24, 2, 0, 24, 2, 0, 25, 4, 0, 26, 5, 0,
24, 1, 0, 24, 1, 2, 23, 2, 0, 23, 2, 0, 23, 2, 0, 23, 2, 0, 23, 2, 0, 24, 4, 1, 24, 4, 0, 24, 4, 0,
24, 3, 1, 22, 1, 0, 25, 4, 2, 23, 2, 0, 22, 1, 0, 23, 2, 0, 23, 3, 0, 24, 4, 0, 22, 2, 0, 26, 6, 1,
24, 3, 2, 23, 2, 1, 26, 6, 1, 26, 5, 2, 25, 4, 2, 29, 8, 6, 24, 3, 1, 30, 9, 7, 28, 7, 5, 29, 8, 6,

The values still do not match.

According to the Zhihu article "One article to thoroughly explain bilinear interpolation", making the geometric centers of the two images coincide corresponds to the following mapping:
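(The original post shows the formula as an image; written out in its standard center-aligned form, consistent with the code change below, a destination pixel (dst_x, dst_y) maps back to source coordinates as:)

src_x = (dst_x + 0.5) * (src_w / dst_w) - 0.5
src_y = (dst_y + 0.5) * (src_h / dst_h) - 0.5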

So modify the bilinear interpolation in the C++ implementation accordingly:


   
   
// preprocess.cu
__global__ void warpaffine_kernel(~) {
    ...
    // float src_x = m_x1 * dx + m_y1 * dy + m_z1 + 0.5f;
    // float src_y = m_x2 * dx + m_y2 * dy + m_z2 + 0.5f;
    // coordinates on the source image corresponding to a point on the destination image
    float src_x = m_x1 * (dx + 0.5f) + m_y1 * (dy + 0.5f) + m_z1 - 0.5f;
    float src_y = m_x2 * (dx + 0.5f) + m_y2 * (dy + 0.5f) + m_z2 - 0.5f;
    ...
}

Compare the preprocessed outputs again:


   
   
# torch
25, 1, 0; 25, 1, 0; 24, 1, 0; 25, 2, 0; 25, 2, 0; 24, 1, 0; 24, 1, 0; 25, 1, 0; 25, 2, 0; 26, 2, 1;
26, 0, 0; 27, 0, 3; 26, 1, 2; 25, 2, 0; 24, 1, 0; 24, 1, 0; 24, 1, 0; 27, 2, 0; 26, 0, 0; 26, 0, 0;
27, 0, 3; 26, 2, 4; 26, 0, 1; 26, 0, 0; 28, 2, 2; 27, 1, 1; 27, 1, 1; 27, 1, 1; 28, 2, 2; 28, 2, 2;
24, 0, 0; 25, 1, 1; 27, 1, 1; 28, 2, 2; 28, 2, 2; 27, 1, 1; 27, 1, 1; 27, 2, 1; 27, 2, 0; 27, 2, 0;
23, 2, 0; 23, 2, 0; 24, 1, 1; 25, 1, 1; 26, 2, 2; 25, 1, 1; 25, 2, 0; 24, 1, 0; 25, 2, 0; 26, 3, 1;
25, 1, 1; 25, 1, 1; 24, 0, 0; 25, 1, 1; 25, 2, 0; 25, 2, 0; 25, 2, 0; 26, 3, 1; 25, 2, 0; 25, 2, 0;
25, 1, 2; 26, 1, 2; 25, 1, 1; 24, 1, 0; 24, 2, 0; 24, 2, 0; 24, 2, 0; 24, 2, 0; 25, 3, 0; 26, 5, 0;
24, 0, 0; 25, 1, 2; 23, 1, 0; 23, 2, 0; 23, 2, 0; 23, 2, 0; 23, 2, 0; 24, 4, 2; 24, 4, 0; 24, 4, 0;
24, 3, 1; 22, 1, 0; 24, 3, 1; 23, 2, 1; 22, 1, 0; 23, 2, 0; 23, 3, 0; 24, 4, 0; 22, 2, 0; 25, 5, 1;
25, 3, 2; 23, 2, 1; 26, 5, 1; 26, 6, 2; 25, 4, 2; 28, 7, 5; 24, 3, 1; 29, 8, 6; 27, 6, 4; 28, 7, 4;
// c++
25, 1, 0, 25, 1, 0, 25, 1, 0, 25, 2, 0, 25, 2, 0, 24, 1, 0, 24, 1, 0, 25, 1, 0, 26, 2, 0, 26, 2, 1,
26, 0, 0, 27, 0, 3, 26, 1, 2, 25, 2, 0, 24, 1, 0, 24, 1, 0, 25, 1, 0, 27, 2, 0, 26, 0, 0, 26, 0, 0,
27, 0, 3, 26, 2, 4, 26, 0, 1, 26, 0, 0, 28, 2, 2, 27, 1, 1, 27, 1, 1, 27, 1, 1, 28, 2, 2, 28, 2, 2,
24, 0, 0, 25, 1, 1, 27, 1, 1, 28, 2, 2, 28, 2, 2, 27, 1, 1, 27, 1, 1, 27, 2, 1, 27, 2, 0, 27, 2, 0,
23, 2, 0, 23, 2, 0, 24, 1, 1, 25, 1, 1, 26, 2, 2, 25, 1, 1, 25, 2, 0, 24, 1, 0, 25, 2, 0, 26, 3, 1,
25, 1, 1, 25, 1, 1, 24, 0, 0, 25, 1, 1, 25, 2, 0, 25, 2, 0, 25, 2, 0, 26, 3, 1, 25, 2, 0, 25, 2, 0,
25, 1, 2, 26, 1, 2, 25, 1, 1, 24, 2, 0, 24, 2, 0, 24, 2, 0, 24, 2, 0, 24, 2, 0, 25, 3, 1, 26, 5, 0,
24, 1, 0, 25, 1, 3, 23, 2, 0, 23, 2, 0, 23, 2, 0, 23, 2, 0, 23, 2, 0, 25, 4, 2, 24, 4, 0, 24, 4, 0,
24, 3, 1, 22, 1, 0, 24, 3, 1, 23, 2, 1, 22, 1, 0, 23, 2, 0, 23, 3, 0, 24, 4, 0, 23, 3, 0, 25, 5, 1,
25, 4, 2, 23, 2, 1, 26, 5, 1, 26, 6, 2, 25, 4, 2, 28, 7, 5, 24, 3, 2, 29, 8, 6, 27, 6, 4, 28, 7, 5,

The results are now essentially identical, with only slight differences remaining; the image preprocessing is considered aligned.

2. Network structure

Comparing the torch and C++ network-structure implementations shows nothing unusual. Looking at the BN layer parameters, torch uses the defaults:


   
   
# models/common.py
self.bn = nn.BatchNorm2d(c2)

where eps is 1e-5.

The C++ BN layer uses eps = 1e-3:


   
   
// common.hpp
IScaleLayer* bn = addBatchNorm2d(network, weightMap, *cat->getOutput(0), lname + ".bn", 1e-3);

Change it accordingly.
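A minimal sketch of the corrected call, assuming the surrounding code follows the tensorrtx common.hpp layout:

// common.hpp -- use the same eps as torch's nn.BatchNorm2d default
IScaleLayer* bn = addBatchNorm2d(network, weightMap, *cat->getOutput(0), lname + ".bn", 1e-5);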

3. Network output post-processing

torch:


   
   
# utils/general.py
def non_max_suppression(prediction, conf_thres=0.25, iou_thres=0.45, classes=None, agnostic=False,
                        multi_label=False, labels=(), max_det=300):
    """Runs Non-Maximum Suppression (NMS) on inference results
    Returns:
         list of detections, on (n,6) tensor per image [xyxy, conf, cls]
    """
    nc = prediction.shape[2] - 5  # number of classes
    xc = prediction[..., 4] > conf_thres  # obj_conf > conf_thres

    # Checks
    assert 0 <= conf_thres <= 1, f'Invalid Confidence threshold {conf_thres}, valid values are between 0.0 and 1.0'
    assert 0 <= iou_thres <= 1, f'Invalid IoU {iou_thres}, valid values are between 0.0 and 1.0'

    # Settings
    min_wh, max_wh = 2, 7680  # (pixels) minimum and maximum box width and height
    max_nms = 30000  # maximum number of boxes into torchvision.ops.nms()
    time_limit = 10.0  # seconds to quit after
    redundant = True  # require redundant detections
    multi_label &= nc > 1  # multiple labels per box (adds 0.5ms/img)
    merge = False  # use merge-NMS

    t = time.time()
    output = [torch.zeros((0, 6), device=prediction.device)] * prediction.shape[0]
    for xi, x in enumerate(prediction):  # image index, image inference
        # Apply constraints
        x[((x[..., 2:4] < min_wh) | (x[..., 2:4] > max_wh)).any(1), 4] = 0  # width-height
        x = x[xc[xi]]  # confidence

        # Cat apriori labels if autolabelling
        if labels and len(labels[xi]):
            lb = labels[xi]
            v = torch.zeros((len(lb), nc + 5), device=x.device)
            v[:, :4] = lb[:, 1:5]  # box
            v[:, 4] = 1.0  # conf
            v[range(len(lb)), lb[:, 0].long() + 5] = 1.0  # cls
            x = torch.cat((x, v), 0)

        # If none remain process next image
        if not x.shape[0]:
            continue

        # Compute conf
        x[:, 5:] *= x[:, 4:5]  # conf = obj_conf * cls_conf

        # Box (center x, center y, width, height) to (x1, y1, x2, y2)
        box = xywh2xyxy(x[:, :4])

        # Detections matrix nx6 (xyxy, conf, cls)
        if multi_label:
            i, j = (x[:, 5:] > conf_thres).nonzero(as_tuple=False).T
            x = torch.cat((box[i], x[i, j + 5, None], j[:, None].float()), 1)
        else:  # best class only
            conf, j = x[:, 5:].max(1, keepdim=True)
            x = torch.cat((box, conf, j.float()), 1)[conf.view(-1) > conf_thres]  # conf > conf_thres

        # Filter by class
        if classes is not None:
            x = x[(x[:, 5:6] == torch.tensor(classes, device=x.device)).any(1)]

        # Apply finite constraint
        # if not torch.isfinite(x).all():
        #     x = x[torch.isfinite(x).all(1)]

        # Check shape
        n = x.shape[0]  # number of boxes
        if not n:  # no boxes
            continue
        elif n > max_nms:  # excess boxes
            x = x[x[:, 4].argsort(descending=True)[:max_nms]]  # sort by confidence
        ...

At most max_nms = 30000 boxes enter NMS, each box must have obj_conf greater than conf_thres = 0.25, and the overall conf (= obj_conf * cls_conf) must also exceed conf_thres.

c++:


   
   
// yololayer.cu
__global__ void CalDetection(~) {
    ...
    for (int k = 0; k < CHECK_COUNT; ++k) {
        ...
        if (box_prob < IGNORE_THRESH) continue;
        ...
        int count = (int)atomicAdd(res_count, 1);
        if (count >= maxoutobject) return;
        ...
    }
    ...
}

It only checks obj_conf and never verifies the overall conf, so add:


   
   
// yololayer.cu
__global__ void CalDetection(~) {
    ...
    float max_cls_prob = 0.0;
    for ...   // the existing loop over class scores that finds the best class and its probability
    if (box_prob * max_cls_prob < IGNORE_THRESH) continue;  // conf = obj_conf * cls_conf < thres
    ...
}
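To illustrate the rule being added, here is a self-contained host-side C++ sketch (not the actual CUDA kernel; keep_detection and the toy row are hypothetical): the best class probability is found first, and a detection is kept only when obj_conf * cls_conf clears the threshold.

#include <cstdio>
#include <vector>

// Threshold used in the article (aligned with torch's conf_thres = 0.25).
static const float IGNORE_THRESH = 0.25f;

// One decoded prediction row (probabilities already passed through sigmoid):
// [x, y, w, h, obj_conf, cls0, cls1, ...].
// Returns true (and fills conf/class_id) only when obj_conf * best_cls_prob >= IGNORE_THRESH.
bool keep_detection(const std::vector<float>& row, float& conf, int& class_id) {
    const float box_prob = row[4];              // objectness
    float max_cls_prob = 0.0f;
    class_id = 0;
    for (size_t i = 5; i < row.size(); ++i) {   // best class only, like torch's x[:, 5:].max(1)
        if (row[i] > max_cls_prob) {
            max_cls_prob = row[i];
            class_id = static_cast<int>(i) - 5;
        }
    }
    conf = box_prob * max_cls_prob;             // conf = obj_conf * cls_conf
    return conf >= IGNORE_THRESH;
}

int main() {
    // A toy row with 3 classes: high objectness but weak class scores -> filtered out.
    std::vector<float> row = {320.f, 320.f, 50.f, 80.f, 0.6f, 0.1f, 0.3f, 0.2f};
    float conf; int cls;
    if (keep_detection(row, conf, cls))
        std::printf("kept: class %d conf %.3f\n", cls, conf);
    else
        std::printf("dropped: best conf %.3f < %.2f\n", conf, IGNORE_THRESH);
    return 0;
}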

4. NMS post-processing

torch:


   
   
# utils/general.py
def non_max_suppression(prediction, conf_thres=0.25, iou_thres=0.45, classes=None, agnostic=False,
                        multi_label=False, labels=(), max_det=300):
    ...
        # Batched NMS
        c = x[:, 5:6] * (0 if agnostic else max_wh)  # classes
        boxes, scores = x[:, :4] + c, x[:, 4]  # boxes (offset by class), scores
        i = torchvision.ops.nms(boxes, scores, iou_thres)  # NMS
        if i.shape[0] > max_det:  # limit detections
            i = i[:max_det]
        if merge and (1 < n < 3E3):  # Merge NMS (boxes merged using weighted mean)
            # update boxes as boxes(i,4) = weights(i,n) * boxes(n,4)
            iou = box_iou(boxes[i], boxes) > iou_thres  # iou matrix
            weights = iou * scores[None]  # box weights
            x[i, :4] = torch.mm(weights, x[:, :4]).float() / weights.sum(1, keepdim=True)  # merged boxes
            if redundant:
                i = i[iou.sum(1) > 1]  # require redundancy

        output[xi] = x[i]
    ...

After NMS, if more than max_det boxes remain (detect.py passes max_det = 1000), only the top 1000 boxes by confidence are kept.

In C++, add the same cap on the number of outputs:


   
   
// common.hpp
void nms(~) {
    ...
    for (auto it = m.begin(); it != m.end(); it++) {
        ...
        // keep only the top MAX_OUTPUT_BBOX_COUNT results by conf
        std::sort(res.begin(), res.end(), cmp);
        if (res.size() > Yolo::MAX_OUTPUT_BBOX_COUNT) {
            res.erase(res.begin() + Yolo::MAX_OUTPUT_BBOX_COUNT, res.end());
        }
    }
}

5. Coordinate-conversion post-processing

torch:


   
   
# utils/general.py
def scale_coords(img1_shape, coords, img0_shape, ratio_pad=None):
    # Rescale coords (xyxy) from img1_shape to img0_shape
    if ratio_pad is None:  # calculate from img0_shape
        gain = min(img1_shape[0] / img0_shape[0], img1_shape[1] / img0_shape[1])  # gain = old / new
        pad = (img1_shape[1] - img0_shape[1] * gain) / 2, (img1_shape[0] - img0_shape[0] * gain) / 2  # wh padding
    else:
        gain = ratio_pad[0][0]
        pad = ratio_pad[1]

    coords[:, [0, 2]] -= pad[0]  # x padding
    coords[:, [1, 3]] -= pad[1]  # y padding
    coords[:, :4] /= gain
    clip_coords(coords, img0_shape)
    return coords


def clip_coords(boxes, shape):
    # Clip bounding xyxy bounding boxes to image shape (height, width)
    if isinstance(boxes, torch.Tensor):  # faster individually
        boxes[:, 0].clamp_(0, shape[1])  # x1
        boxes[:, 1].clamp_(0, shape[0])  # y1
        boxes[:, 2].clamp_(0, shape[1])  # x2
        boxes[:, 3].clamp_(0, shape[0])  # y2
    else:  # np.array (faster grouped)
        boxes[:, [0, 2]] = boxes[:, [0, 2]].clip(0, shape[1])  # x1, x2
        boxes[:, [1, 3]] = boxes[:, [1, 3]].clip(0, shape[0])  # y1, y2

c++:


   
   
// common.hpp
cv::Rect get_rect(cv::Mat& img, float bbox[4]) {
    float l, r, t, b;
    float r_w = Yolo::INPUT_W / (img.cols * 1.0);
    float r_h = Yolo::INPUT_H / (img.rows * 1.0);
    if (r_h > r_w) {
        l = bbox[0] - bbox[2] / 2.f;
        r = bbox[0] + bbox[2] / 2.f;
        t = bbox[1] - bbox[3] / 2.f - (Yolo::INPUT_H - r_w * img.rows) / 2;
        b = bbox[1] + bbox[3] / 2.f - (Yolo::INPUT_H - r_w * img.rows) / 2;
        l = l / r_w;
        r = r / r_w;
        t = t / r_w;
        b = b / r_w;
    } else {
        l = bbox[0] - bbox[2] / 2.f - (Yolo::INPUT_W - r_h * img.cols) / 2;
        r = bbox[0] + bbox[2] / 2.f - (Yolo::INPUT_W - r_h * img.cols) / 2;
        t = bbox[1] - bbox[3] / 2.f;
        b = bbox[1] + bbox[3] / 2.f;
        l = l / r_h;
        r = r / r_h;
        t = t / r_h;
        b = b / r_h;
    }
    return cv::Rect(round(l), round(t), round(r - l), round(b - t));
}

The conversion differs slightly from torch's, and it never clips coordinates that fall outside the image.

After modification:


   
   
// common.hpp
float clip_coords(float x, int xmin, int xmax) {
    if (x < xmin) {
        x = xmin;
    }
    if (x > xmax) {
        x = xmax;
    }
    return x;
}

// follows yolov5/utils/general.py xywh2xyxy(~) and scale_coords(~)
cv::Rect get_rect(cv::Mat& img, float bbox[4]) {
    // xc,yc,w,h --> xmin,ymin,xmax,ymax
    float l, r, t, b;
    l = bbox[0] - bbox[2] / 2.f;
    r = bbox[0] + bbox[2] / 2.f;
    t = bbox[1] - bbox[3] / 2.f;
    b = bbox[1] + bbox[3] / 2.f;
    // Rescale coords (xyxy) from dst shape (640x640) to src shape
    float pad[2];
    float gain = std::min((float)Yolo::INPUT_W / (float)img.cols, (float)Yolo::INPUT_H / (float)img.rows);
    pad[0] = (Yolo::INPUT_W - img.cols * gain) / 2;
    pad[1] = (Yolo::INPUT_H - img.rows * gain) / 2;
    l = (l - pad[0]) / gain;  // x padding
    r = (r - pad[0]) / gain;
    t = (t - pad[1]) / gain;  // y padding
    b = (b - pad[1]) / gain;
    // clip out-of-range coordinates
    l = clip_coords(l, 0, img.cols);
    r = clip_coords(r, 0, img.cols);
    t = clip_coords(t, 0, img.rows);
    b = clip_coords(b, 0, img.rows);
    // xmin,ymin,xmax,ymax --> xmin,ymin,w,h
    return cv::Rect(round(l), round(t), round(r - l), round(b - t));
}

6. Result comparison

c++:
45 0.973848 0 126 810 797
0 0.931803 50 408 215 875
55 0.923260 0 254 33 328
0 0.922524 215 412 323 863
0 0.917015 677 485 810 871
0 0.883489 0 622 64 868
3 0.594060 119 768 156 816
torch:
3 0.253175 120 767 158 815
0 0.859743 677 484 810 872
0 0.863529 1 620 63 868
55 0.906701 0 254 33 328
0 0.907883 214 412 321 861
0 0.922536 52 408 214 876
45 0.962665 0 106 810 813

The number of predicted objects now matches and the coordinates basically line up; the confidences still differ somewhat, but the C++ values are consistently higher than torch's.
