CUDA-PointPillars有时崩溃的原因

15 篇文章 5 订阅
13 篇文章 1 订阅
博客讲述了在使用CUDA-PointPillars时遇到的程序崩溃问题,问题出现在预处理kernel中,由于浮点数转换为整数时的舍入误差导致数组访问越界。作者发现了两个缺陷,一是浮点到整数转换可能导致voxel_idx或voxel_idy越界,二是未检查voxels数量是否超过最大限制。经过修复,崩溃问题得到解决,并已向官方报告问题等待正式修复。
摘要由CSDN通过智能技术生成

集成了CUDA-PointPillars的程序有时突然崩溃,而且崩溃的位置怪异,发生在CUDA的cudaDeviceSynchronize()或cudaStreamsynchronize()或cudaEventSynchronize、cudaEventRecord()之类的API里,反正是平时不大可能发生的地方,凭以往经验直觉,原因应该不是在这些API里,而是因该其他地方有非法访问把内存搞乱了,调查后发现果然是CUDA-PointPillars里的预处理部分代码有漏洞:

__global__ void generateVoxels_random_kernel(float *points, size_t points_size,
        float min_x_range, float max_x_range,
        float min_y_range, float max_y_range,
        float min_z_range, float max_z_range,
        float pillar_x_size, float pillar_y_size, float pillar_z_size,
        int grid_y_size, int grid_x_size,
        unsigned int *mask, float *voxels)
{
  int point_idx = blockIdx.x * blockDim.x + threadIdx.x;
  if(point_idx >= points_size) return;

  float4 point = ((float4*)points)[point_idx];

  if(point.x<min_x_range||point.x>=max_x_range
    || point.y<min_y_range||point.y>=max_y_range
    || point.z<min_z_range||point.z>=max_z_range) return;

  int voxel_idx = floorf((point.x - min_x_range)/pillar_x_size);
  int voxel_idy = floorf((point.y - min_y_range)/pillar_y_size);

  //在GPU上(CPU上表现不同),上面这里floorf()舍入精度非常高(小数点后位数多于5)的数据时可能会导致voxel_idx或voxel_idy的值越界,把float类型数值赋值给int类型变量也会发生舍入+1,从而造成voxel_idx可能等于grid_x_size或者voxel_idy等于grid_y_size,从而造成下面对mask和voxels数组的访问发生越界访问,把内存弄脏而发生崩溃!
  unsigned int voxel_index = voxel_idy * grid_x_size
                            + voxel_idx;

  unsigned int point_id = atomicAdd(&(mask[voxel_index]), 1);

  if(point_id >= POINTS_PER_VOXEL) return;
  float *address = voxels + (voxel_index*POINTS_PER_VOXEL + point_id)*4;
  atomicExch(address+0, point.x);
  atomicExch(address+1, point.y);
  atomicExch(address+2, point.z);
  atomicExch(address+3, point.w);
}
__global__ void generateBaseFeatures_kernel(unsigned int *mask, float *voxels,
        int grid_y_size, int grid_x_size,
        unsigned int *pillar_num,
        float *voxel_features,
        unsigned int *voxel_num,
        unsigned int *voxel_idxs)
{
  unsigned int voxel_idx = blockIdx.x * blockDim.x + threadIdx.x;
  unsigned int voxel_idy = blockIdx.y * blockDim.y + threadIdx.y;

  if(voxel_idx >= grid_x_size ||voxel_idy >= grid_y_size) return;

  unsigned int voxel_index = voxel_idy * grid_x_size
                           + voxel_idx;
  unsigned int count = mask[voxel_index];
  if( !(count>0) ) return;
  count = count<POINTS_PER_VOXEL?count:POINTS_PER_VOXEL;

  unsigned int current_pillarId = 0;
  current_pillarId = atomicAdd(pillar_num, 1);
  //这里没有对current_pillarId做是否达于等于MAX_VOXELS的检查!当点云足够多分布的范围足够大,以至于voxelization得到的voxels大于MAX_VOXELS时,下面对voxel_features数组的访问会发生越界访问!从而把内存弄脏造成崩溃!
  voxel_num[current_pillarId] = count;

  uint4 idx = {0, 0, voxel_idy, voxel_idx};
  ((uint4*)voxel_idxs)[current_pillarId] = idx;

  for (int i=0; i<count; i++){
    int inIndex = voxel_index*POINTS_PER_VOXEL + i;
    int outIndex = current_pillarId*POINTS_PER_VOXEL + i;
    ((float4*)voxel_features)[outIndex] = ((float4*)voxels)[inIndex];
  }

  // clear buffer for next infer
  atomicExch(mask + voxel_index, 0);
}

尝试做了个fix后crash消失了: Fixed two crashes caused by two defects in the preprocessing kernel f… · arnoldfychen/CUDA-PointPillars@2c9c4a1 · GitHub  

报告了问题等待官方正式解决:​​​​​​P​​​​​​​Found two crashes caused by two defects in preprocess_kernels.cu · Issue #79 · NVIDIA-AI-IOT/CUDA-PointPillars · GitHub​​​​​​​

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

Arnold-FY-Chen

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值