An Introduction to PolarNet
Project code on GitHub: https://github.com/edwardzhou130/PolarSeg
Point cloud semantic segmentation plays an important role in the perception stack of autonomous driving. From traditional point cloud clustering and segmentation years ago to the deep-learning methods of recent years, the technology has matured into real-time, end-to-end systems. Earlier examples include SqueezeSeg, which maps the cloud to a 2D range image via spherical projection, and cnn_seg, the bird's-eye-view 2D-grid method open-sourced in Baidu's Apollo and in Autoware. This article introduces a more recent point cloud semantic segmentation method, PolarNet, together with its panoptic extension, Panoptic-PolarNet.
Structurally, PolarNet looks like most point cloud segmentation networks. To cope with the characteristics of point cloud data (sparse spatial distribution, lack of ordering, and the fact that, although the data is 3D, 3D convolutions are impractical due to their compute cost), it first maps the bird's-eye view of the 3D cloud to a 2D image and then runs a fully convolutional encoder-decoder network for per-pixel classification. PolarNet uses the encoder stage of a U-Net to learn features at different scales and the decoder stage to classify each pixel.
As for U-Net and its successors U-Net++ and U-Net3+, their backbones are all based on FCN; they differ only in how shallow and deep features are fused across levels, letting the network capture finer image detail and improving segmentation quality.
The core idea of PolarNet is therefore how to map the bird's-eye view of the 3D cloud to a 2D image. With the same goal, SqueezeSeg projects the 3D cloud onto a sphere to obtain a 2D range image, while cnn_seg projects the bird's-eye view onto the ground as a 2D grid in Cartesian coordinates.
PolarNet's polar-coordinate bird's-eye-view (BEV) projection:
PolarNet partitions the 2D BEV projection with a polar coordinate system (reminiscent of a traditional ground-segmentation algorithm for point clouds that uses a similar scheme, introduced in "25. Intelligent driving: obstacle detection based on point cloud segmentation and clustering"). This differs from algorithms that partition the BEV projection into a 2D grid in Cartesian coordinates, as shown below:
(BEV partitioned in Cartesian coordinates)
(BEV partitioned in polar coordinates)
As shown above, PolarNet replaces the Cartesian grid with a polar one: each projection cell is indexed by the ordered pair (ρ, θ) of radius and azimuth. Partitioning the plane with a polar grid spreads the points more evenly across cells, and distant cells retain more points. For the ground in particular, it tends to keep a single laser scan ring inside a single partition.
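To make the mapping concrete, here is a small sketch (my own illustration, not the paper's code; the cell counts `num_rho`, `num_theta` and the range `max_rho` are made-up values) of how each point's (ρ, θ) pair is quantised into a polar cell index:

```python
import numpy as np

def polar_grid_index(xyz, num_rho=480, num_theta=360, max_rho=50.0):
    # radius and azimuth of each point on the XY plane
    rho = np.sqrt(xyz[:, 0] ** 2 + xyz[:, 1] ** 2)
    theta = np.arctan2(xyz[:, 1], xyz[:, 0])  # in [-pi, pi)
    # quantise into cell indices; the modulo keeps the azimuth index on the ring
    rho_idx = np.clip((rho / max_rho * num_rho).astype(int), 0, num_rho - 1)
    theta_idx = ((theta + np.pi) / (2 * np.pi) * num_theta).astype(int) % num_theta
    return np.stack([rho_idx, theta_idx], axis=1)

pts = np.array([[10.0, 0.0, -1.5], [3.0, 4.0, 0.2]])
print(polar_grid_index(pts))  # one (rho_idx, theta_idx) pair per point
```

Points at the same distance land in the same radial bin regardless of direction, which is why a ground scan ring tends to stay inside one partition.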
The process of partitioning the BEV in polar coordinates (figure from the original paper):
PolarNet trains a simple PointNet (note: not the PointNet semantic segmentation network) to extract features inside each polar grid cell. As the figure shows, the point cloud is processed roughly as follows:
1. Project one frame of the point cloud onto the ground from the BEV perspective to obtain a 2D map, and set up a polar coordinate system centred on the origin of the vehicle body frame.
2. Using a preset angular step θ, split the 0-360° range into 360/θ = N sectors, similar to the sectors of a single disk platter; along the radial direction, draw concentric circles at a preset radial step ρ, similar to the platter's cylinders. Together, the sectors and circles divide the point cloud within the valid polar range into annular-sector grid cells.
3. Run the built-in PointNet on each grid cell to obtain a fixed-length feature representation.
4. Each concentric ring is formed by a series of grid cells joined end to end. To keep the learned features continuous across the ring, the features of the cells in each ring are concatenated in their original order into a ring-shaped feature representation.
5. Feed the ring-shaped features into a ring CNN to learn their overall structure.
A ring CNN is a convolutional network built on ring convolutions, and it is PolarNet's distinctive contribution. An ordinary CNN takes a flat 2D feature map as input; the ring CNN takes features whose two ends are joined, and the kernel convolves around the loop so that no features are missed at the seam of each concentric ring (see the official source code for details).
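The seam-free convolution can be sketched with PyTorch's built-in circular padding (a minimal illustration of the idea on my part, not PolarSeg's actual implementation, which lives in its `BEV_Unet` code):

```python
import torch
import torch.nn.functional as F

def ring_conv2d(x, weight):
    # x: (batch, channels, rho, theta) polar BEV feature map
    # wrap the azimuth (last) axis so the cells at 0 and 360 degrees stay neighbours
    x = F.pad(x, (1, 1, 0, 0), mode='circular')
    # ordinary zero padding on the radial axis
    x = F.pad(x, (0, 0, 1, 1), mode='constant', value=0.0)
    return F.conv2d(x, weight)

x = torch.arange(12.0).reshape(1, 1, 3, 4)  # 3 radial bins, 4 azimuth bins
w = torch.ones(1, 1, 3, 3)
y = ring_conv2d(x, w)                       # same spatial size as x
```

Because of the circular pad, the output at azimuth bin 0 also aggregates features from the last azimuth bin, and gradients flow back across the seam, which is exactly the property described above.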
Polar partitioning
Point clouds become sparser with distance; compared with the conventional BEV grid, the polar partition distributes points more evenly.
1. Polarised bird's-eye view
Motivated by the ring structures that appear in the top view of a LiDAR scan, the authors propose the polar partition shown in Figure 3 in place of the Cartesian partition. Concretely, instead of quantising points in Cartesian coordinates, they take the sensor position as the origin, compute each point's azimuth and radius on the XY plane, and assign the points to grid cells according to the quantised azimuth and radius.
The polar BEV has two benefits. First, it distributes points more evenly. Statistics over the SemanticKITTI validation split show that polar grid cells near the sensor hold far fewer points than their Cartesian counterparts, so the representation is finer in dense regions. With the same number of cells, a conventional BEV cell holds 0.7±3.2 points on average, while a polar BEV cell holds 0.7±1.4. The difference in standard deviation shows that points are, overall, more uniformly spread over the polar grid.
The second benefit is that the more balanced distribution eases the predictor's burden. Because the 2D network output is reshaped into voxels for per-point prediction, some points with different ground-truth labels inevitably land in the same voxel, and some of them will be misclassified no matter what. With the Cartesian BEV, on average 98.75% of the points in each cell share the same label; with the polar BEV the figure rises to 99.3%, meaning fewer points suffer misclassification due to the spatial representation. Since small objects are more likely to be drowned out by the majority label of a voxel, this 0.6% gap can have a larger effect on the final mIoU. An upper-bound study shows the Cartesian BEV reaches at most 97.3% mIoU while the polar BEV reaches 98.5%; the higher ceiling may translate into better downstream performance.
2. Ring convolution
Rather than hand-crafting features for every grid cell, a fixed-length representation captures the point distribution in each cell. It is produced by a learnable simplified PointNet h [22] followed by max pooling; the network contains only fully connected, batch normalisation, and ReLU layers. The feature of the (i, j)-th grid cell in a scan is:
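The equation itself was an image in the original post; reconstructed here following the PolarNet paper's notation (the per-cell feature is the max-pooled PointNet output over the points quantised into that cell):

```latex
\mathrm{fea}^{(i,j)} = \max\left\{\, h(p) \;\middle|\; \left\lfloor \frac{p_x}{w} \right\rfloor = i,\ \left\lfloor \frac{p_y}{l} \right\rfloor = j \,\right\}
```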
where w and l are the quantisation sizes and px, py are point p's position on the map. The position and quantisation sizes can be either polar or Cartesian. The input cloud is not quantised along the z axis; as in PointPillars, the learned representation describes a grid cell's entire vertical column.
If the representation is learned in polar coordinates, the two sides of the feature matrix are connected in physical space along the azimuth axis, as shown in Figure 2. The authors therefore develop a discrete convolution called ring convolution: treating the matrix as connected at the two ends of the azimuth axis, the ring-convolution kernel convolves around the matrix, and gradients on one side can propagate back to the other. Replacing ordinary convolutions with ring convolutions lets a 2D network process the polar grid end to end without ignoring its connectivity, which broadens the range of usable models. Since the network is 2D, the final prediction is also a polar grid whose feature dimension equals the number of quantised height channels times the number of classes; the prediction can then be reshaped into a 4D matrix to compute a voxel-based segmentation loss. With this substitution, most CNNs can technically handle the polar grid; the authors call such a network, trained to process the polarised grid, a ring CNN.
Background of my own project
The segmentation task is to extract the classes of interest from the raw LiDAR point cloud: road surface, buildings, trees, and utility poles, plus background, five classes in total.
Model: the PolarSeg semantic segmentation model. (Given my data's characteristics, I did not use the authors' polar coordinate system but a voxel-based grid-cell partition; call it "voxelseg".)
Input: point clouds from a single-line laser scanner mounted on the vehicle at a 45° tilt, saved as las files of 960k points each.
Output: a class label for every point in the las file; the classified clouds are saved to a given directory.
Code walkthrough:
In the whole pipeline, `fea_compre = compression_model` in `class ptBEVnet` matters a great deal; a poor setting easily hurts accuracy.
Dataloader outputs (expressed in the LiDAR frame):
(voxel_position, processed_label, grid_ind, labels, return_fea)
voxel_position: shape = (3, 360, 256, 64); the x, y, z coordinates of each grid cell (real coordinates relative to the LiDAR frame)
processed_label: shape = (360, 256, 64); the class of each grid cell
grid_ind: shape = (960000, 3); each point's grid-cell index
labels: shape = (960000, 1); each point's label
return_fea: shape = (960000, 7/9); the 7 features are (xyz offsets to the voxel centre, xyz, intensity)
Code change 1: add the following check. Annotators are not always careful and label many stray points; here, if a voxel contains fewer than 4 points, that voxel's label is set to background. Modify `class voxel_dataset` as follows (two schemes; pick either one):
The code:
# Remove stray annotation points ("splashes").
# Scheme 1 (recommended)
interval_temp0 = 4
# enlarge the voxel to interval_temp0 times its size
grid_ind_temp = (np.floor((np.clip(xyz, min_bound, max_bound) - min_bound) / (intervals * interval_temp0))).astype(int)
unq, unq_cnt = torch.unique(torch.from_numpy(grid_ind_temp), return_counts=True, dim=0)
threshold_value = 3
for i, count in enumerate(unq_cnt):
    # point count of the current grid cell
    if count != 0 and count < threshold_value:
        cur_index = unq[i].tolist()
        splashes_statu = True  # assume the cell holds only splash points
        for j, data in enumerate(unq):
            interval_temp = 1
            # point counts of the neighbouring grid cells
            for index_x in range(-interval_temp, interval_temp + 1):
                if splashes_statu == False:
                    break
                for index_y in range(-interval_temp, interval_temp + 1):
                    if splashes_statu == False:
                        break
                    for index_z in range(-interval_temp, interval_temp + 1):
                        if np.array_equal(data.numpy(), np.array([cur_index[0] + index_x, cur_index[1] + index_y, cur_index[2] + index_z])):
                            if unq_cnt[j] >= threshold_value:
                                splashes_statu = False
                            break
        if splashes_statu == True:
            for index_x in range(1, interval_temp0 + 1):
                for index_y in range(1, interval_temp0 + 1):
                    for index_z in range(1, interval_temp0 + 1):
                        processed_label[cur_index[0] * interval_temp0 + index_x, cur_index[1] * interval_temp0 + index_y, cur_index[2] * interval_temp0 + index_z] = 0
'''
# Scheme 2
unq, unq_cnt = torch.unique(torch.from_numpy(grid_ind), return_counts=True, dim=0)
threshold_value = 1
for i, count in enumerate(unq_cnt):
    if count != 0 and count < 2:  # i.e. count == 1
        cur_index = unq[i].tolist()
        splashes_statu = True
        for j, data in enumerate(unq):
            interval_temp = 2
            for index_x in range(-interval_temp, interval_temp + 1):
                if splashes_statu == False:
                    break
                for index_y in range(-interval_temp, interval_temp + 1):
                    if splashes_statu == False:
                        break
                    for index_z in range(-interval_temp, interval_temp + 1):
                        if np.array_equal(data.numpy(), np.array([cur_index[0] + index_x, cur_index[1] + index_y, cur_index[2] + index_z])):
                            if unq_cnt[j] >= threshold_value:
                                splashes_statu = False
                            break
        if splashes_statu == True:
            processed_label[cur_index[0], cur_index[1], cur_index[2]] = 0
'''
Code change 2: in `class voxel_dataset(data.Dataset)`, `grid_ind = (np.floor((np.clip(xyz,min_bound,max_bound)-min_bound)/intervals)).astype(int)` is too crude: points beyond the configured range (max_volume_space = [120,80,94], min_volume_space = [-120,-80,-6]) are simply clamped to the edge voxels. Instead, I drop the out-of-range points, keep them out of the network, and assign them all to background.
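A minimal sketch of that change (my own illustration; the function name is hypothetical): build a mask of in-range points, feed only those to the network, and give the rest the background label.

```python
import numpy as np

def split_in_range(xyz, min_bound, max_bound):
    # True for points inside [min_bound, max_bound) on all three axes;
    # out-of-range points skip the network and get the background class
    return np.all((xyz >= min_bound) & (xyz < max_bound), axis=1)

xyz = np.array([[0.0, 0.0, 1.0], [200.0, 0.0, 1.0]])
mask = split_in_range(xyz,
                      np.array([-120.0, -80.0, -6.0]),
                      np.array([120.0, 80.0, 94.0]))
in_range_points = xyz[mask]   # these go through the network
```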
Code change 3: I have already transformed the las points from geodetic coordinates into relative coordinates in the vehicle frame (i.e. the IMU frame: x points right, y points along the direction of travel, z points up through the roof toward the sky; right-handed). The origin sits at the midpoint of the vehicle trajectory, so it is nearly central.
1. Consequently, in `class voxel_dataset(data.Dataset)` my data volumes are all cuboids with a uniform extent, so I do not want data augmentation to change things drastically. This is to narrow the application scenario and let the model focus on my specific scenes.
2. Given the data's characteristics, remove the z-direction mirror flip from augmentation.
3. In `class voxel_dataset(data.Dataset)`, given the data's characteristics, apply translation augmentation with x, y, z offsets in [-1, 1], as shown below:
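A minimal sketch of this augmentation (my own illustration, not the repository's code):

```python
import numpy as np

def random_translate(xyz, max_shift=1.0, rng=None):
    # shift the whole cloud by one random offset drawn uniformly
    # from [-max_shift, max_shift] on each of x, y, z
    rng = np.random.default_rng() if rng is None else rng
    offset = rng.uniform(-max_shift, max_shift, size=3)
    return xyz + offset

cloud = np.zeros((5, 3))
aug = random_translate(cloud, max_shift=1.0)  # cloud shifted by at most 1 m per axis
```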
Code change 4: adjust the network parameters in `class BEV_Unet`. Because my grid_size[2] = 256 is fairly large, and I also wanted to shrink the network and its compute, I created a UNet_zs class to replace UNet, with the following changes:
class UNet_zs(nn.Module):
    def __init__(self, n_class, n_height, dilation, group_conv, input_batch_norm, dropout, circular_padding, dropblock):
        super(UNet_zs, self).__init__()
        self.inc = inconv(n_height, 128, dilation, input_batch_norm, circular_padding)
        # self.down1 = down(64, 128, dilation, group_conv, circular_padding)
        self.down1 = down(128, 256, dilation, group_conv, circular_padding)
        self.down2 = down(256, 512, dilation, group_conv, circular_padding)
        self.down3 = down(512, 512, dilation, group_conv, circular_padding)
        # self.up1 = up(1024, 256, circular_padding, group_conv = group_conv, use_dropblock=dropblock, drop_p=dropout)
        # self.up2 = up(512, 128, circular_padding, group_conv = group_conv, use_dropblock=dropblock, drop_p=dropout)
        # self.up3 = up(256, 128, circular_padding, group_conv = group_conv, use_dropblock=dropblock, drop_p=dropout)
        # self.dropout = nn.Dropout(p=0. if dropblock else dropout)
        # self.outc = outconv(128, n_class)
        self.up1 = up(1024, 256, circular_padding, group_conv = group_conv, use_dropblock=dropblock, drop_p=dropout)
        self.up2 = up(512, 384, circular_padding, group_conv = group_conv, use_dropblock=dropblock, drop_p=dropout)
        self.up3 = up(512, 256, circular_padding, group_conv = group_conv, use_dropblock=dropblock, drop_p=dropout)
        self.dropout = nn.Dropout(p=0. if dropblock else dropout)
        self.outc = outconv(256, n_class)

    def forward(self, x):
        x1 = self.inc(x)
        x2 = self.down1(x1)
        x3 = self.down2(x2)
        x4 = self.down3(x3)
        x = self.up1(x4, x3)
        x = self.up2(x, x2)
        x = self.up3(x, x1)
        x = self.outc(self.dropout(x))
        return x
My test program
Project background: a single-line laser mounted at a 45° tilt; every 960k scanned points are saved as one las file, and each file is segmented. Note in particular that background counts as its own class.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import os
import time
import argparse
import sys
import errno  # needed by the makedirs error handling below
import numpy as np
import torch
import torch.optim as optim
from tqdm import tqdm
import laspy
import math
from network.BEV_Unet import BEV_Unet
from network.ptBEV import ptBEVnet
from dataloader.dataset import collate_fn_BEV,collate_fn_BEV_test,SemKITTI_zs,SemKITTI_label_name,spherical_dataset,voxel_dataset
#ignore weird np warning
import warnings
warnings.filterwarnings("ignore")
def read_pcd(filepath):
    lidar = []
    with open(filepath, 'r') as f:
        line = f.readline().strip()
        while line:
            linestr = line.split(" ")
            if len(linestr) == 5:
                linestr_convert = list(map(float, linestr))
                lidar.append(linestr_convert)
            line = f.readline().strip()
    return np.array(lidar)
def get_las_path_list(las_dir):
    las_path_list = []
    for las_name in os.listdir(las_dir):
        if las_name.endswith('.las'):
            las_path = os.path.join(las_dir, las_name)
            if os.path.exists(las_path):
                las_path_list.append(las_path)
    return las_path_list
def fast_hist(pred, label, n):
    k = (label >= 0) & (label < n)
    bin_count = np.bincount(
        n * label[k].astype(int) + pred[k], minlength=n ** 2)
    return bin_count[:n ** 2].reshape(n, n)

def per_class_iu(hist):
    return np.diag(hist) / (hist.sum(1) + hist.sum(0) - np.diag(hist))

def fast_hist_crop(output, target, unique_label):
    hist = fast_hist(output.flatten(), target.flatten(), np.max(unique_label) + 1)
    hist = hist[unique_label, :]
    hist = hist[:, unique_label]
    return hist

def SemKITTI2train(label):
    if isinstance(label, list):
        return [SemKITTI2train_single(a) for a in label]
    else:
        return SemKITTI2train_single(label)

def SemKITTI2train_single(label):
    return label - 1  # uint8 trick

def train2SemKITTI(input_label):
    # delete 0 label (uses uint8 trick: 0 - 1 = 255)
    return input_label + 1
def BLH2NEH(m_b, m_l, m_h, TripWidth=3, nFlag=0):
    a = 6378137.0
    f = 298.257223563
    B = m_b
    L = m_l
    H = m_h
    L_ddd_Style = L
    dZoneNumber = math.floor((L_ddd_Style - 1.5) / TripWidth) + 1
    L0 = dZoneNumber * 3.0
    B = math.radians(B)
    L = math.radians(L)
    L0 = math.radians(L0)
    l = L - L0
    e = 2.0 / f - 1.0 / f / f
    e12 = e / (1.0 - e)
    e1 = 1.0 - e
    dsin = math.sin(B)
    dcos = math.cos(B)
    l2 = l * l
    N = a / math.sqrt(1.0 - e * pow(dsin, 2))
    dcos2 = pow(dcos, 2)
    temp_value = 3.0 / 4.0 * e + 45.0 / 64.0 * pow(e, 2) + 175.0 / 256.0 * pow(e, 3) + 11025.0 / 16384.0 * pow(e, 4) + 43659.0 / 65536.0 * pow(e, 5) + 693693.0 / 1048576.0 * pow(e, 6)
    temp_value = temp_value + 2760615.0 / 4194304.0 * pow(e, 7)
    A1 = 1.0 + temp_value
    B1 = temp_value
    C1 = 15.0 / 32.0 * pow(e, 2) + 175.0 / 384.0 * pow(e, 3) + 3675.0 / 8192.0 * pow(e, 4) + 14553.0 / 32768.0 * pow(e, 5) + 231231.0 / 524288.0 * pow(e, 6) + 920205.0 / 2097152.0 * pow(e, 7)
    D1 = 35.0 / 96.0 * pow(e, 3) + 735.0 / 2048.0 * pow(e, 4) + 14553.0 / 40960.0 * pow(e, 5) + 231231.0 / 655360.0 * pow(e, 6) + 184041.0 / 524288.0 * pow(e, 7)
    E1 = 315.0 / 1024.0 * pow(e, 4) + 6237.0 / 20480.0 * pow(e, 5) + 99099.0 / 327680.0 * pow(e, 6) + 552123.0 / 1835008.0 * pow(e, 7)
    F1 = 693.0 / 2560.0 * pow(e, 5) + 11011.0 / 40960.0 * pow(e, 6) + 61347.0 / 229376.0 * pow(e, 7)
    can1 = A1 * a * e1
    can2 = -B1 * a * e1
    can3 = -C1 * a * e1
    can4 = -D1 * a * e1
    can5 = -E1 * a * e1
    can6 = -F1 * a * e1
    X0 = can1 * B
    a0 = can2 + can3 * pow(dsin, 2) + can4 * pow(dsin, 4) + can5 * pow(dsin, 6) + can6 * pow(dsin, 8)
    a41 = 6.0 / 24.0
    a42 = 9.0 / 24.0 * e12
    a43 = 4.0 / 24.0 * pow(e12, 2)
    a4c = -1.0 / 24.0
    a4 = (a41 + (a42 + a43 * dcos2) * dcos2) * dcos2 + a4c
    a61 = -60.0 / 720.0
    a62 = 120.0 / 720.0
    a6c = 1.0 / 720.0
    a6 = (a61 + a62 * dcos2) * dcos2 + a6c
    x84 = X0 + (a0 + (0.5 + (a4 + a6 * l2) * l2) * l2 * N) * dsin * dcos
    a31 = 1.0 / 3.0
    a32 = 1.0 / 6.0 * e12
    a3c = -1.0 / 6.0
    a3 = (a31 + a32 * dcos2) * dcos2 + a3c
    a51 = -20.0 / 120.0
    a52 = 24.0 / 120.0 - 58.0 / 120.0 * e12
    a53 = 72.0 / 120.0 * e12
    a5c = 1.0 / 120.0
    a5 = (a51 + (a52 + a53 * dcos2) * dcos2) * dcos2 + a5c
    y84 = (1.0 + (a3 + a5 * l2) * l2) * l * N * dcos + 500000.0
    if nFlag == 1:
        x84 = 0.9996 * x84
        y84 = 0.9996 * (y84 - 500000.0) + 500000.0
    #
    m_N = x84
    m_E = y84
    m_H = H
    return dZoneNumber, m_N, m_E, m_H
def gnssData2Pose(data):
    dDistZ = data['dAltitude']
    dZone, dDistY, dDistX, _ = BLH2NEH(data['dLatitude'], data['dLongitude'], 0.0)
    dDistX += dZone * 1e6
    a = math.radians(-1.0 * data['dHeading'])
    b = math.radians(data['dPitch'])
    c = math.radians(data['dRoll'])
    pose = np.zeros((4, 4), dtype=np.float64)
    z_matrix = [math.cos(a), -math.sin(a), 0, math.sin(a), math.cos(a), 0, 0, 0, 1]
    z_matrix = np.array(z_matrix).reshape((3, 3))
    x_matrix = [1, 0, 0, 0, math.cos(b), -math.sin(b), 0, math.sin(b), math.cos(b)]
    x_matrix = np.array(x_matrix).reshape((3, 3))
    y_matrix = [math.cos(c), 0, math.sin(c), 0, 1, 0, -math.sin(c), 0, math.cos(c)]
    y_matrix = np.array(y_matrix).reshape((3, 3))
    xyz_matrix = np.matmul(np.matmul(z_matrix, x_matrix), y_matrix)
    pose[0:3, 0:3] = xyz_matrix
    pose[0:3, 3] = np.array([dDistX, dDistY, dDistZ])
    pose[3, :] = np.array([0, 0, 0, 1])
    return pose
def sequential_search(las_secInWeek, nav_data_list):
    absolute_time_seconds_temp = nav_data_list[0]['dGpsWeek'] * 7 * 24 * 3600 + las_secInWeek
    last_line_time_seconds = nav_data_list[-1]['dAbsoluteTime'] / 1000000.0
    interval_time_week = (last_line_time_seconds - absolute_time_seconds_temp) / 3600.0 / 24.0 / 7.0
    las_absolute_time = absolute_time_seconds_temp
    if interval_time_week >= 1.0:
        las_absolute_time = las_absolute_time + 7 * 24 * 3600
    # microseconds
    las_absolute_time = las_absolute_time * 1000000.0
    nav_data_absolute_time_list = []
    for nav_data in nav_data_list:
        dAbsoluteTime = nav_data['dAbsoluteTime']
        nav_data_absolute_time_list.append(dAbsoluteTime)
    start_id = 0
    flag = False
    for nav_data_current_time, nav_data_next_time in zip(nav_data_absolute_time_list[:-1:1], nav_data_absolute_time_list[1::1]):
        if nav_data_current_time <= las_absolute_time < nav_data_next_time:
            flag = True
            break
        start_id = start_id + 1
    if flag is False:
        # nav_data_absolute_time_list holds plain floats, so index it directly
        if las_absolute_time < nav_data_absolute_time_list[0]:
            start_id = 0
        elif las_absolute_time >= nav_data_absolute_time_list[-1]:
            start_id = len(nav_data_absolute_time_list) - 1
    return start_id
# get the vehicle pose at the centre point of the las cloud
def get_las_center_pose_value_zs(las_narray, need_nav_data_list):
    gps_time = las_narray[len(las_narray) // 2, 4]
    start_id = sequential_search(gps_time, need_nav_data_list)
    nav_data = need_nav_data_list[start_id]
    gnss_data = dict()
    key_list = ['dSecInWeek', 'dLongitude', 'dLatitude', 'dAltitude', 'dRoll', 'dPitch', 'dHeading']
    for key in key_list:
        gnss_data[key] = nav_data[key]
    gnss_data['nPoseType'] = 0
    pose = gnssData2Pose(gnss_data)
    return pose
def get_LoadNav(nav_path):
    need_nav_data_list = []
    if not os.path.exists(nav_path):
        print('ERROR::{} not exist.'.format(nav_path))
        return need_nav_data_list
    nav_data_list = []
    with open(nav_path, 'r') as f:
        nav_data_list = f.readlines()
    for nav_data in nav_data_list:
        nav_data = nav_data.strip()
        # tuple
        per_data_list = nav_data.split()
        per_data_list = list(per_data_list)
        # print('per_data_list=', per_data_list)
        # ['2267.000', '283442.000', '30.4024309989', '114.4038392277', '41.877', '0.002', '0.266', '0.007', '-0.1575653763', '1.3607864935', '35.1567443404', '0.020', '0.021', '0.030', '0.000', '0.000', '0.0
        need_nav_data = dict()
        dGpsWeek = float(per_data_list[0])
        dGpsSec = float(per_data_list[1])
        dLatitude = float(per_data_list[2])
        dLongitude = float(per_data_list[3])
        dAltitude = float(per_data_list[4])
        dPitch = float(per_data_list[8])
        dRoll = float(per_data_list[9])
        dHeading = float(per_data_list[10])
        unit_conversion = 1e6
        need_nav_data['dGpsWeek'] = dGpsWeek
        need_nav_data['dGpsSec'] = dGpsSec
        # microseconds
        need_nav_data['dSecInWeek'] = dGpsSec * unit_conversion
        need_nav_data['dAbsoluteTime'] = (dGpsWeek * 7 * 24 * 3600 + dGpsSec) * unit_conversion
        need_nav_data['dLatitude'] = dLatitude
        need_nav_data['dLongitude'] = dLongitude
        need_nav_data['dAltitude'] = dAltitude
        need_nav_data['dPitch'] = dPitch
        need_nav_data['dRoll'] = dRoll
        need_nav_data['dHeading'] = dHeading
        # x, y, z
        dDistZ = dAltitude
        dZone, dDistY, dDistX, _ = BLH2NEH(dLatitude, dLongitude, 0.0)
        dDistX += dZone * 1e6
        need_nav_data['x'] = dDistX
        need_nav_data['y'] = dDistY
        need_nav_data['z'] = dDistZ
        need_nav_data_list.append(need_nav_data)
    # if len(need_nav_data_list) >= 2:
    #     print("Start time: %.2f, End time: %.2f" % (need_nav_data_list[0]['dSecInWeek'], need_nav_data_list[-1]['dSecInWeek']))
    return need_nav_data_list
def get_las_trajectory_min_max_z_value(las_narray, need_nav_data_list):
    gps_time = las_narray[:, 4]
    gps_time_min = np.min(gps_time)
    gps_time_max = np.max(gps_time)
    # print('gps_time_min=', gps_time_min)
    # print('gps_time_max=', gps_time_max)
    las_secInWeek_min = gps_time_min
    start_id = sequential_search(las_secInWeek_min, need_nav_data_list)
    # print('start_id=', start_id)
    start_altitude = need_nav_data_list[start_id]['dAltitude']
    start_point = np.zeros(3)
    start_point[0] = need_nav_data_list[start_id]['x']
    start_point[1] = need_nav_data_list[start_id]['y']
    start_point[2] = need_nav_data_list[start_id]['z']
    # second
    las_secInWeek_max = gps_time_max
    end_id = sequential_search(las_secInWeek_max, need_nav_data_list)
    # print('end_id=', end_id)
    end_altitude = need_nav_data_list[end_id]['dAltitude']
    end_point = np.zeros(3)
    end_point[0] = need_nav_data_list[end_id]['x']
    end_point[1] = need_nav_data_list[end_id]['y']
    end_point[2] = need_nav_data_list[end_id]['z']
    min_z = np.min([start_altitude, end_altitude])
    max_z = np.max([start_altitude, end_altitude])
    mid_id = (start_id + end_id) // 2
    mid_point = np.zeros(3)
    mid_point[0] = need_nav_data_list[mid_id]['x']
    mid_point[1] = need_nav_data_list[mid_id]['y']
    mid_point[2] = need_nav_data_list[mid_id]['z']
    return min_z, max_z, start_point, mid_point, end_point

def get_three_points(start_point, mid_point, end_point):
    distance_a = np.linalg.norm(mid_point - start_point)
    distance_b = np.linalg.norm(end_point - mid_point)
    distance = distance_a + distance_b
    return distance
# save_point_cloud
def save_point_cloud(las_narray, label, out_path, header=None, is_deployment=True):
    # las_narray[:, 0:3] = las_narray[:, 0:3] + header.offsets
    cloud = las_narray[:, 0:3]
    indensity = las_narray[:, 3]
    gps_time = las_narray[:, 4]
    # label = las_narray[:, 5]
    points = np.array(cloud)
    print('points.shape=', points.shape)
    if header is None:
        header = laspy.LasHeader(point_format=3, version="1.2")
        header.offsets = np.min(points[:, 0:3], axis=0)
        print('header.offsets', header.offsets)
        header.scales = np.array([0.01, 0.01, 0.01])
    # with laspy.open(out_path, mode="w", header=header) as writer:
    #     point_record = laspy.ScaleAwarePointRecord.zeros(len(points), header=header)
    #     point_record.x = points[:, 0]
    #     point_record.y = points[:, 1]
    #     point_record.z = points[:, 2]
    #     point_record.intensity = indensity
    #     print('is_deployment=', is_deployment)
    #     if is_deployment is True:
    #         point_record.gps_time = gps_time
    #     writer.write_points(point_record)
    # 2. Create a Las
    # header.add_extra_dim(laspy.ExtraBytesParams(name="intensity", type=np.int32))
    las = laspy.LasData(header)
    las.x = points[:, 0]
    las.y = points[:, 1]
    las.z = points[:, 2]
    las.intensity = indensity
    las.classification = label
    las.gps_time = gps_time
    # las.label = label
    las.write(out_path)
    return
def preprocess(data, grid_size, fixed_volume_space=False, max_volume_space=[60, 50, 76], min_volume_space=[-60, -50, -4]):
    xyz, sig = data[:, :3], data[:, 3]
    max_bound = np.percentile(xyz, 100, axis=0)
    min_bound = np.percentile(xyz, 0, axis=0)
    if fixed_volume_space:
        max_bound = np.asarray(max_volume_space)
        min_bound = np.asarray(min_volume_space)
    grid_size = np.asarray(grid_size)
    # get grid index
    crop_range = max_bound - min_bound  # extent of the cloud
    cur_grid_size = grid_size           # grid resolution
    intervals = crop_range / (cur_grid_size - 1)
    if (intervals == 0).any():
        print("Zero interval!")
    grid_ind = (np.floor((np.clip(xyz, min_bound, max_bound) - min_bound) / intervals)).astype(int)
    # center data on each voxel for PTnet
    voxel_centers = (grid_ind.astype(np.float32) + 0.5) * intervals + min_bound
    return_xyz = xyz - voxel_centers
    return_xyz = np.concatenate((return_xyz, xyz), axis=1)
    return_fea = np.concatenate((return_xyz, sig[..., np.newaxis]), axis=1)
    data_tuple = (grid_ind, return_fea)
    return data_tuple
def main(args):
    data_path = args.data_las_dir
    nav_path = args.data_nav_path
    model_save_path = args.model_save_path
    output_path = args.test_output_path
    fixed_volume_space = args.fixed_volume_space
    compression_model = args.grid_size[2]
    grid_size = args.grid_size
    pytorch_device = torch.device('cuda:0')
    model = args.model
    if model == 'polar':
        fea_dim = 9
        circular_padding = True
    elif model == 'traditional':
        fea_dim = 7
        circular_padding = False
    # prepare miou fun
    # unique_label = np.asarray(sorted(list(SemKITTI_label_name.keys())))[1:] - 1
    # unique_label_str = [SemKITTI_label_name[x] for x in unique_label + 1]
    unique_label = np.arange(5)
    unique_label_str = ["VOID", "ground", "buildings", "trees", "telephone_poles"]
    # unique_label = np.arange(4)
    # unique_label_str = ["ground", "buildings", "trees", "telephone_poles"]
    # prepare model
    my_BEV_model = BEV_Unet(n_class=len(unique_label), n_height=compression_model, input_batch_norm=True, dropout=0.5, circular_padding=circular_padding)
    my_model = ptBEVnet(my_BEV_model, pt_model='pointnet', grid_size=grid_size, fea_dim=fea_dim, max_pt_per_encode=256,
                        out_pt_fea_dim=512, kernal_size=1, pt_selection='random', fea_compre=compression_model)
    if os.path.exists(model_save_path):
        my_model.load_state_dict(torch.load(model_save_path))
    my_model.to(pytorch_device)
    my_model.eval()  # eval() is essential; without it the results looked odd, which took a long time to track down
    # prepare dataset
    # test_pt_dataset = SemKITTI_zs(data_path + '/leador/20231023/', imageset='train_test', return_ref=True)
    # test_dataset = voxel_dataset(test_pt_dataset, grid_size=grid_size, ignore_label=0, fixed_volume_space=args.fixed_volume_space, return_test=True)
    # val_dataset = voxel_dataset(val_pt_dataset, grid_size=grid_size, ignore_label=0, fixed_volume_space=args.fixed_volume_space)
    las_path_list = get_las_path_list(data_path)
    with open('/data/zs/github/segmantic3d/PolarSeg/laspath.txt', 'w') as f:
        # f.writelines([line + '\n' for line in las_path_list])
        f.writelines('\n'.join(las_path_list))
    las_path_list = sorted(las_path_list)
    need_nav_data_list = get_LoadNav(nav_path)
    for i, las_path in enumerate(las_path_list):
        print('debug debug i=', i)
        print('las_path =', las_path)
        las = laspy.read(las_path)
        # las_narray = np.column_stack((las.xyz, las.intensity, las.gps_time, np.zeros(len(las.intensity), dtype=np.int32)))
        las_narray = np.column_stack((las.xyz, las.intensity, las.gps_time))
        print('las_narray.shape', las_narray.shape)
        # z value filter
        z_value = las_narray[:, 2]
        min_z, max_z, start_point, mid_point, end_point = get_las_trajectory_min_max_z_value(las_narray, need_nav_data_list)
        distance = get_three_points(start_point, mid_point, end_point)
        _, out_las_filename = las_path.split('/19/', 1)
        if distance > 3.0:
            out_las_path = output_path + '/result/' + out_las_filename
            if not os.path.exists(os.path.dirname(out_las_path)):
                try:
                    os.makedirs(os.path.dirname(out_las_path))
                except OSError as exc:
                    if exc.errno != errno.EEXIST:
                        raise
            pose = get_las_center_pose_value_zs(las_narray, need_nav_data_list)
            points_homo = np.column_stack((las_narray[:, 0:3], np.ones(len(las_narray), dtype=np.float32)))
            points_homo_imu = np.dot(points_homo, np.linalg.inv(pose).T)[:, 0:3]
            # las_narray[:, 0:3] = points_homo_imu[:, 0:3]
            vhc_data = np.column_stack((points_homo_imu, las.intensity))
            test_grid, test_pt_fea = preprocess(vhc_data, grid_size, fixed_volume_space=fixed_volume_space)
            test_grid = [test_grid]
            test_pt_fea = [test_pt_fea]
            test_pt_fea_ten = [torch.from_numpy(i).type(torch.FloatTensor).to(pytorch_device) for i in test_pt_fea]
            test_grid_ten = [torch.from_numpy(i[:, :2]).to(pytorch_device) for i in test_grid]
            predict_labels = my_model(test_pt_fea_ten, test_grid_ten)
            predict_labels = torch.argmax(predict_labels, 1).type(torch.uint8)
            predict_labels = predict_labels.cpu().detach().numpy()
            for count, i_test_grid in enumerate(test_grid):
                test_pred_label = predict_labels[count, test_grid[count][:, 0], test_grid[count][:, 1], test_grid[count][:, 2]]
            # las.header.offsets = np.zeros(3, dtype=np.float32)
            save_point_cloud(las_narray, test_pred_label, out_las_path, header=las.header, is_deployment=False)
        else:
            out_las_path = output_path + '/delete/' + out_las_filename
            # las.header.offsets = np.zeros(3, dtype=np.float64)
            if not os.path.exists(os.path.dirname(out_las_path)):
                try:
                    os.makedirs(os.path.dirname(out_las_path))
                except OSError as exc:
                    if exc.errno != errno.EEXIST:
                        raise
            label = np.zeros(len(las.intensity), dtype=np.int32)
            save_point_cloud(las_narray, label, out_las_path, header=las.header, is_deployment=False)
if __name__ == '__main__':
    # Testing settings
    parser = argparse.ArgumentParser(description='')
    parser.add_argument('-d', '--data_las_dir', default='/data/zs/github/segmantic3d/data/mms/segmantic/Lidar1/021005/2023/06/19/202306190555330000_202306190742000000')
    parser.add_argument('-n', '--data_nav_path', default='/data/zs/github/segmantic3d/data/mms/segmantic/Project1/021005/2023/06/19/202306190555330000_202306190742000000/Project.nav')
    parser.add_argument('-p', '--model_save_path', default='./SemKITTI_trad_BEVSeg5.pt')
    parser.add_argument('-o', '--test_output_path', default='out/leador')
    parser.add_argument('-m', '--model', choices=['polar', 'traditional'], default='traditional', help='training model: polar or traditional (default: polar)')
    parser.add_argument('-s', '--grid_size', nargs='+', type=int, default=[480, 400, 64], help='grid size of BEV representation (default: [480,360,32])')
    parser.add_argument('--fixed_volume_space', type=bool, default=True, help='validation interval (default: 4000)')
    args = parser.parse_args()
    if not len(args.grid_size) == 3:
        raise Exception('Invalid grid size! Grid size should have 3 dimensions.')
    print(' '.join(sys.argv))
    print(args)
    main(args)
Step-by-step code walkthrough
Note: everything below uses ordinary CNN convolutions and nn.Linear; the polar model would use ring convolutions, but since I use voxels there are none here.
The whole algorithm is somewhat reminiscent of a BEV occupancy network.
1. The inputs are test_pt_fea_ten and test_grid_ten; test_grid_ten holds each point's x, y cell index, and note that only those two indices are passed in.
test_pt_fea_ten = [torch.from_numpy(i).type(torch.FloatTensor).to(pytorch_device) for i in test_pt_fea]
test_grid_ten = [torch.from_numpy(i[:,:2]).to(pytorch_device) for i in test_grid]
predict_labels = my_model(test_pt_fea_ten,test_grid_ten)
2. Since xy_ind carries only x and y, the batch id is prepended to the index: for example, [44, 66] belonging to the 3rd batch becomes [3, 44, 66], i.e. [batch, index_x, index_y]. After the code below, each row of cat_pt_ind stores [batch, index_x, index_y], with dim 0 running over points.
# concatenate everything
cat_pt_ind = []
for i_batch in range(len(xy_ind)):
    cat_pt_ind.append(F.pad(xy_ind[i_batch], (1, 0), 'constant', value=i_batch))
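A quick check of what that `F.pad` call does (same call as above, on toy data):

```python
import torch
import torch.nn.functional as F

xy_ind = torch.tensor([[44, 66], [10, 20]])  # (x, y) cell indices of one sample
# pad the last dim by one element on the left, filled with the batch id 3
with_batch = F.pad(xy_ind, (1, 0), 'constant', value=3)
print(with_batch)  # tensor([[ 3, 44, 66], [ 3, 10, 20]])
```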
3. Concatenate along dim 0 (the point dimension); pt_num is the total point count summed over all batches. Everything afterwards treats all batches as one whole; the batch distinction now lives in the first column of cat_pt_ind.
cat_pt_fea = torch.cat(pt_fea,dim = 0)
cat_pt_ind = torch.cat(cat_pt_ind,dim = 0)
pt_num = cat_pt_ind.shape[0]
4. unq holds the distinct rows, shape [number of unique grid cells, 3], each row being [batch, index_x, index_y];
unq_inv gives, for each row of the original tensor, its index into the unique tensor; its shape is [points_count] and its values are indices;
unq_cnt counts the occurrences of each distinct row; its shape is [number of unique grid cells] and its values are counts.
# unique xy grid index
'''
torch.unique() works like a mathematical set: it picks out the distinct elements of a tensor.
Its signature is torch.unique(input, sorted=True, return_inverse=False, return_counts=False, dim=None):
input: the tensor to process
sorted: whether the returned unique tensor is sorted by value (ascending by default)
return_inverse: whether to also return, for each original element, its index into the unique tensor
return_counts: whether to also count the occurrences of each distinct element
dim: the dimension along which to compute uniqueness; for 1-D tensors it can be ignored
'''
unq, unq_inv, unq_cnt = torch.unique(cat_pt_ind, return_inverse=True, return_counts=True, dim=0)
unq = unq.type(torch.int64)
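A toy run of the same `torch.unique` call makes the three outputs concrete:

```python
import torch

# four points falling into three distinct [batch, x, y] cells
ind = torch.tensor([[0, 1, 1], [0, 2, 2], [0, 1, 1], [1, 1, 1]])
unq, unq_inv, unq_cnt = torch.unique(ind, return_inverse=True, return_counts=True, dim=0)
print(unq)      # tensor([[0, 1, 1], [0, 2, 2], [1, 1, 1]])
print(unq_inv)  # tensor([0, 1, 0, 2]) -- each point's cell id
print(unq_cnt)  # tensor([2, 1, 1])    -- points per cell
```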
5. Purpose: drop the points beyond self.max_pt within each grid cell; in other words, each grid cell keeps at most self.max_pt points.
# subsample pts
if self.pt_selection == 'random':
    grp_ind = grp_range_torch(unq_cnt, cur_dev)[torch.argsort(torch.argsort(unq_inv))]
    remain_ind = grp_ind < self.max_pt
# ---------------------------------
cat_pt_fea = cat_pt_fea[remain_ind, :]
cat_pt_ind = cat_pt_ind[remain_ind, :]
unq_inv = unq_inv[remain_ind]
unq_cnt = torch.clamp(unq_cnt, max=self.max_pt)
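What the double `argsort` buys: composed with a per-group ramp it gives every point its running index inside its own cell. Below, `grp_range` is my stand-in for the repository's `grp_range_torch` (an assumption about its behaviour, not the original code):

```python
import torch

def grp_range(counts):
    # for counts [2, 3] return [0, 1, 0, 1, 2]: a 0..c-1 ramp per group
    idx = torch.arange(int(counts.sum()))
    offsets = torch.repeat_interleave(torch.cumsum(counts, 0) - counts, counts)
    return idx - offsets

unq_inv = torch.tensor([0, 1, 0, 1, 1])   # 5 points in 2 cells
grp_ind = grp_range(torch.tensor([2, 3]))[torch.argsort(torch.argsort(unq_inv))]
# grp_ind[i] is point i's running index inside its own grid cell
max_pt = 2
remain = grp_ind < max_pt                  # keep at most max_pt points per cell
```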
6. A fully connected stack (nn.Linear(fea_dim, 64) and friends) is applied to every point, i.e. the feature-extraction step. Here the input cat_pt_fea is [points_count, 7] and the output is [points_count, out_pt_fea_dim]; only the feature dimension grows.
# process feature
if self.pt_model == 'pointnet':
    processed_cat_pt_fea = self.PPmodel(cat_pt_fea)
7. The 'max' method takes the element-wise maximum within each grid cell, a standard trick in point cloud networks to respect the cloud's lack of ordering, so I won't dwell on it.
Output: pooled_data with shape [number of non-empty unique grid cells, out_pt_fea_dim].
if self.pt_pooling == 'max':
    pooled_data = torch_scatter.scatter_max(processed_cat_pt_fea, unq_inv, dim=0)[0]
else:
    raise NotImplementedError
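The same per-cell max pooling can be sketched without the `torch_scatter` dependency, using `Tensor.scatter_reduce_` from plain PyTorch (my own equivalent, not the repository's code):

```python
import torch

feats = torch.tensor([[1.0, 5.0], [2.0, 0.0], [3.0, 1.0]])  # 3 points, 2-dim features
unq_inv = torch.tensor([0, 0, 1])                            # point -> cell index
num_cells = 2
# start from -inf so the reduction yields the true per-cell maximum
pooled = torch.full((num_cells, feats.shape[1]), float('-inf'))
pooled.scatter_reduce_(0, unq_inv.unsqueeze(1).expand_as(feats), feats, reduce='amax')
# pooled[c] is the element-wise max over the features of all points in cell c
```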
8. One pass of an ordinary convolution changes the channel count to the height of grid_size.
Output: processed_pooled_data with shape [number of non-empty unique grid cells, fea_compre = compression_model].
if self.fea_compre:
    processed_pooled_data = self.fea_compression(pooled_data)
else:
    processed_pooled_data = pooled_data
9. out_data_dim describes the 4D volume:
len(pt_fea) = batch size
self.grid_size[0] = number of grid cells along x
self.grid_size[1] = number of grid cells along y
self.pt_fea_dim = number of grid cells along z
# stuff pooled data into 4D tensor
out_data_dim = [len(pt_fea),self.grid_size[0],self.grid_size[1],self.pt_fea_dim]
10. Stuff the grid-cell features processed_pooled_data computed in step 8 into their corresponding cells.
After the permute, out_data has shape [batch, height, x, y].
out_data = torch.zeros(out_data_dim, dtype=torch.float32).to(cur_dev)
out_data[unq[:,0],unq[:,1],unq[:,2],:] = processed_pooled_data
out_data = out_data.permute(0,3,1,2)
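Step 10 in miniature (toy sizes, my own illustration): only the occupied cells receive features, everything else stays zero, and the permute moves the feature axis into channel position for the 2D CNN.

```python
import torch

batch, H, W, C = 1, 4, 4, 8
unq = torch.tensor([[0, 1, 2], [0, 3, 0]])        # [batch, x, y] of occupied cells
pooled = torch.ones(2, C)                         # features of those two cells
out = torch.zeros(batch, H, W, C)
out[unq[:, 0], unq[:, 1], unq[:, 2], :] = pooled  # fill only the occupied cells
out = out.permute(0, 3, 1, 2)                     # -> (batch, C, x, y) for the 2D CNN
```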
11. The rest is straightforward: treat out_data from step 10 as an image feature map and run it through an ordinary image CNN, with the usual multi-scale processing and concatenation. The network finally outputs "classes × height" features, so net_return_data has shape [batch, number of classes, x, y, height];
torch.argmax(predict_labels, 1).type(torch.uint8) then yields the class directly.
net_return_data = self.BEV_model(out_data)
Test results
(1) Point cloud classification:
Goal: extract the classes of interest from the raw LiDAR cloud: 0 background, 1 road surface, 2 buildings, 3 trees, 4 utility poles.
Input: point clouds from the single-line laser, saved as las files of 960k points each.
Output: a class label for every point in the las file; results are saved to the given directory.
Segmentation results
Opening the saved las files in CloudCompare, the overall result is shown below: background in blue, road surface in cyan, buildings in green, trees in yellow, and utility poles in orange.
Far view:
Close-up view:
Analysis of the results, as visible in the figures:
Road surface:
The road surface lies essentially on one plane, with mIoU above 90%, meeting the project requirement. Transition areas segment less well and are easily mislabelled, especially near the road shoulder, where road and non-road are not clearly separated. Road results:
Good cases:
Weak cases:
Buildings:
Most buildings are recognised correctly, with mIoU above 90%. Building results:
Results:
Trees:
Most trees are recognised, with mIoU around 85%. Tree results:
Utility poles:
Utility poles have the lowest accuracy, with mIoU just above 80%, probably for a few reasons:
1. the training data contains too few pole labels;
2. poles have few distinctive features and are easily confused with other pole-like objects (street lamps, etc.).
Pole results: