Parsing the Waymo Dataset

My research requires the point cloud data from the Waymo dataset, so I am documenting the process of working with it here.

Download

The Waymo dataset is quite large; I will not go into how to download it here. The download comes as compressed archives, which extract to tfrecord files, a TensorFlow data format. Each tfrecord file contains 199 frames covering a continuous time span, and each frame holds the data from every sensor on the vehicle together with the corresponding labels. Since I mainly use the LiDAR point clouds, the rest of this post focuses on how to extract them from the tfrecord files.
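As a quick sanity check on this structure, here is a minimal sketch (using the same TF 1.x eager setup described later in this post; the path is a placeholder) that counts the frames in one tfrecord:

import tensorflow as tf
tf.enable_eager_execution()

# Placeholder path: substitute your own extracted tfrecord file.
dataset = tf.data.TFRecordDataset('/yourpath/filename.tfrecord')
num_frames = sum(1 for _ in dataset)
print(num_frames)  # typically 199 for one Waymo segment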

Here is the download address for the Waymo dataset.

Environment

The environment is often the first pitfall on the road to success. Here is the basic configuration:

  • Ubuntu 16.04 LTS, GTX 1080 Ti
  • Python 3.7.3
  • TensorFlow 1.14.0
  • NVIDIA driver
  • CUDA 10.0
  • cuDNN 7.4.1
  • conda 4.6.14

Preparation

First download the Waymo dataset parsing code, then install the necessary dependencies with pip. The commands are:

cd /your folder
git clone https://github.com/waymo-research/waymo-open-dataset.git waymo-od
cd waymo-od && git branch -a
git checkout remotes/origin/r1.0
pip3 install waymo-open-dataset

This repository is missing two files, label_pb2.py and dataset_pb2.py, which can be fetched with git. After cloning the repository below, move those two files into the waymo_open_dataset directory inside the waymo-od checkout, and move test.py into waymo-od.

git clone https://github.com/yyuuliang/waymo-dataset-viewer.git 
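To check that the two protobuf modules ended up in the right place, a quick import test (just a suggested command, run from inside waymo-od) should print ok:

python3 -c "from waymo_open_dataset import dataset_pb2, label_pb2; print('ok')"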

Demo

In test.py, change the path from which the tfrecord file is read. After filling in the parts of the code marked TO DO, you can run the demo. The results are shown below.

The five camera images:

[figure: five camera images]

These are the images generated from the LiDAR (I am not sure what they are for):

[figure: LiDAR range images]

This is one of the more distinctive features of the Waymo dataset: the LiDAR points can be projected onto the 2D camera images.

[figure: LiDAR points projected onto a camera image]

Going further

Just running the demo is not very useful in itself, but it does prove that your environment is fine, which means you can move on. Next, let's walk through the contents of test.py; this is the part that actually lets you generate your own LiDAR point cloud data in the style of KITTI or Apollo: one .bin file per point cloud frame, with a matching .txt label file.

Step 1: Import dependencies

import os
import imp
import tensorflow as tf
import math
import numpy as np
import itertools
import matplotlib.pyplot as plt
import struct
m=imp.find_module('waymo_open_dataset', ['.'])
imp.load_module('waymo_open_dataset', m[0], m[1], m[2])
from waymo_open_dataset.utils import range_image_utils
from waymo_open_dataset.utils import transform_utils
from waymo_open_dataset import dataset_pb2 as open_dataset
tf.enable_eager_execution()

tf.enable_eager_execution() is a convenience call that lets you evaluate tensors immediately, without running sess.run.
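For example, a tiny sketch of what eager execution buys you:

import tensorflow as tf
tf.enable_eager_execution()

t = tf.constant([1.0, 2.0]) * 2.0
print(t.numpy())  # [2. 4.] -- evaluated immediately, no Session or sess.run needed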

Step 2: Load the dataset

FILENAME = '/yourpath/filename.tfrecord'
dataset = tf.data.TFRecordDataset(FILENAME)
for data in dataset:
    frame = open_dataset.Frame()
    frame.ParseFromString(bytearray(data.numpy()))
    # parse the frame and extract whatever you need

FILENAME is the path to your tfrecord file. dataset then contains all the frames in the tfrecord, usually 199 of them; the sampling rate is 10 Hz, so this is roughly 20 seconds of data. Each frame contains the following fields:

  • camera_labels: image coordinates, size, type, etc. of the objects detected by the 5 cameras, indexed 0 to 4 in each frame
  • context: intrinsic and extrinsic parameters of the cameras and LiDARs, and beam inclination values
  • images: the camera images
  • laser_labels: XYZ coordinates, size, heading, object type, etc. of objects in the LiDAR coordinate frame
  • lasers: the LiDAR points
  • no_label_zones: the no-label zones (see the documentation for details)
  • pose: vehicle pose and position
  • projected_lidar_labels: image coordinates of the LiDAR-detected objects when projected onto the cameras
  • timestamp_micros: timestamp

You can find more details about these fields here.
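As a quick orientation, here is a minimal sketch (assuming frame was parsed as in Step 2) that peeks at a few of these fields:

print(frame.context.name)       # unique name of this driving segment
print(frame.timestamp_micros)   # frame timestamp
print(len(frame.images))        # 5 camera images
print(len(frame.lasers))        # 5 lidars
print(len(frame.laser_labels))  # number of labeled 3D objects in this frame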

Step 3: Parse the frames

The next piece of code reads one frame and outputs the following three things:

  • range_images
  • camera_projections
  • range_image_top_pose

(range_images, camera_projections, range_image_top_pose) = parse_range_image_and_camera_projection(frame)  # parse the frame
(points, cp_points) = convert_range_image_to_point_cloud(frame, range_images, camera_projections, range_image_top_pose)  # get the lidar point clouds
# write the label file
obj_types = {1: 'TYPE_VEHICLE', 2: 'TYPE_PEDESTRIAN', 3: 'TYPE_SIGN', 4: 'TYPE_CYCLIST'}
labelf = open("/your label file dir/1.txt", "w")
for label in frame.laser_labels:  # all object labels in the current frame
    obj_type = obj_types[label.type]
    x = label.box.center_x
    y = label.box.center_y
    z = label.box.center_z
    l = label.box.width
    w = label.box.length
    h = label.box.height
    r = label.box.heading
    one_obj = obj_type + ' ' + str(w) + ' ' + str(h) + ' ' + str(l) + ' ' + str(y) + ' ' + str(z) + ' ' + str(x) + ' ' + str(r)
    labelf.write(one_obj + "\n")
labelf.close()
# write the bin file
with open("/your bin file dir/1.bin", 'wb') as fp:
    for lidar_points in points:  # one array per lidar
        for p in lidar_points:
            fp.write(struct.pack('f', p[0]))  # x
            fp.write(struct.pack('f', p[1]))  # y
            fp.write(struct.pack('f', p[2]))  # z
            fp.write(struct.pack('f', 0))     # reflectance placeholder (recovered in the update below)

When generating the label files: if you print(label) directly, you can see that an object's type is a string such as TYPE_VEHICLE, but if you print(label.type) the output is the number 1, 2, 3, or 4, so I built a mapping between them to make it easy to name the types however you need.
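For reference, the protobuf enum itself can resolve these numbers to names; a small sketch, assuming label_pb2 was copied into waymo_open_dataset as described above:

from waymo_open_dataset import label_pb2

# Standard protobuf enum API: map the numeric label.type back to its name.
print(label_pb2.Label.Type.Name(1))  # TYPE_VEHICLE
print(label_pb2.Label.Type.Name(2))  # TYPE_PEDESTRIAN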

Waymo splits the LiDAR data across five sensors (one top LiDAR plus front, side-left, side-right, and rear units), so you can write out all five point arrays together to form one complete frame of points, as sketched below.
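A minimal sketch of that merge, assuming points is the list of five per-lidar arrays returned by convert_range_image_to_point_cloud above:

import numpy as np

# points: list of five [N_i, 3] arrays, one per lidar
full_cloud = np.concatenate(points, axis=0)  # [sum(N_i), 3] -- one complete frame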

The functions parse_range_image_and_camera_projection and convert_range_image_to_point_cloud used in the code above are given below.

def parse_range_image_and_camera_projection(frame):
  """Parse range images and camera projections given a frame.

  Args:
     frame: open dataset frame proto
  Returns:
     range_images: A dict of {laser_name,
       [range_image_first_return, range_image_second_return]}.
     camera_projections: A dict of {laser_name,
       [camera_projection_from_first_return,
        camera_projection_from_second_return]}.
    range_image_top_pose: range image pixel pose for top lidar.
  """
  range_images = {}
  camera_projections = {}
  range_image_top_pose = None
  for laser in frame.lasers:
    if len(laser.ri_return1.range_image_compressed) > 0:
      range_image_str_tensor = tf.decode_compressed(
          laser.ri_return1.range_image_compressed, 'ZLIB')
      ri = open_dataset.MatrixFloat()
      ri.ParseFromString(bytearray(range_image_str_tensor.numpy()))
      range_images[laser.name] = [ri]

      if laser.name == open_dataset.LaserName.TOP:
        range_image_top_pose_str_tensor = tf.decode_compressed(
            laser.ri_return1.range_image_pose_compressed, 'ZLIB')
        range_image_top_pose = open_dataset.MatrixFloat()
        range_image_top_pose.ParseFromString(
            bytearray(range_image_top_pose_str_tensor.numpy()))

      camera_projection_str_tensor = tf.decode_compressed(
          laser.ri_return1.camera_projection_compressed, 'ZLIB')
      cp = open_dataset.MatrixInt32()
      cp.ParseFromString(bytearray(camera_projection_str_tensor.numpy()))
      camera_projections[laser.name] = [cp]
    if len(laser.ri_return2.range_image_compressed) > 0:
      range_image_str_tensor = tf.decode_compressed(
          laser.ri_return2.range_image_compressed, 'ZLIB')
      ri = open_dataset.MatrixFloat()
      ri.ParseFromString(bytearray(range_image_str_tensor.numpy()))
      range_images[laser.name].append(ri)

      camera_projection_str_tensor = tf.decode_compressed(
          laser.ri_return2.camera_projection_compressed, 'ZLIB')
      cp = open_dataset.MatrixInt32()
      cp.ParseFromString(bytearray(camera_projection_str_tensor.numpy()))
      camera_projections[laser.name].append(cp)
  return range_images, camera_projections, range_image_top_pose 

def convert_range_image_to_point_cloud(frame,
                                       range_images,
                                       camera_projections,
                                       range_image_top_pose,
                                       ri_index = 0):
  """Convert range images to point cloud.
  Args:
    frame: open dataset frame
     range_images: A dict of {laser_name,
       [range_image_first_return, range_image_second_return]}.
     camera_projections: A dict of {laser_name,
       [camera_projection_from_first_return,
        camera_projection_from_second_return]}.
    range_image_top_pose: range image pixel pose for top lidar.
    ri_index: 0 for the first return, 1 for the second return.
  Returns:
    points: {[N, 3]} list of 3d lidar points of length 5 (number of lidars).
    cp_points: {[N, 6]} list of camera projections of length 5
      (number of lidars).
  """
  calibrations = sorted(frame.context.laser_calibrations, key=lambda c: c.name)
  lasers = sorted(frame.lasers, key=lambda laser: laser.name)
  points = [] 
  cp_points = []
  inten=[]
  
  frame_pose = tf.convert_to_tensor(
      np.reshape(np.array(frame.pose.transform), [4, 4]))
  # [H, W, 6]
  range_image_top_pose_tensor = tf.reshape(
      tf.convert_to_tensor(range_image_top_pose.data),
      range_image_top_pose.shape.dims)
  # [H, W, 3, 3]
  range_image_top_pose_tensor_rotation = transform_utils.get_rotation_matrix(
      range_image_top_pose_tensor[..., 0], range_image_top_pose_tensor[..., 1],
      range_image_top_pose_tensor[..., 2])
  range_image_top_pose_tensor_translation = range_image_top_pose_tensor[..., 3:]
  range_image_top_pose_tensor = transform_utils.get_transform(
      range_image_top_pose_tensor_rotation,
      range_image_top_pose_tensor_translation)
  # Build the pose tensor from the 3-vector of rotation angles and the 3-vector of translation.
  for c in calibrations:
    range_image = range_images[c.name][ri_index]
    if len(c.beam_inclinations) == 0:
      beam_inclinations = range_image_utils.compute_inclination(
          tf.constant([c.beam_inclination_min, c.beam_inclination_max]),
          height=range_image.shape.dims[0])
    else:
      beam_inclinations = tf.constant(c.beam_inclinations)

    beam_inclinations = tf.reverse(beam_inclinations, axis=[-1])
    extrinsic = np.reshape(np.array(c.extrinsic.transform), [4, 4])

    range_image_tensor = tf.reshape(
        tf.convert_to_tensor(range_image.data), range_image.shape.dims)
    pixel_pose_local = None
    frame_pose_local = None
    if c.name == open_dataset.LaserName.TOP:
      pixel_pose_local = range_image_top_pose_tensor
      pixel_pose_local = tf.expand_dims(pixel_pose_local, axis=0)
      frame_pose_local = tf.expand_dims(frame_pose, axis=0)
    range_image_mask = range_image_tensor[..., 0] > 0
    range_image_cartesian = range_image_utils.extract_point_cloud_from_range_image(
        tf.expand_dims(range_image_tensor[..., 0], axis=0),
        tf.expand_dims(extrinsic, axis=0),
        tf.expand_dims(tf.convert_to_tensor(beam_inclinations), axis=0),
        pixel_pose=pixel_pose_local,
        frame_pose=frame_pose_local)
    range_image_cartesian = tf.squeeze(range_image_cartesian, axis=0)
    points_tensor = tf.gather_nd(range_image_cartesian,
                                 tf.where(range_image_mask))

    cp = camera_projections[c.name][0]
    cp_tensor = tf.reshape(tf.convert_to_tensor(cp.data), cp.shape.dims)
    cp_points_tensor = tf.gather_nd(cp_tensor, tf.where(range_image_mask))
    points.append(points_tensor.numpy())
    cp_points.append(cp_points_tensor.numpy())
    inten.append(range_image_tensor[..., 1].numpy())
  print("points per lidar:", [p.shape for p in points])
  print("intensity per lidar:", [i.shape for i in inten])

  return points, cp_points

At this point we have what we wanted: the point cloud .bin files and the matching label files, and we can happily start training models!
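To double-check the output, the .bin file can be read back exactly like a KITTI scan; a small sketch (the path is a placeholder matching the one used above):

import numpy as np

# Each point was written as four float32 values: x, y, z, reflectance.
pts = np.fromfile("/your bin file dir/1.bin", dtype=np.float32).reshape(-1, 4)
print(pts.shape)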

Update!!!

I recently ran into a problem: the point clouds produced by the example code are missing one dimension, the reflection intensity. When generating the point clouds earlier, I simply filled that dimension with 0, but after training a model I found this caused serious problems, so the dimension had to be recovered.

After reading the code and the data structures carefully, I found some clues. The frame contains both images and range images, and a range image has four channels: range, intensity, an elongation channel, and one more thing I could not identify. The LiDAR point cloud is computed from the range channel together with the calibration matrices; that part is a bit involved, so I will not explain it here.

The intensity channel has 64 * 2650 = 169,600 entries, while the point cloud only has about 130,000 points. These two numbers are close, and the gap is easy to explain: each of the 64 beams of the LiDAR yields 2,650 samples per sweep, and the missing points were most likely dropped somewhere for some reason. So we need to find where in the code those points are removed, and attach the intensity to each point before that step so that everything is processed together.

After some persistent digging, it turns out the points are removed in convert_range_image_to_point_cloud by points_tensor = tf.gather_nd(range_image_cartesian, tf.where(range_image_mask)), where range_image_mask is an array of True/False values. Just before this line filters the points, range_image_cartesian has shape [64, 2650, 3] (after the batch dimension is squeezed out), so we only need to append the intensity to each point right before this line and then filter everything together. The updated convert_range_image_to_point_cloud follows.

def convert_range_image_to_point_cloud(frame,
                                       range_images,
                                       camera_projections,
                                       range_image_top_pose,
                                       ri_index = 0):
  calibrations = sorted(frame.context.laser_calibrations, key=lambda c: c.name)
  lasers = sorted(frame.lasers, key=lambda laser: laser.name)
  points = [] 
  cp_points = []
  frame_pose = tf.convert_to_tensor(
      np.reshape(np.array(frame.pose.transform), [4, 4]))
  range_image_top_pose_tensor = tf.reshape(
      tf.convert_to_tensor(range_image_top_pose.data),
      range_image_top_pose.shape.dims)
  range_image_top_pose_tensor_rotation = transform_utils.get_rotation_matrix(
      range_image_top_pose_tensor[..., 0], range_image_top_pose_tensor[..., 1],
      range_image_top_pose_tensor[..., 2])
  range_image_top_pose_tensor_translation = range_image_top_pose_tensor[..., 3:]
  range_image_top_pose_tensor = transform_utils.get_transform(
      range_image_top_pose_tensor_rotation,
      range_image_top_pose_tensor_translation)
  for c in calibrations:
    range_image = range_images[c.name][ri_index]
    if len(c.beam_inclinations) == 0:
      beam_inclinations = range_image_utils.compute_inclination(
          tf.constant([c.beam_inclination_min, c.beam_inclination_max]),
          height=range_image.shape.dims[0])
    else:
      beam_inclinations = tf.constant(c.beam_inclinations)

    beam_inclinations = tf.reverse(beam_inclinations, axis=[-1])
    extrinsic = np.reshape(np.array(c.extrinsic.transform), [4, 4])

    range_image_tensor = tf.reshape(
        tf.convert_to_tensor(range_image.data), range_image.shape.dims)
    pixel_pose_local = None
    frame_pose_local = None
    if c.name == open_dataset.LaserName.TOP:
      pixel_pose_local = range_image_top_pose_tensor
      pixel_pose_local = tf.expand_dims(pixel_pose_local, axis=0)
      frame_pose_local = tf.expand_dims(frame_pose, axis=0)
    range_image_mask = range_image_tensor[..., 0] > 0
    range_image_cartesian = range_image_utils.extract_point_cloud_from_range_image(
        tf.expand_dims(range_image_tensor[..., 0], axis=0),
        tf.expand_dims(extrinsic, axis=0),
        tf.expand_dims(tf.convert_to_tensor(beam_inclinations), axis=0),
        pixel_pose=pixel_pose_local,
        frame_pose=frame_pose_local)

    range_image_cartesian = tf.squeeze(range_image_cartesian, axis=0)
    
    # At this point range_image_cartesian has shape [64, 2650, 3].
    # Reshape the intensity channel (originally [64, 2650]) so it can be
    # concatenated onto the points.
    range_image_inten = range_image_tensor[..., 1].numpy().reshape([64, 2650, 1])
    # Attach the intensity to each point's x, y, z.
    points_real = np.append(range_image_cartesian.numpy(), range_image_inten, axis=2)
    # Filter points and intensities together.
    points_real = tf.gather_nd(points_real, tf.where(range_image_mask))
    '''
    points_tensor = tf.gather_nd(range_image_cartesian,
                                 tf.where(range_image_mask))

    cp = camera_projections[c.name][0]
    cp_tensor = tf.reshape(tf.convert_to_tensor(cp.data), cp.shape.dims)
    cp_points_tensor = tf.gather_nd(cp_tensor, tf.where(range_image_mask))
    points.append(points_tensor.numpy())
    cp_points.append(cp_points_tensor.numpy())
    '''
    # Break here: the car has five lidars, and the first iteration of this loop
    # is the top lidar, which is all we need. The return value was also reduced
    # to a single array to avoid unnecessary computation.
    break

  return points_real.numpy()
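
A short usage sketch of the updated function (same inputs as in Step 3); it now returns a single [N, 4] array of x, y, z, intensity for the top lidar only:

(range_images, camera_projections, range_image_top_pose) = parse_range_image_and_camera_projection(frame)
points_xyzi = convert_range_image_to_point_cloud(
    frame, range_images, camera_projections, range_image_top_pose)
print(points_xyzi.shape)  # (N, 4)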

Below is the complete version of the code I use myself. Read it through and it should all make sense!!

import os
import imp
import tensorflow as tf
import math
import numpy as np
import itertools
import matplotlib.pyplot as plt
import struct
m=imp.find_module('waymo_open_dataset', ['.'])
imp.load_module('waymo_open_dataset', m[0], m[1], m[2])
from waymo_open_dataset.utils import range_image_utils
from waymo_open_dataset.utils import transform_utils
from waymo_open_dataset import dataset_pb2 as open_dataset
tf.enable_eager_execution()

def image_show(data, name, layout, cmap=None):
  """Show an image."""
  plt.subplot(*layout)
  plt.imshow(tf.image.decode_jpeg(data), cmap=cmap)
  plt.title(name)
  plt.grid(False)
  plt.axis('off')
def parse_range_image_and_camera_projection(frame):
  """Parse range images and camera projections given a frame.

  Args:
     frame: open dataset frame proto
  Returns:
     range_images: A dict of {laser_name,
       [range_image_first_return, range_image_second_return]}.
     camera_projections: A dict of {laser_name,
       [camera_projection_from_first_return,
        camera_projection_from_second_return]}.
    range_image_top_pose: range image pixel pose for top lidar.
  """
  range_images = {}
  camera_projections = {}
  range_image_top_pose = None
  for laser in frame.lasers:
    if len(laser.ri_return1.range_image_compressed) > 0:
      range_image_str_tensor = tf.decode_compressed(
          laser.ri_return1.range_image_compressed, 'ZLIB')
      ri = open_dataset.MatrixFloat()
      ri.ParseFromString(bytearray(range_image_str_tensor.numpy()))
      range_images[laser.name] = [ri]

      if laser.name == open_dataset.LaserName.TOP:
        range_image_top_pose_str_tensor = tf.decode_compressed(
            laser.ri_return1.range_image_pose_compressed, 'ZLIB')
        range_image_top_pose = open_dataset.MatrixFloat()
        range_image_top_pose.ParseFromString(
            bytearray(range_image_top_pose_str_tensor.numpy()))

      camera_projection_str_tensor = tf.decode_compressed(
          laser.ri_return1.camera_projection_compressed, 'ZLIB')
      cp = open_dataset.MatrixInt32()
      cp.ParseFromString(bytearray(camera_projection_str_tensor.numpy()))
      camera_projections[laser.name] = [cp]
    if len(laser.ri_return2.range_image_compressed) > 0:
      range_image_str_tensor = tf.decode_compressed(
          laser.ri_return2.range_image_compressed, 'ZLIB')
      ri = open_dataset.MatrixFloat()
      ri.ParseFromString(bytearray(range_image_str_tensor.numpy()))
      range_images[laser.name].append(ri)

      camera_projection_str_tensor = tf.decode_compressed(
          laser.ri_return2.camera_projection_compressed, 'ZLIB')
      cp = open_dataset.MatrixInt32()
      cp.ParseFromString(bytearray(camera_projection_str_tensor.numpy()))
      camera_projections[laser.name].append(cp)
  return range_images, camera_projections, range_image_top_pose 
def plot_range_image_helper(data, name, layout, vmin = 0, vmax=1, cmap='gray'):
  """Plots range image.

  Args:
    data: range image data
    name: the image title
    layout: plt layout
    vmin: minimum value of the passed data
    vmax: maximum value of the passed data
    cmap: color map
  """
  plt.subplot(*layout)
  plt.imshow(data, cmap=cmap, vmin=vmin, vmax=vmax)
  plt.title(name)
  plt.grid(False)
  plt.axis('off')
def get_range_image(laser_name, return_index):
  """Returns range image given a laser name and its return index."""
  return range_images[laser_name][return_index]
def show_range_image(range_image, layout_index_start = 1):
  """Shows range image.

  Args:
    range_image: the range image data from a given lidar of type MatrixFloat.
    layout_index_start: layout offset
  """
  range_image_tensor = tf.convert_to_tensor(range_image.data)
  range_image_tensor = tf.reshape(range_image_tensor, range_image.shape.dims)
  lidar_image_mask = tf.greater_equal(range_image_tensor, 0)
  range_image_tensor = tf.where(lidar_image_mask, range_image_tensor,
                                tf.ones_like(range_image_tensor) * 1e10)
  range_image_range = range_image_tensor[...,0] 
  range_image_intensity = range_image_tensor[...,1]
  range_image_elongation = range_image_tensor[...,2]
  plot_range_image_helper(range_image_range.numpy(), 'range',
                   [8, 1, layout_index_start], vmax=75, cmap='gray')
  plot_range_image_helper(range_image_intensity.numpy(), 'intensity',
                   [8, 1, layout_index_start + 1], vmax=1.5, cmap='gray')
  plot_range_image_helper(range_image_elongation.numpy(), 'elongation',
                   [8, 1, layout_index_start + 2], vmax=1.5, cmap='gray')

def show_range_image2(range_image):
  range_image_tensor = tf.convert_to_tensor(range_image.data)
  range_image_tensor = tf.reshape(range_image_tensor, range_image.shape.dims)
  lidar_image_mask = tf.greater_equal(range_image_tensor, 0)
  range_image_tensor = tf.where(lidar_image_mask, range_image_tensor,
                                tf.ones_like(range_image_tensor) * 1e10)
  print(range_image_tensor[...,0].numpy(),"\n",range_image_tensor[...,1].numpy())
  
def convert_range_image_to_point_cloud(frame,
                                       range_images,
                                       camera_projections,
                                       range_image_top_pose,
                                       ri_index = 0):
  """Convert range images to point cloud.

  Args:
    frame: open dataset frame
     range_images: A dict of {laser_name,
       [range_image_first_return, range_image_second_return]}.
     camera_projections: A dict of {laser_name,
       [camera_projection_from_first_return,
        camera_projection_from_second_return]}.
    range_image_top_pose: range image pixel pose for top lidar.
    ri_index: 0 for the first return, 1 for the second return.
  Returns:
    points: {[N, 3]} list of 3d lidar points of length 5 (number of lidars).
    cp_points: {[N, 6]} list of camera projections of length 5
      (number of lidars).
  """
  calibrations = sorted(frame.context.laser_calibrations, key=lambda c: c.name)
  lasers = sorted(frame.lasers, key=lambda laser: laser.name)
  points = [] 
  cp_points = []
  
  frame_pose = tf.convert_to_tensor(
      np.reshape(np.array(frame.pose.transform), [4, 4]))
  # [H, W, 6]
  range_image_top_pose_tensor = tf.reshape(
      tf.convert_to_tensor(range_image_top_pose.data),
      range_image_top_pose.shape.dims)
  # [H, W, 3, 3]
  range_image_top_pose_tensor_rotation = transform_utils.get_rotation_matrix(
      range_image_top_pose_tensor[..., 0], range_image_top_pose_tensor[..., 1],
      range_image_top_pose_tensor[..., 2])
  range_image_top_pose_tensor_translation = range_image_top_pose_tensor[..., 3:]
  range_image_top_pose_tensor = transform_utils.get_transform(
      range_image_top_pose_tensor_rotation,
      range_image_top_pose_tensor_translation)
  for c in calibrations:
    range_image = range_images[c.name][ri_index]
    if len(c.beam_inclinations) == 0:
      beam_inclinations = range_image_utils.compute_inclination(
          tf.constant([c.beam_inclination_min, c.beam_inclination_max]),
          height=range_image.shape.dims[0])
    else:
      beam_inclinations = tf.constant(c.beam_inclinations)

    beam_inclinations = tf.reverse(beam_inclinations, axis=[-1])
    extrinsic = np.reshape(np.array(c.extrinsic.transform), [4, 4])

    range_image_tensor = tf.reshape(
        tf.convert_to_tensor(range_image.data), range_image.shape.dims)
    pixel_pose_local = None
    frame_pose_local = None
    if c.name == open_dataset.LaserName.TOP:
      pixel_pose_local = range_image_top_pose_tensor
      pixel_pose_local = tf.expand_dims(pixel_pose_local, axis=0)
      frame_pose_local = tf.expand_dims(frame_pose, axis=0)
    range_image_mask = range_image_tensor[..., 0] > 0
    range_image_cartesian = range_image_utils.extract_point_cloud_from_range_image(
        tf.expand_dims(range_image_tensor[..., 0], axis=0),
        tf.expand_dims(extrinsic, axis=0),
        tf.expand_dims(tf.convert_to_tensor(beam_inclinations), axis=0),
        pixel_pose=pixel_pose_local,
        frame_pose=frame_pose_local)

    range_image_cartesian = tf.squeeze(range_image_cartesian, axis=0)
    range_image_inten=range_image_tensor[..., 1].numpy().reshape([64, 2650,1])
    points_real=np.append(range_image_cartesian.numpy(),range_image_inten,axis = 2)
    points_real=tf.gather_nd(points_real,tf.where(range_image_mask))
    '''
    points_tensor = tf.gather_nd(range_image_cartesian,
                                 tf.where(range_image_mask))

    cp = camera_projections[c.name][0]
    cp_tensor = tf.reshape(tf.convert_to_tensor(cp.data), cp.shape.dims)
    cp_points_tensor = tf.gather_nd(cp_tensor, tf.where(range_image_mask))
    points.append(points_tensor.numpy())
    cp_points.append(cp_points_tensor.numpy())
    '''
    break

  return points_real.numpy()

# format: Car 0 0 -10 50 50 50 50 w h l y z x r
# 3D label classes: vehicles, pedestrians, cyclists, signs
# TYPE_VEHICLE  TYPE_SIGN  TYPE_PEDESTRIAN  TYPE_CYCLIST
path="/media/autolab/3187f5af-bb7c-4819-9e8c-3a685e91817c/TXB/source_data/waymo/waymo_data/"
tfs=os.listdir(path)
num=0
obj_types={1:'Car', 3:'DontCare',2:'Pedestrian',4:'Cyclist'}
re_index=open("/media/autolab/3187f5af-bb7c-4819-9e8c-3a685e91817c/TXB/source_data/waymo/re_index.txt","w")
for file in tfs:
    FILENAME = path+file
    dataset = tf.data.TFRecordDataset(FILENAME)
    print(file)
    for data in dataset:
      index="%06d" % num
      re_index.write(file+'\t\t'+index+"\n")
      frame = open_dataset.Frame()
      frame.ParseFromString(bytearray(data.numpy()))
      (range_images, camera_projections, range_image_top_pose) = parse_range_image_and_camera_projection(frame)
      points = convert_range_image_to_point_cloud(frame,range_images,camera_projections,range_image_top_pose)
      newf=open("/media/autolab/3187f5af-bb7c-4819-9e8c-3a685e91817c/TXB/source_data/waymo/waymo_label/"+index+".txt","w")
      for label in frame.laser_labels:
        obj_type=obj_types[label.type]
        x=label.box.center_x
        y=label.box.center_y
        z=label.box.center_z
        l=label.box.width
        w=label.box.length
        h=label.box.height
        r=label.box.heading+1.5707963
        if r>3.1415926:
          r=r-3.1415926
        one_obj=obj_type+' 0 0 -10 50 50 50 50 '+str(w)+' '+str(h)+' '+str(l)+' '+str(y)+' '+str(z)+' '+str(x)+' '+str(r)
        newf.write(one_obj+"\n")
      with open("/media/autolab/3187f5af-bb7c-4819-9e8c-3a685e91817c/TXB/source_data/waymo/waymo_bin/"+index+".bin", 'wb')as fp:
        for point in points:
          x = struct.pack('f', point[0])
          y = struct.pack('f', point[1])
          z = struct.pack('f', point[2])
          r = struct.pack('f', point[3])
          fp.write(x)
          fp.write(y)
          fp.write(z)
          fp.write(r) 
      print("frame:"+index)
      num+=1

I will update the remaining content when I have time; meanwhile you can also explore on your own starting from the demo! If you have questions, feel free to leave a comment below!

Step 4: Display the images

Step 5: Project the point cloud onto the images

Problems encountered

  1. After the environment was set up, I hit a memory error (I will attach the exact error message later). After digging in, I found that the program uses all of your GPUs, and it errors out if one of them is already occupied.

    Fix: CUDA_VISIBLE_DEVICES=0 python test.py
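
Alternatively (an untested suggestion on my side), TF 1.x can be told to allocate GPU memory on demand instead of claiming every device's memory up front:

import tensorflow as tf

# Allocate GPU memory on demand rather than grabbing all of it at startup.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
tf.enable_eager_execution(config=config)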

Summary

For 3D object detection in LiDAR point clouds, the main available dataset has long been KITTI, with Apollo as a secondary option. But for a deep-learning 3D detector to generalize well, it has to be trained on a sufficiently large dataset. The Waymo dataset has only just been open-sourced, and I hope this post helps with your work.
