End-to-End Lane Detection Based on Instance Segmentation: Paper and Code Walkthrough


Towards End-to-End Lane Detection: an Instance Segmentation Approach

Original paper

https://arxiv.org/pdf/1802.05591v1.pdf

Preface
This is a classic lane detection paper, and many code walkthroughs of it already exist online. This post records my own study notes.

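Abstract

Traditional methods: hand-crafted feature extraction, which is sensitive to environmental changes and does not meet real-time requirements.
Earlier deep-learning methods: can only detect a fixed number of lanes.
This paper: treats lane detection as an instance segmentation problem. It also proposes predicting the inverse perspective (bird's-eye view) transformation matrix with a neural network instead of fixing its parameters, which makes the method more robust to changes in the road.

Runs at 50 fps; validated on the tuSimple dataset.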

Main text

Figure 1 shows the overall framework of the method.
[Figure 1: overview of the method]

LaneNet

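Figure 2: the LaneNet architecture. Lane detection is cast as an instance segmentation problem and trained end to end, so a variable number of lanes can be detected.
[Figure 2: LaneNet architecture]
The network combines the benefits of binary lane segmentation with a clustering loss function designed for one-shot instance segmentation. In LaneNet's output, every lane pixel is assigned the id of the lane it belongs to.

The multi-task network is trained jointly, which improves both speed and accuracy. It has two branches: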

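  • lane segmentation branch: a two-class problem that labels each pixel as background or lane, so different lanes do not need separate classes.

To construct the ground-truth segmentation map, all ground-truth lane points are connected to form one continuous line per lane. Lanes are also drawn where they are not directly visible: through occluding objects (such as cars blocking the view) and across stretches with no visible marking (such as dashed or faded lanes). This lets the network learn to predict hidden lanes as well.
Loss function: standard cross-entropy.
Lane/background class imbalance: handled with bounded inverse class weighting:
w = 1 / ln(c + p)
where p is the probability of the corresponding class appearing in the training samples and c is a hyperparameter.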
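The weighting above can be tried out with a few lines of Python. This is only a sketch: the function name, the pixel frequencies and the default c = 1.02 (the value suggested in the ENet paper) are assumptions for illustration, not values taken from this post's repository.

```python
import math

def bounded_inverse_class_weights(class_probs, c=1.02):
    """Return w = 1 / ln(c + p) for every class probability p."""
    return [1.0 / math.log(c + p) for p in class_probs]

# Hypothetical pixel frequencies: lanes cover only a small part of an image.
probs = [0.98, 0.02]  # [background, lane]
weights = bounded_inverse_class_weights(probs)

# The rare lane class gets the larger weight, but the weight stays
# bounded by 1 / ln(c) even when p approaches 0.
print(weights)
```

Because the weight is capped at 1 / ln(c), a class that almost never appears cannot dominate the loss the way a plain 1 / p weighting would allow.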

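  • lane embedding branch: separates the segmented lane pixels into instances. Using a clustering loss function, it assigns a lane id to every pixel from the lane segmentation branch and ignores background pixels.

Detection methods based on bounding boxes suit compact objects, which lanes are not, so the task is treated as instance segmentation instead. The paper uses a one-shot method based on distance metric learning. The clustering loss is designed so that embeddings of pixels on the same lane are close together and embeddings of pixels on different lanes are far apart. Concretely:

L = L_var + L_dist

L_var = (1/C) Σ_{c=1..C} (1/N_c) Σ_{i=1..N_c} [ ||μ_c − x_i|| − δv ]+^2
L_dist = (1/(C(C−1))) Σ_{cA=1..C} Σ_{cB=1..C, cB≠cA} [ δd − ||μ_cA − μ_cB|| ]+^2
L_var: variance term. It pulls every pixel embedding toward the mean embedding of its lane. (Hinged) It is only active when the distance between an embedding and its cluster center exceeds δv.
L_dist: distance term. It pushes the cluster centers away from each other. (Hinged) It is only active when the distance between two cluster centers is below δd.
C: number of clusters (lanes)
N_c: number of elements in cluster c
x_i: a pixel embedding
μ_c: mean embedding of cluster c
||·||: L2 distance
[x]+ = max(0,x): hinge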
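To make the two terms concrete, here is a minimal NumPy sketch of L_var and L_dist for a single image. It is not the repository's `cluster_loss`; the function name and the convention that label 0 means background are assumptions made for this example.

```python
import numpy as np

def discriminative_loss(embeddings, labels, delta_v=0.5, delta_d=3.0):
    """embeddings: (N, D) pixel embeddings; labels: (N,) lane ids, 0 = background."""
    lane_ids = [k for k in np.unique(labels) if k != 0]
    num = len(lane_ids)
    centers = []
    l_var = 0.0
    for k in lane_ids:
        x = embeddings[labels == k]
        mu = x.mean(axis=0)                      # cluster center of lane k
        centers.append(mu)
        dist = np.linalg.norm(x - mu, axis=1)
        # hinge: only embeddings farther than delta_v from the center are pulled
        l_var += np.mean(np.maximum(0.0, dist - delta_v) ** 2)
    l_var = l_var / num if num else 0.0

    l_dist = 0.0
    if num > 1:
        for a in range(num):
            for b in range(num):
                if a != b:
                    gap = np.linalg.norm(centers[a] - centers[b])
                    # hinge: only centers closer than delta_d are pushed apart
                    l_dist += max(0.0, delta_d - gap) ** 2
        l_dist /= num * (num - 1)
    return l_var, l_dist

# Two tight clusters whose centers are only 1.0 apart: no pull, but a push.
emb = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [1.1, 0.0]])
lab = np.array([1, 1, 2, 2])
l_var, l_dist = discriminative_loss(emb, lab)
print(l_var, l_dist)
```

In the example every embedding lies within δv of its center, so the variance term is inactive, while the centers are closer than δd, so the distance term is active.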

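clustering

Clustering is an iterative process. To make it easy to cluster pixels at inference time, the loss L above is set up with δd > 6δv. (With this margin, one can take a random lane embedding as the center and threshold around it with radius 2δv to select all embeddings that belong to the same lane.)

During clustering, mean shift is applied first so that the cluster center moves toward higher density, which avoids thresholding around an outlier. The pixel embeddings are then partitioned: with the cluster center as the center and 2δv as the radius, all embeddings inside the ball are assigned to the same lane. This step is repeated until every lane pixel has been assigned to a lane.

(Author: liyonghong
Link: https://www.jianshu.com/p/c6d38d648509
Source: Jianshu
Copyright belongs to the author. For commercial reuse, contact the author for permission; for non-commercial reuse, credit the source.)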
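The iterative clustering described above can be sketched in NumPy as follows. This is an illustration under assumptions, not the paper's or the repository's implementation: the function name is made up, the seed is the first unassigned embedding rather than a random one (to keep the result reproducible), and the mean shift uses a flat kernel with a fixed number of iterations.

```python
import numpy as np

def cluster_embeddings(embeddings, delta_v=0.5, shift_iters=10):
    """Assign a lane id (1, 2, ...) to every row of `embeddings` (N, D)."""
    radius = 2.0 * delta_v
    lane_ids = np.zeros(len(embeddings), dtype=int)   # 0 = not assigned yet
    next_id = 1
    while np.any(lane_ids == 0):
        free = np.flatnonzero(lane_ids == 0)
        center = embeddings[free[0]].copy()           # seed embedding
        for _ in range(shift_iters):                  # flat-kernel mean shift
            near = np.linalg.norm(embeddings[free] - center, axis=1) < radius
            center = embeddings[free][near].mean(axis=0)
        # threshold around the shifted center with radius 2 * delta_v
        near = np.linalg.norm(embeddings[free] - center, axis=1) < radius
        lane_ids[free[near]] = next_id
        next_id += 1
    return lane_ids

emb = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.2], [4.0, 4.0], [4.1, 4.0]])
ids = cluster_embeddings(emb)
print(ids)
```

Each pass removes at least one embedding from the unassigned set, because the mean of the points inside the ball always has at least one of them within the same radius, so the loop terminates.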

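Network architecture

Based on the ENet architecture, as shown in Figure 2. The losses of the two branches are weighted equally during backpropagation.
The ENet paper: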

https://arxiv.org/pdf/1606.02147.pdf

A walkthrough of it:

https://blog.csdn.net/u011974639/article/details/78956380

Introductions to the individual ENet sub-modules (Caffe implementation):

https://blog.csdn.net/u013241583/article/details/90170369
https://blog.csdn.net/u013241583/article/details/90171242
https://blog.csdn.net/u013241583/article/details/90174188
https://blog.csdn.net/u013241583/article/details/90174490

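ENet network structure
[Figure: ENet network structure]
The paper makes small modifications to ENet:

  • LaneNet's architecture is based on the encoder-decoder network ENet [29], modified into a two-branch network. Because ENet's encoder contains more parameters than its decoder, sharing the full encoder between the two tasks would lead to unsatisfactory results [27]. So while the original ENet encoder has three stages (stages 1, 2 and 3), LaneNet shares only the first two (stages 1 and 2) between the branches; stage 3 of the encoder and the full ENet decoder form the backbone of each separate branch. The last layer of the segmentation branch outputs a one-channel image (binary segmentation), whereas the last layer of the embedding branch outputs an N-channel image, with N the embedding dimension, as shown in Figure 2. The loss terms of the two branches are equally weighted and backpropagated through the network.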

Settings in the paper

Embedding dimension: 4
δv = 0.5
δd = 3
Input size: 512x256
Optimizer: Adam
batch size = 8
learning rate = 5e-4

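HNet

With the lane instances in hand, each lane still needs a parametric description. This is done in a bird's-eye view, which improves the quality of the fit while staying computationally efficient: the lane pixels are transformed into the bird's-eye view, fitted there, and projected back into the original image. The transformation matrix is predicted by a neural network, as described below.

LaneNet outputs a set of pixels per lane, and a curve must still be fitted through those pixels to obtain a parametric lane. Fitting directly in the original image works poorly (it needs higher-order polynomials), so LaneNet's output (the pixel sets) is transformed into a bird's-eye view for fitting. Using a fixed transformation matrix gives the result labelled "fixed" in Figure 4. The paper instead lets H-Net output the transformation matrix; under this transformation the lanes can be fitted well with a low-order polynomial, giving the result labelled "cond" in Figure 4.
H = [ a  b  c ]
    [ 0  d  e ]
    [ 0  f  1 ]
The transformation matrix H has 6 degrees of freedom. (The zeros are placed to enforce that horizontal lines stay horizontal under the transformation.)

In other words, the transformed y coordinate does not depend on x.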
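A small NumPy sketch shows what the zero entries of H buy. The coefficient values below are invented for illustration (in the paper they come from H-Net), and `build_h` and `project` are hypothetical helper names.

```python
import numpy as np

def build_h(a, b, c, d, e, f):
    """Assemble the 6-parameter matrix H with its fixed zero entries."""
    return np.array([[a,   b,   c],
                     [0.0, d,   e],
                     [0.0, f, 1.0]])

def project(h, xs, ys):
    """Apply H to image points given as coordinate arrays."""
    pts = np.stack([xs, ys, np.ones_like(xs)])   # homogeneous, shape (3, N)
    out = h @ pts
    return out[0] / out[2], out[1] / out[2]

# Made-up coefficients, only to show the structure of the transform.
h = build_h(1.0, -0.5, 10.0, 2.0, 5.0, 0.01)

# Two points on the same image row (same y, different x).
xs = np.array([100.0, 300.0])
ys = np.array([200.0, 200.0])
xp, yp = project(h, xs, ys)
print(xp, yp)
```

Both points end up with the same transformed y because the second and third rows of H have a zero in the x column, which is exactly the "horizontal lines stay horizontal" constraint.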

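Lane fitting

(The paper's explanation of this part is rather involved; in short:)
1. Transform the lane pixels p_i = [x_i, y_i, 1]^T into the bird's-eye view with H: p'_i = H p_i.
2. Fit a polynomial x' = f(y') through the transformed points by least squares.
3. Evaluate the fit at the wanted y' positions and project those points back into the image with H^-1.
The procedure is illustrated in Figure 3 of the paper.
The H-Net architecture is given in Table 1 of the paper.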
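The fit-in-bird's-eye-view idea can be sketched with NumPy as below. This is a hedged illustration, not code from the repository (which, as noted later in this post, does not reproduce H-Net): `fit_lane` is a hypothetical name, the matrix values are made up, and the default order of 2 is just an example value.

```python
import numpy as np

def fit_lane(h, xs, ys, order=2):
    """Project lane points with H, fit x' = f(y') by least squares, project back."""
    pts = np.stack([xs, ys, np.ones_like(xs)])
    proj = h @ pts
    xp, yp = proj[0] / proj[2], proj[1] / proj[2]

    # Design matrix with columns [y'^order, ..., y', 1]; solve min ||Y w - x'||
    Y = np.vander(yp, order + 1)
    w, *_ = np.linalg.lstsq(Y, xp, rcond=None)
    x_fit = Y @ w                                  # fitted x' at the same y'

    # Back-project the fitted points into the image with the inverse of H
    back = np.linalg.inv(h) @ np.stack([x_fit, yp, np.ones_like(yp)])
    return back[0] / back[2], back[1] / back[2]

# Made-up transform and synthetic lane points, for illustration only.
h = np.array([[1.0, -0.5, 10.0],
              [0.0,  2.0,  5.0],
              [0.0, 0.01,  1.0]])
ys = np.linspace(100.0, 300.0, 8)
xs = 0.002 * ys ** 2 + 50.0
x_back, y_back = fit_lane(h, xs, ys)
print(np.round(x_back, 1))
```

Because the fit is evaluated at the same y' values that the input points map to, the back-projected points land on the original image rows, and only their x positions are smoothed by the polynomial.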

Settings in the paper

Trained for a third-order polynomial fit
Input size: 128x64
Optimizer: Adam
batch size = 10
learning rate = 5e-5

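Experiments

tuSimple dataset: 3626 training images and 2782 test images.
accuracy: the average number of correct points per image.
acc = Σ_im C_im / Σ_im S_im
C_im: number of correctly located points
S_im: number of ground-truth points
(A point counts as correct when its distance to the ground-truth point is below a given threshold.)

false positive and false negative scores
FP = F_pred / N_pred
FN = M_pred / N_gt
F_pred: number of wrongly predicted lanes
N_pred: number of predicted lanes
M_pred: number of missed ground-truth lanes
N_gt: number of all ground-truth lanes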
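The three scores are simple ratios. The sketch below uses invented counts and a hypothetical pixel threshold purely to show the arithmetic; it is not the official tuSimple evaluation script.

```python
def count_correct(pred_xs, gt_xs, threshold):
    """Count predicted points whose x lies within `threshold` of the ground truth."""
    return sum(abs(p - g) < threshold for p, g in zip(pred_xs, gt_xs))

def accuracy(correct_per_image, gt_per_image):
    """acc = sum of correct points over all images / sum of ground-truth points."""
    return sum(correct_per_image) / sum(gt_per_image)

def fp_rate(f_pred, n_pred):
    """FP = wrongly predicted lanes / predicted lanes."""
    return f_pred / n_pred

def fn_rate(m_pred, n_gt):
    """FN = missed ground-truth lanes / ground-truth lanes."""
    return m_pred / n_gt

# Invented numbers: two images, one wrong lane out of 8 predicted,
# one missed lane out of 8 ground-truth lanes.
c0 = count_correct([100, 205, 330], [102, 200, 300], threshold=20)
acc = accuracy([40, 45], [48, 50])
fp = fp_rate(1, 8)
fn = fn_rate(1, 8)
print(c0, acc, fp, fn)
```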


An excellent walkthrough of the paper:

https://www.jianshu.com/p/c6d38d648509

Code

https://github.com/ms5898/LaneNet-PyTorch

Implemented with the PyTorch framework.
Note: this code does not reproduce H-Net; it fits the lanes with sklearn's linear regression instead.


    Python 3.7
    PyTorch 1.4.0
    torchvision
    sklearn 0.22.1
    NumPy 1.18.2

Dataset

TuSimple dataset

Download

  1. Unzip train_set.zip and test_set.zip into the folder ECBM6040-Project/TUSIMPLE
  2. Put test_label.json into ECBM6040-Project/TUSIMPLE/test_set (the folder extracted from test_set.zip)

Preparation

  1. Process train_set into ground truth images, binary ground truth and instance ground truth:
python utils/process_training_dataset_2.py --src_dir (your train_set folder place)
for me this step is: python utils/process_training_dataset_2.py --src_dir /Users/smiffy/Documents/GitHub/ECBM6040-Project/TUSIMPLE/train_set

About process_training_dataset_2.py: this script renders the label images from the TuSimple training set, then splits that set further into training, validation and test lists and saves them.

import argparse # https://blog.csdn.net/yy_diego/article/details/82851661
import glob
import json
import os
import os.path as ops
import shutil

import cv2
import numpy as np


def init_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--src_dir', type=str, help='The origin path of unzipped tusimple dataset')
    return parser.parse_args() # returns an argparse.Namespace


def get_image_to_folders(json_label_path, gt_image_dir, gt_binary_dir, gt_instance_dir, src_dir):
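    image_nums = len(os.listdir(gt_image_dir)) # Number of files already in gt_image_dir, used as the numbering offset. If process_training_dataset_2.py is run more than once the count will be off: delete the training folder and rerun. A single clean run yields 3626 images (the TuSimple training set).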
    with open(json_label_path, 'r') as file:
        for line_index, line in enumerate(file):
            info_dict = json.loads(line)

            raw_file = info_dict['raw_file']
            h_samples = info_dict['h_samples']
            lanes = info_dict['lanes']

            image_path = ops.join(src_dir, raw_file)
            image_name_new = '{:s}.png'.format('{:d}'.format(line_index + image_nums).zfill(4)) # zfill(4) left-pads the number with zeros to at least 4 characters. https://www.runoob.com/python/att-string-zfill.html
            image_output_path = ops.join(ops.split(src_dir)[0], 'training', 'gt_image', image_name_new)
            binary_output_path = ops.join(ops.split(src_dir)[0], 'training', 'gt_binary_image', image_name_new)
            instance_output_path = ops.join(ops.split(src_dir)[0], 'training', 'gt_instance_image', image_name_new)

            src_image = cv2.imread(image_path, cv2.IMREAD_COLOR) # cv2.IMREAD_COLOR: load a 3-channel color image, ignoring any transparency.
            dst_binary_image = np.zeros([src_image.shape[0], src_image.shape[1]], np.uint8)
            dst_instance_image = np.zeros([src_image.shape[0], src_image.shape[1]], np.uint8)

            for lane_index, lane in enumerate(lanes): # introduction to the TuSimple dataset: https://blog.csdn.net/qq_38096703/article/details/105513685
                assert len(h_samples) == len(lane) # every lane must have exactly one x value per h_samples row
                lane_x = []
                lane_y = []
                for index in range(len(lane)): # skip invalid points (marked with -2)
                    if lane[index] == -2:
                        continue
                    else:
                        ptx = lane[index] # valid x coordinate
                        pty = h_samples[index] # matching y coordinate
                        lane_x.append(ptx) # lane_x: all valid x values of one lane in one image
                        lane_y.append(pty) # lane_y: all valid y values of one lane in one image
                if not lane_x:
                    continue
                lane_pts = np.vstack((lane_x, lane_y)).transpose() # np.vstack stacks the two lists as rows; transpose turns the result into an (N, 2) array of (x, y) points. https://blog.csdn.net/xiongchengluo1129/article/details/79017142
                lane_pts = np.array([lane_pts], np.int64)

                cv2.polylines(dst_binary_image, lane_pts, isClosed=False, color=255, thickness=5)
                cv2.polylines(dst_instance_image, lane_pts, isClosed=False, color=lane_index * 50 + 20, thickness=5) # color encodes the lane index as a gray value (20, 70, 120, ...)

            cv2.imwrite(binary_output_path, dst_binary_image) # write the two label images and the source image to disk
            cv2.imwrite(instance_output_path, dst_instance_image)
            cv2.imwrite(image_output_path, src_image)
        print('Process {:s} success'.format(json_label_path)) # report completion


def gen_train_sample(src_dir, b_gt_image_dir, i_gt_image_dir, image_dir):
    os.makedirs('{:s}/txt_for_local'.format(ops.split(src_dir)[0]), exist_ok=True)
    with open('{:s}/txt_for_local/train.txt'.format(ops.split(src_dir)[0]), 'w') as file:
        for image_name in os.listdir(b_gt_image_dir): # os.listdir() returns the names of the files and folders inside the given directory. https://www.runoob.com/python/os-listdir.html
            if not image_name.endswith('.png'):
                continue
            binary_gt_image_path = ops.join(b_gt_image_dir, image_name)
            instance_gt_image_path = ops.join(i_gt_image_dir, image_name)
            image_path = ops.join(image_dir, image_name)

            b_gt_image = cv2.imread(binary_gt_image_path, cv2.IMREAD_COLOR)
            i_gt_image = cv2.imread(instance_gt_image_path, cv2.IMREAD_COLOR)
            image = cv2.imread(image_path, cv2.IMREAD_COLOR)

            if b_gt_image is None or image is None or i_gt_image is None:
                print('Image set: {:s} broken'.format(image_name))
                continue
            else:
                info = '{:s} {:s} {:s}'.format(image_path, binary_gt_image_path, instance_gt_image_path)
                file.write(info + '\n') # one line per sample: the three matching image paths
    return


def split_train_txt(src_dir):
    train_file_path =  '{:s}/txt_for_local/train.txt'.format(ops.split(src_dir)[0])
    test_file_path = '{:s}/txt_for_local/test.txt'.format(ops.split(src_dir)[0])
    valid_file_path = '{:s}/txt_for_local/val.txt'.format(ops.split(src_dir)[0])
    with open(train_file_path, 'r') as file: # split the list built from the TuSimple training set further
        data = file.readlines()
        train_data = data[0:int(len(data)*0.8)] # 2900
        test_data = data[int(len(data)*0.8): int(len(data)*0.9)] # 363
        valid_data = data[int(len(data) * 0.9): -1] # 362; the -1 end index leaves out the last line of the list
    with open(train_file_path, 'w') as file:
        for d in train_data:
            file.write(d)
    with open(test_file_path, 'w') as file:
        for d in test_data:
            file.write(d)
    with open(valid_file_path, 'w') as file:
        for d in valid_data:
            file.write(d)


def process_tusimple_dataset(src_dir):
    traing_folder_path = ops.join(ops.split(src_dir)[0], 'training') # os.path.split(): https://blog.csdn.net/xijuezhu8128/article/details/87861417 os.path.join(): https://www.jb51.net/article/171478.htm
    os.makedirs(traing_folder_path, exist_ok=True) # create the directory

    gt_image_dir = ops.join(traing_folder_path, 'gt_image')
    gt_binary_dir = ops.join(traing_folder_path, 'gt_binary_image')
    gt_instance_dir = ops.join(traing_folder_path, 'gt_instance_image')

    os.makedirs(gt_image_dir, exist_ok=True)
    os.makedirs(gt_binary_dir, exist_ok=True)
    os.makedirs(gt_instance_dir, exist_ok=True)

    for json_label_path in glob.glob('{:s}/*.json'.format(src_dir)): # glob.glob: list the files matching a pattern. https://blog.csdn.net/georgeai/article/details/81035422
        get_image_to_folders(json_label_path, gt_image_dir, gt_binary_dir, gt_instance_dir, src_dir) # render the images into the folders
    gen_train_sample(src_dir, gt_binary_dir, gt_instance_dir, gt_image_dir) # write the matching file paths from the three folders under training (gt_binary_image, gt_image, gt_instance_image) into train.txt
    split_train_txt(src_dir)


if __name__ == '__main__':
    args = init_args()
    process_tusimple_dataset(args.src_dir)
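To see what `get_image_to_folders` does with one label line, here is a small standalone sketch. The JSON line is invented (real TuSimple lines have many more rows and lanes), and `lane_points` is a hypothetical helper that mirrors the filtering of `-2` entries and the `lane_index * 50 + 20` gray value used in the script above.

```python
import json

# A made-up label line in the TuSimple layout; the values are illustrative only.
line = ('{"lanes": [[-2, 500, 480], [-2, -2, 700]], '
        '"h_samples": [240, 250, 260], "raw_file": "clips/example/20.jpg"}')

def lane_points(label_line):
    """Return, per lane, the valid (x, y) points; x == -2 marks 'no lane on this row'."""
    info = json.loads(label_line)
    return [[(x, y) for x, y in zip(lane, info['h_samples']) if x != -2]
            for lane in info['lanes']]

lanes = lane_points(line)
for lane_index, points in enumerate(lanes):
    gray = lane_index * 50 + 20   # gray value used for this lane in the instance image
    print(gray, points)
```

Each lane shares the same list of y positions (`h_samples`), which is why the script asserts that every lane has as many entries as `h_samples`.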

To step through process_training_dataset_2.py in a debugger, the first function is changed as follows:

def init_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--src_dir', type=str, default='/home/wqf/ECBM6040-Project/TUSIMPLE/train_set', help='The origin path of unzipped tusimple dataset')
    return parser.parse_args()

Your TUSIMPLE directory should now look similar to the figure below.
[Figure: TUSIMPLE directory listing]

Training LaneNet (based on E-Net)

  1. Use ECBM6040-Project/Notebook-experiment/Dataset Show.ipynb to inspect the dataset used for training. The code is as follows:
import os.path as ops
import numpy as np
import torch
import cv2
import sys
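sys.path.append('..') # '..' is the parent directory. Without it, the import on the next line fails because the notebook runs inside Notebook-experiment. https://www.cnblogs.com/mandy-study/p/7735801.html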
from dataset.dataset_utils import TUSIMPLE, TUSIMPLE_AUG

# Build The datasets
# root = '/Users/smiffy/Documents/GitHub/TUSIMPLE/Data_Tusimple_PyTorch/training'
root = '../TUSIMPLE/txt_for_local'

train_set = TUSIMPLE(root=root, flag='train')
valid_set = TUSIMPLE(root=root, flag='valid')
test_set = TUSIMPLE(root=root, flag='test')

print('train_set length {}'.format(len(train_set))) # calls __len__; prints 2900
print('valid_set length {}'.format(len(valid_set))) # 362
print('test_set length {}'.format(len(test_set))) # 363

gt, bgt, igt = train_set[280] # pick one sample
print('image type {}'.format(type(gt))) # image type <class 'torch.Tensor'>
print('image size {} \n'.format(gt.size())) # image size torch.Size([3, 256, 512]) 

print('gt binary image type {}'.format(type(bgt))) # gt binary image type <class 'torch.Tensor'>
print('gt binary image size {}'.format(bgt.size())) # gt binary image size torch.Size([256, 512])
print('items in gt binary image {} \n'.format(torch.unique(bgt))) # items in gt binary image tensor([0, 1]) 

print('gt instance type {}'.format(type(igt))) # gt instance type <class 'torch.Tensor'>
print('gt instance size {}'.format(igt.size())) # gt instance size torch.Size([256, 512])
print('items in gt instance {} \n'.format(torch.unique(igt))) # items in gt instance tensor([  0,  20,  70, 120, 170]) 

# Show the images
image_show = ((gt.numpy() + 1) * 127.5).astype(int) # undo the [-1, 1] normalization, back to the 0-255 range
image_show.shape # (3, 256, 512)

import matplotlib.pyplot as plt
# image_show = image_show[...,::-1]
plt.figure(figsize=(15,15))
image_show = image_show.transpose(1,2,0)
image_show = image_show[...,::-1]
plt.imshow(image_show)

bgt.shape # torch.Size([256, 512])

plt.figure(figsize=(20,20))
ax1 = plt.subplot(121)
plt.imshow(bgt, cmap='gray')
ax1 = plt.subplot(122)
plt.imshow(igt, cmap='gray')

# Aug Dataset
# root = '/Users/smiffy/Documents/GitHub/TUSIMPLE/Data_Tusimple_PyTorch/training'
root = '../TUSIMPLE/txt_for_local'

train_set = TUSIMPLE_AUG(root=root, flag='train')
valid_set = TUSIMPLE_AUG(root=root, flag='valid')
test_set = TUSIMPLE_AUG(root=root, flag='test')

print('train_set length {}'.format(len(train_set))) # 2900x2
print('valid_set length {}'.format(len(valid_set))) # 362x2
print('test_set length {}'.format(len(test_set)))  # 363x2

idx = 280
gt, bgt, igt = train_set[idx]
gt_aug, bgt_aug, igt_aug = train_set[idx+1]
print('image type {}'.format(type(gt)))
print('image size {} \n'.format(gt.size()))

print('gt binary image type {}'.format(type(bgt)))
print('gt binary image size {}'.format(bgt.size()))
print('items in gt binary image {} \n'.format(torch.unique(bgt)))

print('gt instance type {}'.format(type(igt)))
print('gt instance size {}'.format(igt.size()))
print('items in gt instance {} \n'.format(torch.unique(igt)))

image_show = ((gt.numpy() + 1) * 127.5).astype(int)
image_show_aug = ((gt_aug.numpy() + 1) * 127.5).astype(int)
image_show.shape

import matplotlib.pyplot as plt
# image_show = image_show[...,::-1]
plt.figure(figsize=(20,20))
ax1 = plt.subplot(121)
image_show = image_show.transpose(1,2,0)
image_show = image_show[...,::-1]
plt.imshow(image_show)

ax1 = plt.subplot(122)
image_show_aug = image_show_aug.transpose(1,2,0)
image_show_aug = image_show_aug[...,::-1]
plt.imshow(image_show_aug)

plt.show()

plt.figure(figsize=(20,20))
ax1 = plt.subplot(121)
plt.imshow(bgt, cmap='gray')
ax1 = plt.subplot(122)
plt.imshow(igt, cmap='gray')

plt.figure(figsize=(20,20))
ax1 = plt.subplot(121)
plt.imshow(bgt_aug, cmap='gray')
ax1 = plt.subplot(122)
plt.imshow(igt_aug, cmap='gray')

The local module imported above with from dataset.dataset_utils import TUSIMPLE, TUSIMPLE_AUG is explained below:

import os.path as ops
import numpy as np
import torch
import cv2
import torchvision


class TUSIMPLE(torch.utils.data.Dataset): # torch.utils.data.Dataset is the abstract base class for custom datasets: subclass it and define __len__ and __getitem__. https://blog.csdn.net/qq_36653505/article/details/83351808
    def __init__(self, root, transforms=None, resize=(512, 256), flag='train'):
        self.root = root
        self.transforms = transforms
        self.resize = resize
        self.flag = flag

        self.img_pathes = []

        self.train_file = ops.join(root, 'train.txt') # the list files written by process_training_dataset_2.py (explained above)
        self.val_file = ops.join(root, 'val.txt')
        self.test_file = ops.join(root, 'test.txt')

        if self.flag == 'train':
            file_open = self.train_file
        elif self.flag == 'valid':
            file_open = self.val_file
        else:
            file_open = self.test_file

        with open(file_open, 'r') as file:
            data = file.readlines()
            for l in data: # l: '/home/wqf/ECBM6040-Project/TUSIMPLE/training/gt_image/0487.png /home/wqf/ECBM6040-Project/TUSIMPLE/training/gt_binary_image/0487.png /home/wqf/ECBM6040-Project/TUSIMPLE/training/gt_instance_image/0487.png\n'
                line = l.split() # line: a list of 3 paths
                self.img_pathes.append(line) # a list of 3-element lists

    def __len__(self): # __len__ is the special method that lets the built-in len() work on instances of a custom class; without it, calling len() on the object raises a TypeError. https://blog.csdn.net/qq_38883271/article/details/96439208
        return len(self.img_pathes) # number of samples

    def __getitem__(self, idx): # Defining __getitem__ lets an instance P be indexed as P[key]; evaluating P[key] calls this method. https://blog.csdn.net/chituozha5528/article/details/78354833
        gt_image = cv2.imread(self.img_pathes[idx][0], cv2.IMREAD_UNCHANGED) # read the three images
        gt_binary_image = cv2.imread(self.img_pathes[idx][1], cv2.IMREAD_UNCHANGED)
        gt_instance = cv2.imread(self.img_pathes[idx][2], cv2.IMREAD_UNCHANGED)

        gt_image = cv2.resize(gt_image, dsize=self.resize, interpolation=cv2.INTER_LINEAR) # resize to (width, height) = (512, 256)
        gt_binary_image = cv2.resize(gt_binary_image, dsize=self.resize, interpolation=cv2.INTER_NEAREST)
        gt_instance = cv2.resize(gt_instance, dsize=self.resize, interpolation=cv2.INTER_NEAREST)

        gt_image = gt_image / 127.5 - 1.0 # scale pixel values to [-1, 1]
        gt_binary_image = np.array(gt_binary_image / 255.0, dtype=np.uint8) # map {0, 255} to {0, 1}
        gt_binary_image = gt_binary_image[:, :, np.newaxis]
        gt_instance = gt_instance[:, :, np.newaxis]

        gt_binary_image = np.transpose(gt_binary_image, (2, 0, 1)) # (1,256,512)
        gt_instance = np.transpose(gt_instance, (2, 0, 1))

        gt_image = torch.tensor(gt_image, dtype=torch.float)
        gt_image = np.transpose(gt_image, (2, 0, 1))
        # trsf = torchvision.transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), inplace=False)
        # gt_image = trsf(gt_image)

        gt_binary_image = torch.tensor(gt_binary_image, dtype=torch.long).view(self.resize[1], self.resize[0])
        #gt_binary_image = torch.tensor(gt_binary_image, dtype=torch.float)
        # gt_instance = torch.tensor(gt_instance, dtype=torch.float)
        gt_instance = torch.tensor(gt_instance, dtype=torch.long).view(self.resize[1], self.resize[0])

        return gt_image, gt_binary_image, gt_instance # returns tensors: a float image of shape (3, 256, 512) and two long label maps of shape (256, 512)

    
class TUSIMPLE_AUG(torch.utils.data.Dataset): # data augmentation
    def __init__(self, root, transforms=None, resize=(512, 256), flag='train'):
        self.root = root
        self.transforms = transforms
        self.resize = resize
        self.flag = flag

        self.img_pathes = []

        self.train_file = ops.join(root, 'train.txt')
        self.val_file = ops.join(root, 'val.txt')
        self.test_file = ops.join(root, 'test.txt')

        if self.flag == 'train':
            file_open = self.train_file
        elif self.flag == 'valid':
            file_open = self.val_file
        else:
            file_open = self.test_file

        with open(file_open, 'r') as file:
            data = file.readlines()
            for l in data:
                line = l.split()
                self.img_pathes.append(line)

    def __len__(self):
        return len(self.img_pathes) * 2

    def __getitem__(self, idx):
        if idx % 2 == 0:
            gt_image = cv2.imread(self.img_pathes[int(idx/2)][0], cv2.IMREAD_UNCHANGED) # cv2.IMREAD_UNCHANGED: read the image as stored, including the alpha (transparency) channel if there is one. https://wendao.blog.csdn.net/article/details/98768293
            gt_binary_image = cv2.imread(self.img_pathes[int(idx/2)][1], cv2.IMREAD_UNCHANGED)
            gt_instance = cv2.imread(self.img_pathes[int(idx/2)][2], cv2.IMREAD_UNCHANGED)
        else:
            gt_image = cv2.imread(self.img_pathes[int((idx-1)/2)][0], cv2.IMREAD_UNCHANGED)
            gt_binary_image = cv2.imread(self.img_pathes[int((idx-1)/2)][1], cv2.IMREAD_UNCHANGED)
            gt_instance = cv2.imread(self.img_pathes[int((idx-1)/2)][2], cv2.IMREAD_UNCHANGED)

            gt_image = cv2.flip(gt_image, 1) # flip the image horizontally
            gt_binary_image = cv2.flip(gt_binary_image, 1)
            gt_instance = cv2.flip(gt_instance, 1)

        gt_image = cv2.resize(gt_image, dsize=self.resize, interpolation=cv2.INTER_LINEAR)
        gt_binary_image = cv2.resize(gt_binary_image, dsize=self.resize, interpolation=cv2.INTER_NEAREST)
        gt_instance = cv2.resize(gt_instance, dsize=self.resize, interpolation=cv2.INTER_NEAREST)

        gt_image = gt_image / 127.5 - 1.0
        gt_binary_image = np.array(gt_binary_image / 255.0, dtype=np.uint8)
        gt_binary_image = gt_binary_image[:, :, np.newaxis]
        gt_instance = gt_instance[:, :, np.newaxis]

        gt_binary_image = np.transpose(gt_binary_image, (2, 0, 1))
        gt_instance = np.transpose(gt_instance, (2, 0, 1))

        gt_image = torch.tensor(gt_image, dtype=torch.float)
        gt_image = np.transpose(gt_image, (2, 0, 1))
        # trsf = torchvision.transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), inplace=False)
        # gt_image = trsf(gt_image)

        gt_binary_image = torch.tensor(gt_binary_image, dtype=torch.long).view(self.resize[1], self.resize[0])
        # gt_binary_image = torch.tensor(gt_binary_image, dtype=torch.float)
        # gt_instance = torch.tensor(gt_instance, dtype=torch.float)
        gt_instance = torch.tensor(gt_instance, dtype=torch.long).view(self.resize[1], self.resize[0])

        return gt_image, gt_binary_image, gt_instance
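The array handling in the two dataset classes boils down to a value rescaling, a layout change from height-width-channel to channel-height-width, and an index mapping for the augmented set. The following NumPy-only sketch reproduces those steps on a random stand-in image; `aug_index` is a hypothetical helper, and rounding with `np.rint` is my choice here (the notebook above uses a plain `astype(int)`).

```python
import numpy as np

# Stand-in for a resized BGR image as returned by cv2: (height, width, channels).
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(256, 512, 3), dtype=np.uint8)

norm = img / 127.5 - 1.0                 # [0, 255] -> [-1, 1]
chw = np.transpose(norm, (2, 0, 1))      # HWC -> CHW, the layout the model expects

# Inverse steps, as used when displaying a sample: back to HWC and 0-255.
restored = np.rint((chw + 1.0) * 127.5).astype(np.uint8).transpose(1, 2, 0)

def aug_index(idx):
    """Map an index of the doubled dataset to (source image index, flipped?)."""
    return idx // 2, idx % 2 == 1

print(chw.shape, chw.min() >= -1.0, chw.max() <= 1.0)
print(aug_index(280), aug_index(281))
```

The mapping shows why `train_set[idx]` and `train_set[idx+1]` in the notebook give an image and its mirrored copy when `idx` is even: both indices point at the same source image.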
  2. Train LaneNet with ECBM6040-Project/Train.ipynb; the models are saved in ECBM6040-Project/TUSIMPLE/Lanenet_output. The code is as follows:
import os.path as ops
import numpy as np
import torch
import cv2
import time
from dataset.dataset_utils import TUSIMPLE
from Lanenet.model2 import Lanenet

# define the dataset
# root = '/Users/smiffy/Documents/GitHub/TUSIMPLE/Data_Tusimple_PyTorch/training'
root = 'TUSIMPLE/txt_for_local'
train_set = TUSIMPLE(root=root, flag='train')
valid_set = TUSIMPLE(root=root, flag='valid')
test_set = TUSIMPLE(root=root, flag='test')

print('train_set length {}'.format(len(train_set)))
print('valid_set length {}'.format(len(valid_set)))
print('test_set length {}'.format(len(test_set)))

gt, bgt, igt = train_set[0]
print('image type {}'.format(type(gt)))
print('image size {} \n'.format(gt.size()))

print('gt binary image type {}'.format(type(bgt)))
print('gt binary image size {}'.format(bgt.size()))
print('items in gt binary image {} \n'.format(torch.unique(bgt)))

print('gt instance type {}'.format(type(igt)))
print('gt instance size {}'.format(igt.size()))
print('items in gt instance {} \n'.format(torch.unique(igt)))

# DataLoader
batch_size = 8

data_loader_train = torch.utils.data.DataLoader(train_set, batch_size=batch_size, shuffle=True, num_workers=0) # num_workers is the number of worker subprocesses used for loading; 0 means the data is loaded in the main process.
data_loader_valid = torch.utils.data.DataLoader(valid_set, batch_size=1, shuffle=True, num_workers=0) # torch.utils.data.DataLoader splits the dataset into batches and yields one batch per iteration until all data has been served.
data_loader_test = torch.utils.data.DataLoader(test_set, batch_size=1, shuffle=False, num_workers=0)

# Model and optim
learning_rate = 5e-4

device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

LaneNet_model = Lanenet(2, 4) # 2 classes for the binary segmentation, 4-dimensional embedding
LaneNet_model.to(device) # move the model to the GPU if one is available

params = [p for p in LaneNet_model.parameters() if p.requires_grad] # collect all parameters that need gradient updates. requires_grad: https://blog.csdn.net/xuyi582605786/article/details/104973079/
optimizer = torch.optim.Adam(params, lr=learning_rate, weight_decay=0.0002)

lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1) # multiplies the learning rate by 0.1 every 10 epochs. https://www.cnblogs.com/zf-blog/p/11262906.html

num_epochs = 30

from Lanenet.cluster_loss3 import cluster_loss
criterion = cluster_loss() # instantiate the clustering loss
# criterion = torch.nn.CrossEntropyLoss(weight=torch.tensor([ 1.4393, 27.7296]).cuda())

from torch.autograd import Variable

loss_all = []
for epoch in range(num_epochs): # one epoch = one pass over all images in data_loader_train
    LaneNet_model.train() # switch the model to training mode
    ts = time.time()
    for iter, batch in enumerate(data_loader_train): # one iteration = one batch processed
        input_image = Variable(batch[0]).to(device) # move one batch of data to the device
        binary_labels = Variable(batch[1]).to(device)
        instance_labels = Variable(batch[2]).to(device)
        
        binary_final_logits, instance_embedding = LaneNet_model(input_image) # the network's binary segmentation logits and instance embeddings
        # loss = LaneNet_model.compute_loss(binary_logits=binary_final_logits, binary_labels=binary_labels,
        #                               instance_logits=instance_embedding, instance_labels=instance_labels, delta_v=0.5, delta_d=3)
        binary_segmenatation_loss, instance_segmenatation_loss = criterion(binary_logits=binary_final_logits, binary_labels=binary_labels,
                                       instance_logits=instance_embedding, instance_labels=instance_labels, delta_v=0.5, delta_d=3) # compute the losses
        
        # binary_segmenatation_loss = criterion(binary_final_logits, binary_labels)
        loss = 1*binary_segmenatation_loss + 1*instance_segmenatation_loss # the two losses are weighted equally, as stated in the paper
        optimizer.zero_grad()
        loss_all.append(loss.item()) # item() extracts the Python number from a one-element tensor. https://www.jianshu.com/p/be3276b434b2
        loss.backward()
        optimizer.step()
        
        if iter % 20 == 0:
            print("epoch[{}] iter[{}] loss: [{}, {}] ".format(epoch, iter, binary_segmenatation_loss.item(), instance_segmenatation_loss.item()))
    lr_scheduler.step() # update the learning rate once per epoch
    print("Finish epoch[{}], time elapsed[{}]".format(epoch, time.time() - ts))
    torch.save(LaneNet_model.state_dict(),                        f"TUSIMPLE/Lanenet_output/lanenet_epoch_{epoch}_batch_{8}.model")

# Show the Loss
import matplotlib.pylab as plt
plt.plot(loss_all)

The loss drops very quickly:
[Figure: training loss curve]
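The training script combines Adam with `StepLR(step_size=10, gamma=0.1)` over 30 epochs. A plain-Python sketch of the resulting schedule is shown below; `step_lr` is a hypothetical helper that only mirrors the decay rule and does not touch PyTorch.

```python
def step_lr(base_lr, epoch, step_size=10, gamma=0.1):
    """Learning rate in effect during `epoch` (0-based) under a step decay."""
    return base_lr * gamma ** (epoch // step_size)

# Values from the training script: lr 5e-4, decay by 0.1 every 10 epochs, 30 epochs.
schedule = [step_lr(5e-4, epoch) for epoch in range(30)]
for epoch in (0, 9, 10, 19, 20, 29):
    print(epoch, schedule[epoch])
```

So epochs 0 to 9 train at 5e-4, epochs 10 to 19 at one tenth of that, and the last ten epochs at one hundredth.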

The local module imported above with from Lanenet.model2 import Lanenet is explained below:

import torch.nn as nn
import torch


class InitialBlock(nn.Module):
    """The initial block is composed of two branches:
    1. a main branch which performs a regular convolution with stride 2;
    2. an extension branch which performs max-pooling.
    Doing both operations in parallel and concatenating their results
    allows for efficient downsampling and expansion. The main branch
    outputs 13 feature maps while the extension branch outputs 3, for a 
    total of 16 feature maps after concatenation.
    Keyword arguments:
    - in_channels (int): the number of input channels.
    - out_channels (int): the number output channels.
    - kernel_size (int, optional): the kernel size of the filters used in
    the convolution layer. Default: 3.
    - padding (int, optional): zero-padding added to both sides of the
    input. Default: 0.
    - bias (bool, optional): Adds a learnable bias to the output if
    ``True``. Default: False.
    - relu (bool, optional): When ``True`` ReLU is used as the activation
    function; otherwise, PReLU is used. Default: True.
    """

    def __init__(self,
                 in_channels,
                 out_channels,
                 bias=False,
                 relu=True):
        super().__init__() # https://blog.csdn.net/a__int__/article/details/104600972

        if relu:
            activation = nn.ReLU
        else:
            activation = nn.PReLU

        # Main branch - As stated above the number of output channels for this
        # branch is the total minus 3, since the remaining channels come from
        # the extension branch
        self.main_branch = nn.Conv2d(
            in_channels,
            out_channels - 3,
            kernel_size=3,
            stride=2,
            padding=1,
            bias=bias)

        # Extension branch
        self.ext_branch = nn.MaxPool2d(3, stride=2, padding=1)

        # Initialize batch normalization to be used after concatenation
        self.batch_norm = nn.BatchNorm2d(out_channels)

        # PReLU layer to apply after concatenating the branches
        self.out_activation = activation()

    def forward(self, x):
        main = self.main_branch(x)
        ext = self.ext_branch(x)

        # Concatenate branches
        out = torch.cat((main, ext), 1)

        # Apply batch normalization
        out = self.batch_norm(out)

        return self.out_activation(out)
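The shapes produced by `InitialBlock` can be checked by hand. The sketch below applies the standard output-size formula for a convolution or pooling layer (floor mode) to the 512x256 input used in this post; `out_size` is a hypothetical helper, and `out_channels=16` is taken from the docstring's "total of 16 feature maps".

```python
def out_size(n, kernel=3, stride=2, padding=1):
    """Output length of a conv / max-pool layer along one spatial axis (floor mode)."""
    return (n + 2 * padding - kernel) // stride + 1

height, width = 256, 512              # network input size used in this post
out_channels = 16                     # total stated in the InitialBlock docstring
main_channels = out_channels - 3      # convolution branch
ext_channels = 3                      # max-pooling keeps the 3 input channels

# Both branches use kernel 3, stride 2, padding 1, so their outputs
# have the same spatial size and can be concatenated along the channel axis.
print(main_channels + ext_channels, out_size(height), out_size(width))
```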


class RegularBottleneck(nn.Module):
    """Regular bottlenecks are the main building block of ENet.
    Main branch:
    1. Shortcut connection.
    Extension branch:
    1. 1x1 convolution which decreases the number of channels by
    ``internal_ratio``, also called a projection;
    2. regular, dilated or asymmetric convolution;
    3. 1x1 convolution which increases the number of channels back to
    ``channels``, also called an expansion;
    4. dropout as a regularizer.
    Keyword arguments:
    - channels (int): the number of input and output channels.
    - internal_ratio (int, optional): a scale factor applied to
    ``channels`` used to compute the number of
    channels after the projection. eg. given ``channels`` equal to 128 and
    internal_ratio equal to 2 the number of channels after the projection
    is 64. Default: 4.
    - kernel_size (int, optional): the kernel size of the filters used in
    the convolution layer described above in item 2 of the extension
    branch. Default: 3.
    - padding (int, optional): zero-padding added to both sides of the
    input. Default: 0.
    - dilation (int, optional): spacing between kernel elements for the
    convolution described in item 2 of the extension branch. Default: 1.
    - asymmetric (bool, optional): flags if the convolution described in
    item 2 of the extension branch is asymmetric or not. Default: False.
    - dropout_prob (float, optional): probability of an element to be
    zeroed. Default: 0 (no dropout).
    - bias (bool, optional): Adds a learnable bias to the output if
    ``True``. Default: False.
    - relu (bool, optional): When ``True`` ReLU is used as the activation
    function; otherwise, PReLU is used. Default: True.
    """

    def __init__(self,
                 channels,
                 internal_ratio=4,
                 kernel_size=3,
                 padding=0,
                 dilation=1,
                 asymmetric=False,
                 dropout_prob=0,
                 bias=False,
                 relu=True):
        super().__init__()

        # Check in the internal_scale parameter is within the expected range
        # [1, channels]
        if internal_ratio <= 1 or internal_ratio > channels:
            raise RuntimeError("Value out of range. Expected value in the "
                               "interval [1, {0}], got internal_scale={1}."
                               .format(channels, internal_ratio))

        internal_channels = channels // internal_ratio

        if relu:
            activation = nn.ReLU
        else:
            activation = nn.PReLU

        # Main branch - shortcut connection

        # Extension branch - 1x1 convolution, followed by a regular, dilated or
        # asymmetric convolution, followed by another 1x1 convolution, and,
        # finally, a regularizer (spatial dropout). Number of channels is constant.

        # 1x1 projection convolution
        self.ext_conv1 = nn.Sequential(
            nn.Conv2d(
                channels,
                internal_channels,
                kernel_size=1,
                stride=1,
                bias=bias), nn.BatchNorm2d(internal_channels), activation())

        # If the convolution is asymmetric we split the main convolution in
        # two. Eg. for a 5x5 asymmetric convolution we have two convolution:
        # the first is 5x1 and the second is 1x5 (a factorized convolution).
        if asymmetric:
            self.ext_conv2 = nn.Sequential(
                nn.Conv2d(
                    internal_channels,
                    internal_channels,
                    kernel_size=(kernel_size, 1),
                    stride=1,
                    padding=(padding, 0),
                    dilation=dilation,
                    bias=bias), nn.BatchNorm2d(internal_channels), activation(),
                nn.Conv2d(
                    internal_channels,
                    internal_channels,
                    kernel_size=(1, kernel_size),
                    stride=1,
                    padding=(0, padding),
                    dilation=dilation,
                    bias=bias), nn.BatchNorm2d(internal_channels), activation())
        else:
            self.ext_conv2 = nn.Sequential(
                nn.Conv2d(
                    internal_channels,
                    internal_channels,
                    kernel_size=kernel_size,
                    stride=1,
                    padding=padding,
                    dilation=dilation,
                    bias=bias), nn.BatchNorm2d(internal_channels), activation())

        # 1x1 expansion convolution
        self.ext_conv3 = nn.Sequential(
            nn.Conv2d(
                internal_channels,
                channels,
                kernel_size=1,
                stride=1,
                bias=bias), nn.BatchNorm2d(channels), activation())

        self.ext_regul = nn.Dropout2d(p=dropout_prob)

        # Activation (ReLU or PReLU) to apply after adding the branches
        self.out_activation = activation()

    def forward(self, x):
        # Main branch shortcut
        main = x

        # Extension branch
        ext = self.ext_conv1(x)
        ext = self.ext_conv2(ext)
        ext = self.ext_conv3(ext)
        ext = self.ext_regul(ext)

        # Add main and extension branches
        out = main + ext

        return self.out_activation(out)


class DownsamplingBottleneck(nn.Module):
    """Downsampling bottlenecks further downsample the feature map size.
    Main branch:
    1. max pooling with stride 2; indices are saved to be used for
    unpooling later.
    Extension branch:
    1. 2x2 convolution with stride 2 that decreases the number of channels
    by ``internal_ratio``, also called a projection;
    2. regular convolution (by default, 3x3);
    3. 1x1 convolution which increases the number of channels to
    ``out_channels``, also called an expansion;
    4. dropout as a regularizer.
    Keyword arguments:
    - in_channels (int): the number of input channels.
    - out_channels (int): the number of output channels.
    - internal_ratio (int, optional): a scale factor applied to ``in_channels``
    used to compute the number of channels after the projection. E.g. given
    ``in_channels`` equal to 128 and ``internal_ratio`` equal to 2 the number of
    channels after the projection is 64. Default: 4.
    - return_indices (bool, optional):  if ``True``, will return the max
    indices along with the outputs. Useful when unpooling later.
    - dropout_prob (float, optional): probability of an element to be
    zeroed. Default: 0 (no dropout).
    - bias (bool, optional): Adds a learnable bias to the output if
    ``True``. Default: False.
    - relu (bool, optional): When ``True`` ReLU is used as the activation
    function; otherwise, PReLU is used. Default: True.
    """

    def __init__(self,
                 in_channels,
                 out_channels,
                 internal_ratio=4,
                 return_indices=False,
                 dropout_prob=0,
                 bias=False,
                 relu=True):
        super().__init__()

        # Store parameters that are needed later
        self.return_indices = return_indices

        # Check that internal_ratio is within the expected range
        # (1, in_channels]
        if internal_ratio <= 1 or internal_ratio > in_channels:
            raise RuntimeError("Value out of range. Expected value in the "
                               "interval (1, {0}], got internal_ratio={1}. "
                               .format(in_channels, internal_ratio))

        internal_channels = in_channels // internal_ratio

        if relu:
            activation = nn.ReLU
        else:
            activation = nn.PReLU

        # Main branch - max pooling followed by feature map (channels) padding
        self.main_max1 = nn.MaxPool2d(
            2,
            stride=2,
            return_indices=return_indices)

        # Extension branch - 2x2 convolution with stride 2, followed by a
        # regular 3x3 convolution, followed by a 1x1 convolution that expands
        # the number of channels to out_channels.

        # 2x2 projection convolution with stride 2
        self.ext_conv1 = nn.Sequential(
            nn.Conv2d(
                in_channels,
                internal_channels,
                kernel_size=2,
                stride=2,
                bias=bias), nn.BatchNorm2d(internal_channels), activation())

        # Convolution
        self.ext_conv2 = nn.Sequential(
            nn.Conv2d(
                internal_channels,
                internal_channels,
                kernel_size=3,
                stride=1,
                padding=1,
                bias=bias), nn.BatchNorm2d(internal_channels), activation())

        # 1x1 expansion convolution
        self.ext_conv3 = nn.Sequential(
            nn.Conv2d(
                internal_channels,
                out_channels,
                kernel_size=1,
                stride=1,
                bias=bias), nn.BatchNorm2d(out_channels), activation())

        self.ext_regul = nn.Dropout2d(p=dropout_prob)

        # Activation (ReLU or PReLU) to apply after adding the branches
        self.out_activation = activation()

    def forward(self, x):
        # Main branch shortcut
        max_indices = None  # stays None when return_indices is False
        if self.return_indices:
            main, max_indices = self.main_max1(x)
        else:
            main = self.main_max1(x)

        # Extension branch
        ext = self.ext_conv1(x)
        ext = self.ext_conv2(ext)
        ext = self.ext_conv3(ext)
        ext = self.ext_regul(ext)

        # Main branch channel padding, created directly on the same device
        # and with the same dtype as main
        n, ch_ext, h, w = ext.size()
        ch_main = main.size()[1]
        padding = torch.zeros(
            n, ch_ext - ch_main, h, w, device=main.device, dtype=main.dtype)

        # Concatenate
        main = torch.cat((main, padding), 1)

        # Add main and extension branches
        out = main + ext

        return self.out_activation(out), max_indices


class UpsamplingBottleneck(nn.Module):
    """The upsampling bottlenecks upsample the feature map resolution using max
    pooling indices stored from the corresponding downsampling bottleneck.
    Main branch:
    1. 1x1 convolution with stride 1 that maps ``in_channels`` to
    ``out_channels``;
    2. max unpool layer using the max pool indices from the corresponding
    downsampling max pool layer.
    Extension branch:
    1. 1x1 convolution with stride 1 that decreases the number of channels by
    ``internal_ratio``, also called a projection;
    2. transposed convolution (2x2 with stride 2);
    3. 1x1 convolution which increases the number of channels to
    ``out_channels``, also called an expansion;
    4. dropout as a regularizer.
    Keyword arguments:
    - in_channels (int): the number of input channels.
    - out_channels (int): the number of output channels.
    - internal_ratio (int, optional): a scale factor applied to ``in_channels``
     used to compute the number of channels after the projection. eg. given
     ``in_channels`` equal to 128 and ``internal_ratio`` equal to 2 the number
     of channels after the projection is 64. Default: 4.
    - dropout_prob (float, optional): probability of an element to be zeroed.
    Default: 0 (no dropout).
    - bias (bool, optional): Adds a learnable bias to the output if ``True``.
    Default: False.
    - relu (bool, optional): When ``True`` ReLU is used as the activation
    function; otherwise, PReLU is used. Default: True.
    """

    def __init__(self,
                 in_channels,
                 out_channels,
                 internal_ratio=4,
                 dropout_prob=0,
                 bias=False,
                 relu=True):
        super().__init__()

        # Check that internal_ratio is within the expected range
        # (1, in_channels]
        if internal_ratio <= 1 or internal_ratio > in_channels:
            raise RuntimeError("Value out of range. Expected value in the "
                               "interval (1, {0}], got internal_ratio={1}. "
                               .format(in_channels, internal_ratio))

        internal_channels = in_channels // internal_ratio

        if relu:
            activation = nn.ReLU
        else:
            activation = nn.PReLU

        # Main branch - 1x1 convolution followed by max unpooling
        self.main_conv1 = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=bias),
            nn.BatchNorm2d(out_channels))

        # Remember that the stride is the same as the kernel_size, just like
        # the max pooling layers
        self.main_unpool1 = nn.MaxUnpool2d(kernel_size=2)

        # Extension branch - 1x1 convolution, followed by a transposed
        # convolution that doubles the resolution, followed by another 1x1
        # convolution that expands the number of channels to out_channels.

        # 1x1 projection convolution with stride 1
        self.ext_conv1 = nn.Sequential(
            nn.Conv2d(
                in_channels, internal_channels, kernel_size=1, bias=bias),
            nn.BatchNorm2d(internal_channels), activation())

        # Transposed convolution
        self.ext_tconv1 = nn.ConvTranspose2d(
            internal_channels,
            internal_channels,
            kernel_size=2,
            stride=2,
            bias=bias)
        self.ext_tconv1_bnorm = nn.BatchNorm2d(internal_channels)
        self.ext_tconv1_activation = activation()

        # 1x1 expansion convolution
        self.ext_conv2 = nn.Sequential(
            nn.Conv2d(
                internal_channels, out_channels, kernel_size=1, bias=bias),
            nn.BatchNorm2d(out_channels), activation())

        self.ext_regul = nn.Dropout2d(p=dropout_prob)

        # Activation (ReLU or PReLU) to apply after adding the branches
        self.out_activation = activation()

    def forward(self, x, max_indices, output_size):
        # Main branch shortcut
        main = self.main_conv1(x)
        main = self.main_unpool1(
            main, max_indices, output_size=output_size)

        # Extension branch
        ext = self.ext_conv1(x)
        ext = self.ext_tconv1(ext, output_size=output_size)
        ext = self.ext_tconv1_bnorm(ext)
        ext = self.ext_tconv1_activation(ext)
        ext = self.ext_conv2(ext)
        ext = self.ext_regul(ext)

        # Add main and extension branches
        out = main + ext

        return self.out_activation(out)


class Lanenet(nn.Module): # For now this can be treated as the standard boilerplate for defining a model. See https://zhuanlan.zhihu.com/p/88712978 and https://blog.csdn.net/weixin_42018112/article/details/90084419
    def __init__(self, binary_seg, embedding_dim, encoder_relu=False, decoder_relu=True):
        super(Lanenet, self).__init__() # super().__init__() vs. super(class, self).__init__(): see https://www.v2ex.com/amp/t/740751. In Python 3 the two forms are equivalent here; the explicit form is only required in Python 2.

        self.initial_block = InitialBlock(3, 16, relu=encoder_relu) # https://blog.csdn.net/u013241583/article/details/90170369

        # Stage 1 share
        self.downsample1_0 = DownsamplingBottleneck(16, 64, return_indices=True, dropout_prob=0.01, relu=encoder_relu) # https://blog.csdn.net/u013241583/article/details/90171242
        self.regular1_1 = RegularBottleneck(64, padding=1, dropout_prob=0.01, relu=encoder_relu)
        self.regular1_2 = RegularBottleneck(64, padding=1, dropout_prob=0.01, relu=encoder_relu)
        self.regular1_3 = RegularBottleneck(64, padding=1, dropout_prob=0.01, relu=encoder_relu)
        self.regular1_4 = RegularBottleneck(64, padding=1, dropout_prob=0.01, relu=encoder_relu)

        # Stage 2 share
        self.downsample2_0 = DownsamplingBottleneck(64, 128, return_indices=True, dropout_prob=0.1, relu=encoder_relu)
        self.regular2_1 = RegularBottleneck(128, padding=1, dropout_prob=0.1, relu=encoder_relu)
        self.dilated2_2 = RegularBottleneck(128, dilation=2, padding=2, dropout_prob=0.1, relu=encoder_relu)
        self.asymmetric2_3 = RegularBottleneck(128, kernel_size=5, padding=2, asymmetric=True, dropout_prob=0.1, relu=encoder_relu)
        self.dilated2_4 = RegularBottleneck(128, dilation=4, padding=4, dropout_prob=0.1, relu=encoder_relu)
        self.regular2_5 = RegularBottleneck(128, padding=1, dropout_prob=0.1, relu=encoder_relu)
        self.dilated2_6 = RegularBottleneck(128, dilation=8, padding=8, dropout_prob=0.1, relu=encoder_relu)
        self.asymmetric2_7 = RegularBottleneck(128, kernel_size=5, asymmetric=True, padding=2, dropout_prob=0.1, relu=encoder_relu)
        self.dilated2_8 = RegularBottleneck(128, dilation=16, padding=16, dropout_prob=0.1, relu=encoder_relu)

        # stage 3 binary
        self.regular_binary_3_0 = RegularBottleneck(128, padding=1, dropout_prob=0.1, relu=encoder_relu)
        self.dilated_binary_3_1 = RegularBottleneck(128, dilation=2, padding=2, dropout_prob=0.1, relu=encoder_relu)
        self.asymmetric_binary_3_2 = RegularBottleneck(128, kernel_size=5, padding=2, asymmetric=True, dropout_prob=0.1, relu=encoder_relu)
        self.dilated_binary_3_3 = RegularBottleneck(128, dilation=4, padding=4, dropout_prob=0.1, relu=encoder_relu)
        self.regular_binary_3_4 = RegularBottleneck(128, padding=1, dropout_prob=0.1, relu=encoder_relu)
        self.dilated_binary_3_5 = RegularBottleneck(128, dilation=8, padding=8, dropout_prob=0.1, relu=encoder_relu)
        self.asymmetric_binary_3_6 = RegularBottleneck(128, kernel_size=5, asymmetric=True, padding=2, dropout_prob=0.1, relu=encoder_relu)
        self.dilated_binary_3_7 = RegularBottleneck(128, dilation=16, padding=16, dropout_prob=0.1, relu=encoder_relu)

        # stage 3 embedding
        self.regular_embedding_3_0 = RegularBottleneck(128, padding=1, dropout_prob=0.1, relu=encoder_relu)
        self.dilated_embedding_3_1 = RegularBottleneck(128, dilation=2, padding=2, dropout_prob=0.1, relu=encoder_relu)
        self.asymmetric_embedding_3_2 = RegularBottleneck(128, kernel_size=5, padding=2, asymmetric=True, dropout_prob=0.1, relu=encoder_relu)
        self.dilated_embedding_3_3 = RegularBottleneck(128, dilation=4, padding=4, dropout_prob=0.1, relu=encoder_relu)
        self.regular_embedding_3_4 = RegularBottleneck(128, padding=1, dropout_prob=0.1, relu=encoder_relu)
        self.dilated_embedding_3_5 = RegularBottleneck(128, dilation=8, padding=8, dropout_prob=0.1, relu=encoder_relu)
        self.asymmetric_bembedding_3_6 = RegularBottleneck(128, kernel_size=5, asymmetric=True, padding=2, dropout_prob=0.1, relu=encoder_relu)
        self.dilated_embedding_3_7 = RegularBottleneck(128, dilation=16, padding=16, dropout_prob=0.1, relu=encoder_relu)

        # binary branch
        self.upsample_binary_4_0 = UpsamplingBottleneck(128, 64, dropout_prob=0.1, relu=decoder_relu)
        self.regular_binary_4_1 = RegularBottleneck(64, padding=1, dropout_prob=0.1, relu=decoder_relu)
        self.regular_binary_4_2 = RegularBottleneck(64, padding=1, dropout_prob=0.1, relu=decoder_relu)
        self.upsample_binary_5_0 = UpsamplingBottleneck(64, 16, dropout_prob=0.1, relu=decoder_relu)
        self.regular_binary_5_1 = RegularBottleneck(16, padding=1, dropout_prob=0.1, relu=decoder_relu)
        self.binary_transposed_conv = nn.ConvTranspose2d(16, binary_seg, kernel_size=3, stride=2, padding=1, bias=False)

        # embedding branch
        self.upsample_embedding_4_0 = UpsamplingBottleneck(128, 64, dropout_prob=0.1, relu=decoder_relu)
        self.regular_embedding_4_1 = RegularBottleneck(64, padding=1, dropout_prob=0.1, relu=decoder_relu)
        self.regular_embedding_4_2 = RegularBottleneck(64, padding=1, dropout_prob=0.1, relu=decoder_relu)
        self.upsample_embedding_5_0 = UpsamplingBottleneck(64, 16, dropout_prob=0.1, relu=decoder_relu)
        self.regular_embedding_5_1 = RegularBottleneck(16, padding=1, dropout_prob=0.1, relu=decoder_relu)
        self.embedding_transposed_conv = nn.ConvTranspose2d(16, embedding_dim, kernel_size=3, stride=2, padding=1, bias=False)

    def forward(self, x):
        # Initial block
        input_size = x.size() # torch.Size([8, 3, 256, 512])
        x = self.initial_block(x)

        # Stage 1 share
        stage1_input_size = x.size()
        x, max_indices1_0 = self.downsample1_0(x)
        x = self.regular1_1(x)
        x = self.regular1_2(x)
        x = self.regular1_3(x)
        x = self.regular1_4(x)

        # Stage 2 share
        stage2_input_size = x.size()
        x, max_indices2_0 = self.downsample2_0(x)
        x = self.regular2_1(x)
        x = self.dilated2_2(x)
        x = self.asymmetric2_3(x)
        x = self.dilated2_4(x)
        x = self.regular2_5(x)
        x = self.dilated2_6(x)
        x = self.asymmetric2_7(x)
        x = self.dilated2_8(x)

        # stage 3 binary
        x_binary = self.regular_binary_3_0(x)
        x_binary = self.dilated_binary_3_1(x_binary)
        x_binary = self.asymmetric_binary_3_2(x_binary)
        x_binary = self.dilated_binary_3_3(x_binary)
        x_binary = self.regular_binary_3_4(x_binary)
        x_binary = self.dilated_binary_3_5(x_binary)
        x_binary = self.asymmetric_binary_3_6(x_binary)
        x_binary = self.dilated_binary_3_7(x_binary)

        # stage 3 embedding
        x_embedding = self.regular_embedding_3_0(x)
        x_embedding = self.dilated_embedding_3_1(x_embedding)
        x_embedding = self.asymmetric_embedding_3_2(x_embedding)
        x_embedding = self.dilated_embedding_3_3(x_embedding)
        x_embedding = self.regular_embedding_3_4(x_embedding)
        x_embedding = self.dilated_embedding_3_5(x_embedding)
        x_embedding = self.asymmetric_bembedding_3_6(x_embedding)
        x_embedding = self.dilated_embedding_3_7(x_embedding)

        # binary branch
        x_binary = self.upsample_binary_4_0(x_binary, max_indices2_0, output_size=stage2_input_size)
        x_binary = self.regular_binary_4_1(x_binary)
        x_binary = self.regular_binary_4_2(x_binary)
        x_binary = self.upsample_binary_5_0(x_binary, max_indices1_0, output_size=stage1_input_size)
        x_binary = self.regular_binary_5_1(x_binary)
        binary_final_logits = self.binary_transposed_conv(x_binary, output_size=input_size)

        # embedding branch
        x_embedding = self.upsample_embedding_4_0(x_embedding, max_indices2_0, output_size=stage2_input_size)
        x_embedding = self.regular_embedding_4_1(x_embedding)
        x_embedding = self.regular_embedding_4_2(x_embedding)
        x_embedding = self.upsample_embedding_5_0(x_embedding, max_indices1_0, output_size=stage1_input_size)
        x_embedding = self.regular_embedding_5_1(x_embedding)
        instance_notfinal_logits = self.embedding_transposed_conv(x_embedding, output_size=input_size)

        return binary_final_logits, instance_notfinal_logits


if __name__ == '__main__':
    test_input = torch.ones((8, 3, 256, 512))
    net = Lanenet(2, 4)
    binary_final_logits, instance_notfinal_logits = net(test_input)
    print(binary_final_logits.shape)
    print(instance_notfinal_logits.shape)
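
The output_size arguments passed to the upsampling layers above matter because a strided transposed convolution does not fix its output size uniquely. As a rough check of the arithmetic, the sketch below applies the output-size formula given in the PyTorch ConvTranspose2d documentation to the final layer of each branch (kernel_size=3, stride=2, padding=1). The helper name tconv_out is made up for this illustration, and the 128x256 input size assumes that the initial block halves the 256x512 input, as in ENet.

```python
def tconv_out(size_in, kernel_size, stride, padding, output_padding=0, dilation=1):
    """Output length of a transposed convolution along one spatial axis."""
    return ((size_in - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)


# Final layer of each branch: kernel_size=3, stride=2, padding=1.
h_in, w_in = 128, 256  # assumed feature-map size before the last layer
print(tconv_out(h_in, 3, 2, 1), tconv_out(w_in, 3, 2, 1))  # 255 511
print(tconv_out(h_in, 3, 2, 1, output_padding=1),
      tconv_out(w_in, 3, 2, 1, output_padding=1))          # 256 512

# The 2x2, stride-2 transposed convolution inside UpsamplingBottleneck
# doubles the size exactly, so no extra padding is needed there.
print(tconv_out(32, 2, 2, 0))                              # 64
```

Without extra padding the last layer would produce 255x511. Passing output_size=input_size lets PyTorch choose the output_padding that gives 256x512.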

The local module imported by the file above (from Lanenet.cluster_loss3 import cluster_loss) is walked through below:

import torch
import torch.nn as nn
from torch_scatter import scatter # Installing this module: https://www.cnblogs.com/cykablyat/p/14293500.html. The wheel must match your CUDA and Python versions. In my case the download page was https://pytorch-geometric.com/whl/torch-1.7.0.html and the file was torch_scatter-latest+cu110-cp38-cp38-linux_x86_64.whl


class cluster_loss_helper(nn.Module): # Computes the clustering loss L = Lvar + Ldist for the embedding branch.
    def __init__(self):
        super(cluster_loss_helper, self).__init__()

    def forward(self, prediction, correct_label, delta_v, delta_d):
        """

        :param prediction: [N, 4, 256, 512]
        :param correct_label: [N, 256, 512]
        :param delta_v:
        :param delta_d:
        :return:
        """
        prediction_reshape = prediction.view(prediction.shape[0], prediction.shape[1],
                                             prediction.shape[2] * prediction.shape[3])  # [N, 4, 131072]
        correct_label_reshape = correct_label.view(correct_label.shape[0], 1,
                                                   correct_label.shape[1] * correct_label.shape[
                                                       2])  # [N, 1, 131072]

        output, inverse_indices, counts = torch.unique(correct_label_reshape, return_inverse=True,
                                                       return_counts=True)
        counts = counts.float()
        num_instances = len(output) # number of distinct label values, i.e. lane instances (plus the background, if the label map contains it)

        # mu_sum = scatter(prediction_reshape, inverse_indices, dim=2, reduce="sum") # [N, 4, 5]
        # muc = mu_sum/counts # [N, 4, 5]
        muc = scatter(prediction_reshape, inverse_indices, dim=2, reduce="mean")  # [N, 4, 5] usage details: https://blog.csdn.net/StarfishCu/article/details/108853080

        dis = torch.index_select(muc, 2, inverse_indices.view(inverse_indices.shape[-1]),
                                 out=None)  # [N, 4, 131072]
        dis = dis - prediction_reshape
        dis = torch.norm(dis, dim=1, keepdim=False, out=None, dtype=None)  # [N, 131072]
        dis = dis - delta_v
        dis = torch.clamp(dis, min=0.)  # [N, 131072]
        dis = torch.pow(dis, 2, out=None)

        L_var = scatter(dis, inverse_indices.view(inverse_indices.shape[-1]), dim=1, reduce="mean")  # [N, 3]
        L_var = torch.sum(L_var) / num_instances

        L_dist = torch.tensor(0, dtype=torch.float)
        for A in range(num_instances):
            for B in range(num_instances):
                if A != B:
                    dis = muc[:, :, A] - muc[:, :, B]
                    dis = torch.norm(dis, dim=1, keepdim=False, out=None, dtype=None)
                    dis = delta_d - dis
                    dis = torch.clamp(dis, min=0.)
                    dis = torch.pow(dis, 2, out=None)
                    L_dist = L_dist + dis
        L_dist = L_dist / (num_instances * (num_instances - 1))
        L_dist = L_dist.view([])
        total_loss = L_var + L_dist
        return total_loss


class cluster_loss(nn.Module):
    def __init__(self):
        super(cluster_loss, self).__init__()

    def forward(self, binary_logits, binary_labels,
                instance_logits, instance_labels, delta_v=0.5, delta_d=3):
        # Binary Loss
        # Since the two classes (lane/background) are highly unbalanced, we apply bounded inverse class weighting
        output, counts = torch.unique(binary_labels, return_inverse=False, return_counts=True) # output: the unique values. counts: how many times each one occurs. torch.unique: https://blog.csdn.net/t20134297/article/details/108235355
        counts = counts.float()
        inverse_weights = torch.div(1.0, torch.log(
            torch.add(torch.div(counts, torch.sum(counts)), torch.tensor(1.02, dtype=torch.float)))) # lane/background class imbalance: bounded inverse class weighting

        binary_loss = torch.nn.CrossEntropyLoss(weight=inverse_weights)
        binary_segmenatation_loss = binary_loss(binary_logits, binary_labels) # weighted CrossEntropyLoss

        batch_size = instance_logits.shape[0]
        loss_set = []
        for dimen in range(batch_size):
            loss_set.append(cluster_loss_helper()) # one embedding-loss helper per image in the batch

        instance_segmenatation_loss = torch.tensor(0.)#.cuda() 

        for dimen in range(batch_size):
            instance_loss = loss_set[dimen] # fetch the helper for this image
            # prediction = instance_logits[dimen].view(1, instance_logits.shape[1], instance_logits.shape[2],
            #                                          instance_logits.shape[3])
            # correct_label = instance_labels[dimen].view(1, instance_labels.shape[1], instance_labels.shape[2])
            # instance_segmenatation_loss += instance_loss(prediction, correct_label, delta_v, delta_d)
            prediction = torch.unsqueeze(instance_logits[dimen], 0) # .cuda()
            correct_label = torch.unsqueeze(instance_labels[dimen], 0)# .cuda()
            instance_segmenatation_loss += instance_loss(prediction, correct_label, delta_v, delta_d)

        instance_segmenatation_loss = instance_segmenatation_loss / batch_size
        return binary_segmenatation_loss, instance_segmenatation_loss
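
To make the two hinged terms computed by cluster_loss_helper easier to follow, here is a minimal plain-Python sketch of L = Lvar + Ldist for a single image. It is not the repository's code: the function name discriminative_loss and the toy 2-D embeddings are invented for illustration, and the guard for a single cluster is an addition of this sketch. Like the listing, it treats every distinct label value as a cluster and averages Ldist over ordered pairs of centres.

```python
import math


def discriminative_loss(embeddings, labels, delta_v=0.5, delta_d=3.0):
    """Return L_var + L_dist for one image given per-pixel embedding vectors."""
    # Group the pixel embeddings by their instance label.
    clusters = {}
    for vec, lab in zip(embeddings, labels):
        clusters.setdefault(lab, []).append(vec)

    # Cluster centre = mean embedding of each instance.
    centres = {lab: [sum(col) / len(vecs) for col in zip(*vecs)]
               for lab, vecs in clusters.items()}

    # Variance term: only pixels farther than delta_v from their centre count.
    l_var = 0.0
    for lab, vecs in clusters.items():
        hinge = [max(0.0, math.dist(v, centres[lab]) - delta_v) ** 2 for v in vecs]
        l_var += sum(hinge) / len(vecs)
    l_var /= len(clusters)

    # Distance term: only pairs of centres closer than delta_d count.
    labs = list(centres)
    l_dist = 0.0
    for a in labs:
        for b in labs:
            if a != b:
                l_dist += max(0.0, delta_d - math.dist(centres[a], centres[b])) ** 2
    if len(labs) > 1:  # guard added in this sketch: avoids 0/0 for one cluster
        l_dist /= len(labs) * (len(labs) - 1)
    return l_var + l_dist


labels = [0, 0, 1, 1]
separated = [(0.0, 0.0), (0.2, 0.0), (5.0, 0.0), (5.2, 0.0)]
too_close = [(0.0, 0.0), (0.2, 0.0), (1.0, 0.0), (1.2, 0.0)]
print(discriminative_loss(separated, labels))  # 0.0: both hinges are inactive
print(discriminative_loss(too_close, labels))  # close to 4.0: centres are 1 apart
```

In the second case the centres are about 1 apart, so each ordered pair contributes (3 - 1)^2 = 4 and the average is 4, while the variance term stays at zero.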

Running the code at this point may raise an error. The message is as follows:

Traceback (most recent call last):
  File "/home/wqf/ECBM6040-Project/test.py", line 82, in
    binary_segmenatation_loss, instance_segmenatation_loss = criterion(
  File "/home/wqf/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/wqf/ECBM6040-Project/Lanenet/cluster_loss3.py", line 98, in forward
    instance_segmenatation_loss += instance_loss(prediction, correct_label, delta_v, delta_d)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

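The message says that all tensors must live on the same device, but at least two devices were found here: cuda:0 and cpu. So, at the failing line, we print where each variable lives: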

            print(type(delta_d), type(delta_v))  # plain Python numbers have no .device
            print(prediction.device)
            print(correct_label.device)

Don't forget the variable on the left-hand side of the assignment:

            print(instance_segmenatation_loss.device)

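This shows that delta_d, delta_v and instance_segmenatation_loss are the three little troublemakers: none of them is a GPU tensor. Only the accumulator strictly has to move, since plain Python numbers can be mixed with CUDA tensors in arithmetic, but here we try to put all three on the GPU.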

            delta_v = delta_v.cuda()
            delta_d = delta_d.cuda()
        instance_segmenatation_loss = torch.tensor(0.).cuda()

But this raises another error:

Traceback (most recent call last):
  File "/home/wqf/ECBM6040-Project/test.py", line 82, in
    binary_segmenatation_loss, instance_segmenatation_loss = criterion(
  File "/home/wqf/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/wqf/ECBM6040-Project/Lanenet/cluster_loss3.py", line 99, in forward
    delta_v = delta_v.cuda()
AttributeError: 'float' object has no attribute 'cuda'

So we first convert delta_d and delta_v to tensors, and then move them to the GPU:

...
    def forward(self, binary_logits, binary_labels,
                instance_logits, instance_labels, delta_v=0.5, delta_d=3):
                ...
            delta_v = torch.tensor(delta_v)
            delta_d = torch.tensor(delta_d)

After that, the code runs happily.
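
Hard-coding .cuda() works on a GPU machine but breaks on a CPU-only one. One possible alternative, sketched below under the assumption that the per-image losses are 0-dim tensors, is to create the accumulator on whatever device the data already uses. The helper mean_of_losses is hypothetical and is not part of the repository.

```python
import torch


def mean_of_losses(per_image_losses):
    """Average 0-dim loss tensors, keeping the accumulator on their device."""
    device = per_image_losses[0].device
    total = torch.zeros((), device=device)  # created where the data lives
    for loss in per_image_losses:
        total += loss  # an in-place add needs both tensors on one device
    return total / len(per_image_losses)


# Falls back to the CPU when no GPU is available.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
losses = [torch.tensor(1.0, device=device), torch.tensor(3.0, device=device)]
print(mean_of_losses(losses).item())
```

The same idea applies to the margins: a plain Python float such as delta_v=0.5 can be used directly in arithmetic with a CUDA tensor, so it does not need to be moved at all.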

The modified cluster_loss3 code is as follows:

# -*- coding: utf-8 -*-
# Team:
# Developer: wqf
# Created: 2021/1/22 5:10 PM
# File: cluster_loss3.py
# IDE: PyCharm
import torch
import torch.nn as nn
from torch_scatter import scatter # Installing this module: https://www.cnblogs.com/cykablyat/p/14293500.html. The wheel must match your CUDA and Python versions. In my case the download page was https://pytorch-geometric.com/whl/torch-1.7.0.html and the file was torch_scatter-latest+cu110-cp38-cp38-linux_x86_64.whl
import numpy as np

class cluster_loss_helper(nn.Module): # Computes the clustering loss L = Lvar + Ldist for the embedding branch; a direct coding of the formula.
    def __init__(self):
        super(cluster_loss_helper, self).__init__()

    def forward(self, prediction, correct_label, delta_v, delta_d):
        """

        :param prediction: [N, 4, 256, 512]
        :param correct_label: [N, 256, 512]
        :param delta_v:
        :param delta_d:
        :return:
        """
        prediction_reshape = prediction.view(prediction.shape[0], prediction.shape[1],
                                             prediction.shape[2] * prediction.shape[3])  # [N, 4, 131072]
        correct_label_reshape = correct_label.view(correct_label.shape[0], 1,
                                                   correct_label.shape[1] * correct_label.shape[
                                                       2])  # [N, 1, 131072]

        output, inverse_indices, counts = torch.unique(correct_label_reshape, return_inverse=True,
                                                       return_counts=True)
        counts = counts.float()
        num_instances = len(output) # number of distinct label values, i.e. lane instances (plus the background, if the label map contains it)

        # mu_sum = scatter(prediction_reshape, inverse_indices, dim=2, reduce="sum") # [N, 4, 5]
        # muc = mu_sum/counts # [N, 4, 5]
        muc = scatter(prediction_reshape, inverse_indices, dim=2, reduce="mean")  # [N, 4, 5] usage details: https://blog.csdn.net/StarfishCu/article/details/108853080

        dis = torch.index_select(muc, 2, inverse_indices.view(inverse_indices.shape[-1]),
                                 out=None)  # [N, 4, 131072]
        dis = dis - prediction_reshape
        dis = torch.norm(dis, dim=1, keepdim=False, out=None, dtype=None)  # [N, 131072]
        dis = dis - delta_v
        dis = torch.clamp(dis, min=0.)  # [N, 131072]
        dis = torch.pow(dis, 2, out=None)

        L_var = scatter(dis, inverse_indices.view(inverse_indices.shape[-1]), dim=1, reduce="mean")  # [N, 3]
        L_var = torch.sum(L_var) / num_instances

        L_dist = torch.tensor(0, dtype=torch.float)
        for A in range(num_instances):
            for B in range(num_instances):
                if A != B:
                    dis = muc[:, :, A] - muc[:, :, B]
                    dis = torch.norm(dis, dim=1, keepdim=False, out=None, dtype=None)
                    dis = delta_d - dis
                    dis = torch.clamp(dis, min=0.)
                    dis = torch.pow(dis, 2, out=None)
                    L_dist = L_dist + dis
        L_dist = L_dist / (num_instances * (num_instances - 1))
        L_dist = L_dist.view([])
        total_loss = L_var + L_dist
        return total_loss


class cluster_loss(nn.Module):
    def __init__(self):
        super(cluster_loss, self).__init__()

    def forward(self, binary_logits, binary_labels,
                instance_logits, instance_labels, delta_v=0.5, delta_d=3):
        # Binary Loss
        # Since the two classes (lane/background) are highly unbalanced, we apply bounded inverse class weighting
        output, counts = torch.unique(binary_labels, return_inverse=False, return_counts=True) # output: the unique values. counts: how many times each one occurs. torch.unique: https://blog.csdn.net/t20134297/article/details/108235355
        counts = counts.float()
        inverse_weights = torch.div(1.0, torch.log(
            torch.add(torch.div(counts, torch.sum(counts)), torch.tensor(1.02, dtype=torch.float)))) # lane/background class imbalance: bounded inverse class weighting

        binary_loss = torch.nn.CrossEntropyLoss(weight=inverse_weights)
        binary_segmenatation_loss = binary_loss(binary_logits, binary_labels) # weighted CrossEntropyLoss

        batch_size = instance_logits.shape[0]
        loss_set = []
        for dimen in range(batch_size):
            loss_set.append(cluster_loss_helper()) # one embedding-loss helper per image in the batch

        # Create the accumulator and the margins on the same device as the
        # network output, so the code runs on both CPU and GPU
        device = instance_logits.device
        instance_segmenatation_loss = torch.tensor(0., device=device)
        delta_v = torch.as_tensor(delta_v, dtype=torch.float, device=device)
        delta_d = torch.as_tensor(delta_d, dtype=torch.float, device=device)

        for dimen in range(batch_size):
            instance_loss = loss_set[dimen] # fetch the helper for this image
            # prediction = instance_logits[dimen].view(1, instance_logits.shape[1], instance_logits.shape[2],
            #                                          instance_logits.shape[3])
            # correct_label = instance_labels[dimen].view(1, instance_labels.shape[1], instance_labels.shape[2])
            # instance_segmenatation_loss += instance_loss(prediction, correct_label, delta_v, delta_d)
            prediction = torch.unsqueeze(instance_logits[dimen], 0)
            correct_label = torch.unsqueeze(instance_labels[dimen], 0)

            instance_segmenatation_loss += instance_loss(prediction, correct_label, delta_v, delta_d)

        instance_segmenatation_loss = instance_segmenatation_loss / batch_size # the instance loss is summed image by image, so it is divided by the batch size to get a per-image value; the binary loss needs no division because CrossEntropyLoss already averages over the whole batch (reduction='mean' by default)
        return binary_segmenatation_loss, instance_segmenatation_loss
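
The class weights computed at the top of cluster_loss.forward follow the bounded inverse class weighting w = 1 / ln(c + p), with c = 1.02 in the listing. A small plain-Python sketch of that arithmetic is shown here. The function name bounded_inverse_weights and the pixel counts are made up for illustration; only the formula and the constant come from the code above.

```python
import math


def bounded_inverse_weights(counts, c=1.02):
    """w_k = 1 / ln(c + p_k), where p_k is the pixel frequency of class k."""
    total = sum(counts)
    return [1.0 / math.log(c + n / total) for n in counts]


# Hypothetical pixel counts for one 256x512 label map: background, lane.
counts = [126000, 5072]
w_background, w_lane = bounded_inverse_weights(counts)
print(w_background, w_lane)  # the rare lane class gets the larger weight

# The weight stays bounded even for a vanishingly rare class.
print(1.0 / math.log(1.02))  # upper bound, roughly 50.5
```

The constant c keeps the logarithm away from zero, which is what bounds the weight: a plain 1/p weighting would grow without limit as a class becomes rarer.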

On my RTX 3060 Ti, one training epoch takes about 180 s.

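  1. ECBM6040-Project/Train_aug.ipynb trains LaneNet on the augmented data.

The code is the same as before, so it is not repeated here. One thing to watch: how the save path is written.
Original code: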

torch.save(LaneNet_model.state_dict(),

               f"/TUSIMPLE/Lanenet_output/lanenet_epoch_{epoch}_batch_{8}_AUG.model")

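This raises an error, presumably because the leading "/" makes the path absolute, so it is resolved from the filesystem root instead of the project directory. Change it to a relative path: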

torch.save(LaneNet_model.state_dict(),
               f"TUSIMPLE/Lanenet_output/lanenet_epoch_{epoch}_batch_{8}_AUG.model")
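
The difference between the two save paths comes down to absolute versus relative paths. The sketch below uses only the standard library. posixpath is used so the result is the same on any OS, the epoch number 0 is a placeholder, and the helper name ensure_parent_dir is hypothetical. Creating the output directory first also avoids a second common failure, a missing target folder.

```python
import os
import posixpath
import tempfile

bad = "/TUSIMPLE/Lanenet_output/lanenet_epoch_0_batch_8_AUG.model"
good = "TUSIMPLE/Lanenet_output/lanenet_epoch_0_batch_8_AUG.model"

print(posixpath.isabs(bad))   # True: resolved from the filesystem root
print(posixpath.isabs(good))  # False: resolved from the current working directory

def ensure_parent_dir(path):
    """Create the directory that will contain `path` if it does not exist yet."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    return path

# Demonstrated inside a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    target = ensure_parent_dir(os.path.join(tmp, good))
    parent_exists = os.path.isdir(os.path.dirname(target))
    print(parent_exists)
```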

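Note: no clustering is performed during training. LaneNet outputs the binary segmentation loss and the instance embedding loss, and gradient descent is applied to them directly. My understanding is that this works because the number of lanes in each image is known from the labels at training time. At test time the number of lanes per image is unknown, so the pixel embeddings have to be clustered. To avoid starting a cluster from an outlier, mean shift is run first so that the center moves to where the embedding points are densest.
Why mean shift ends up in the dense regions: https://blog.csdn.net/u014661698/article/details/84979979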
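
The "move toward the densest region" behaviour can be sketched with a flat-kernel mean shift for a single seed point: the center is repeatedly replaced by the mean of all points within the bandwidth. This is a simplified illustration, not the sklearn implementation used in the project; the function name mean_shift_point and the 2-D sample points are made up, and math.dist needs Python 3.8 or newer.

```python
import math

def mean_shift_point(start, points, bandwidth, max_iter=100, tol=1e-6):
    """Move `start` to the mean of its neighbours until it stops moving (flat kernel)."""
    center = tuple(start)
    for _ in range(max_iter):
        neighbours = [p for p in points if math.dist(p, center) <= bandwidth]
        if not neighbours:
            break  # nothing within reach, the center cannot move
        new_center = tuple(sum(c) / len(neighbours) for c in zip(*neighbours))
        moved = math.dist(new_center, center)
        center = new_center
        if moved < tol:
            break  # converged
    return center

# Two made-up clusters of 2-D "embeddings".
points = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.2, 0.2),
          (10.0, 10.0), (10.2, 10.0), (10.0, 10.2)]
center = mean_shift_point((1.0, 1.0), points, bandwidth=1.5)
print(center)  # ends at the mean of the nearby cluster
```

Starting from a point off to the side of the first cluster, the seed ends up at that cluster's mean and ignores the far-away cluster.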

Evaluating the model on the test set

  1. Use ECBM6040-Project/Notebook-experiment/Evaluation of Lanenet.ipynb for the evaluation.
    Annotated code:
import json
import os.path as ops
import numpy as np
import torch
import cv2
import time
import os
import matplotlib.pylab as plt
import sys
from tqdm import tqdm
sys.path.append('..') # add the parent directory to the import path (explained earlier)
from dataset.dataset_utils import TUSIMPLE
from Lanenet.model2 import Lanenet
from utils.evaluation import gray_to_rgb_emb, process_instance_embedding, video_to_clips

# Load the Model
model_path = '../TUSIMPLE/Lanenet_output/lanenet_epoch_39_batch_8.model'
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
LaneNet_model = Lanenet(2, 4)
LaneNet_model.load_state_dict(torch.load(model_path)) # load the trained weights
LaneNet_model.to(device)
print('success')

# Load the Test Dataset
# Test use TUSIMPLE test dataset `clips` and `test_label.json` and write the predit result in `test_tasks_0627.json` use the evaluation from TUSIMPLE dataset in `utils/lane.py`
# write lanes and run_time to `pred_result.json`
pred_json_path = '../TUSIMPLE/test_set/test_tasks_0627.json' # test task file; the lanes obtained by inference are written into its entries
json_pred = [json.loads(line) for line in open(pred_json_path).readlines()] # json.loads() parses a JSON string into a dict (https://www.cnblogs.com/hjianhui/p/10387057.html). After this line json_pred is a list with 2782 entries

all_time_forward = [] 
all_time_clustering = []
for i, sample in enumerate(tqdm(json_pred)): # i is the index, sample is the corresponding entry of json_pred
    h_samples = sample['h_samples'] # row (y) coordinates at which the lanes are sampled: [160, 170, 180, ..., 700, 710] (56 values, step 10)
    lanes = sample['lanes'] # [] (empty before prediction)
    run_time = sample['run_time'] # list of float. The running time for each frame in the clip. The unit is millisecond.
    raw_file = sample['raw_file']
    img_path = ops.join('../TUSIMPLE/test_set', raw_file)
    # read the image
    gt_img_org = cv2.imread(img_path, cv2.IMREAD_UNCHANGED) # shape (720, 1280, 3)
    org_shape = gt_img_org.shape
    gt_image = cv2.resize(gt_img_org, dsize=(512, 256), interpolation=cv2.INTER_LINEAR)
    gt_image = gt_image / 127.5 - 1.0
    gt_image = torch.tensor(gt_image, dtype=torch.float)
    gt_image = np.transpose(gt_image, (2, 0, 1)) # Tensor of shape (3, 256, 512): (channels, height, width)
    # Go through the network
    time_start=time.time()
    binary_final_logits, instance_embedding = LaneNet_model(gt_image.unsqueeze(0).cuda()) # returns the predicted binary segmentation logits and the instance embeddings
    # binary_final_logits = binary_final_logits.cpu()
    # instance_embedding = instance_embedding.cpu()
    time_end=time.time()
    # Get the final embedding image
    binary_img = torch.argmax(binary_final_logits, dim=1).squeeze().cpu().numpy() # 1 where a lane is predicted, 0 elsewhere
    binary_img[0:50,:] = 0
    clu_start = time.time()
    rbg_emb, cluster_result = process_instance_embedding(instance_embedding.cpu(), binary_img, distance=1.5, lane_num=4)
    clu_end = time.time()
    cluster_result = cv2.resize(cluster_result, dsize=(org_shape[1], org_shape[0]), 
                                interpolation=cv2.INTER_NEAREST)
    elements = np.unique(cluster_result)
    for line_idx in elements:
        if line_idx == 0: # skip the background
            continue
        else:
            mask = (cluster_result == line_idx) # mask of this lane, shape (720, 1280)
            select_mask = mask[h_samples] # keep only the sampled rows, shape (56, 1280)
            row_result = []
            for row in range(len(h_samples)): # iterate over the rows given by h_samples
                col_indexes = np.nonzero(select_mask[row])[0] # columns of the non-zero (lane) pixels in this row
                if len(col_indexes) == 0:
                    row_result.append(-2) # no lane in this row
                else:
                    row_result.append(int(col_indexes.min() + (col_indexes.max()-col_indexes.min())/2)) # x coordinate of the lane's midpoint in this row
            json_pred[i]['lanes'].append(row_result) # before: {'h_samples': [160, 170, 180, ..., 700, 710], 'lanes': [], 'run_time': 1000, 'raw_file': 'clips/0530/1492626760788443246_0/20.jpg'}
            json_pred[i]['run_time'] = time_end-time_start # after: {'h_samples': [160, 170, 180, ..., 700, 710], 'lanes': [[-2, -2, -2, -2, -2, -2, -2, -2, -2, 678, 686, 701, 710, 721, 732, 747, 756, 771, 783, 797, 815, 823, 838, 848, 865, 876, 890, 901, 916, 931, 941, 954, 966, 982, 994, 1009, 1021, 1037, 1052, 1062, 1078, 1090, 1103, 1113, 1128, 1138, 1153, 1168, 1180, 1196, 1208, 1221, 1231, 1246, 1253, 1262]], 'run_time': 0.5215342044830322, 'raw_file': 'clips/0530/1492626760788443246_0/20.jpg'}
            all_time_forward.append(time_end-time_start) # forward-pass time of one LaneNet inference (appended once per detected lane)
            all_time_clustering.append(clu_end-clu_start) # time of one clustering pass (appended once per detected lane)

forward_avg = np.sum(all_time_forward[500:2000])/1500 # average over 1500 entries taken from the middle of the run
cluster_avg = np.sum(all_time_clustering[500:2000])/1500

print('The Forward pass time for one image is: {}ms'.format(forward_avg*1000)) # 33.479649225870766ms
print('The Clustering time for one image is: {}ms'.format(cluster_avg*1000)) # 206.9698650042216ms
print('The total time for one image is: {}ms'.format((cluster_avg+forward_avg)*1000)) # 240.44951423009238ms

print('The speed for forward pass is: {}fps'.format(1/forward_avg)) # 29.868891195767635fps
print('The speed for cluster pass is: {}fps'.format(1/cluster_avg)) # 4.831621260320206fps

with open('../TUSIMPLE/pred.json', 'w') as f:
    for res in json_pred:
        json.dump(res, f)
        f.write('\n')

# Evaluation using TUSIMPLE script
from utils.lane import LaneEval

result = LaneEval.bench_one_submit('../TUSIMPLE/pred.json',
                         '../TUSIMPLE/test_set/test_label.json') # adjust these paths to your own layout

print(result)
# [{"name":"Accuracy","value":0.9430533446304444,"order":"desc"},{"name":"FP","value":0.15022166307213028,"order":"asc"},{"name":"FN","value":0.07329858614905349,"order":"asc"}]
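# 'order' presumably gives the ranking direction of each metric: 'desc' means higher is better (Accuracy), 'asc' means lower is better (FP, FN).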

# Evaluation for aug result
model_path = '../TUSIMPLE/Lanenet_output/lanenet_epoch_39_batch_8_AUG.model'
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
LaneNet_model = Lanenet(2, 4)
LaneNet_model.load_state_dict(torch.load(model_path, map_location=torch.device('cpu')))
LaneNet_model.to(device)
print('success')

pred_json_path = '../TUSIMPLE/test_set/test_tasks_0627.json'
json_pred = [json.loads(line) for line in open(pred_json_path).readlines()]

all_time_forward = []
all_time_clustering = []
for i, sample in enumerate(tqdm(json_pred)):
    h_samples = sample['h_samples']
    lanes = sample['lanes']
    run_time = sample['run_time']
    raw_file = sample['raw_file']
    img_path = ops.join('../TUSIMPLE/test_set', raw_file)
    # read the image
    gt_img_org = cv2.imread(img_path, cv2.IMREAD_UNCHANGED)
    org_shape = gt_img_org.shape
    gt_image = cv2.resize(gt_img_org, dsize=(512, 256), interpolation=cv2.INTER_LINEAR)
    gt_image = gt_image / 127.5 - 1.0
    gt_image = torch.tensor(gt_image, dtype=torch.float)
    gt_image = np.transpose(gt_image, (2, 0, 1))
    # Go through the network
    time_start=time.time()
    binary_final_logits, instance_embedding = LaneNet_model(gt_image.unsqueeze(0)) 
    # binary_final_logits = binary_final_logits.cpu()
    # instance_embedding = instance_embedding.cpu()
    time_end=time.time()
    # Get the final embedding image
    binary_img = torch.argmax(binary_final_logits, dim=1).squeeze().cpu().numpy()
    binary_img[0:50,:] = 0
    clu_start = time.time()
    rbg_emb, cluster_result = process_instance_embedding(instance_embedding.cpu(), binary_img,
                                                             distance=1.5, lane_num=4)
    clu_end = time.time()
    cluster_result = cv2.resize(cluster_result, dsize=(org_shape[1], org_shape[0]), 
                                interpolation=cv2.INTER_NEAREST)
    elements = np.unique(cluster_result)
    for line_idx in elements:
        if line_idx == 0:
            continue
        else:
            mask = (cluster_result == line_idx)
            select_mask = mask[h_samples]
            row_result = []
            for row in range(len(h_samples)):
                col_indexes = np.nonzero(select_mask[row])[0]
                if len(col_indexes) == 0:
                    row_result.append(-2)
                else:
                    row_result.append(int(col_indexes.min() + (col_indexes.max()-col_indexes.min())/2))
            json_pred[i]['lanes'].append(row_result)
            json_pred[i]['run_time'] = time_end-time_start
            all_time_forward.append(time_end-time_start)
            all_time_clustering.append(clu_end-clu_start)

with open('../TUSIMPLE/pred_aug.json', 'w') as f:
    for res in json_pred:
        json.dump(res, f)
        f.write('\n')

from utils.lane import LaneEval

result = LaneEval.bench_one_submit('../TUSIMPLE/pred_aug.json',
                         '../TUSIMPLE/test_set/test_label.json') 

print(result)
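
Both evaluation loops above turn a per-lane pixel mask into the TuSimple format, one x value per sampled row and -2 where the lane is absent. The same step can be sketched in isolation with plain lists. The function name lane_mask_to_row and the tiny mask are made up for illustration; the notebook does this with NumPy on a (720, 1280) mask.

```python
def lane_mask_to_row(mask, h_samples):
    """For each sampled row, return the lane midpoint column, or -2 if the row has no lane pixel."""
    row_result = []
    for y in h_samples:
        cols = [x for x, value in enumerate(mask[y]) if value]
        if not cols:
            row_result.append(-2)  # TuSimple marker for "no lane at this row"
        else:
            # midpoint between the leftmost and rightmost lane pixel, truncated to int
            row_result.append(int(min(cols) + (max(cols) - min(cols)) / 2))
    return row_result

# Tiny made-up mask: 3 rows, 8 columns.
mask = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 1],
]
row = lane_mask_to_row(mask, h_samples=[0, 1, 2])
print(row)
```

Note that the midpoint uses only the extreme columns, so a few stray pixels far from the lane in one row can shift that row's x value noticeably.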

The local module utils.evaluation imported by the file above is annotated below:

import os.path as ops
import numpy as np
import torch
import cv2
import time
import tqdm
import os
from sklearn.cluster import MeanShift, estimate_bandwidth


def gray_to_rgb_emb(gray_img): # convert a single-channel label image into an RGB image
    """
    :param gray_img: torch tensor 256 x 512
    :return: numpy array 256 x 512
    """
    H, W = gray_img.shape
    element = torch.unique(gray_img).numpy()
    rbg_emb = np.zeros((H, W, 3))
    color = [[0, 0, 0], [255, 0, 0], [0, 255, 0], [0, 0, 255], [255, 215, 0], [0, 255, 255]]
    for i in range(len(element)):
        rbg_emb[gray_img == element[i]] = color[i]
    return rbg_emb/255 # return the RGB image scaled to [0, 1]


def process_instance_embedding(instance_embedding, binary_img, distance=1, lane_num=5):
    embedding = instance_embedding[0].detach().numpy().transpose(1, 2, 0) # detach() returns a tensor without gradient tracking (https://blog.csdn.net/weixin_33913332/article/details/93300411, https://www.cnblogs.com/jiangkejie/p/9981707.html). embedding has shape (256, 512, 4); 4 is the embedding dimension
    cluster_result = np.zeros(binary_img.shape, dtype=np.int32) # shape (256, 512); holds the clustering result
    cluster_list = embedding[binary_img > 0] # cluster only the pixels that are non-zero in the binary image, e.g. shape (5228, 4)
    mean_shift = MeanShift(bandwidth=distance, bin_seeding=True, n_jobs=-1) # create the mean shift estimator. See https://blog.csdn.net/weixin_41636030/article/details/88793284 and https://zhuanlan.zhihu.com/p/69119285. (I still need to study mean shift in more detail.)
    mean_shift.fit(cluster_list) # run the clustering. When single-stepping in the debugger, the following error is printed here, but it does not affect the run:
    # Traceback (most recent call last):
    #   File "/home/wqf/下载/pycharm-community-2020.3/plugins/python-ce/helpers/pydev/_pydevd_bundle/pydevd_comm.py", line 290, in _on_run
    #     r = self.sock.recv(1024)
    # OSError: [Errno 9] Bad file descriptor
    labels = mean_shift.labels_ # cluster label of every point, shape (5228,)

    cluster_result[binary_img > 0] = labels + 1 # shift the labels by 1 so that 0 stays reserved for the background
    cluster_result[cluster_result > lane_num] = 0 # if clustering yields more clusters than lane_num, the extra ones are set to 0 (discarded). The value 4 seems to be used because TuSimple only cares about the current lane and its left and right neighbours. (To be revisited.)
    for idx in np.unique(cluster_result):
        if len(cluster_result[cluster_result == idx]) < 15: # clusters with fewer than 15 pixels are discarded as well
            cluster_result[cluster_result == idx] = 0

    H, W = binary_img.shape
    rbg_emb = np.zeros((H, W, 3))
    color = [[0, 0, 0], [255, 0, 0], [0, 255, 0], [0, 0, 255], [255, 215, 0], [0, 255, 255]]
    element = np.unique(cluster_result)
    for i in range(len(element)):
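        rbg_emb[cluster_result == element[i]] = color[i] # paint each cluster with its colour

    return rbg_emb / 255, cluster_result # rbg_emb / 255: the coloured RGB image scaled to [0, 1], shape (256, 512, 3). cluster_result: the clustering result, shape (256, 512); lanes are labelled 1, 2, 3, 4, ... and 0 is the background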


def video_to_clips(video_name):
    test_video_dir = ops.split(video_name)[0]
    outimg_dir = ops.join(test_video_dir, 'clips')
    if ops.exists(outimg_dir):
        print('Data already exist in {}'.format(outimg_dir))
        return
    if not ops.exists(outimg_dir):
        os.makedirs(outimg_dir)
    video_cap = cv2.VideoCapture(video_name)
    frame_count = 0
    all_frames = []

    while (True):
        ret, frame = video_cap.read()
        if ret is False:
            break
        all_frames.append(frame)
        frame_count = frame_count + 1

    for i, frame in enumerate(all_frames):
        out_frame_name = '{:s}.png'.format('{:d}'.format(i + 1).zfill(6))
        out_frame_path = ops.join(outimg_dir, out_frame_name)
        cv2.imwrite(out_frame_path, frame)
    print('finish process and save in {}'.format(outimg_dir))
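
The label post-processing in process_instance_embedding (shift by one, drop clusters beyond lane_num, drop clusters under 15 pixels) can be checked on a flat list of labels. This is a sketch under the assumption that it mirrors the three steps above; the function name filter_cluster_labels and the label counts are invented for the example.

```python
from collections import Counter

def filter_cluster_labels(labels, lane_num=4, min_pixels=15):
    """Shift labels by one, then zero out clusters that are surplus or too small."""
    shifted = [label + 1 for label in labels]                # 0 is reserved for background
    shifted = [s if s <= lane_num else 0 for s in shifted]   # more clusters than lanes: discard
    sizes = Counter(shifted)
    return [s if sizes[s] >= min_pixels else 0 for s in shifted]  # tiny clusters: discard

# Made-up mean shift output: four clusters with 20, 16, 3 and 30 pixels.
labels = [0] * 20 + [1] * 16 + [2] * 3 + [5] * 30
result = filter_cluster_labels(labels)
print(Counter(result))
```

Here the 3-pixel cluster is removed by the size rule and the cluster with raw label 5 is removed by the lane_num rule, regardless of its size.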

The local module imported above with from utils.lane import LaneEval is annotated below:

import numpy as np
from sklearn.linear_model import LinearRegression
import ujson as json


class LaneEval(object):
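    lr = LinearRegression() # a shared linear regressor
    pixel_thresh = 20 # pixel tolerance in x for a point to count as correct
    pt_thresh = 0.85 # minimum fraction of correct points for a lane to count as matched

    @staticmethod
    def get_angle(xs, y_samples): # angle of the lane measured from the vertical (y) axis, because x is regressed on y
        xs, ys = xs[xs >= 0], y_samples[xs >= 0] # keep only the valid lane points (x >= 0)
        if len(xs) > 1:
            LaneEval.lr.fit(ys[:, None], xs) # fit x as a linear function of y
            k = LaneEval.lr.coef_[0] # slope dx/dy
            theta = np.arctan(k)
        else:
            theta = 0
        return theta # angle in radians

    @staticmethod
    def line_accuracy(pred, gt, thresh): # fraction of the points of one lane that are predicted correctly
        pred = np.array([p if p >= 0 else -100 for p in pred])
        gt = np.array([g if g >= 0 else -100 for g in gt])
        return np.sum(np.where(np.abs(pred - gt) < thresh, 1., 0.)) / len(gt) # a point counts as correct when pred and gt (both x values) differ by less than the threshold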

    @staticmethod
    def bench(pred, gt, y_samples, running_time):
        if any(len(p) != len(y_samples) for p in pred): 
            raise Exception('Format of lanes error.')
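        if running_time > 200 or len(gt) + 2 < len(pred): # if the run time exceeds 200 ms, or more than len(gt) + 2 lanes are predicted, the image scores accuracy 0, FP 0, FN 1
            return 0., 0., 1.
        angles = [LaneEval.get_angle(np.array(x_gts), np.array(y_samples)) for x_gts in gt] # gt: the ground-truth lanes of one image. x_gts: the x coordinates of one lane. y_samples: the row coordinates. angles: the angle of each lane from the vertical axis
        threshs = [LaneEval.pixel_thresh / np.cos(angle) for angle in angles] # the allowed x deviation differs per lane: pixel_thresh / cos(angle), so the larger the angle, the larger the threshold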
        line_accs = []
        fp, fn = 0., 0.
        matched = 0.
        for x_gts, thresh in zip(gt, threshs): # x_gts: the x coordinates of one ground-truth lane; thresh: the deviation threshold of that lane
            accs = [LaneEval.line_accuracy(np.array(x_preds), np.array(x_gts), thresh) for x_preds in pred] # x_preds: the predicted x coordinates of one lane. For every predicted lane, compute the fraction of points that agree with x_gts (difference below the threshold)
            max_acc = np.max(accs) if len(accs) > 0 else 0. # it is unknown which prediction belongs to x_gts, so all predictions are tried and the best-matching one is taken
            if max_acc < LaneEval.pt_thresh: # no prediction matches x_gts well enough: this ground-truth lane was missed
                fn += 1 # count the missed ground-truth lanes (FN = missed gt lanes / all gt lanes)
            else:
                matched += 1 # a matching prediction was found
            line_accs.append(max_acc)
        fp = len(pred) - matched # number of predicted lanes without a match (FP = wrongly predicted lanes / predicted lanes)
        if len(gt) > 4 and fn > 0: # if the image has more than 4 ground-truth lanes and at least one was missed, one miss is forgiven, since at most 4 lanes are scored (some images are annotated with more than 4 lanes)
            fn -= 1
        s = sum(line_accs)
        if len(gt) > 4: # same reason as above: the worst lane is dropped
            s -= min(line_accs)
        return s / max(min(4.0, len(gt)), 1.), fp / len(pred) if len(pred) > 0 else 0., fn / max(min(len(gt), 4.) , 1.) # the final formulas for accuracy, FP and FN

    @staticmethod # static method: can be called without creating an instance. https://www.runoob.com/python/python-func-staticmethod.html
    def bench_one_submit(pred_file, gt_file):
        try:
            json_pred = [json.loads(line) for line in open(pred_file).readlines()] # all prediction entries
        except BaseException as e:
            raise Exception('Fail to load json file of the prediction.')
        json_gt = [json.loads(line) for line in open(gt_file).readlines()] # all ground-truth entries
        if len(json_gt) != len(json_pred): # 2782
            raise Exception('We do not get the predictions of all the test tasks')
        gts = {l['raw_file']: l for l in json_gt} # index the ground-truth entries by raw_file
        # json_gt is a list of 2782 dicts. One entry:
        # {'lanes': [[-2, -2, -2, -2, -2, -2, -2, -2, -2, -2, 648, 636, 626, 615, 605, 595, 585, 575, 565, 554, 545, 536, 526, 517, 508, 498, 489, 480, 470, 461, 452, 442, 433, 424, 414, 405, 396, 386, 377, 368, 359, 349, 340, 331, 321, 312, 303, 293, 284, 275, 265, 256, 247, 237, 228, 219],
        #            [-2, -2, -2, -2, -2, -2, -2, -2, -2, -2, 681, 692, 704, 716, 728, 741, 754, 768, 781, 794, 807, 820, 834, 847, 860, 873, 886, 900, 913, 926, 939, 952, 966, 979, 992, 1005, 1018, 1032, 1045, 1058, 1071, 1084, 1098, 1111, 1124, 1137, 1150, 1164, 1177, 1190, 1203, 1216, 1230, 1243, 1256, 1269],
        #            [-2, -2, -2, -2, -2, -2, -2, -2, -2, -2, 713, 746, 778, 811, 845, 880, 916, 951, 986, 1022, 1057, 1092, 1128, 1163, 1198, 1234, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2],
        #            [-2, -2, -2, -2, -2, -2, -2, -2, -2, -2, 754, 806, 858, 909, 961, 1013, 1064, 1114, 1164, 1213, 1263, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2]],
        #  'h_samples': [160, 170, 180, ..., 700, 710], 'raw_file': 'clips/0530/1492626760788443246_0/20.jpg'}
        # gts maps 'clips/0530/1492626760788443246_0/20.jpg' to that same dict, and likewise for all 2782 entries.
        accuracy, fp, fn = 0., 0., 0.
        for pred in json_pred: # pred: the prediction entry of one image
            if 'raw_file' not in pred or 'lanes' not in pred: #or 'run_time' not in pred:
                raise Exception('raw_file or lanes or run_time not in some predictions.')
            raw_file = pred['raw_file']
            pred_lanes = pred['lanes']
            # run_time = pred['run_time']
            run_time = 100
            if raw_file not in gts:
                raise Exception('Some raw_file from your predictions do not exist in the test tasks.')
            gt = gts[raw_file] # the ground-truth entry that belongs to this prediction
            gt_lanes = gt['lanes']
            y_samples = gt['h_samples']
            try:
                a, p, n = LaneEval.bench(pred_lanes, gt_lanes, y_samples, run_time) # static method, called without an instance
                # pred_lanes, the predicted lanes, for example:
                # [[-2, -2, -2, -2, -2, -2, -2, -2, -2, 677, 684, 701, 708, 722, 732, 746, 756, 771, 782, 797, 813, 825, 838, 848, 865, 875, 888, 902, 916, 930, 941, 956, 966, 982, 995, 1009, 1019, 1036, 1051, 1065, 1081, 1092, 1106, 1117, 1132, 1143, 1158, 1173, 1187, 1202, 1212, 1227, 1236, 1250, 1257, 1262],
                #  [-2, -2, -2, -2, -2, -2, -2, -2, -2, 658, 653, 638, 629, 617, 609, 598, 589, 579, 571, 559, 548, 541, 529, 522, 512, 504, 493, 484, 474, 464, 456, 444, 437, 426, 418, 408, 401, 390, 378, 371, 361, 353, 343, 335, 323, 315, 305, 293, 286, 273, 266, 254, 246, 236, 227, 216],
                #  [-2, -2, -2, -2, -2, -2, -2, -2, -2, 701, 722, 762, 789, 823, 846, 883, 909, 946, 974, 1015, 1053, 1076, 1116, 1145, 1187, 1216, 1251, 1266, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2],
                #  [-2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, 910, 939, 1001, 1040, 1106, 1161, 1209, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2]]
                # gt_lanes: the ground-truth lanes of the same image in the same format (4 lists of 56 x values in this example)
                # y_samples: the annotated row coordinates [160, 170, 180, ..., 700, 710]
                # run_time: the time the LaneNet model took for the prediction (fixed to 100 above)
            except BaseException as e:
                raise Exception('Format of lanes error.')
            accuracy += a
            fp += p # false positive rate: FP = wrongly predicted lanes / predicted lanes

            fn += n # false negative rate: FN = missed ground-truth lanes / all ground-truth lanes
        num = len(gts)
        # the first return parameter is the default ranking parameter
        return json.dumps([
            {'name': 'Accuracy', 'value': accuracy / num, 'order': 'desc'},
            {'name': 'FP', 'value': fp / num, 'order': 'asc'},
            {'name': 'FN', 'value': fn / num, 'order': 'asc'}
        ])


if __name__ == '__main__':
    import sys
    try:
        if len(sys.argv) != 3:
            raise Exception('Invalid input arguments')
        print(LaneEval.bench_one_submit(sys.argv[1], sys.argv[2]))
    except Exception as e:
        print(e) # Python 3 exceptions have no .message attribute
        sys.exit(str(e))
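
To make the matching logic of LaneEval.bench concrete, here is a simplified pure-Python sketch run on a toy image with 4 sampled rows. It is not the TuSimple script: it uses a fixed pixel threshold instead of the angle-dependent one, and it leaves out the run-time rule and the adjustment for more than 4 ground-truth lanes. The name bench_simplified and the toy lanes are made up.

```python
def line_accuracy(pred, gt, thresh):
    """Fraction of sampled rows where the predicted x is within `thresh` of the ground truth."""
    pred = [p if p >= 0 else -100 for p in pred]   # missing points (-2) become -100
    gt = [g if g >= 0 else -100 for g in gt]
    hits = sum(1 for p, g in zip(pred, gt) if abs(p - g) < thresh)
    return hits / len(gt)

def bench_simplified(pred_lanes, gt_lanes, thresh=20, pt_thresh=0.85):
    """Return (accuracy, FP rate, FN rate) for one image."""
    line_accs, matched, missed = [], 0, 0
    for gt in gt_lanes:
        # the best-matching prediction decides whether this ground-truth lane was found
        best = max((line_accuracy(p, gt, thresh) for p in pred_lanes), default=0.0)
        if best < pt_thresh:
            missed += 1
        else:
            matched += 1
        line_accs.append(best)
    fp = (len(pred_lanes) - matched) / len(pred_lanes) if pred_lanes else 0.0
    n_gt = max(len(gt_lanes), 1)
    return sum(line_accs) / n_gt, fp, missed / n_gt

gt_lanes = [[-2, 100, 110, 120], [-2, 300, 310, 320]]
pred_lanes = [[-2, 105, 112, 118], [-2, -2, 500, 510]]
scores = bench_simplified(pred_lanes, gt_lanes)
print(scores)
```

In this toy case the first ground-truth lane is matched perfectly, the second one is missed, and the second prediction counts as a false positive. Rows where both prediction and ground truth are -2 count as correct, which is why the unmatched lane still contributes 0.25 to the accuracy.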

As for examples of LinearRegression in sklearn, many online tutorials do not explain it clearly. The example in the source docstring is far more helpful:

    Examples
    --------
    >>> import numpy as np
    >>> from sklearn.linear_model import LinearRegression
    >>> X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
    >>> # y = 1 * x_0 + 2 * x_1 + 3
    >>> y = np.dot(X, np.array([1, 2])) + 3
    >>> reg = LinearRegression().fit(X, y)  # fit
    >>> reg.score(X, y) # quality of the fit (R^2)
    1.0
    >>> reg.coef_ # coefficients
    array([1., 2.])
    >>> reg.intercept_ # intercept
    3.0000...
    >>> reg.predict(np.array([[3, 5]])) # predict
    array([16.])
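
In get_angle the regressor is fitted with a single feature, the row coordinate y, so coef_[0] is simply the slope dx/dy. For one feature that slope has the closed form cov(y, x) / var(y), which the sketch below computes without sklearn. The function name lane_angle and the sample points are made up; the guard for identical y values is an addition of this sketch.

```python
import math

def lane_angle(xs, ys):
    """Least-squares slope of x against y, returned as an angle in radians."""
    pts = [(x, y) for x, y in zip(xs, ys) if x >= 0]   # drop missing points (-2)
    if len(pts) <= 1:
        return 0.0
    mean_x = sum(x for x, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    num = sum((y - mean_y) * (x - mean_x) for x, y in pts)
    den = sum((y - mean_y) ** 2 for _, y in pts)
    if den == 0:
        return 0.0                                      # all points on one row: no slope defined
    return math.atan(num / den)

theta = lane_angle(xs=[-2, 100, 110, 120], ys=[160, 170, 180, 190])
thresh = 20 / math.cos(theta)   # pixel_thresh / cos(angle), as in LaneEval.bench
print(theta, thresh)
```

A lane that moves 10 px sideways per 10 px downwards has slope 1, so the angle is 45 degrees and the 20 px tolerance grows by a factor of sqrt(2).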
    """
  1. The results are as follows:

[Figure: evaluation results]

Testing the model trained on the augmented data raised an error:

Traceback (most recent call last):
  File "/home/wqf/ECBM6040-Project/Notebook-experiment/Evaluation of Lanenet_AUG.py", line 55, in <module>
    binary_final_logits, instance_embedding = LaneNet_model(gt_image.unsqueeze(0))  #
  File "/home/wqf/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/wqf/ECBM6040-Project/Lanenet/model2.py", line 516, in forward
    x = self.initial_block(x)
  File "/home/wqf/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/wqf/ECBM6040-Project/Lanenet/model2.py", line 59, in forward
    main = self.main_branch(x)
  File "/home/wqf/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/wqf/anaconda3/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 423, in forward
    return self._conv_forward(input, self.weight)
  File "/home/wqf/anaconda3/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 419, in _conv_forward
    return F.conv2d(input, weight, self.bias, self.stride,
RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same

Process finished with exit code 1

This message again points to a CPU/GPU mismatch.

Working through the traceback from the top, the first frame is File "/home/wqf/ECBM6040-Project/Notebook-experiment/Evaluation of Lanenet_AUG.py", line 55:

binary_final_logits, instance_embedding = LaneNet_model(gt_image.unsqueeze(0))  #

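The input and the model weights must be on the same device. As the error message says, the weights are on the GPU (torch.cuda.FloatTensor) while gt_image is still on the CPU (torch.FloatTensor), so the line is changed to: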

binary_final_logits, instance_embedding = LaneNet_model(gt_image.unsqueeze(0).cuda())

With this change the script runs.

Generating GIF animations

The notebook ECBM6040-Project/Notebook-experiment/Generate Video and show the result.ipynb generates GIF animations. They are written to ECBM6040-Project/TUSIMPLE/gif_output

# _*_ coding utf-8 _*_
# Team:
# Developer: wqf
# Date: 2021/1/27 5:28 PM
# File name: Generate Video and show the result.py
# IDE: PyCharm
import os.path as ops
import numpy as np
import torch
import cv2
import time
import os
import matplotlib.pylab as plt
import sys
from tqdm import tqdm
import imageio

sys.path.append('..')
from dataset.dataset_utils import TUSIMPLE
from Lanenet.model2 import Lanenet
from utils.evaluation import gray_to_rgb_emb, process_instance_embedding, video_to_clips

# Load the Model
model_path = '../TUSIMPLE/Lanenet_output/lanenet_epoch_39_batch_8.model'
LaneNet_model = Lanenet(2, 4)
LaneNet_model.load_state_dict(torch.load(model_path, map_location=torch.device('cpu')))

# Load the Test Dataset
root = '../TUSIMPLE/txt_for_local'
train_set = TUSIMPLE(root=root, flag='train')
valid_set = TUSIMPLE(root=root, flag='valid')
test_set = TUSIMPLE(root=root, flag='test')

print('train_set length {}'.format(len(train_set)))
print('valid_set length {}'.format(len(valid_set)))
print('test_set length {}'.format(len(test_set)))

gt, bgt, igt = test_set[20]
print('image type {}'.format(type(gt)))
print('image size {} \n'.format(gt.size()))

print('gt binary image type {}'.format(type(bgt)))
print('gt binary image size {}'.format(bgt.size()))
print('items in gt binary image {} \n'.format(torch.unique(bgt)))

print('gt instance type {}'.format(type(igt)))
print('gt instance size {}'.format(igt.size()))
print('items in gt instance {} \n'.format(torch.unique(igt)))

data_loader_test = torch.utils.data.DataLoader(test_set, batch_size=1, shuffle=False,
                                               num_workers=0)

# Get one output from the network
binary_final_logits, instance_embedding = LaneNet_model(gt.unsqueeze(0))
print('binary_final_logits shape: {}'.format(binary_final_logits.shape))
print('instance_embedding shape: {}'.format(instance_embedding.shape))

# Show one result on Test Dataset
# For ground truth
gt_image_show = ((gt.numpy() + 1) * 127.5).astype(int)  # [3, 256, 512]

plt.figure(figsize=(20, 10))
ax1 = plt.subplot(221)
image_show = gt_image_show.transpose(1, 2, 0)
image_show = image_show[..., ::-1]
plt.imshow(image_show)

ax1 = plt.subplot(222)
plt.imshow(bgt, cmap='gray')

ax1 = plt.subplot(223)
rbg_emb = gray_to_rgb_emb(igt)
plt.imshow(rbg_emb)

ax1 = plt.subplot(224)
a = 0.7
plt.imshow(a * image_show / 255 + (1 - a) * rbg_emb)

# For binary_final_logits
binary_img = torch.argmax(binary_final_logits, dim=1).squeeze().numpy()  # binary_img: 1 where a lane is predicted, 0 elsewhere
# plt.imshow(binary_img, cmap='gray')

# For instance_embedding
rbg_emb, cluster_result = process_instance_embedding(instance_embedding, binary_img,
                                                     distance=1, lane_num=5)

# Show result
plt.figure(figsize=(20, 10))
ax1 = plt.subplot(221)
plt.imshow(image_show)
plt.title('Original image')

ax1 = plt.subplot(222)
plt.imshow(binary_img, cmap='gray')
plt.title('Binary lane segmentation')

ax1 = plt.subplot(223)
plt.imshow(rbg_emb)
plt.title('Pixel embeddings')

ax1 = plt.subplot(224)
a = 0.7
plt.imshow(a * image_show / 255 + (1 - a) * rbg_emb)
plt.title('Final result')


# Generate videos
### Read test_video.mp4 and make to clips
# video_name = '../TUSIMPLE/test_video/test_video.mp4' video_to_clips(video_name)

# Read clips into dataset
# test_clips_root = '/Users/smiffy/Documents/GitHub/TUSIMPLE/test_video/clips'
def clips_to_gif(test_clips_root, git_root): # test_clips_root: source directory, e.g. '../TUSIMPLE/test_clips/1494452927854312215'; git_root: output file, e.g. '../TUSIMPLE/gif_output/1494452927854312215.gif'
    img_paths = []
    for img_name in os.listdir(test_clips_root):
        img_paths.append(ops.join(test_clips_root, img_name))
    img_paths.sort() # built-in list method: sorts in place
    gif_frames = []
    for i, img_name in enumerate(img_paths):
        gt_img_org = cv2.imread(img_name, cv2.IMREAD_UNCHANGED)
        org_shape = gt_img_org.shape # (720, 1280, 3)
        gt_image = cv2.resize(gt_img_org, dsize=(512, 256),
                              interpolation=cv2.INTER_LINEAR)
        gt_image = gt_image / 127.5 - 1.0
        gt_image = torch.tensor(gt_image, dtype=torch.float)
        gt_image = np.transpose(gt_image, (2, 0, 1)) # (3, 256, 512) after the resize above

        binary_final_logits, instance_embedding = LaneNet_model(gt_image.unsqueeze(0))
        binary_img = torch.argmax(binary_final_logits, dim=1).squeeze().numpy()
        binary_img[0:65, :] = 0
        rbg_emb, cluster_result = process_instance_embedding(instance_embedding,
                                                             binary_img,
                                                             distance=1.5, lane_num=4)

        rbg_emb = cv2.resize(rbg_emb, dsize=(org_shape[1], org_shape[0]),
                             interpolation=cv2.INTER_LINEAR)
        a = 0.6
        frame = a * gt_img_org[..., ::-1] / 255 + rbg_emb * (1 - a)
        frame = np.rint(frame * 255)
        frame = frame.astype(np.uint8)
        gif_frames.append(frame)
    imageio.mimsave(git_root, gif_frames, fps=5) # git_root: file name of the GIF to write; gif_frames: the frames to assemble (20 frames per clip here); fps: frames per second of the resulting GIF


clips_root = '../TUSIMPLE/test_clips'
gif_dir = '../TUSIMPLE/gif_output'
if not os.path.exists(gif_dir):
    os.makedirs(gif_dir)
for dir_name in os.listdir(clips_root):
    if dir_name == '.DS_Store':
        continue
    print('Process the clip {} \n'.format(dir_name))
    test_clips_root = ops.join(clips_root, dir_name)
    git_root = ops.join(gif_dir, dir_name) + '.gif'
    clips_to_gif(test_clips_root, git_root)
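
The overlay step in clips_to_gif (frame = a * image / 255 + (1 - a) * rbg_emb, then back to 8-bit) can be followed for a single pixel. This is a minimal sketch with a hypothetical helper blend_pixel; the channel reversal mirrors gt_img_org[..., ::-1], which is needed because cv2.imread returns BGR while the overlay colours are RGB.

```python
def blend_pixel(bgr_pixel, overlay_rgb, a=0.6):
    """Blend one 8-bit BGR image pixel with an RGB overlay colour given in [0, 1]."""
    rgb = bgr_pixel[::-1]                                   # BGR -> RGB
    mixed = [a * c / 255 + (1 - a) * o for c, o in zip(rgb, overlay_rgb)]
    return tuple(int(round(v * 255)) for v in mixed)        # back to 8-bit values

# A pure blue image pixel (BGR) under a pure red lane overlay (RGB).
pixel = blend_pixel((255, 0, 0), (1.0, 0.0, 0.0))
print(pixel)
```

With a = 0.6 the original image contributes 60% and the lane colour 40%, so the lanes stay visible without hiding the road.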

Phew, that's a wrap.

Wincher_Fan

你的鼓励将是我创作的最大动力

¥2 ¥4 ¥6 ¥10 ¥20
输入1-500的整数
余额支付 (余额:-- )
扫码支付
扫码支付:¥2
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、C币套餐、付费专栏及课程。

余额充值