(7-5-2) Behavior Prediction Algorithms: The Social LSTM Trajectory Prediction Algorithm

Latest recommended article published on 2024-08-13 16:08:21

码农三叔

Column: 《自动驾驶核心算法实战---Python卷》 Tags: deep learning, lstm, rnn, python, computer vision

Article link: https://blog.csdn.net/asd343442/article/details/136960977

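Summary (generated automatically by CSDN): This article describes an LSTM-based Social-LSTM model for trajectory prediction that takes positions, grid masks, and social relationships into account. The model processes its input through a neural network, is used for both training and testing, and makes predictions through a forward pass. The article also explains how grid discretization captures the interactions between pedestrians.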

7.5.4 The Social-LSTM Model

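In this project, the file model.py implements the Social LSTM model used for trajectory prediction. The model is an LSTM-based recurrent network made up of an input embedding layer, a social-tensor embedding layer, an LSTM cell, and an output layer. At each frame it takes the input positions, the grid masks, and the current hidden states, and builds a social tensor that pools the hidden states of neighboring pedestrians to capture the interactions between trajectories. The same class is used for training and testing (the infer flag sets the sequence length to 1 at test time). The forward method performs the forward pass: it updates each pedestrian's hidden state and cell state frame by frame and outputs, for each pedestrian, the parameters of a bivariate Gaussian distribution over the next position.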

import torch
import torch.nn as nn


class SocialLSTM(nn.Module):
    '''
    The Social LSTM model
    '''
    def __init__(self, args, infer=False):
        '''
        Initializer
        Parameters:
        args: training arguments
        infer: True at test (inference) time, False during training
        '''
        super(SocialLSTM, self).__init__()

        self.args = args
        self.infer = infer

        if infer:
            # Test time: one frame per call
            self.seq_length = 1
        else:
            # Training time: the whole sequence per call
            self.seq_length = args.seq_length

        # Store the required sizes
        self.rnn_size = args.rnn_size
        self.grid_size = args.grid_size
        self.embedding_size = args.embedding_size
        self.input_size = args.input_size
        self.output_size = args.output_size

        # LSTM cell; its input is the position embedding concatenated
        # with the social-tensor embedding, hence 2 * embedding_size
        self.cell = nn.LSTMCell(2 * self.embedding_size, self.rnn_size)

        # Linear layer that embeds the input positions
        self.input_embedding_layer = nn.Linear(self.input_size, self.embedding_size)
        # Linear layer that embeds the social tensor
        self.tensor_embedding_layer = nn.Linear(
            self.grid_size * self.grid_size * self.rnn_size, self.embedding_size
        )

        # Linear layer that maps the LSTM hidden state to the output
        self.output_layer = nn.Linear(self.rnn_size, self.output_size)

        # ReLU and dropout units
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(args.dropout)

    def getSocialTensor(self, grid, hidden_states):
        '''
        Compute the social tensor from the grid masks and the hidden states of all pedestrians
        Parameters:
        grid: grid masks
        hidden_states: hidden states of all pedestrians
        '''
        # Number of pedestrians
        numNodes = grid.size()[0]
        # Allocate the result (plain tensors replace the deprecated Variable wrapper)
        social_tensor = torch.zeros(numNodes, self.grid_size * self.grid_size, self.rnn_size)
        # For each pedestrian
        for node in range(numNodes):
            # Sum the hidden states of the neighbors that fall into each grid cell
            social_tensor[node] = torch.mm(torch.t(grid[node]), hidden_states)

        # Flatten the social tensor to one vector per pedestrian
        social_tensor = social_tensor.view(numNodes, self.grid_size * self.grid_size * self.rnn_size)
        return social_tensor

    def forward(self, nodes, grids, nodesPresent, hidden_states, cell_states):
        '''
        Forward pass of the model
        Parameters:
        nodes: input positions
        grids: grid masks
        nodesPresent: pedestrians present in each frame
        hidden_states: hidden states of the pedestrians
        cell_states: cell states of the pedestrians

        Returns:
        outputs_return: outputs corresponding to the bivariate Gaussian distribution
        hidden_states
        cell_states
        '''
        # Number of pedestrians in the sequence
        numNodes = nodes.size()[1]

        # Allocate the outputs, one row per (frame, pedestrian) pair
        outputs = torch.zeros(self.seq_length * numNodes, self.output_size)

        # For each frame in the sequence
        for framenum in range(self.seq_length):
            # Pedestrians present in the current frame
            nodeIDs = nodesPresent[framenum]

            if len(nodeIDs) == 0:
                # No pedestrians: move on to the next frame
                continue

            # Indices of the pedestrians that are present
            list_of_nodes = torch.LongTensor(nodeIDs)

            # Select the corresponding input positions
            nodes_current = torch.index_select(nodes[framenum], 0, list_of_nodes)

            # Get the corresponding grid masks
            grid_current = grids[framenum]

            # Get the corresponding hidden states and cell states
            hidden_states_current = torch.index_select(hidden_states, 0, list_of_nodes)
            cell_states_current = torch.index_select(cell_states, 0, list_of_nodes)

            # Compute the social tensor
            social_tensor = self.getSocialTensor(grid_current, hidden_states_current)

            # Embed the input positions
            input_embedded = self.dropout(self.relu(self.input_embedding_layer(nodes_current)))
            # Embed the social tensor
            tensor_embedded = self.dropout(self.relu(self.tensor_embedding_layer(social_tensor)))

            # Concatenate the two embeddings
            concat_embedded = torch.cat((input_embedded, tensor_embedded), 1)

            # One LSTM step
            h_nodes, c_nodes = self.cell(concat_embedded, (hidden_states_current, cell_states_current))

            # Compute the outputs
            outputs[framenum * numNodes + list_of_nodes] = self.output_layer(h_nodes)

            # Update the hidden states and cell states
            hidden_states[list_of_nodes] = h_nodes
            cell_states[list_of_nodes] = c_nodes

        # Reshape the outputs to (seq_length, numNodes, output_size);
        # row framenum * numNodes + node maps to [framenum, node]
        outputs_return = outputs.view(self.seq_length, numNodes, self.output_size)

        return outputs_return, hidden_states, cell_states
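
The pooling step in getSocialTensor can be checked by hand on a toy case. The NumPy sketch below is not the article's code: the helper name social_pool and every number in it are made up for illustration, and it only assumes that mask[i, j, c] == 1 means pedestrian j lies in cell c of pedestrian i's grid, as the listing suggests.

```python
import numpy as np

# Toy setup (all values are illustrative assumptions):
# 3 pedestrians, a 2x2 neighborhood grid (4 cells), hidden size 2.
num_peds, grid_size, rnn_size = 3, 2, 2

# hidden[j] is the hidden state of pedestrian j.
hidden = np.array([[1.0, 10.0],
                   [2.0, 20.0],
                   [3.0, 30.0]])

# mask[i, j, c] == 1 means pedestrian j sits in cell c of pedestrian i's grid.
mask = np.zeros((num_peds, num_peds, grid_size ** 2))
mask[0, 1, 3] = 1  # pedestrian 1 is in cell 3 of pedestrian 0's grid
mask[0, 2, 3] = 1  # pedestrian 2 shares that cell
mask[1, 0, 0] = 1  # pedestrian 0 is in cell 0 of pedestrian 1's grid


def social_pool(mask, hidden):
    """Sum the neighbors' hidden states per grid cell, then flatten per pedestrian."""
    n, _, cells = mask.shape
    pooled = np.zeros((n, cells, hidden.shape[1]))
    for i in range(n):
        # (cells x n) @ (n x rnn_size) -> (cells x rnn_size)
        pooled[i] = mask[i].T @ hidden
    return pooled.reshape(n, -1)


social = social_pool(mask, hidden)
print(social.shape)  # (3, 8): grid_size * grid_size * rnn_size values per pedestrian
print(social[0])     # the last two entries hold hidden[1] + hidden[2]
```

Two neighbors in the same cell are summed, so the flattened vector has a fixed length regardless of how many pedestrians are nearby. A pedestrian with no neighbors, like pedestrian 2 here, gets an all-zero vector.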

7.5.5 Grid Discretization

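In this project, the file grid.py preprocesses the input data for the Social LSTM model. Specifically, it generates binary masks that record which cell of each pedestrian's neighborhood grid every other pedestrian occupies. The model uses these masks to pool the hidden states of neighbors into the social tensor, which is how it represents the interactions between pedestrians during both training and inference.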

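The file grid.py defines three functions. getGridMask computes, for one frame, the position of every other pedestrian relative to each pedestrian and sets the matching cell in that pedestrian's binary mask, given a neighborhood size and a grid resolution. getGridMaskInference does the same at inference time, but its input rows are [x, y] without the pedestrian ID column. getSequenceGridMask applies getGridMask to every frame of a sequence and returns the list of per-frame masks that serves as input to the Social-LSTM model.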

import numpy as np
import torch


def getGridMask(frame, dimensions, neighborhood_size, grid_size):
    '''
    Compute the binary mask that marks, for each pedestrian, which cells of its grid are occupied by other pedestrians
    Parameters:
    frame: MNP x 3 matrix, each row is [pedID, x, y]
    dimensions: list containing the width and height, [width, height]
    neighborhood_size: scalar, size of the neighborhood to consider
    grid_size: scalar, number of grid cells per side
    '''

    # Maximum number of pedestrians
    mnp = frame.shape[0]
    width, height = dimensions[0], dimensions[1]

    frame_mask = np.zeros((mnp, mnp, grid_size**2))

    width_bound, height_bound = (neighborhood_size/(width*1.0))*2, (neighborhood_size/(height*1.0))*2

    # For each pedestrian in the frame (present or not)
    for pedindex in range(mnp):

        # Get the x and y of the current pedestrian
        current_x, current_y = frame[pedindex, 1], frame[pedindex, 2]

        # Bounds of the neighborhood centered on the current pedestrian
        width_low, width_high = current_x - width_bound/2, current_x + width_bound/2
        height_low, height_high = current_y - height_bound/2, current_y + height_bound/2

        # For all the other pedestrians
        for otherpedindex in range(mnp):

            # If the other pedestrian has the same ID as the current one
            if frame[otherpedindex, 0] == frame[pedindex, 0]:
                # A pedestrian is not counted in its own grid
                continue

            # Get the x and y of the other pedestrian
            other_x, other_y = frame[otherpedindex, 1], frame[otherpedindex, 2]

            if other_x >= width_high or other_x < width_low or other_y >= height_high or other_y < height_low:
                # Outside the neighborhood, so the mask stays zero
                continue

            # Inside the neighborhood: compute the grid cell
            cell_x = int(np.floor(((other_x - width_low)/width_bound) * grid_size))
            cell_y = int(np.floor(((other_y - height_low)/height_bound) * grid_size))

            if cell_x >= grid_size or cell_x < 0 or cell_y >= grid_size or cell_y < 0:
                continue

            # The other pedestrian occupies this cell of the current pedestrian's grid
            frame_mask[pedindex, otherpedindex, cell_x + cell_y*grid_size] = 1

    return frame_mask

def getGridMaskInference(frame, dimensions, neighborhood_size, grid_size):
    '''
    Inference-time variant of getGridMask
    Parameters:
    frame: MNP x 2 matrix, each row is [x, y] (no pedestrian ID column)
    dimensions, neighborhood_size, grid_size: same as in getGridMask
    '''
    mnp = frame.shape[0]
    width, height = dimensions[0], dimensions[1]

    frame_mask = np.zeros((mnp, mnp, grid_size**2))

    width_bound, height_bound = (neighborhood_size/(width*1.0))*2, (neighborhood_size/(height*1.0))*2

    # For each pedestrian in the frame (present or not)
    for pedindex in range(mnp):
        # Get the x and y of the current pedestrian
        current_x, current_y = frame[pedindex, 0], frame[pedindex, 1]

        width_low, width_high = current_x - width_bound/2, current_x + width_bound/2
        height_low, height_high = current_y - height_bound/2, current_y + height_bound/2

        # For all the other pedestrians
        for otherpedindex in range(mnp):
            # Skip the pedestrian itself (compared by row index, since there is no ID column)
            if otherpedindex == pedindex:
                # A pedestrian is not counted in its own grid
                continue

            # Get the x and y of the other pedestrian
            other_x, other_y = frame[otherpedindex, 0], frame[otherpedindex, 1]
            if other_x >= width_high or other_x < width_low or other_y >= height_high or other_y < height_low:
                # Outside the neighborhood, so the mask stays zero
                continue

            # Inside the neighborhood: compute the grid cell
            cell_x = int(np.floor(((other_x - width_low)/width_bound) * grid_size))
            cell_y = int(np.floor(((other_y - height_low)/height_bound) * grid_size))

            if cell_x >= grid_size or cell_x < 0 or cell_y >= grid_size or cell_y < 0:
                continue

            # The other pedestrian occupies this cell of the current pedestrian's grid
            frame_mask[pedindex, otherpedindex, cell_x + cell_y*grid_size] = 1

    return frame_mask

def getSequenceGridMask(sequence, dimensions, neighborhood_size, grid_size):
    '''
    Get the grid masks of all the frames in a sequence
    Parameters:
    sequence: numpy array described in the original as SL x MNP x 3
    dimensions: list containing the width and height, [width, height]
    neighborhood_size: scalar, size of the neighborhood to consider
    grid_size: scalar, number of grid cells per side
    '''

    # Move the last axis to the front. Note: for getGridMask to receive
    # MNP x 3 frames, the input would need to arrive as MNP x 3 x SL,
    # which does not match the shape stated in the docstring.
    processed_sequence = np.transpose(sequence,(2,0,1))
    sl = len(processed_sequence)
    sequence_mask = []

    for i in range(sl):
        # One float tensor of shape MNP x MNP x grid_size**2 per frame
        sequence_mask.append(torch.from_numpy(getGridMask(processed_sequence[i], dimensions, neighborhood_size, grid_size)).float())

    return sequence_mask
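
To see which cell a single neighbor lands in, the arithmetic of getGridMask can be isolated for one pair of pedestrians. The sketch below is an illustration only: the helper name neighbor_cell and the example values (a 640 x 480 frame, a neighborhood size of 32, a 4 x 4 grid) are assumptions, and the normalized-looking coordinates are an inference from the scaling in the listing rather than something the article states.

```python
import math


def neighbor_cell(current, other, dimensions, neighborhood_size, grid_size):
    """Return the flat cell index of `other` in `current`'s grid, or None if outside."""
    width, height = dimensions
    # Same scaling as in the listing above
    width_bound = (neighborhood_size / width) * 2
    height_bound = (neighborhood_size / height) * 2

    # Neighborhood bounds centered on the current pedestrian
    width_low, width_high = current[0] - width_bound / 2, current[0] + width_bound / 2
    height_low, height_high = current[1] - height_bound / 2, current[1] + height_bound / 2

    if other[0] >= width_high or other[0] < width_low:
        return None
    if other[1] >= height_high or other[1] < height_low:
        return None

    # Position inside the neighborhood, scaled to cell coordinates
    cell_x = math.floor((other[0] - width_low) / width_bound * grid_size)
    cell_y = math.floor((other[1] - height_low) / height_bound * grid_size)
    if not (0 <= cell_x < grid_size and 0 <= cell_y < grid_size):
        return None

    # Row-major flattening, as in frame_mask[..., cell_x + cell_y * grid_size]
    return cell_x + cell_y * grid_size


dims = [640, 480]  # hypothetical frame size
print(neighbor_cell((0.0, 0.0), (0.03, -0.05), dims, 32, 4))  # 3
print(neighbor_cell((0.0, 0.0), (-0.04, 0.06), dims, 32, 4))  # 12
print(neighbor_cell((0.0, 0.0), (0.20, 0.00), dims, 32, 4))   # None
```

With these numbers the neighborhood is 0.1 wide, so a neighbor at x = 0.03 is 0.08 from the left edge, which is 3.2 cells in and therefore column 3. The third neighbor is outside the neighborhood, and its mask entry would stay zero.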

7.5.6 Sampling During Training

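In this project, the file train_helper.py provides helper functions for training, most importantly the sample function, which generates predicted trajectories. Given an observed trajectory, sample first runs the Social LSTM model over the observed frames to build up the hidden states, then predicts the future frames one at a time, feeding each sampled position back in as the next input. This makes it possible to evaluate the predicted trajectories during training. The file also contains getCoef, which extracts the distribution parameters from the model output, and sample_gaussian_2d, which draws a position from the resulting bivariate Gaussian distribution. Together these functions form the helper module that supports training and evaluating the Social LSTM model.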

import numpy as np
import torch

# Assumes grid.py from section 7.5.5 is importable from the same directory
from grid import getGridMaskInference


@torch.no_grad()  # sampling needs no gradients; replaces the removed volatile=True flag
def sample(nodes, nodesPresent, grid, args, net, true_nodes, true_nodesPresent, true_grid, saved_args, dimensions):
    '''
    Sampling function
    Parameters:
    nodes: input positions
    nodesPresent: pedestrians present in each frame
    grid: grid masks of the observed frames
    args: arguments
    net: the model
    true_nodes: ground-truth positions
    true_nodesPresent: ground-truth pedestrians present in each frame
    true_grid: ground-truth grid masks
    saved_args: training arguments
    dimensions: dimensions of the dataset
    '''
    # Number of pedestrians in the sequence
    numNodes = nodes.size()[1]

    # Initial hidden states and cell states
    hidden_states = torch.zeros(numNodes, net.args.rnn_size)
    cell_states = torch.zeros(numNodes, net.args.rnn_size)

    # Observed part of the trajectory
    for tstep in range(args.obs_length-1):
        # Forward pass
        out_obs, hidden_states, cell_states = net(nodes[tstep].view(1, numNodes, 2), [grid[tstep]], [nodesPresent[tstep]], hidden_states, cell_states)

    # Initialize the returned data structure with the observed positions
    ret_nodes = torch.zeros(args.obs_length+args.pred_length, numNodes, 2)
    ret_nodes[:args.obs_length, :, :] = nodes.clone()

    # Last observed grid
    prev_grid = grid[-1].clone()

    # Predicted part of the trajectory
    for tstep in range(args.obs_length-1, args.pred_length + args.obs_length - 1):
        # Forward pass
        outputs, hidden_states, cell_states = net(ret_nodes[tstep].view(1, numNodes, 2), [prev_grid], [nodesPresent[args.obs_length-1]], hidden_states, cell_states)

        # Extract the means, standard deviations, and correlation of the bivariate Gaussian
        mux, muy, sx, sy, corr = getCoef(outputs)

        # Sample from the bivariate Gaussian
        next_x, next_y = sample_gaussian_2d(mux, muy, sx, sy, corr, nodesPresent[args.obs_length-1])

        # Store the predicted positions
        ret_nodes[tstep + 1, :, 0] = next_x
        ret_nodes[tstep + 1, :, 1] = next_y

        # Pedestrians in the last observed frame (assumed to stay present until the end)
        list_of_nodes = torch.LongTensor(nodesPresent[args.obs_length-1])

        # Get their predicted positions
        current_nodes = torch.index_select(ret_nodes[tstep+1], 0, list_of_nodes)

        # Compute the new grid masks from the predicted positions
        prev_grid = getGridMaskInference(current_nodes.detach().cpu().numpy(), dimensions, saved_args.neighborhood_size, saved_args.grid_size)
        prev_grid = torch.from_numpy(prev_grid).float()

    return ret_nodes
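
The two loops in sample follow a common warm-up-then-rollout pattern. The sketch below shows only that control flow in NumPy, under loud assumptions: rollout and constant_velocity_step are hypothetical names, and the toy predictor repeats the last displacement instead of sampling from a learned Gaussian as the real function does.

```python
import numpy as np


def rollout(observed, pred_length, step_fn):
    """Warm up a state on the observed frames, then feed predictions back in.

    observed: array of shape (obs_length, num_peds, 2)
    step_fn:  hypothetical stand-in for the network; maps (position, state)
              to (predicted next position, new state)
    """
    obs_length, num_peds, _ = observed.shape
    traj = np.zeros((obs_length + pred_length, num_peds, 2))
    traj[:obs_length] = observed

    state = None
    # Observation phase: consume every observed frame except the last one
    for t in range(obs_length - 1):
        _, state = step_fn(traj[t], state)

    # Prediction phase: the input at step t is frame t, which is the last
    # observed frame on the first pass and a predicted frame afterwards
    for t in range(obs_length - 1, obs_length + pred_length - 1):
        traj[t + 1], state = step_fn(traj[t], state)
    return traj


def constant_velocity_step(position, prev_position):
    """Toy predictor: repeat the last displacement (the state is the previous position)."""
    if prev_position is None:
        return position.copy(), position
    return position + (position - prev_position), position


# One pedestrian moving one unit per frame along x (made-up data)
observed = np.array([[[0.0, 0.0]], [[1.0, 0.0]], [[2.0, 0.0]]])
traj = rollout(observed, pred_length=2, step_fn=constant_velocity_step)
print(traj[:, 0, 0])  # x coordinates over the 5 frames
```

The loop bounds mirror the listing: the first prediction is made from the last observed frame, so the prediction loop starts at obs_length - 1 and writes into index t + 1.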


# Extract the means, standard deviations, and correlation
def getCoef(outputs):
    '''
    Extract the means, standard deviations, and correlation
    Parameters:
    outputs: outputs of the model, with 5 values in the last dimension
    '''
    mux, muy, sx, sy, corr = outputs[:, :, 0], outputs[:, :, 1], outputs[:, :, 2], outputs[:, :, 3], outputs[:, :, 4]

    # exp keeps the standard deviations positive; tanh keeps the correlation in (-1, 1)
    sx = torch.exp(sx)
    sy = torch.exp(sy)
    corr = torch.tanh(corr)
    return mux, muy, sx, sy, corr


# Sample from a 2D Gaussian distribution
def sample_gaussian_2d(mux, muy, sx, sy, corr, nodesPresent):
    '''
    Parameters
    ==========

    mux, muy, sx, sy, corr : tensors of shape 1 x numNodes
    containing the x-mean, y-mean, x-std, y-std, and correlation

    nodesPresent : list of the node IDs present in the frame

    Returns
    =======

    next_x, next_y : tensors of shape numNodes
    containing the values sampled from the 2D Gaussian
    '''
    o_mux, o_muy, o_sx, o_sy, o_corr = mux[0, :], muy[0, :], sx[0, :], sy[0, :], corr[0, :]

    numNodes = mux.size()[1]

    next_x = torch.zeros(numNodes)
    next_y = torch.zeros(numNodes)
    for node in range(numNodes):
        if node not in nodesPresent:
            continue
        # Convert to Python floats so NumPy receives plain numbers
        mx, my = o_mux[node].item(), o_muy[node].item()
        stdx, stdy, rho = o_sx[node].item(), o_sy[node].item(), o_corr[node].item()
        mean = [mx, my]
        cov = [[stdx*stdx, rho*stdx*stdy], [rho*stdx*stdy, stdy*stdy]]

        next_values = np.random.multivariate_normal(mean, cov, 1)
        next_x[node] = next_values[0][0]
        next_y[node] = next_values[0][1]

    return next_x, next_y
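
The mapping from the five raw outputs to a valid bivariate Gaussian can be verified numerically. The following NumPy sketch uses a hypothetical helper, to_gaussian_params, and made-up raw values; it mirrors the exp and tanh transforms of getCoef and the covariance matrix built in sample_gaussian_2d, then checks the result against a large sample.

```python
import numpy as np


def to_gaussian_params(raw):
    """Map 5 unconstrained outputs to the mean and covariance of a 2-D Gaussian."""
    mux, muy, log_sx, log_sy, raw_corr = raw
    sx, sy = np.exp(log_sx), np.exp(log_sy)  # standard deviations, always > 0
    rho = np.tanh(raw_corr)                  # correlation, always in (-1, 1)
    mean = np.array([mux, muy])
    cov = np.array([[sx * sx, rho * sx * sy],
                    [rho * sx * sy, sy * sy]])
    return mean, cov


# Illustrative raw outputs: std of 0.1 in x, 0.2 in y, raw correlation 0.3
raw_output = np.array([0.5, -0.2, np.log(0.1), np.log(0.2), 0.3])
mean, cov = to_gaussian_params(raw_output)

rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mean, cov, size=20000)
print(samples.mean(axis=0))          # close to [0.5, -0.2]
print(np.corrcoef(samples.T)[0, 1])  # close to tanh(0.3)
```

Because the standard deviations are strictly positive and the correlation is strictly inside (-1, 1), the determinant sx² · sy² · (1 − rho²) is positive, so the covariance matrix is always valid for sampling no matter what the network outputs.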

To be continued
