[ChatBot Development Notes] Corpus Processing: Data Shaping

Latest recommended article published on 2024-09-27 17:31:02

Mars_阿火

Category: ChatBot. Tags: artificial intelligence, python, natural language processing

Article link: https://blog.csdn.net/qq_44776055/article/details/115985112

import torch

# Padding token id. Assumed to be 0 here (the original note pads with 0);
# set it to your tokenizer's actual pad token id.
pad_id = 0


def collate_fn(batch):
    """
    Find the longest input among all samples in the batch and right-pad
    every other input with pad_id so that all inputs have the same length.
    """
    # Length of the longest input in this batch, used as the alignment target
    max_input_len = max((len(sample) for sample in batch), default=0)
    # Pad a copy of each input up to max_input_len; copying keeps the padding
    # from being written back into the dataset's own sample lists
    input_ids = [
        list(sample) + [pad_id] * (max_input_len - len(sample))
        for sample in batch
    ]
    return torch.tensor(input_ids, dtype=torch.long)
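
A collate function like the one above is normally passed to a PyTorch `DataLoader` through its `collate_fn` argument. The following is a minimal sketch of that wiring, not the author's actual training code. `ToyDialogueDataset`, `pad_collate`, `PAD_ID` and the token ids are hypothetical names and values chosen for illustration. The padding itself is done with `torch.nn.utils.rnn.pad_sequence`, a built-in helper swapped in here as an alternative to the hand-written loop.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader, Dataset

PAD_ID = 0  # assumed padding id; use your tokenizer's real pad token id


class ToyDialogueDataset(Dataset):
    """Hypothetical dataset: each sample is a list of token ids of varying length."""

    def __init__(self, samples):
        self.samples = samples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        return self.samples[index]


def pad_collate(batch):
    """Right-pad every sample to the length of the longest sample in the batch."""
    # Building new tensors leaves the original Python lists untouched
    tensors = [torch.tensor(sample, dtype=torch.long) for sample in batch]
    return pad_sequence(tensors, batch_first=True, padding_value=PAD_ID)


# Made-up token ids, only to show inputs of different lengths
samples = [[101, 7, 8, 102], [101, 9, 102], [101, 5, 6, 4, 3, 102], [101, 102]]
loader = DataLoader(
    ToyDialogueDataset(samples),
    batch_size=2,
    shuffle=False,
    collate_fn=pad_collate,
)

batches = list(loader)
for batch in batches:
    # Each batch is padded only to its own longest input
    attention_mask = (batch != PAD_ID).long()  # 1 for real tokens, 0 for padding
    print(tuple(batch.shape), attention_mask.sum(dim=1).tolist())
```

Because padding is computed per batch, short batches stay short instead of being stretched to the longest input in the whole corpus. The mask derived from `PAD_ID` is only reliable if the pad id never occurs as a real token.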