Custom collate_fn: Handling RuntimeError: stack expects each tensor to be equal size

Coisíní℘

Last modified: 2024-03-25 21:07:00

Tags: python, artificial intelligence, deep learning

First published: 2024-03-25 21:05:31

Article link: https://blog.csdn.net/qq_43858783/article/details/137025343

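This article discusses the dimension mismatch that arises when BERT and ResNet are used to generate text and image features, and shows how a custom collate_fn adjusts the data dynamically so that samples of different lengths are handled correctly by the DataLoader. The setting is a multimodal fake news detection task.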

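When BERT and ResNet are used to generate text and image features respectively, texts differ in length and images differ in size, so the resulting feature tensors can have different shapes. When the DataLoader tries to assemble such samples into a batch, it fails with: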

RuntimeError: stack expects each tensor to be equal size
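
The error can be reproduced in a few lines. This is only a minimal sketch with two made-up samples (the shapes 3 x 768 and 5 x 768 are hypothetical, not taken from the project); it relies on the DataLoader's default collate function, which stacks the tensors of a batch.

```python
import torch
from torch.utils.data import DataLoader

# Two hypothetical samples whose feature tensors differ in their first dimension.
samples = [
    (torch.zeros(3, 768), 0),
    (torch.zeros(5, 768), 1),
]

# No collate_fn given, so the default one is used and tries to stack the tensors.
loader = DataLoader(samples, batch_size=2)

try:
    next(iter(loader))
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)

print(error_message)
```

With batch_size=1 the same loader would run without error, because a single tensor never has to match another one.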

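In this situation you need a custom collate_fn that controls how individual samples are assembled into a batch. First, consider what collate_fn actually receives from the Dataset.
Its batch argument holds one batch of samples, so its length is tied to the batch_size hyperparameter. It is a list, and each element is a tuple of the form (data1, data2, ..., data_n, label).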
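
One way to look at this structure directly is a pass-through collate_fn that returns the batch unchanged. The sketch below uses made-up zero tensors with hypothetical shapes instead of the project's data, and the name inspect_collate is introduced only for illustration.

```python
import torch
from torch.utils.data import DataLoader

# Hypothetical samples of the form (tensor1, tensor2, label) with varying lengths.
samples = [
    (torch.zeros(n, 768), torch.zeros(n + 1, 768), n % 2)
    for n in (3, 5, 4, 2)
]


def inspect_collate(batch):
    # Return the batch exactly as the DataLoader hands it over.
    return batch


loader = DataLoader(samples, batch_size=2, collate_fn=inspect_collate)
first_batch = next(iter(loader))

print(type(first_batch), len(first_batch))  # a list with batch_size elements
for tensor1, tensor2, label in first_batch:
    print(tuple(tensor1.shape), tuple(tensor2.shape), label)
```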

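The number of data items n depends on what your Dataset returns. In this project, a multimodal fake news detection example, the item at index 0 is the image feature: the features of the anchor boxes detected in an image, concatenated with the feature of the whole image. The details are in the following code: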

import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import Dataset


class UEMDataset(Dataset):
    def __init__(self, df, root_dir, image_id, text_id, image_vec_dir, text_vec_dir):
        super().__init__()
        self.df = df
        self.root_dir = root_dir
        self.image_id = image_id
        self.text_id = text_id
        self.image_vec_dir = image_vec_dir
        self.text_vec_dir = text_vec_dir
        self.adaptive_pooling = nn.AdaptiveAvgPool1d(768)

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()
        # filenames for the idx
        file_name_image = self.df[self.image_id][idx].split(",")[0]
        file_name_text = self.df[self.text_id][idx]

        file_name = f"{self.root_dir}{self.image_vec_dir}{file_name_image}_full_image.npy"
        new_file_path = ''.join(file_name.split())

        image_vec_full = np.load(new_file_path)
        ## Load embeddings for the objects detected in the image, if available
        try:
            image_vec = np.load(f'{self.root_dir}{self.image_vec_dir}{file_name_image}.npy')
            image_vec = np.concatenate([image_vec_full, image_vec], axis=0)
        except (OSError, ValueError):
            # No usable object-level file: fall back to the full-image embedding only
            image_vec = image_vec_full

        ## Resize the image vectors to match the text embedding dimension
        image_vec = self.adaptive_pooling(torch.tensor(image_vec).float().unsqueeze(0)).squeeze(0)
        ## Load node embeddings for tokens present in the text
        text_vec = np.load(f'{self.root_dir}{self.text_vec_dir}{file_name_text}.npy')

        ## Load the embedding of the full text
        text_vec_full = np.load(f'{self.root_dir}{self.text_vec_dir}{file_name_text}_full_text.npy')

        ## Put the full-text embedding in front of the token embeddings
        text_vec = torch.from_numpy(np.concatenate([text_vec_full, text_vec], axis=0))

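        ## Map the string label to an integer class
        label_str = self.df['label'][idx]
        if label_str == 'real':
            label = 0
        elif label_str == 'fake':
            label = 1
        else:
            # Without this branch, an unknown label would leave `label` undefined
            raise ValueError(f"Unexpected label: {label_str!r}")
        return image_vec, text_vec, label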
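
The adaptive pooling step in __getitem__ can be checked in isolation. The sketch below assumes a hypothetical image feature of 4 regions with 2048 values each (the real width depends on how the .npy files were produced) and shows that only the last dimension is resized to 768, while the number of rows, which is what later varies between samples, is left alone.

```python
import torch
import torch.nn as nn

pool = nn.AdaptiveAvgPool1d(768)

# Hypothetical image feature: 4 regions, 2048 values per region.
image_vec = torch.randn(4, 2048)

# Same call pattern as in __getitem__: add a batch axis, pool, remove it again.
resized = pool(image_vec.float().unsqueeze(0)).squeeze(0)

print(tuple(image_vec.shape), "->", tuple(resized.shape))
```

Because the first dimension is untouched, two images with a different number of detected regions still produce tensors of different shapes, which is why padding in collate_fn is needed.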

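That is the Dataset class used to produce the data. One question remains: how are the training and test sets kept apart? This depends on how your samples are stored. Data sets are usually split into two files; here they are the tweetsTrain and tweetsTest files.
So it is enough to read both files with pandas and wrap each one in its own Dataset, along these lines: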

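import pandas as pd
# `config` and `Dataset` (which provides UEMDataset) are the project's own modules


def set_up_mediaeval2015():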
    df_train = pd.read_csv(f'{config.root_dir}{config.me15_train_csv_name}',encoding='utf-8',encoding_errors='ignore')
    df_train = df_train.dropna().reset_index(drop=True)
    df_test = pd.read_csv(f'{config.root_dir}{config.me15_test_csv_name}',encoding='utf-8',encoding_errors='ignore')
    df_test = df_test.dropna().reset_index(drop=True)
    dataset_train = Dataset.UEMDataset(df_train, config.root_dir, "imageId(s)", "tweetId",
                                         config.me15_image_vec_dir, config.me15_text_vec_dir)

    dataset_test = Dataset.UEMDataset(df_test, config.root_dir, "imageId(s)", "tweetId",
                                        config.me15_image_vec_dir, config.me15_text_vec_dir)
    return dataset_train, dataset_test
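
The reset_index(drop=True) call after dropna() matters because __getitem__ looks rows up with self.df[column][idx], which is label-based. The sketch below illustrates this with a tiny in-memory CSV; the column names follow the code above, but the three rows are made up.

```python
import io

import pandas as pd

# Made-up rows; the second one has no image id and will be dropped.
csv_text = (
    "tweetId,imageId(s),label\n"
    "1,img_a,real\n"
    "2,,fake\n"
    "3,img_c,fake\n"
)

df = pd.read_csv(io.StringIO(csv_text))

dropped = df.dropna()
print(list(dropped.index))  # [0, 2] -> label 1 no longer exists

clean = dropped.reset_index(drop=True)
print(list(clean.index))    # [0, 1]
print(clean["tweetId"][1])  # 3
```

Without the reset, a lookup such as dropped["tweetId"][1] raises a KeyError, even though len(dropped) is 2 and the DataLoader would request index 1.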

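With the source of the data explained, consider what the data looks like once a batch has been formed.
Note that every element of this list is data that will eventually be fed to the model, and PyTorch needs the tensors within a batch to have the same shape before they can be stacked. The first and second items of each tuple (both tensors) therefore have to be padded along their first dimension up to the largest first-dimension size found in the batch. This can be done as follows: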

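import torch


# Custom collate_fn
def collate_fn(batch):
    # Largest first-dimension size of tensor1 and of tensor2 within this batch
    max_length_dim1 = max(item[0].shape[0] for item in batch)
    max_length_dim2 = max(item[1].shape[0] for item in batch)

    # Zero-pad the first dimension of tensor1 and tensor2 up to the batch maximum;
    # new_zeros keeps the dtype and device of the tensor being padded
    expanded_data = [
        (torch.cat([item[0], item[0].new_zeros((max_length_dim1 - item[0].shape[0], item[0].shape[1]))]),
         torch.cat([item[1], item[1].new_zeros((max_length_dim2 - item[1].shape[0], item[1].shape[1]))]),
         item[2])
        for item in batch
    ]
    # The result is a list of (tensor1, tensor2, label) tuples: padded, but not stacked
    return expanded_data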
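
As a possible variation (not the approach used above), the padded tensors can also be stacked into one tensor per modality, with the original lengths returned alongside so that padding can be masked later. The sketch below uses torch.nn.utils.rnn.pad_sequence; the function name collate_and_stack and the sample shapes are hypothetical.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader


def collate_and_stack(batch):
    # Split the list of (image_vec, text_vec, label) tuples into three sequences.
    image_vecs, text_vecs, labels = zip(*batch)

    # Remember the unpadded lengths so the model can ignore the zero rows.
    image_lengths = torch.tensor([v.shape[0] for v in image_vecs])
    text_lengths = torch.tensor([v.shape[0] for v in text_vecs])

    # Zero-pad along the first dimension and stack: (batch, max_len, feature_dim).
    images = pad_sequence(list(image_vecs), batch_first=True)
    texts = pad_sequence(list(text_vecs), batch_first=True)

    return images, texts, torch.tensor(labels), image_lengths, text_lengths


# Made-up samples with different first dimensions.
samples = [
    (torch.ones(3, 768), torch.ones(6, 768), 0),
    (torch.ones(5, 768), torch.ones(2, 768), 1),
]

loader = DataLoader(samples, batch_size=2, collate_fn=collate_and_stack)
images, texts, labels, image_lengths, text_lengths = next(iter(loader))

print(tuple(images.shape), tuple(texts.shape), labels.tolist())
```

Whether to stack here or to keep a list of tuples depends on what the model's forward pass expects; stacking is convenient when the model consumes one batched tensor per modality.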