Handling Multimodal Data with AutoGluon, with a Worked Example: Multimodal Data Tables (Tabular, Text, and Image)

Article link: https://blog.csdn.net/weixin_52561314/article/details/124732874

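This tutorial shows how to use AutoGluon on multimodal data that combines tabular, text, and image features. Using the PetFinder dataset as the example, it covers data preprocessing, building feature metadata, setting hyperparameters, and training models.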

Multimodal Data Tables: Tabular, Text, and Image

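Note: A GPU is required for this tutorial in order to train the image and text models. In addition, GPU-enabled installations of MXNet and Torch, built for the appropriate CUDA version, are required.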

The PetFinder Dataset

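We will use the PetFinder dataset. It provides the information about shelter animals that appears on their adoption profiles, and the goal is to predict each animal's adoption rate. Ultimately, rescue shelters could use the predicted adoption rate to identify animals whose profiles could be improved, so that those animals can find a home.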

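Each animal's adoption profile contains a variety of information: pictures of the animal, a text description of the animal, and tabular features such as age, breed, name, and color.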

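First, we need to download the dataset. A dataset that contains images requires more than a CSV file, so the dataset is packaged as a zip file in S3. We download it and unzip its contents: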

download_dir = './ag_petfinder_tutorial'
zip_file = 'https://automl-mm-bench.s3.amazonaws.com/petfinder_kaggle.zip'

from autogluon.core.utils.loaders import load_zip
load_zip.unzip(zip_file, unzip_dir=download_dir)

Now that the data is downloaded and unzipped, let's take a look at the contents:

import os
os.listdir(download_dir)


['petfinder_processed', 'file.zip']

'file.zip' is the original zip file we downloaded, and 'petfinder_processed' is the directory containing the dataset files.

dataset_path = download_dir + '/petfinder_processed'
os.listdir(dataset_path)


['test.csv', 'dev.csv', 'test_images', 'train_images', 'train.csv']

Here we can see the train, test, and dev CSV files, as well as two directories, 'test_images' and 'train_images', which contain the image JPG files.
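Before going further, it can help to confirm that the unzipped folder really has the layout shown in the directory listing above. The following is a minimal sketch using only the standard library; `missing_entries` and `EXPECTED_ENTRIES` are hypothetical names introduced here for illustration and are not part of AutoGluon.

```python
from pathlib import Path

# Names copied from the directory listing above.
EXPECTED_ENTRIES = ['train.csv', 'dev.csv', 'test.csv', 'train_images', 'test_images']


def missing_entries(dataset_dir, expected=EXPECTED_ENTRIES):
    """Return the expected file or directory names that are absent from dataset_dir."""
    root = Path(dataset_dir)
    return [name for name in expected if not (root / name).exists()]
```

Calling `missing_entries(dataset_path)` should return an empty list if the download and extraction completed; any names it returns point to an incomplete or differently structured archive.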

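Note: We will use the dev data as the test data, because dev contains the ground-truth labels needed to show scores via predictor.leaderboard. Let's take a look at the first 10 files in the 'train_images' directory: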

os.listdir(dataset_path + '/train_images')[:10]

['ca587cb42-1.jpg',
 'ae00eded4-4.jpg',
 '6e3457b81-2.jpg',
 'acb248693-1.jpg',
 '0bd867d1b-1.jpg',
 'fa53dd6cd-1.jpg',
 '9726ab93e-1.jpg',
 '39818f12c-2.jpg',
 '90ce48a71-2.jpg',
 '2ece6b26b-1.jpg']
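The file names above appear to follow the pattern `<PetID>-<index>.jpg`. Assuming that pattern holds for the whole directory, the sketch below groups file names by pet to show how many images each animal has. `images_per_pet` is a hypothetical helper written for this illustration, and names that do not match the pattern are skipped.

```python
from collections import defaultdict


def images_per_pet(filenames):
    """Map each pet ID to the sorted list of its image indices.

    Assumes names look like '<PetID>-<index>.jpg'; anything else is ignored.
    """
    grouped = defaultdict(list)
    for name in filenames:
        stem = name.rsplit('.', 1)[0]            # drop the file extension
        pet_id, _, index = stem.rpartition('-')  # split at the last hyphen
        if pet_id and index.isdigit():
            grouped[pet_id].append(int(index))
    return {pet_id: sorted(indices) for pet_id, indices in grouped.items()}


# Three of the file names from the listing above.
sample = ['ca587cb42-1.jpg', 'ae00eded4-4.jpg', '6e3457b81-2.jpg']
print(images_per_pet(sample))
```

On the real data, the same function could be applied to `os.listdir(dataset_path + '/train_images')`.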

Next, we load the train and dev CSV files:

import pandas as pd

train_data = pd.read_csv(f'{dataset_path}/train.csv', index_col=0)
test_data = pd.read_csv(f'{dataset_path}/dev.csv', index_col=0)

train_data.head(3)

	Type	Name	Age	Breed1	Breed2	Gender	Color1	Color2	Color3	MaturitySize	...	Quantity	Fee	State	RescuerID	VideoAmt	Description	PetID	PhotoAmt	AdoptionSpeed	Images
10721	1	Elbi	2	307	307	2	5	0	0	3	...	1	0	41336	e9a86209c54f589ba72c345364cf01aa	0	I'm looking for people to adopt my dog	e4b90955c	4.0	4	train_images/e4b90955c-1.jpg;train_images/e4b9...
13114	2	Darling	4	266	0	1	1	0	0	2	...	1	0	41401	01f954cdf61526daf3fbeb8a074be742	0	Darling was born at the back lane of Jalan Alo...	a0c1384d1	5.0	3	train_images/a0c1384d1-1.jpg;train_images/a0c1...
13194	1	Wolf	3	307	0	1	1	2	0	2	...	1	0	41332	6e19409f2847326ce3b6d0cec7e42f81	0	I found Wolf about a month ago stuck in a drai...	cf357f057	7.0	4	train_images/cf357f057-1.jpg;train_images/cf35..

3 rows × 25 columns

Looking at the first 3 examples, we can see a variety of tabular features, a text description ('Description'), and an image path ('Images').

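For the PetFinder dataset, we will try to predict the speed of adoption for the animal ('AdoptionSpeed'), which is grouped into 5 categories. This makes it a multiclass classification problem.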

label = 'AdoptionSpeed'
image_col = 'Images'
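For a multiclass problem, it is often worth checking how the classes are distributed before training. The sketch below computes the share of each class from any iterable of labels using only the standard library. `class_shares` is a hypothetical helper, and the example labels are made up for illustration rather than taken from the dataset.

```python
from collections import Counter


def class_shares(labels):
    """Return a dict mapping each class to its fraction of all labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cls: counts[cls] / total for cls in sorted(counts)}


# Made-up labels, for illustration only.
print(class_shares([4, 3, 4, 2]))
```

With the tutorial's data, this could be called as `class_shares(train_data[label])`, since a pandas Series can be iterated like a list.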

Let's look at what a value in the image column looks like:

train_data[image_col].iloc[0]

'train_images/e4b90955c-1.jpg;train_images/e4b90955c-2.jpg;train_images/e4b90955c-3.jpg;train_images/e4b90955c-4.jpg'

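Currently, AutoGluon supports only one image per row. Since the PetFinder dataset contains one or more images per row, we first need to preprocess the image column so that it contains only the first image of each row.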

train_data[image_col] = train_data[image_col].apply(lambda ele: ele.split(';')[0])
test_data[image_col] = test_data[image_col].apply(lambda ele: ele.split(';')[0])

train_data[image_col].iloc[0]


'train_images/e4b90955c-1.jpg'
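The lambda above assumes every cell in the image column is a string. If a row had no image, the cell could be a missing value (for example NaN) and `split` would raise an error. Whether this dataset contains such rows is not stated here, so the following is only a defensive sketch; `first_image` is a hypothetical helper, and how missing images should be handled downstream is left open.

```python
def first_image(value, sep=';'):
    """Return the first path in a separator-joined string, or None if there is none."""
    if not isinstance(value, str):  # e.g. a float NaN for a missing cell
        return None
    first = value.split(sep)[0].strip()
    return first or None            # treat an empty string as missing


print(first_image('train_images/e4b90955c-1.jpg;train_images/e4b90955c-2.jpg'))
```

It could be used as a drop-in replacement for the lambda, for example `train_data[image_col].apply(first_image)`.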

AutoGluon loads images based on the file path provided by the image column.

Here we update the paths so that they point to the correct location on disk:

def path_expander(path, base_folder):
    path_l = path.split(';')
    return ';'.join([os.path.abspath(os.path.join(base_folder, path)) for path in path_l])