多模态微调 最麻烦的其实是数据准备:大体如下
一个字段要包含一个对应的图片
{ "id": "000000033471", "image": ["coco/train2017/000000033471.jpg"], "conversations": [ { "from": "human", "value": "<image>\nWhat are the colors of the bus in the image?" }, { "from": "gpt", "value": "The bus in the image is white and red." } ] }