Datawhale AI Summer Camp: Multimodal LLM Data Synthesis from Scratch 1.0

Article link: https://blog.csdn.net/m0_58854572/article/details/141271380

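Evaluation metrics (the final score is the average of the two)

28k+ images containing text (readable with OCR) and 45k+ questions. The images come from Open Images; each entry includes the question and its ground-truth answers (453k+ answers in total).

Data example: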

{"question": "what type of plane is this?", 
"image_id": "073f668cdc671c37", 
"image_classes": ["Tree", "Vehicle", "Airplane", "Aircraft"], 
"flickr_original_url": "https://farm8.staticflickr.com/8292/7496758474_eea4bc6745_o.jpg", 
"flickr_300k_url": "https://c3.staticflickr.com/9/8292/7496758474_ef1827aaff_z.jpg", 
"image_width": 1024, 
"image_height": 683, 
"answers": ["south african", "hidehi matsui", "south african", "south african", "south african", "south african", "south african ", "south africa", "south african", "707"], 
"question_tokens": ["what", "type", "of", "plane", "is", "this"], 
"question_id": 33, "set_name": "train"},

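The sample above carries ten human answers per question. A minimal sketch of how such a record could be scored, assuming the simplified VQA-style accuracy `min(matches / 3, 1)`; the notes do not state which formula the competition uses, so the function `vqa_accuracy` and the normalization (strip and lowercase) are illustrative assumptions:

```python
from collections import Counter

# The ten ground-truth answers copied from the data example above.
SAMPLE_ANSWERS = [
    "south african", "hidehi matsui", "south african", "south african",
    "south african", "south african", "south african ", "south africa",
    "south african", "707",
]


def vqa_accuracy(prediction: str, answers: list[str]) -> float:
    """Simplified VQA-style accuracy (an assumption, not the official scorer).

    A prediction counts as fully correct once at least three annotators
    gave the same answer after trimming whitespace and lowercasing.
    """
    normalized = [a.strip().lower() for a in answers]
    matches = Counter(normalized)[prediction.strip().lower()]
    return min(matches / 3.0, 1.0)


print(vqa_accuracy("south african", SAMPLE_ANSWERS))
print(vqa_accuracy("707", SAMPLE_ANSWERS))
print(vqa_accuracy("boeing", SAMPLE_ANSWERS))
```

Under this formula an answer given by only one annotator (such as "707") earns partial credit, while an answer nobody gave earns zero.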
-MMBench (2023) 2307.06281 (arxiv.org)

MMBench: Is Your Multi-modal Model an All-around Player?

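Questions are single-answer multiple choice.

It evaluates a model's understanding across multiple ability dimensions; see the CSDN blog post "多模态模型评测神器 | OpenCompass MMBench 了解一下！" (a guide to OpenCompass MMBench, a multimodal model evaluation tool).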

2024/8/16 23:18

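Summer camp leaderboard (closes Aug 17): the top five receive certificates

Challenge leaderboard (closes Aug 23): the top ten advance to the finals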

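1. Official baseline

【Image】: a self-contained software package that includes the 【environment】 and the software built on top of it

--> Choose an image and create an instance

--> Activate the conda environment dj (for Data-Juicer)

--> git clone the competition package

--> Install the required tools with pip and apt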

----------------------------------------Dataset: download.sh-->fulldownload.sh---------------------------------

Download to the data disk, not the system disk; otherwise there is not enough space.
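A minimal sketch for checking free space before starting the download. It assumes the data disk is mounted at /root/autodl-tmp (the path used as cache_dir in download_blip.py); the 100 GiB threshold in the usage comment is a hypothetical placeholder, not a figure from the competition:

```python
import shutil


def free_gib(path: str) -> float:
    """Return the free space, in GiB, of the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 1024**3


def has_room(path: str, required_gib: float) -> bool:
    """True if the filesystem holding `path` has at least `required_gib` free."""
    return free_gib(path) >= required_gib


# Example usage (path and threshold are assumptions, adjust to your instance):
#   if not has_room("/root/autodl-tmp", 100.0):
#       raise SystemExit("Not enough space on the data disk")
```

Running the same check against the system disk before and after a download is a quick way to confirm nothing landed there by mistake.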

【basemodel】model_zoo/LLM/gemma

【seed: all images】input/pretrain_stage_1_10k

--> input/pretrain_stage_1

The full dataset has 400k samples; the online round allows at most 200k.

# for training data
echo "[2] Downloading seed datasets..."
mkdir -p "${SCRIPT_DIR}/input"
cd "${SCRIPT_DIR}/input"
# axel -n 5: download over 5 parallel connections
axel -n 5 http://dail-wlcb.oss-cn-wulanchabu.aliyuncs.com/dj-competition/better_synth/data/stage_1/pretrain_stage_1.tar.gz
# unpack, then delete the archive to free disk space
tar zxvf pretrain_stage_1.tar.gz && rm -rf pretrain_stage_1.tar.gz
cd pretrain_stage_1
axel -n 5 http://dail-wlcb.oss-cn-wulanchabu.aliyuncs.com/dj-competition/better_synth/data/stage_1/mgm_pretrain_stage_1.jsonl
axel -n 5 http://dail-wlcb.oss-cn-wulanchabu.aliyuncs.com/dj-competition/better_synth/data/stage_1/stage_1.json
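Since the online round allows at most 200k of the 400k samples, one way to stay under the cap is to draw a random subset of the downloaded JSONL file. The sketch below uses reservoir sampling so the file never has to be loaded in full. It assumes one sample per line (as in a file like mgm_pretrain_stage_1.jsonl); the function name, output path, and seed are hypothetical, and the baseline may select data differently:

```python
import random


def subsample_jsonl(src_path: str, dst_path: str,
                    max_samples: int = 200_000, seed: int = 42) -> int:
    """Write a uniform random subset of at most `max_samples` lines.

    Uses reservoir sampling (Algorithm R): every non-empty line of the
    source ends up in the output with equal probability.
    Returns the number of lines written.
    """
    rng = random.Random(seed)
    reservoir: list[str] = []
    seen = 0  # number of non-empty lines processed before the current one
    with open(src_path, encoding="utf-8") as src:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            if len(reservoir) < max_samples:
                reservoir.append(line)
            else:
                j = rng.randint(0, seen)  # inclusive on both ends
                if j < max_samples:
                    reservoir[j] = line
            seen += 1
    with open(dst_path, "w", encoding="utf-8") as dst:
        for line in reservoir:
            dst.write(line.rstrip("\n") + "\n")
    return len(reservoir)


# Example usage (file names are assumptions):
#   subsample_jsonl("mgm_pretrain_stage_1.jsonl", "subset_200k.jsonl")
```

A fixed seed keeps the subset reproducible across runs, which helps when comparing synthesis strategies on the same base data.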

【finetune】input/finetuning_stage_1_12k

【evaluation】toolkit/training/data

-----------------------------------------Model: download_blip.py------------------------------------------------------

from modelscope import snapshot_download

# Download the BLIP-2 checkpoint from ModelScope into the models directory on the data disk
model_dir = snapshot_download(
    'goldsj/blip2-opt-2.7b',
    cache_dir='/root/autodl-tmp/better_synth_baseline_autoDL/models',
    revision='master',
)

BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation (arxiv.org), 2022

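============================Shut down and save the image.==============================

Saving takes a while, and the instance must not be started while the save is in progress. A new instance created from the saved image has to download the data again. (Choosing multiple GPUs speeds up the later data synthesis but makes the instance harder to start; boot in no-GPU mode to download the data.)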