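Evaluation metrics (average of the two)
- TextVQA: https://textvqa.org/
28k+ images containing text (readable via OCR) and 45k+ questions; images come from Open Images; each question consists of the question text and ground-truth answers (453k+)
Data example: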
{"question": "what type of plane is this?", "image_id": "073f668cdc671c37", "image_classes": ["Tree", "Vehicle", "Airplane", "Aircraft"], "flickr_original_url": "https://farm8.staticflickr.com/8292/7496758474_eea4bc6745_o.jpg", "flickr_300k_url": "https://c3.staticflickr.com/9/8292/7496758474_ef1827aaff_z.jpg", "image_width": 1024, "image_height": 683, "answers": ["south african", "hidehi matsui", "south african", "south african", "south african", "south african", "south african ", "south africa", "south african", "707"], "question_tokens": ["what", "type", "of", "plane", "is", "this"], "question_id": 33, "set_name": "train"},
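For reference, TextVQA is scored with the VQA accuracy convention: a predicted answer gets full credit when at least 3 of the 10 annotators gave it, and partial credit otherwise. A minimal sketch of that rule on the sample above; the function names and the lowercase/strip normalization are assumptions for illustration, and the official evaluator normalizes answers more thoroughly (and averages over annotator subsets), so its numbers can differ slightly.

```python
def _normalize(text):
    # Assumed normalization: lowercase and strip surrounding whitespace only.
    return text.strip().lower()


def vqa_accuracy(prediction, answers):
    """Simplified VQA accuracy: min(number of matching human answers / 3, 1)."""
    target = _normalize(prediction)
    matches = sum(1 for a in answers if _normalize(a) == target)
    return min(matches / 3.0, 1.0)


# Ground-truth answers copied from the sample record above.
sample_answers = [
    "south african", "hidehi matsui", "south african", "south african",
    "south african", "south african", "south african ", "south africa",
    "south african", "707",
]

score_majority = vqa_accuracy("south african", sample_answers)  # majority answer
score_minority = vqa_accuracy("707", sample_answers)            # given by one annotator
score_wrong = vqa_accuracy("boeing", sample_answers)            # given by nobody
```

Here the majority answer scores 1.0, an answer given by a single annotator scores 1/3, and an answer nobody gave scores 0.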
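- MMBench (2023): arXiv 2307.06281
MMBench: Is Your Multi-modal Model an All-around Player?
Questions are single-answer multiple choice
Evaluates the model's understanding across multiple dimensions; see the CSDN blog post "多模态模型评测神器 | OpenCompass MMBench 了解一下!" (roughly: "A handy tool for evaluating multimodal models | Get to know OpenCompass MMBench!")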
2024/8/16 23:18
Summer camp leaderboard (closes 8.17): top five receive certificates
Challenge leaderboard (closes 8.23): top ten advance to the finals
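1. Official baseline
[Image]: a self-contained software package that includes the [environment] and the software built on top of it
--> choose an image and create an instance
--> activate the conda environment dj (for Data-Juicer)
--> git clone the competition packages
--> install the necessary tools with pip and apt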
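---------------------------------------- Datasets: download.sh --> fulldownload.sh ---------------------------------
Download to the data disk, not the system disk; otherwise there is not enough space
[base model] model_zoo/LLM/gemma
[seed: all images] input/pretrain_stage_1_10k
--> input/pretrain_stage_1
The full dataset has 400k samples; the online competition allows at most 200k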
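A quick way to compare the two disks before downloading is `shutil.disk_usage` from the standard library. A minimal sketch; the helper names are made up, the data-disk mount point `/root/autodl-tmp` is taken from the paths later in these notes, and how much space counts as "enough" is left to the caller because the notes do not state a size.

```python
import shutil


def free_gib(path):
    """Return the free space, in GiB, of the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / 1024 ** 3


def has_room(path, needed_gib):
    """True if the filesystem containing `path` has at least `needed_gib` GiB free."""
    return free_gib(path) >= needed_gib


if __name__ == "__main__":
    # Assumed locations: "/" for the system disk, /root/autodl-tmp for the data disk.
    for disk in ("/", "/root/autodl-tmp"):
        try:
            print(f"{disk}: {free_gib(disk):.1f} GiB free")
        except FileNotFoundError:
            print(f"{disk}: not found on this machine")
```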
# for training data
echo "[2] Downloading seed datasets..."
mkdir -p ${SCRIPT_DIR}/input
cd ${SCRIPT_DIR}/input
axel -n 5 http://dail-wlcb.oss-cn-wulanchabu.aliyuncs.com/dj-competition/better_synth/data/stage_1/pretrain_stage_1.tar.gz
tar zxvf pretrain_stage_1.tar.gz && rm -rf pretrain_stage_1.tar.gz
cd pretrain_stage_1
axel -n 5 http://dail-wlcb.oss-cn-wulanchabu.aliyuncs.com/dj-competition/better_synth/data/stage_1/mgm_pretrain_stage_1.jsonl
axel -n 5 http://dail-wlcb.oss-cn-wulanchabu.aliyuncs.com/dj-competition/better_synth/data/stage_1/stage_1.json
[finetune] input/finetuning_stage_1_12k
[evaluation] toolkit/training/data
----------------------------------------- Model: download_blip.py ------------------------------------------------------
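Because the online competition caps the training set at 200k samples while the full seed set has 400k, the simplest baseline is a uniform random subsample of the JSONL file. A minimal sketch using reservoir sampling, so the full file never has to sit in memory; the function name, the fixed seed, and the output file name in the usage note are assumptions, and random sampling is only a stand-in for a real selection strategy.

```python
import random


def reservoir_sample_lines(src_path, dst_path, k, seed=42):
    """Write a uniform random sample of at most k non-empty lines to dst_path.

    Uses reservoir sampling (Algorithm R), so memory use is O(k) lines.
    Returns the number of lines written.
    """
    rng = random.Random(seed)
    reservoir = []
    seen = 0  # number of non-empty lines read so far
    with open(src_path, "r", encoding="utf-8") as src:
        for raw in src:
            line = raw.rstrip("\n")
            if not line.strip():
                continue  # skip blank lines
            if seen < k:
                reservoir.append(line + "\n")
            else:
                # Keep the new line with probability k / (seen + 1).
                j = rng.randint(0, seen)
                if j < k:
                    reservoir[j] = line + "\n"
            seen += 1
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(reservoir)
    return len(reservoir)
```

Usage would look like `reservoir_sample_lines("input/pretrain_stage_1/mgm_pretrain_stage_1.jsonl", "input/pretrain_stage_1/subset_200k.jsonl", k=200_000)`, where `subset_200k.jsonl` is a hypothetical output name.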
from modelscope import snapshot_download

# Download the BLIP-2 checkpoint from ModelScope into the data disk.
model_dir = snapshot_download(
    'goldsj/blip2-opt-2.7b',
    cache_dir='/root/autodl-tmp/better_synth_baseline_autoDL/models',
    revision='master',
)
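============================ Shut down and save the image. ==============================
Saving takes a while, and the instance must stay powered off until it finishes. A new instance created from the saved image has to download the data again (choosing multiple GPUs speeds up the later data synthesis but makes the instance harder to start; boot in no-GPU mode to download the data)
Activate the conda environment dj
Go to the path autodl-tmp/better_synth_baseline_autoDL
----------------------------------- Data synthesis with Data-Juicer: .yaml --> -------------------------------------------------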
dj-process --config ./image_split_10.yaml -->
dj-process --config ./image_captioning_10.yaml -->
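The two steps above run one after the other. A minimal sketch that chains them from Python with `subprocess` and stops at the first failure; the config file names are the ones listed above, while the helper names and the `dry_run` flag are assumptions for illustration.

```python
import subprocess

# Config files listed above, in the order they are run.
CONFIGS = ["./image_split_10.yaml", "./image_captioning_10.yaml"]


def build_commands(configs):
    """Build one `dj-process --config <file>` argument list per config."""
    return [["dj-process", "--config", cfg] for cfg in configs]


def run_pipeline(configs, dry_run=False):
    """Run each step in order; check=True raises CalledProcessError on failure."""
    commands = build_commands(configs)
    for cmd in commands:
        print("running:", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Dry run only prints the commands; set dry_run=False inside the dj environment.
    run_pipeline(CONFIGS, dry_run=True)
```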
----------------------------------- Multimodal LLM training strategy (MGM training) and evaluation ----------------------------------
bash train_mgm_2b_stage_one_card.sh
----------------------------------------------------- Submit in the correct format ----------------------------------------------------