Datawhale AI Summer Camp: Getting Started with Multimodal LLM Data Synthesis from Scratch 1.0

Evaluation metrics (the final score is the average of the two)

- TextVQA: https://textvqa.org/

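28k+ images that contain text (readable with OCR) and 45k+ questions. The images come from Open Images, and each question is paired with ground-truth answers (453k+ answers in total).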

Data example:

{"question": "what type of plane is this?",
"image_id": "073f668cdc671c37",
"image_classes": ["Tree", "Vehicle", "Airplane", "Aircraft"],
"flickr_original_url": "https://farm8.staticflickr.com/8292/7496758474_eea4bc6745_o.jpg",
"flickr_300k_url": "https://c3.staticflickr.com/9/8292/7496758474_ef1827aaff_z.jpg",
"image_width": 1024,
"image_height": 683,
"answers": ["south african", "hidehi matsui", "south african", "south african", "south african", "south african", "south african ", "south africa", "south african", "707"],
"question_tokens": ["what", "type", "of", "plane", "is", "this"],
"question_id": 33, "set_name": "train"}

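TextVQA is commonly scored with the VQA accuracy metric, where a prediction earns full credit once at least three of the ten annotators gave the same answer. The sketch below applies the simplified form min(matches / 3, 1) to the sample above. It assumes only whitespace trimming and lowercasing as normalization; the official evaluator normalizes more aggressively and averages over annotator subsets, so this is an illustration and not the competition's scoring code.

```python
def vqa_accuracy(prediction: str, answers: list[str]) -> float:
    """Simplified VQA accuracy: min(number of matching human answers / 3, 1)."""
    normalized = prediction.strip().lower()
    matches = sum(1 for a in answers if a.strip().lower() == normalized)
    return min(matches / 3.0, 1.0)


# Ground-truth answers copied from the data example above.
answers = ["south african", "hidehi matsui", "south african", "south african",
           "south african", "south african", "south african ", "south africa",
           "south african", "707"]

print(vqa_accuracy("south african", answers))  # seven annotators agree
print(vqa_accuracy("707", answers))            # one annotator agrees
print(vqa_accuracy("boeing", answers))         # nobody agrees
```

Under this simplified rule, "south african" scores 1.0, "707" scores one third, and an answer no annotator gave scores 0.
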
- MMBench (2023): arXiv:2307.06281

MMBench: Is Your Multi-modal Model an All-around Player?

Questions are single-choice (one correct option per question).

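It evaluates a model's understanding ability along multiple dimensions. For background, see the CSDN blog post "多模态模型评测神器 | OpenCompass MMBench 了解一下!" (roughly, "A handy tool for evaluating multimodal models: an introduction to OpenCompass MMBench").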
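
Because MMBench questions are single-choice, a prediction can be scored by comparing the chosen option letter with the reference letter. The sketch below shows plain accuracy plus the averaging of the two benchmark scores mentioned at the top of these notes. The function names and the example numbers are made up for illustration, and the official MMBench protocol (CircularEval, as described in the paper) is stricter than plain accuracy.

```python
def single_choice_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of questions whose predicted option letter matches the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(p.strip().upper() == r.strip().upper()
                  for p, r in zip(predictions, references))
    return correct / len(references)


def final_score(textvqa_score: float, mmbench_score: float) -> float:
    """Average of the two benchmark scores (both assumed to be on the same scale)."""
    return (textvqa_score + mmbench_score) / 2.0


preds = ["A", "c", "B", "D"]   # hypothetical model outputs
refs = ["A", "C", "D", "D"]    # hypothetical reference options
acc = single_choice_accuracy(preds, refs)
print(acc)                     # three of four options match
print(final_score(0.5, acc))   # 0.5 is a placeholder TextVQA score
```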

2024/8/16        23:18

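Summer camp leaderboard (closes Aug 17): the top five receive certificates

Challenge leaderboard (closes Aug 23): the top ten advance to the finals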

1. Official baseline

[Image]: a self-contained software package that bundles the [environment] and the software built on top of it

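--> Select the image and create an instance

--> Activate the conda environment dj (for Data-Juicer)

--> git clone the competition repository

--> Install the required tools with pip and apt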
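
Before moving on, it can help to confirm that the setup steps above actually took effect. The sketch below checks that the active conda environment is dj and that the command-line tools used later in these notes are on PATH. It relies on conda exporting CONDA_DEFAULT_ENV after activation; the function name and the tool list are choices made for this example.

```python
import os
import shutil


def check_environment(env=None, expected_env="dj",
                      tools=("dj-process", "git", "axel")):
    """Return a list of human-readable problems; an empty list means all good."""
    env = os.environ if env is None else env
    problems = []
    active = env.get("CONDA_DEFAULT_ENV")
    if active != expected_env:
        problems.append(f"conda env is {active!r}, expected {expected_env!r}")
    for tool in tools:
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


for problem in check_environment():
    print("WARNING:", problem)
```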

----------------------------------------Datasets: download.sh --> fulldownload.sh---------------------------------

Download to the data disk rather than the system disk; otherwise there will not be enough space.
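
A quick way to see which disk has room is to query free space before starting the download. The sketch below uses shutil.disk_usage from the standard library. It assumes the data disk is mounted at /root/autodl-tmp (the prefix used for the model cache later in these notes) and that / is the system disk; any required size passed to has_room is a placeholder to replace with the real dataset size.

```python
import os
import shutil


def free_gib(path: str) -> float:
    """Free space, in GiB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / 2**30


def has_room(path: str, required_gib: float) -> bool:
    """True if the filesystem containing `path` has at least `required_gib` free."""
    return free_gib(path) >= required_gib


# Assumed mount points; paths that do not exist on this machine are skipped.
for label, path in [("system disk", "/"), ("data disk", "/root/autodl-tmp")]:
    if os.path.exists(path):
        print(f"{label} ({path}): {free_gib(path):.1f} GiB free")
```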

[base model] model_zoo/LLM/gemma

[seed: all images] input/pretrain_stage_1_10k

--> input/pretrain_stage_1

The full dataset is 400k; the online competition allows at most 200k of it to be used.
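
One simple way to stay under the cap is to draw a random subset of the JSONL file. The sketch below uses reservoir sampling so the full file never has to sit in memory. It assumes the limit counts one JSONL line as one sample, which these notes do not confirm, and uniform random selection is only a baseline choice, not the competition's recommended strategy. The function name is hypothetical.

```python
import random


def sample_lines(src_path: str, dst_path: str,
                 limit: int = 200_000, seed: int = 0) -> int:
    """Write a uniform random sample of at most `limit` non-blank lines.

    Uses reservoir sampling (Algorithm R). Returns the number of lines written.
    """
    rng = random.Random(seed)
    reservoir = []
    seen = 0
    with open(src_path, encoding="utf-8") as src:
        for raw in src:
            line = raw.rstrip("\n")
            if not line.strip():
                continue  # skip blank lines
            if seen < limit:
                reservoir.append(line)
            else:
                j = rng.randint(0, seen)  # inclusive on both ends
                if j < limit:
                    reservoir[j] = line
            seen += 1
    with open(dst_path, "w", encoding="utf-8") as dst:
        for line in reservoir:
            dst.write(line + "\n")
    return len(reservoir)
```

A call such as sample_lines("mgm_pretrain_stage_1.jsonl", "subset_200k.jsonl") would produce the subset; the output file name here is made up.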

# for training data
echo "[2] Downloading seed datasets..."
mkdir -p ${SCRIPT_DIR}/input
cd ${SCRIPT_DIR}/input
axel -n 5 http://dail-wlcb.oss-cn-wulanchabu.aliyuncs.com/dj-competition/better_synth/data/stage_1/pretrain_stage_1.tar.gz
tar zxvf pretrain_stage_1.tar.gz && rm -rf pretrain_stage_1.tar.gz
cd pretrain_stage_1
axel -n 5 http://dail-wlcb.oss-cn-wulanchabu.aliyuncs.com/dj-competition/better_synth/data/stage_1/mgm_pretrain_stage_1.jsonl
axel -n 5 http://dail-wlcb.oss-cn-wulanchabu.aliyuncs.com/dj-competition/better_synth/data/stage_1/stage_1.json
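
After the download finishes, a quick look at the first few records shows which fields the JSONL file carries before any Data-Juicer config is written. The sketch below prints only the keys it finds and makes no assumption about the schema. The relative path is inferred from the script above and the helper name is hypothetical.

```python
import json
import os


def peek_jsonl(path: str, n: int = 3) -> list:
    """Parse and return the first `n` non-blank records of a JSONL file."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if len(records) >= n:
                break
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


# Path inferred from the download script; skipped if the file is not present.
jsonl_path = "input/pretrain_stage_1/mgm_pretrain_stage_1.jsonl"
if os.path.exists(jsonl_path):
    for record in peek_jsonl(jsonl_path):
        print(sorted(record.keys()) if isinstance(record, dict) else type(record))
```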

[finetune] input/finetuning_stage_1_12k

[evaluation] toolkit/training/data

-----------------------------------------Model: download_blip.py------------------------------------------------------

from modelscope import snapshot_download

model_dir = snapshot_download('goldsj/blip2-opt-2.7b', 
                              cache_dir='/root/autodl-tmp/better_synth_baseline_autoDL/models', 
                              revision='master')

BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation (arXiv, 2022)

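============================Shut down and save the image.==============================

Saving takes a while, and the instance must not be powered on while the image is being saved. A new instance created from the saved image has to download the data again. (Choosing multiple GPUs speeds up the later data synthesis but makes the instance harder to start; start it in no-GPU mode to download the data.)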
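
Since a fresh instance may come up without the data, it is worth checking which of the expected directories are missing before launching anything expensive. The sketch below assumes the directories listed earlier in these notes are relative to the baseline repository root, which the notes do not state outright; the helper name and the root path in the example are illustrative.

```python
from pathlib import Path

# Directories mentioned earlier in these notes (assumed relative to the repo root).
EXPECTED = [
    "model_zoo/LLM/gemma",
    "input/pretrain_stage_1_10k",
    "input/finetuning_stage_1_12k",
    "toolkit/training/data",
]


def missing_paths(root: str, expected=EXPECTED) -> list:
    """Return the expected relative paths that do not exist under `root`."""
    base = Path(root)
    return [rel for rel in expected if not (base / rel).exists()]


# Assumed repository location on the data disk.
for rel in missing_paths("/root/autodl-tmp/better_synth_baseline_autoDL"):
    print("missing, download again:", rel)
```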

Activate the environment: conda activate dj

Change into the directory autodl-tmp/better_synth_baseline_autoDL

-----------------------------------Data synthesis with Data-Juicer: .yaml -->-------------------------------------------------

dj-process --config ./image_split_10.yaml -->

dj-process --config ./image_captioning_10.yaml -->
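
The arrows suggest that each step feeds the next, so it is safer to stop at the first failure than to continue with stale output. The sketch below wraps the two commands from these notes in a small runner. It defaults to a dry run that only prints the commands; passing dry_run=False executes them and raises as soon as one exits with a non-zero status. The runner itself is a convenience for this example and is not part of Data-Juicer.

```python
import shlex
import subprocess

# The two commands exactly as written in these notes, in order.
STEPS = [
    "dj-process --config ./image_split_10.yaml",
    "dj-process --config ./image_captioning_10.yaml",
]


def run_steps(steps=STEPS, dry_run=True):
    """Print each command and, unless dry_run is set, run it and stop on failure."""
    planned = []
    for step in steps:
        argv = shlex.split(step)
        print("$", step)
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(argv, check=True)
        planned.append(argv)
    return planned


run_steps()  # dry run: prints the commands without executing them
```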

-----------------------------------Multimodal LLM training (MGM training) and evaluation----------------------------------

bash train_mgm_2b_stage_one_card.sh

-----------------------------------------------------Submit in the correct format----------------------------------------------------
