LLaVA Dataset Download

1. Data

| Data file name | Size |
| --- | --- |
| llava_instruct_150k.json | 229 MB |
| llava_instruct_80k.json | 229 MB |
| conversation_58k.json | 126 MB |
| detail_23k.json | 20.5 MB |
| complex_reasoning_77k.json | 79.6 MB |
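
The instruction files above are plain JSON. As a quick sanity check after downloading, a minimal sketch like the following loads one file and prints a sample record; the field names ("id", "image", "conversations") are assumed from the released LLaVA instruction format, so verify them against the file you actually downloaded.

```python
import json

# Minimal sketch: load llava_instruct_150k.json and peek at one record.
# Field names are assumed from the released LLaVA instruction format.
with open("llava_instruct_150k.json", "r", encoding="utf-8") as f:
    records = json.load(f)  # the file is a single JSON list

print(f"{len(records)} records")
sample = records[0]
print(sample["id"], sample.get("image"))
for turn in sample["conversations"]:
    print(turn["from"], ":", turn["value"][:80])
```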

1.1 Pretraining Dataset

The pretraining dataset used in this release is a subset of the CC-3M dataset, filtered for a more balanced concept coverage distribution. Please see here for a detailed description of the dataset structure and how to download the images.

If you already have the CC-3M dataset on your disk, the image names follow this format: GCC_train_000000000.jpg. You may edit the image field correspondingly if necessary.
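
If your local copy of CC-3M uses a different naming scheme, a small script along these lines can rewrite the image field. This is only a sketch: to_local_name is a hypothetical mapping you would define yourself (for example, built from metadata.json); it is not provided by the release.

```python
import json

# Hypothetical mapping from the image name in chat.json to the file name
# used by your local CC-3M copy (GCC_train_000000000.jpg style).
def to_local_name(image_name: str) -> str:
    return image_name  # placeholder: replace with your own mapping

with open("chat.json", "r", encoding="utf-8") as f:
    records = json.load(f)

for rec in records:
    if "image" in rec:
        rec["image"] = to_local_name(rec["image"])

with open("chat_local.json", "w", encoding="utf-8") as f:
    json.dump(records, f)
```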

| Data | Chat File | Meta Data | Size |
| --- | --- | --- | --- |
| CC-3M Concept-balanced 595K | chat.json | metadata.json | 211 MB |
| LAION/CC/SBU BLIP-Caption Concept-balanced 558K | blip_laion_cc_sbu_558k.json | metadata.json | 181 MB |

Important notice: At the request of the community, since ~15% of the images in the original CC-3M dataset are no longer accessible, we upload images.zip to better support reproduction of our work in the research community. It must not be used for any other purpose. The use of these images must comply with the CC-3M license. This may be taken down at any time when requested by the original CC-3M dataset owner or the owners of the referenced images.

1.2 GPT-4 Prompts

We provide our prompts and few-shot samples for GPT-4 queries, to better facilitate research in this domain. Please check out the prompts folder for three kinds of questions: conversation, detail description, and complex reasoning.

They are organized as system_message.txt for the system message, plus pairs of abc_caps.txt (few-shot sample user input) and abc_conv.txt (few-shot sample reference output).

Note that you may find them in different formats. For example, conversation is in jsonl, and detail description is answer-only. The selected format in our preliminary experiments works slightly better than a limited set of alternatives that we tried: jsonl, a more natural format, and answer-only. If interested, you may try other variants or conduct a more careful study of this. Contributions are welcome!
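
As a rough illustration of how these files could be stitched into a chat-style GPT-4 query, here is a minimal sketch. The folder path and the final user message are placeholders, and the *_caps.txt / *_conv.txt pairing simply follows the layout described above.

```python
from pathlib import Path

prompt_dir = Path("prompts/conversation")  # assumed local checkout path

# System message first.
messages = [{"role": "system",
             "content": (prompt_dir / "system_message.txt").read_text()}]

# Each few-shot example contributes a user turn (captions) and an
# assistant turn (reference output).
for caps_file in sorted(prompt_dir.glob("*_caps.txt")):
    conv_file = caps_file.with_name(caps_file.name.replace("_caps", "_conv"))
    messages.append({"role": "user", "content": caps_file.read_text()})
    messages.append({"role": "assistant", "content": conv_file.read_text()})

# Finally, append the captions of the new image as the actual query.
messages.append({"role": "user", "content": "<captions of the new image>"})
```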

2. Visual Instruction Tuning

2.1 Instruction Tuning Data

LLaVA-Instruct-150K

Official: llava_v1_5_mix665k.json

2.2 Images

COCO

Official: train2017

GQA

Official: images

OCR-VQA

Official: download script
Multi-threaded download (faster): GitHub solution and CSDN solution
Pre-processed dataset download (quick and convenient): Hugging Face

TextVQA

Official: train_val_images

Visual Genome

Official: part1, part2

playground
├── data
│   ├── coco
│   │   └── train2017
│   ├── gqa
│   │   └── images
│   ├── ocr_vqa
│   │   └── images
│   ├── textvqa
│   │   └── train_images
│   └── vg
│       ├── VG_100K
│       └── VG_100K_2
└── ...
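
A minimal sketch for unpacking the downloads into this layout is shown below. The archive file names are assumptions about what you saved locally, so adjust them to match your files; OCR-VQA is fetched by its download script rather than a zip, so only its target directory is created here.

```python
from pathlib import Path
import zipfile

# Assumed local archive names -> target directories under playground/data.
root = Path("playground/data")
archives = {
    "train2017.zip": root / "coco",            # COCO train2017
    "images.zip": root / "gqa",                # GQA images
    "train_val_images.zip": root / "textvqa",  # TextVQA train_images
    "images_part1.zip": root / "vg",           # Visual Genome part 1
    "images_part2.zip": root / "vg",           # Visual Genome part 2
}

for archive, target in archives.items():
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)

# OCR-VQA images are produced by its download script; just prepare the folder.
(root / "ocr_vqa" / "images").mkdir(parents=True, exist_ok=True)
```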

3. Pretrained Models

3.1 Large Language Models

vicuna-13b-v1.5
vicuna-7b-v1.5

3.2 Vision Model

clip-vit-large-patch14-336

3.3 LLaVA-1.5 Pretrained Models

LLaVA-1.5-13b
LLaVA-1.5-7b

3.4 LLaVA LoRA Fine-tuned Models

LLaVA-1.5-13b-lora
LLaVA-1.5-7b-lora
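
For a quick smoke test of the released weights, a minimal sketch using Hugging Face transformers follows. It assumes the community-converted llava-hf checkpoint; the original checkpoints listed above are instead loaded through the LLaVA codebase itself. The image path is a placeholder.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

# Assumed community-converted checkpoint on the Hugging Face Hub.
model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("example.jpg")  # placeholder image path
prompt = "USER: <image>\nDescribe this image. ASSISTANT:"
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```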
