Multimodal-CoT Project Tutorial

段琳惟

Article link: https://blog.csdn.net/gitblog_00206/article/details/141154015

mm-cot: Official implementation for "Multimodal Chain-of-Thought Reasoning in Language Models" (stay tuned and more will be updated). Project URL: https://gitcode.com/gh_mirrors/mm/mm-cot

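Project Introduction

Multimodal-CoT is a framework developed by the Amazon Science team for complex reasoning tasks that combine vision and language. Its main innovation is to fine-tune small language models on fused vision and language features so that they can perform chain-of-thought (CoT) reasoning. Reasoning runs in two stages, rationale generation followed by answer inference, and grounding the rationale in visual features reduces the model's tendency to produce hallucinated reasoning patterns.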

Quick Start

Environment Setup

First, clone the project repository to your local machine:

git clone https://github.com/amazon-science/mm-cot.git
cd mm-cot

Install Dependencies

Install the required Python packages:

pip install -r requirements.txt
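
The install command above writes into whichever Python environment is currently active. As a minimal sketch, assuming `python3` with the built-in `venv` module is available (the original instructions do not mention either), the install can be wrapped in a hypothetical helper, `setup_env`, that first creates an isolated environment. The helper name and the `.venv` directory are illustrative choices, not part of the project.

```shell
# Hypothetical helper (not part of mm-cot): install the dependencies
# into an isolated virtual environment instead of the global one.
setup_env() {
  # Refuse to run outside the repository root, where requirements.txt lives.
  if [ ! -f requirements.txt ]; then
    echo "requirements.txt not found; run this from the mm-cot directory" >&2
    return 1
  fi
  # Assumption: python3 provides the standard venv module.
  python3 -m venv .venv || return 1
  # Activate the environment for the current shell session.
  . .venv/bin/activate
  # Install the project's dependencies into the activated environment.
  python -m pip install -r requirements.txt
}
```

Calling `setup_env` from inside the cloned `mm-cot` directory would then replace the plain `pip install` step. Outside that directory it stops without touching anything.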

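Run the Examples

The following commands run the two stages of a basic inference task, rationale generation followed by answer inference: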

# Rationale generation
CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py \
  --data_root data/ScienceQA/data \
  --caption_file data/instruct_captions.json \
  --model declare-lab/flan-alpaca-large \
  --user_msg rationale --img_type vit \
  --bs 2 --eval_bs 4 --epoch 50 --lr 5e-5 --output_len 512 \
  --use_caption --use_generate --prompt_format QCM-E \
  --output_dir experiments --evaluate_dir models/mm-cot-large-rationale

# Answer inference
CUDA_VISIBLE_DEVICES=0,1,2,3 python main_central.py \
  --data_root data/ScienceQA/data \
  --caption_file data/instruct_captions.json \
  --model declare-lab/flan-alpaca-large \
  --user_msg answer --img_type vit \
  --bs 4 --eval_bs 8 --epoch 50 --lr 5e-5 --output_len 64
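
Both commands assume that the dataset directory, the caption file, and the checkpoint directory they reference already exist locally. A small pre-flight check can catch a missing path before a long run starts. The sketch below uses a hypothetical helper, `missing_paths`, which is not part of the project. It prints each path that does not exist.

```shell
# Hypothetical helper (not part of mm-cot): print every given path that
# does not exist, and return non-zero if at least one is missing.
missing_paths() {
  rc=0
  for p in "$@"; do
    if [ ! -e "$p" ]; then
      printf 'missing: %s\n' "$p"
      rc=1
    fi
  done
  return "$rc"
}
```

For the commands above, one possible call is `missing_paths data/ScienceQA/data data/instruct_captions.json models/mm-cot-large-rationale`, run from the repository root. An empty output and a zero exit status mean that all three paths are present.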