VisRAG Tutorial - CSDN Blog

Article link: https://blog.csdn.net/gitblog_00035/article/details/147089489


VisRAG: Parsing-free RAG supported by VLMs. Project address (GitCode mirror): https://gitcode.com/gh_mirrors/vis/VisRAG

1. Project Introduction

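VisRAG (Vision-based Retrieval-augmented Generation) is a retrieval-augmented generation (RAG) pipeline built on vision-language models (VLMs). Instead of parsing documents into text, it uses a VLM to embed each document directly as an image, retrieves the relevant documents through these embeddings, and feeds them to a VLM to enhance its generation. Traditional text-based RAG must first parse documents into text. VisRAG skips that step, so it retains and uses as much of the original documents' information as possible and eliminates the information loss introduced by parsing.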
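The retrieval step described above can be illustrated with a toy sketch of generic dense retrieval. It scores each page by the cosine similarity between a query vector and the page's vector, then keeps the top k. This is not VisRAG's code. The function name `top_k_pages` and all vectors are hypothetical. In VisRAG the embeddings would come from a VLM encoding the query and each page image, and the repository's actual scoring may differ.

```bash
# Toy sketch (hypothetical, not from the VisRAG repo): rank pages by cosine
# similarity between a query embedding and page embeddings, keep the top k.
# The vectors are made up; in VisRAG they would be produced by a VLM.

# top_k_pages QUERY K
#   QUERY: space-separated floats, e.g. "1 0.1 0"
#   stdin: one page per line, "<page_id> <v1> <v2> ..."
top_k_pages() {
  local query="$1" k="$2"
  awk -v q="$query" '
    BEGIN {
      # Parse the query vector and compute its L2 norm once.
      n = split(q, qv, " ")
      qn = 0
      for (i = 1; i <= n; i++) qn += qv[i] * qv[i]
      qn = sqrt(qn)
    }
    {
      # Dot product and squared norm for this page (fields 2..n+1).
      dot = 0; pn = 0
      for (i = 1; i <= n; i++) {
        dot += qv[i] * $(i + 1)
        pn += $(i + 1) * $(i + 1)
      }
      # Guard against zero-length vectors.
      score = (qn > 0 && pn > 0) ? dot / (qn * sqrt(pn)) : 0
      printf "%s %.4f\n", $1, score
    }' | sort -k2,2nr | head -n "$k"
}

# Usage: three pages with 3-dimensional toy embeddings.
printf '%s\n' "page1 1 0 0" "page2 0.8 0.6 0" "page3 0 0 1" | top_k_pages "1 0.1 0" 2
# page1 0.9950
# page2 0.8557
```

The retrieved page IDs would then point to the page images passed to the VLM for generation. That step is outside this sketch.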

2. Quick Start

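Before you begin, make sure git and conda are installed (the steps below use conda to create a Python 3.10.8 environment). The quick-start steps for VisRAG are as follows: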

```bash
# Clone the project
git clone https://github.com/OpenBMB/VisRAG.git

# Create and activate a virtual environment
conda create --name VisRAG python==3.10.8
conda activate VisRAG

# Install dependencies (CUDA 11.8 toolkit, requirements, then VisRAG and timm_modified in editable mode)
conda install nvidia/label/cuda-11.8.0::cuda-toolkit
cd VisRAG
pip install -r requirements.txt
pip install -e .
cd timm_modified
pip install -e .
cd ..

# Train the retrieval model
bash scripts/train_retriever/train.sh 2048 16 8 0.02 1 true false config/deepspeed.json 1e-5 false wmean causal 1 true 2 false <model_dir> <repo_name_or_path>

# Evaluate the retrieval model
bash scripts/eval_retriever/eval.sh 512 2048 16 8 wmean causal ArxivQA,ChartQA,MP-DocVQA,InfoVQA,PlotQA,SlideVQA <ckpt_path>
```

In the commands above, replace <model_dir>, <repo_name_or_path>, and <ckpt_path> with your actual paths or repository names.
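To avoid launching a run with an unreplaced placeholder, one option is to keep the values in shell variables and check them before calling the scripts. This is a minimal sketch, not part of the VisRAG repository. The variable names and the helper `is_filled` are hypothetical. The two `bash scripts/...` commands are the ones given above, and they are assumed to run from the repository root.

```bash
# Hypothetical helper: succeed only if the value is non-empty and is not
# still a literal "<...>" placeholder.
is_filled() {
  case "$1" in
    "" | "<"*">") return 1 ;;
    *) return 0 ;;
  esac
}

# Replace these placeholder values with your own before running.
MODEL_DIR="<model_dir>"
REPO_NAME_OR_PATH="<repo_name_or_path>"
CKPT_PATH="<ckpt_path>"

# Training needs MODEL_DIR and REPO_NAME_OR_PATH.
if is_filled "$MODEL_DIR" && is_filled "$REPO_NAME_OR_PATH"; then
  bash scripts/train_retriever/train.sh 2048 16 8 0.02 1 true false config/deepspeed.json 1e-5 false wmean causal 1 true 2 false "$MODEL_DIR" "$REPO_NAME_OR_PATH"
else
  echo "skip training: set MODEL_DIR and REPO_NAME_OR_PATH first" >&2
fi

# Evaluation needs CKPT_PATH.
if is_filled "$CKPT_PATH"; then
  bash scripts/eval_retriever/eval.sh 512 2048 16 8 wmean causal ArxivQA,ChartQA,MP-DocVQA,InfoVQA,PlotQA,SlideVQA "$CKPT_PATH"
else
  echo "skip evaluation: set CKPT_PATH first" >&2
fi
```

Quoting the variables also keeps paths that contain spaces intact when they are passed to the scripts.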