LLaVA Installation Notes

Model Structure

LlavaLlamaForCausalLM(
  (model): LlavaLlamaModel(
    (embed_tokens): Embedding(32000, 5120, padding_idx=0)
    (layers): ModuleList(
      (0-39): 40 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear(in_features=5120, out_features=5120, bias=False)
          (k_proj): Linear(in_features=5120, out_features=5120, bias=False)
          (v_proj): Linear(in_features=5120, out_features=5120, bias=False)
          (o_proj): Linear(in_features=5120, out_features=5120, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=5120, out_features=13824, bias=False)
          (up_proj): Linear(in_features=5120, out_features=13824, bias=False)
          (down_proj): Linear(in_features=13824, out_features=5120, bias=False)
          (act_fn): SiLUActivation()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
    (vision_tower): CLIPVisionTower()
    (mm_projector): Linear(in_features=1024, out_features=5120, bias=True)
  )
  (lm_head): Linear(in_features=5120, out_features=32000, bias=False)
)
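
This dump can be reproduced by loading the checkpoint and printing the module tree. A minimal sketch, assuming the LLaVA repo has already been installed with pip install -e . as described below; the checkpoint path is a placeholder:

# Minimal sketch: load the checkpoint and print the module tree.
# Assumes the LLaVA repo is installed (pip install -e .) and the
# checkpoint has been downloaded; the path below is a placeholder.
import torch
from llava.model import LlavaLlamaForCausalLM

model = LlavaLlamaForCausalLM.from_pretrained(
    "/path/to/llava-llama-2-13b-chat-lightning-preview",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)
print(model)  # prints the structure shown above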

Environment Setup

Install conda

wget https://repo.anaconda.com/archive/Anaconda3-5.3.0-Linux-x86_64.sh
chmod +x Anaconda3-5.3.0-Linux-x86_64.sh
./Anaconda3-5.3.0-Linux-x86_64.sh
export PATH=~/anaconda3/bin:$PATH  # or persist this in your shell's environment variables
# If you're not sure how, see https://blog.csdn.net/wyf2017/article/details/118676765
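
To make the PATH change persist across shells, append it to your shell profile (this assumes bash and the default ~/anaconda3 install location):

echo 'export PATH=~/anaconda3/bin:$PATH' >> ~/.bashrc
source ~/.bashrc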
  1. Clone this repository and navigate to LLaVA folder
git clone https://github.com/haotian-liu/LLaVA.git
cd LLaVA
  2. Install Package
conda create -n llava python=3.10 -y
conda activate llava
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
  3. Install additional packages for training cases
pip install ninja
pip install flash-attn==1.0.8 --no-build-isolation
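
A quick sanity check that the CUDA build of PyTorch and flash-attn are importable (the Python module is named flash_attn; this assumes the installs above succeeded):

python -c "import torch, flash_attn; print(torch.__version__, torch.cuda.is_available())"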

Set up the tool for pulling models

curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs
git lfs install
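
To confirm the installation took effect:

git lfs version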

Download the LLaVA model weights

Model options:

| Base LLM | Vision Encoder | Pretrain Data | Pretraining schedule | Finetuning Data | Finetuning schedule | LLaVA-Bench-Conv | LLaVA-Bench-Detail | LLaVA-Bench-Complex | LLaVA-Bench-Overall | Download |
|---|---|---|---|---|---|---|---|---|---|---|
| Vicuna-13B-v1.3 | CLIP-L-336px | LCS-558K | 1e | LLaVA-Instruct-80K | proj-1e, lora-1e | 64.3 | 55.9 | 81.7 | 70.1 | LoRA / LoRA-Merged |
| LLaMA-2-13B-Chat | CLIP-L | LCS-558K | 1e | LLaVA-Instruct-80K | full_ft-1e | 56.7 | 58.6 | 80.0 | 67.9 | ckpt |
| LLaMA-2-7B-Chat | CLIP-L | LCS-558K | 1e | LLaVA-Instruct-80K | lora-1e | 51.2 | 58.9 | 71.6 | 62.8 | LoRA |

Example: downloading LLaMA-2-13B-Chat:

git lfs clone https://huggingface.co/liuhaotian/llava-llama-2-13b-chat-lightning-preview

Download the projector weights

When using these projector weights to instruction-tune your LMM, make sure these options are correctly set as follows:

--mm_use_im_start_end False
--mm_use_im_patch_token False
| Base LLM | Vision Encoder | Pretrain Data | Pretraining schedule | Download |
|---|---|---|---|---|
| LLaMA-2-13B-Chat | CLIP-L-336px | LCS-558K | 1e | projector |
| LLaMA-2-7B-Chat | CLIP-L-336px | LCS-558K | 1e | projector |
| LLaMA-2-13B-Chat | CLIP-L | LCS-558K | 1e | projector |
| LLaMA-2-7B-Chat | CLIP-L | LCS-558K | 1e | projector |
| Vicuna-13B-v1.3 | CLIP-L-336px | LCS-558K | 1e | projector |
| Vicuna-7B-v1.3 | CLIP-L-336px | LCS-558K | 1e | projector |
| Vicuna-13B-v1.3 | CLIP-L | LCS-558K | 1e | projector |
| Vicuna-7B-v1.3 | CLIP-L | LCS-558K | 1e | projector |

Example: downloading the LLaMA-2-13B-Chat projector weights:

git lfs clone https://huggingface.co/liuhaotian/llava-pretrain-llama-2-13b-chat

Directory structure after downloading

./llava-llama-2-13b-chat-lightning-preview
	├── config.json
	├── generation_config.json
	├── LICENSE
	├── mm_projector.bin
	├── pytorch_model-00001-of-00003.bin
	├── pytorch_model-00002-of-00003.bin
	├── pytorch_model-00003-of-00003.bin
	├── pytorch_model.bin.index.json
	├── README.md
	├── special_tokens_map.json
	├── tokenizer_config.json
	└── tokenizer.model
./llava-pretrain-llama-2-13b-chat
	├── config.json
	├── mm_projector.bin
	└── README.md

Testing the demo

# Terminal 1: launch the controller
python3 -m llava.serve.controller --host 0.0.0.0 --port 10000
# Terminal 2: launch the Gradio web server
python3 -m llava.serve.gradio_web_server --controller http://localhost:10000 --model-list-mode reload --share
# Terminal 3: launch the model worker
python3 -m llava.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path /path/to/llava-llama-2-13b-chat-lightning-preview
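
For a quick smoke test without the web UI, the repo also ships a command-line chat entry point. A sketch below; the exact flags can vary between repo versions, and both paths are placeholders:

python3 -m llava.serve.cli \
    --model-path /path/to/llava-llama-2-13b-chat-lightning-preview \
    --image-file /path/to/test.jpg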

Test results

Select the model you just loaded, then ask it anything.


Training

We'll skip pretraining and finetune directly from the released pretrained weights, so let's run the finetune step.
Download the annotation file:

git lfs clone https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K

Download the training images:

wget http://images.cocodataset.org/zips/train2017.zip
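
Unzip the images, then launch finetuning. The command below is a rough sketch modeled on the finetune scripts under scripts/ in the repo, not a verbatim copy; check the flag names and hyperparameters there before running. It sets the two projector options from the section above, and --pretrain_mm_mlp_adapter points at the mm_projector.bin downloaded earlier; all paths are placeholders:

unzip train2017.zip

torchrun --nproc_per_node=8 llava/train/train_mem.py \
    --model_name_or_path /path/to/llava-llama-2-13b-chat-lightning-preview \
    --data_path ./LLaVA-Instruct-150K/llava_instruct_150k.json \
    --image_folder ./train2017 \
    --vision_tower openai/clip-vit-large-patch14 \
    --pretrain_mm_mlp_adapter ./llava-pretrain-llama-2-13b-chat/mm_projector.bin \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --bf16 True \
    --output_dir ./checkpoints/llava-finetune \
    --num_train_epochs 1 \
    --per_device_train_batch_size 16 \
    --learning_rate 2e-5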

Training data format:
The outer level is a list; a single entry looks like this:

  {
    "id": "000000379143",
    "image": "000000379143.jpg",
    "conversations": [
      {
        "from": "human",
        "value": "What is the man doing in the back of the pickup truck?\n<image>"
      },
      {
        "from": "gpt",
        "value": "The man is sitting on top of piled objects or belongings loaded into the back of a pickup truck."
      },
      {
        "from": "human",
        "value": "Is the man holding anything in his hands?"
      },
      {
        "from": "gpt",
        "value": "Yes, the man is holding a beer in his hand while sitting on top of the objects in the back of the pickup truck."
      },
      {
        "from": "human",
        "value": "What color is the pickup truck?"
      },
      {
        "from": "gpt",
        "value": "The pickup truck is white."
      },
      {
        "from": "human",
        "value": "Is the man sitting or standing?"
      },
      {
        "from": "gpt",
        "value": "The man is sitting on top of the piled objects in the back of the pickup truck."
      },
      {
        "from": "human",
        "value": "What could be the possible reasons for the man sitting on top of the possessions in the back of the pickup truck?"
      },
      {
        "from": "gpt",
        "value": "There could be several reasons for the man sitting on top of his possessions in the back of the pickup truck:\n\n1. Moving: The man might be moving to a new location and needed to transport his items in a pickup truck, utilizing available space efficiently. By sitting on top of the belongings, he could be helping to stabilize and secure the items during the move, preventing them from falling or shifting during transportation.\n\n2. Lack of seating: If the cab of the pickup truck is already at full capacity or there isn't enough space for him to sit inside, the man may have chosen to sit on his possessions as an alternative seating arrangement.\n\n3. Road trip or outing: The man might be on a road trip or a casual outing with friends or family, where he is using the back of the pickup truck as an open-air seating area. By sitting on top of the loaded items, he may be enjoying the journey while savoring his beer.\n\n4. Keeping an eye on belongings: The man could be safeguarding his possessions by staying close to them, ensuring that no items are lost, stolen or damaged during the journey.\n\nRegardless of the specific reason, the image shows a person making the most of their situation, adding a touch of lightheartedness or adventure to an otherwise mundane scene."
      }
    ]
  },
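
Before training, it is worth verifying that every annotation entry can find its image on disk. A minimal sketch, with both paths as placeholders matching the downloads above:

import json
import os

# Placeholder paths matching the downloads above.
ANNOTATIONS = "./LLaVA-Instruct-150K/llava_instruct_150k.json"
IMAGE_FOLDER = "./train2017"

with open(ANNOTATIONS) as f:
    data = json.load(f)  # the outer level is a list of entries

# Collect entries whose referenced image is missing from the image folder.
missing = [entry["image"] for entry in data
           if not os.path.exists(os.path.join(IMAGE_FOLDER, entry["image"]))]
print(f"{len(data)} entries, {len(missing)} missing images")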