How to Deploy Llama 2 (Linux)

快乐啊啊啊啊啊

Last modified: 2023-08-10 16:56:10

First published: 2023-08-10 16:34:42

Article link: https://blog.csdn.net/weixin_50321412/article/details/132208861

Llama 2 is an open-source large language model. Its GitHub repository is:

https://github.com/facebookresearch/llama

The Chinese community repository is:

https://github.com/FlagAlpha/Llama2-Chinese

The rest of this post shares how to deploy the model on Linux. I first tried Windows, but I could not get Llama 2 to run on the GPU there. If you want GPU inference on Windows, consider MLC-LLM, another open-source project that can run Llama 2:

https://mlc.ai/mlc-llm/docs/get_started/try_out.html
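Since the GPU is the deciding factor here, it can help to confirm that PyTorch actually sees one before loading any model. The sketch below is only a quick check; the helper name `cuda_available` is made up for this post, and it simply reports False when PyTorch is not installed.

```python
def cuda_available() -> bool:
    """Return True if PyTorch is installed and reports a usable CUDA GPU."""
    try:
        import torch  # imported lazily so the check also works without PyTorch
    except ImportError:
        return False
    return torch.cuda.is_available()


if __name__ == "__main__":
    if cuda_available():
        print("PyTorch can see a CUDA GPU.")
    else:
        print("No CUDA GPU visible to PyTorch (or PyTorch is not installed).")
```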

Deployment on Linux

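I deployed on the AutoDL compute platform. Before deploying I checked the documentation and found that the 13B and 70B models demand too much compute, so I chose the 7B model for this attempt.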
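To get a feel for why the larger models are out of reach on a single modest GPU, here is a back-of-the-envelope estimate of the memory needed just to hold the weights. It is a rough sketch under a simplifying assumption (memory is roughly parameter count times bytes per parameter); activations and the KV cache are not counted, so real usage is higher. The function name `weight_memory_gib` is made up for this post.

```python
# Rough estimate of the GPU memory needed just to hold the model weights.
# Assumption: memory ~ parameter count x bytes per parameter.
# Activations and the KV cache are NOT included, so real usage is higher.

GIB = 2 ** 30


def weight_memory_gib(params_billion: float, bytes_per_param: float) -> float:
    """Return the approximate weight footprint in GiB."""
    return params_billion * 1e9 * bytes_per_param / GIB


if __name__ == "__main__":
    for size in (7, 13, 70):
        fp16 = weight_memory_gib(size, 2)  # float16: 2 bytes per parameter
        int8 = weight_memory_gib(size, 1)  # 8-bit: 1 byte per parameter
        print(f"{size}B: ~{fp16:.1f} GiB in float16, ~{int8:.1f} GiB in 8-bit")
```

By this estimate the 7B weights in float16 already take about 13 GiB, which is why the demo further down loads them in 8-bit.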

1. Clone the GitHub repository

git clone https://github.com/facebookresearch/llama.git

2. Enter the llama folder

cd llama

3. Install the dependencies

pip install -e .
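If you rebuild cloud instances often, steps 1 to 3 can be wrapped in a small script. This is just a sketch: the helper names `setup_commands` and `run_setup` are made up for this post, and it assumes `git` and `pip` are on the PATH. It defaults to a dry run that only prints the commands.

```python
import subprocess

REPO_URL = "https://github.com/facebookresearch/llama.git"


def setup_commands(repo_url: str = REPO_URL, folder: str = "llama"):
    """Return (command, working_directory) pairs for steps 1 to 3."""
    return [
        # Step 1: clone the repository into `folder`
        (["git", "clone", repo_url, folder], None),
        # Steps 2 and 3: run pip inside the cloned folder
        (["pip", "install", "-e", "."], folder),
    ]


def run_setup(dry_run: bool = True) -> None:
    """Print each command, and execute it unless dry_run is True."""
    for command, cwd in setup_commands():
        print("$", " ".join(command), f"(in {cwd})" if cwd else "")
        if not dry_run:
            # check=True stops the script if a command fails
            subprocess.run(command, cwd=cwd, check=True)


if __name__ == "__main__":
    run_setup(dry_run=True)
```

Call `run_setup(dry_run=False)` to actually execute the commands.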

4. Demo code

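However, I found that running the official demo as-is raises an HTTPError. The message roughly says that you do not have permission to access meta-llama/Llama-2-7b-chat-hf, because the model is downloaded on the fly from Hugging Face (a hosting site for open-source models, https://huggingface.co/). So the code needs a small change. (The official demo is shown first.)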

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(  # downloads the weights from Hugging Face
    'meta-llama/Llama-2-7b-chat-hf',
    device_map='auto',
    torch_dtype=torch.float16,
    load_in_8bit=True,
)
model = model.eval()
tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-chat-hf', use_fast=False)
# Reuse the end-of-sequence token for padding
tokenizer.pad_token = tokenizer.eos_token
# The prompt asks the model to "introduce China"
input_ids = tokenizer(
    ['<s>Human: 介绍一下中国\n</s><s>Assistant: '],
    return_tensors="pt",
    add_special_tokens=False,
).input_ids.to('cuda')
# Sampling settings passed to model.generate()
generate_input = {
    "input_ids": input_ids,
    "max_new_tokens": 512,
    "do_sample": True,
    "top_k": 50,
    "top_p": 0.95,
    "temperature": 0.3,
    "repetition_penalty": 1.3,
    "eos_token_id": tokenizer.eos_token_id,
    "bos_token_id": tokenizer.bos_token_id,
    "pad_token_id": tokenizer.pad_token_id,
}
generate_ids = model.generate(**generate_input)
# Decode the first (and only) generated sequence back to text
text = tokenizer.decode(generate_ids[0])
print(text)

The change is to the two from_pretrained calls, which each get a use_auth_token argument:

model = AutoModelForCausalLM.from_pretrained('meta-llama/Llama-2-7b-chat-hf', device_map='auto', torch_dtype=torch.float16, load_in_8bit=True, use_auth_token="YOUR_TOKEN")
tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-chat-hf', use_fast=False, use_auth_token="YOUR_TOKEN")

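We need to supply a token. To get one, open the page below, register and log in, then generate a token. It seems that only a token with write permission takes effect. Note that the meta-llama repositories on Hugging Face are gated, so the account also has to be granted access on the model page.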

https://huggingface.co/settings/tokens
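Pasting the token straight into the script makes it easy to leak by accident. One alternative is to read it from an environment variable, as sketched below. The helper name `auth_kwargs` and the variable name `HF_TOKEN` are choices made for this post, not something the demo requires. The returned dictionary uses the same `use_auth_token` argument as the modified calls above; more recent transformers releases accept `token=` instead, so check the version you have installed.

```python
import os


def auth_kwargs(env_var: str = "HF_TOKEN") -> dict:
    """Build the extra keyword arguments for from_pretrained().

    Reads the Hugging Face token from an environment variable so that it
    does not have to be hard-coded in the script.
    """
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Hugging Face token"
        )
    return {"use_auth_token": token}
```

Usage would then look like `AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-chat-hf', use_fast=False, **auth_kwargs())` after running `export HF_TOKEN=...` in the shell.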

If an error says that the transformers or accelerate package is missing, just install it with pip:

pip install transformers
pip install accelerate
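Rather than waiting for the import error, you can check up front which packages are missing. The sketch below uses only the standard library. The helper name `missing_packages` is made up for this post, and the default list is an assumption: torch, transformers and accelerate come from the demo, while bitsandbytes is added because, as far as I know, `load_in_8bit=True` depends on it.

```python
import importlib.util

# Packages the demo appears to need. bitsandbytes is an assumption here:
# it is listed because load_in_8bit=True relies on it.
REQUIRED = ("torch", "transformers", "accelerate", "bitsandbytes")


def missing_packages(names=REQUIRED):
    """Return the names from `names` that cannot be found by the import system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages, install with: pip install " + " ".join(missing))
    else:
        print("All required packages are installed.")
```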

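The model then downloads (this takes a while). P.S. Llama-2-7b-chat-hf is the variant mainly intended for chat. There are also Llama-2-7b-hf, Llama-2-13b-chat-hf, Llama-2-13b-hf and other versions, so you can download whichever fits your needs.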
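The variant names follow a regular pattern, so the repository ID passed to from_pretrained can be built from the size and whether you want the chat version. A minimal sketch, where the helper name `repo_id` is made up for this post and the 70b entry is assumed to follow the same naming as the 7b and 13b ones listed above:

```python
SIZES = ("7b", "13b", "70b")


def repo_id(size: str = "7b", chat: bool = True) -> str:
    """Build a Hugging Face repository ID such as meta-llama/Llama-2-7b-chat-hf."""
    if size not in SIZES:
        raise ValueError(f"size must be one of {SIZES}, got {size!r}")
    suffix = "chat-hf" if chat else "hf"
    return f"meta-llama/Llama-2-{size}-{suffix}"


if __name__ == "__main__":
    print(repo_id("7b", chat=True))
    print(repo_id("13b", chat=False))
```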

Then run the script and it will print the result.
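To ask something other than the demo question, only the text between the Human and Assistant markers needs to change. The helper below reproduces the template exactly as it appears in the demo. The name `build_prompt` is made up for this post, and whether this template is the best fit for a given checkpoint is not something I have tested.

```python
def build_prompt(question: str) -> str:
    """Wrap a question in the Human/Assistant template used by the demo."""
    return f"<s>Human: {question}\n</s><s>Assistant: "


if __name__ == "__main__":
    # repr() makes the newline inside the template visible
    print(repr(build_prompt("介绍一下中国")))
```

The result can be passed to the tokenizer in place of the hard-coded string, for example `tokenizer([build_prompt("your question")], return_tensors="pt", add_special_tokens=False)`.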