Qwen-72B模型的安装与使用教程

最新推荐文章于 2025-03-19 10:09:33 发布

宫韧季

最新推荐文章于 2025-03-19 10:09:33 发布

阅读量1.3k

点赞数 13

本文链接：https://blog.csdn.net/gitblog_02821/article/details/144421359

版权

Qwen-72B模型的安装与使用教程

Qwen-72B 项目地址: https://gitcode.com/hf_mirrors/ai-gitcode/Qwen-72B

介绍

通义千问-72B（Qwen-72B）是阿里云研发的通义千问大模型系列的720亿参数规模的模型。Qwen-72B是基于Transformer的大语言模型, 在超大规模的预训练数据上进行训练得到。预训练数据类型多样，覆盖广泛，包括大量网络文本、专业书籍、代码等。同时，在Qwen-72B的基础上，我们使用对齐机制打造了基于大语言模型的AI助手Qwen-72B-Chat。本仓库为Qwen-72B的仓库。

通义千问-72B（Qwen-72B）主要有以下特点：

大规模高质量训练语料：使用超过3万亿tokens的数据进行预训练，包含高质量中、英、多语言、代码、数学等数据，涵盖通用及专业领域的训练语料。通过大量对比实验对预训练语料分布进行了优化。
强大的性能：Qwen-72B在多个中英文下游评测任务上（涵盖常识推理、代码、数学、翻译等），效果显著超越现有的开源模型。具体评测结果请详见下文。
覆盖更全面的词表：相比目前以中英词表为主的开源模型，Qwen-72B使用了约15万大小的词表。该词表对多语言更加友好，方便用户在不扩展词表的情况下对部分语种进行能力增强和扩展。
较长的上下文支持：Qwen-72B支持32k的上下文长度。

如果您想了解更多关于通义千问72B开源模型的细节，我们建议您参阅GitHub代码库。

要求

python 3.8及以上版本
pytorch 1.12及以上版本，推荐2.0及以上版本
建议使用CUDA 11.4及以上（GPU用户、flash-attention用户等需考虑此选项）
运行BF16或FP16模型需要多卡至少144GB显存（例如2xA100-80G或5xV100-32G）；运行Int4模型至少需要48GB显存（例如1xA100-80G或2xV100-32G）。

依赖项

运行Qwen-72B，请确保满足上述要求，再执行以下pip命令安装依赖库

pip install transformers==4.32.0 accelerate tiktoken einops scipy transformers_stream_generator==0.0.4 peft deepspeed

另外，推荐安装flash-attention库（当前已支持flash attention 2），以实现更高的效率和更低的显存占用。

git clone https://github.com/Dao-AILab/flash-attention
cd flash-attention && pip install .
# 下方安装可选，安装可能比较缓慢。
# Below are optional. Installing them might be slow.
# pip install csrc/layer_norm
# 如果你的flash-attn版本高于2.1.1，下方不需要安装。
# If the version of flash-attn is higher than 2.1.1, the following is not needed.
# pip install csrc/rotary

快速使用

您可以通过以下代码轻松调用：

from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig

# Note: The default behavior now has injection attack prevention off.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-72B", trust_remote_code=True)

# use bf16
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-72B", device_map="auto", trust_remote_code=True, bf16=True).eval()
# use fp16
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-72B", device_map="auto", trust_remote_code=True, fp16=True).eval()
# use cpu only
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-72B", device_map="cpu", trust_remote_code=True).eval()
# use auto mode, automatically select precision based on the device.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-72B", device_map="auto", trust_remote_code=True).eval()

# Specify hyperparameters for generation. But if you use transformers>=4.32.0, there is no need to do this.
# model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-72B", trust_remote_code=True)

inputs = tokenizer('蒙古国的首都是乌兰巴托（Ulaanbaatar）\n冰岛的首都是雷克雅未克（Reykjavik）\n埃塞俄比亚的首都是', return_tensors='pt')
inputs = inputs.to(model.device)
pred = model.generate(**inputs)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
# 蒙古国的首都是乌兰巴托（Ulaanbaatar）\n冰岛的首都是雷克雅未克（Reykjavik）\n埃塞俄比亚的首都是亚的斯亚贝巴（Addis Ababa）...

关于更多的使用说明，请参考我们的GitHub repo获取更多信息。

Tokenizer

注：作为术语的“tokenization”在中文中尚无共识的概念对应，本文档采用英文表达以利说明。

基于tiktoken的分词器有别于其他分词器，比如sentencepiece分词器。尤其在微调阶段，需要特别注意特殊token的使用。关于tokenizer的更多信息，以及微调时涉及的相关使用，请参阅文档。

模型细节

Qwen-72B模型规模基本情况如下所示：

| Hyperparameter | Value | |:----------------|:-------| | n_layers | 80 | | n_heads | 64 | | d_model | 8192 | | vocab size | 151851 | | sequence length | 32768 |

在位置编码、FFN激活函数和normalization的实现方式上，我们也采用了目前最流行的做法，即RoPE相对位置

Qwen-72B 项目地址: https://gitcode.com/hf_mirrors/ai-gitcode/Qwen-72B

创作声明：本文部分内容由AI辅助生成（AIGC），仅供参考