The underlying architecture of Qwen1.5 is based primarily on the Transformer, a deep learning model proposed by Google in 2017 and used mainly for natural language processing tasks. The Transformer's core components are the self-attention mechanism and positional encoding; together they let the model process input sequences in parallel, resolving the efficiency problems that RNNs (recurrent neural networks) and similar models face on long sequences.
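For reference, the scaled dot-product attention at the core of this mechanism computes, for query, key, and value matrices $Q$, $K$, $V$ with key dimension $d_k$:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V$$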
Specifically, Qwen1.5's architecture likely includes the following components (a minimal sketch of one decoder block follows this list):
- Embedding layer: converts the input text into vector representations.
- Multi-head self-attention: the Transformer's main innovation; it lets the model relate words at different positions and capture contextual information.
- Feed-forward network: applies a further non-linear transformation to the vectors produced by self-attention.
- Residual connections and layer normalization: these mitigate the vanishing and exploding gradients that afflict deep network training.
- Positional encoding: because the Transformer has no inherent notion of sequence order the way an RNN does, positional encodings are needed to inject position information.
- Decoder-only structure: the original Transformer paired an encoder (to understand the input sequence) with a decoder (to generate the output) for tasks such as machine translation, but Qwen1.5, like most modern LLMs, keeps only the decoder stack.
- Optimization and loss: training typically uses the Adam optimizer with a cross-entropy loss.
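To make these components concrete, here is a minimal sketch of a single decoder block in PyTorch. It is illustrative only: the dimensions are arbitrary defaults, and Qwen1.5's actual blocks differ in details (rotary position embeddings, SwiGLU activations, RMSNorm, and so on).

import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """One pre-norm Transformer decoder block: self-attention and a
    feed-forward network, each wrapped in a residual connection with
    layer normalization."""
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # Causal mask: True entries are blocked, so each token can only
        # attend to itself and earlier positions.
        n = x.size(1)
        mask = torch.triu(torch.ones(n, n, dtype=torch.bool, device=x.device), 1)
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask)
        x = x + attn_out                    # residual connection
        x = x + self.ffn(self.norm2(x))     # residual connection
        return x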
🏆 LMSYS Chatbot Arena Leaderboard
https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard
GitHub: https://github.com/QwenLM/Qwen1.5 (Qwen1.5 is the improved version of Qwen, the large language model series developed by the Qwen team at Alibaba Cloud)
Hugging Face: https://huggingface.co/Qwen
Online demo: https://huggingface.co/spaces/Qwen/Qwen1.5-72B-Chat
Quickstart
🤗 Hugging Face Transformers
Here is a code snippet demonstrating how to use the chat model with transformers:
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-72B-Chat",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-72B-Chat")

prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
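Note that device_map="auto" relies on the accelerate package (pip install accelerate) and will shard the 72B checkpoint across all visible GPUs, so make sure enough GPU memory is available.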
For quantized models, we advise you to use the GPTQ and AWQ correspondents, namely Qwen1.5-7B-Chat-GPTQ-Int8 and Qwen1.5-7B-Chat-AWQ.
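Loading a quantized checkpoint uses the same transformers API as above; only the checkpoint name changes. A minimal sketch, assuming the auto-gptq (for GPTQ) or autoawq (for AWQ) package is installed alongside transformers:

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-7B-Chat-GPTQ-Int8",  # or "Qwen/Qwen1.5-7B-Chat-AWQ"
    torch_dtype="auto",
    device_map="auto"
)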
🤖 ModelScope
We strongly advise users, especially those in mainland China, to use ModelScope. snapshot_download can help you solve issues concerning downloading checkpoints.
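For example, a download via ModelScope might look like the following (a sketch assuming the modelscope package is installed; the exact import path has moved between versions):

from modelscope import snapshot_download

# Downloads the checkpoint into the local ModelScope cache and
# returns the directory it was saved to.
model_dir = snapshot_download("qwen/Qwen1.5-7B-Chat")
print(model_dir)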
💻 Run locally
llama.cpp
Download our provided GGUF files or create them yourself, and you can use them directly with the latest llama.cpp via a one-line command:
./main -m <path-to-file> -n 512 --color -i -cml -f prompts/chat-with-qwen.txt
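If you create the GGUF files yourself, recent llama.cpp checkouts ship a conversion script along these lines (the script name and flags have changed across versions, so check your checkout before running):

python convert-hf-to-gguf.py /path/to/Qwen1.5-7B-Chat --outfile qwen1_5-7b-chat.gguf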
Ollama
We are now on Ollama, and you can use pull and run to make things work.
ollama run qwen
You can also add a tag such as :14b to choose a different model size, as shown below. Visit ollama.ai for more information.
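For example, assuming the 14B tag is published on the Ollama registry:

ollama run qwen:14b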
LMStudio
Qwen1.5 is already supported by lmstudio.ai. You can use LMStudio directly with our GGUF files.
Web UI
Text generation web UI
You can directly use text-generation-webui to create a web UI demo. If you use GGUF, remember to install the latest wheel of llama.cpp with support for Qwen1.5.
llamafile
Clone llamafile, run the source install, and then create your own llamafile with the GGUF file following the guide here. You can then run a single command, say ./qwen.llamafile, to serve a demo.
Deployment
Now, Qwen1.5 is supported by multiple inference frameworks. Here we demonstrate the usage of vLLM and SGLang.
Note
Neither vLLM nor SGLang currently offer built-in support for function calling. If you require tool use capabilities, please refer to Qwen-Agent, which provides a wrapper around these APIs to support function calling.
vLLM
We advise you to use vLLM>=0.3.0 to build an OpenAI-compatible API service. Start the server with a chat model, e.g. Qwen1.5-7B-Chat:
python -m vllm.entrypoints.openai.api_server --model Qwen/Qwen1.5-7B-Chat
Then use the chat API as demonstrated below:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen1.5-7B-Chat",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Tell me something about large language models."}
    ]
  }'
Or use the openai Python client:

from openai import OpenAI

# Set OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

chat_response = client.chat.completions.create(
    model="Qwen/Qwen1.5-7B-Chat",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me something about large language models."},
    ],
)
print(chat_response.choices[0].message.content)