A Detailed Introduction to Qwen1.5, the Chinese Open-Source Large Model Closing In on GPT-4

Table of Contents

🏆 LMSYS Chatbot Arena Leaderboard

Quickstart

🤗 Hugging Face Transformers

🤖 ModelScope

💻 Run locally

llama.cpp

Ollama

LMStudio

Web UI

Text generation web UI

llamafile

Deployment

vLLM

SGLang

Finetuning

API

License Agreement

Qwen1.5 Introduction | Qwen

Introduction

Model Performance

Basic Capabilities

Alignment with Human Preferences

Multilingual Capabilities

Long Context

Connecting to External Systems

Developing with Qwen1.5

Summary

Authors


As an AI model, Qwen1.5 is built on the Transformer architecture, a deep learning model proposed by Google in 2017 primarily for natural language processing tasks. The Transformer's core components are self-attention and positional encoding, which together let the model process an input sequence in parallel and avoid the efficiency problems that RNNs (recurrent neural networks) face on long sequences.
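
At the heart of self-attention is the scaled dot-product computation:

Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V

where Q, K, and V are the query, key, and value projections of the input tokens and d_k is the key dimension; the softmax produces the weights with which each token attends to every other token.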

Specifically, Qwen1.5's architecture likely includes the following components (a code sketch follows the list):

  1. Embedding layer: converts the input text into vector representations.

  2. Multi-head self-attention: the Transformer's key innovation, letting the model relate words at different positions and capture contextual information.

  3. Feed-forward network: applies a further non-linear transformation to the vectors produced by self-attention.

  4. Residual connections and layer normalization: together these mitigate vanishing and exploding gradients when training deep networks.

  5. Positional encoding: because the Transformer, unlike an RNN, has no inherent notion of sequence order, positional information must be injected explicitly.

  6. Decoder-only structure: the original Transformer paired an encoder (to understand the input) with a decoder (to generate the output) for tasks such as machine translation; like most GPT-style LLMs, Qwen1.5 keeps only the decoder stack and generates text autoregressively.

  7. Optimization and loss function: training typically uses the Adam optimizer with a cross-entropy loss.
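
To make the list above concrete, here is a minimal, self-contained PyTorch sketch of one pre-norm decoder block. It is illustrative only, not Qwen1.5's actual implementation (which differs in details such as rotary positional embeddings and its feed-forward design):

import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """One decoder-only Transformer block: self-attention plus FFN, with
    residual connections and layer normalization (pre-norm style)."""
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(  # position-wise feed-forward network
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x):
        # Causal mask: a token may attend only to itself and earlier tokens.
        n = x.size(1)
        mask = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask)
        x = x + attn_out               # residual connection
        x = x + self.ffn(self.ln2(x))  # residual connection
        return x

x = torch.randn(1, 16, 512)        # (batch, sequence length, d_model)
print(DecoderBlock()(x).shape)     # torch.Size([1, 16, 512])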

 

🏆 LMSYS Chatbot Arena Leaderboard

https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard

GitHub: https://github.com/QwenLM/Qwen1.5 (Qwen1.5 is the improved version of Qwen, the large language model series developed by the Qwen team at Alibaba Cloud)

Hugging Face: https://huggingface.co/Qwen

Online demo: https://huggingface.co/spaces/Qwen/Qwen1.5-72B-Chat

 

Quickstart

🤗 Hugging Face Transformers

Here is a code snippet showing how to use the chat model with transformers:

from transformers import AutoModelForCausalLM, AutoTokenizer
device = "cuda" # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-72B-Chat",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-72B-Chat")

prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
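# Render the conversation with the model's chat template and append
# the assistant prompt so the model knows to generate a reply.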
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512
)
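# Drop the prompt tokens from each output so only the newly
# generated completion remains.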
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

For quantized models, we advise you to use the GPTQ and AWQ correspondents, namely Qwen1.5-7B-Chat-GPTQ-Int8 and Qwen1.5-7B-Chat-AWQ.
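
Loading a quantized checkpoint uses the same from_pretrained call with the quantized model id; a minimal sketch for the AWQ variant (assumes the autoawq package is installed; use auto-gptq for the GPTQ variant):

from transformers import AutoModelForCausalLM, AutoTokenizer

# The quantization config ships inside the checkpoint, so no extra
# arguments are needed beyond the quantized model id.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-7B-Chat-AWQ",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-7B-Chat-AWQ")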

🤖 ModelScope

We strongly advise users, especially those in mainland China, to use ModelScope; its snapshot_download helper can resolve issues with downloading checkpoints.
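
A minimal sketch of fetching a checkpoint through ModelScope (assumes the modelscope package is installed and that the checkpoint is published under the qwen organization on ModelScope):

from modelscope import snapshot_download

# Download the checkpoint to the local ModelScope cache (resuming if
# interrupted) and return its directory, usable with from_pretrained.
model_dir = snapshot_download("qwen/Qwen1.5-7B-Chat")
print(model_dir)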

💻 Run locally

llama.cpp

Download our provided GGUF files or create them yourself, then use them directly with the latest llama.cpp via a one-line command:

./main -m <path-to-file> -n 512 --color -i -cml -f prompts/chat-with-qwen.txt

Ollama

Qwen1.5 is now available on Ollama; use ollama pull and ollama run to get started.

ollama run qwen

You can also append a tag such as :14b (e.g. ollama run qwen:14b) to choose a different model size. Visit ollama.ai for more information.

LMStudio

Qwen1.5 is already supported by lmstudio.ai. You can use LMStudio directly with our GGUF files.

Web UI

Text generation web UI

You can use text-generation-webui directly to create a web UI demo. If you use GGUF, remember to install the latest wheel of llama.cpp with support for Qwen1.5.

llamafile

Clone llamafile, run the source install, and then create your own llamafile from the GGUF file following its guide. You can then launch a demo with a single command, e.g. ./qwen.llamafile.

Deployment

Now, Qwen1.5 is supported by multiple inference frameworks. Here we demonstrate the usage of vLLM and SGLang.

Note

Neither vLLM nor SGLang currently offer built-in support for function calling. If you require tool use capabilities, please refer to Qwen-Agent, which provides a wrapper around these APIs to support function calling.

vLLM

We advise you to use vLLM>=0.3.0 to build an OpenAI-compatible API service. Start the server with a chat model, e.g. Qwen1.5-7B-Chat:

python -m vllm.entrypoints.openai.api_server --model Qwen/Qwen1.5-7B-Chat

Then use the chat API as demonstrated below:

curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
    "model": "Qwen/Qwen1.5-7B-Chat",
    "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Tell me something about large language models."}
    ]
    }'
Alternatively, query the same server with the OpenAI Python client (the base URL below is the default vLLM address shown in the curl example above):

from openai import OpenAI
# Set OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"
client = OpenAI(api_key=openai_api_key, base_url=openai_api_base)

chat_response = client.chat.completions.create(
    model="Qwen/Qwen1.5-7B-Chat",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me something about large language models."}
    ]
)
print(chat_response.choices[0].message.content)