The APIs of Qwen, Zhipu, Moonshot, and other LLM providers are currently OpenAI-compatible, so you can configure them by following the OpenAI-Compatible Endpoints tutorial:
https://docs.litellm.ai/docs/providers/openai_compatible
If you go with the Custom API Server (Custom Format) approach instead, calls do go through, but tokens are not written to the usage records:
https://docs.litellm.ai/docs/providers/custom_llm_server
It may also be that I was doing something wrong; feedback is welcome.
Configuring models
Option 1:
In the admin web UI, when adding a model, select OpenAI-Compatible...
Option 2: write a config file
Put the following into config.yaml; for each api_key, fill in the key you applied for on that platform.
# general_settings: master_key
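# e.g. to require an admin key for the proxy (a sketch; the key value is hypothetical):
# general_settings:
#   master_key: sk-1234   # clients would then pass this value as their api_key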
model_list:
  - model_name: mistralai--Mistral-Nemo-Instruct-2407
    litellm_params:
      model: huggingface/mistralai/Mistral-Nemo-Instruct-2407
      api_key: os.environ/HUGGINGFACE_API_KEY
  - model_name: "my-custom-model"
    litellm_params:
      model: "my-custom-llm/my-model"
  - model_name: "glm-4"
    litellm_params:
      model: "openai/glm-4"
      api_key: "6eeeb...abPJyrc8e"
      api_base: "https://open.bigmodel.cn/api/paas/v4/"
  - model_name: "qwen-plus"
    litellm_params:
      model: "openai/qwen-plus"
      api_key: 'sk-40d1c7...1d4'
      api_base: "https://dashscope.aliyuncs.com/compatible-mode/v1"
  - model_name: "moonshot-v1-8k"
    litellm_params:
      model: "openai/moonshot-v1-8k"
      api_key: 'sk-d7YcsMzyml...SzPCzQMB5e'
      api_base: "https://api.moonshot.cn/v1"
Running the service
litellm --config /Users/.../config.yaml --detailed_debug
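Once the proxy is up, a quick way to confirm the config loaded is to list the registered models through it (a minimal sketch; assumes the proxy runs on port 4000 and accepts sk-1234 as the key):
import openai

client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")
# should print the model_name entries from config.yaml
print([m.id for m in client.models.list().data])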
Testing with the openai client
import openai

client = openai.OpenAI(
    api_key="sk-1234",  # pass the litellm proxy key, or a virtual key if you use them
    base_url="http://0.0.0.0:4000"  # litellm proxy base url
)

# any model_name registered in config.yaml works; the last assignment takes effect
# model_name = "glm-4"
# model_name = "qwen-plus"
model_name = "moonshot-v1-8k"

response = client.chat.completions.create(
    model=model_name,
    messages=[
        {"role": "user", "content": "你吃了吗?"}  # "Have you eaten?"
    ],
)
print(response)
Output:
ChatCompletion(
    id='chatcmpl-66e4e28a9dce9c9fee5e5960',
    choices=[
        Choice(
            finish_reason='stop',
            index=0,
            logprobs=None,
            message=ChatCompletionMessage(
                content='作为一个人工智能助手,我没有生理需求,所以不需要吃饭。但是,我很高兴为您提供帮助。请问有什么问题我可以帮您解答吗?',
                refusal=None,
                role='assistant',
                function_call=None,
                tool_calls=None
            )
        )
    ],
    created=1726276234,
    model='moonshot-v1-8k',
    object='chat.completion',
    service_tier=None,
    system_fingerprint=None,
    usage=CompletionUsage(
        completion_tokens=28,
        prompt_tokens=11,
        total_tokens=39,
        completion_tokens_details=None
    )
)
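Streaming also works through the proxy with the standard OpenAI streaming API; a small sketch reusing the client and model_name from above:
stream = client.chat.completions.create(
    model=model_name,
    messages=[{"role": "user", "content": "你吃了吗?"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="")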
Configuring embedding models
Official docs: https://docs.litellm.ai/docs/embedding/supported_embedding
Reference: DashScope embedding models' OpenAI compatibility
https://help.aliyun.com/zh/dashscope/developer-reference/openai-embedding-interface
Here I add the following to config.yaml:
- model_name: "text-embedding-v1"
litellm_params:
model: "openai/text-embedding-v1"
api_key: 'sk-40d1c7...11d4'
api_base: "https://dashscope.aliyuncs.com/compatible-mode/v1"
- model_name: "zhipu--Embedding-3"
litellm_params:
model: "openai/Embedding-3"
api_key: "6eeeb...yrc8e"
api_base: "https://open.bigmodel.cn/api/paas/v4/"
Note: Zhipu's official model page gives the endpoint as https://open.bigmodel.cn/api/paas/v4/embeddings, but the trailing embeddings must be dropped when configuring api_base here; otherwise requests fail with NotFoundError ... 'path': '/v4/embeddings/embeddings'.
Calling via the litellm SDK
from litellm import embedding

api_base = "http://0.0.0.0:4000/"
# Regardless of whether the configured model_name carries an openai prefix,
# you must prepend openai/ here:
# model_name = 'openai/zhipu--Embedding-3'  # also works
model_name = 'openai/text-embedding-v1'
response = embedding(model=model_name, api_base=api_base, input=["good morning from litellm"], api_key='sk-1234')
Output:
EmbeddingResponse(
    model='Embedding-3',
    data=[
        {'embedding': [-0.019012451, 0.001613617, -0.0066719055, -0.0015325546, -0.013648987, 0.009605408, 0.0064048767, 0.02810669, 0.0038928986, 0.021697998, 0.016098022, 0.01928711, -0.0015888214, -0.0029773712, 0.011268616, 0.020355225, 0.011779785, -0.013755798, -0.023406982, 0.034942627, 0.010437012, ..., 0.0014047623, -0.026107788, 0.01939392, 0.011260986, -0.048828125, -0.011276245, 0.01448822, 0.0005726814, -0.011695862, 0.012634277, 0.011489868, -0.021652222, 0.02947998, 0.0013313293, 0.040405273, -0.022705078, 0.04095459, -0.02406311, -0.004421234],
         'index': 0, 'object': 'embedding'}
    ],
    object='list',
    usage=Usage(
        completion_tokens=0, prompt_tokens=9,
        total_tokens=9, completion_tokens_details=None
    )
)
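As the output shows, the data entries are plain dicts here, so the vector and token usage can be read out like this (a sketch continuing from the embedding call above):
vec = response.data[0]['embedding']
print(len(vec))                     # embedding dimension
print(response.usage.total_tokens)  # 9 in the run above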
Embeddings via the openai client
from openai import OpenAI

llm_base_url = 'http://localhost:4000/'
llm_api_key = 'sk-1234'

# embed_model_name = 'openai/text-embedding-v1'  # Ali (DashScope)
embed_model_name = 'openai/Embedding-3'  # Zhipu

client = OpenAI(
    base_url=llm_base_url,
    api_key=llm_api_key,
)

response = client.embeddings.create(
    input="Your text goes here",
    model=embed_model_name
)
print(response)
embedding_data = response.data[0].embedding
print(len(embedding_data))
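The openai embeddings endpoint also accepts a list of inputs, and the proxy passes the batch through; a sketch reusing the client above:
batch = client.embeddings.create(
    input=["first sentence", "second sentence"],
    model=embed_model_name
)
print(len(batch.data))  # one embedding per input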
Using with llama_index
I tried the AzureOpenAI approach given in the LiteLLM docs, as well as the OpenAI-like wrapper from llama_index; neither worked well. Later I found that llama_index ships a dedicated LiteLLM integration:
https://docs.llamaindex.ai/en/stable/examples/llm/litellm/
The steps are as follows:
1. Install
pip install llama-index
pip install llama-index-llms-litellm
pip install llama-index-embeddings-litellm
2. Model configuration in litellm
...
  - model_name: "zhipu--glm-4"
    litellm_params:
      model: "openai/glm-4"
      api_key: "6ee...yrc8e"
      api_base: "https://open.bigmodel.cn/api/paas/v4/"
  - model_name: "text-embedding-v1"
    litellm_params:
      model: "openai/text-embedding-v1"
      api_key: 'sk-40...11d4'
      api_base: "https://dashscope.aliyuncs.com/compatible-mode/v1"
...
I list this config again because the model name used by external callers depends heavily on these entries, and the official docs are not clear on this point.
3. Calling
from llama_index.llms.litellm import LiteLLM
from llama_index.embeddings.litellm import LiteLLMEmbedding
from llama_index.core.llms import ChatMessage
from llama_index.core import Settings

litellm_key = "sk-1234"
litellm_base_url = 'http://localhost:4000/'

# model_name = 'openai/glm-4'  # does not work
model_name = 'openai/zhipu--glm-4'
llm = LiteLLM(
    model=model_name,
    api_key=litellm_key,
    api_base=litellm_base_url
)
message = ChatMessage(role="user", content="Hey! how's it going?")
chat_response = llm.chat([message])

# embedding
embed_model_name = 'openai/text-embedding-v1'
embed_model = LiteLLMEmbedding(
    model_name=embed_model_name,
    api_key=litellm_key,
    api_base=litellm_base_url
)
embed_data = embed_model.get_text_embedding('hello')

# set globally
Settings.llm = llm
Settings.embed_model = embed_model
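With the global Settings in place, downstream llama_index components pick up both models automatically. A minimal sketch (the document text here is made up for illustration):
from llama_index.core import VectorStoreIndex, Document

# from_documents uses Settings.embed_model; the query engine uses Settings.llm
index = VectorStoreIndex.from_documents(
    [Document(text="LiteLLM proxies many LLM providers behind one OpenAI-style API.")]
)
print(index.as_query_engine().query("What does LiteLLM do?"))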
Using with LangChain
Reference: https://docs.litellm.ai/docs/proxy/user_keys
import os

from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage, SystemMessage

os.environ["OPENAI_API_KEY"] = "sk-1234"

# any of these four model names works; the last assignment takes effect
# model_name = 'openai/qwen-plus'
# model_name = 'qwen-plus'
# model_name = 'openai/GLM-4-Flash'
model_name = "zhipu--GLM-4-Flash"

chat = ChatOpenAI(
    openai_api_base="http://0.0.0.0:4000",
    model=model_name,
    temperature=0.1,
    extra_body={
        "metadata": {  # optional metadata forwarded to litellm for logging/tracing
            "generation_name": "ishaan-generation-langchain-client",
            "generation_id": "langchain-client-gen-id22",
            "trace_id": "langchain-client-trace-id22",
            "trace_user_id": "langchain-client-user-id2"
        }
    }
)
messages = [
    SystemMessage(content="你是一个有用的生活小助手"),  # "You are a helpful everyday-life assistant"
    HumanMessage(content="今晚吃什么"),  # "What should I have for dinner tonight?"
]
response = chat(messages)
print('-- ', response)
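Streaming works with the same ChatOpenAI instance too (a sketch; .stream comes from LangChain's Runnable interface, so it assumes a reasonably recent langchain version):
for chunk in chat.stream(messages):
    print(chunk.content, end="")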
2024-09-14 (a buy-five-get-one-free Saturday)