Following up on the previous post about building an agent with langgraph, a natural question is whether a locally hosted LLM can be plugged into langgraph. This post first covers how to serve a local model behind an OpenAI-compatible API so it can be called the same way as the OpenAI API.
vLLM
Deployment
# optionally pin a GPU, e.g. CUDA_VISIBLE_DEVICES=0
python -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 8000 --model /mnt/nvme2/xuedongge/LLM/llama-3.1-8B-Instruct --trust-remote-code --gpu-memory-utilization 0.90
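Before wiring the server into LangChain, it is worth a quick sanity check that it answers OpenAI-style requests. Below is a minimal sketch using the official openai Python client, assuming the server above is reachable on localhost:8000 and that the model path is used verbatim as the model name; adjust both to your setup.

from openai import OpenAI

# vLLM ignores the api_key, but the client requires a non-empty string
client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

# list the models the server exposes; the id should match the --model path above
for m in client.models.list().data:
    print(m.id)

# one test chat completion against the served model
resp = client.chat.completions.create(
    model="/mnt/nvme2/xuedongge/LLM/llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=50,
)
print(resp.choices[0].message.content)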
Invocation
from langchain_openai import ChatOpenAI

inference_server_url = "http://localhost:8000/v1"

# the model argument is the same model path passed to the vLLM command above
model = ChatOpenAI(
    model="/mnt/nvme2/xuedongge/LLM/llama-3.1-8B-Instruct",
    openai_api_key="none",
    openai_api_base=inference_server_url,
    max_tokens=500,
    temperature=0,
)
re = model.invoke("I'm the best!")
FastChat
Deployment
# 1. start the controller
python3 -m fastchat.serve.controller
# 2. start a model worker (optionally pin a GPU with CUDA_VISIBLE_DEVICES=0);
#    --model-names sets the alias that clients use as the model name
python3 -m fastchat.serve.model_worker --model-names "gpt-3.5-turbo" --model-path /mnt/nvme2/xuedongge/LLM/llama-3.1-8B-Instruct
# 3. start the OpenAI-compatible API server
python3 -m fastchat.serve.openai_api_server --host localhost --port 8000
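Once all three processes are up, the alias set via --model-names should appear in the model list. A small check, assuming the API server is on localhost:8000 and the openai Python client is installed:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

# the id printed here is the alias from --model-names, i.e. "gpt-3.5-turbo",
# and it is the name to pass to ChatOpenAI below
for m in client.models.list().data:
    print(m.id)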
Invocation
from langchain_openai import ChatOpenAI

inference_server_url = "http://localhost:8000/v1"

model = ChatOpenAI(
    model="gpt-3.5-turbo",
    openai_api_key="none",
    openai_api_base=inference_server_url,
    max_tokens=500,
    temperature=0,
)
print(model)
re = model.invoke("who are you?")
So far I have verified that the OpenAI-compatible API served with FastChat can be plugged into the langgraph framework.
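For reference, here is a rough sketch of what that wiring can look like: a single-node langgraph graph that routes the conversation state through the FastChat-served model. It assumes the same ChatOpenAI settings as above and a recent langgraph release that exposes MessagesState; treat it as a starting point rather than a verified recipe.

from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, MessagesState, START, END

# the FastChat-served model from the section above
model = ChatOpenAI(
    model="gpt-3.5-turbo",
    openai_api_key="none",
    openai_api_base="http://localhost:8000/v1",
    temperature=0,
)

def chatbot(state: MessagesState):
    # call the local model with the accumulated message history
    return {"messages": [model.invoke(state["messages"])]}

graph = StateGraph(MessagesState)
graph.add_node("chatbot", chatbot)
graph.add_edge(START, "chatbot")
graph.add_edge("chatbot", END)
app = graph.compile()

result = app.invoke({"messages": [("user", "who are you?")]})
print(result["messages"][-1].content)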
Ollama
Deployment
# start the Ollama server (listens on port 11434 by default)
ollama serve
# pull the llama3.1 model and load it
ollama run llama3.1
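Ollama's OpenAI-compatible endpoint only accepts the names of models that have actually been pulled, so it helps to confirm what the server reports before calling it. A quick check, assuming the default port 11434 and the openai Python client:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="none")

# the ids printed here (e.g. "llama3.1:latest") are the valid values
# for the model argument in the ChatOpenAI call below
for m in client.models.list().data:
    print(m.id)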
Invocation
# note that the port is different here: Ollama's OpenAI-compatible endpoint listens on 11434
inference_server_url = "http://localhost:11434/v1"

from langchain_openai import ChatOpenAI

model = ChatOpenAI(
    model="llama3.1",  # must match the model pulled with `ollama run` above
    openai_api_key="none",
    openai_api_base=inference_server_url,
    max_tokens=500,
    temperature=0,
)
# print(model)
re = model.invoke("who are you?")
References
https://www.bilibili.com/video/BV1GF8ye2E1E/
https://github.com/lm-sys/FastChat/blob/main/docs/langchain_integration.md
https://github.com/ollama/ollama