Following up on the previous post about building an agent with langgraph, a natural question is whether a locally hosted LLM can be plugged into langgraph. This post first covers how to serve a local model behind an OpenAI-compatible API so it can be called the same way as the OpenAI API.
vLLM
Deployment
# optionally pin a GPU, e.g. CUDA_VISIBLE_DEVICES=0
python -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 8000 --model /mnt/nvme2/xuedongge/LLM/llama-3.1-8B-Instruct --trust-remote-code --gpu-memory-utilization 0.90
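Before wiring the server into LangChain, it is worth a quick sanity check that it answers OpenAI-style requests. Below is a minimal sketch using the official openai Python client, assuming the server above is reachable on localhost:8000 and that the model path is used verbatim as the model name; adjust both to your setup.

from openai import OpenAI

# vLLM ignores the api_key, but the client requires a non-empty string
client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

# list the models the server exposes; the id should match the --model path above
for m in client.models.list().data:
    print(m.id)

# one test chat completion against the served model
resp = client.chat.completions.create(
    model="/mnt/nvme2/xuedongge/LLM/llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=50,
)
print(resp.choices[0].message.content)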
Invocation
from langchain_openai import ChatOpenAI

inference_server_url = "http://localhost:8000/v1"

# the model argument is the same model path passed to the vLLM command above
model = ChatOpenAI(
    model="/mnt/nvme2/xuedongge/LLM/llama-3.1-8B-Instruct",
    openai_api_key="none",
    openai_api_base=inference_server_url,
    max_tokens=500,
    temperature=0,
)
re = model.invoke("I'm the best!")
FastChat
Deployment
# 1. start the controller
python3 -m fastchat.serve.controller
# 2. start a model worker (optionally pin a GPU with CUDA_VISIBLE_DEVICES=0);
#    --model-names sets the alias that clients use as the model name
python3 -m fastchat.serve.model_worker --model-names "gpt-3.5-turbo" --model-path /mnt/nvme2/xuedongge/LLM/llama-3.1-8B-Instruct
# 3. start the OpenAI-compatible API server
python3 -m fastchat.serve.openai_api_server --host localhost --port 8000
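Once all three processes are up, the alias set via --model-names should appear in the model list. A small check, assuming the API server is on localhost:8000 and the openai Python client is installed:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

# the id printed here is the alias from --model-names, i.e. "gpt-3.5-turbo",
# and it is the name to pass to ChatOpenAI below
for m in client.models.list().data:
    print(m.id)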
Invocation
from langchain_openai import ChatOpenAI

inference_server_url = "http://localhost:8000/v1"

model = ChatOpenAI(
    model="gpt-3.5-turbo",
    openai_api_key="none",
    openai_api_base=inference_server_url,
    max_tokens=500,
    temperature=0,
)
print(model)
re = model.invoke("who are you?")
So far I have verified that the OpenAI-compatible API served with FastChat can be plugged into the langgraph framework.
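For reference, here is a rough sketch of what that wiring can look like: a single-node langgraph graph that routes the conversation state through the FastChat-served model. It assumes the same ChatOpenAI settings as above and a recent langgraph release that exposes MessagesState; treat it as a starting point rather than a verified recipe.

from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, MessagesState, START, END

# the FastChat-served model from the section above
model = ChatOpenAI(
    model="gpt-3.5-turbo",
    openai_api_key="none",
    openai_api_base="http://localhost:8000/v1",
    temperature=0,
)

def chatbot(state: MessagesState):
    # call the local model with the accumulated message history
    return {"messages": [model.invoke(state["messages"])]}

graph = StateGraph(MessagesState)
graph.add_node("chatbot", chatbot)
graph.add_edge(START, "chatbot")
graph.add_edge("chatbot", END)
app = graph.compile()

result = app.invoke({"messages": [("user", "who are you?")]})
print(result["messages"][-1].content)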
Ollama
Deployment
# start the Ollama server (listens on port 11434 by default)
ollama serve
# pull the llama3.1 model and load it
ollama run llama3.1
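Ollama's OpenAI-compatible endpoint only accepts the names of models that have actually been pulled, so it helps to confirm what the server reports before calling it. A quick check, assuming the default port 11434 and the openai Python client:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="none")

# the ids printed here (e.g. "llama3.1:latest") are the valid values
# for the model argument in the ChatOpenAI call below
for m in client.models.list().data:
    print(m.id)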
Invocation
# note that the port is different here: Ollama's OpenAI-compatible endpoint listens on 11434
inference_server_url = "http://localhost:11434/v1"

from langchain_openai import ChatOpenAI

model = ChatOpenAI(
    model="llama3.1",  # must match the model pulled with `ollama run` above
    openai_api_key="none",
    openai_api_base=inference_server_url,
    max_tokens=500,
    temperature=0,
)
# print(model)
re = model.invoke("who are you?")
References
https://www.bilibili.com/video/BV1GF8ye2E1E/
https://github.com/lm-sys/FastChat/blob/main/docs/langchain_integration.md
https://github.com/ollama/ollama