Reference:
https://huggingface.co/THUDM/glm-4-9b-chat
Running the vLLM backend directly:
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

# For the GLM-4-9B-Chat-1M variant, max_model_len and tp_size must be
# raised well beyond the values used for the base 128K chat model.
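The snippet above stops at the imports. A fuller offline-inference sketch, adapted from the model card linked above, might look as follows; the stop-token IDs and sampling values are taken from that card and should be checked against its current version, and running it requires a GPU plus the model weights:

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "THUDM/glm-4-9b-chat"

# Context length and tensor parallelism for GLM-4-9B-Chat; the 1M variant
# needs much larger values (the model card suggests 1048576 and tp_size 4).
max_model_len, tp_size = 131072, 1

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
llm = LLM(
    model=model_name,
    tensor_parallel_size=tp_size,
    max_model_len=max_model_len,
    trust_remote_code=True,
    enforce_eager=True,
)

# Stop-token IDs for GLM-4's special end-of-turn tokens, per the model card.
stop_token_ids = [151329, 151336, 151338]
sampling_params = SamplingParams(
    temperature=0.95, max_tokens=1024, stop_token_ids=stop_token_ids
)

# Render the chat messages into the model's prompt format, then generate.
prompt = [{"role": "user", "content": "Hello"}]
inputs = tokenizer.apply_chat_template(
    prompt, tokenize=False, add_generation_prompt=True
)
outputs = llm.generate(prompts=inputs, sampling_params=sampling_params)
print(outputs[0].outputs[0].text)
```

To run vLLM as an HTTP backend service rather than offline inference, it also ships an OpenAI-compatible server (e.g. `python -m vllm.entrypoints.openai.api_server --model THUDM/glm-4-9b-chat --trust-remote-code`), subject to the same memory constraints implied by `max_model_len`.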