2.1.2 以命令行形式连接API服务器
2.1.3 以Gradio网页形式连接API服务器
2.2 LMDeploy Lite——内存下降16G
2.2.2 设置在线 kv cache int4/int8 量化
2.2.3 W4A16 模型量化和部署
报错
ValueError: The repository for ptb_text_only contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/ptb_text_only.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.
pip install datasets==2.16.1
模型大小对比 1.5G V.S. 3.6G
运行占用33G
3.1.2 W4A16 量化+ KV cache+KV cache 量化
全都要之后占用为18G,比量化前少2G
API调用
4 LMDeploy之FastAPI与Function call
4.2 Function call