Problems deploying the Vicuna 13B API with FastChat
Following the official openai_api.md, I first started the controller:
python3 -m fastchat.serve.controller
which failed with:
2023-11-01 10:21:06 | INFO | controller | args: Namespace(dispatch_method='shortest_queue', host='localhost', port=21001, ssl=False)
2023-11-01 10:21:06 | ERROR | stderr | INFO: Started server process [1131]
2023-11-01 10:21:06 | ERROR | stderr | INFO: Waiting for application startup.
2023-11-01 10:21:06 | ERROR | stderr | INFO: Application startup complete.
2023-11-01 10:21:06 | ERROR | stderr | ERROR: [Errno 99] error while attempting to bind on address ('::1', 21001, 0, 0): cannot assign requested address
2023-11-01 10:21:06 | ERROR | stderr | INFO: Waiting for application shutdown.
2023-11-01 10:21:06 | ERROR | stderr | INFO: Application shutdown complete.
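The bind failure above happens when localhost resolves to the IPv6 loopback ::1 but the environment (a container, typically) has no usable IPv6. A quick standard-library check of what localhost resolves to on your machine:

```python
import socket

# List the addresses "localhost" resolves to on this machine. If only ::1
# (family AF_INET6) comes back and IPv6 is unavailable, binding fails with
# "[Errno 99] cannot assign requested address", as in the log above.
for family, _, _, _, sockaddr in socket.getaddrinfo(
    "localhost", 21001, proto=socket.IPPROTO_TCP
):
    print(family.name, sockaddr[0])
```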
The fix is to add --host 0.0.0.0:
python3 -m fastchat.serve.controller --host 0.0.0.0
Now it runs:
2023-11-01 10:22:06 | INFO | controller | args: Namespace(dispatch_method='shortest_queue', host='0.0.0.0', port=21001, ssl=False)
2023-11-01 10:22:06 | ERROR | stderr | INFO: Started server process [1163]
2023-11-01 10:22:06 | ERROR | stderr | INFO: Waiting for application startup.
2023-11-01 10:22:06 | ERROR | stderr | INFO: Application startup complete.
2023-11-01 10:22:06 | ERROR | stderr | INFO: Uvicorn running on http://0.0.0.0:21001 (Press CTRL+C to quit)
2023-11-01 10:29:23 | INFO | controller | Register a new worker: http://localhost:21002
2023-11-01 10:29:23 | INFO | controller | Register done: http://localhost:21002, {'model_names': ['Vicuna-13b-v1.5'], 'speed': 1, 'queue_length': 0}
2023-11-01 10:29:23 | INFO | stdout | INFO: 127.0.0.1:34186 - "POST /register_worker HTTP/1.1" 200 OK
2023-11-01 10:30:08 | INFO | controller | Receive heart beat. http://localhost:21002
2023-11-01 10:30:08 | INFO | stdout | INFO: 127.0.0.1:34212 - "POST /receive_heart_beat HTTP/1.1" 200 OK
Then start the model worker (I'm running on an RTX 4090; with 8-bit loading, 24 GB of VRAM is just enough for the 13B model):
python3 -m fastchat.serve.model_worker --model-path ../Vicuna-13b-v1.5/ --host 0.0.0.0 --load-8bit
It loads and registers successfully:
2023-11-01 10:28:40 | INFO | model_worker | args: Namespace(awq_ckpt=None, awq_groupsize=-1, awq_wbits=16, controller_address='http://localhost:21001', conv_template=None, cpu_offloading=False, device='cuda', dtype=None, embed_in_truncate=False, gptq_act_order=False, gptq_ckpt=None, gptq_groupsize=-1, gptq_wbits=16, gpus=None, host='0.0.0.0', limit_worker_concurrency=5, load_8bit=True, max_gpu_memory=None, model_names=None, model_path='../Vicuna-13b-v1.5/', no_register=False, num_gpus=1, port=21002, revision='main', seed=None, stream_interval=2, worker_address='http://localhost:21002')
2023-11-01 10:28:40 | INFO | model_worker | Loading the model ['Vicuna-13b-v1.5'] on worker 51f7ff39 ...
Loading the tokenizer from the `special_tokens_map.json` and the `added_tokens.json` will be removed in `transformers 5`, it is kept for forward compatibility, but it is recommended to update your `tokenizer_config.json` by uploading it again. You will see the new `added_tokens_decoder` attribute that will store the relevant information.
0%| | 0/3 [00:00<?, ?it/s]
33%|██████████████████████████████████████████████████████████████▎ | 1/3 [00:08<00:17, 8.71s/it]
67%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋ | 2/3 [00:22<00:11, 11.76s/it]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:42<00:00, 15.34s/it]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:42<00:00, 14.07s/it]
2023-11-01 10:29:23 | ERROR | stderr |
2023-11-01 10:29:23 | INFO | model_worker | Register to controller
2023-11-01 10:29:23 | ERROR | stderr | INFO: Started server process [1472]
2023-11-01 10:29:23 | ERROR | stderr | INFO: Waiting for application startup.
2023-11-01 10:29:23 | ERROR | stderr | INFO: Application startup complete.
2023-11-01 10:29:23 | ERROR | stderr | INFO: Uvicorn running on http://0.0.0.0:21002 (Press CTRL+C to quit)
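Why --load-8bit matters on a 24 GB card: a rough back-of-envelope estimate of the weight memory alone (this ignores activations, KV cache, and quantization overhead, so treat the numbers as a lower bound):

```python
# Rough VRAM needed just for the weights of a 13B-parameter model.
params = 13e9
print(f"fp16: {params * 2 / 1e9:.0f} GB")  # 2 bytes/param: exceeds a 24 GB RTX 4090
print(f"int8: {params * 1 / 1e9:.0f} GB")  # 1 byte/param: leaves headroom for KV cache
```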
Finally, start the OpenAI-compatible API server:
python3 -m fastchat.serve.openai_api_server --host 0.0.0.0 --port 8000
INFO: Started server process [2099]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
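To verify the deployment end to end, you can hit the OpenAI-compatible endpoints. Below is a minimal sketch using only the standard library; the build_chat_request and chat helpers and the base URL are illustrative assumptions, and the model name must match the one the worker registered (Vicuna-13b-v1.5 here):

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumed: where openai_api_server listens

# Hypothetical helper: build the JSON body for the OpenAI-compatible
# /v1/chat/completions endpoint served by fastchat.serve.openai_api_server.
def build_chat_request(model: str, prompt: str, temperature: float = 0.7) -> dict:
    return {
        "model": model,  # must match the worker's registered model name
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

# Hypothetical helper: POST the request and pull out the reply text.
def chat(prompt: str, model: str = "Vicuna-13b-v1.5") -> str:
    body = json.dumps(build_chat_request(model, prompt)).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Print the request body that would be sent (sending requires the server up).
print(json.dumps(build_chat_request("Vicuna-13b-v1.5", "Hello, who are you?"), indent=2))
```

You can also list the registered models with: curl http://localhost:8000/v1/models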