Problems deploying the Vicuna 13B API with FastChat
Following the official openai_api.md, I first started the controller:
python3 -m fastchat.serve.controller
which failed with:
2023-11-01 10:21:06 | INFO | controller | args: Namespace(dispatch_method='shortest_queue', host='localhost', port=21001, ssl=False)
2023-11-01 10:21:06 | ERROR | stderr | INFO: Started server process [1131]
2023-11-01 10:21:06 | ERROR | stderr | INFO: Waiting for application startup.
2023-11-01 10:21:06 | ERROR | stderr | INFO: Application startup complete.
2023-11-01 10:21:06 | ERROR | stderr | ERROR: [Errno 99] error while attempting to bind on address ('::1', 21001, 0, 0): cannot assign requested address
2023-11-01 10:21:06 | ERROR | stderr | INFO: Waiting for application shutdown.
2023-11-01 10:21:06 | ERROR | stderr | INFO: Application shutdown complete.
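The bind failure above happens when localhost resolves to the IPv6 loopback ::1 but the environment (a container, typically) has no usable IPv6. A quick standard-library check of what localhost resolves to on your machine:

```python
import socket

# List the addresses "localhost" resolves to on this machine. If only ::1
# (family AF_INET6) comes back and IPv6 is unavailable, binding fails with
# "[Errno 99] cannot assign requested address", as in the log above.
for family, _, _, _, sockaddr in socket.getaddrinfo(
    "localhost", 21001, proto=socket.IPPROTO_TCP
):
    print(family.name, sockaddr[0])
```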
The fix is to add --host 0.0.0.0:
python3 -m fastchat.serve.controller --host 0.0.0.0
Now it runs:
2023-11-01 10:22:06 | INFO | controller | args: Namespace(dispatch_method='shortest_queue', host='0.0.0.0', port=21001, ssl=False)
2023-11-01 10:22:06 | ERROR | stderr | INFO: Started server process [1163]
2023-11-01 10:22:06 | ERROR | stderr | INFO: Waiting for application startup.
2023-11-01 10:22:06 | ERROR | stderr | INFO: Application startup complete.
2023-11-01 10:22:06 | ERROR | stderr | INFO: Uvicorn running on http://0.0.0.0:21001 (Press CTRL+C to quit)
2023-11-01 10:29:23 | INFO | controller | Register a new worker: http://localhost:21002
2023-11-01 10:29:23 | INFO | controller | Register done: http://localhost:21002, {'model_names': ['Vicuna-13b-v1.5'], 'speed': 1, 'queue_length': 0}
2023-11-01 10:29:23 | INFO | stdout | INFO: 127.0.0.1:34186 - "POST /register_worker HTTP/1.1" 200 OK
2023-11-01 10:30:08 | INFO | controller | Receive heart beat. http://localhost:21002
2023-11-01 10:30:08 | INFO | stdout | INFO: 127.0.0.1:34212 - "POST /receive_heart_beat HTTP/1.1" 200 OK
Then start the model worker (I'm running on an RTX 4090; with 8-bit loading, 24 GB of VRAM is just enough for the 13B model):
python3 -m fastchat.serve.model_worker --model-path ../Vicuna-13b-v1.5/ --host 0.0.0.0 --load-8bit
It loads and registers successfully:
2023-11-01 10:28:40 | INFO | model_worker | args: Namespace(awq_ckpt=None, awq_groupsize=-1, awq_wbits=16, controller_address='http://localhost:21001', conv_template=None, cpu_offloading=False, device='cuda', dtype=None, embed_in_truncate=False, gptq_act_order=False, gptq_ckpt=None, gptq_groupsize=-1, gptq_wbits=16, gpus=None, host='0.0.0.0', limit_worker_concurrency=5, load_8bit=True, max_gpu_memory=None, model_names=None, model_path='../Vicuna-13b-v1.5/', no_register=False, num_gpus=1, port=21002, revision='main', seed=None, stream_interval=2, worker_address='http://localhost:21002')
2023-11-01 10:28:40 | INFO | model_worker | Loading the model ['Vicuna-13b-v1.5'] on worker 51f7ff39 ...
Loading the tokenizer from the `special_tokens_map.json` and the `added_tokens.json` will be removed in `transformers 5`, it is kept for forward compatibility, but it is recommended to update your `tokenizer_config.json` by uploading it again. You will see the new `added_tokens_decoder` attribute that will store the relevant information.
0%| | 0/3 [00:00<?, ?it/s]
33%|██████████████████████████████████████████████████████████████▎ | 1/3 [00:08<00:17, 8.71s/it]
67%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋ | 2/3 [00:22<00:11, 11.76s/it]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:42<00:00, 15.34s/it]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:42<00:00, 14.07s/it]
2023-11-01 10:29:23 | ERROR | stderr |
2023-11-01 10:29:23 | INFO | model_worker | Register to controller
2023-11-01 10:29:23 | ERROR | stderr | INFO: Started server process [1472]
2023-11-01 10:29:23 | ERROR | stderr | INFO: Waiting for application startup.
2023-11-01 10:29:23 | ERROR | stderr | INFO: Application startup complete.
2023-11-01 10:29:23 | ERROR | stderr | INFO: Uvicorn running on http://0.0.0.0:21002 (Press CTRL+C to quit)
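Why --load-8bit matters on a 24 GB card: a rough back-of-envelope estimate of the weight memory alone (this ignores activations, KV cache, and quantization overhead, so treat the numbers as a lower bound):

```python
# Rough VRAM needed just for the weights of a 13B-parameter model.
params = 13e9
print(f"fp16: {params * 2 / 1e9:.0f} GB")  # 2 bytes/param: exceeds a 24 GB RTX 4090
print(f"int8: {params * 1 / 1e9:.0f} GB")  # 1 byte/param: leaves headroom for KV cache
```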
Finally, start the OpenAI-compatible API server:
python3 -m fastchat.serve.openai_api_server --host 0.0.0.0 --port 8000
INFO: Started server process [2099]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
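To verify the deployment end to end, you can hit the OpenAI-compatible endpoints. Below is a minimal sketch using only the standard library; the build_chat_request and chat helpers and the base URL are illustrative assumptions, and the model name must match the one the worker registered (Vicuna-13b-v1.5 here):

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumed: where openai_api_server listens

# Hypothetical helper: build the JSON body for the OpenAI-compatible
# /v1/chat/completions endpoint served by fastchat.serve.openai_api_server.
def build_chat_request(model: str, prompt: str, temperature: float = 0.7) -> dict:
    return {
        "model": model,  # must match the worker's registered model name
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

# Hypothetical helper: POST the request and pull out the reply text.
def chat(prompt: str, model: str = "Vicuna-13b-v1.5") -> str:
    body = json.dumps(build_chat_request(model, prompt)).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Print the request body that would be sent (sending requires the server up).
print(json.dumps(build_chat_request("Vicuna-13b-v1.5", "Hello, who are you?"), indent=2))
```

You can also list the registered models with: curl http://localhost:8000/v1/models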