LLM Training & Inference Adaptation - [Ascend 910B] - Quantized Inference for Large Models
Inference framework: MindIE
Currently supports the w8a8 and w8a16 quantization modes
0. Environment

Software | Version
---|---
Driver | 24.1.rc1
CANN | 8.0.T37
Ascend-mindie | MindIE 1.0.T61.B010
mindie-rt | 1.0.T61.B010
mindie-torch | 1.0.T61.B010
mindie-service | 1.0.T61.B010
mindie-llm | 1.0.T61.B010

Build your own image, or use an image published on ascendhub.
1. Qwen2-72B-Chat w8a16 quantization

```shell
# 1. Start the container
docker run --rm -it -u root --name=mindie_t61 --net=host --privileged=true -w /home \
    --device=/dev/davinci_manager --device=/dev/devmm_svm --device=/dev/hisi_hdc \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
    -v /usr/local/Ascend/add-ons/:/usr/local/Ascend/add-ons/ \
    -v /usr/local/sbin/:/usr/local/sbin/ \
    -v /var/log/npu/slog/:/var/log/npu/slog \
    -v /var/log/npu/profiling/:/var/log/npu/profiling \
    -v /var/log/npu/dump/:/var/log/npu/dump \
    -v /var/log/npu/:/usr/slog \
    -v /etc/hccn.conf:/etc/hccn.conf \
    -v /path/to/local/workspace/:/opt/files \
    -v /tmp:/tmp \
    mindie-service:t61 /bin/bash

# 2. Set environment variables
source /usr/local/Ascend/ascend-toolkit/set_env.sh
source /usr/local/Ascend/mindie/set_env.sh
source /usr/local/Ascend/mindie/latest/mindie-service/set_env.sh
source /opt/atb-models/set_env.sh

# 3. Run the quantization script
cd /opt/files/src/infer/mindie_llm/atb_models/latest/examples/models/qwen/
python quant_qwen2_72b_w8a16-fast.py /opt/files/pretrained_models/Qwen2-72B-Chat /opt/files/pretrained_models/Qwen2-72B-Chat-w8a16
```
```text
...
2024-08-07 01:44:35,764 - modelslim-logger - INFO - Calibration end!
2024-08-07 01:44:35,779 - modelslim-logger - WARNING - write directory not exists, creating directory /opt/files/pretrained_models/Qwen2-72B-Chat-w8a16
2024-08-07 01:45:29,963 - modelslim-logger - INFO - invalid safetensors_name, set safetensors_name to default quant_model_weight_w8a16.safetensors
2024-08-07 01:45:29,964 - modelslim-logger - INFO - invalid json_name, set json_name to default quant_model_description_w8a16.json
2024-08-07 01:45:30,274 - modelslim-logger - INFO - Path of quant_model_weight.safetensors is /opt/files/pretrained_models/Qwen2-72B-Chat-w8a16/quant_model_weight_w8a16.safetensors
```
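For intuition: w8a16 stores each linear layer's weights as INT8 with one floating-point scale per output channel, while activations stay in fp16. The sketch below illustrates that idea only; it is not the modelslim implementation, and the function names are made up:

```python
import numpy as np

def quantize_w8(weight: np.ndarray):
    """Symmetric per-channel INT8 quantization of an [out, in] weight matrix."""
    # One scale per output channel: the largest |w| in the row maps to 127.
    scales = np.abs(weight).max(axis=1, keepdims=True) / 127.0
    scales = np.maximum(scales, 1e-12)  # guard against an all-zero row
    q = np.clip(np.round(weight / scales), -128, 127).astype(np.int8)
    return q, scales

def dequantize(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Recover an fp32 approximation of the original weights."""
    return q.astype(np.float32) * scales

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((4, 8)).astype(np.float32)
    q, s = quantize_w8(w)
    # Reconstruction error is bounded by half a quantization step per channel.
    print("max abs error:", np.abs(dequantize(q, s) - w).max())
```

At inference time the kernel dequantizes (or fuses the scale into the matmul), which is why no activation calibration data is needed for this mode.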
```shell
# 4. Copy the model configuration
cp /opt/files/pretrained_models/Qwen2-72B-Chat/*.json /opt/files/pretrained_models/Qwen2-72B-Chat-w8a16/
cp /opt/files/pretrained_models/Qwen2-72B-Chat/*.txt /opt/files/pretrained_models/Qwen2-72B-Chat-w8a16/
```

#5. Add the quantization flag to the model configuration
Add the following to /opt/files/pretrained_models/Qwen2-72B-Chat-w8a16/config.json:

```json
"quantize": "w8a16",
```
2. Qwen2-72B-sft w8a8 quantization

Steps 1-2 are the same as for w8a16.

```shell
# 3. Run the quantization script
python convert_quant_weights.py --model_path /opt/files/pretrained_models/Qwen2-72B-Chat --save_directory /opt/files/pretrained_models/Qwen2-72B-Chat-w8a8/ --w_bit 8 --a_bit 8 --disable_level L0 --device_type npu --calib_file ../../convert/model_slim/boolq.jsonl
```
```text
...
100%|██████████████████████████████████████████████████████████████████████| 50/50 [01:55<00:00, 2.30s/it]
2024-08-19 05:51:11,629 - msmodelslim-logger - INFO - Calibration end!
2024-08-19 05:51:11,732 - msmodelslim-logger - INFO - write directory exists, write file to directory /opt/files/pretrained_models/Qwen2-72B-Chat-w8a8/
2024-08-19 05:51:55,977 - msmodelslim-logger - INFO - invalid safetensors_name, set safetensors_name to default quant_model_weight_w8a8.safetensors
2024-08-19 05:51:55,978 - msmodelslim-logger - INFO - invalid json_name, set json_name to default quant_model_description_w8a8.json
2024-08-19 05:51:58,881 - msmodelslim-logger - INFO - Path of quant_model_weight.safetensors is /opt/files/pretrained_models/Qwen2-72B-Chat-w8a8//quant_model_weight_w8a8.safetensors
2024-08-19 05:52:53,524 - msmodelslim-logger - INFO - Save quant_model_weight.safetensors success!
2024-08-19 05:52:53,524 - msmodelslim-logger - INFO - Path of quant_model_description_json is /opt/files/pretrained_models/Qwen2-72B-Chat-w8a8//quant_model_description_w8a8.json
2024-08-19 05:52:53,557 - msmodelslim-logger - INFO - Save quant_model_description_json success!
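Conceptually, w8a8 goes one step further than w8a16: the calibration pass above (over `boolq.jsonl`) is used to pick activation scales, so both operands of each matmul are INT8 and the accumulate runs in integer arithmetic. An illustrative numpy sketch of that idea, not the msmodelslim implementation:

```python
import numpy as np

def calibrate_act_scale(batches):
    """Per-tensor activation scale from calibration batches (max-abs observer)."""
    amax = max(float(np.abs(x).max()) for x in batches)
    return amax / 127.0

def w8a8_matmul(x, w, x_scale, w_scales):
    """INT8 x INT8 matmul with fp32 dequantization of the int32 accumulator."""
    qx = np.clip(np.round(x / x_scale), -128, 127).astype(np.int8)
    qw = np.clip(np.round(w / w_scales), -128, 127).astype(np.int8)
    acc = qx.astype(np.int32) @ qw.T.astype(np.int32)  # integer accumulate
    # Undo both scales: x_scale is per-tensor, w_scales is per output channel.
    return acc.astype(np.float32) * x_scale * w_scales.T

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((2, 8)).astype(np.float32)
    w = rng.standard_normal((4, 8)).astype(np.float32)
    x_scale = calibrate_act_scale([x])
    w_scales = np.abs(w).max(axis=1, keepdims=True) / 127.0
    print("max abs error:", np.abs(w8a8_matmul(x, w, x_scale, w_scales) - x @ w.T).max())
```

This is why w8a8 needs a representative calibration file while w8a16 does not: a bad activation scale (e.g. from an outlier-heavy batch) directly degrades every matmul.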
```shell
# 4. Copy the model configuration
cp /opt/files/pretrained_models/Qwen2-72B-Chat/*.json /opt/files/pretrained_models/Qwen2-72B-Chat-w8a8/
cp /opt/files/pretrained_models/Qwen2-72B-Chat/*.txt /opt/files/pretrained_models/Qwen2-72B-Chat-w8a8/
```

#5. Add the quantization flag to the model configuration
Add the following to /opt/files/pretrained_models/Qwen2-72B-Chat-w8a8/config.json:

```json
"quantize": "w8a8",
```