LLM Training/Inference Adaptation - [Ascend 910B] - Quantized LLM Inference - qwen2-72B


Inference framework: MindIE
MindIE currently supports the w8a8 and w8a16 quantization modes; this note walks through both for Qwen2-72B.

0. Environment

Software versions:
Driver: 24.1.rc1
CANN: 8.0.T37
MindIE: 1.0.T61.B010
mindie-rt: 1.0.T61.B010
mindie-torch: 1.0.T61.B010
mindie-service: 1.0.T61.B010
mindie-llm: 1.0.T61.B010

Build your own image, or use an image published on ascendhub.

1. qwen2-72B-Chat w8a16 quantization


#1. Start the container
docker run --rm  -it -u root --name=mindie_t61 --net=host --privileged=true -w /home --device=/dev/davinci_manager --device=/dev/devmm_svm --device=/dev/hisi_hdc -v /usr/local/Ascend/driver:/usr/local/Ascend/driver -v /usr/local/Ascend/add-ons/:/usr/local/Ascend/add-ons/  -v /usr/local/sbin/:/usr/local/sbin/ -v /var/log/npu/slog/:/var/log/npu/slog -v /var/log/npu/profiling/:/var/log/npu/profiling -v /var/log/npu/dump/:/var/log/npu/dump -v /var/log/npu/:/usr/slog -v /etc/hccn.conf:/etc/hccn.conf -v /path/to/local/workspace/:/opt/files -v /tmp:/tmp  mindie-service:t61 /bin/bash
#2. Set environment variables
source /usr/local/Ascend/ascend-toolkit/set_env.sh
source /usr/local/Ascend/mindie/set_env.sh
source /usr/local/Ascend/mindie/latest/mindie-service/set_env.sh
source /opt/atb-models/set_env.sh
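Before running the quantization script, it can help to confirm that the `set_env.sh` scripts actually took effect in the container. A minimal sketch (the variable names below are what CANN's `set_env.sh` typically exports; they may differ between versions, so treat this as a heuristic check only):

```python
import os

# Variables commonly exported by the CANN / MindIE set_env.sh scripts.
# Names can vary between versions; this is a heuristic sanity check only.
candidates = ["ASCEND_TOOLKIT_HOME", "ASCEND_HOME_PATH", "LD_LIBRARY_PATH"]

report = {name: os.environ.get(name, "<unset>") for name in candidates}
for name, value in report.items():
    print(f"{name} = {value}")
```

If a variable prints `<unset>`, re-source the corresponding script before proceeding.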
#3. Run the quantization script

cd /opt/files/src/infer/mindie_llm/atb_models/latest/examples/models/qwen/
python quant_qwen2_72b_w8a16-fast.py /opt/files/pretrained_models/Qwen2-72B-Chat /opt/files/pretrained_models/Qwen2-72B-Chat-w8a16
...
2024-08-07 01:44:35,764 - modelslim-logger - INFO - Calibration end!
2024-08-07 01:44:35,779 - modelslim-logger - WARNING - write directory not exists, creating directory /opt/files/pretrained_models/Qwen2-72B-Chat-w8a16
2024-08-07 01:45:29,963 - modelslim-logger - INFO - invalid safetensors_name, set safetensors_name to default quant_model_weight_w8a16.safetensors
2024-08-07 01:45:29,964 - modelslim-logger - INFO - invalid json_name, set json_name to default quant_model_description_w8a16.json
2024-08-07 01:45:30,274 - modelslim-logger - INFO - Path of quant_model_weight.safetensors is /opt/files/pretrained_models/Qwen2-72B-Chat-w8a16/quant_model_weight_w8a16.safetensors

#4. Copy the model config files
cp /opt/files/pretrained_models/Qwen2-72B-Chat/*.json /opt/files/pretrained_models/Qwen2-72B-Chat-w8a16/
cp /opt/files/pretrained_models/Qwen2-72B-Chat/*.txt /opt/files/pretrained_models/Qwen2-72B-Chat-w8a16/

#5. Add the quantization flag to the model config
Add the following to /opt/files/pretrained_models/Qwen2-72B-Chat-w8a16/config.json:

  "quantize":"w8a16",

2. qwen2-72B-sft w8a8 quantization

Steps 1-2 are the same as for w8a16.

#3. Run the quantization script
python convert_quant_weights.py --model_path /opt/files/pretrained_models/Qwen2-72B-Chat --save_directory /opt/files/pretrained_models/Qwen2-72B-Chat-w8a8/ --w_bit 8 --a_bit 8 --disable_level L0 --device_type npu --calib_file ../../convert/model_slim/boolq.jsonl 
...
100%|██████████████████████████████████████████████████████████████████████| 50/50 [01:55<00:00,  2.30s/it]
2024-08-19 05:51:11,629 - msmodelslim-logger - INFO - Calibration end!
2024-08-19 05:51:11,732 - msmodelslim-logger - INFO - write directory exists, write file to directory /opt/files/pretrained_models/Qwen2-72B-Chat-w8a8/
2024-08-19 05:51:55,977 - msmodelslim-logger - INFO - invalid safetensors_name, set safetensors_name to default quant_model_weight_w8a8.safetensors
2024-08-19 05:51:55,978 - msmodelslim-logger - INFO - invalid json_name, set json_name to default quant_model_description_w8a8.json
2024-08-19 05:51:58,881 - msmodelslim-logger - INFO - Path of quant_model_weight.safetensors is /opt/files/pretrained_models/Qwen2-72B-Chat-w8a8//quant_model_weight_w8a8.safetensors 
2024-08-19 05:52:53,524 - msmodelslim-logger - INFO - Save quant_model_weight.safetensors success!
2024-08-19 05:52:53,524 - msmodelslim-logger - INFO - Path of quant_model_description_json is /opt/files/pretrained_models/Qwen2-72B-Chat-w8a8//quant_model_description_w8a8.json 
2024-08-19 05:52:53,557 - msmodelslim-logger - INFO - Save quant_model_description_json success!

#4. Copy the model config files
cp /opt/files/pretrained_models/Qwen2-72B-Chat/*.json /opt/files/pretrained_models/Qwen2-72B-Chat-w8a8/
cp /opt/files/pretrained_models/Qwen2-72B-Chat/*.txt /opt/files/pretrained_models/Qwen2-72B-Chat-w8a8/

#5. Add the quantization flag to the model config
Add the following to /opt/files/pretrained_models/Qwen2-72B-Chat-w8a8/config.json:

  "quantize":"w8a8",
