[PaddlePaddle text classification sample code] Fine-tuning a pretrained model for Chinese text classification
1. Hardware and software environment
GPU: a single NVIDIA RTX 3090
Driver: 460.73.01
CUDA version: 11.2
cuDNN version: 8.2.0
PaddlePaddle version: paddlepaddle-gpu==2.1.3
2. Set up the environment
# Create and activate a conda environment
conda create -n paddlenlp python=3.7
conda activate paddlenlp
# Install the GPU build of PaddlePaddle, then PaddleNLP
conda install paddlepaddle-gpu==2.1.3 cudatoolkit=11.2 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/ -c conda-forge
pip install --upgrade paddlenlp
# Get the PaddleNLP text classification example code
git clone https://github.com/PaddlePaddle/PaddleNLP.git
cd PaddleNLP/examples/text_classification/pretrained_models/
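Before training, it can help to confirm that the GPU build of PaddlePaddle is the one actually imported inside the new environment. A minimal sketch, assuming it is run with the `paddlenlp` conda environment active; `check_paddle_install` is a hypothetical helper name, while `paddle.is_compiled_with_cuda()` and `paddle.utils.run_check()` are PaddlePaddle 2.x functions.

```python
import importlib.util


def check_paddle_install(run_full_check=False):
    """Return True if paddle imports and was built with CUDA support.

    Hypothetical helper: returns False instead of raising when paddle
    is not installed in the current environment.
    """
    if importlib.util.find_spec("paddle") is None:
        print("paddle is not installed in this environment")
        return False
    import paddle

    print("paddle version:", paddle.__version__)
    cuda_ok = bool(paddle.is_compiled_with_cuda())
    print("compiled with CUDA:", cuda_ok)
    if run_full_check:
        # Runs PaddlePaddle's built-in installation self-test on the available devices.
        paddle.utils.run_check()
    return cuda_ok


if __name__ == "__main__":
    check_paddle_install(run_full_check=True)
```

If this prints `compiled with CUDA: False`, the CPU package was most likely installed instead of `paddlepaddle-gpu`.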
3. Train the model
python -m paddle.distributed.launch --gpus "0" train.py --device gpu --save_dir ./checkpoints
Training output (end of the log):
eval loss: 0.22238, accu: 0.94167
global step 810, epoch: 3, batch: 210, loss: 0.02384, accu: 0.98438, speed: 3.04 step/s
global step 820, epoch: 3, batch: 220, loss: 0.05576, accu: 0.97969, speed: 16.15 step/s
global step 830, epoch: 3, batch: 230, loss: 0.03062, accu: 0.98229, speed: 16.15 step/s
global step 840, epoch: 3, batch: 240, loss: 0.06318, accu: 0.98438, speed: 16.10 step/s
global step 850, epoch: 3, batch: 250, loss: 0.16337, accu: 0.98438, speed: 15.47 step/s
global step 860, epoch: 3, batch: 260, loss: 0.07645, accu: 0.98490, speed: 15.83 step/s
global step 870, epoch: 3, batch: 270, loss: 0.02989, accu: 0.98661, speed: 16.12 step/s
global step 880, epoch: 3, batch: 280, loss: 0.14670, accu: 0.98711, speed: 15.94 step/s
global step 890, epoch: 3, batch: 290, loss: 0.06679, accu: 0.98507, speed: 16.11 step/s
global step 900, epoch: 3, batch: 300, loss: 0.07022, accu: 0.98594, speed: 16.18 step/s
eval loss: 0.21322, accu: 0.94667
INFO 2021-10-21 10:49:35,730 launch.py:268] Local processes completed.
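The log mixes per-step lines (`global step ...`) with evaluation lines (`eval loss ...`). To compare runs or plot curves, those metrics can be pulled out with two regular expressions. A minimal sketch, assuming the exact line format shown above; `parse_log`, `STEP_RE` and `EVAL_RE` are hypothetical helpers, not part of train.py.

```python
import re

# Step lines, e.g.
# "global step 900, epoch: 3, batch: 300, loss: 0.07022, accu: 0.98594, speed: 16.18 step/s"
STEP_RE = re.compile(
    r"global step (?P<step>\d+), epoch: (?P<epoch>\d+), batch: (?P<batch>\d+), "
    r"loss: (?P<loss>[\d.]+), accu: (?P<accu>[\d.]+), speed: (?P<speed>[\d.]+) step/s"
)
# Evaluation lines, e.g. "eval loss: 0.21322, accu: 0.94667"
EVAL_RE = re.compile(r"eval loss: (?P<loss>[\d.]+), accu: (?P<accu>[\d.]+)")


def parse_log(lines):
    """Split training-log lines into step records and evaluation records."""
    steps, evals = [], []
    for line in lines:
        m = STEP_RE.search(line)
        if m:
            steps.append({
                "step": int(m["step"]),
                "epoch": int(m["epoch"]),
                "batch": int(m["batch"]),
                "loss": float(m["loss"]),
                "accu": float(m["accu"]),
                "speed": float(m["speed"]),
            })
            continue
        m = EVAL_RE.search(line)
        if m:
            evals.append({"loss": float(m["loss"]), "accu": float(m["accu"])})
    return steps, evals


if __name__ == "__main__":
    log = [
        "eval loss: 0.22238, accu: 0.94167",
        "global step 900, epoch: 3, batch: 300, loss: 0.07022, accu: 0.98594, speed: 16.18 step/s",
        "eval loss: 0.21322, accu: 0.94667",
    ]
    steps, evals = parse_log(log)
    print(steps[-1]["step"], evals[-1]["accu"])  # 900 0.94667
```

Lines that match neither pattern (such as the `INFO ... launch.py` line) are simply skipped.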
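4. Predict with the exported model
The command below assumes the trained model has already been exported as an inference model (inference.pdmodel and inference.pdiparams) to ./export.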
python deploy/python/predict.py --model_dir=./export
Output:
[2021-10-21 10:55:41,299] [ INFO] - Already cached /home/ubuntu/.paddlenlp/models/ernie-tiny/vocab.txt
[2021-10-21 10:55:41,299] [ INFO] - Already cached /home/ubuntu/.paddlenlp/models/ernie-tiny/spm_cased_simp_sampled.model
[2021-10-21 10:55:41,299] [ INFO] - Already cached /home/ubuntu/.paddlenlp/models/ernie-tiny/dict.wordseg.pickle
W1021 10:55:44.848312 31977 device_context.cc:404] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 11.2, Runtime API Version: 11.2
W1021 10:55:44.850705 31977 device_context.cc:422] device: 0, cuDNN Version: 8.1.
Data: 这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般 Label: negative
Data: 怀着十分激动的心情放映,可是看着看着发现,在放映完毕后,出现一集米老鼠的动画片!开始还怀疑是不是赠送的个别现象,可是后来发现每张DVD后面都有!真不知道生产商怎么想的,我想看的是猫和老鼠,不是米老鼠!如果厂家是想赠送的话,那就全套米老鼠和唐老鸭都赠送,只在每张DVD后面添加一集算什么??简直是画蛇添足!! Label: negative
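The prediction script turns the model's output scores into the `Label:` strings shown above. A pure-Python sketch of that kind of post-processing, assuming a two-class model in which index 0 means negative and index 1 means positive (this mapping is an assumption, not read from predict.py); `softmax` and `logits_to_labels` are hypothetical helper names.

```python
import math

# Assumed label mapping for the two-class sentiment task.
LABEL_MAP = {0: "negative", 1: "positive"}


def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_labels(batch_logits, label_map=LABEL_MAP):
    """Map each row of logits to (label, probability of that label)."""
    results = []
    for row in batch_logits:
        probs = softmax(row)
        idx = max(range(len(probs)), key=probs.__getitem__)
        results.append((label_map[idx], probs[idx]))
    return results


if __name__ == "__main__":
    for label, prob in logits_to_labels([[2.0, -1.0], [-0.5, 1.5]]):
        print(label, round(prob, 4))
```

Taking the argmax of the softmax gives the same label as the argmax of the raw logits; the softmax only adds a probability that can be used as a confidence threshold.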
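5. Deploy for inference with Paddle Serving
# Install system dependencies: the OpenSSL 1.0.2k shared libraries (writing to /usr/lib requires root)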
wget https://paddle-serving.bj.bcebos.com/others/centos_ssl.tar && \
tar xf centos_ssl.tar && rm -rf centos_ssl.tar && \
mv libcrypto.so.1.0.2k /usr/lib/libcrypto.so.1.0.2k && mv libssl.so.1.0.2k /usr/lib/libssl.so.1.0.2k && \
ln -sf /usr/lib/libcrypto.so.1.0.2k /usr/lib/libcrypto.so.10 && \
ln -sf /usr/lib/libssl.so.1.0.2k /usr/lib/libssl.so.10 && \
ln -sf /usr/lib/libcrypto.so.10 /usr/lib/libcrypto.so && \
ln -sf /usr/lib/libssl.so.10 /usr/lib/libssl.so
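After these commands, the dynamic loader should be able to open the `.so.10` names created by the symlinks. A quick sketch to check that from Python, relying on `ctypes.CDLL` raising `OSError` when a library cannot be loaded; `missing_libs` is a hypothetical helper.

```python
import ctypes

# Sonames created by the symlink commands above.
REQUIRED_LIBS = ["libcrypto.so.10", "libssl.so.10"]


def missing_libs(names=REQUIRED_LIBS):
    """Return the library names the dynamic loader cannot open."""
    missing = []
    for name in names:
        try:
            ctypes.CDLL(name)
        except OSError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_libs()
    print("all libraries found" if not missing else f"missing: {missing}")
```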
# Install the Paddle Serving Python packages
pip install paddle-serving-app paddle-serving-client paddle-serving-server-gpu
# Export the model and configuration files for Serving
python -u deploy/serving/export_servable_model.py \
--inference_model_dir ./export/ \
--model_file inference.pdmodel \
--params_file inference.pdiparams
# Start the server
python -m paddle_serving_server.serve \
--model ./serving_server \
--port 8090 \
--gpu_id 0
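Before running the client, you may want to confirm that something is listening on port 8090. A minimal standard-library sketch; it only checks TCP reachability, not whether the model loaded correctly, and `server_is_listening` is a hypothetical helper name.

```python
import socket


def server_is_listening(host="127.0.0.1", port=8090, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused and timeouts (socket.timeout subclasses OSError).
        return False


if __name__ == "__main__":
    if server_is_listening():
        print("server up")
    else:
        print("server not reachable on 127.0.0.1:8090")
```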
6. Client-side prediction
python deploy/serving/client.py \
--client_config_file ./serving_client/serving_client_conf.prototxt \
--server_ip_port 127.0.0.1:8090 \
--max_seq_length 128
Prediction output:
[2021-10-20 16:51:27,305] [ INFO] - Already cached /home/ubuntu/.paddlenlp/models/ernie-tiny/vocab.txt
[2021-10-20 16:51:27,306] [ INFO] - Already cached /home/ubuntu/.paddlenlp/models/ernie-tiny/spm_cased_simp_sampled.model
[2021-10-20 16:51:27,306] [ INFO] - Already cached /home/ubuntu/.paddlenlp/models/ernie-tiny/dict.wordseg.pickle
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1020 16:51:30.886700 29786 naming_service_thread.cpp:209] brpc::policy::ListNamingService("127.0.0.1:8090"): added 1
Data: 这个宾馆比较陈旧了,特价的房间也很一般。总体来说一般 Label: negative
Data: 怀着十分激动的心情放映,可是看着看着发现,在放映完毕后,出现一集米老鼠的动画片 Label: positive
Data: 作为老的四星酒店,房间依然很整洁,相当不错。机场接机服务很好,可以在车上办理入住手续,节省时间。 Label: positive
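The client command sets `--max_seq_length 128`, which caps the number of tokens per input. A pure-Python sketch of that kind of length handling on already-tokenized ids, assuming BERT-style `[CLS] ... [SEP]` framing; `encode_batch` and the ids used for `[CLS]`, `[SEP]` and padding are hypothetical placeholders, not ERNIE-Tiny's real vocabulary ids.

```python
def encode_batch(batch_token_ids, max_seq_length=128, cls_id=1, sep_id=2,
                 pad_id=0, pad_to_max=False):
    """Frame, truncate and pad a batch of token-id lists.

    cls_id, sep_id and pad_id are hypothetical placeholder ids.
    """
    if max_seq_length < 2:
        raise ValueError("max_seq_length must leave room for [CLS] and [SEP]")
    framed = []
    for ids in batch_token_ids:
        # Truncate the content first so [CLS] and [SEP] always survive.
        body = list(ids[: max_seq_length - 2])
        framed.append([cls_id] + body + [sep_id])
    # Pad to the longest sequence in the batch, or to max_seq_length.
    width = max_seq_length if pad_to_max else max(len(x) for x in framed)
    return [x + [pad_id] * (width - len(x)) for x in framed]


if __name__ == "__main__":
    print(encode_batch([[5, 6, 7], [8]], max_seq_length=4))
    # [[1, 5, 6, 2], [1, 8, 2, 0]]
```

Padding to the longest sequence in the batch keeps short batches cheap; padding every sequence to `max_seq_length` (`pad_to_max=True`) gives fixed-shape inputs instead.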