PaddleOCR table recognition: Docker deployment (CPU version)

hongkid

Last modified 2024-09-27 15:47:21

Tags: docker, container operations, PaddleOCR, table recognition

First published 2024-09-27 14:55:36

Article link: https://blog.csdn.net/hongkid/article/details/142591838


Prerequisites

CentOS 7

Docker

Pull the image

docker pull registry.baidubce.com/paddlepaddle/paddle:2.6.1

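Reference: the "Get Started" page on the PaddlePaddle (飞桨) website

The image pulled here cannot be used right away: it only ships the runtime environment, and PaddleOCR itself still has to be set up.

Clone the PaddleOCR source code into any directory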

git clone https://gitee.com/paddlepaddle/PaddleOCR.git

Start the container and enter it

docker run -p 9997:9997 --name paddle -it -v $PWD:/paddle registry.baidubce.com/paddlepaddle/paddle:2.6.1 /bin/bash

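After startup you are dropped into a shell inside the container. In my example, /paddle/ inside the container corresponds to /home/paddleocr/ on the host (the directory the docker run command was issued from).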
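Since later steps mix host paths and container paths, here is a minimal sketch of how the two relate under the bind mount above. The function name `to_host_path` is hypothetical, and the default roots assume the same /paddle and /home/paddleocr mapping as in my example; adjust them if you ran the command from a different directory.

```python
from pathlib import PurePosixPath


def to_host_path(container_path, container_root="/paddle", host_root="/home/paddleocr"):
    """Translate a path inside the container to the matching path on the host.

    Assumes the container was started with `-v <host_root>:<container_root>`.
    Raises ValueError if the path is not under the mounted directory.
    """
    relative = PurePosixPath(container_path).relative_to(container_root)
    return str(PurePosixPath(host_root) / relative)


print(to_host_path("/paddle/PaddleOCR/requirements.txt"))
```

Anything written under /paddle in the container (cloned code, downloaded models, edited configs) therefore survives the container being removed, because it lives on the host.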

All commands below are run inside the container.

Install dependencies

cd /paddle/PaddleOCR 

pip3 install -r requirements.txt

Download the models

cd /paddle/PaddleOCR 
mkdir inference && cd inference

# Download and extract the OCR text detection model
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar && tar -xf ch_PP-OCRv3_det_infer.tar

# Download and extract the OCR text recognition model
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar && tar -xf ch_PP-OCRv3_rec_infer.tar

# Download and extract the OCR text direction classification model
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar && tar -xf ch_ppocr_mobile_v2.0_cls_infer.tar

# Download and extract the SLANet-based Chinese table recognition model
wget https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/ch_ppstructure_mobile_v2.0_SLANet_infer.tar && tar -xf ch_ppstructure_mobile_v2.0_SLANet_infer.tar

# Download and extract the PP-Structure layout analysis model
wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_infer.tar && tar -xf picodet_lcnet_x1_0_fgd_layout_infer.tar
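A failed download or extraction is easy to miss in the wget output, so a quick check before moving on can save time. The sketch below assumes each archive unpacks into a directory named after the archive without the .tar suffix; `missing_model_dirs` is a hypothetical helper, not part of PaddleOCR.

```python
from pathlib import Path

# Directory names derived from the five archives downloaded above
# (assumption: each .tar unpacks into a directory of the same name).
EXPECTED_MODEL_DIRS = [
    "ch_PP-OCRv3_det_infer",
    "ch_PP-OCRv3_rec_infer",
    "ch_ppocr_mobile_v2.0_cls_infer",
    "ch_ppstructure_mobile_v2.0_SLANet_infer",
    "picodet_lcnet_x1_0_fgd_layout_infer",
]


def missing_model_dirs(inference_dir, expected=EXPECTED_MODEL_DIRS):
    """Return the expected model directories that do not exist under inference_dir."""
    root = Path(inference_dir)
    return [name for name in expected if not (root / name).is_dir()]


# Example, run inside the container:
# print(missing_model_dirs("/paddle/PaddleOCR/inference"))
```

An empty list means all five directories are present; any name that is printed should be downloaded and extracted again.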

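Modify the table recognition service (structure_table) configuration

/home/paddleocr/PaddleOCR/deploy/hubserving/structure_table/params.py

This is the host path; inside the container the same file is under /paddle/PaddleOCR/. By default, structure_table is configured with the English table recognition model and the English dictionary. Switch both to the Chinese table recognition model and the matching Chinese dictionary file, then save.

 # Set the model path to ./inference/ch_ppstructure_mobile_v2.0_SLANet_infer/
 # Set the dictionary path to ./ppocr/utils/dict/table_structure_dict_ch.txt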
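To make the two changes concrete, here is a rough sketch of what the edited settings could look like. The attribute names `table_model_dir` and `table_char_dict_path` are my assumption of what params.py uses, so check them against your checkout; `SimpleNamespace` is only a stand-in for the real config object so the snippet runs on its own, and `use_chinese_table_model` is a hypothetical helper.

```python
from types import SimpleNamespace


def use_chinese_table_model(cfg):
    """Point the table-structure settings at the Chinese SLANet model and dictionary.

    Attribute names are assumed, not confirmed; verify them in params.py.
    """
    cfg.table_model_dir = "./inference/ch_ppstructure_mobile_v2.0_SLANet_infer/"
    cfg.table_char_dict_path = "./ppocr/utils/dict/table_structure_dict_ch.txt"
    return cfg


# Stand-in for the config object that params.py builds.
cfg = use_chinese_table_model(SimpleNamespace())
print(cfg.table_model_dir)
print(cfg.table_char_dict_path)
```

Both paths are relative, so they resolve against the directory the service is started from, which in this walkthrough is /paddle/PaddleOCR.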

Modify the structure_system service configuration

/home/paddleocr/PaddleOCR/deploy/hubserving/structure_system/params.py

 cfg.layout_model_dir = './inference/picodet_lcnet_x1_0_fgd_layout_infer/'
 cfg.layout_dict_path = './ppocr/utils/dict/layout_dict/layout_publaynet_dict.txt'

Install PaddleHub

 pip3 install paddlehub

Install the HubServing services

cd /paddle/PaddleOCR

# Install the ocr_system service
hub install deploy/hubserving/ocr_system
# Install the structure_table service
hub install deploy/hubserving/structure_table
# Install the structure_system service
hub install deploy/hubserving/structure_system

You may run into a few pitfalls here

1. The protobuf version needs to be downgraded

Fix:

pip uninstall protobuf
pip install protobuf==3.20.2

2. cannot import name 'preserve_channel_dim' from 'albucore.utils'

Fix:

pip install albucore==0.0.16

After that, the services install normally.
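Before retrying the install, it can help to confirm that the two pins actually took effect. A minimal sketch using the standard library's `importlib.metadata`; the `check_pins` helper and the `PINNED` table are hypothetical names, and the versions are simply the ones used in the fixes above.

```python
from importlib import metadata

# Versions taken from the two fixes above.
PINNED = {"protobuf": "3.20.2", "albucore": "0.0.16"}


def check_pins(pins=PINNED, get_version=metadata.version):
    """Map each package to (installed_version_or_None, matches_pin)."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        report[name] = (installed, installed == wanted)
    return report


for package, (installed, ok) in check_pins().items():
    print(f"{package}: installed={installed} pinned={PINNED[package]} ok={ok}")
```

A later `pip install` of another package can silently upgrade either dependency again, so it is worth rerunning the check if the import error comes back.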

Start the HubServing services

# Start the ocr_system, structure_table and structure_system services in the background
nohup hub serving start -m ocr_system structure_table structure_system -p 9997 &

# Follow the startup log
tail -f nohup.out
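Loading the models can take a while, so instead of watching the log you can poll the port until it accepts connections. This is a minimal sketch: `wait_for_port` is a hypothetical helper, and an open port only shows that the server is listening, not that every module loaded without errors, so the log is still worth a look.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=9997, timeout=120.0, interval=2.0):
    """Return True once host:port accepts a TCP connection, False after timeout seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Succeeds as soon as something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


# Example:
# if wait_for_port():
#     print("service is listening on 9997")
```

Because the container was started with -p 9997:9997, the same check works from the host as well as from inside the container.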

API endpoints

ocr_system: http://127.0.0.1:9997/predict/ocr_system
structure_table: http://127.0.0.1:9997/predict/structure_table
structure_system: http://127.0.0.1:9997/predict/structure_system

API description

Parameter	Description
Request method	POST
Content-Type	application/json
Body format	{"images": ["<base64-encoded image>"]}
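As an illustration of the request format, here is a small client sketch that uses only the Python standard library. The function names `build_payload` and `recognize` are hypothetical. The response structure is not described in this post, so the sketch simply returns whatever JSON the service sends back.

```python
import base64
import json
import urllib.request

# Endpoint from the list above; change the module name to call another service.
URL = "http://127.0.0.1:9997/predict/structure_table"


def build_payload(image_bytes):
    """Build the JSON request body: {"images": ["<base64 string>"]}."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"images": [encoded]}).encode("utf-8")


def recognize(image_path, url=URL, timeout=60.0):
    """POST one image file to the service and return the parsed JSON response."""
    with open(image_path, "rb") as f:
        body = build_payload(f.read())
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (table.png is a placeholder file name):
# print(json.dumps(recognize("table.png"), ensure_ascii=False, indent=2))
```

The images field is a list, so several base64 strings can go into one request; the sketch sends one image at a time to keep it simple.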

References:

https://blog.csdn.net/zhoushanmin/article/details/142258823

https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6/deploy/hubserving/readme.md

https://github.com/PaddlePaddle/PaddleOCR

"PaddleOCR service deployment (based on PaddleHub Serving)", CSDN blog

https://paddlepaddle.github.io/PaddleOCR/ppocr/infer_deploy/paddle_server.html

https://www.jianshu.com/p/5f39426a9152

"Deploying PaddleOCR with Docker: text recognition in practice", 什么值得买 (network storage section)
