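Preface
Because our company is a 2G (government-facing) business, paid public OCR APIs are off the table (they would also be a security risk), so we tried several open-source OCR frameworks in-house. The first was gosseract, a Go package that wraps Tesseract OCR. With the English model it does somewhat better on text that is mostly digits and letters, but accuracy was still only a little over 80%. After that we put together a simple web service with gunicorn + Flask + PaddleOCR.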
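The post does not say how the "a little over 80%" figure was measured. As a rough sketch, one common way to get such a number is the exact-match rate over a set of labelled images. The function name `exact_match_accuracy` and the sample strings below are hypothetical and only illustrate the calculation; they are not data from our tests.

```python
from typing import Iterable, Tuple


def exact_match_accuracy(pairs: Iterable[Tuple[str, str]]) -> float:
    """Return the fraction of samples whose OCR output equals the expected text.

    Surrounding whitespace is stripped before comparing, because OCR output
    may carry a trailing newline.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one (expected, predicted) pair")
    hits = sum(
        1 for expected, predicted in pairs
        if expected.strip() == predicted.strip()
    )
    return hits / len(pairs)


# Hypothetical labelled samples: (ground truth, OCR output).
samples = [
    ("A1B2C3", "A1B2C3\n"),
    ("X9Y8Z7", "X9Y8Z7"),
    ("Q4W5E6", "Q4W5E6"),
    ("R7T8U9", "R7T8U9"),
    ("M0N1P2", "MON1P2"),  # digit 0 misread as letter O
]

accuracy = exact_match_accuracy(samples)
print(f"exact-match accuracy: {accuracy:.0%}")
```

Exact match is a strict metric: a single wrong character fails the whole sample. For longer text a character-level measure may be more informative, but for short codes made of digits and letters the strict version is usually what matters.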
gosseract (building our own Ubuntu base image)
Dockerfile
RUN echo 'deb http://mirrors.163.com/ubuntu/ bionic main restricted universe multiverse \n\
deb http://mirrors.163.com/ubuntu/ bionic-security main restricted universe multiverse \n\
deb http://mirrors.163.com/ubuntu/ bionic-updates main restricted universe multiverse \n\
deb http://mirrors.163.com/ubuntu/ bionic-proposed main restricted universe multiverse \n\
deb http://mirrors.163.com/ubuntu/ bionic-backports main restricted universe multiverse \n\
deb-src http://mirrors.163.com/ubuntu/ bionic main restricted universe multiverse \n\
deb-src http://mirrors.163.com/u