Recognizing text in travel-code (行程码) images with EasyOCR, a PyTorch-based OCR module, and returning the extracted information as a JSON string via Flask

Author: 全栈工程师修炼指南

Last modified: 2022-05-26 11:09:21


First published: 2022-05-25 23:25:30

Article link: https://blog.csdn.net/u013072756/article/details/124977322


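Contents of this chapter:

Recognizing travel-code images with EasyOCR, a PyTorch-based OCR module

Installation and deployment
Usage in practice
Pitfalls and fixes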

Original article: https://www.weiyigeek.top

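Preface

Description: Our company has a business requirement to recognize travel codes (行程码). We currently call a cloud provider's text recognition API for this, and that API is billed per call. To cut costs, I used Python, borrowed a few functions from an open-source project on GitHub, and wrapped them with the Flask web framework, so that recognition can be requested over HTTP and the result comes back as a JSON string.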

Project URL: https://github.com/JaidedAI/EasyOCR

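Recognizing travel-code images with EasyOCR, a PyTorch-based OCR module

Installation and deployment

Environment dependencies

Python 3.8 or later is recommended (my original environment was Python 3.7, and installation produced so many odd errors that I had to give it up)
flask module
torch and torchvision modules
easyocr module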

Installation steps
Step 01. Install flask, easyocr, and their dependencies.

pip install flask -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
pip install easyocr -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com

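Step 02. To avoid a long wait while the pretrained models are downloaded on first use, you can download them manually and place them in the model directory. Download page: https://www.jaided.ai/easyocr/modelhub/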

# Main models to download
english_g2 : https://github.com/JaidedAI/EasyOCR/releases/download/v1.3/english_g2.zip
zh_sim_g2 : https://github.com/JaidedAI/EasyOCR/releases/download/v1.3/zh_sim_g2.zip
CRAFT : https://github.com/JaidedAI/EasyOCR/releases/download/pre-v1.1.6/craft_mlt_25k.zip

# Model installation location
# Windows
C:\Users\WeiyiGeek\.EasyOCR\model

# Linux
/home/weiyigeek/.EasyOCR/model
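
Before starting the service, it can help to confirm that the unzipped model files are where EasyOCR will look for them. The sketch below is only an illustration: the helper name missing_models is hypothetical, and the three .pth file names are assumed from the archives listed above, so check them against what you actually unzip.

```python
import os

# File names assumed from the archives above; verify them after unzipping.
EXPECTED_MODELS = ("craft_mlt_25k.pth", "english_g2.pth", "zh_sim_g2.pth")


def missing_models(model_dir=None, expected=EXPECTED_MODELS):
    """Return the expected model files that are not present in model_dir.

    When model_dir is None, the per-user directory shown above
    (~/.EasyOCR/model) is checked.
    """
    if model_dir is None:
        model_dir = os.path.join(os.path.expanduser("~"), ".EasyOCR", "model")
    return [
        name
        for name in expected
        if not os.path.isfile(os.path.join(model_dir, name))
    ]
```

An empty list means all three files were found; any names returned still need to be downloaded and unzipped into the model directory.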

Usage in practice

Step 01. The project and the images live under D:\Study\Project

PS D:\Study\Project> ls
    Directory: D:\Study\Project
Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d-----         2022/5/25     15:59                img
d-----         2022/5/25     17:07                upfile
-a----         2022/5/25     19:34           3966 index.py

Step 02. Python code that calls EasyOCR from the Flask web framework to recognize text in images.

# -*- coding: utf-8 -*-
# ####################################################################
# Author: WeiyiGeek
# Description: Flask project that uses easyocr to extract information from Communications Big Data Travel Card (通信行程卡) images.
# Time: 2022-05-25 17:31
# ====================================================================
# Dependencies and module installation; Python 3.8.x is recommended
# pip install flask
# pip install easyocr
# #####################################################################
import re
import os
import glob
import json
import easyocr
from flask import Flask, jsonify, request

app = Flask(__name__)

# Project directory that holds the travel-code images
HOMEDIR = r"D:\Study\Project"

# Create an easyocr Reader for Simplified Chinese and English, running on CPU
reader = easyocr.Reader(['ch_sim', 'en'], gpu=False)

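def information_filter(text_str,file_path):
  """
  Purpose: extract the travel-code fields from the OCR text
  Parameters: the recognized text as one string, and the image file name
  Returns: a dict of the extracted fields
  Note: findall(...)[0] raises IndexError when a field is missing; the caller catches it.
  """
  # Card color field (for example 绿色, green)
  re_healthcode = re.compile(r'请收下(.{,2})行程卡')
  healthcode = re_healthcode.findall(text_str)[0]
  # Phone number field (masked, e.g. 157****2966)
  re_phone = re.compile(r'[0-9]{3}\*{4}[0-9]{4}')
  phone_str = re_phone.findall(text_str)[0]
  # Date field (this pattern only matches dates in 2022)
  re_data = re.compile(r'2022\.[0-1][0-9]\.[0-3][0-9]')
  data_str = re_data.findall(text_str)[0]
  # Time field
  re_time = re.compile(r'[0-9][0-9]:[0-9][0-9]:[0-9][0-9]')
  time_str = re_time.findall(text_str)[0]
  # Cities and regions visited
  citys_re = re.compile(r'到达或途经:(.+)结果包含')
  citys_str = citys_re.findall(text_str)[0].strip().split('(')[0]
  # Keys: 类型 = type (color), 电话 = phone, 日期 = date, 时间 = time, 行程 = itinerary
  result_dic = {"filename": file_path ,"类型": healthcode, "电话": phone_str, "日期": data_str, "时间": time_str, "行程": citys_str}
  print(result_dic)
  return result_dic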

# Flask route - home page
@app.route('/')
@app.route('/index')
def Index():
  return "<h4 style='text-align:center'>https://blog.weiyigeek.top</h4><script>window.location.href='https://blog.weiyigeek.top'</script>"

# Flask route - OCR endpoint
@app.route('/tools/ocr',methods=["GET"])
def Travelcodeocr():
  """
  Request path: /tools/ocr
  Request parameters: ?file=img/test.png (single image) or ?dir=/img/ (every PNG in a directory)
  """
  filename = request.args.get("file")
  dirname = request.args.get("dir")
  if (filename):
    img_path = os.path.join(HOMEDIR, filename)
    print(img_path)  # print the resolved path
    if (os.path.exists(img_path)):
      text = reader.readtext(img_path, detail=0)  # detail=0 returns only the recognized strings
      text_str = "".join(text)
      try:
        result_dic = information_filter(text_str,os.path.basename(img_path))
      except Exception as err:
        print(err)
        return json.dumps({"status":"err", "img": filename}).encode('utf-8'), 200, {"Content-Type":"application/json"}
      return json.dumps(result_dic, ensure_ascii=False).encode('utf-8'), 200, {"Content-Type":"application/json"}
    else:
      return jsonify({"status": "err","msg": "File path "+img_path+" does not exist!"})
  elif (dirname and os.path.isdir(HOMEDIR+dirname)):
    result_dic_all = []
    result_dic_err = []
    img_path_all = glob.glob(os.path.join(HOMEDIR+dirname, "*.png"))  # glob wildcard matching (not regular expressions)
    for img_path in img_path_all:
      print(img_path)  # print the path
      text = reader.readtext(img_path, detail=0)  # accepts an image path or URL and returns a list
      text_str = "".join(text)
      try:
        result_dic = information_filter(text_str,os.path.basename(img_path))
      except Exception as err:
        print(img_path,"-->>",err) # log the failure and move on to the next image
        result_dic_err.append(img_path)
        continue
      result_dic_all.append(result_dic)
    print(result_dic_err)
    return json.dumps(result_dic_all, ensure_ascii=False).encode('utf-8'), 200, {"Content-Type":"application/json"}
  else:
    return jsonify({"status": "err","msg": "Invalid request parameters!"})

# Flask entry point (debug=True is meant for local development only)
if __name__ == '__main__':
  app.run(host='0.0.0.0',port=8000,debug=True)
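
The field extraction in information_filter can be tried without Flask or EasyOCR. The following is a minimal sketch under stated assumptions: SAMPLE_TEXT is a made-up string shaped like joined OCR output, and extract_fields is a hypothetical variant that returns None for a missing field instead of raising IndexError. The date pattern is also widened from 2022 to any 20xx year, which differs from the script above.

```python
import re

# Made-up text shaped like the joined OCR output of a travel-code image.
SAMPLE_TEXT = (
    "请收下绿色行程卡157****2966的动态行程卡"
    "更新于:2022.05.25 09:03:56"
    "您于前14天内到达或途经:重庆市"
    "结果包含您在前14天内到访的国家(地区)与停留4小时以上的国内城市"
)

# One capturing group per field; keys match the JSON keys used by the service.
PATTERNS = {
    "类型": re.compile(r"请收下(.{,2})行程卡"),
    "电话": re.compile(r"([0-9]{3}\*{4}[0-9]{4})"),
    "日期": re.compile(r"(20[0-9]{2}\.[0-1][0-9]\.[0-3][0-9])"),
    "时间": re.compile(r"([0-9]{2}:[0-9]{2}:[0-9]{2})"),
    "行程": re.compile(r"到达或途经:(.+)结果包含"),
}


def extract_fields(text):
    """Return a dict of extracted fields; a field that is not found is None."""
    result = {}
    for key, pattern in PATTERNS.items():
        match = pattern.search(text)
        result[key] = match.group(1).strip() if match else None
    # Drop any parenthesized remark after the city list, as the script does.
    if result["行程"]:
        result["行程"] = result["行程"].split("(")[0]
    return result
```

Returning None per field makes it possible to report which field failed, rather than a single generic error for the whole image.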

Step 03. Run the script, then use a browser to specify a travel-code image path and retrieve the extracted fields.

python .\index.py
  # Using CPU. Note: This module is much faster with a GPU.
  # * Serving Flask app 'index' (lazy loading)
  # * Environment: production
  #   WARNING: This is a development server. Do not use it in a production deployment.
  #   Use a production WSGI server instead.
  # * Debug mode: on
  # * Running on all addresses (0.0.0.0)
  #   WARNING: This is a development server. Do not use it in a production deployment.
  # * Running on http://127.0.0.1:8000
  # * Running on http://10.20.172.106:8000 (Press CTRL+C to quit)
  # * Restarting with stat
  # Using CPU. Note: This module is much faster with a GPU.
  # * Debugger is active!
  # * Debugger PIN: 115-313-307

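Tip: As the Python script above shows, you can use the file parameter to specify a single image path, or the dir parameter to specify a directory of travel-code images.
For example, to get the information from a single travel-code image, I open http://127.0.0.1:8000/tools/ocr?file=img/00e336dbde464c809ef1f6ea568d4621.png in my local browser, which returns the following JSON string.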

{"filename": "00e336dbde464c809ef1f6ea568d4621.png", "类型": "绿色", "电话": "157****2966", "日期": "2022.05.25", "时间": "09:03:56", "行程": "重庆市"}
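
The same request can also be made from a script rather than a browser. This is a minimal sketch using only the standard library; build_ocr_url and fetch_ocr are hypothetical helper names, and the base URL is assumed to be the local development server started above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_ocr_url(base_url, file=None, directory=None):
    """Build the /tools/ocr URL for either a single file or a directory."""
    if bool(file) == bool(directory):
        raise ValueError("pass exactly one of file or directory")
    params = {"file": file} if file else {"dir": directory}
    return base_url.rstrip("/") + "/tools/ocr?" + urlencode(params)


def fetch_ocr(base_url, **kwargs):
    """Call the service and decode its JSON response (needs a running server)."""
    with urlopen(build_ocr_url(base_url, **kwargs)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For instance, fetch_ocr("http://127.0.0.1:8000", file="img/test.png") would request a single image, assuming the server is running and that file exists under the project directory.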

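For example, to get the information from multiple travel-code images, I open http://127.0.0.1:8000/tools/ocr?dir=/img/ in my local browser, which returns a JSON array with one such object for each image that was recognized successfully.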
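
Because the file and dir values are joined onto the project directory as-is, a value containing .. could point outside it. One possible guard is sketched below; resolve_inside is a hypothetical helper that is not part of the original script, and it assumes that leading slashes in the parameter (as in /img/) should be treated as relative to the project directory.

```python
import os


def resolve_inside(base_dir, user_path):
    """Resolve user_path under base_dir, or return None if it escapes base_dir."""
    base = os.path.realpath(base_dir)
    # Treat "/img/x.png" as relative to base_dir rather than as an absolute path.
    candidate = os.path.realpath(os.path.join(base, user_path.lstrip("/\\")))
    try:
        inside = os.path.commonpath([base, candidate]) == base
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        inside = False
    return candidate if inside else None
```

In the route, a None result could be answered with the same "invalid request parameters" error the script already returns.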

Pitfalls and fixes

Problem 1. Installing the offline easyocr whl package with pip install reports ERROR: No matching distribution found for torch

Error message:

pip install ./easyocr-1.4.2-py3-none-any.whl -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
ERROR: Could not find a version that satisfies the requirement torch (from easyocr) (from versions: none)
ERROR: No matching distribution found for torch

Solution: upgrade pip with python.exe -m pip install --upgrade pip

Problem 2. In a Python 3.7 environment, installing the whl package for torch (a dependency of easyocr) reports "not a supported wheel on this platform."

Error message:

$ pip install torch-1.8.0+cpu-cp37-cp37m-win_amd64.whl -i https://pypi.tuna.tsinghua.edu.cn/simple/
WARNING: Requirement 'torch-1.8.0+cpu-cp37-cp37m-win_amd64.whl' looks like a filename, but the file does not exist
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple/
ERROR: torch-1.8.0+cpu-cp37-cp37m-win_amd64.whl is

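Cause: normally, the downloaded whl does not match the platform. The problem I ran into was clearly not caused by that; after searching on Baidu, I suspect it comes from the combination of pip version, Python version, and system platform.
Solutions:

# Solution 1. For example, if you are on Linux, you can find the build you need at https://download.pytorch.org/whl/torch_stable.html
Filename format: CPU or GPU build / package name - version - Python version - ABI tag (presumably the build format) - platform and CPU architecture (choose amd64 for 64-bit Intel CPUs as well)
# torch-1.8.0+cpu-cp37-cp37m-win_amd64.whl

# Solution 2. Rename torch-1.8.0+cpu-cp37-cp37m-win_amd64.whl to torch-1.8.0+cpu-cp37-cp37m-win32.whl. Note that renaming only gets past pip's platform check; the binaries inside the wheel are still 64-bit, so they will not load in a 32-bit Python.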
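
To see whether a wheel matches the running interpreter, the tags in its file name can be split out and compared with the interpreter's word size. The sketch below follows the standard wheel naming convention (name-version[-build]-python tag-ABI tag-platform tag); the helper names parse_wheel_filename and interpreter_bits are hypothetical.

```python
import struct


def parse_wheel_filename(filename):
    """Split a wheel file name into the parts of the wheel naming convention."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, py_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        # An optional build tag sits between the version and the Python tag.
        name, version, build, py_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected wheel file name: " + filename)
    return {
        "name": name,
        "version": version,
        "build": build,
        "python": py_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def interpreter_bits():
    """Return 32 or 64 depending on the pointer size of the running Python."""
    return struct.calcsize("P") * 8
```

If interpreter_bits() returns 32 while the wheel's platform tag is win_amd64, the wheel was built for a 64-bit interpreter and pip will reject it.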

Problem 3. Running a py script that imports torch reports Error loading "D:\****\lib\site-packages\torch\lib\asmjit.dll" or one of its dependencies.

Error message:

Microsoft Visual C++ Redistributable is not installed, this may lead to the DLL load failure.
It can be downloaded at https://aka.ms/vs/16/release/vc_redist.x64.exe
Traceback (most recent call last):
.....
OSError: [WinError 193] <no description> Error loading "D:\Program Files (x86)\Python37-32\lib\site-packages\torch\lib\asmjit.dll" or one of its dependencies.

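Solution: download and install the missing C++ runtime from https://aka.ms/vs/16/release/vc_redist.x64.exe, then restart the computer. If the error persists, note that WinError 193 usually indicates a 32-bit/64-bit mismatch, and the path above (Python37-32) points to a 32-bit Python; in that case a 64-bit Python is likely needed.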

Problem 4. While installing dependencies (at the opencv_python_headless step), pip reports ERROR: No matching distribution found for torchvision>=0.5

Error message:

Using cached https://mirrors.aliyun.com/pypi/packages/a4/0a/39b102047bcf3b1a58ee1cc83a9269b2a2c4c1ab3062a65f5292d8df6594/opencv_python_headless-4.5.4.60-cp37-cp37m-win32.whl (25.8 MB)
ERROR: Could not find a version that satisfies the requirement torchvision>=0.5 (from easyocr) (from versions: 0.1.6, 0.1.7, 0.1.8, 0.1.9, 0.2.0, 0.2.1, 0.2.2, 0.2.2.post2, 0.2.2.post3)
ERROR: No matching distribution found for torchvision>=0.5