[Daily Tinkering] Recognizing text in images with Python and renaming the images

豆奶豆豆奶

Last modified 2022-07-28 10:29:00

Categories: Daily Tinkering DIY; Tips and Notes, Absorbing Magic. Tags: python, programming language, opencv

First published 2022-07-27 23:59:43

Article link: https://blog.csdn.net/Doudou_Nai/article/details/119987553

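1. Preface

I was given a task: a set of patents is stored as image files, and each file has to be renamed to the title of the patent it contains. The result looks like this.

[Image: the image files after renaming]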

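2. Preparation

Install OpenCV and Tesseract. Tesseract needs the Chinese language data (the code below uses chi_sim, Simplified Chinese). I recommend working in PyCharm and installing the Python packages directly with pip. The code in this post imports only pytesseract and PIL (Pillow).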
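
Before running any OCR it can save time to confirm that the pieces are actually in place. pytesseract is only a wrapper, so the Tesseract program itself has to be installed separately from the pip packages. The check below is a minimal sketch that uses only the standard library; the function name check_environment is my own, and it assumes the Tesseract executable is reachable through PATH under the name tesseract.

```python
import importlib.util
import shutil


def check_environment() -> dict:
    """Report which parts of the OCR setup can be found on this machine."""
    return {
        # The Tesseract program itself (assumed to be on PATH as "tesseract")
        "tesseract_on_path": shutil.which("tesseract") is not None,
        # The Python wrapper installed with: pip install pytesseract
        "pytesseract_installed": importlib.util.find_spec("pytesseract") is not None,
        # Pillow, which provides the PIL module: pip install pillow
        "pillow_installed": importlib.util.find_spec("PIL") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'OK' if ok else 'missing'}")
```

If tesseract_on_path comes back as missing even though Tesseract is installed, the install folder is probably not on PATH.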

3. Text recognition


# Imports
import pytesseract
from PIL import Image

# Run Tesseract OCR on the image; the argument to Image.open() is the path of the target image
image = Image.open('C:/Users/yangp/Desktop/2ext/新建文件夹/202207.jpg')
word = pytesseract.image_to_string(image, lang='chi_sim')
print(word)

The code above runs text recognition on any image; readers can try it for themselves.
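
The same call can be wrapped in a small function so the image path and language become parameters. This is only a sketch: the name ocr_image and the tesseract_cmd parameter are my own additions. The one extra thing it does is set pytesseract.pytesseract.tesseract_cmd, which is how pytesseract is told where the Tesseract executable lives when it is not on PATH (a common situation on Windows).

```python
from pathlib import Path
from typing import Optional


def ocr_image(path: Path, lang: str = "chi_sim",
              tesseract_cmd: Optional[str] = None) -> str:
    """Return the text Tesseract recognizes in the image at `path`."""
    # Imported here so the function can be defined even where the OCR
    # packages are not installed; they are only needed when it is called.
    import pytesseract
    from PIL import Image

    if tesseract_cmd:
        # Point pytesseract at the Tesseract executable explicitly
        pytesseract.pytesseract.tesseract_cmd = tesseract_cmd

    # The with-block closes the image file once recognition is done
    with Image.open(path) as image:
        return pytesseract.image_to_string(image, lang=lang)
```

Calling ocr_image with the image path from the code above should give the same text as the original three lines.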

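4. Text filtering

Most of the text is recognized, so we could simply copy the patent title out of the output and use it to rename the image. Doing it that way is slow, though, and the recognized text is full of stray spaces.

[Image: the raw OCR output]

So we add a filter function, remove. Each call such as string.replace("|", "") replaces the first argument (a space, for example) with the empty string, which deletes that character.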

def remove(string):
    # Delete spaces, newlines and punctuation noise from the OCR output
    string = string.replace(" ", "")
    string = string.replace("|", "")
    string = string.replace(":", "")
    string = string.replace(";", "")
    string = string.replace(".", "")
    string = string.replace("“", "")
    return string.replace("\n", "")
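
The chain of replace calls can also be written as a single regular expression, which makes it easier to add more characters later. This is an alternative sketch, not the version used in the rest of the post. The names NOISE and remove_noise are my own, and the pattern goes slightly beyond the original: \s also removes tabs and full-width spaces, and the closing quote ” is removed together with the opening one.

```python
import re

# The characters remove() deletes, written as one character class.
# Assumptions beyond the original: \s covers every kind of whitespace
# (not just " " and "\n"), and the closing quote ” is stripped as well.
NOISE = re.compile(r"[\s|:;.“”]")


def remove_noise(text: str) -> str:
    """Delete every noise character from the OCR output in one pass."""
    return NOISE.sub("", text)
```

For example, remove_noise(" 一种 无风扇|散热:装置;.\n") returns 一种无风扇散热装置.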

The complete code with filtering is as follows:

import pytesseract
from PIL import Image


def remove(string):
    string = string.replace(" ", "")
    string = string.replace("|", "")
    string = string.replace(":", "")
    string = string.replace(";", "")
    string = string.replace(".", "")
    string = string.replace("“", "")
    return string.replace("\n", "")


image = Image.open('C:/Users/yangp/Desktop/2ext/新建文件夹/202207.jpg')
word = pytesseract.image_to_string(image, lang='chi_sim')
cword = remove(word)[10:500]

print(cword)

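remove(word)[10:500] adds a slice on top of the filter: only the characters at positions 10 through 499 of the filtered string are kept, which cuts down how much text is displayed.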
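
A quick way to see what the slice does is to try it on a short string. The helper name clip is my own; it does nothing beyond the [10:500] slice.

```python
def clip(text: str, start: int = 10, stop: int = 500) -> str:
    """Keep text[start:stop]; start is included, stop is excluded."""
    return text[start:stop]


sample = "0123456789ABCDEFGHIJ"
print(clip(sample))         # ABCDEFGHIJ (the slice simply ends where the string ends)
print(repr(clip("short")))  # '' (fewer than 10 characters, so nothing is left)
```

Slicing never raises an error when the string is shorter than the requested range, so an image with very little text just produces an empty result.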

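[Image: the filtered OCR output]

With the extra characters stripped, the filtered text can be copied directly. One character in this output was recognized wrongly, so the recognition quality is evidently only mediocre.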

6. Renaming the image

Renaming code:

newpath = 'C:/Users/yangp/Desktop/2ext/新建文件夹/'  # folder where the image is saved
newname = "一种低压铸造电机外壳螺旋水道砂芯清理机及其操作方法"  # the new name; change as needed
out = '2' + '+' + newname + '.jpg'
print(out)
image.save(newpath + out)

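newpath: the folder where the renamed image is saved.
newname: the new name. The output file name is a serial number, a plus sign, and the name, for example 2+一种无风扇散热装置.jpg; change the number to number the files.
image.save(newpath + out): saves the image under the new name. Note that this writes a new file next to the original rather than renaming the original in place.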
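
If the goal is to rename the existing file instead of saving a second copy, pathlib can do that without going through Pillow at all. The sketch below is one possible way, not the method used in this post. The names build_name and rename_image are my own, and it also strips the characters that Windows does not allow in file names, since OCR output can easily contain a colon or a vertical bar.

```python
import re
from pathlib import Path

# Characters Windows forbids in file names, plus line breaks and tabs
ILLEGAL = re.compile(r'[\\/:*?"<>|\r\n\t]')


def build_name(index: int, title: str, suffix: str = ".jpg") -> str:
    """Build 'index+title.jpg', dropping characters a file name cannot hold."""
    safe = ILLEGAL.sub("", title).strip()
    return f"{index}+{safe}{suffix}"


def rename_image(src: Path, index: int, title: str) -> Path:
    """Rename src in place (same folder, same extension) and return the new path."""
    target = src.with_name(build_name(index, title, src.suffix))
    if target.exists():
        # Refuse to overwrite a file that already has this name
        raise FileExistsError(target)
    src.rename(target)
    return target
```

Because the file is only renamed, the image data is left untouched; image.save on a JPEG re-encodes the picture.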

7. Complete code

import pytesseract
from PIL import Image


def remove(string):
    string = string.replace(" ", "")
    string = string.replace("|", "")
    string = string.replace(":", "")
    string = string.replace(";", "")
    string = string.replace(".", "")
    string = string.replace("“", "")
    return string.replace("\n", "")


image = Image.open('C:/Users/yangp/Desktop/2ext/新建文件夹/202207.jpg')
word = pytesseract.image_to_string(image, lang='chi_sim')
cword = remove(word)[10:500]

print(cword)

newpath = 'C:/Users/yangp/Desktop/2ext/新建文件夹/'  # folder where the image is saved
newname = "一种无风扇散热装置"  # paste the manually checked title here, or use the OCR result directly: newname = cword
out = '2' + '+' + newname + '.jpg'
print(out)
image.save(newpath + out)

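8. Summary

Ideally the script would pick out just the wanted text in the image and then rename the image automatically. After searching the related literature I found an approach that recognizes text at a specific position in an image. For position-specific recognition, however, it works well on graphics and poorly on text, and because the images are distorted the text recognition results are only mediocre. Given my limited skills, the code here is a half-finished product; I am putting it out as a starting point for discussion.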
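
For anyone who wants to push toward the automatic version, here is a rough sketch of what the outer loop could look like. Everything in it is hypothetical: batch_rename, ocr, pick_title and dry_run are names I made up. The hard part, choosing the title out of the recognized text, is deliberately left as a function you pass in, because this post does not solve it. pick_title is also assumed to return text that is already safe to use in a file name.

```python
from pathlib import Path
from typing import Callable, List, Tuple


def batch_rename(folder: Path,
                 ocr: Callable[[Path], str],
                 pick_title: Callable[[str], str],
                 dry_run: bool = True) -> List[Tuple[str, str]]:
    """Plan (and optionally perform) renames for every .jpg in folder.

    ocr        : returns the recognized text for one image (hypothetical hook)
    pick_title : extracts the title from that text (hypothetical hook)
    dry_run    : when True, only report what would be renamed
    """
    plan = []
    # sorted() builds the full list first, so renaming inside the loop is safe
    for index, src in enumerate(sorted(folder.glob("*.jpg")), start=1):
        title = pick_title(ocr(src))
        if not title:
            continue  # nothing usable was recognized; leave this file alone
        target = src.with_name(f"{index}+{title}{src.suffix}")
        plan.append((src.name, target.name))
        if not dry_run and not target.exists():
            src.rename(target)
    return plan
```

With the code from this post, ocr could be a small function that calls pytesseract.image_to_string on the opened image with lang='chi_sim'. Running with dry_run=True first and reading the planned names is a cheap way to catch bad OCR results before any file is touched.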
