[Automation] Using pytesseract to get the coordinates of specified text in an image and crop the region containing it

Article link: https://blog.csdn.net/qq_31812017/article/details/133875221

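This article shows how to use Python's PIL and pytesseract libraries to locate a specific piece of text, “清君侧”, in an image. It runs a sliding-window search over the per-character box data returned by OCR to find the text's coordinates, then crops the image and saves the matching region.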


Example result

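The goal is to find the exact coordinates of the target text “清君侧” in the image below and crop the region that contains it.
[Image: input picture]
The actual output is as follows:

The cropped image that is saved is as follows: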

Code

#!/usr/bin/env python
# -*- coding:utf-8 -*-
from PIL import Image
import pytesseract


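def find_substring_indicds(str_dict, word):
    """
    Find a target string in the OCR characters using a sliding window.
    :param str_dict: character box data extracted from the image (dict of lists)
    :param word: the string to search for
    :return: (start, end) inclusive indices of the match, or None if not found
    """
    if not word:
        return None
    chars = str_dict['char']
    dict_num = len(chars)
    word_num = len(word)
    start = 0
    end = 0
    while end < dict_num:
        # Text currently covered by the window [start, end]
        window = "".join(chars[start:end+1])
        if window == word:
            print([start, end])
            return start, end
        elif len(window) < word_num:
            # Window is too short: grow it to the right
            end += 1
        else:
            # Window is long enough but does not match: shrink it from the left
            start += 1
    return None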


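# Open the image and convert it to grayscale
img = Image.open('input/纯文本-黑白3.png').convert('L')
# --psm 6 treats the image as a single uniform block of (multi-line) text.
# --oem 2 runs the legacy and LSTM engines together, so the installed chi_sim
# traineddata must include the legacy model.
custom_oem_psm_config = r'--oem 2 --psm 6'

# Extract per-character box data from the image, returned as a dict of lists
# Note: the coordinate origin of pytesseract.image_to_boxes is the bottom-left corner of the image
boxes_dict = pytesseract.image_to_boxes(img, config=custom_oem_psm_config, lang='chi_sim', output_type='dict')
print(boxes_dict)

match = find_substring_indicds(boxes_dict, '清君侧')
if not match:
    # Stop here instead of failing later when unpacking the result
    raise SystemExit('Target text not found in the OCR output')
start, end = match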

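print(f"char: {''.join(boxes_dict['char'][start:end+1])}")
# Box of the first matched character
left_1 = boxes_dict['left'][start]
top_1 = boxes_dict['top'][start]
bottom_1 = boxes_dict['bottom'][start]
# Box of the last matched character; its right edge closes the region
left_2 = boxes_dict['left'][end]
right_2 = boxes_dict['right'][end]
top_2 = boxes_dict['top'][end]
bottom_2 = boxes_dict['bottom'][end]
# Center of the matched text (assumes the text sits on a single line)
x_center = (left_1 + right_2)/2
y_center = (max(top_1, top_2) + min(bottom_1, bottom_2))/2

print(f"The following coordinates use a bottom-left origin\nBox 1: {left_1} {top_1} {bottom_1}")
print(f"Box 2: {left_2} {right_2} {top_2} {bottom_2}")
print(f"Image size: {img.size}")
print(f"x_center: {x_center}")
print(f"y_center: {y_center}")

# Convert to a top-left origin for PIL: y_top_left = image_height - y_bottom_left
crop_box = (left_1, img.size[1]-max(top_1, top_2), right_2, img.size[1]-min(bottom_1, bottom_2))
print('{} {} {} {}'.format(*crop_box))

# Crop the region for a visual check; crop() uses the top-left corner as its origin
img.crop(box=crop_box).save('crop.png')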
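
The coordinate conversion is the easiest part of the script to get wrong, so it may help to isolate it in a small pure function that can be checked without running Tesseract. The sketch below is not part of the original script: `boxes_to_crop_box`, its `padding` parameter, and the sample numbers are hypothetical, chosen only for illustration. It assumes the dict layout returned by `image_to_boxes` with `output_type='dict'` (keys `left`, `bottom`, `right`, `top`, measured from the bottom-left corner) and assumes the matched characters lie on one line.

```python
def boxes_to_crop_box(boxes, start, end, image_size, padding=0):
    """Return a PIL-style (left, upper, right, lower) box covering chars start..end.

    `boxes` uses a bottom-left origin; the result uses a top-left origin.
    """
    width, height = image_size
    idx = range(start, end + 1)
    # Union of all character boxes, still in bottom-left coordinates
    left = min(boxes['left'][i] for i in idx)
    right = max(boxes['right'][i] for i in idx)
    top = max(boxes['top'][i] for i in idx)
    bottom = min(boxes['bottom'][i] for i in idx)
    # Flip the y axis: y_top_left = image_height - y_bottom_left
    upper = height - top
    lower = height - bottom
    # Apply optional padding and clamp to the image bounds
    return (
        max(left - padding, 0),
        max(upper - padding, 0),
        min(right + padding, width),
        min(lower + padding, height),
    )


# Made-up box data for three characters in a 200x100 image
sample = {
    'char': ['清', '君', '侧'],
    'left': [10, 40, 70],
    'bottom': [20, 22, 21],
    'right': [35, 65, 95],
    'top': [50, 48, 49],
}
print(boxes_to_crop_box(sample, 0, 2, (200, 100)))  # (10, 50, 95, 80)
```

With the real data, the returned tuple could be passed directly to `img.crop(box=...)`. Taking the union over every matched character, rather than only the first and last, keeps the box correct when the characters differ in height.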