# Python Project: pillow, tesseract, and opencv_2


I. Tesseract

Tesseract is an OCR engine that was developed at the HP Bristol Laboratory between 1985 and 1995 to recognize text in scanned images. Google restarted the project after 2006. By now it is somewhat dated.

1. How to link Tesseract with Python

Prepare the following: Tesseract, PyCharm, and the pytesseract module.

import PIL
import pytesseract
from PIL import Image
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
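
The hard-coded path above only works for that particular Windows install. As a rough sketch, the executable could be looked up on the PATH first, falling back to that path only if nothing is found. The helper name find_tesseract and the constant WINDOWS_DEFAULT are made up for this illustration; they are not part of pytesseract.

```python
import os
import shutil

# Assumed fallback: the install location used in the snippet above.
WINDOWS_DEFAULT = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"


def find_tesseract(fallback=WINDOWS_DEFAULT):
    """Return a path to the tesseract executable, or None if none is found."""
    # shutil.which searches the directories listed in PATH.
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Otherwise accept the fallback only if that file really exists.
    if os.path.isfile(fallback):
        return fallback
    return None


path = find_tesseract()
print("tesseract found at:" if path else "tesseract not found", path or "")
```

If a path comes back, it can be assigned to pytesseract.pytesseract.tesseract_cmd exactly as in the line above.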

2. Recognize text in an image

We can use the built-in help() function to learn how to work with Tesseract from Python.

#part_1
# print(dir(pytesseract))
# help(Image.Image.resize)   #resize
# help(pytesseract.image_to_string)  #deep_learning

The basic feature of this software can be used from Python as in the code below:


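help(Image.Image.resize)  # print the documentation for resize
image = Image.open("下载.png")
image.show()
# display(image)  # use this instead of show() in Jupyter
text = pytesseract.image_to_string(image)  # run OCR and return the recognized text
print(text)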
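
image_to_string also accepts a language and extra Tesseract options. The sketch below wraps that call in a small helper; the name ocr_text and its defaults are my own choices for illustration, and the "eng" language data has to be installed for the default to work. Page segmentation mode 6 tells Tesseract to treat the image as a single uniform block of text, which may or may not suit a given picture.

```python
def ocr_text(image, lang="eng", psm=6):
    """Run Tesseract on a PIL image and return the recognized text, stripped.

    lang is a Tesseract language code; psm is the page segmentation mode.
    """
    # Imported here so the helper can be defined even where the OCR stack
    # is not installed; calling it still requires pytesseract and Tesseract.
    import pytesseract

    config = f"--psm {psm}"
    return pytesseract.image_to_string(image, lang=lang, config=config).strip()
```

With the image opened above, it would be called as print(ocr_text(image)).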

3. Some assignments

What can we do if the text in the image is too small, or if a colored background confuses Tesseract?

# help(image.convert)

#part_2
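image = Image.open("下载 (1).png")
# resize takes a (width, height) tuple; ANTIALIAS was removed in Pillow 10, LANCZOS is the same filter
image_1 = image.resize((image.width - 1, image.height - 1), Image.LANCZOS)
image_2 = image_1.convert("L")  # grayscale: a single 8-bit channel
image_3 = image_1.convert("1")  # 1-bit black and white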
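
For text that is too small, enlarging the image before OCR is one thing to try. Here is a minimal sketch that scales an image up by an integer factor and converts it to grayscale. The helper name prepare_for_ocr and the factor are assumptions for this example, and a plain synthetic image stands in for a real scan.

```python
from PIL import Image


def prepare_for_ocr(image, factor=2):
    """Enlarge the image by an integer factor and convert it to grayscale."""
    new_size = (image.width * factor, image.height * factor)
    enlarged = image.resize(new_size, Image.LANCZOS)
    return enlarged.convert("L")


# A small synthetic RGB image stands in for a real picture here.
sample = Image.new("RGB", (40, 20), (200, 30, 30))
prepared = prepare_for_ocr(sample, factor=3)
print(prepared.size, prepared.mode)  # prints: (120, 60) L
```

Whether a factor of 2 or 3 helps depends on the picture, so it is worth trying a few values.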

def binarize(image_to_transform, threshold):
    output_image = image_to_transform.convert("L")
    for x in range(output_image.width):
        for y in range(output_image.height):
            # note that the first parameter of getpixel is a tuple
            if output_image.getpixel((x, y)) < threshold:
                # putpixel writes a chosen value between 0 and 255 into a pixel
                output_image.putpixel((x, y), 0)
            else:
                output_image.putpixel((x, y), 255)
    return output_image
# This function sets each pixel to 0 or 255, depending on whether its value is below the threshold.
for thresh in range(0, 257, 64):
    print("Trying with threshold " + str(thresh))
    binarize(Image.open('下载 (1).png'), thresh).show()
    print(pytesseract.image_to_string(binarize(Image.open('下载.PNG'), thresh)))
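
The pixel-by-pixel loop gets slow on large images. As a sketch of an alternative, Pillow's Image.point can apply the same threshold rule through a lookup table. The name binarize_fast is my own, and the two-pixel test image is made up for the demonstration.

```python
from PIL import Image


def binarize_fast(image_to_transform, threshold):
    """Return a grayscale copy: pixels below threshold become 0, the rest 255."""
    gray = image_to_transform.convert("L")
    # point() evaluates the function once per possible value (0-255)
    # and then maps every pixel through the resulting table.
    return gray.point(lambda value: 0 if value < threshold else 255)


# Two-pixel stand-in image: one dark pixel, one bright pixel.
sample = Image.new("L", (2, 1))
sample.putpixel((0, 0), 50)
sample.putpixel((1, 0), 200)

result = binarize_fast(sample, 128)
print(result.getpixel((0, 0)), result.getpixel((1, 0)))  # prints: 0 255
```

It follows the same rule as binarize, so it should be usable in the threshold loop in the same way.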

The list "options" below holds the common resampling filters that are built into the module:

options = [Image.NEAREST, Image.BOX, Image.BILINEAR, Image.HAMMING, Image.BICUBIC, Image.LANCZOS]
# new_size is not defined anywhere in the original snippet; doubling the image is an assumed value
new_size = (image_3.width * 2, image_3.height * 2)
for option in options:
    # let's print the option
    print(option)
    # let's display what this option looks like on our little sign (display() only exists in Jupyter)
    display(image_3.resize(new_size, option))
    # help(Image.Image.resize) shows that a few filters, such as BICUBIC,
    # can be passed directly as the second argument
image_3.show()
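
To see that the filters really behave differently without eyeballing pictures, here is a small sketch that enlarges a 2x2 grayscale checkerboard with each one and counts how many gray levels come out. The checkerboard, the target size, and the helper distinct_levels are all made up for this example. I use a grayscale ("L") image here because, as far as I can tell, Pillow falls back to nearest-neighbour for 1-bit images, so the differences would not show up on image_3.

```python
from PIL import Image

# Map readable names to Pillow's resampling filters.
FILTERS = {
    "NEAREST": Image.NEAREST,
    "BOX": Image.BOX,
    "BILINEAR": Image.BILINEAR,
    "HAMMING": Image.HAMMING,
    "BICUBIC": Image.BICUBIC,
    "LANCZOS": Image.LANCZOS,
}

# A 2x2 grayscale checkerboard as a tiny stand-in image.
tiny = Image.new("L", (2, 2), 0)
tiny.putpixel((1, 0), 255)
tiny.putpixel((0, 1), 255)


def distinct_levels(image):
    """Count how many different gray levels an "L" image contains."""
    return len(set(image.tobytes()))


results = {}
for name, resample in FILTERS.items():
    enlarged = tiny.resize((16, 16), resample)
    results[name] = distinct_levels(enlarged)
    print(name, results[name])
```

NEAREST only copies existing pixels, so it keeps the two original levels, while the interpolating filters introduce in-between grays.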