I am using python-tesseract to extract words from an image. This is a python wrapper for tesseract which is an OCR code.
I am using the following code for getting the words:
import tesseract
api = tesseract.TessBaseAPI()
api.Init(".","eng",tesseract.OEM_DEFAULT)
api.SetVariable("tessedit_char_whitelist", "0123456789abcdefghijklmnopqrstuvwxyz")
api.SetPageSegMode(tesseract.PSM_AUTO)
mImgFile = "test.jpg"
mBuffer=open(mImgFile,"rb").read()
result = tesseract.ProcessPagesBuffer(mBuffer,len(mBuffer),api)
print "result(ProcessPagesBuffer)=",result
This returns only the words and not their location/size/orientation (or in other words a bounding box containing them) in the image. I was wondering if there is any way to get that as well
解决方案
tesseract.GetBoxText() method returns the exact position of each character in an array.
Besides, there is a command line option tesseract test.jpg result hocr that will generate a result.html file with each recognized word's coordinates in it. But I'm not sure whether it can be called through python script.