Text Recognition with Tesseract-OCR in VS2010

lanbing510

Original link: https://blog.csdn.net/lanbing510/article/details/38405003

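I recently worked on a project that needed to recognize the letters and digits stamped on castings. I found the open-source OCR library Tesseract, and below is a short note on how to use it.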

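First, download the library and the language data files from the project homepage http://code.google.com/p/tesseract-ocr/ (Google Code has since shut down; the project now lives at https://github.com/tesseract-ocr/tesseract). I use VS2010, but the prebuilt libraries offered there were built with VS2008, so I kept getting errors. If you use VS2010, you can download the corresponding libraries prebuilt for VS2010 here.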

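Then configure the project the same way as for any other third-party library: add the include directories, add the library directory and .lib files to the linker settings, and make the DLLs available at run time (for example, next to the executable).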

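#include "allheaders.h"   // Leptonica: PIX, pixRead, pixDestroy
#include "baseapi.h"      // tesseract::TessBaseAPI
#include "strngs.h"       // Tesseract's STRING class
#include <cstdio>
#include <cstdlib>
#include <iostream>
using namespace std;

int main(int argc, char* argv[])
{
	// Image to recognize; use the first command-line argument if given.
	const char* image_path = (argc > 1) ? argv[1] : "zj.jpg";

	// Initialize Tesseract with the English language data.
	// NULL datapath: Tesseract falls back to its default tessdata location.
	tesseract::TessBaseAPI api;
	if (api.Init(NULL, "eng", tesseract::OEM_DEFAULT) != 0) {
		printf("Could not initialize Tesseract (is eng.traineddata installed?)\n");
		return 1;
	}

	// Let Tesseract detect the page layout automatically.
	api.SetPageSegMode(tesseract::PSM_AUTO);

	// Make sure the input file exists and can be opened.
	FILE* fin = fopen(image_path, "rb");
	if (fin == NULL) {
		printf("Cannot open input file: %s\n", image_path);
		api.End();
		return 2;
	}
	fclose(fin);

	// Make sure Leptonica can decode the image format.
	PIX* pixs = pixRead(image_path);
	if (pixs == NULL) {
		printf("Unsupported image type.\n");
		api.End();
		return 3;
	}
	pixDestroy(&pixs);

	// Run recognition; the recognized text is written into text_out.
	STRING text_out;
	if (!api.ProcessPages(image_path, NULL, 0, &text_out)) {
		printf("Error during processing.\n");
		api.End();
		return 4;
	}

	cout << "Recognition result: " << text_out.string() << endl;

	api.End();  // release Tesseract's internal resources
	return 0;
}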