Using Emgu.CV.OCR

First, install Emgu CV:

Emgu CV - Browse /emgucv/4.1.1 at SourceForge.net

1. Download tesseract-ocr

http://digi.bib.uni-mannheim.de/tesseract/

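2. Install

After installation, set the following environment variable:

TESSDATA_PREFIX       X:\Program Files\Tesseract-OCR\

Optionally, also add the Tesseract installation directory (for example X:\Program Files\Tesseract-OCR) to the Path environment variable. This step can be skipped for recognition; it is needed if you want to use the Tesseract tools for training.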
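The environment setup above can be checked from code before the OCR engine is created. The following is a minimal sketch that uses only standard .NET calls; the class name `TessdataLocator` and the method `FindTessdata` are hypothetical names introduced for illustration. It assumes the variable may point either at the installation directory (as above) or directly at the tessdata folder, and tries both.

```csharp
using System;
using System.IO;

static class TessdataLocator
{
    // Returns the tessdata directory derived from a TESSDATA_PREFIX value,
    // or null when no usable directory is found.
    public static string FindTessdata(string prefix)
    {
        if (string.IsNullOrWhiteSpace(prefix)) return null;

        // Layout 1: the prefix is the install directory that contains "tessdata".
        string nested = Path.Combine(prefix, "tessdata");
        if (Directory.Exists(nested)) return nested;

        // Layout 2: the prefix is the tessdata folder itself.
        // Accept it only if it actually holds language files.
        if (Directory.Exists(prefix) &&
            Directory.GetFiles(prefix, "*.traineddata").Length > 0)
        {
            return prefix;
        }
        return null;
    }

    static void Main()
    {
        string prefix = Environment.GetEnvironmentVariable("TESSDATA_PREFIX");
        string dir = FindTessdata(prefix);
        Console.WriteLine(dir == null
            ? "TESSDATA_PREFIX is not set or has no tessdata folder."
            : "Using tessdata at: " + dir);
    }
}
```

Running this check at startup gives a clearer message than the exception thrown when the engine cannot find its data.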

References: Emgu.CV.OCR Unable to create ocr model using Path and language (jzdzhiyun's blog, CSDN)

Installing and using tesseract-ocr (褶皱的包子's blog, CSDN)

3. Download additional language packs and place them in the tessdata folder. Alternatively, tick the language packs you need in the installer.

https://github.com/tesseract-ocr/tessdata

GitHub - tesseract-ocr/tessdata_best: Best (most accurate) trained LSTM models.
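A missing language pack is a common cause of engine start-up failures, so it can help to verify the files first. The sketch below assumes each language is stored as `<lang>.traineddata` in the tessdata folder (the naming used by the repositories above); `LanguagePackCheck`, `MissingLanguages`, and the default folder location are hypothetical and only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class LanguagePackCheck
{
    // Lists the languages in a spec such as "chi_sim+eng" that have no
    // matching <lang>.traineddata file in the given tessdata directory.
    public static List<string> MissingLanguages(string tessdataDir, string languageSpec)
    {
        var missing = new List<string>();
        string[] parts = languageSpec.Split(new[] { '+' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string part in parts)
        {
            string lang = part.Trim();
            // Entries prefixed with '~' ask Tesseract NOT to load a language, so skip them.
            if (lang.Length == 0 || lang[0] == '~') continue;

            string file = Path.Combine(tessdataDir, lang + ".traineddata");
            if (!File.Exists(file)) missing.Add(lang);
        }
        return missing;
    }

    static void Main(string[] args)
    {
        // Hypothetical default: a "tessdata" folder next to the executable.
        string dir = args.Length > 0
            ? args[0]
            : Path.Combine(AppContext.BaseDirectory, "tessdata");

        List<string> missing = MissingLanguages(dir, "chi_sim+eng");
        Console.WriteLine(missing.Count == 0
            ? "All language packs found."
            : "Missing: " + string.Join(", ", missing));
    }
}
```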

4. Usage

using System;
using System.Windows.Forms;
using Emgu.CV;
using Emgu.CV.OCR;
using Emgu.CV.Structure;

private Tesseract _ocr; // OCR engine instance, declared as a field of the form

            // Choose the language(s): checkBox1 selects English, checkBox2 selects Simplified Chinese.
            string language;
            if (checkBox1.Checked && checkBox2.Checked)
            {
                language = "chi_sim+eng";
            }
            else if (checkBox2.Checked)
            {
                language = "chi_sim";
            }
            else
            {
                // Nothing (or only English) selected: fall back to English.
                language = "eng";
                checkBox1.Checked = true;
            }
            try
            {
                // Language packs: https://github.com/tesseract-ocr/tessdata
                // The data path can also be Application.StartupPath + @"\tessdata",
                // i.e. a tessdata folder in the run directory. If the path is empty,
                // the tessdata folder must be placed in the Debug output directory.
                _ocr = new Tesseract(@"E:\Program Files\Tesseract-OCR\tessdata", language, OcrEngineMode.Default);
                _ocr.PageSegMode = PageSegMode.SingleBlock;
                // gray is the grayscale input image prepared earlier (not shown here).
                _ocr.SetImage(gray);
                int result = _ocr.Recognize();
                if (result != 0)
                {
                    MessageBox.Show("Recognition failed!");
                    return;
                }
                Tesseract.Character[] characters = _ocr.GetCharacters(); // per-character results
                // Optional: draw the detected regions on the source image.
                //Bgr drawColor = new Bgr(Color.Blue); // blue outline
                //foreach (Tesseract.Character c in characters)
                //{
                //    image.Draw(c.Region, drawColor, 1); // outline each detected region
                //}
                //imageBox1.Image = image; // show the image with the rectangles drawn
                string text = _ocr.GetUTF8Text(); // recognized text
                richTextBox1.Text = text;         // display it
            }
            catch (Exception ex)
            {
                MessageBox.Show("Check that the language packs are present in the tessdata folder. " + ex.ToString());
            }

5. Results

Emgu.CV.OCR.Tesseract.Tesseract(string, string, Emgu.CV.OCR.Tesseract.OcrEngineMode, string)

public Tesseract(string dataPath, string language, Emgu.CV.OCR.Tesseract.OcrEngineMode mode, string whiteList)

    Member of Emgu.CV.OCR.Tesseract



Summary:

Create a Tesseract OCR engine.



Parameters:

dataPath: The datapath must be the name of the parent directory of tessdata and must end in / . Any name after the last / will be stripped.

language: The language is (usually) an ISO 639-3 string or NULL will default to eng.  It is entirely safe (and eventually will be efficient too) to call Init multiple times on the same instance to change language, or just to reset the classifier.  The language may be a string of the form [~]<lang>[+[~]<lang>]* indicating that multiple languages are to be loaded. E.g. hin+eng will load Hindi and English. Languages may specify internally that they want to be loaded with one or more other languages, so the ~ sign is available to override that. E.g. if hin were set to load eng by default, then hin+~eng would force loading only hin. The number of loaded languages is limited only by memory, with the caveat that loading additional languages will impact both speed and accuracy, as there is more work to do to decide on the applicable language, and there is more chance of hallucinating incorrect words.

mode: OCR engine mode

whiteList: This can be used to specify a white list for OCR. e.g. specify "1234567890" to recognize digits only. Note that the white list currently seems to only work with OcrEngineMode.OEM_TESSERACT_ONLY
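The language parameter described above is a list of codes joined with `+`. As a rough sketch of how such a string could be assembled from a set of selected languages, the helper below uses only standard .NET; `LanguageSpec.Build` is a hypothetical name, and the fallback to `eng` for an empty selection is an assumption that mirrors the documented default.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LanguageSpec
{
    // Joins language codes with '+', e.g. ["chi_sim", "eng"] -> "chi_sim+eng".
    // Blank entries and duplicates are dropped; an empty selection falls back to "eng".
    public static string Build(IEnumerable<string> codes)
    {
        List<string> cleaned = codes
            .Where(c => !string.IsNullOrWhiteSpace(c))
            .Select(c => c.Trim())
            .Distinct()
            .ToList();
        return cleaned.Count == 0 ? "eng" : string.Join("+", cleaned);
    }

    static void Main()
    {
        Console.WriteLine(Build(new[] { "chi_sim", "eng" })); // prints "chi_sim+eng"
        Console.WriteLine(Build(new string[0]));              // prints "eng"
    }
}
```

Building the string from a list scales better than nested if/else blocks once more than two languages are offered.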



Tesseract tesseract = new Tesseract();

tesseract.Init(path, lang, Tesseract.OcrEngineMode.OEM_TESSERACT_ONLY); // path is the language pack directory, lang is the language



tesseract.SetVariable("tessedit_char_whitelist", "0123456789");
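Because the whitelist reportedly only takes effect in the Tesseract-only engine mode, a fallback is to filter the recognized text afterwards. Note that this is a different technique from the engine whitelist: it only discards unwanted characters after recognition and does not steer the recognizer itself. The sketch below is plain .NET, and `WhitelistFilter.Keep` is a hypothetical helper name.

```csharp
using System;
using System.Linq;

static class WhitelistFilter
{
    // Keeps only the characters listed in `allowed`, plus line breaks so
    // that the line structure of the recognized text survives.
    public static string Keep(string text, string allowed)
    {
        if (string.IsNullOrEmpty(text)) return string.Empty;
        return new string(text
            .Where(c => c == '\n' || allowed.IndexOf(c) >= 0)
            .ToArray());
    }

    static void Main()
    {
        // In the example from section 4, the input would be _ocr.GetUTF8Text().
        Console.WriteLine(Keep("Tel: 010-1234 5678", "0123456789")); // prints "01012345678"
    }
}
```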

References:

C# OpenCV6 - License Plate Recognition (_iorilan's blog, CSDN)

Tesseract.GetText, Emgu.CV.OCR C# (CSharp) Code Examples - HotExamples

OpenCVSharp Beginner Tutorial: Introduction (小康师兄's blog, CSDN)

http://code.google.com/p/opencvsharp/w/list

Learning OpenCVSharp - Zhihu (zhihu.com)

shimat/opencvsharp_samples (github.com)
