[Paper Reading] Hybrid model for Chinese character recognition based on Tesseract-OCR

Latest recommended article published on 2024-08-14 18:10:44

步步咏凉天


Category: NLP  Tags: ocr

Article link: https://blog.csdn.net/qq_39753778/article/details/121321459


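Summary: OpenCV (image preprocessing) + KNN (phrase processing) + Tesseract-OCR engine
Personally, I do not find this paper to be of high quality: the experimental details are not discussed, the results come with no statistical analysis, the wording is repetitive, and there are elementary errors.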

Introduction

English has only 26 letters, but about 2,500 Chinese characters are in common use.
The strokes of Chinese characters are complex and similar to one another.
The differences between Chinese fonts are large.

Tesseract-OCR: the first OCR engine, supports more than 100 languages (tesseract-ocr/tessdata, https://github.com/tesseract-ocr/tessdata).
The OCR engine in Tesseract version 4.0 uses Long Short-Term Memory (LSTM).
In the Tesseract-OCR Simplified Chinese language library, the recognition of individual characters is based on the features of standard Chinese characters.
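To make the engine settings above concrete, here is a minimal sketch of calling the Tesseract command-line tool with the Simplified Chinese model (`chi_sim`) and the LSTM-only engine mode (`--oem 1`). The helper names, the file name `page.png`, and the choice of `--psm 6` (a single uniform block of text) are my own assumptions for illustration; the paper does not state which settings it used.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang="chi_sim", oem=1, psm=6):
    """Build the argument list for the tesseract CLI.

    lang="chi_sim" selects the Simplified Chinese traineddata,
    oem=1 selects the LSTM-only engine, and psm=6 assumes a single
    uniform block of text (an assumption, not taken from the paper).
    """
    return [
        "tesseract", image_path, "stdout",  # "stdout" prints the text instead of writing a file
        "-l", lang,
        "--oem", str(oem),
        "--psm", str(psm),
    ]


def recognise(image_path, **kwargs):
    """Run tesseract on an image and return the recognised text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, **kwargs),
        capture_output=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # "page.png" is a hypothetical input file.
    print(" ".join(build_tesseract_command("page.png")))
```

Calling `recognise("page.png")` requires a local Tesseract install with the `chi_sim` traineddata file available.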

is a simple Python OCR engine based on OpenCV and NumPy

The main work of this study includes image preprocessing and phrase processing.
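These notes do not record how the paper applies KNN in the phrase-processing step, so the following is only one plausible reading and not the authors' method: treat the raw OCR output as a query and retrieve its k nearest phrases from a lexicon, using edit distance as the metric. The lexicon, the function names, and the distance metric are all assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (character level)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]


def nearest_phrases(ocr_text, lexicon, k=3):
    """Return the k lexicon phrases closest to the OCR output.

    Ties in distance are broken by string order so the result is
    deterministic.
    """
    ranked = sorted(lexicon, key=lambda p: (edit_distance(ocr_text, p), p))
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical lexicon and a hypothetical OCR error (宇 read instead of 字).
    lexicon = ["汉字识别", "文字识别", "图像处理"]
    print(nearest_phrases("汉宇识别", lexicon, k=1))
```

A real system would need a much larger lexicon and probably a cut-off distance, so that correct but out-of-lexicon phrases are not overwritten.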

The image preprocessing methods include binarisation, noise reduction, image tilt correction, and similar steps.
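As a rough illustration of the first two steps, below is a dependency-free sketch of Otsu binarisation and a 3x3 median filter on a greyscale image stored as a list of rows. The paper does not say which thresholding or denoising algorithm it used, so both choices are assumptions. In OpenCV the equivalent calls would be `cv2.threshold` with the `cv2.THRESH_OTSU` flag and `cv2.medianBlur`. Tilt correction is left out here.

```python
def otsu_threshold(gray):
    """Return the Otsu threshold for a 2-D list of grey levels in 0..255."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with level <= t
    sum0 = 0  # sum of grey levels of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor.
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarise(gray, threshold=None):
    """Map pixels above the threshold to 255 and the rest to 0."""
    t = otsu_threshold(gray) if threshold is None else threshold
    return [[255 if v > t else 0 for v in row] for row in gray]


def median_filter3(img):
    """3x3 median filter; border pixels are copied unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [img[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = sorted(window)[4]  # middle of the 9 values
    return out


if __name__ == "__main__":
    gray = [[50, 50, 200], [50, 200, 200]]
    print(binarise(gray))
```

Running the median filter before thresholding removes isolated salt-and-pepper pixels that would otherwise show up as specks in the binary image.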
