Tesseract Image Text Recognition - Filter Denoising, Grayscale, Binarization, Text Border Removal, and CAPTCHA Interference Line Removal

Latest recommended article published on 2024-03-21 22:37:35

阳十三



Category: java

This is an original article by wdful and may not be reproduced without wdful's permission.

Article link: https://blog.csdn.net/qq_28114645/article/details/81328039


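This article shows how to use Tesseract OCR to recognize text in images. It covers the preprocessing steps that improve recognition accuracy: filter denoising, grayscale conversion, binarization, removal of text borders, and removal of CAPTCHA interference lines. It also provides before-and-after image comparisons and a demo download link.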
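
As an illustration of the denoising step, here is a minimal sketch of a neighbor-count filter on an already binarized image. It is not taken from the article's listing: the class name `NoiseFilterSketch`, the method `removeIsolated`, and the `minNeighbors` parameter are hypothetical, and the filter is only one simple way to drop isolated specks and thin stray marks.

```java
final class NoiseFilterSketch {

    /**
     * Returns a copy of the grid in which every ink pixel with fewer than
     * minNeighbors ink pixels among its 8 neighbors is cleared.
     * ink[x][y] == true means a dark (text or noise) pixel.
     */
    static boolean[][] removeIsolated(boolean[][] ink, int minNeighbors) {
        int w = ink.length;
        int h = ink[0].length;
        boolean[][] out = new boolean[w][h];
        for (int x = 0; x < w; x++) {
            for (int y = 0; y < h; y++) {
                if (!ink[x][y]) {
                    continue; // background stays background
                }
                int neighbors = 0;
                for (int dx = -1; dx <= 1; dx++) {
                    for (int dy = -1; dy <= 1; dy++) {
                        if (dx == 0 && dy == 0) {
                            continue; // skip the pixel itself
                        }
                        int nx = x + dx;
                        int ny = y + dy;
                        // Pixels outside the image count as background.
                        if (nx >= 0 && nx < w && ny >= 0 && ny < h && ink[nx][ny]) {
                            neighbors++;
                        }
                    }
                }
                out[x][y] = neighbors >= minNeighbors;
            }
        }
        return out;
    }
}
```

With `minNeighbors` set to 1, only fully isolated pixels are removed. Higher values also thin out one-pixel-wide lines, but they start to erode character strokes, so the value would need tuning per image set.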


import org.apache.xmlgraphics.image.codec.tiff.TIFFEncodeParam;
import org.apache.xmlgraphics.image.codec.util.ImageEncoder;

import javax.imageio.ImageIO;
import java.awt.*;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

public class ImageUtil {
   

    public static void main(String[] args) throws IOException {
        File testDataDir = new File("pic");
        final String destDir = "picimages";
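        File[] files = testDataDir.listFiles();
        if (files == null) {
            // listFiles() returns null when "pic" is missing or is not a directory
            return;
        }
        for (File file : files) {
            cleanLinesInImage(file, destDir);
            cleanLinesInImage(file, destDir);
            cleanLinesInImage(file, destDir);
        }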

    }

    /**
     * @param sfile   the image to denoise
     * @param destDir directory where the denoised image is saved
     * @throws IOException if the image cannot be read or written
     */
    public static void cleanLinesInImage(File sfile, String destDir) throws IOException {
        File destF = new File(destDir);
        if (!destF.exists()) {
            destF.mkdirs();
        }

        BufferedImage bufferedImage = ImageIO.read(sfile);
        int h = bufferedImage.getHeight();
        int w = bufferedImage.getWidth();

        // Convert to grayscale
        int[][] gray = new int[w][h];
        for (int x = 0; x < w; x++) {
            for (int y = 0; y < h; y++) {
                int argb = bufferedImage.getRGB(x, y);
                // Brighten the image (adjusting brightness greatly improves the recognition rate)
                int r = (int) (((argb >> 16) & 0xFF) * 1.1 + 30);
                int g = (int) (((argb >> 8) & 0xFF) * 1.1 + 30);
                int b = (
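
The listing breaks off at this point in the source. As a self-contained illustration of the grayscale and binarization steps the article names, here is a minimal sketch. Only the brightening formula (`* 1.1 + 30`) and the `[x][y]` array layout come from the listing above. The class `GrayBinarizeSketch`, its method names, the clamp to 255, the BT.601 luminance weights, and the fixed threshold are assumptions and may differ from the author's actual code.

```java
import java.awt.image.BufferedImage;

final class GrayBinarizeSketch {

    /** Brightens one 0..255 channel as in the listing; the clamp to 255 is an assumption. */
    static int brighten(int channel) {
        return Math.min(255, (int) (channel * 1.1 + 30));
    }

    /** Returns the brightened gray value of every pixel, indexed [x][y]. */
    static int[][] toGray(BufferedImage image) {
        int w = image.getWidth();
        int h = image.getHeight();
        int[][] gray = new int[w][h];
        for (int x = 0; x < w; x++) {
            for (int y = 0; y < h; y++) {
                int argb = image.getRGB(x, y);
                int r = brighten((argb >> 16) & 0xFF);
                int g = brighten((argb >> 8) & 0xFF);
                int b = brighten(argb & 0xFF);
                // Assumed luminance weights (ITU-R BT.601), not taken from the source.
                gray[x][y] = (int) Math.round(0.299 * r + 0.587 * g + 0.114 * b);
            }
        }
        return gray;
    }

    /** Maps gray values above the threshold to white and all others to black. */
    static BufferedImage binarize(int[][] gray, int threshold) {
        int w = gray.length;
        int h = gray[0].length;
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_BINARY);
        for (int x = 0; x < w; x++) {
            for (int y = 0; y < h; y++) {
                out.setRGB(x, y, gray[x][y] > threshold ? 0xFFFFFFFF : 0xFF000000);
            }
        }
        return out;
    }
}
```

A fixed threshold is the simplest choice. For images with uneven lighting, a threshold computed per image (for example with Otsu's method) would likely be more robust, at the cost of a little more code.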
