Using Deep Learning to Binarize Photographed Documents with Complex Backgrounds

知来者逆

Last modified 2024-04-11 09:36:52


First published 2021-12-14 10:54:56

Article link: https://blog.csdn.net/matt45m/article/details/121922240


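Preface

1. When photographing documents with a handheld device, uneven lighting, shadows, and underexposure are common. Old documents and ancient books additionally suffer from wormholes, bleed-through, and faded writing. Whether the goal is reading, printing, or OCR, all of these disturbances degrade the result.
2. A document-processing pipeline usually consists of image preprocessing, document image binarization, layout analysis, and text detection and recognition. For the binarization step there are many traditional algorithms, such as Otsu's method and adaptive thresholding. In practice, however, these algorithms can only be tuned heavily against specific kinds of interference, and they are not robust across all documents.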

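I. Traditional methods

1. Traditional digital image processing offers many ways to binarize an image. I tried several of them, from the most common (Otsu's method and adaptive thresholding) to the less common integral-image thresholding. The comparisons below show how they perform.

In each result image, the first panel is the grayscale image, the second is adaptive thresholding, the third is Otsu's method, and the fourth is integral-image thresholding.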
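To make the comparison concrete, here is a minimal sketch of how Otsu's method picks its single global threshold. It is written in plain standard C++ for illustration and is not the author's code (the article itself calls `cv::threshold` with `cv::THRESH_OTSU`); the function name `otsuThreshold` and the flat pixel vector are assumptions made for this example.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative sketch: returns a threshold t such that pixels <= t form the
// dark class and pixels > t form the bright class, chosen to maximize the
// between-class variance of the two groups.
int otsuThreshold(const std::vector<std::uint8_t>& pixels)
{
    assert(!pixels.empty());

    // Histogram of the 256 possible gray levels
    std::array<long long, 256> hist{};
    for (std::uint8_t p : pixels)
    {
        ++hist[p];
    }

    const double total = static_cast<double>(pixels.size());
    double sumAll = 0.0;
    for (int v = 0; v < 256; ++v)
    {
        sumAll += v * static_cast<double>(hist[v]);
    }

    double weightDark = 0.0;  // number of pixels with value <= t
    double sumDark = 0.0;     // sum of those pixel values
    double bestVariance = -1.0;
    int bestThreshold = 0;

    for (int t = 0; t < 256; ++t)
    {
        weightDark += static_cast<double>(hist[t]);
        sumDark += t * static_cast<double>(hist[t]);
        if (weightDark == 0.0)
        {
            continue;  // no dark pixels yet
        }
        const double weightBright = total - weightDark;
        if (weightBright == 0.0)
        {
            break;  // every pixel is already in the dark class
        }
        const double meanDark = sumDark / weightDark;
        const double meanBright = (sumAll - sumDark) / weightBright;
        const double diff = meanDark - meanBright;
        const double variance = weightDark * weightBright * diff * diff;
        if (variance > bestVariance)
        {
            bestVariance = variance;
            bestThreshold = t;
        }
    }
    return bestThreshold;
}
```

Because the result is one threshold for the whole image, a page that is partly in shadow has no single value that separates ink from paper everywhere, which is the kind of failure the scenes below illustrate.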

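Scene 1: a handwritten document under uneven lighting, with slight shadows.
[Figure: original image and binarization results]

Scene 2: a printed document with a large, clearly visible shadow.
[Figure: original image and binarization results]

Scene 3: a document with heavy bleed-through.
[Figure: original image and binarization results]

Scene 4: a hand-copied ancient document on tinted paper.
[Figure: original image and binarization results]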

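2. As the results above show, when the scene contains interference or the lighting varies, traditional image processing cannot fully solve document binarization. Even integral-image thresholding (the fourth panel), which performs somewhat better, cannot handle most environments. When the usage scenario is relatively stable, though, integral-image thresholding is a reasonable choice. Its code is given below, written in C++ on top of OpenCV.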


/// <summary>
/// Integral-image thresholding
/// </summary>
/// <param name="inputMat">Input image (8-bit, single channel)</param>
/// <param name="thre">Threshold scale factor (1.0)</param>
/// <param name="outputMat">Output image</param>
void thresholdIntegral(cv::Mat& inputMat, double thre, cv::Mat& outputMat)
{
    // accept only char type matrices
    CV_Assert(!inputMat.empty());
    CV_Assert(inputMat.depth() == CV_8U);
    CV_Assert(inputMat.channels() == 1);
   
    outputMat = cv::Mat(inputMat.size(), CV_8UC1, 1);

    // rows -> height -> y
    int nRows = inputMat.rows;
    // cols -> width -> x
    int nCols = inputMat.cols;

    // create the integral image
    cv::Mat sumMat;
    cv::integral(inputMat, sumMat);

    CV_Assert(sumMat.depth() == CV_32S);
    CV_Assert(sizeof(int) == 4);

    int S = MAX(nRows, nCols) / 8;
    double T = 0.15;

    // perform thresholding
    int s2 = S / 2;
    int x1, y1, x2, y2, count, sum;

    // CV_Assert(sizeof(int) == 4);
    int* p_y1, * p_y2;
    uchar* p_inputMat, * p_outputMat;

    for (int i = 0; i < nRows; ++i)
    {
        y1 = i - s2;
        y2 = i + s2;

        if (y1 < 0)
        {
            y1 = 0;
        }
        if (y2 >= nRows)
        {
            y2 = nRows - 1;
        }

        p_y1 = sumMat.ptr<int>(y1);
        p_y2 = sumMat.ptr<int>(y2);
        p_inputMat = inputMat.ptr<uchar>(i);
        p_outputMat = outputMat.ptr<uchar>(i);

        for (int j = 0; j < nCols; ++j)
        {
            // set the SxS region
            x1 = j - s2;
            x2 = j + s2;

            if (x1 < 0)
            {
                x1 = 0;
            }
            if (x2 >= nCols)
            {
                x2 = nCols - 1;
            }

            count = (x2 - x1) * (y2 - y1);

            // sum = s(x2,y2) - s(x2,y1) - s(x1,y2) + s(x1,y1)
            sum = p_y2[x2] - p_y1[x2] - p_y2[x1] + p_y1[x1];

            if ((int)(p_inputMat[j] * count) < (int)(sum * (1.0 - T) * thre))
                p_outputMat[j] = 0;
            else
                p_outputMat[j] = 255;
        }
    }
}
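
For reference, the decision rule that the loop in `thresholdIntegral` applies to each pixel can be written out as follows. Here $S$ is the integral image returned by `cv::integral`, which is one row and one column larger than the input so that $S(Y, X)$ is the sum of all input pixels above row $Y$ and left of column $X$. The window bounds $x_1, x_2, y_1, y_2$ are the clamped values from the code; the symbols $n$ and $\Sigma$ are introduced only for this note.

```latex
\begin{aligned}
n &= (x_2 - x_1)\,(y_2 - y_1), \\
\Sigma &= S(y_2, x_2) - S(y_1, x_2) - S(y_2, x_1) + S(y_1, x_1), \\
\mathrm{out}(x, y) &=
\begin{cases}
0, & I(x, y)\, n < \Sigma\,(1 - T)\,\mathrm{thre}, \\
255, & \text{otherwise.}
\end{cases}
\end{aligned}
```

Dividing both sides of the comparison by $n$ (when $n > 0$) shows that a pixel turns black when its value falls below $(1 - T)\,\mathrm{thre}$ times the mean of its local window. With $T = 0.15$ and $\mathrm{thre} = 1.0$, that means roughly 15% darker than the neighborhood average, which is why the method tolerates slowly varying illumination better than a global threshold.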

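3. When the environment is unknown or complex, no amount of parameter tuning makes the traditional methods work reliably, and at that point deep learning is the option to consider.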

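II. Image binarization based on U-Net

1. The U-Net network
U-Net was originally designed for biomedical image segmentation, and to this day a large share of medical image segmentation networks use U-Net as their backbone.
For the algorithm I referred to a U-Net project for retinal blood vessel segmentation, on GitHub at https://github.com/orobix/retina-unet . Its retinal vessel segmentation looks like this:
[image]
2. The deep learning framework is PyTorch, and I referred to this project: https://github.com/milesial/Pytorch-UNet .
3. However, the U-Net setup above only trains on images of size 512. For photographed documents, shrinking everything to 512 for labeling and training would lose a lot of fine detail, so I adjusted the network slightly, following the structure of ENet.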
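
The detail loss mentioned in point 3 is simple arithmetic, sketched below. The helper name `strokeWidthAfterResize`, the 4000-pixel photo, and the 3-pixel stroke are hypothetical values chosen for illustration; only the sizes 512 and 1560 (the `target_w` used by the inference class later in the article) come from the text.

```cpp
#include <cassert>

// Illustrative helper: width in pixels of a pen stroke after an image whose
// long side is `longSide` pixels is resized so that side becomes `target`.
double strokeWidthAfterResize(int longSide, int target, double strokePx)
{
    assert(longSide > 0 && target > 0);
    return strokePx * static_cast<double>(target) / static_cast<double>(longSide);
}
```

For a hypothetical 4000-pixel photo with 3-pixel strokes, `strokeWidthAfterResize(4000, 512, 3.0)` gives about 0.38 pixels, so strokes become thinner than one pixel, while `strokeWidthAfterResize(4000, 1560, 3.0)` gives about 1.17 pixels, which still leaves each stroke at least one pixel wide.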

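III. Model inference

1. My test environment is Windows 10, Visual Studio 2019, and OpenCV 4.5. For convenience, inference uses OpenCV's dnn module directly.
2. Code:

The DirtyDocUnet class: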

#pragma once
#include <iostream>
#include <string>
#include <opencv2/opencv.hpp>
#include <opencv2/dnn/dnn.hpp>

class DirtyDocUnet
{
public:
	DirtyDocUnet(std::string _model_path);
	
	void dnnInference(cv::Mat &cv_src, cv::Mat &cv_dst);

	void docBin(const cv::Mat& cv_src, cv::Mat& cv_dst);
private:
	std::string model_path;
	cv::dnn::Net doc_net;
	int target_w = 1560;
	int target_h = 1560;
};

#include "DirtyDocUnet.h"

DirtyDocUnet::DirtyDocUnet(std::string _model_path)
{
	model_path = _model_path;
	doc_net = cv::dnn::readNet(model_path);
}

void DirtyDocUnet::dnnInference(cv::Mat &cv_src, cv::Mat &cv_dst)
{
	cv::Size reso(this->target_w, this->target_h);

	cv::Mat cv_gray;
	cv::cvtColor(cv_src, cv_gray, cv::COLOR_BGR2GRAY);
	// Scale pixels to [0, 1] and resize to the network input size
	cv::Mat blob = cv::dnn::blobFromImage(cv_gray, 1.0 / 255, reso, cv::Scalar(0, 0, 0), false, false);
	doc_net.setInput(blob);

	// The output blob is N x C x H x W; view channel 0 of sample 0 as a 2-D float image
	cv::Mat cv_out = doc_net.forward();
	cv::Mat cv_prob(cv_out.size[2], cv_out.size[3], CV_32FC1, cv_out.ptr<float>(0, 0));

	// convertTo saturates to [0, 255], so outputs slightly outside [0, 1] cannot overflow
	cv::Mat cv_seg;
	cv_prob.convertTo(cv_seg, CV_8UC1, 255.0);

	cv::resize(cv_seg, cv_dst, cv_src.size());
}

/// <summary>
/// Smooths jagged edges in a binary image
/// </summary>
/// <param name="src">Input image</param>
/// <param name="dst">Output image</param>
/// <param name="uthreshold">Width threshold</param>
/// <param name="vthreshold">Height threshold</param>
/// <param name="type">Color of the protrusions to remove: 0 for black, 1 for white</param>
void deleteZigzag(cv::Mat& src, cv::Mat& dst, int uthreshold, int vthreshold, int type)
{
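    src.copyTo(dst);
    int height = dst.rows;
    int width = dst.cols;
    int k;  // loop counter, declared here so its value is still visible after the loop
    // Note: p[j + width] below reads the pixel one row down, which assumes dst is stored contiguously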
    for (int i = 0; i < height - 1; i++)
    {
        uchar* p = dst.ptr<uchar>(i);
        for (int j = 0; j < width - 1; j++)
        {
            if (type == 0)
            {
                // row-wise removal
                if (p[j] == 255 && p[j + 1] == 0)
                {
                    if (j + uthreshold >= width)
                    {
                        for (int k = j + 1; k < width; k++)
                        {
                            p[k] = 255;
                        }
                    }
                    else
                    {
                        for (k = j + 2; k <= j + uthreshold; k++)
                        {
                            if (p[k] == 255)
                            {
                                break;
                            }
                        }
                        if (p[k] == 255)
                        {
                            for (int h = j + 1; h < k; h++)
                            {
                                p[h] = 255;
                            }
                        }
                    }
                }
                // column-wise removal
                if (p[j] == 255 && p[j + width] == 0)
                {
                    if (i + vthreshold >= height)
                    {
                        for (k = j + width; k < j + (height - i) * width; k += width)
                        {
                            p[k] = 255;
                        }
                    }
                    else
                    {
                        for (k = j + 2 * width; k <= j + vthreshold * width; k += width)
                        {
                            if (p[k] == 255) break;
                        }
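                        // k can point one row past the image when the scan ends without a break
                        if (k < (height - i) * width && p[k] == 255)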
                        {
                            for (int h = j + width; h < k; h += width)
                                p[h] = 255;
                        }
                    }
                }
            }
            else  //type = 1
            {
                // row-wise removal
                if (p[j] == 0 && p[j + 1] == 255)
                {
                    if (j + uthreshold >= width)
                    {
                        for (int k = j + 1; k < width; k++)
                            p[k] = 0;
                    }
                    else
                    {
                        for (k = j + 2; k <= j + uthreshold; k++)
                        {
                            if (p[k] == 0) break;
                        }
                        if (p[k] == 0)
                        {
                            for (int h = j + 1; h < k; h++)
                                p[h] = 0;
                        }
                    }
                }
                // column-wise removal
                if (p[j] == 0 && p[j + width] == 255)
                {
                    if (i + vthreshold >= height)
                    {
                        for (k = j + width; k < j + (height - i) * width; k += width)
                            p[k] = 0;
                    }
                    else
                    {
                        for (k = j + 2 * width; k <= j + vthreshold * width; k += width)
                        {
                            if (p[k] == 0) break;
                        }
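                        // k can point one row past the image when the scan ends without a break
                        if (k < (height - i) * width && p[k] == 0)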
                        {
                            for (int h = j + width; h < k; h += width)
                                p[h] = 0;
                        }
                    }
                }
            }
        }
    }
}

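void DirtyDocUnet::docBin(const cv::Mat& cv_src, cv::Mat& cv_dst)
{
	if (cv_src.empty())
	{
		return;
	}
	// Split the page into a top half and a bottom half and process each separately
	std::vector<cv::Mat> cv_pieces;
	cv_pieces.push_back(cv_src(cv::Rect(0, 0, cv_src.cols, cv_src.rows / 2)));
	cv_pieces.push_back(cv_src(cv::Rect(0, cv_src.rows / 2, cv_src.cols, cv_src.rows / 2)));

	// Run inference on each half and stack the results vertically
	cv::Mat cv_pars;
	for (auto v : cv_pieces)
	{
		cv::Mat cv_temp;
		dnnInference(v, cv_temp);
		cv_pars.push_back(cv_temp);
	}

	// Invert, upscale, and smooth the edges.
	// The interpolation flag is the sixth argument of cv::resize; the fourth and fifth are fx and fy.
	cv::Mat cv_resize;
	cv::resize(~cv_pars, cv_resize, cv::Size(4096, 4096), 0, 0, cv::INTER_CUBIC);
	cv::Mat cv_zig;
	deleteZigzag(cv_resize, cv_zig, 5, 5, 0);

	// Invert back and return to the source resolution
	cv::resize(~cv_zig, cv_dst, cv::Size(cv_src.cols, cv_src.rows), 0, 0, cv::INTER_LINEAR);
}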
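
The step in `dnnInference` that turns the network's floating-point output into an 8-bit image can be shown in isolation. The sketch below uses only the standard library rather than OpenCV, and the function name `probabilityToGray` is an assumption made for this example. It clamps each value to [0, 1] before scaling, because converting an out-of-range float directly to an unsigned 8-bit value does not give a well-defined result.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Illustrative sketch: map per-pixel network outputs (nominally in [0, 1])
// to 8-bit gray levels, clamping values that fall slightly outside the range.
std::vector<std::uint8_t> probabilityToGray(const std::vector<float>& prob)
{
    std::vector<std::uint8_t> gray;
    gray.reserve(prob.size());
    for (float p : prob)
    {
        const float clamped = std::min(1.0f, std::max(0.0f, p));
        gray.push_back(static_cast<std::uint8_t>(std::lround(clamped * 255.0f)));
    }
    assert(gray.size() == prob.size());
    return gray;
}
```

In OpenCV code the same effect is usually obtained with `cv::Mat::convertTo` or `cv::saturate_cast<uchar>`, both of which saturate instead of wrapping.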

Calling the class
main.cpp

#include <iostream>
#include "DirtyDocUnet.h"

void thresholdIntegral(cv::Mat& inputMat, double thre, cv::Mat& outputMat);
void mergeImages(const cv::Mat& cv_src1, const cv::Mat& cv_src2, cv::Mat& cv_dst);

void imshow(std::string name, const cv::Mat& cv_src)
{
    cv::namedWindow(name, 0);
    int max_rows = 800;
    int max_cols = 800;
    if (cv_src.rows >= cv_src.cols && cv_src.rows > max_rows)
    {
        cv::resizeWindow(name, cv::Size(cv_src.cols * max_rows / cv_src.rows, max_rows));
    }
    else if (cv_src.cols >= cv_src.rows && cv_src.cols > max_cols)
    {
        cv::resizeWindow(name, cv::Size(max_cols, cv_src.rows * max_cols / cv_src.cols));
    }
    cv::imshow(name, cv_src);
}

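int main(void)
{
    std::string path = "images";
    std::vector<std::string> filenames;
    cv::glob(path, filenames, false);
    std::string model_path = "models/unetv2.onnx";

    DirtyDocUnet doc_bin(model_path);
    int i = 0;

    for (const auto& v : filenames)
    {
        cv::Mat cv_src = cv::imread(v);
        if (cv_src.empty())
        {
            continue;  // skip files that cannot be read as images
        }
        cv::Mat cv_bin, cv_otsu, cv_gray, cv_integral;
        cv::cvtColor(cv_src, cv_gray, cv::COLOR_BGR2GRAY);

        // Fixed global threshold at 127 (the panel the comparison captions call adaptive thresholding),
        // Otsu's method, and integral-image thresholding
        cv::threshold(cv_gray, cv_bin, 127, 255, cv::THRESH_BINARY);
        cv::threshold(cv_gray, cv_otsu, 0, 255, cv::THRESH_OTSU);
        thresholdIntegral(cv_gray, 1.0, cv_integral);

        cv::Mat cv_unet;
        doc_bin.docBin(cv_src, cv_unet);

        // Build a 2x2 grid: left column is fixed threshold over Otsu, right column is integral over U-Net
        cv_bin.push_back(cv_otsu);
        cv_integral.push_back(cv_unet);

        cv::Mat cv_all;
        mergeImages(cv_bin, cv_integral, cv_all);

        // Write to the current directory instead of overwriting the input file.
        // The "compare_<n>.png" name is an assumption, not the author's naming.
        cv::imwrite("compare_" + std::to_string(i++) + ".png", cv_all);
    }
    return 0;
}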

/// <summary>
/// Integral-image thresholding
/// </summary>
/// <param name="inputMat">Input image (8-bit, single channel)</param>
/// <param name="thre">Threshold scale factor (1.0)</param>
/// <param name="outputMat">Output image</param>
void thresholdIntegral(cv::Mat& inputMat, double thre, cv::Mat& outputMat)
{
    // accept only char type matrices
    CV_Assert(!inputMat.empty());
    CV_Assert(inputMat.depth() == CV_8U);
    CV_Assert(inputMat.channels() == 1);
   
    outputMat = cv::Mat(inputMat.size(), CV_8UC1, 1);

    // rows -> height -> y
    int nRows = inputMat.rows;
    // cols -> width -> x
    int nCols = inputMat.cols;

    // create the integral image
    cv::Mat sumMat;
    cv::integral(inputMat, sumMat);

    CV_Assert(sumMat.depth() == CV_32S);
    CV_Assert(sizeof(int) == 4);

    int S = MAX(nRows, nCols) / 8;
    double T = 0.15;

    // perform thresholding
    int s2 = S / 2;
    int x1, y1, x2, y2, count, sum;

    // CV_Assert(sizeof(int) == 4);
    int* p_y1, * p_y2;
    uchar* p_inputMat, * p_outputMat;

    for (int i = 0; i < nRows; ++i)
    {
        y1 = i - s2;
        y2 = i + s2;

        if (y1 < 0)
        {
            y1 = 0;
        }
        if (y2 >= nRows)
        {
            y2 = nRows - 1;
        }

        p_y1 = sumMat.ptr<int>(y1);
        p_y2 = sumMat.ptr<int>(y2);
        p_inputMat = inputMat.ptr<uchar>(i);
        p_outputMat = outputMat.ptr<uchar>(i);

        for (int j = 0; j < nCols; ++j)
        {
            // set the SxS region
            x1 = j - s2;
            x2 = j + s2;

            if (x1 < 0)
            {
                x1 = 0;
            }
            if (x2 >= nCols)
            {
                x2 = nCols - 1;
            }

            count = (x2 - x1) * (y2 - y1);

            // sum = s(x2,y2) - s(x2,y1) - s(x1,y2) + s(x1,y1)
            sum = p_y2[x2] - p_y1[x2] - p_y2[x1] + p_y1[x1];

            if ((int)(p_inputMat[j] * count) < (int)(sum * (1.0 - T) * thre))
                p_outputMat[j] = 0;
            else
                p_outputMat[j] = 255;
        }
    }
}

void mergeImages(const cv::Mat& cv_src1, const cv::Mat& cv_src2, cv::Mat& cv_dst)
{
    CV_Assert(!(cv_src1.rows != cv_src2.rows || cv_src1.cols != cv_src2.cols));
    CV_Assert(!(cv_src1.empty() || cv_src2.empty()));

    cv_dst.create(cv_src1.rows, cv_src1.cols * 2, cv_src1.type());
    cv::Mat r1 = cv_dst(cv::Rect(0, 0, cv_src1.cols, cv_src1.rows));
    cv_src1.copyTo(r1);

    cv::Mat r2 = cv_dst(cv::Rect(cv_src1.cols, 0, cv_src1.cols, cv_src1.rows));
    cv_src2.copyTo(r2);
}

3. Comparing the results
Five test images were compared. In each result image, the first panel is adaptive thresholding, the second is integral-image thresholding, the third is Otsu's method, and the fourth is the U-Net result.
[Figures: five original images, each followed by its four-panel binarization comparison]
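4. Overall, the deep learning method gives a good result across all of the environments tested. There is still room for improvement: my current training set has roughly 2,000 samples, and adding samples from more environments should make the model generalize better.
5. Some mobile scanning apps offer a similar feature, usually called ink-saving mode or black-and-white scan. We have ported this algorithm to both Android and iOS. Below is the result in our iOS app; anyone interested in mobile scanning apps can try the app 《扫描家》.
[image]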

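6. The executable and the source code are both uploaded to CSDN. To use the executable, put the images into the images folder and run the exe; it saves a comparison of the methods in the current directory.
Download: https://download.csdn.net/download/matt45m/50653819
Model deployment source code: https://download.csdn.net/download/matt45m/50654793
For model training, see: 计算机视觉——基于深度学习UNet实现的复杂背景文档二值化算法实现与模型训练 (Computer Vision: Implementing and Training a UNet-Based Binarization Algorithm for Documents with Complex Backgrounds)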
