Document Image Binarization paper 1

yddcs

Last modified 2024-03-25 16:17:24


First published 2020-10-31 14:48:55

Article link: https://blog.csdn.net/qq_35200351/article/details/109402024


Binarization is usually the first step in many document analysis tasks and plays a key role in the steps that follow.

2017 Dual Discriminator Generative Adversarial Nets
Paper and Code

[Document image binarization datasets]
[Document image binarization paper series, part 2]

Paper hunting goes best with Sci-Hub (๑•̀ㅂ•́)و✧

Sci-Hub, updated in real time: https://tool.yovisun.com/scihub/
AbleSci (科研通), free literature requests: https://www.ablesci.com/
2020 Document Image Binarization Using Dual Discriminator Generative Adversarial Networks
(Paper and Code)
Overall architecture: a generator, a local discriminator (32×32) and a global discriminator (256×256)

Generator:

Global discriminator: 256×256 input; 2 conv layers + pooling + 2 FC layers
Local discriminator: 32×32 input, no pooling; 4 conv layers + 3 FC layers
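To make the two input scales concrete, here is a minimal pure-Python sketch of how a 256×256 image (the global discriminator's input) could be cut into 32×32 patches for the local discriminator. The non-overlapping grid and the helper name `split_into_patches` are assumptions for illustration; the paper may sample its local patches differently.

```python
def split_into_patches(image, patch=32):
    """Split a 2-D list (H x W) into non-overlapping patch x patch tiles."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be a multiple of the patch size")
    tiles = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Slice the same column range out of each row of this band.
            tiles.append([row[left:left + patch] for row in image[top:top + patch]])
    return tiles


# Dummy 256x256 image (all zeros) standing in for a generated binarization.
image = [[0] * 256 for _ in range(256)]
tiles = split_into_patches(image)
print(len(tiles), len(tiles[0]), len(tiles[0][0]))  # 64 32 32
```

One global view thus corresponds to 64 local views under this assumed tiling.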

Loss function:

The weights are 0.5, 5 and 75, respectively, which puts the emphasis on the local discriminator and on the generated image.
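The formula itself is not reproduced in these notes, so the following is only a sketch of how three such weights might combine the loss terms. It assumes the form mu * (L_global + sigma * L_local) + lambda * L_gen with mu = 0.5, sigma = 5 and lambda = 75; both the form and the assignment of each weight to a term are assumptions to be checked against the paper, and `total_loss` is a hypothetical helper name.

```python
def total_loss(l_global, l_local, l_gen, mu=0.5, sigma=5.0, lam=75.0):
    """Weighted sum of the three loss terms (assumed form, see the text above)."""
    # Assumed combination: mu * (L_global + sigma * L_local) + lam * L_gen
    return mu * (l_global + sigma * l_local) + lam * l_gen


# With all raw terms equal to 1: 0.5 * (1 + 5) + 75 = 78.0
print(total_loss(1.0, 1.0, 1.0))
```

Under this reading the effective weights are 0.5 (global), 2.5 (local) and 75 (generator), which matches the note that the local discriminator and the generated image are emphasized.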

2020 Two-Stage Generative Adversarial Networks for Document Image Binarization with Color Noise and Background Removal (IEEE Trans. code)
Stage 1: data augmentation using the R, G, B channels and the grayscale image
Stage 2: multi-scale fusion by resizing the image
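As a rough illustration of the stage 1 inputs, the sketch below splits an RGB image into its R, G and B channels plus a grayscale version. The BT.601 luma weights and the helper name `split_channels` are assumptions; the notes do not say which grayscale conversion the paper uses or how exactly the four images are consumed.

```python
def split_channels(rgb_image):
    """Return (R, G, B, gray) single-channel images from an H x W list of (r, g, b) tuples."""
    red = [[px[0] for px in row] for row in rgb_image]
    green = [[px[1] for px in row] for row in rgb_image]
    blue = [[px[2] for px in row] for row in rgb_image]
    # BT.601 luma weights: a common grayscale convention, assumed here.
    gray = [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb_image]
    return red, green, blue, gray


# 2x2 toy image: red, green, blue and white pixels.
image = [[(255, 0, 0), (0, 255, 0)],
         [(0, 0, 255), (255, 255, 255)]]
red, green, blue, gray = split_channels(image)
print(gray)  # [[76, 150], [29, 255]]
```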
The generator adopts EfficientNet [54] as its encoder.
The discriminator is similar to the discriminator network of Pix2Pix GAN [47], which is based on a Markov random field model.

Global: 512×512, in stage 2 (top of the figure)
Local: 256×256, in stage 1 and stage 2 (bottom of the figure)
Various kinds of data augmentation

Data: DIBCO 120,000
LRDE DBD 310,000
Shipping label image 65,000

The encoder in the generator is pre-trained on ImageNet.

cGANs are used to solve the problem of combining multi-scale information,

but training is unstable and the quality is unsatisfactory,

because strokes are not uniform and the proportion of text in the overall image varies greatly,

so a cascaded structure is introduced to handle multiple scales.

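Process: the original image is cropped into 256×256 patches, G1 generates the corresponding outputs, and the outputs are stitched back together to obtain local context information. This result is resized together with the GT and fed to D1.

The original image is also resized to 256×256 and passed through G1, and the output is resized back to the original size. These two images are combined with the grayscale image, then cropped and fed to G2 to obtain the refined global binarization result. This result is resized together with the GT and fed to D2.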
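The crop, generate and stitch part of the local branch can be sketched in pure Python as follows. Everything here is illustrative: `crop_tiles`, `stitch_tiles` and the zero-padding of border tiles are assumptions, and `g1` is a hypothetical stand-in (a fixed threshold) for the real generator G1.

```python
def crop_tiles(image, size=256):
    """Cut an H x W image (list of lists) into size x size tiles, zero-padding the borders."""
    height, width = len(image), len(image[0])
    tiles = []
    for top in range(0, height, size):
        row_tiles = []
        for left in range(0, width, size):
            tile = [[image[y][x] if y < height and x < width else 0
                     for x in range(left, left + size)]
                    for y in range(top, top + size)]
            row_tiles.append(tile)
        tiles.append(row_tiles)
    return tiles


def stitch_tiles(tiles, height, width):
    """Reassemble tiles from crop_tiles and trim the padding back to height x width."""
    size = len(tiles[0][0])
    return [[tiles[y // size][x // size][y % size][x % size] for x in range(width)]
            for y in range(height)]


def g1(tile):
    """Hypothetical stand-in for generator G1: a fixed threshold at 128."""
    return [[255 if value >= 128 else 0 for value in row] for row in tile]


# Toy 270 x 300 grayscale image whose size is not a multiple of 256.
image = [[(x + y) % 256 for x in range(300)] for y in range(270)]
tiles = crop_tiles(image)
local_result = stitch_tiles([[g1(t) for t in row] for row in tiles], 270, 300)
print(len(tiles), len(tiles[0]), len(local_result), len(local_result[0]))  # 2 2 270 300
```

Stitching the untouched tiles returns the original image exactly, so any change in `local_result` comes from the per-tile generator alone.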

In short: G1 and G2 are cascaded, and G2 refines the output of G1.

Techniques: pre-trained on the facades photo-to-labels dataset

Experiments on the DIBCO dataset, with images cropped and resized to 256×256

G1 = G2: 5-layer U-Net
D1 = D2: FCN, 256×256

Discriminators D1 and D2: inputs resized to 256×256
Generator: U-Net

2018 A selectional auto-encoder approach for document image binarization (PR code)

Three possibilities are studied:

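Convolutional auto-encoder (CAE): a traditional encoder-decoder architecture built from convolutional layers [32]. The encoder is implemented with convolutional layers and max-pooling operations, while the decoder consists of convolutions plus upsampling operators.
Stacked What-Where auto-encoder (SWWAE) [33]: in this topology, the encoder's pooling layers produce two kinds of connections. The "what" connections feed the next layer, while the "where" connections are fed to the analogous layer of the decoder. We consider an instance of SWWAE that uses a convolutional network as the encoder and deconvolution [34] as the decoder.
Very deep residual encoder-decoder network (RED-Net) [35]: this topology includes residual connections from each encoding layer to its analogous decoding layer, which helps convergence and leads to better results. In addition, downsampling is performed by strided convolutions rather than by resorting to pooling layers, and upsampling is achieved with deconvolution layers.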
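The "what" and "where" connections of the SWWAE can be illustrated with a 1-D toy example: pooling keeps the maximum of each window ("what") and records its position ("where"), and the decoder uses those positions to put the values back. This is only a sketch of the idea; the function names are made up here and real implementations work on 2-D feature maps.

```python
def pool_with_where(signal, width=2):
    """Max-pool a 1-D list; return the maxima ("what") and their positions ("where")."""
    what, where = [], []
    for start in range(0, len(signal), width):
        window = signal[start:start + width]
        # Index of the first maximum inside this window.
        offset = max(range(len(window)), key=window.__getitem__)
        what.append(window[offset])
        where.append(start + offset)
    return what, where


def unpool_with_where(what, where, length):
    """Place each pooled value back at its recorded position; other entries stay zero."""
    output = [0] * length
    for value, index in zip(what, where):
        output[index] = value
    return output


signal = [1, 5, 2, 0, 7, 3]
what, where = pool_with_where(signal)
restored = unpool_with_where(what, where, len(signal))
print(what, where, restored)  # [5, 2, 7] [1, 2, 4] [0, 5, 2, 0, 7, 0]
```

Without the "where" list the decoder would have to guess where each maximum came from, which is the spatial detail this topology is meant to preserve.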

2021 Complex image processing with less data—Document image binarization by integrating multiple pre-trained U-Net (PR)

Pre-trained on general images, less data, cascading modular U-Nets

Techniques: COCO Text dataset, pre-training, cascading 5 modules => dilation, erosion, histogram equalization, Canny, Otsu
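Otsu is one of the five operations listed above. As a reminder of what the classical operation computes, here is a minimal pure-Python sketch of Otsu's threshold (the split that maximizes between-class variance). It illustrates the classical algorithm only, not the paper's U-Net modules, and `otsu_threshold` is a name chosen for this example.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t maximizing between-class variance; pixels <= t form class 0."""
    histogram = [0] * levels
    for value in pixels:
        histogram[value] += 1
    total = len(pixels)
    total_sum = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    weight_low, sum_low = 0, 0
    for level in range(levels):
        weight_low += histogram[level]
        sum_low += level * histogram[level]
        if weight_low == 0:
            continue  # no pixels in the low class yet
        weight_high = total - weight_low
        if weight_high == 0:
            break  # every pixel is already in the low class
        mean_low = sum_low / weight_low
        mean_high = (total_sum - sum_low) / weight_high
        # Between-class variance, up to a constant factor of 1 / total**2.
        variance = weight_low * weight_high * (mean_low - mean_high) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, level
    return best_threshold


# Toy "document": dark ink around 10 and bright paper around 200.
pixels = [10, 12, 11, 13, 200, 210, 205, 198]
threshold = otsu_threshold(pixels)
binary = [255 if value > threshold else 0 for value in pixels]
print(binary)  # [0, 0, 0, 0, 255, 255, 255, 255]
```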

Cascading structure
Inter-module skip-connections

Data: DIBCO from all years, 900 images of size 256×256
