Deep learning for table detection and structure recognition: A survey

才疏学浅，努力修炼

Last modified 2024-01-29 17:26:20

First published 2024-01-27 16:30:21

Article link: https://blog.csdn.net/qq_55888300/article/details/135884537

Some of the content is summarized below:

Table region detection methods

Table detection has been studied for a long time. Researchers have used different approaches, which can be grouped as follows:

1. Heuristic-based methods

2. Machine-learning-based methods

3. Deep-learning-based methods

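Heuristic-based methods were mainly used in the 1990s, the 2000s, and the early 2010s. They rely on visual cues such as lines, keywords, and spatial features to detect tables.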

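Pyreddy et al. [69] proposed a method that detects tables using character alignment, holes, and gaps. Wang et al. used a statistical approach that detects table lines from the distances between consecutive words [70]: horizontally consecutive words are grouped with vertically adjacent lines to propose candidate table entities. Jahan et al. proposed a method that detects table regions using local thresholds on word spacing and line height [71].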
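The word-spacing idea above can be illustrated with a small sketch. It is not the implementation from [70] or [71]: the input format (each text line as a list of `(x0, x1)` word spans), the function names, and the `factor` and `min_wide_gaps` parameters are all assumptions made for illustration. A line is flagged as table-like when it contains several gaps much wider than the typical gap on the page, and runs of flagged lines become candidate table blocks.

```python
from statistics import median


def gaps(words):
    """Horizontal gaps between consecutive word spans (x0, x1) on one text line."""
    ordered = sorted(words)
    return [b[0] - a[1] for a, b in zip(ordered, ordered[1:])]


def is_table_line(words, typical_gap, factor=2.5, min_wide_gaps=2):
    """A line is table-like if it has enough gaps far wider than the typical gap."""
    wide = [g for g in gaps(words) if g > factor * typical_gap]
    return len(wide) >= min_wide_gaps


def candidate_table_blocks(lines, factor=2.5, min_rows=2):
    """Return (start, end) index ranges (end exclusive) of consecutive table-like lines.

    lines: text lines in top-to-bottom order, each a list of (x0, x1) word spans.
    """
    all_gaps = [g for line in lines for g in gaps(line)]
    if not all_gaps:
        return []
    typical = median(all_gaps)  # hypothetical choice of page-level statistic
    flags = [is_table_line(line, typical, factor) for line in lines]

    blocks, start = [], None
    for i, flag in enumerate(flags + [False]):  # sentinel closes a trailing run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_rows:
                blocks.append((start, i))
            start = None
    return blocks
```

The median is used here only because it is robust when most of the page is running text; a page dominated by tables would need a different reference statistic.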

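Itonori [72] proposed a rule-based approach that locates tables in a document through text-block arrangement and ruled-line positions. Chandran and Kasturi developed another table detection method based on vertical and horizontal lines [73]. Wonkyo Seo et al. detect junctions (the intersections of horizontal and vertical lines) and use them for further processing [56].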
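As a rough illustration of the junction idea, the sketch below intersects horizontal and vertical line segments that are assumed to have been extracted already. It is a simplification, not the method of [56], which works on camera-captured images; the segment format, the `tol` tolerance, and the `min_junctions` threshold are hypothetical.

```python
def find_junctions(h_lines, v_lines, tol=2):
    """Return sorted (x, y) points where a horizontal and a vertical segment meet.

    h_lines: list of (y, x0, x1) horizontal segments.
    v_lines: list of (x, y0, y1) vertical segments.
    tol: slack in pixels so that segments that almost touch still count.
    """
    junctions = []
    for y, x0, x1 in h_lines:
        for x, y0, y1 in v_lines:
            if x0 - tol <= x <= x1 + tol and y0 - tol <= y <= y1 + tol:
                junctions.append((x, y))
    return sorted(junctions)


def looks_like_ruled_table(h_lines, v_lines, min_junctions=4):
    """Very crude test: a ruled table needs at least a rectangle's worth of junctions."""
    return len(find_junctions(h_lines, v_lines)) >= min_junctions
```

For a single rectangular frame the function returns its four corners, which a later step could group into cells.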

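Hassan et al. [74] locate and segment tables by analyzing the spatial features of text blocks. Ruffolo et al. [75] introduced PDF-TREX, a heuristic bottom-up method for table recognition in single-column PDF documents. It uses the spatial features of page elements to align and group them into paragraphs and tables. Nurminen [76] proposed a set of heuristics that locate subsequent text boxes sharing a common alignment and determine the probability that they form a table. Fang [77] uses the table header as a starting point to detect the table region and decompose its elements.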

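Harit et al. [78] proposed a table detection technique based on recognizing unique table header and trailer patterns. Tupaj et al. [79] proposed an OCR-based table detection technique in which the system searches for table-like sequences of lines based on keywords. The methods above work comparatively well on documents with a uniform layout.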

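Research on table region detection in China started relatively late, and heuristic methods are few. A representative one is the method proposed by Fang et al., which is based on table structure features and visual separators. Taking a PDF document as input, it detects tables in four steps: PDF parsing, page layout analysis, line and page-separator detection, and table detection. In the final step, table positions are obtained by analyzing the lines and page separators found in the previous step. However, heuristic rules have to be adjusted to cover a wider variety of tables and are not really suited to a general-purpose solution. Machine learning methods were therefore adopted for the table detection problem.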

Machine-learning-based methods were common in the 2000s and 2010s.

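Kieninger et al. [80] applied unsupervised learning by clustering word segments. Cesarini et al. [81] used a supervised learning approach based on a modified XY tree. Fan et al. [82] used both supervised and unsupervised methods for table detection in PDF documents. Wang and Hu applied decision tree and SVM classifiers to layout, content-type, and word-group features [83]. T. Kasar et al. [84] used junction detection and then passed the information to an SVM classifier. Silva [85] applied a joint probability distribution over sequential observations of visual page elements (a hidden Markov model) to merge potential table lines into tables. Klampfl et al. [86] compared two unsupervised table recognition methods on digital scientific articles. The Docstrum algorithm [87] applies KNN to aggregate structures into lines and then uses the perpendicular distances and angles between lines to combine them into text blocks. It was designed in 1993, earlier than the other methods mentioned in this section.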
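To make the nearest-neighbor step of a Docstrum-style analysis more concrete, here is a minimal sketch. It only computes, for each connected-component centroid, the distance and angle to its k nearest neighbors; the later clustering of these pairs into lines and blocks is omitted. The function name and input format are assumptions, and the brute-force search is for illustration only.

```python
from math import atan2, degrees, hypot


def nearest_neighbor_pairs(centroids, k=2):
    """For each centroid, list its k nearest neighbors with distance and angle.

    centroids: list of (x, y) component centers.
    Returns tuples (i, j, distance, angle_in_degrees), where j is a neighbor of i.
    """
    pairs = []
    for i, (xi, yi) in enumerate(centroids):
        # Brute-force neighbor search; fine for a sketch, slow for large pages.
        others = sorted(
            (hypot(xj - xi, yj - yi), j)
            for j, (xj, yj) in enumerate(centroids)
            if j != i
        )
        for dist, j in others[:k]:
            xj, yj = centroids[j]
            pairs.append((i, j, dist, degrees(atan2(yj - yi, xj - xi))))
    return pairs
```

Peaks in the histogram of these angles indicate the text-line orientation, and peaks in the distance histogram indicate character and line spacing, which is the information the grouping stage would build on.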

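F. Shafait [88] proposed a practical table recognition method that performs well on documents with similar layouts, including business reports, news stories, and magazine pages. The Tesseract OCR engine provides an open-source implementation of this algorithm.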

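As interest in neural networks grew, researchers began applying them to document layout analysis. At first they were used for simpler tasks such as table detection. Later, as more sophisticated architectures were developed, more work went into recognizing table columns and overall table structure.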

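A. Gilani ["Table detection using deep learning"] showed how deep learning can be used to identify tables. Document images are first preprocessed as proposed in the paper. The images are then fed into a region proposal network for table detection, followed by a fully connected neural network. The method is highly accurate on document images with varied layouts, including documents, research papers, and magazines.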

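D. Prasad ["An approach for end to end table detection and structure recognition from image-based documents"] proposed an automatic table detection approach for interpreting tabular data in document images, which involves two main problems: table detection and table structure recognition. Using a single convolutional neural network (CNN) model, it provides an improved deep-learning-based end-to-end solution to both. CascadeTabNet is a model based on a cascade mask region-based CNN with a high-resolution network backbone (Cascade mask R-CNN HRNet) that simultaneously detects table regions and recognizes the structural cells within those tables.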

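S. S. Paliwal ["Tablenet: Deep learning model for end-to-end table detection and tabular data extraction from scanned document images"] proposed a new end-to-end deep learning model for both table detection and structure recognition. To segment the table and column regions, the model exploits the interdependence between the two tasks of table detection and table structure recognition. Rows are then extracted from the identified table sub-regions using semantic rules.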

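Y. Huang ["A yolo-based table detection method"] described a table detection algorithm based on YOLO. The authors make several adaptive improvements to YOLOv3, including an anchor optimization technique and two post-processing methods, to account for the significant differences between document objects and natural objects. For anchor optimization they use k-means clustering to create anchors that suit tables better than natural objects, which makes it easier for the model to find the exact position of a table. In post-processing, extra whitespace and noisy page objects are removed from the predicted results.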
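The anchor optimization step can be sketched as k-means over the (width, height) pairs of ground-truth table boxes, using IoU as the similarity, which is the usual YOLO-style recipe. This is a minimal illustration under assumptions, not the authors' code: the deterministic initialization, the iteration cap, and the function names are choices made here.

```python
def iou_wh(box, anchor):
    """IoU of two boxes given as (width, height), aligned at the same corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iterations=50):
    """Cluster (width, height) pairs into k anchors, assigning by highest IoU."""
    # Deterministic init (an assumption): pick k boxes spread over the area-sorted list.
    ordered = sorted(boxes, key=lambda b: b[0] * b[1])
    step = len(ordered) / k
    anchors = [ordered[int(i * step + step / 2)] for i in range(k)]

    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda i: iou_wh(box, anchors[i]))
            clusters[best].append(box)
        updated = []
        for anchor, members in zip(anchors, clusters):
            if members:
                updated.append((
                    sum(b[0] for b in members) / len(members),
                    sum(b[1] for b in members) / len(members),
                ))
            else:
                updated.append(anchor)  # keep an empty cluster's anchor unchanged
        if updated == anchors:
            break
        anchors = updated
    return sorted(anchors, key=lambda a: a[0] * a[1])
```

On a dataset dominated by wide, short tables the resulting anchors come out much wider than the defaults tuned for natural images, which is the effect the paragraph describes.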

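L. Hao ["A table detection method for pdf documents based on convolutional neural networks"] presented a new method for detecting tables in PDF documents based on convolutional neural networks, one of the most widely used deep learning models. The method first uses some loose rules to select table-like regions, then builds and refines a convolutional network that decides whether each selected region is a table. The network directly extracts and uses the visual features of the table regions, and it also takes into account the non-visual information contained in the original PDF document to obtain better detection results.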

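S. A. Siddiqui ["Decnt: Deep deformable cnn for table detection"] presented a new strategy for detecting tables in documents. The approach exploits the potential of data to recognize tables of any layout. It works directly on images, which makes it applicable to any format. It uses a distinctive combination of a deformable CNN with Faster R-CNN/FPN. Tables can appear at different sizes and under different transformations (orientations), and a conventional CNN has a fixed receptive field, which makes table recognition difficult. Deformable convolution conditions its receptive field on the input, so the receptive field can be reshaped to match the input. Thanks to this adaptive receptive field, the network can accommodate tables of any layout.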

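N. Sun ["Faster r-cnn based table detection combining corner locating"] proposed a corner-locating approach to table detection based on Faster R-CNN. A Faster R-CNN network is first used for coarse table detection and corner location. Coordinate matching then groups the corners that belong to the same table, and unreliable corners are filtered out at the same time. Finally, the matched corner groups are used to fine-tune and adjust the table borders. The technique improves the precision of table boundary localization at the pixel level.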

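I. Kavasidis ["A saliency-based convolutional neural network for table and chart detection in digitized documents"] proposed a method for detecting tables and charts that combines deep CNNs, graphical models, and saliency concepts. M. Holecek ["Table understanding in structured documents"] proposed using graph convolutions for table understanding in structured documents such as invoices, extending the applicability of graph neural networks. The study also uses PDF documents and combines line-item table detection with information extraction to address the table detection problem. With the line-item technique, any word can be quickly classified as a line item or not. After this classification the table region is easy to identify, because table lines can be distinguished quite effectively from the other text on an invoice.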

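A. Casado-García ["The benefits of close-domain fine-tuning for table detection in document images"] used object detection techniques and showed, after thorough testing, that fine-tuning from a closer domain improves table detection performance. The authors used the Mask R-CNN, YOLO, SSD, and RetinaNet object detection algorithms. Two base datasets were chosen for the study, TableBank and Pascal VOC.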

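X. Zheng ["Global table extractor (gte): A framework for joint table identification and cell structure recognition using visual context"] presented the Global Table Extractor (GTE), a method for joint table detection and cell structure recognition that can be implemented on top of any object detection model. To train the table network with the help of cell location predictions, the authors developed GTE-Table, which introduces a new penalty based on the natural cell containment constraint of tables. A new hierarchical cell detection network called GTE-Cell makes use of table styles. In addition, to build a sizeable corpus of training and test data quickly and cheaply, the authors developed a method that automatically labels table and cell structures in existing documents.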

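Y. Li ["A gan-based feature generator for table detection"] presented a new network that generates layout features for table text and improves recognition performance on less-ruled tables. The feature generator model is similar to a generative adversarial network (GAN). The authors require the feature generator to extract comparable features for both heavily ruled and loosely ruled tables.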

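D. D. Nguyen ["Tablesegnet: a fully convolutional network for table detection and segmentation in document images"] introduced TableSegNet, a compact fully convolutional network that detects and separates tables at the same time. TableSegNet uses a shallower path to find table locations at high resolution and a deeper path to detect table regions at low resolution, splitting the detected regions into individual tables. It uses convolution blocks with wide kernel sizes throughout feature extraction and an additional table-border class in the main output to improve detection and separation.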

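D. Zhang ["Yolo-table: disclosure document table detection with involution"] proposed YOLO-table, a YOLO-based table detection method. To improve the network's ability to learn the spatial layout of tables, the authors incorporate involution into the backbone and build a simple FPN to make the model more effective. The study also proposes a table-based augmentation technique.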

The figure below compares the advantages and disadvantages of several deep-learning-based table detection methods.

Table structure recognition models

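This section reviews deep learning methods for recognizing table structure in document images. For the reader's convenience, the methods are grouped by deep learning concept. Tables 3 and 4 list all the methods that recognize table structure through object detection, together with their advantages and disadvantages. The section also discusses the various deep-learning-based approaches used in these methods.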

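A. Zucker [107] proposed CluSTi, a clustering method for recognizing table structure in scanned invoice images. CluSTi makes three contributions. First, it uses a clustering method to remove heavy noise from table images. Second, it uses state-of-the-art text recognition to extract all text boxes. Finally, it uses horizontal and vertical clustering with optimal parameters to organize the text boxes into the correct rows and columns. Split, Embed and Merge (SEM), proposed by Z. Zhang [108], is an accurate table structure recognizer. M. Namysl [109] proposed a general, modular table extraction approach.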
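The final CluSTi step, organizing text boxes into rows and columns by clustering, can be approximated with a simple one-dimensional gap clustering. This is only a sketch of the general idea: CluSTi's actual clustering algorithm and its tuned parameters are not reproduced here, and the `row_gap` and `col_gap` thresholds, the box format, and the function names are assumptions.

```python
def cluster_1d(values, max_gap):
    """Group indices of values; a new cluster starts when the sorted gap exceeds max_gap."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    clusters = []
    for i in order:
        if clusters and values[i] - values[clusters[-1][-1]] <= max_gap:
            clusters[-1].append(i)
        else:
            clusters.append([i])
    return clusters


def boxes_to_grid(boxes, row_gap=10, col_gap=20):
    """Map text boxes to (row, col) positions.

    boxes: list of (text, x_center, y_center).
    Returns a dict {(row, col): text}; spanning cells are not handled.
    """
    rows = cluster_1d([b[2] for b in boxes], row_gap)  # vertical clustering
    cols = cluster_1d([b[1] for b in boxes], col_gap)  # horizontal clustering
    row_of = {i: r for r, members in enumerate(rows) for i in members}
    col_of = {i: c for c, members in enumerate(cols) for i in members}
    return {(row_of[i], col_of[i]): boxes[i][0] for i in range(len(boxes))}
```

If two boxes land in the same (row, col) slot, the later one overwrites the earlier one in this sketch; a real system would concatenate them or refine the clustering.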

Table 2: Comparison of the advantages and disadvantages of several deep-learning-based table detection methods

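E. Koci [110] presented a new method that recognizes tables in spreadsheets and constructs layout regions once the layout role of each cell has been determined. The spatial relationships between these regions are expressed with a graph model. On this basis the authors proposed Remove and Conquer (RAC), a table recognition algorithm based on a set of carefully chosen criteria.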

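Exploiting the potential of deformable convolutional networks, S. A. Siddiqui [51] proposed a distinctive method for analyzing table patterns in document images. P. Riba [54] proposed a graph-based technique for recognizing tables in document images. It uses position, context, and content type rather than the raw content (recognized text), so it is a purely structure-aware technique that does not depend on the language or on the quality of text reading. E. Koci [111] used a genetic-based technique for graph partitioning to identify the parts of the graph that correspond to tables in the spreadsheet.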

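S. A. Siddiqui [112] framed structure recognition as a semantic segmentation problem. The authors used fully convolutional networks to segment rows and columns. They introduced prediction tiling, which reduces the complexity of table structure recognition by assuming consistency in the table structure. The authors imported models pretrained on ImageNet and used an FCN encoder-decoder architecture. Given an image, the model produces features of the same size as the original input image.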
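One plausible reading of prediction tiling for row segmentation is sketched below: since a table row is assumed to be consistent across the full width, the per-pixel predictions of each image row are averaged and the resulting single label is tiled across that row. This is an interpretation made for illustration, not code from [112]; the input format, the threshold, and the function name are assumptions, and the paper may apply the idea inside the network rather than as a post-processing step.

```python
def tile_row_predictions(prob_map, threshold=0.5):
    """Make row predictions consistent across the width of the image.

    prob_map: 2D list [height][width] of per-pixel probabilities for the "row" class.
    Returns a binary mask of the same shape in which every image row has one label.
    """
    tiled = []
    for pixel_row in prob_map:
        mean = sum(pixel_row) / len(pixel_row)   # aggregate along the width
        label = 1 if mean >= threshold else 0
        tiled.append([label] * len(pixel_row))   # tile the label back across the width
    return tiled
```

Column segmentation would work the same way with the roles of height and width swapped.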

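S. A. Khan [113] proposed a robust deep-learning-based solution for extracting rows and columns from detected tables in document images. In the proposed solution, table images are preprocessed and then passed to a bidirectional recurrent neural network built from gated recurrent units (GRUs), followed by a fully connected layer with softmax activation.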

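S. F. Rashid [114] presented a new learning-based method for recognizing table content in varied document images. S. R. Qasim [115] proposed a table recognition architecture based on graph networks as an alternative to typical neural networks. S. Raja [116] described a table structure recognition method that combines cell detection and interaction modules to locate cells and predict their row and column relationships with the other detected cells. Structural constraints are also added to the cell detection loss function as additional differentiable components. Y. Deng [52] examined the existing problems of end-to-end table recognition and stressed the need for larger datasets in this area.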

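Another study, by Y. Zou [117], called for the development of an image-based table structure recognition technique using fully convolutional networks. The work segments the rows, columns, and cells of a table. The estimated boundaries of all table components are refined with connected component analysis. Row and column numbers are then assigned to each cell according to the positions of the row and column separators. Special algorithms are also used to optimize the cell boundaries.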
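Assigning row and column numbers from separator positions is essentially a sorted lookup. The sketch below shows one simple way to do it with the standard `bisect` module; it is an illustration under assumptions (cell boxes as `(x0, y0, x1, y1)`, interior separators as sorted coordinates, cell centers used for the lookup), not the procedure from [117].

```python
from bisect import bisect_right


def assign_indices(cells, row_separators, col_separators):
    """Assign (row, col) indices to cells from separator positions.

    cells: list of (x0, y0, x1, y1) cell boxes.
    row_separators: sorted y coordinates of the interior row separators.
    col_separators: sorted x coordinates of the interior column separators.
    """
    indices = []
    for x0, y0, x1, y1 in cells:
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        # The number of separators above / left of the center is the index.
        indices.append((bisect_right(row_separators, cy),
                        bisect_right(col_separators, cx)))
    return indices
```

Using the cell center keeps the lookup robust to small boundary errors; a spanning cell would need its start and end indices looked up separately from its two corners.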

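To recognize the rows and columns of a table, K. A. Hashmi [118] proposed a guided table structure recognition technique. According to the study, rows and columns can be localized better by using an anchor optimization method. In the proposed work, Mask R-CNN with optimized anchors is used to detect the boundaries of rows and columns.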

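Another work on segmenting table structure is the ReS2TIM paper by W. Xue [119], which describes reconstructing syntactic structures from table images. The main goal of the model is to regress the coordinates of each cell. A network that can identify the neighbors of each cell in the table is built first. The study also gives a distance-based weighting scheme that helps the network overcome the class imbalance problem during training.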

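C. Tensmeyer [120] proposed SPLERGE (Split and Merge), another method that uses dilated convolutions. The strategy uses two separate deep learning models: the first establishes the grid-like layout of the table, and the second decides whether cells should be merged further so that they span multiple rows or columns.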
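The data flow of a split-and-merge pipeline can be shown without any neural network. In the sketch below, the split stage is assumed to have produced an `n_rows` by `n_cols` grid, and the merge stage is assumed to have produced a list of adjacent grid cells to join; a union-find structure then yields the final cells with their spans. The function name and input format are hypothetical, and merged groups are assumed to be rectangular.

```python
def merge_grid(n_rows, n_cols, merges):
    """Combine grid cells into spanning cells.

    merges: iterable of ((r1, c1), (r2, c2)) pairs of grid cells to merge.
    Returns a sorted list of (row_start, col_start, row_span, col_span).
    """
    parent = {(r, c): (r, c) for r in range(n_rows) for c in range(n_cols)}

    def find(cell):
        while parent[cell] != cell:
            parent[cell] = parent[parent[cell]]  # path halving
            cell = parent[cell]
        return cell

    for a, b in merges:
        parent[find(a)] = find(b)

    groups = {}
    for cell in parent:
        groups.setdefault(find(cell), []).append(cell)

    spans = []
    for members in groups.values():
        rows = [r for r, _ in members]
        cols = [c for _, c in members]
        # Bounding box of the group; assumes the merged region is a rectangle.
        spans.append((min(rows), min(cols),
                      max(rows) - min(rows) + 1, max(cols) - min(cols) + 1))
    return sorted(spans)
```

For example, merging the first two cells of the top row of a 2 x 3 grid produces one header cell with a column span of 2 and leaves the other four cells untouched.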

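Nassar [68] presented a new recognition model for table structure. It improves on the latest encoder-dual-decoder of the PubTabNet end-to-end deep learning model in two important ways. First, the authors provide a brand-new object detection decoder for table cells. This gives them easy access to the content of table cells in programmatic PDFs without training any custom OCR decoder. The authors claim that this architectural improvement makes table content extraction more accurate and allows them to handle non-English tables. Second, transformer-based decoders replace the LSTM decoders.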

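S. Raja [121] proposed a new object-detection-based deep model that is designed for fast optimization and captures the natural alignment of cells within a table. Even with accurate cell detection, recognizing dense tables can still be problematic, because cells that span multiple rows or columns make it hard to capture long-range row and column relationships. The authors therefore also try to improve structure recognition by deriving a unique rectilinear graph-based formulation. They emphasize the relevance of empty cells in a table from a semantic point of view and propose a modification to a popular evaluation criterion to take these cells into account. To stimulate new perspectives on the problem, they also provide a moderately sized evaluation dataset with annotations modeled on human cognition.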

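X. Shen [122] proposed two modules called Rows Aggregated (RA) and Columns Aggregated (CA). First, feature slicing and tiling are applied to generate coarse predictions of rows and columns and to address the problem of high error tolerance. Second, channel attention maps are computed to obtain further row and column information. To perform row and column segmentation, the authors use RA and CA to build a semantic segmentation network called the Rows and Columns Aggregated Network (RCANet).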

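C. Ma [123] proposed RobusTabNet, a new method that recognizes table structure and detects table boundaries in a variety of document images. The authors suggest using CornerNet as a new region proposal network to generate higher-quality table proposals for Faster R-CNN, which greatly improves the localization accuracy of Faster R-CNN for table detection while using only a lightweight ResNet-18 backbone. The authors also propose a new split-and-merge approach to recognizing table structure. In this approach, a new spatial-CNN-based separation line prediction module divides each detected table into a grid of cells, and a Grid-CNN-based cell merging module then recovers the spanning cells. Their table structure recognizer can accurately recognize tables with large blank areas and geometrically distorted (even curved) tables, because the spatial CNN module propagates contextual information effectively across the whole table image.

B. Xiao [124] assumed that a complex table structure can be represented by a graph whose vertices and edges represent the individual cells and the connections between them. The authors then designed a conditional attention network (CATT-Net) and framed table structure recognition as a cell association classification problem.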

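Jain [125] suggested training a deep network to recognize the spatial relationships between pairs of words in a table image in order to decipher the table structure. The authors present an end-to-end pipeline called TSR-DSAW (TSR via Deep Spatial Association of Words), which produces a digital representation of a table image in a structured format such as HTML. The technique first uses a text detection network such as CRAFT to detect every word in the input table image. Word pairs are then created using dynamic programming. Each word pair is highlighted in a separate image and passed to a DenseNet-121 classifier that has been trained to recognize spatial associations such as same row, same column, same cell, or none. Finally, the authors apply post-processing to the classifier output to generate the HTML table structure.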
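The last step, turning pairwise word relations into an HTML table, can be sketched as follows. This is not the post-processing from [125]: it assumes the classifier output is already available as a dict of labeled word pairs, groups words with a small union-find, orders rows and columns by mean position, and ignores spanning cells. All names and the input format are hypothetical.

```python
from html import escape


def _group_ids(n, pairs):
    """Union-find over n items; returns a representative id for each item."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in pairs:
        parent[find(a)] = find(b)
    return [find(i) for i in range(n)]


def relations_to_html(words, relations):
    """Build a simple HTML table from pairwise word relations.

    words: list of (text, x, y).
    relations: dict {(i, j): label}, label in {"same_row", "same_col", "same_cell"}.
    """
    n = len(words)
    same_cell = [p for p, lab in relations.items() if lab == "same_cell"]
    # Words in the same cell share both their row and their column.
    row_of = _group_ids(n, [p for p, lab in relations.items() if lab == "same_row"] + same_cell)
    col_of = _group_ids(n, [p for p, lab in relations.items() if lab == "same_col"] + same_cell)

    def ordered(ids, axis):
        def mean_pos(group):
            vals = [words[i][axis] for i in range(n) if ids[i] == group]
            return sum(vals) / len(vals)
        return sorted(set(ids), key=mean_pos)

    html = ["<table>"]
    for r in ordered(row_of, 2):          # rows top to bottom (by y)
        html.append("<tr>")
        for c in ordered(col_of, 1):      # columns left to right (by x)
            members = sorted((i for i in range(n) if row_of[i] == r and col_of[i] == c),
                             key=lambda i: words[i][1])
            html.append("<td>" + " ".join(escape(words[i][0]) for i in members) + "</td>")
        html.append("</tr>")
    html.append("</table>")
    return "".join(html)
```

Grid positions with no words simply become empty `<td>` elements, which keeps the rows aligned.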

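H. Li [126] formulated the problem as a cell relation extraction task and presented T2, a state-of-the-art two-phase method that successfully extracts table structure from digitally preserved documents. T2 offers a general concept called a prime connection, which accurately represents the direct relationships between cells. To find complex table structures, it also builds an alignment graph and uses a message-passing network.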

[1] J. Hu, R. S. Kashi, D. Lopresti, G. T. Wilfong, Evaluating the performance of table processing algorithms, International Journal on Document Analysis and Recognition 4 (3) (2002) 140–153.
[2] P. Dollár, R. Appel, S. Belongie, P. Perona, Fast feature pyramids for object detection, IEEE transactions on pattern analysis and machine intelligence 36 (8) (2014) 1532–1545.
[3] J. Yang, G. Yang, Modified convolutional neural network based on dropout and the stochastic gradient descent optimizer, Algorithms 11 (3) (2018) 28.
[4] S. Li, W. Liu, G. Xiao, Detection of srew nut images based on deep transfer learning network, in: 2019 Chinese Automation Congress (CAC), IEEE, 2019, pp. 951–955.
[5] K. L. Masita, A. N. Hasan, S. Paul, Pedestrian detection using r-cnn object detector, in: 2018 IEEE Latin American Conference on Computational Intelligence (LA-CCI), IEEE, 2018, pp. 1–6.
[6] Z. Hu, J. Tang, Z. Wang, K. Zhang, L. Zhang, Q. Sun, Deep learning for image-based cancer detection and diagnosis- a survey, Pattern Recognition 83 (2018) 134–149.
[7] J. Redmon, S. Divvala, R. Girshick, A. Farhadi, You only look once: Unified, real-time object detection, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 779–788.
[8] A. Abdallah, A. Berendeyev, I. Nuradin, D. Nurseitov, Tncr: Table net detection and classification dataset, Neurocomputing 473 (2022) 79–97. doi:10.1016/j.neucom.2021.11.101. URL https://www.sciencedirect.com/science/article/pii/S0925231221018142
[9] R. Fakoor, F. Ladhak, A. Nazi, M. Huber, Using deep learning to enhance cancer diagnosis and classification, in: Proceedings of the international conference on machine learning, Vol. 28, ACM, New York, USA, 2013, pp. 3937–3949.
[10] S. Minaee, Z. Liu, Automatic question-answering using a deep similarity neural network, in: 2017 IEEE Global Conference on Signal and Information Processing (GlobalSIP), IEEE, 2017, pp. 923–927.
[11] A. Abdallah, M. Kasem, M. A. Hamada, S. Sdeek, Automated question-answer medical model based on deep learning technology, in: Proceedings of the 6th International Conference on Engineering & MIS 2020, 2020, pp. 1–8.
[12] A. Arpteg, B. Brinne, L. Crnkovic-Friis, J. Bosch, Software engineering challenges of deep learning, in: 2018 44th Euromicro Conference on Software Engineering and Advanced Applications (SEAA), IEEE, 2018, pp. 50–59.
[13] M. A. Hamada, A. Abdallah, M. Kasem, M. Abokhalil, Neural network estimation model to optimize timing and schedule of software projects, in: 2021 IEEE International Conference on Smart Information Systems and Technologies (SIST), IEEE, 2021, pp. 1–7.
[14] M. Mahmoud, M. Kasem, A. Abdallah, H. S. Kang, Ae-lstm: Autoencoder with lstm-based intrusion detection in iot, in: 2022 International Telecommunications Conference (ITC-Egypt), IEEE, 2022, pp. 1–6.
[15] W. Xu, J. Jang-Jaccard, A. Singh, Y. Wei, F. Sabrina, Improving performance of autoencoder-based network anomaly detection on nslkdd dataset, IEEE Access 9 (2021) 140136–140146.
[16] S. A. Mahmoud, I. Ahmad, W. G. Al-Khatib, M. Alshayeb, M. T. Parvez, V. Märgner, G. A. Fink, Khatt: An open arabic offline handwritten text database, Pattern Recognition 47 (3) (2014) 1096–1112.
[17] D. Nurseitov, K. Bostanbekov, D. Kurmankhojayev, A. Alimova, A. Abdallah, R. Tolegenov, Handwritten kazakh and russian (hkr) database for text recognition, Multimedia Tools and Applications 80 (21) (2021) 33075–33097.
[18] N. Toiganbayeva, M. Kasem, G. Abdimanap, K. Bostanbekov, A. Abdallah, A. Alimova, D. Nurseitov, Kohtd: Kazakh offline handwritten text dataset, Signal Processing: Image Communication 108 (2022) 116827.
[19] A. Fischer, C. Y. Suen, V. Frinken, K. Riesen, H. Bunke, A fast matching algorithm for graph-based handwriting recognition, in: International Workshop on Graph-Based Representations in Pattern Recognition, Springer, 2013, pp. 194–203.
[20] S. Schreiber, S. Agne, I. Wolf, A. Dengel, S. Ahmed, Deepdesrt: Deep learning for detection and structure recognition of tables in document images, in: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), Vol. 1, IEEE, 2017, pp. 1162–1167.
[21] M. Traquair, E. Kara, B. Kantarci, S. Khan, Deep learning for the detection of tabular information from electronic component datasheets, in: 2019 IEEE Symposium on Computers and Communications (ISCC), IEEE, 2019, pp. 1–6.
[22] A. Gilani, S. R. Qasim, I. Malik, F. Shafait, Table detection using deep learning, in: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), Vol. 1, IEEE, 2017, pp. 771–776.
[23] D. N. Tran, T. A. Tran, A. Oh, S. H. Kim, I. S. Na, Table detection from document image using vertical arrangement of text blocks, International Journal of Contents 11 (4) (2015) 77–85.
[24] L. Hao, L. Gao, X. Yi, Z. Tang, A table detection method for pdf documents based on convolutional neural networks, in: 2016 12th IAPR Workshop on Document Analysis Systems (DAS), IEEE, 2016, pp. 287–292.
[25] S. Mao, A. Rosenfeld, T. Kanungo, Document structure analysis algorithms: a literature survey, Document recognition and retrieval X 5010 (2003) 197–207.
[26] E. Kara, M. Traquair, M. Simsek, B. Kantarci, S. Khan, Holistic design for deep learning-based discovery of tabular structures in datasheet images, Engineering Applications of Artificial Intelligence 90 (2020) 103551.
[27] M. Sarkar, M. Aggarwal, A. Jain, H. Gupta, B. Krishnamurthy, Document structure extraction using prior based high resolution hierarchical semantic segmentation, in: European Conference on Computer Vision, Springer, 2020, pp. 649–666.
[28] R. Zanibbi, D. Blostein, J. R. Cordy, A survey of table recognition, Document Analysis and Recognition 7 (1) (2004) 1–16.
[29] D. W. Embley, M. Hurst, D. Lopresti, G. Nagy, Table-processing paradigms: a research survey, International Journal of Document Analysis and Recognition (IJDAR) 8 (2) (2006) 66–86.
[30] B. Coüasnon, A. Lemaitre, Recognition of tables and forms (2014).
[31] S. Khusro, A. Latif, I. Ullah, On methods and tools of table detection, extraction and annotation in pdf documents, Journal of Information Science 41 (1) (2015) 41–57.
[32] R. Szeliski, Computer vision: algorithms and applications, Springer Science & Business Media, 2010.
[33] B. C. G. Lee, Line detection in binary document scans: a case study with the international tracing service archives, in: 2017 IEEE International Conference on Big Data (Big Data), IEEE, 2017, pp. 2256–2261.
[34] L. Liu, W. Ouyang, X. Wang, P. Fieguth, J. Chen, X. Liu, M. Pietikäinen, Deep learning for generic object detection: A survey, International journal of computer vision 128 (2) (2020) 261–318.
[35] Y. Bengio, A. Courville, P. Vincent, Representation learning: A review and new perspectives, IEEE transactions on pattern analysis and machine intelligence 35 (8) (2013) 1798–1828.
[36] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (7553) (2015) 436–444.
[37] I. Goodfellow, Y. Bengio, A. Courville, Deep learning, MIT press, 2016.
[38] G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. Van Der Laak, B. Van Ginneken, C. I. Sánchez, A survey on deep learning in medical image analysis, Medical image analysis 42 (2017) 60–88.
[39] X. X. Zhu, D. Tuia, L. Mou, G.-S. Xia, L. Zhang, F. Xu, F. Fraundorfer, Deep learning in remote sensing: A comprehensive review and list of resources, IEEE Geoscience and Remote Sensing Magazine 5 (4) (2017) 8–36.
[40] J. Gu, Z. Wang, J. Kuen, L. Ma, A. Shahroudy, B. Shuai, T. Liu, X. Wang, G. Wang, J. Cai, et al., Recent advances in convolutional neural networks, Pattern Recognition 77 (2018) 354–377.
[41] S. Pouyanfar, S. Sadiq, Y. Yan, H. Tian, Y. Tao, M. P. Reyes, M.-L. Shyu, S.-C. Chen, S. S. Iyengar, A survey on deep learning: Algorithms, techniques, and applications, ACM Computing Surveys (CSUR) 51 (5) (2018) 1–36.
[42] T. Young, D. Hazarika, S. Poria, E. Cambria, Recent trends in deep learning based natural language processing, IEEE Computational Intelligence Magazine 13 (3) (2018) 55–75.
[43] J. Zhou, G. Cui, S. Hu, Z. Zhang, C. Yang, Z. Liu, L. Wang, C. Li, M. Sun, Graph neural networks: A review of methods and applications, AI Open 1 (2020) 57–81.
[44] Z. Zhang, J. Geiger, J. Pohjalainen, A. E.-D. Mousa, W. Jin, B. Schuller, Deep learning for environmentally robust speech recognition: An overview of recent developments, ACM Transactions on Intelligent Systems and Technology (TIST) 9 (5) (2018) 1–28.
[45] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, S. Y. Philip, A comprehensive survey on graph neural networks, IEEE transactions on neural networks and learning systems 32 (1) (2020) 4–24.
[46] M. D. Zeiler, R. Fergus, Visualizing and understanding convolutional networks, in: European conference on computer vision, Springer, 2014, pp. 818–833.
[47] M. Oquab, L. Bottou, I. Laptev, J. Sivic, Learning and transferring mid-level image representations using convolutional neural networks, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2014, pp. 1717–1724.
[48] M. Göbel, T. Hassan, E. Oro, G. Orsi, Icdar 2013 table competition, in: 2013 12th International Conference on Document Analysis and Recognition, IEEE, 2013, pp. 1449–1453.
[49] L. Gao, X. Yi, Z. Jiang, L. Hao, Z. Tang, Icdar2017 competition on page object detection, in: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Vol. 1, IEEE, 2017, pp. 1417–1422.
[50] L. Gao, Y. Huang, H. Déjean, J.-L. Meunier, Q. Yan, Y. Fang, F. Kleber, E. Lang, Icdar 2019 competition on table detection and recognition (ctdar), in: 2019 International Conference on Document Analysis and Recognition (ICDAR), IEEE, 2019, pp. 1510–1515.
[51] S. A. Siddiqui, I. A. Fateh, S. T. R. Rizvi, A. Dengel, S. Ahmed, Deeptabstr: deep learning based table structure recognition, in: 2019 International Conference on Document Analysis and Recognition (ICDAR), IEEE, 2019, pp. 1403–1409.
[52] Y. Deng, D. Rosenberg, G. Mann, Challenges in end-to-end neural scientific table recognition, in: 2019 International Conference on Document Analysis and Recognition (ICDAR), IEEE, 2019, pp. 894–901.
[53] A. W. Harley, A. Ufkes, K. G. Derpanis, Evaluation of deep convolutional nets for document image classification and retrieval, in: 2015 13th International Conference on Document Analysis and Recognition (ICDAR), IEEE, 2015, pp. 991–995.
[54] P. Riba, A. Dutta, L. Goldmann, A. Fornés, O. Ramos, J. Lladós, Table detection in invoice documents by graph neural networks, in: 2019 International Conference on Document Analysis and Recognition (ICDAR), IEEE, 2019, pp. 122–127.
[55] A. Mondal, P. Lipps, C. Jawahar, Iiit-ar-13k: a new dataset for graphical object detection in documents, in: International Workshop on Document Analysis Systems, Springer, 2020, pp. 216–230.
[56] W. Seo, H. I. Koo, N. I. Cho, Junction-based table detection in camera-captured document images, International Journal on Document Analysis and Recognition (IJDAR) 18 (1) (2015) 47–57.
[57] A. Shahab, F. Shafait, T. Kieninger, A. Dengel, An open approach towards the benchmarking of table structure recognition systems, in: Proceedings of the 9th IAPR International Workshop on Document Analysis Systems, 2010, pp. 113–120.
[58] I. T. Phillips, User’s reference manual for the uw english/technical document image database iii, UW-III English/technical document image database manual (1996).
[59] J. Hu, R. Kashi, D. Lopresti, G. Nagy, G. Wilfong, Why table ground-truthing is hard, in: Proceedings of Sixth International Conference on Document Analysis and Recognition, IEEE, 2001, pp. 129–133.
[60] J. Fang, X. Tao, Z. Tang, R. Qiu, Y. Liu, Dataset, ground-truth and performance metrics for table detection evaluation, in: 2012 10th IAPR International Workshop on Document Analysis Systems, IEEE, 2012, pp. 445–449.
[61] M. Li, L. Cui, S. Huang, F. Wei, M. Zhou, Z. Li, Tablebank: Table benchmark for image-based table detection and recognition, in: Proceedings of the 12th Language Resources and Evaluation Conference, 2020, pp. 1918–1925.
[62] N. Siegel, N. Lourie, R. Power, W. Ammar, Extracting scientific figures with distantly supervised neural networks, in: Proceedings of the 18th ACM/IEEE on joint conference on digital libraries, 2018, pp. 223–232.
[63] B. Smock, R. Pesala, R. Abraham, W. Redmond, Pubtables-1m: Towards comprehensive table extraction from unstructured documents, arXiv preprint arXiv:2110.00061 (2021).
[64] Z. Chi, H. Huang, H.-D. Xu, H. Yu, W. Yin, X.-L. Mao, Complicated table structure recognition, arXiv preprint arXiv:1908.04729 (2019).
[65] X. Zheng, D. Burdick, L. Popa, P. Zhong, N. X. R. Wang, Global table extractor (gte): A framework for joint table identification and cell structure recognition using visual context, Winter Conference for Applications in Computer Vision (WACV) (2021).
[66] X. Zhong, E. ShafieiBavani, A. Jimeno Yepes, Image-based table recognition: data, model, and evaluation, in: European Conference on Computer Vision, Springer, 2020, pp. 564–580.
[67] A. Abdallah, A. Berendeyev, I. Nuradin, D. Nurseitov, Tncr: Table net detection and classification dataset, Neurocomputing 473 (2022) 79–97.
[68] A. Nassar, N. Livathinos, M. Lysak, P. Staar, Tableformer: Table structure understanding with transformers, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 4614–4623.
[69] P. Pyreddy, W. Croft, Tinti: A system for retrieval in text tables title2 (1997).
[70] Y. Wang, I. T. Phillips, R. Haralick, Automatic table ground truth generation and a background-analysis-based table structure extraction method, in: Proceedings of Sixth International Conference on Document Analysis and Recognition, IEEE, 2001, pp. 528–532.
[71] M. A. Jahan, R. G. Ragel, Locating tables in scanned documents for reconstructing and republishing, in: 7th International Conference on Information and Automation for Sustainability, IEEE, 2014, pp. 1–6.
[72] K. Itonori, Table structure recognition based on textblock arrangement and ruled line position, in: Proceedings of 2nd International Conference on Document Analysis and Recognition (ICDAR’93), IEEE, 1993, pp. 765–768.
[73] S. Chandran, R. Kasturi, Structural recognition of tabulated data, in: Proceedings of 2nd International Conference on Document Analysis and Recognition (ICDAR’93), IEEE, 1993, pp. 516–519.
[74] T. Hassan, R. Baumgartner, Table recognition and understanding from pdf files, in: Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), Vol. 2, IEEE, 2007, pp. 1143–1147.
[75] E. Oro, M. Ruffolo, Trex: An approach for recognizing and extracting tables from pdf documents, in: 2009 10th International Conference on Document Analysis and Recognition, IEEE, 2009, pp. 906–910.
[76] A. Nurminen, Algorithmic extraction of data in tables in pdf documents, Master’s thesis (2013).
[77] J. Fang, P. Mitra, Z. Tang, C. L. Giles, Table header detection and classification, in: Twenty-Sixth AAAI Conference on Artificial Intelligence, 2012.
[78] G. Harit, A. Bansal, Table detection in document images using header and trailer patterns, in: Proceedings of the Eighth Indian Conference on Computer Vision, Graphics and Image Processing, 2012, pp. 1–8.
[79] S. Tupaj, Z. Shi, C. H. Chang, H. Alam, Extracting tabular information from text files, EECS Department, Tufts University, Medford, USA 1 (1996).
[80] T. Kieninger, A. Dengel, The t-recs table recognition and analysis system, in: International Workshop on Document Analysis Systems, Springer, 1998, pp. 255–270.
[81] F. Cesarini, S. Marinai, L. Sarti, G. Soda, Trainable table location in document images, in: Object recognition supported by user interaction for service robots, Vol. 3, IEEE, 2002, pp. 236–240.
[82] M. Fan, D. S. Kim, Table region detection on large-scale pdf files without labeled data, CoRR, abs/1506.08891 (2015).
[83] Y. Wang, J. Hu, A machine learning based approach for table detection on the web, in: Proceedings of the 11th international conference on World Wide Web, 2002, pp. 242–250.
[84] T. Kasar, P. Barlas, S. Adam, C. Chatelain, T. Paquet, Learning to detect tables in scanned document images using line information, in: 2013 12th International Conference on Document Analysis and Recognition, IEEE, 2013, pp. 1185–1189.
[85] A. C. e Silva, Learning rich hidden markov models in document analysis: Table location, in: 2009 10th International Conference on Document Analysis and Recognition, IEEE, 2009, pp. 843–847.
[86] S. Klampfl, K. Jack, R. Kern, A comparison of two unsupervised table recognition methods from digital scientific articles, D-Lib Magazine 20 (11) (2014) 7.
[87] L. O’Gorman, The document spectrum for page layout analysis, IEEE Transactions on pattern analysis and machine intelligence 15 (11) (1993) 1162–1173.
[88] F. Shafait, R. Smith, Table detection in heterogeneous documents, in: Proceedings of the 9th IAPR International Workshop on Document Analysis Systems, 2010, pp. 65–72.
[89] D. He, S. Cohen, B. Price, D. Kifer, C. L. Giles, Multi-scale multi-task fcn for semantic page segmentation and table detection, in: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Vol. 1, IEEE, 2017, pp. 254–261.
[90] S. Arif, F. Shafait, Table detection in document images using foreground and background features, in: 2018 Digital Image Computing: Techniques and Applications (DICTA), IEEE, 2018, pp. 1–8.
[91] M. M. Reza, S. S. Bukhari, M. Jenckel, A. Dengel, Table localization and segmentation using gan and cnn, in: 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW), Vol. 5, IEEE, 2019, pp. 152–157.
[92] M. Agarwal, A. Mondal, C. Jawahar, Cdec-net: Composite deformable cascade network for table detection in document images, in: 2020 25th International Conference on Pattern Recognition (ICPR), IEEE, 2021, pp. 9491–9498.
[93] N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, S. Zagoruyko, End-to-end object detection with transformers, in: European conference on computer vision, Springer, 2020, pp. 213–229.
[94] J. Li, Y. Xu, T. Lv, L. Cui, C. Zhang, F. Wei, Dit: Self-supervised pre-training for document image transformer, arXiv preprint arXiv:2203.02378 (2022).
[95] D. Prasad, A. Gadpal, K. Kapadni, M. Visave, K. Sultanpure, Cascadetabnet: An approach for end to end table detection and structure recognition from image-based documents, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, 2020, pp. 572–573.
[96] S. S. Paliwal, D. Vishwanath, R. Rahul, M. Sharma, L. Vig, Tablenet: Deep learning model for end-to-end table detection and tabular data extraction from scanned document images, in: 2019 International Conference on Document Analysis and Recognition (ICDAR), IEEE, 2019, pp. 128–133.
[97] Y. Huang, Q. Yan, Y. Li, Y. Chen, X. Wang, L. Gao, Z. Tang, A yolo-based table detection method, in: 2019 International Conference on Document Analysis and Recognition (ICDAR), IEEE, 2019, pp. 813–818.
[98] S. A. Siddiqui, M. I. Malik, S. Agne, A. Dengel, S. Ahmed, Decnt: Deep deformable cnn for table detection, IEEE access 6 (2018) 74151–74161.
[99] N. Sun, Y. Zhu, X. Hu, Faster r-cnn based table detection combining corner locating, in: 2019 International Conference on Document Analysis and Recognition (ICDAR), IEEE, 2019, pp. 1314–1319.
[100] I. Kavasidis, C. Pino, S. Palazzo, F. Rundo, D. Giordano, P. Messina, C. Spampinato, A saliency-based convolutional neural network for table and chart detection in digitized documents, in: International conference on image analysis and processing, Springer, 2019, pp. 292–302.
[101] M. Holeček, A. Hoskovec, P. Baudiš, P. Klinger, Table understanding in structured documents, in: 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW), Vol. 5, IEEE, 2019, pp. 158–164.
[102] A. Casado-García, C. Domínguez, J. Heras, E. Mata, V. Pascual, The benefits of close-domain fine-tuning for table detection in document images, in: International workshop on document analysis systems, Springer, 2020, pp. 199–215.
[103] X. Zheng, D. Burdick, L. Popa, X. Zhong, N. X. R. Wang, Global table extractor (gte): A framework for joint table identification and cell structure recognition using visual context, in: Proceedings of the IEEE/CVF winter conference on applications of computer vision, 2021, pp. 697–706.
[104] Y. Li, L. Gao, Z. Tang, Q. Yan, Y. Huang, A gan-based feature generator for table detection, in: 2019 International Conference on Document Analysis and Recognition (ICDAR), IEEE, 2019, pp. 763–768.
[105] D.-D. Nguyen, Tablesegnet: a fully convolutional network for table detection and segmentation in document images, International Journal on Document Analysis and Recognition (IJDAR) 25 (1) (2022) 1–14.
[106] D. Zhang, R. Mao, R. Guo, Y. Jiang, J. Zhu, Yolo-table: disclosure document table detection with involution, International Journal on Document Analysis and Recognition (IJDAR) (2022) 1–14.
[107] A. Zucker, Y. Belkada, H. Vu, V. N. Nguyen, Clusti: Clustering method for table structure recognition in scanned images, Mobile Networks and Applications 26 (4) (2021) 1765–1776.
[108] Z. Zhang, J. Zhang, J. Du, F. Wang, Split, embed and merge: An accurate table structure recognizer, Pattern Recognition 126 (2022) 108565.
[109] M. Namysl, A. M. Esser, S. Behnke, J. Köhler, Flexible table recognition and semantic interpretation system, in: VISIGRAPP (4: VISAPP), 2022, pp. 27–37.
[110] E. Koci, M. Thiele, W. Lehner, O. Romero, Table recognition in spreadsheets via a graph representation, in: 2018 13th IAPR International Workshop on Document Analysis Systems (DAS), IEEE, 2018, pp. 139–144.
[111] E. Koci, M. Thiele, O. Romero, W. Lehner, A genetic-based search for adaptive table recognition in spreadsheets, in: 2019 International Conference on Document Analysis and Recognition (ICDAR), IEEE, 2019, pp. 1274–1279.
[112] S. A. Siddiqui, P. I. Khan, A. Dengel, S. Ahmed, Rethinking semantic segmentation for table structure recognition in documents, in: 2019 International Conference on Document Analysis and Recognition (ICDAR), IEEE, 2019, pp. 1397–1402.
[113] S. A. Khan, S. M. D. Khalid, M. A. Shahzad, F. Shafait, Table structure extraction with bi-directional gated recurrent unit networks, in: 2019 International Conference on Document Analysis and Recognition (ICDAR), IEEE, 2019, pp. 1366–1371.
[114] S. F. Rashid, A. Akmal, M. Adnan, A. A. Aslam, A. Dengel, Table recognition in heterogeneous documents using machine learning, in: 2017 14th IAPR International conference on document analysis and recognition (ICDAR), Vol. 1, IEEE, 2017, pp. 777–782.
[115] S. R. Qasim, H. Mahmood, F. Shafait, Rethinking table recognition using graph neural networks, in: 2019 International Conference on Document Analysis and Recognition (ICDAR), IEEE, 2019, pp. 142–147.
[116] S. Raja, A. Mondal, C. Jawahar, Table structure recognition using top-down and bottom-up cues, in: European Conference on Computer Vision, Springer, 2020, pp. 70–86.
[117] Y. Zou, J. Ma, A deep semantic segmentation model for image-based table structure recognition, in: 2020 15th IEEE International Conference on Signal Processing (ICSP), Vol. 1, IEEE, 2020, pp. 274–280.
[118] K. A. Hashmi, D. Stricker, M. Liwicki, M. N. Afzal, M. Z. Afzal, Guided table structure recognition through anchor optimization, IEEE Access 9 (2021) 113521–113534.
[119] W. Xue, Q. Li, D. Tao, Res2tim: Reconstruct syntactic structures from table images, in: 2019 International Conference on Document Analysis and Recognition (ICDAR), IEEE, 2019, pp. 749–755.
[120] C. Tensmeyer, V. I. Morariu, B. Price, S. Cohen, T. Martinez, Deep splitting and merging for table structure decomposition, in: 2019 International Conference on Document Analysis and Recognition (ICDAR), IEEE, 2019, pp. 114–121.
[121] S. Raja, A. Mondal, C. Jawahar, Visual understanding of complex table structures from document images, in: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022, pp. 2299–2308.
[122] X. Shen, L. Kong, Y. Bao, Y. Zhou, W. Liu, Rcanet: A rows and columns aggregated network for table structure recognition, in: 2022 3rd Information Communication Technologies Conference (ICTC), IEEE, 2022, pp. 112–116.
[123] C. Ma, W. Lin, L. Sun, Q. Huo, Robust table detection and structure recognition from heterogeneous document images, arXiv preprint arXiv:2203.09056 (2022).
[124] B. Xiao, M. Simsek, B. Kantarci, A. A. Alkheir, Table structure recognition with conditional attention, arXiv preprint arXiv:2203.03819 (2022).
[125] A. Jain, S. Paliwal, M. Sharma, L. Vig, Tsr-dsaw: Table structure recognition via deep spatial association of words, arXiv preprint arXiv:2203.06873 (2022).
[126] H. Li, L. Zeng, W. Zhang, J. Zhang, J. Fan, M. Zhang, A two-phase approach for recognizing tables with complex structures, in: International Conference on Database Systems for Advanced Applications, Springer, 2022, pp. 587–595.
[127] S. Ren, K. He, R. Girshick, J. Sun, Faster r-cnn: Towards real-time object detection with region proposal networks, arXiv preprint arXiv:1506.01497 (2015).
[128] K. He, G. Gkioxari, P. Dollar, R. Girshick, Mask r-cnn, 2017 IEEE International Conference on Computer Vision (ICCV) (Oct 2017).
[129] K. Sun, B. Xiao, D. Liu, J. Wang, Deep high-resolution representation learning for human pose estimation, in: CVPR, 2019.
[130] K. Sun, Y. Zhao, B. Jiang, T. Cheng, B. Xiao, D. Liu, Y. Mu, X. Wang, W. Liu, J. Wang, High-resolution representations for labeling pixels and regions, CoRR abs/1904.04514 (2019).
[131] A. Newell, K. Yang, J. Deng, Stacked hourglass networks for human pose estimation, in: European conference on computer vision, Springer, 2016, pp. 483–499.
[132] E. Insafutdinov, L. Pishchulin, B. Andres, M. Andriluka, B. Schiele, Deepercut: A deeper, stronger, and faster multi-person pose estimation model, in: European Conference on Computer Vision, Springer, 2016, pp. 34–50.
[133] B. Xiao, H. Wu, Y. Wei, Simple baselines for human pose estimation and tracking, in: Proceedings of the European conference on computer vision (ECCV), 2018, pp. 466–481.
[134] W. Yang, S. Li, W. Ouyang, H. Li, X. Wang, Learning feature pyramids for human pose estimation, in: proceedings of the IEEE international conference on computer vision, 2017, pp. 1281–1290.
[135] H. Zhang, C. Wu, Z. Zhang, Y. Zhu, Z. Zhang, H. Lin, Y. Sun, T. He, J. Muller, R. Manmatha, M. Li, A. Smola, Resnest: Split-attention networks, arXiv preprint arXiv:2004.08955 (2020).
[136] H. Zhang, H. Chang, B. Ma, N. Wang, X. Chen, Dynamic R-CNN: Towards high quality object detection via dynamic training, arXiv preprint arXiv:2004.06002 (2020).
[137] K. Chen, J. Wang, J. Pang, Y. Cao, Y. Xiong, X. Li, S. Sun, W. Feng, Z. Liu, J. Xu, et al., Mmdetection: Open mmlab detection toolbox and benchmark, arXiv preprint arXiv:1906.07155 (2019).
[138] S. Wu, J. Yang, X. Wang, X. Li, Iou-balanced loss functions for single-stage object detection, arXiv preprint arXiv:1908.05641 (2019).