- Definitions:
- Table Detection: locate the regions of a page that contain tables.
- Table Structure Recognition: given a detected table region, recover the table's content and logical structure.
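
To make the difference between the two tasks concrete, here is a minimal sketch of what each one might output. All class and field names (`BBox`, `Cell`, `Table`, `to_grid`) are hypothetical and chosen for illustration; they do not come from any of the repositories or papers listed in these notes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BBox:
    """Axis-aligned box in page coordinates (hypothetical representation)."""
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass
class Cell:
    """One logical cell; spans model merged cells."""
    row: int
    col: int
    row_span: int = 1
    col_span: int = 1
    text: str = ""


@dataclass
class Table:
    region: BBox                                     # what table detection produces
    cells: List[Cell] = field(default_factory=list)  # what structure recognition adds

    def to_grid(self) -> List[List[Optional[str]]]:
        """Expand the logical structure into a dense row-by-column grid."""
        if not self.cells:
            return []
        n_rows = max(c.row + c.row_span for c in self.cells)
        n_cols = max(c.col + c.col_span for c in self.cells)
        grid: List[List[Optional[str]]] = [[None] * n_cols for _ in range(n_rows)]
        for c in self.cells:
            # A spanning cell fills every grid position it covers.
            for r in range(c.row, c.row + c.row_span):
                for k in range(c.col, c.col + c.col_span):
                    grid[r][k] = c.text
        return grid
```

A detector alone would fill in only `region`; a structure recognizer would populate `cells`, including the spans that some methods cannot recover.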
- Code:
- Automatic detection and reconstruction of tables in documents using U-Net: https://github.com/chineseocr/table-ocr
- A complete solution for printed tables: https://github.com/Rid7/Table-OCR
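- Datasets:

| Name | Description | Content | Size | Link |
|---|---|---|---|---|
| ICDAR2013 | | US government and EU documents | | |
| ICDAR2017 Page Object Detection (POD) | Page screenshots | | | |
| cTDaR 2019 | Two subsets: historical documents and modern documents | | | |
| TABLE2LATEX-450K | LaTeX | | 466,000 | https://github.com/bloomberg/TABLE2LATEX |
| DECO | Spreadsheets | | 1,165 | DECO: A Dataset of Annotated Spreadsheets for Layout and Table Recognition (Database Systems Group) |
| Third-party personal dataset | Table detection in scanned English documents | | 403 | |
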
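- Papers:
- ICDAR2019 included 16 papers related to table recognition:
- 5 address table detection
- 8 address table structure recognition
- 1 performs both table detection and structure recognition
- 2 release new datasets for table recognition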

| Task | Paper | Notes | Authors | Code | Data |
|---|---|---|---|---|---|
| Structure recognition | A Genetic-based Search for Adaptive Table Recognition in Spreadsheets | Traditional image-based approach; applied to Excel screenshots | | | |
| Structure recognition | Deep Splitting and Merging for Table Structure Decomposition | State of the art on the dataset of the ICDAR2013 table competition's structure recognition subtask | Adobe Research | | |
| Structure recognition | DeepTabStR: Deep Learning based Table Structure Recognition | | | | |
| Structure recognition | ReS2TIM: Reconstruct Syntactic Structures from Table Images | F1 of 0.74 on ICDAR2013 | | | |
| Structure recognition | Rethinking Semantic Segmentation for Table Structure Recognition in Documents | Cannot handle cells that span multiple rows or columns | | | |
| Structure recognition | Rethinking Table Recognition using Graph Neural Networks | Handles both bordered and borderless tables; no pretrained model provided | | | Synthetic; a data generation tool is provided |
| Structure recognition | Table Structure Extraction with Bi-directional Gated Recurrent Unit Networks | | | | |
| End-to-end detection and recognition | TableNet: Deep Learning Model for End-to-end Table Detection and Tabular Data Extraction from Scanned Document Images | F1 on ICDAR2013: 96.62% for detection, 91.51% for recognition | | | |
| Detection | A GAN-based Feature Generator for Table Detection | State of the art on ICDAR2013 and ICDAR2017 | Wangxuan Institute of Computer Technology, Peking University | | |
| Detection | A YOLO-based Table Detection Method | | | | |
| Detection | Faster R-CNN Based Table Detection Combining Corner Locating | ICDAR2017 POD dataset | | | |
| Detection | Table Detection in Invoice Documents by Graph Neural Networks | Data drawn from RVL-CDIP invoice data | | | |
| End-to-end | CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents | | | DevashishPrasad/CascadeTabNet (GitHub) | |
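
Several entries above report F1 scores. As a rough illustration of how a detection F1 can be computed, the sketch below matches predicted table boxes to ground-truth boxes by intersection over union (IoU). The 0.5 threshold, the greedy one-to-one matching, and the function names `iou` and `detection_f1` are assumptions made for this example; each benchmark defines its own protocol, and structure recognition scores are usually computed over cell relations rather than boxes.

```python
from typing import List, Sequence, Tuple

Box = Sequence[float]  # (x0, y0, x1, y1), assumed convention


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def detection_f1(preds: List[Box], truths: List[Box],
                 thresh: float = 0.5) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under greedy one-to-one IoU matching."""
    matched = set()  # indices of ground-truth boxes already claimed
    true_pos = 0
    for p in preds:
        best_j, best_score = None, thresh
        for j, t in enumerate(truths):
            if j in matched:
                continue
            score = iou(p, t)
            if score >= best_score:
                best_j, best_score = j, score
        if best_j is not None:
            matched.add(best_j)
            true_pos += 1
    precision = true_pos / len(preds) if preds else 0.0
    recall = true_pos / len(truths) if truths else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1
```

For example, with two predictions of which one overlaps a ground-truth table and two ground-truth tables in total, precision and recall are both 0.5, so F1 is 0.5.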