2020-10-10

OCR + Tabular Frame Solutions

Available datasets


Composition of the FinTab data

  1. Basic single-page table. This is the most basic type of table: it takes up less than one page and does not include merged cells. It is worth mentioning that we offer not only the textual ground truth and structure information but also the unit of the table, because financial tables mostly contain numbers.

  2. Table with merged cells. In this case, the merged cells need to be detected and recovered.

  3. Cross-page table. If a table spreads across two consecutive pages, its parts need to be merged into a single table. If the header is duplicated on both pages, only one copy should be kept. Page numbers and other irrelevant information should also be removed.

  4. Another difficult situation is a single cell that is split across two pages; its parts should be merged back into one cell according to their semantics.
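The cross-page case can be illustrated with a minimal sketch. It assumes each page fragment is already available as a list of rows of strings and that the header occupies a known number of leading rows; the function name `merge_cross_page`, the `header_rows` parameter, and the toy rows are invented for illustration and do not come from FinTab. Merging a cell that is split across pages needs semantic information and is not covered here.

```python
from typing import List

Row = List[str]


def merge_cross_page(first: List[Row], second: List[Row], header_rows: int = 1) -> List[Row]:
    """Concatenate two page fragments of one table, keeping a repeated header only once."""
    # If the second fragment starts with the same header rows, drop them.
    if first[:header_rows] == second[:header_rows]:
        second = second[header_rows:]
    return first + second


# Toy fragments (made-up values): the header row is repeated on the second page.
page1 = [["Item", "2019", "2018"], ["Revenue", "120", "100"]]
page2 = [["Item", "2019", "2018"], ["Net profit", "30", "25"]]
merged = merge_cross_page(page1, page2)
```

Comparing the leading rows exactly is the simplest possible check; real documents would likely need a tolerant comparison, since OCR output for the two headers may differ slightly.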

Baseline Methods

Figure 2 illustrates an overview of GFTE. Since our dataset is in Chinese, we give a translated version of the example in Table IV for better understanding. Given a table in a PDF document, our method can be summarized in the following steps:

a. We build its ground truth, which consists of
    (1) the image of the table region,
    (2) the textual content,
    (3) the text positions,
    (4) the structure labels.

b. Then, we construct an undirected graph $G = \langle V, R_C \rangle$ on the cells. How the graph is built is described below.

c. Finally, we use our GCN-based algorithm to predict adjacent relations, including both vertical and horizontal relations.
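To make step a more concrete, the four ground-truth components could be held in a record like the one sketched below. All class and field names are hypothetical, and encoding the structure labels as row and column index ranges is an assumption; the notes above do not specify the actual FinTab annotation format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CellGroundTruth:
    """One labelled cell; every field name here is hypothetical."""
    text: str                                # (2) textual content
    bbox: Tuple[float, float, float, float]  # (3) text position as x0, y0, x1, y1
    row_range: Tuple[int, int]               # (4) structure label: first and last row (assumed encoding)
    col_range: Tuple[int, int]               # (4) structure label: first and last column (assumed encoding)


@dataclass
class TableGroundTruth:
    image_path: str                          # (1) image of the table region
    cells: List[CellGroundTruth]

    def is_merged(self, index: int) -> bool:
        """A cell counts as merged if it spans more than one row or column."""
        cell = self.cells[index]
        return cell.row_range[0] != cell.row_range[1] or cell.col_range[0] != cell.col_range[1]


# Toy table with made-up values: a header cell spanning two columns, and one plain cell.
table = TableGroundTruth(
    image_path="table_0001.png",
    cells=[
        CellGroundTruth("Revenue", (10.0, 10.0, 110.0, 30.0), (0, 0), (0, 1)),
        CellGroundTruth("120", (10.0, 35.0, 60.0, 55.0), (1, 1), (0, 0)),
    ],
)
```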

Problem Interpretation:

In a table recognition problem, it is quite natural to consider each cell in the table as a node. Then, the vertical or horizontal relation between a node and its neighbors can be understood as the feature of edges.
In short: cells are the nodes, and the neighbor relations are the edges between nodes.

If we use $V$ to denote the set of nodes and $E_C$ to denote the fully connected edges, then a table structure can be represented by a complete graph $G = \langle V, R_C \rangle$, where $R_C$ indicates a set of relations on $E_C$. More specifically, we have $R_C = E_C \times \{\text{vertical}, \text{horizontal}, \text{unrelated}\}$.
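As a rough illustration of this relation set, the sketch below labels every pair of cells in a small grid with one of the three relations. It assumes each cell is described by a hypothetical (row, column) index, and that "horizontal" means two cells share a row while "vertical" means they share a column; the notes do not spell out the exact labeling rule, which may be restricted to adjacent cells.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def label_complete_graph(cells: List[Tuple[int, int]]) -> Dict[Tuple[int, int], str]:
    """Assign a relation to every unordered pair of cells (the complete graph).

    cells[i] is the (row, column) index of cell i. This encoding is an assumption.
    """
    relations: Dict[Tuple[int, int], str] = {}
    for i, j in combinations(range(len(cells)), 2):
        (row_i, col_i), (row_j, col_j) = cells[i], cells[j]
        if row_i == row_j:
            relations[(i, j)] = "horizontal"   # same row (assumed meaning)
        elif col_i == col_j:
            relations[(i, j)] = "vertical"     # same column (assumed meaning)
        else:
            relations[(i, j)] = "unrelated"
    return relations


# A 2 x 2 grid: cells 0 and 1 form the first row, cells 2 and 3 the second.
grid = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = label_complete_graph(grid)
```

With n cells the complete graph has n(n-1)/2 edges, so the number of pairs to label grows quadratically with the table size.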

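The main question is how to classify edges. An element of $R_C$ is not just an edge: it is an edge paired with a relation label. (This definition is used for now.)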

To model the table, it is enough to connect each node to its k nearest neighbors.
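A k-nearest-neighbor graph over the cells can be sketched as follows. This is a minimal illustration, assuming that each cell is represented by the center point of its text position and that Euclidean distance is used; the function name `knn_edges` and both of these choices are assumptions, not details confirmed by the notes above.

```python
import math
from typing import List, Set, Tuple


def knn_edges(centers: List[Tuple[float, float]], k: int) -> Set[Tuple[int, int]]:
    """Return undirected edges linking each cell to its k nearest cells."""
    edges: Set[Tuple[int, int]] = set()
    for i, point in enumerate(centers):
        # Sort all other cells by Euclidean distance to cell i.
        others = sorted(
            (j for j in range(len(centers)) if j != i),
            key=lambda j: math.dist(point, centers[j]),
        )
        for j in others[:k]:
            # Store each edge once, with the smaller index first.
            edges.add((min(i, j), max(i, j)))
    return edges


# Three cell centers on one line (made-up coordinates).
centers = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
sparse = knn_edges(centers, k=1)
dense = knn_edges(centers, k=2)
```

Each node contributes at most k edges, so the graph has at most n * k edges instead of the n(n-1)/2 edges of the complete graph.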

Kipf's explanation of the edge classification problem: https://github.com/tkipf/gcn/issues/106
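The linked discussion is not reproduced here. As a rough illustration of how a node-level GCN can be turned into an edge classifier, the sketch below applies one graph convolution with the normalized propagation matrix D^{-1/2}(A + I)D^{-1/2} from Kipf and Welling's GCN, then describes an edge by concatenating the embeddings of its two endpoints and scores the three relation classes with a linear layer. The concatenation step, the helper names, and the toy weights are assumptions chosen for illustration; this is not presented as the GFTE architecture.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product for small dense matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """Return D^{-1/2} (A + I) D^{-1/2}, with D the degree matrix of A + I."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj: Matrix, feats: Matrix, weight: Matrix) -> Matrix:
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    h = matmul(matmul(normalized_adjacency(adj), feats), weight)
    return [[max(0.0, v) for v in row] for row in h]


def classify_edge(h: Matrix, i: int, j: int, w_out: Matrix) -> int:
    """Concatenate both endpoint embeddings, score each class, return the argmax index."""
    edge_feature = h[i] + h[j]  # list concatenation (an assumed edge representation)
    scores = matmul([edge_feature], w_out)[0]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy path graph 0 - 1 - 2 with one-hot node features and made-up weights.
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = gcn_layer(adj, feats, weight)

# Output weights map a 4-dimensional edge feature to 3 relation classes.
w_out = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
predicted = classify_edge(hidden, 0, 1, w_out)
```

In a trained model the weights would be learned from the labelled relations; here they are fixed by hand only so that the example runs end to end.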
