Recently I have been working on table detection. Until now I used traditional (rule-based) methods, which work to a certain degree, but their detection accuracy is not good enough.
In this post I use Mask R-CNN for table detection, trained on TableBank, a dataset of roughly 400,000 tables. Let's start with a brief introduction to TableBank; for details, see https://github.com/doc-analysis/TableBank
TableBank: a high-quality annotated table dataset
- Although it is easy for a human to visually identify a table, it is hard for a machine to determine what a table is and how it relates to its contents, because tables come in many different layouts and styles. Traditional rule-based table recognition requires a lot of manual work whenever the document format changes. Existing machine learning methods, on the other hand, lack large amounts of effective annotated data, which makes it difficult to support real-world applications. TableBank was created to fill this gap.
- TableBank is a dataset for table detection and recognition, built from public, large-scale Word and LaTeX documents via weak supervision. Unlike traditional weakly supervised training sets, TableBank not only has high data quality, but is also several orders of magnitude larger than previous manually labeled table-analysis datasets, containing about 417,000 tables.
- For a machine to read a table, it must first detect which regions of the document are tables, and then recognize the information inside those regions.
- Mask R-CNN
- Open source code: https://github.com/matterport/Mask_RCNN/
- Mask R-CNN follows the design of Faster R-CNN, adopting a ResNet-FPN backbone for feature extraction and adding an extra mask prediction branch. In this sense, Mask R-CNN integrates many earlier research results.
- A good (Chinese-language) introduction to Mask R-CNN: https://zhuanlan.zhihu.com/p/37998710
- Configure the environment:
Keras==2.2.4
Keras-Applications==1.0.6
Keras-Preprocessing==1.0.5
keras-resnet==0.2.0
keras-retinanet==0.5.0
numpy==1.16.4
opencv-python==3.4.2.16
scikit-image==0.15.0
tensorboard==1.13.1
tensorflow==1.14.0rc0
tensorflow-estimator==1.14.0rc0
- The experiment:
1. Network training parameters:
BACKBONE resnet50
BACKBONE_STRIDES [4, 8, 16, 32, 64]
BATCH_SIZE 4
BBOX_STD_DEV [0.1 0.1 0.2 0.2]
COMPUTE_BACKBONE_SHAPE None
DETECTION_MAX_INSTANCES 100
DETECTION_MIN_CONFIDENCE 0.7
DETECTION_NMS_THRESHOLD 0.3
FPN_CLASSIF_FC_LAYERS_SIZE 1024
GPU_COUNT 1
GRADIENT_CLIP_NORM 5.0
IMAGES_PER_GPU 4
IMAGE_CHANNEL_COUNT 3
IMAGE_MAX_DIM 512
IMAGE_META_SIZE 14
IMAGE_MIN_DIM 512
IMAGE_MIN_SCALE 0
IMAGE_RESIZE_MODE square
IMAGE_SHAPE [512 512 3]
LEARNING_MOMENTUM 0.9
LEARNING_RATE 0.001
LOSS_WEIGHTS {'rpn_class_loss': 1.0, 'rpn_bbox_loss': 1.0, 'mrcnn_class_loss': 1.0, 'mrcnn_bbox_loss': 1.0, 'mrcnn_mask_loss': 1.0}
MASK_POOL_SIZE 14
MASK_SHAPE [28, 28]
MAX_GT_INSTANCES 100
MEAN_PIXEL [123.7 116.8 103.9]
MINI_MASK_SHAPE (56, 56)
NAME model_cfg
NUM_CLASSES 2
POOL_SIZE 7
POST_NMS_ROIS_INFERENCE 1000
POST_NMS_ROIS_TRAINING 2000
PRE_NMS_LIMIT 6000
ROI_POSITIVE_RATIO 0.33
RPN_ANCHOR_RATIOS [0.5, 1, 2]
RPN_ANCHOR_SCALES (32, 64, 128, 256, 512)
RPN_ANCHOR_STRIDE 1
RPN_BBOX_STD_DEV [0.1 0.1 0.2 0.2]
RPN_NMS_THRESHOLD 0.7
RPN_TRAIN_ANCHORS_PER_IMAGE 256
STEPS_PER_EPOCH 44
TOP_DOWN_PYRAMID_SIZE 256
TRAIN_BN False
TRAIN_ROIS_PER_IMAGE 200
USE_MINI_MASK True
USE_RPN_ROIS True
VALIDATION_STEPS 50
WEIGHT_DECAY 0.0001
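The parameter dump above is the standard configuration printout from the matterport Mask_RCNN code, produced by subclassing its `Config` class. A minimal sketch of the corresponding subclass (the class name `TableConfig` is my own; attribute names follow the matterport API, and derived values such as `BATCH_SIZE` and `IMAGE_SHAPE` are computed automatically by the base class):

```python
from mrcnn.config import Config  # matterport Mask_RCNN


class TableConfig(Config):
    """Training configuration matching the parameter dump above."""
    NAME = "model_cfg"
    BACKBONE = "resnet50"
    GPU_COUNT = 1
    IMAGES_PER_GPU = 4           # BATCH_SIZE = GPU_COUNT * IMAGES_PER_GPU = 4
    NUM_CLASSES = 2              # background + table
    IMAGE_MIN_DIM = 512
    IMAGE_MAX_DIM = 512
    STEPS_PER_EPOCH = 44
    VALIDATION_STEPS = 50
    DETECTION_MIN_CONFIDENCE = 0.7
```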
The training process is as follows:
1/44 [..............................] - ETA: 52:14 - loss: 4.8811 - rpn_class_loss: 0.3806 - rpn_bbox_loss: 1.2823 - mrcnn_class_loss: 1.6889 - mrcnn_bbox_loss: 0.6109 - mrcnn_mask_loss: 0.9185
2020-05-26 15:31:36.051069: W tensorflow/core/framework/allocator.cc:124] Allocation of 642252800 exceeds 10% of system memory.
2/44 [>.............................] - ETA: 48:26 - loss: 4.4976 - rpn_class_loss: 0.3775 - rpn_bbox_loss: 1.1399 - mrcnn_class_loss: 1.4792 - mrcnn_bbox_loss: 0.5925 - mrcnn_mask_loss: 0.9084
2020-05-26 15:32:41.443471: W tensorflow/core/framework/allocator.cc:124] Allocation of 642252800 exceeds 10% of system memory.
3/44 [=>............................] - ETA: 46:47 - loss: 3.7849 - rpn_class_loss: 0.2552 - rpn_bbox_loss: 0.7948 - mrcnn_class_loss: 1.2611 - mrcnn_bbox_loss: 0.5845 - mrcnn_mask_loss: 0.8893
4/44 [=>............................] - ETA: 45:07 - loss: 3.7246 - rpn_class_loss: 0.2800 - rpn_bbox_loss: 0.8946 - mrcnn_class_loss: 1.0985 - mrcnn_bbox_loss: 0.5910 - mrcnn_mask_loss: 0.8605
5/44 [==>...........................] - ETA: 43:22 - loss: 3.2908 - rpn_class_loss: 0.2241 - rpn_bbox_loss: 0.7241 - mrcnn_class_loss: 0.9475 - mrcnn_bbox_loss: 0.5743 - mrcnn_mask_loss: 0.8208
6/44 [===>..........................] - ETA: 41:55 - loss: 3.2450 - rpn_class_loss: 0.2388 - rpn_bbox_loss: 0.7912 - mrcnn_class_loss: 0.8650 - mrcnn_bbox_loss: 0.5775 - mrcnn_mask_loss: 0.7725
7/44 [===>..........................] - ETA: 40:45 - loss: 2.9766 - rpn_class_loss: 0.2049 - rpn_bbox_loss: 0.6914 - mrcnn_class_loss: 0.7875 - mrcnn_bbox_loss: 0.5714 - mrcnn_mask_loss: 0.7213
8/44 [====>.........................] - ETA: 39:19 - loss: 2.9930 - rpn_class_loss: 0.2412 - rpn_bbox_loss: 0.7782 - mrcnn_class_loss: 0.7355 - mrcnn_bbox_loss: 0.5679 - mrcnn_mask_loss: 0.6701
9/44 [=====>........................] - ETA: 38:10 - loss: 2.7740 - rpn_class_loss: 0.2145 - rpn_bbox_loss: 0.6978 - mrcnn_class_loss: 0.6814 - mrcnn_bbox_loss: 0.5563 - mrcnn_mask_loss: 0.6240
10/44 [=====>........................] - ETA: 37:07 - loss: 2.6909 - rpn_class_loss: 0.2095 - rpn_bbox_loss: 0.7136 - mrcnn_class_loss: 0.6379 - mrcnn_bbox_loss: 0.5468 - mrcnn_mask_loss: 0.5831
11/44 [======>.......................] - ETA: 35:59 - loss: 2.5332 - rpn_class_loss: 0.1905 - rpn_bbox_loss: 0.6543 - mrcnn_class_loss: 0.6021 - mrcnn_bbox_loss: 0.5389 - mrcnn_mask_loss: 0.5475
12/44 [=======>......................] - ETA: 34:55 - loss: 2.4562 - rpn_class_loss: 0.1817 - rpn_bbox_loss: 0.6503 - mrcnn_class_loss: 0.5760 - mrcnn_bbox_loss: 0.5301 - mrcnn_mask_loss: 0.5180
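As a sanity check on the log: the reported `loss` is the weighted sum of the five component losses, with all weights equal to 1.0 per the `LOSS_WEIGHTS` setting above. For the first step:

```python
# All loss weights are 1.0, per LOSS_WEIGHTS in the configuration above.
weights = {"rpn_class_loss": 1.0, "rpn_bbox_loss": 1.0,
           "mrcnn_class_loss": 1.0, "mrcnn_bbox_loss": 1.0,
           "mrcnn_mask_loss": 1.0}

# Component losses from step 1/44 of the training log.
step1 = {"rpn_class_loss": 0.3806, "rpn_bbox_loss": 1.2823,
         "mrcnn_class_loss": 1.6889, "mrcnn_bbox_loss": 0.6109,
         "mrcnn_mask_loss": 0.9185}

total = sum(weights[k] * v for k, v in step1.items())
print(round(total, 4))  # matches the reported loss of 4.8811 up to rounding
```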
2. The directory structure:
3. Test the recognition. We tested on both our own dataset and the ICDAR 2013 dataset.
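When evaluating on a benchmark like ICDAR 2013, detection quality is typically scored by the intersection-over-union (IoU) between predicted and ground-truth table boxes. A minimal sketch of that metric (pure Python; the `(x1, y1, x2, y2)` box format is my assumption, not something fixed by the datasets):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# A detection is commonly counted as correct when IoU >= 0.5.
print(iou((0, 0, 100, 100), (50, 0, 150, 100)))  # partial overlap
```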
I added the red lines myself. The table localization is quite accurate; however, the predicted box does not cover every part of the table. So although the table can be located correctly, the loose box has a large impact on the subsequent table segmentation. I need to achieve the following effect:
This requires very high detection accuracy from the network model, so the next step is to improve the model's accuracy to meet that requirement. To address this problem, I designed an algorithm that solves it well.
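The author's own algorithm is not published here, but one simple post-processing idea for the "box too tight" problem is to pad each detected box outward by a small margin and clip it to the image bounds. A hedged sketch (pure Python; the margin value, function name, and box format are my assumptions, not the author's method):

```python
def expand_box(box, image_w, image_h, margin=0.02):
    """Pad a detected box (x1, y1, x2, y2) outward by a fraction of its
    width/height, clipping to the image bounds, so the box is more likely
    to fully enclose the table."""
    x1, y1, x2, y2 = box
    pad_x = (x2 - x1) * margin
    pad_y = (y2 - y1) * margin
    return (max(0, x1 - pad_x),
            max(0, y1 - pad_y),
            min(image_w, x2 + pad_x),
            min(image_h, y2 + pad_y))


print(expand_box((100, 100, 300, 200), 512, 512))  # slightly enlarged box
```

A fixed margin trades a small amount of localization tightness for segmentation robustness; a more principled fix would retrain or refine the box regressor.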
If you want my trained model and code, you can leave me a message.
I hope this helps. If you have any questions, please comment on this blog or send me a private message, and I will reply when I have free time.