Table detection based on Mask R-CNN

  • Recently I have been working on table detection. So far I have used traditional, rule-based methods, which can handle tables to some extent, but their detection accuracy is not good enough.

  • In this post, Mask R-CNN is trained for table detection on TableBank, a dataset of roughly 400,000 tables. Let's start with a brief introduction to TableBank; for details, see https://github.com/doc-analysis/TableBank

  • TableBank: a high-quality annotated table dataset

  1. Although a human can identify a table at a glance, it is hard for a machine to determine what a table is and how it relates to its contents, because tables come in many layouts and styles. Traditional rule-based table recognition requires extensive manual tuning whenever the document layout changes, while existing machine-learning methods cannot obtain large amounts of effective annotated data, which makes them hard to apply in real scenarios. TableBank was created to fill this gap.
  2. TableBank is a dataset for table detection and recognition built by weak supervision from publicly available, large-scale Word and LaTeX documents. Unlike typical weakly supervised training sets, TableBank is not only high quality but also several orders of magnitude larger than previous manually labeled table-analysis datasets, containing 417,000 labeled tables.
  3. For a machine to read a table, it must first detect which regions of the document are tables, and then recognize the information inside each table region.
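TableBank's detection annotations are distributed in COCO-style JSON (an assumption worth verifying against the repository linked above). A minimal sketch of grouping table bounding boxes by image, using a tiny hand-made sample in the same shape:

```python
import json
from collections import defaultdict

def tables_per_image(coco_data):
    """Group table bounding boxes ([x, y, w, h]) by image file name."""
    data = json.loads(coco_data) if isinstance(coco_data, str) else coco_data
    id_to_name = {img["id"]: img["file_name"] for img in data["images"]}
    boxes = defaultdict(list)
    for ann in data["annotations"]:
        boxes[id_to_name[ann["image_id"]]].append(ann["bbox"])
    return dict(boxes)

# Tiny hand-made sample in the shape of a COCO annotation file.
sample = {
    "images": [{"id": 1, "file_name": "doc_001.jpg"}],
    "annotations": [{"image_id": 1, "bbox": [40, 60, 300, 120]}],
}
print(tables_per_image(sample))  # {'doc_001.jpg': [[40, 60, 300, 120]]}
```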
  • Mask R-CNN
  1. Open source code: https://github.com/matterport/Mask_RCNN/
  2. Mask R-CNN follows the design of Faster R-CNN and adopts a ResNet-FPN backbone for feature extraction, with an additional parallel mask-prediction branch. It integrates many earlier research results.
  3. An excellent (Chinese-language) introduction to Mask R-CNN: https://zhuanlan.zhihu.com/p/37998710
  • Configure the environment
  1. Keras==2.2.4
    Keras-Applications==1.0.6
    Keras-Preprocessing==1.0.5
    keras-resnet==0.2.0
    keras-retinanet==0.5.0
  2. numpy==1.16.4
  3. opencv-python==3.4.2.16
  4. tensorboard==1.13.1
    tensorflow==1.14.0rc0
    tensorflow-estimator==1.14.0rc0
  5. scikit-image==0.15.0
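Assuming a fresh Python 3.6/3.7 virtual environment, the pinned versions above can be installed in one step, followed by the matterport repository itself (adjust `tensorflow` to `tensorflow-gpu` for your CUDA setup):

```shell
pip install "Keras==2.2.4" "Keras-Applications==1.0.6" \
    "Keras-Preprocessing==1.0.5" "keras-resnet==0.2.0" \
    "keras-retinanet==0.5.0" "numpy==1.16.4" \
    "opencv-python==3.4.2.16" "tensorboard==1.13.1" \
    "tensorflow==1.14.0rc0" "tensorflow-estimator==1.14.0rc0" \
    "scikit-image==0.15.0"

git clone https://github.com/matterport/Mask_RCNN.git
cd Mask_RCNN && pip install -e .
```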
  • The experiment:
  1. Training configuration (network parameters):
BACKBONE                       resnet50
BACKBONE_STRIDES               [4, 8, 16, 32, 64]
BATCH_SIZE                     4
BBOX_STD_DEV                   [0.1 0.1 0.2 0.2]
COMPUTE_BACKBONE_SHAPE         None
DETECTION_MAX_INSTANCES        100
DETECTION_MIN_CONFIDENCE       0.7
DETECTION_NMS_THRESHOLD        0.3
FPN_CLASSIF_FC_LAYERS_SIZE     1024
GPU_COUNT                      1
GRADIENT_CLIP_NORM             5.0
IMAGES_PER_GPU                 4
IMAGE_CHANNEL_COUNT            3
IMAGE_MAX_DIM                  512
IMAGE_META_SIZE                14
IMAGE_MIN_DIM                  512
IMAGE_MIN_SCALE                0
IMAGE_RESIZE_MODE              square
IMAGE_SHAPE                    [512 512   3]
LEARNING_MOMENTUM              0.9
LEARNING_RATE                  0.001
LOSS_WEIGHTS                   {'rpn_class_loss': 1.0, 'rpn_bbox_loss': 1.0, 'mrcnn_class_loss': 1.0, 'mrcnn_bbox_loss': 1.0, 'mrcnn_mask_loss': 1.0}
MASK_POOL_SIZE                 14
MASK_SHAPE                     [28, 28]
MAX_GT_INSTANCES               100
MEAN_PIXEL                     [123.7 116.8 103.9]
MINI_MASK_SHAPE                (56, 56)
NAME                           model_cfg
NUM_CLASSES                    2
POOL_SIZE                      7
POST_NMS_ROIS_INFERENCE        1000
POST_NMS_ROIS_TRAINING         2000
PRE_NMS_LIMIT                  6000
ROI_POSITIVE_RATIO             0.33
RPN_ANCHOR_RATIOS              [0.5, 1, 2]
RPN_ANCHOR_SCALES              (32, 64, 128, 256, 512)
RPN_ANCHOR_STRIDE              1
RPN_BBOX_STD_DEV               [0.1 0.1 0.2 0.2]
RPN_NMS_THRESHOLD              0.7
RPN_TRAIN_ANCHORS_PER_IMAGE    256
STEPS_PER_EPOCH                44
TOP_DOWN_PYRAMID_SIZE          256
TRAIN_BN                       False
TRAIN_ROIS_PER_IMAGE           200
USE_MINI_MASK                  True
USE_RPN_ROIS                   True
VALIDATION_STEPS               50
WEIGHT_DECAY                   0.0001
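Several values in the table above are derived rather than set directly; in the matterport implementation, `BATCH_SIZE`, `IMAGE_SHAPE`, and `IMAGE_META_SIZE` are computed from the base settings in `config.py`. A sketch of those derivations (formulas follow matterport's code, as I understand it):

```python
GPU_COUNT = 1
IMAGES_PER_GPU = 4
IMAGE_MAX_DIM = 512
NUM_CLASSES = 2  # background + table

# Effective batch size is images per GPU times number of GPUs.
BATCH_SIZE = IMAGES_PER_GPU * GPU_COUNT

# In "square" resize mode every image is padded to MAX_DIM x MAX_DIM.
IMAGE_SHAPE = [IMAGE_MAX_DIM, IMAGE_MAX_DIM, 3]

# Meta vector: image_id(1) + original_shape(3) + resized_shape(3)
# + window(4) + scale(1) + one-hot class flags(NUM_CLASSES).
IMAGE_META_SIZE = 1 + 3 + 3 + 4 + 1 + NUM_CLASSES

print(BATCH_SIZE, IMAGE_SHAPE, IMAGE_META_SIZE)  # 4 [512, 512, 3] 14
```

Note how these match the table: BATCH_SIZE 4, IMAGE_SHAPE [512 512 3], IMAGE_META_SIZE 14.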

 The training process is as follows:

 1/44 [..............................] - ETA: 52:14 - loss: 4.8811 - rpn_class_loss: 0.3806 - rpn_bbox_loss: 1.2823 - mrcnn_class_loss: 1.6889 - mrcnn_bbox_loss: 0.6109 - mrcnn_mask_loss: 0.9185
2020-05-26 15:31:36.051069: W tensorflow/core/framework/allocator.cc:124] Allocation of 642252800 exceeds 10% of system memory.
 2/44 [>.............................] - ETA: 48:26 - loss: 4.4976 - rpn_class_loss: 0.3775 - rpn_bbox_loss: 1.1399 - mrcnn_class_loss: 1.4792 - mrcnn_bbox_loss: 0.5925 - mrcnn_mask_loss: 0.9084
2020-05-26 15:32:41.443471: W tensorflow/core/framework/allocator.cc:124] Allocation of 642252800 exceeds 10% of system memory.
 3/44 [=>............................] - ETA: 46:47 - loss: 3.7849 - rpn_class_loss: 0.2552 - rpn_bbox_loss: 0.7948 - mrcnn_class_loss: 1.2611 - mrcnn_bbox_loss: 0.5845 - mrcnn_mask_loss: 0.8893
 4/44 [=>............................] - ETA: 45:07 - loss: 3.7246 - rpn_class_loss: 0.2800 - rpn_bbox_loss: 0.8946 - mrcnn_class_loss: 1.0985 - mrcnn_bbox_loss: 0.5910 - mrcnn_mask_loss: 0.8605
 5/44 [==>...........................] - ETA: 43:22 - loss: 3.2908 - rpn_class_loss: 0.2241 - rpn_bbox_loss: 0.7241 - mrcnn_class_loss: 0.9475 - mrcnn_bbox_loss: 0.5743 - mrcnn_mask_loss: 0.8208
 6/44 [===>..........................] - ETA: 41:55 - loss: 3.2450 - rpn_class_loss: 0.2388 - rpn_bbox_loss: 0.7912 - mrcnn_class_loss: 0.8650 - mrcnn_bbox_loss: 0.5775 - mrcnn_mask_loss: 0.7725
 7/44 [===>..........................] - ETA: 40:45 - loss: 2.9766 - rpn_class_loss: 0.2049 - rpn_bbox_loss: 0.6914 - mrcnn_class_loss: 0.7875 - mrcnn_bbox_loss: 0.5714 - mrcnn_mask_loss: 0.7213
 8/44 [====>.........................] - ETA: 39:19 - loss: 2.9930 - rpn_class_loss: 0.2412 - rpn_bbox_loss: 0.7782 - mrcnn_class_loss: 0.7355 - mrcnn_bbox_loss: 0.5679 - mrcnn_mask_loss: 0.6701
 9/44 [=====>........................] - ETA: 38:10 - loss: 2.7740 - rpn_class_loss: 0.2145 - rpn_bbox_loss: 0.6978 - mrcnn_class_loss: 0.6814 - mrcnn_bbox_loss: 0.5563 - mrcnn_mask_loss: 0.6240
10/44 [=====>........................] - ETA: 37:07 - loss: 2.6909 - rpn_class_loss: 0.2095 - rpn_bbox_loss: 0.7136 - mrcnn_class_loss: 0.6379 - mrcnn_bbox_loss: 0.5468 - mrcnn_mask_loss: 0.5831
11/44 [======>.......................] - ETA: 35:59 - loss: 2.5332 - rpn_class_loss: 0.1905 - rpn_bbox_loss: 0.6543 - mrcnn_class_loss: 0.6021 - mrcnn_bbox_loss: 0.5389 - mrcnn_mask_loss: 0.5475
12/44 [=======>......................] - ETA: 34:55 - loss: 2.4562 - rpn_class_loss: 0.1817 - rpn_bbox_loss: 0.6503 - mrcnn_class_loss: 0.5760 - mrcnn_bbox_loss: 0.5301 - mrcnn_mask_loss: 0.5180
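Keras prints one progress line per batch, and the interleaved TensorFlow warnings make the raw log hard to read. The named loss components can be pulled out of such lines with a small stdlib parser (handy for plotting the curves later). A sketch:

```python
import re

LOSS_RE = re.compile(r"(loss|rpn_class_loss|rpn_bbox_loss|mrcnn_class_loss|"
                     r"mrcnn_bbox_loss|mrcnn_mask_loss): ([0-9.]+)")

def parse_losses(line):
    """Extract the named loss terms from one Keras progress line."""
    return {name: float(val) for name, val in LOSS_RE.findall(line)}

line = (" 1/44 [....] - ETA: 52:14 - loss: 4.8811 - rpn_class_loss: 0.3806"
        " - rpn_bbox_loss: 1.2823 - mrcnn_class_loss: 1.6889"
        " - mrcnn_bbox_loss: 0.6109 - mrcnn_mask_loss: 0.9185")
print(parse_losses(line)["loss"])  # 4.8811
```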

 2. The directory structure:

 3. Test the recognition. We used our own data set and the ICDAR 2013 data set for testing.

 I added the red line myself. The table is recognized with high accuracy; however, the detected box does not cover every part of the table. The table can be located accurately, but an undersized box severely affects the subsequent table segmentation. I need to achieve the following effect:

This requires very high detection accuracy from the network model, so the next step is to improve the model's accuracy to meet my requirements. For this problem, I designed an algorithm that solves it well.
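The algorithm itself is not disclosed in this post. As a simple baseline for the undersized-box problem (a sketch of one common fix, not necessarily the algorithm mentioned here), one can pad each detected box by a relative margin and clip it to the image bounds:

```python
def expand_box(box, img_w, img_h, margin=0.02):
    """Pad a (x1, y1, x2, y2) box by a relative margin, clipped to the image."""
    x1, y1, x2, y2 = box
    dx = (x2 - x1) * margin
    dy = (y2 - y1) * margin
    return (max(0, x1 - dx), max(0, y1 - dy),
            min(img_w, x2 + dx), min(img_h, y2 + dy))

print(expand_box((100, 100, 300, 200), 640, 480))  # (96.0, 98.0, 304.0, 202.0)
```

A fixed margin only helps when the box falls short by a small, roughly uniform amount; snapping the padded box to the table's ruling lines would be a more principled refinement.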

If you want my trained model and code, you can leave me a message.

I hope this helps. If you have any questions, please comment on this blog or send me a private message; I will reply in my free time.
