Recently I have been working on table detection. Until now I used traditional (rule-based) methods, which work to a certain degree, but their detection accuracy is not good enough.
In this post I use Mask R-CNN for table detection, trained on TableBank, a dataset of roughly 400,000 tables. Let's start with a brief introduction to TableBank; for details, see https://github.com/doc-analysis/TableBank
TableBank: a high-quality annotated table dataset
- Although it is easy for a human to visually identify a table, it is hard for a machine to determine what a table is and how it relates to its contents, because tables come in many different layouts and styles. Traditional rule-based table recognition requires a lot of manual work whenever the document format changes. Existing machine learning methods, on the other hand, lack large amounts of effective annotated data, which makes it difficult to support real-world applications. TableBank was created to fill this gap.
- TableBank is a dataset for table detection and recognition, built from public, large-scale Word and LaTeX documents via weak supervision. Unlike traditional weakly supervised training sets, TableBank not only has high data quality, but is also several orders of magnitude larger than previous manually labeled table-analysis datasets, containing about 417,000 tables.
- For a machine to read a table, it must first detect which regions of the document are tables, and then recognize the information inside those regions.
- Mask R-CNN
- Open source code: https://github.com/matterport/Mask_RCNN/
- Mask R-CNN follows the design of Faster R-CNN, adopting a ResNet-FPN backbone for feature extraction and adding an extra mask prediction branch. In this sense, Mask R-CNN integrates many earlier research results.
- A good (Chinese-language) introduction to Mask R-CNN: https://zhuanlan.zhihu.com/p/37998710
- Configure the environment:
Keras==2.2.4
Keras-Applications==1.0.6
Keras-Preprocessing==1.0.5
keras-resnet==0.2.0
keras-retinanet==0.5.0
numpy==1.16.4
opencv-python==3.4.2.16
scikit-image==0.15.0
tensorboard==1.13.1
tensorflow==1.14.0rc0
tensorflow-estimator==1.14.0rc0
- The experiment:
1. Network training parameters:
BACKBONE resnet50
BACKBONE_STRIDES [4, 8, 16, 32, 64]
BATCH_SIZE 4
BBOX_STD_DEV [0.1 0.1 0.2 0.2]
COMPUTE_BACKBONE_SHAPE None
DETECTION_MAX_INSTANCES 100
DETECTION_MIN_CONFIDENCE 0.7
DETECTION_NMS_THRESHOLD 0.3
FPN_CLASSIF_FC_LAYERS_SIZE 1024
GPU_COUNT 1
GRADIENT_CLIP_NORM 5.0
IMAGES_PER_GPU 4
IMAGE_CHANNEL_COUNT 3
IMAGE_MAX_DIM 512
IMAGE_META_SIZE 14
IMAGE_MIN_DIM 512
IMAGE_MIN_SCALE 0
IMAGE_RESIZE_MODE square
IMAGE_SHAPE [512 512 3]
LEARNING_MOMENTUM 0.9
LEARNING_RATE 0.001
LOSS_WEIGHTS {'rpn_class_loss': 1.0, 'rpn_bbox_loss': 1.0, 'mrcnn_class_loss': 1.0, 'mrcnn_bbox_loss': 1.0, 'mrcnn_mask_loss': 1.0}
MASK_POOL_SIZE 14
MASK_SHAPE [28, 28]
MAX_GT_INSTANCES 100
MEAN_PIXEL [123.7 116.8 103.9]
MINI_MASK_SHAPE (56, 56)
NAME model_cfg
NUM_CLASSES 2
POOL_SIZE 7
POST_NMS_ROIS_INFERENCE 1000
POST_NMS_ROIS_TRAINING 2000
PRE_NMS_LIMIT 6000
ROI_POSITIVE_RATIO 0.33
RPN_ANCHOR_RATIOS [0.5, 1, 2]
RPN_ANCHOR_SCALES (32, 64, 128, 256, 512)
RPN_ANCHOR_STRIDE 1
RPN_BBOX_STD_DEV [0.1 0.1 0.2 0.2]
RPN_NMS_THRESHOLD 0.7
RPN_TRAIN_ANCHORS_PER_IMAGE 256
STEPS_PER_EPOCH 44
TOP_DOWN_PYRAMID_SIZE 256
TRAIN_BN False
TRAIN_ROIS_PER_IMAGE 200
USE_MINI_MASK True
USE_RPN_ROIS True
VALIDATION_STEPS 50
WEIGHT_DECAY 0.0001
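The parameter dump above is the standard configuration printout from the matterport Mask_RCNN code, produced by subclassing its `Config` class. A minimal sketch of the corresponding subclass (the class name `TableConfig` is my own; attribute names follow the matterport API, and derived values such as `BATCH_SIZE` and `IMAGE_SHAPE` are computed automatically by the base class):

```python
from mrcnn.config import Config  # matterport Mask_RCNN


class TableConfig(Config):
    """Training configuration matching the parameter dump above."""
    NAME = "model_cfg"
    BACKBONE = "resnet50"
    GPU_COUNT = 1
    IMAGES_PER_GPU = 4           # BATCH_SIZE = GPU_COUNT * IMAGES_PER_GPU = 4
    NUM_CLASSES = 2              # background + table
    IMAGE_MIN_DIM = 512
    IMAGE_MAX_DIM = 512
    STEPS_PER_EPOCH = 44
    VALIDATION_STEPS = 50
    DETECTION_MIN_CONFIDENCE = 0.7
```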
The training process is as follows:
1/44 [..............................] - ETA: 52:14 - loss: 4.8811 - rpn_class_loss: 0.3806 - rpn_bbox_loss: 1.2823 - mrcnn_class_loss: 1.6889 - mrcnn_bbox_loss: 0.6109 - mrcnn_mask_loss: 0.9185
2020-05-26 15:31:36.051069: W tensorflow/core/framework/allocator.cc:124] Allocation of 642252800 exceeds 10% of system memory.
2/44 [>.............................] - ETA: 48:26 - loss: 4.4976 - rpn_class_loss: 0.3775 - rpn_bbox_loss: 1.1399 - mrcnn_class_loss: 1.4792 - mrcnn_bbox_loss: 0.5925 - mrcnn_mask_loss: 0.9084
2020-05-26 15:32:41.443471: W tensorflow/core/framework/allocator.cc:124] Allocation of 642252800 exceeds 10% of system memory.
3/44 [=>............................] - ETA: 46:47 - loss: 3.7849 - rpn_class_loss: 0.2552 - rpn_bbox_loss: 0.7948 - mrcnn_class_loss: 1.2611 - mrcnn_bbox_loss: 0.5845 - mrcnn_mask_loss: 0.8893
4/44 [=>............................] - ETA: 45:07 - loss: 3.7246 - rpn_class_loss: 0.2800 - rpn_bbox_loss: 0.8946 - mrcnn_class_loss: 1.0985 - mrcnn_bbox_loss: 0.5910 - mrcnn_mask_loss: 0.8605
5/44 [==>...........................] - ETA: 43:22 - loss: 3.2908 - rpn_class_loss: 0.2241 - rpn_bbox_loss: 0.7241 - mrcnn_class_loss: 0.9475 - mrcnn_bbox_loss: 0.5743 - mrcnn_mask_loss: 0.8208
6/44 [===>..........................] - ETA: 41:55 - loss: 3.2450 - rpn_class_loss: 0.2388 - rpn_bbox_loss: 0.7912 - mrcnn_class_loss: 0.8650 - mrcnn_bbox_loss: 0.5775 - mrcnn_mask_loss: 0.7725
7/44 [===>..........................] - ETA: 40:45 - loss: 2.9766 - rpn_class_loss: 0.2049 - rpn_bbox_loss: 0.6914 - mrcnn_class_loss: 0.7875 - mrcnn_bbox_loss: 0.5714 - mrcnn_mask_loss: 0.7213
8/44 [====>.........................] - ETA: 39:19 - loss: 2.9930 - rpn_class_loss: 0.2412 - rpn_bbox_loss: 0.7782 - mrcnn_class_loss: 0.7355 - mrcnn_bbox_loss: 0.5679 - mrcnn_mask_loss: 0.6701
9/44 [=====>........................] - ETA: 38:10 - loss: 2.7740 - rpn_class_loss: 0.2145 - rpn_bbox_loss: 0.6978 - mrcnn_class_loss: 0.6814 - mrcnn_bbox_loss: 0.5563 - mrcnn_mask_loss: 0.6240
10/44 [=====>........................] - ETA: 37:07 - loss: 2.6909 - rpn_class_loss: 0.2095 - rpn_bbox_loss: 0.7136 - mrcnn_class_loss: 0.6379 - mrcnn_bbox_loss: 0.5468 - mrcnn_mask_loss: 0.5831
11/44 [======>.......................] - ETA: 35:59 - loss: 2.5332 - rpn_class_loss: 0.1905 - rpn_bbox_loss: 0.6543 - mrcnn_class_loss: 0.6021 - mrcnn_bbox_loss: 0.5389 - mrcnn_mask_loss: 0.5475
12/44 [=======>......................] - ETA: 34:55 - loss: 2.4562 - rpn_class_loss: 0.1817 - rpn_bbox_loss: 0.6503 - mrcnn_class_loss: 0.5760 - mrcnn_bbox_loss: 0.5301 - mrcnn_mask_loss: 0.5180
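As a sanity check on the log: the reported `loss` is the weighted sum of the five component losses, with all weights equal to 1.0 per the `LOSS_WEIGHTS` setting above. For the first step:

```python
# All loss weights are 1.0, per LOSS_WEIGHTS in the configuration above.
weights = {"rpn_class_loss": 1.0, "rpn_bbox_loss": 1.0,
           "mrcnn_class_loss": 1.0, "mrcnn_bbox_loss": 1.0,
           "mrcnn_mask_loss": 1.0}

# Component losses from step 1/44 of the training log.
step1 = {"rpn_class_loss": 0.3806, "rpn_bbox_loss": 1.2823,
         "mrcnn_class_loss": 1.6889, "mrcnn_bbox_loss": 0.6109,
         "mrcnn_mask_loss": 0.9185}

total = sum(weights[k] * v for k, v in step1.items())
print(round(total, 4))  # matches the reported loss of 4.8811 up to rounding
```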
2. The directory structure:
3. Test the recognition. We tested on both our own dataset and the ICDAR 2013 dataset.
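When evaluating on a benchmark like ICDAR 2013, detection quality is typically scored by the intersection-over-union (IoU) between predicted and ground-truth table boxes. A minimal sketch of that metric (pure Python; the `(x1, y1, x2, y2)` box format is my assumption, not something fixed by the datasets):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# A detection is commonly counted as correct when IoU >= 0.5.
print(iou((0, 0, 100, 100), (50, 0, 150, 100)))  # partial overlap
```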
I added the red lines myself. The table localization is quite accurate; however, the predicted box does not cover every part of the table. So although the table can be located correctly, the loose box has a large impact on the subsequent table segmentation. I need to achieve the following effect:
This requires very high detection accuracy from the network model, so the next step is to improve the model's accuracy to meet that requirement. To address this problem, I designed an algorithm that solves it well.
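The author's own algorithm is not published here, but one simple post-processing idea for the "box too tight" problem is to pad each detected box outward by a small margin and clip it to the image bounds. A hedged sketch (pure Python; the margin value, function name, and box format are my assumptions, not the author's method):

```python
def expand_box(box, image_w, image_h, margin=0.02):
    """Pad a detected box (x1, y1, x2, y2) outward by a fraction of its
    width/height, clipping to the image bounds, so the box is more likely
    to fully enclose the table."""
    x1, y1, x2, y2 = box
    pad_x = (x2 - x1) * margin
    pad_y = (y2 - y1) * margin
    return (max(0, x1 - pad_x),
            max(0, y1 - pad_y),
            min(image_w, x2 + pad_x),
            min(image_h, y2 + pad_y))


print(expand_box((100, 100, 300, 200), 512, 512))  # slightly enlarged box
```

A fixed margin trades a small amount of localization tightness for segmentation robustness; a more principled fix would retrain or refine the box regressor.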
If you want my trained model and code, you can leave me a message.
I hope this helps. If you have any questions, please comment on this blog or send me a private message, and I will reply when I have free time.