python图片表格提取算法_从扫描的文档opencv python中提取行表

最新推荐文章于 2024-03-10 15:22:03 发布

weixin_39858245

最新推荐文章于 2024-03-10 15:22:03 发布

阅读量542

点赞数

文章标签： python图片表格提取算法

I want to extract the information from a scanned table and store it a csv. Right now my table extraction algorithm does the following steps.

Apply skew correction

Apply a gaussian filter for denoising.

Do a binarization using Otsu thresholding

Do a morphological opening.

Canny egde detection

Do a hough transform to obtain lines of table.

Remove duplicate lines( same lines in the range of 10 pixels)

filter the horizontal and vertical lines using slope of line(slope should be less than +/-5 degree for horizontal and normal of verticals).

This algorithm is working fine for digital born pdfs and most of the scanned documents. But, Some of the documents have a noisy table and thus its not identifying the lines correctly.

Here is a sample image in which my algorithm fails.

These are the operations I am doing on this table.

1.Gaussian blur

2.Otsu thresholding

3.Morphological opening

4.Canny edge detection

5.filtered lines,as you can see the lines are clearly not identified

correctly.

Can anyone please suggest better method for extracting horizontal and vertical lines from this kind of less quality scans.

Thanks in advance!!

解决方案

Here,We are doing morphological transformations using a vertical kernel to detect vetical lines and horizontal kernel to detect horizontal lines and then combining them to get all the required lines.

Vertical lines