To extract the table cells, I apply an AND operation to the MASK images obtained in my previous blog post, which yields the joints image. Processing the joints image gives the coordinates of the table's intersection points, and processing those coordinates lets me split the table into cells. I store the coordinates in a vector, remove duplicate values with sort, unique, and erase, and then apply a threshold to pick out the coordinates of the grid lines. With those coordinates, each cell can be cropped out and saved.
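The AND step can be illustrated without OpenCV. The sketch below is only an illustration: the previous post is not reproduced here, so it assumes the two inputs are a mask of horizontal lines and a mask of vertical lines, and the names `Mask` and `andMasks` are made up for this example. With OpenCV the same result would normally come from a single `cv::bitwise_and` call on two `Mat` objects.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a binary image, stored row by row:
// 255 = white (line pixel), 0 = black.
using Mask = std::vector<std::vector<std::uint8_t>>;

// Pixel-wise AND of two same-sized masks. A pixel stays white only where
// both inputs are white, i.e. where a horizontal and a vertical line cross.
Mask andMasks(const Mask& horizontal, const Mask& vertical)
{
    assert(horizontal.size() == vertical.size());
    Mask joints(horizontal.size());
    for (std::size_t y = 0; y < horizontal.size(); ++y) {
        assert(horizontal[y].size() == vertical[y].size());
        joints[y].resize(horizontal[y].size());
        for (std::size_t x = 0; x < horizontal[y].size(); ++x) {
            joints[y][x] = static_cast<std::uint8_t>(horizontal[y][x] & vertical[y][x]);
        }
    }
    return joints;
}
```

For a 3x3 image with one horizontal line through the middle row and one vertical line through the middle column, only the centre pixel of the result is white. That pixel is the kind of intersection point the rest of this post works with.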
The core code is as follows:
// Raw coordinates of every white pixel in the joints image, and the
// grid-line coordinates derived from them (the *_OK vectors).
vector<int> Variable_Pixel_White_Y_OK;
vector<int> Variable_Pixel_White_X_OK;
vector<int> Variable_Pixel_White_Y;
vector<int> Variable_Pixel_White_X;
int pixel_white = 0;
int white_pixel_Y_last = 0;
for (int y = 0; y < joints.rows; y++) // rows
{
    for (int x = 0; x < joints.cols; x++) // cols
    {
        pixel_white = joints.at<uchar>(y, x);
        if (pixel_white == 255) // position of a white pixel
        {
            Variable_Pixel_White_Y.push_back(y); // row
            Variable_Pixel_White_X.push_back(x); // col
        }
    }
}
if (Variable_Pixel_White_X.size() > 2 && Variable_Pixel_White_Y.size() > 2)
{
    //========================================================================================================
    // X: sort, drop duplicates, then keep the last coordinate of each cluster,
    // i.e. a point where the gap to the next value is much larger than the gap to the previous one.
    sort(Variable_Pixel_White_X.begin(), Variable_Pixel_White_X.end());
    Variable_Pixel_White_X.erase(unique(Variable_Pixel_White_X.begin(), Variable_Pixel_White_X.end()), Variable_Pixel_White_X.end());
    // "i + 2 < size()" instead of "i < size() - 2": after unique() the size may be below 2,
    // and the unsigned subtraction would then wrap around.
    for (size_t i = 0; i + 2 < Variable_Pixel_White_X.size(); i++)
    {
        if ((Variable_Pixel_White_X[i + 2] - Variable_Pixel_White_X[i + 1]) - (Variable_Pixel_White_X[i + 1] - Variable_Pixel_White_X[i]) > 10)
        {
            Variable_Pixel_White_X_OK.push_back(Variable_Pixel_White_X[i + 1]);
        }
    }
    // The last coordinate always closes the final cluster.
    Variable_Pixel_White_X_OK.push_back(Variable_Pixel_White_X[Variable_Pixel_White_X.size() - 1]);
    //========================================================================================================
    // Y: same procedure as for X.
    sort(Variable_Pixel_White_Y.begin(), Variable_Pixel_White_Y.end());
    Variable_Pixel_White_Y.erase(unique(Variable_Pixel_White_Y.begin(), Variable_Pixel_White_Y.end()), Variable_Pixel_White_Y.end());
    for (size_t i = 0; i + 2 < Variable_Pixel_White_Y.size(); i++)
    {
        if ((Variable_Pixel_White_Y[i + 2] - Variable_Pixel_White_Y[i + 1]) - (Variable_Pixel_White_Y[i + 1] - Variable_Pixel_White_Y[i]) > 10)
        {
            Variable_Pixel_White_Y_OK.push_back(Variable_Pixel_White_Y[i + 1]);
        }
    }
    Variable_Pixel_White_Y_OK.push_back(Variable_Pixel_White_Y[Variable_Pixel_White_Y.size() - 1]);
    //========================================================================================================
    // N grid lines enclose N - 1 columns (or rows).
    cout << "cols:" << Variable_Pixel_White_X_OK.size() - 1 << endl;
    cout << "rows:" << Variable_Pixel_White_Y_OK.size() - 1 << endl;
    //========================================================================================================
    //------------------------------------------------>segmentation<------------------------------------------
    int rect_x = 0, rect_y = 0;
    int d_y = 0, d_num = 0, Abs = 0;
    int d = 0, h = 0;
    for (size_t i = 0; i + 1 < Variable_Pixel_White_Y_OK.size(); i++)
    {
        for (size_t j = 0; j + 1 < Variable_Pixel_White_X_OK.size(); j++)
        {
            // Width and height of the cell between two neighbouring grid lines.
            d = Variable_Pixel_White_X_OK[j + 1] - Variable_Pixel_White_X_OK[j];
            h = Variable_Pixel_White_Y_OK[i + 1] - Variable_Pixel_White_Y_OK[i];
            rect_x = Variable_Pixel_White_X_OK[j];
            rect_y = Variable_Pixel_White_Y_OK[i];
            // OpenCV requires the ROI to satisfy:
            //(0 <= roi.x && 0 <= roi.width &&
            //roi.x + roi.width <= m.cols &&
            //0 <= roi.y && 0 <= roi.height &&
            //roi.y + roi.height <= m.rows)
            // The cell is shrunk by 5 pixels on each side to leave out the ruling lines.
            // Width and height must be strictly positive, otherwise the ROI is empty
            // and cannot be written to disk.
            if (rect_x + 5 >= 0 &&
                rect_y + 5 >= 0 &&
                d - 10 > 0 &&
                h - 10 > 0 &&
                rect_x + 5 + d - 10 <= gray.cols &&
                rect_y + 5 + h - 10 <= gray.rows)
            {
                Rect rect(rect_x + 5, rect_y + 5, d - 10, h - 10);
                Mat ROI = gray(rect);
                string Img_Name = "./title_time/roi/" + to_string(save_Img) + ".jpg";
                save_Img++;
                imwrite(Img_Name, ROI);
            }
        }
    }
}
else
{
    // Too few intersection points were found: nothing to segment.
}
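The two ideas in the listing above, collapsing pixel coordinates into one value per grid line and turning neighbouring grid lines into cell rectangles, can also be tried without OpenCV. The following is a minimal sketch, not the exact logic of the listing. It uses a simpler rule (a run ends when the gap to the next coordinate exceeds `minGap`) instead of comparing two neighbouring gaps. The names `collapseToLines`, `cellRects`, and `CellRect` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Collapse raw pixel coordinates into one coordinate per grid line.
// Coordinates that are at most `minGap` apart are treated as one line,
// and the last (largest) coordinate of each run is kept.
std::vector<int> collapseToLines(std::vector<int> coords, int minGap)
{
    assert(minGap > 0);
    std::sort(coords.begin(), coords.end());
    coords.erase(std::unique(coords.begin(), coords.end()), coords.end());

    std::vector<int> lines;
    for (std::size_t i = 0; i < coords.size(); ++i) {
        const bool lastOfRun = (i + 1 == coords.size()) ||
                               (coords[i + 1] - coords[i] > minGap);
        if (lastOfRun) {
            lines.push_back(coords[i]);
        }
    }
    return lines;
}

// Hypothetical plain rectangle, standing in for cv::Rect.
struct CellRect { int x, y, width, height; };

// Build one rectangle per cell from consecutive grid lines, shrunk by
// `margin` on every side so the ruling lines themselves are left out.
// Cells that would be empty or reach outside the image are skipped.
std::vector<CellRect> cellRects(const std::vector<int>& xs,
                                const std::vector<int>& ys,
                                int margin, int imgCols, int imgRows)
{
    std::vector<CellRect> cells;
    for (std::size_t i = 0; i + 1 < ys.size(); ++i) {
        for (std::size_t j = 0; j + 1 < xs.size(); ++j) {
            const CellRect r{xs[j] + margin, ys[i] + margin,
                             xs[j + 1] - xs[j] - 2 * margin,
                             ys[i + 1] - ys[i] - 2 * margin};
            const bool valid = r.x >= 0 && r.y >= 0 &&
                               r.width > 0 && r.height > 0 &&
                               r.x + r.width <= imgCols &&
                               r.y + r.height <= imgRows;
            if (valid) {
                cells.push_back(r);
            }
        }
    }
    return cells;
}
```

For example, `collapseToLines({12, 10, 11, 11, 100, 101, 200}, 10)` returns `{12, 101, 200}`, and `cellRects({0, 50, 100}, {0, 30}, 5, 100, 30)` returns two cells of 40x20 pixels. Writing the loop bounds as `i + 1 < size()` avoids the unsigned wrap-around that `size() - 1` causes on an empty vector.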
However, this method only works for regular tables. It cannot handle tables that contain merged cells, and that part needs further study; I am currently working on a new method for extracting and segmenting tables with merged cells. The follow-up work will build on this code. To use it, refer to the table extraction code in my blog post from a few days ago and add this part to it; I won't list everything here.
I hope this helps. If you have any questions, please comment on this blog or send me a private message, and I will reply when I have free time.