Building on the earlier semi-automatic skew correction, I went on to finish line segmentation and character segmentation.
I looked up the BMP file format on wotsit.org; a brief excerpt follows:
A BMP file consists of four parts: File header, Information header, Color table, and Data array.
File header:
0000H  Type         2 Bytes   'BM': Windows; others: OS/2
0002H  File Size    1 DWord   file size in bytes
0006H  Reserved     1 DWord   must be 0
000AH  OffsetBits   1 DWord   offset from the start of the file to the data array, in bytes
InfoHeader:
000EH  Header Size  1 DWord   28H: Windows; others: OS/2
0012H  Width        1 DWord   width in pixels
0016H  Height       1 DWord   height in pixels
001AH  Planes       1 Word    always 1
001CH  Bits/Pixel   1 Word    1/4/8/16/24/32
001EH  Compression  1 DWord   0/1/2/3
0022H  Data Size    1 DWord   in bytes; must be a multiple of 4
0026H  HResolution  1 DWord   horizontal resolution, in pixels/m
002AH  VResolution  1 DWord   vertical resolution, in pixels/m
002EH  Colors       1 DWord   0 means all colors are used
0032H  Important    1 DWord   0 means all colors are significant
Color Table:
00XXH  Palette      N*4 Bytes B/G/R/0
Data Array:
00XXH  Data         xxx       pixel data
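The layout above maps directly onto two fixed-size little-endian records, so it can be decoded in a few lines. The following is only a sketch, not the code used in this project: the function name `parse_bmp_header` and the dictionary keys are made up for illustration, and it assumes the Windows variant with a 28H (40-byte) info header.

```python
import struct

def parse_bmp_header(buf):
    """Decode the 14-byte file header and the 40-byte Windows info header.

    Hypothetical helper for illustration; field names follow the table above.
    """
    if len(buf) < 54:
        raise ValueError("too short to hold both headers")
    # '<' = little-endian without padding; H = Word (2 bytes), I/i = DWord (4 bytes)
    magic, file_size, reserved, data_offset = struct.unpack_from("<2sIII", buf, 0)
    if magic != b"BM":
        raise ValueError("not a Windows BMP: missing 'BM'")
    (header_size, width, height, planes, bits_per_pixel, compression,
     data_size, h_res, v_res, colors, important) = struct.unpack_from(
        "<IiiHHIIiiII", buf, 14)
    if header_size != 0x28:
        raise ValueError("unsupported info header size")
    return {
        "file_size": file_size,
        "data_offset": data_offset,
        "width": width,
        "height": height,          # signed: a negative value means top-down rows
        "bits_per_pixel": bits_per_pixel,
        "compression": compression,
        "data_size": data_size,
    }
```

Width and height are read as signed values because a negative height marks a top-down bitmap; everything else in the two headers is unsigned.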
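Read in all the header information, check that the file is valid, and record the width, height, data offset, and so on. For now only monochrome (1-bit) bitmaps are handled; the data is read bit by bit and stored in an array.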
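Reading a monochrome bitmap bit by bit has three details that are easy to miss: each row is padded to a multiple of 4 bytes, the most significant bit of each byte is the leftmost pixel, and rows are stored bottom-up when the height is positive. A minimal sketch under those rules (the name `unpack_mono_rows` is hypothetical, and `data` is assumed to be the bytes starting at OffsetBits):

```python
def unpack_mono_rows(data, width, height):
    """Expand 1-bit-per-pixel BMP data into a list of rows of 0/1 values.

    Each value is a palette index, so whether 1 means ink or background
    depends on the color table of the file being read.
    """
    stride = (width + 31) // 32 * 4      # bytes per row, padded to a multiple of 4
    rows = []
    for y in range(abs(height)):
        line = data[y * stride:(y + 1) * stride]
        # Bit 7 of each byte is the leftmost of its 8 pixels.
        rows.append([(line[x >> 3] >> (7 - (x & 7))) & 1 for x in range(width)])
    if height > 0:
        rows.reverse()                   # stored bottom-up; return top row first
    return rows
```

The padding bits at the end of each row are simply skipped, so the result has exactly `width` values per row.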
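Line segmentation: sum the pixels in each row; rows whose noise coefficient is below 6 (an empirical value) are treated as blank, and a line is drawn and recorded at each boundary;
Character segmentation: within each text line, accumulate the pixels in the vertical direction (column by column); columns whose noise coefficient is below 1 are treated as gaps, and a line is drawn and recorded at each boundary;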
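Shortcomings: noise gets confused with the character "一", which seems unsolvable (other OCR software makes similar mistakes); the double quotation mark ” is recognized as two single marks '', which can probably be cleaned up in post-processing;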
Next step: go on to handle 8-bit (256-color) images and improve robustness.
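The line and character segmentation described above can be sketched as two passes over projection profiles. This is an illustration rather than the original code: it assumes the "noise coefficient" is simply the count of ink pixels in a row or column, that the image is a list of rows with 1 for ink, and the names `find_runs` and `segment` are made up here. The defaults 6 and 1 are the thresholds quoted above.

```python
def find_runs(profile, threshold):
    """Return (start, end) pairs, end exclusive, where profile >= threshold."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value >= threshold and start is None:
            start = i                    # boundary: blank -> content
        elif value < threshold and start is not None:
            runs.append((start, i))      # boundary: content -> blank
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs

def segment(image, row_threshold=6, col_threshold=1):
    """Split an ink map into (top, bottom, left, right) boxes, one per character."""
    boxes = []
    row_profile = [sum(row) for row in image]
    for top, bottom in find_runs(row_profile, row_threshold):
        # Accumulate ink column by column, but only inside this text line.
        col_profile = [sum(image[y][x] for y in range(top, bottom))
                       for x in range(len(image[0]))]
        for left, right in find_runs(col_profile, col_threshold):
            boxes.append((top, bottom, left, right))
    return boxes
```

The run boundaries returned by `find_runs` are the positions where the separating lines would be drawn and recorded.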
With that, Xiaoxin's graduation project is basically done; add a UI and it can be handed in. Time spent: 5 hours~~Good For Me~~
=================================
Originally posted: 2005.05.23
Original URL: http://mnky.bokee.com/1621207.html