Title: Segmentation of historical machine-printed documents using Adaptive Run Length Smoothing and skeleton segmentation paths
Authors: Nikos Nikolaou a,b,*, Michael Makridis a, Basilis Gatos b, Nikolaos Stamatopoulos b, Nikos Papamarkos
Journal: Image and Vision Computing, 2010
My own annotations are enclosed in 【】 and marked in red.
1 Introduction
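(1) Historical machine-printed documents are divided into 7 categories, which serve as the conditions for the experiments and evaluation: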
1. Multi-column documents.
2. Noisy documents.
3. Documents with non-constant spaces between text lines, words and characters.
4. Documents with marginal text.
5. Documents in which various font sizes coexist.
6. Documents with ornamental characters and graphical illustrations.
7. Documents whose text is warped and/or skewed.
(2) Segmentation methods commonly used for contemporary documents include projection profiles and the run length smoothing algorithm (RLSA).
(3) The paper proposes the following novelties:
(i) use of a novel Adaptive Run Length Smoothing Algorithm (ARLSA) in order to face the problem of complex and dense document layout,
Adaptive RLSA to handle complex and dense layouts.
(ii) detection of noisy areas and punctuation marks that are usual in historical machine-printed documents,
Detection of noisy areas and punctuation marks.
(iii) detection of possible obstacles formed from background areas in order to separate neighboring text columns or text lines,
Detection, within the background areas, of obstacles that separate neighboring text columns or text lines.
(iv) use of skeleton segmentation paths in order to isolate possible connected characters.
Skeleton segmentation paths to separate characters that may be connected.
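
To make item (2) concrete, here is a minimal sketch of the two conventional techniques it names: a horizontal projection profile and the classic (non-adaptive) RLSA. This is not the ARLSA proposed in the paper. The function names, the list-of-lists image representation (1 = foreground, 0 = background), and the choice to fill only gaps bounded by foreground pixels on both sides are assumptions made for illustration.

```python
from typing import List

Row = List[int]
Image = List[Row]  # assumed layout: 1 = foreground (ink), 0 = background


def projection_profile(image: Image) -> List[int]:
    """Horizontal projection profile: number of foreground pixels per row."""
    return [sum(row) for row in image]


def rlsa_row(row: Row, threshold: int) -> Row:
    """Classic RLSA on one row: fill background runs of length <= threshold
    that lie between two foreground pixels (assumed convention)."""
    out = list(row)
    last_fg = -1  # index of the most recent foreground pixel, -1 if none yet
    for i, pixel in enumerate(row):
        if pixel == 1:
            gap = i - last_fg - 1  # background pixels since last foreground
            if last_fg >= 0 and 0 < gap <= threshold:
                for j in range(last_fg + 1, i):
                    out[j] = 1
            last_fg = i
    return out


def rlsa_horizontal(image: Image, threshold: int) -> Image:
    """Apply RLSA to every row of the image."""
    return [rlsa_row(row, threshold) for row in image]


def rlsa_vertical(image: Image, threshold: int) -> Image:
    """Apply RLSA to every column: transpose, smooth rows, transpose back."""
    columns = [list(col) for col in zip(*image)]
    smoothed = [rlsa_row(col, threshold) for col in columns]
    return [list(row) for row in zip(*smoothed)]
```

With a threshold of 2, the gap of two background pixels is filled while the gap of four is left open, which is how RLSA merges characters into words or lines without bridging wider column gaps. A single fixed threshold is what makes the classic method struggle on layouts with varying spacing, the situation the adaptive variant in contribution (i) targets.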