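Regex to replace gibberish
I have to clean up some input from OCR, which recognizes handwriting as gibberish. Any suggestions for a regex to clean out the random characters? For example: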
Federal prosecutors on Monday charged a Miami man with the largest
case of credit and debit card data theft ever in the United States,
accusing the one-time government informant of swiping 130 million
accounts on top of 40 million he stole previously.
, ':, Ie
':... 11'1
. '(.. ~!' ': f I I
. " .' I ~
I' ,11 l
I I I ~ \ :' ,! .~ , .. r, 1 , ~ I . I' , .' I ,.
, i
I ; J . I.' ,.\) ..
. : I
'I', I
.' '
r,"
Gonzalez is a former informant for the U.S. Secret Service who helped
the agency hunt hackers, authorities say. The agency later found out that
he had also been working with criminals and feeding them information
on ongoing investigations, even warning off at least one individual,
according to authorities.
eh....l
~.\O ::t
e;~~~
s: ~ ~. 0
qs c::; ~ g
o t/J (Ii .,
::3 (1l Il:l
~ cil~ 0 2:
t:lHj~(1l
. ~ ~a
0~ ~ S'
N ("b t/J :s
Ot/JIl:l"-<:>
v'g::!t:O
-....c......
VI (:ll
:= - ~
< (1l ::3
(1l ~ '
t/J VJ ~
Pl
.....
....
(II
2009-08-18
JoshB
+3
+1 because it's an interesting problem, though I doubt you'll get an answer to it.
2009-08-18 03:40:03
+0
This is a great question, and word/phrase recognition (or the lack of it) is a hot topic in AI.
2009-08-18 03:41:50
+1
I strongly feel that regex is the wrong tool for this job.
2009-08-18 05:20:00
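In the spirit of the last comment, one possible approach is to score each line instead of trying to match the gibberish with a single pattern. The sketch below is only an illustration, not something proposed in the thread: it keeps a line when most of its non-space characters belong to runs of three or more letters. The names `looks_like_text` and `drop_gibberish`, the three-letter minimum, and the 0.6 threshold are all assumptions that would need tuning against real OCR output.

```python
import re

# Assumed heuristic: a run of three or more ASCII letters counts as "word-like".
WORD_RE = re.compile(r"[A-Za-z]{3,}")


def looks_like_text(line, min_ratio=0.6):
    """Return True when enough of the line's non-space characters
    sit inside word-like runs. min_ratio is an assumed threshold."""
    visible = re.sub(r"\s+", "", line)
    if not visible:
        return False  # blank lines are dropped as well
    wordy = sum(len(run) for run in WORD_RE.findall(line))
    return wordy / len(visible) >= min_ratio


def drop_gibberish(text, min_ratio=0.6):
    """Keep only the lines that pass the word-likeness check."""
    kept = [ln for ln in text.splitlines() if looks_like_text(ln, min_ratio)]
    return "\n".join(kept)
```

Applied to the sample in the question, this should keep the news paragraphs and discard fragments such as `, ':, Ie` or `t:lHj~(1l`. It would also discard legitimate lines made up mostly of digits, punctuation, or very short words, so a dictionary lookup or a language model would likely be more robust.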