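I remember reading this article a while ago: How to write a spelling corrector. It applies Bayes' theorem to spell checking and implements a spelling corrector in about twenty lines of simple Python.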
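The Bayesian idea can be written out in one line. As a sketch, with notation assumed here for illustration: let w be the word as typed and c a candidate correction. The corrector looks for the c that is most probable given w. Since P(w) is the same for every candidate, it drops out of the maximization:

```latex
\[
\begin{aligned}
\hat{c} &= \arg\max_{c} P(c \mid w) \\
        &= \arg\max_{c} \frac{P(w \mid c)\, P(c)}{P(w)} \\
        &= \arg\max_{c} P(w \mid c)\, P(c)
\end{aligned}
\]
```

Here P(c) is how likely the word c is on its own, which can be estimated from word frequencies in a large text, and P(w | c) is how likely it is that someone typed w when they meant c. Read this way, the code appears to approximate P(w | c) very coarsely: candidates at a smaller edit distance always win, and word frequency only breaks ties among candidates at the same distance.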
The original author's Python code:
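import re, collections

def words(text):
    # Lowercase the text and extract runs of letters as words.
    return re.findall('[a-z]+', text.lower())

def train(features):
    # Count word occurrences; unseen words default to 1 (simple smoothing).
    model = collections.defaultdict(lambda: 1)
    for f in features:
        model[f] += 1
    return model

# open() replaces the Python 2 file() built-in so the code also runs on Python 3.
NWORDS = train(words(open('big.txt').read()))

alphabet = 'abcdefghijklmnopqrstuvwxyz'

def edits1(word):
    # All strings one edit (delete, transpose, replace, insert) away from word.
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits for c in alphabet if b]
    inserts = [a + c + b for a, b in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)

def known_edits2(word):
    # Known words that are two edits away from word.
    return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)

def known(words):
    # Keep only the words that appear in the training text.
    return set(w for w in words if w in NWORDS)

def correct(word):
    # Prefer the word itself, then known words at distance 1, then distance 2.
    candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word]
    return max(candidates, key=NWORDS.get)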
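To try the idea without downloading big.txt, here is a minimal self-contained sketch. It is not the original author's code: SAMPLE_TEXT is a made-up miniature corpus, WORD_COUNTS and ALPHABET are names chosen for this example, and collections.Counter is swapped in for the defaultdict, so the add-one smoothing of the original is dropped.

```python
import re
from collections import Counter

# Hypothetical miniature corpus standing in for big.txt (an assumption of this sketch).
SAMPLE_TEXT = """
the spelling of a word matters; correct spelling helps the reader.
a spell checker suggests the most frequent known word.
"""

def words(text):
    # Lowercase the text and extract runs of letters as words.
    return re.findall('[a-z]+', text.lower())

# Plain frequency table; unknown words count as 0 and are never inserted.
WORD_COUNTS = Counter(words(SAMPLE_TEXT))
ALPHABET = 'abcdefghijklmnopqrstuvwxyz'

def edits1(word):
    # All strings one delete, transpose, replace, or insert away from word.
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)

def known(candidates):
    # Keep only candidates that occur in the corpus.
    return {w for w in candidates if w in WORD_COUNTS}

def correct(word):
    # Try the word itself, then distance 1, then distance 2, else give up.
    candidates = (known([word])
                  or known(edits1(word))
                  or known(e2 for e1 in edits1(word) for e2 in edits1(e1))
                  or [word])
    # Among candidates at the same distance, pick the most frequent one.
    return max(candidates, key=lambda w: WORD_COUNTS[w])

print(correct('speling'))  # spelling
print(correct('corect'))   # correct
print(correct('the'))      # the
print(correct('zzzzzz'))   # zzzzzz (nothing within two edits is known)
```

For a word of length n, edits1 builds n deletions, n - 1 transpositions, 26n replacements and 26(n + 1) insertions, so at most 54n + 25 strings before the set removes duplicates. Applying it twice is what makes the distance-2 search noticeably more expensive.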
After reading the whole article, besides marveling at the beauty of the mathematics, I am also very fond of similar