A Spell Checker in 21 Lines of Python

沙丁鱼鱼鱼

Tags: python, spell checking

Original sources:

http://norvig.com/spell-correct.htm

http://blog.csdn.net/Pwiling/article/details/50573650

Download link for "big.txt": http://norvig.com/big.txt

Code, version 1:

import re
from collections import Counter

def words(text): return re.findall(r'\w+', text.lower())

WORDS = Counter(words(open('big.txt').read()))

def P(word, N=sum(WORDS.values())): 
    "Probability of `word`."
    return WORDS[word] / N

def correction(word): 
    "Most probable spelling correction for word."
    return max(candidates(word), key=P)

def candidates(word): 
    "Generate possible spelling corrections for word."
    return (known([word]) or known(edits1(word)) or known(edits2(word)) or [word])

def known(words): 
    "The subset of `words` that appear in the dictionary of WORDS."
    return set(w for w in words if w in WORDS)

def edits1(word):
    "All edits that are one edit away from `word`."
    letters    = 'abcdefghijklmnopqrstuvwxyz'
    splits     = [(word[:i], word[i:])    for i in range(len(word) + 1)]
    deletes    = [L + R[1:]               for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R)>1]
    replaces   = [L + c + R[1:]           for L, R in splits if R for c in letters]
    inserts    = [L + c + R               for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def edits2(word): 
    "All edits that are two edits away from `word`."
    return (e2 for e1 in edits1(word) for e2 in edits1(e1))
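
To make the candidate generation concrete, here is a small sketch that repeats edits1 from version 1 and checks it on "speling". The helper raw_edit_count is a hypothetical name added only for illustration; it counts the edits before duplicates are removed: n deletes, n-1 transposes, 26n replaces, and 26(n+1) inserts for a word of length n.

```python
def edits1(word):
    "All edits that are one edit away from `word` (same logic as version 1)."
    letters    = 'abcdefghijklmnopqrstuvwxyz'
    splits     = [(word[:i], word[i:])    for i in range(len(word) + 1)]
    deletes    = [L + R[1:]               for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces   = [L + c + R[1:]           for L, R in splits if R for c in letters]
    inserts    = [L + c + R               for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def raw_edit_count(n):
    # Hypothetical helper: number of generated edits before the set() removes duplicates.
    # n deletes + (n - 1) transposes + 26n replaces + 26(n + 1) inserts = 54n + 25
    return n + (n - 1) + 26 * n + 26 * (n + 1)

one_edit_away = edits1('speling')
print('spelling' in one_edit_away)                 # True: insert 'l' between 'spel' and 'ing'
print(raw_edit_count(len('speling')))              # 403
print(len(one_edit_away) <= raw_edit_count(7))     # True: the set drops duplicates
```

So "spelling" is already reachable with a single edit; whether it is returned depends only on whether it appears in the word counts.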

Code, version 2:

import re, collections

def words(text): return re.findall('[a-z]+', text.lower()) 

def train(features):
    model = collections.defaultdict(lambda: 1)
    for f in features:
        model[f] += 1
    return model

# open() is used instead of file(), which exists only in Python 2
NWORDS = train(words(open('Cloud.txt').read()))

alphabet = 'abcdefghijklmnopqrstuvwxyz'

def edits1(word):
   splits     = [(word[:i], word[i:]) for i in range(len(word) + 1)]
   deletes    = [a + b[1:] for a, b in splits if b]
   transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b)>1]
   replaces   = [a + c + b[1:] for a, b in splits for c in alphabet if b]
   inserts    = [a + c + b     for a, b in splits for c in alphabet]
   return set(deletes + transposes + replaces + inserts)

def known_edits2(word):
    return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)

def known(words): return set(w for w in words if w in NWORDS)

def correct(word):
    candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word]
    return max(candidates, key=NWORDS.get)
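
One detail of version 2 that is easy to miss is how the defaultdict in train behaves. The sketch below reuses train from above on a made-up three-word list (the list is an assumption for illustration only). It shows that every count starts at 1, that a membership test with `in` does not create an entry, and that indexing an unseen word does create one.

```python
import collections

def train(features):
    # Same as version 2: every word starts with a count of 1 (simple smoothing).
    model = collections.defaultdict(lambda: 1)
    for f in features:
        model[f] += 1
    return model

# Made-up training words, for illustration only.
model = train(['spelling', 'spelling', 'check'])

print(model['spelling'])     # 3: default 1 plus two occurrences
print('speling' in model)    # False: `in` does not insert the key
print(model.get('speling'))  # None: .get() bypasses the default factory
print(model['speling'])      # 1: indexing inserts the key with the default
print('speling' in model)    # True: the lookup above added it
```

This is why known and known_edits2 test membership with `in` rather than indexing, and why max(candidates, key=NWORDS.get) is only ever applied to words that are already in the model or to a single fallback word.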

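Both versions of the code run. I did hit one problem, though: the original article says that the input "speling" is corrected to "spelling", but when I ran the code myself the result did not match.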

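Neither version gave the result I expected, which was strange. Perhaps "big.txt" was missing "spelling" as training data? So I manually added the word "spelling" to "big.txt", ran the code again, and this time the result was correct. This shows that the training set we supply has a very large effect on the spell-checking results.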
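
The effect of the training data can be reproduced without touching "big.txt". The following is a minimal sketch, not the code from either version: make_corrector is a hypothetical helper, the two corpora are made-up strings, and the edits2 step is left out for brevity. The only difference between the two correctors is whether "spelling" occurs in the training text.

```python
import re
from collections import Counter

def words(text):
    return re.findall(r'\w+', text.lower())

def edits1(word):
    "All edits that are one edit away from `word`."
    letters    = 'abcdefghijklmnopqrstuvwxyz'
    splits     = [(word[:i], word[i:])    for i in range(len(word) + 1)]
    deletes    = [L + R[1:]               for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces   = [L + c + R[1:]           for L, R in splits if R for c in letters]
    inserts    = [L + c + R               for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def make_corrector(text):
    # Hypothetical helper: build a corrector from an in-memory training text.
    counts = Counter(words(text))

    def known(ws):
        return set(w for w in ws if w in counts)

    def correction(word):
        # Simplification: only one-edit candidates are considered here.
        candidates = known([word]) or known(edits1(word)) or [word]
        return max(candidates, key=lambda w: counts[w])

    return correction

# Made-up corpora; the second one adds the word "spelling".
without_word = make_corrector("the spewing volcano kept spewing ash")
with_word = make_corrector(
    "the spewing volcano kept spewing ash and spelling matters spelling spelling")

print(without_word('speling'))  # spewing
print(with_word('speling'))     # spelling
```

With the first corpus the only known word one edit away from "speling" is "spewing", so that is what comes back; once "spelling" appears often enough in the training text, it wins.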

Although it is only 21 lines long, the code implements a sophisticated spell checker. The algorithm behind it is really elegant!
