Spelling Correction in Python

Latest recommended article published 2022-07-25 10:48:10

赤醒醒

Category: Notes. Tags: natural language processing

Article link: https://blog.csdn.net/wawjb/article/details/105617009

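This article shows how to do spelling correction in Python, using natural language processing techniques to fix misspelled words in text and improve its quality.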

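from nltk import *
from nltk.corpus import brown
import numpy as np

# The Brown corpus data must be on NLTK's data path (nltk.data.path) whenever it is accessed.
corpus = brown.sents()
# brown.sents() returns every sentence in the corpus as a list of tokens; it can be
# narrowed with sents(fileids=[f1, f2, ...], categories=[c1, c2, ...]).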


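# Load the dictionary.
# A set is an unordered collection of unique elements: it drops duplicates, supports
# fast membership tests, and can compute intersections, differences, and unions.
with open('vocab.txt') as f:
    vocabs = {line.rstrip() for line in f}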
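
The role of the vocabulary set can be sketched in isolation. This is a minimal sketch, assuming vocab.txt holds one word per line; the helper `load_vocab`, the in-memory sample file, and the candidate strings are hypothetical and not part of the original code.

```python
import io


def load_vocab(lines):
    """Build a set of words from an iterable of text lines, skipping blank lines."""
    return {line.strip() for line in lines if line.strip()}


# Hypothetical stand-in for vocab.txt: one word per line,
# including a duplicate entry and a blank line.
sample_file = io.StringIO("apple\napply\n\napple\nample\n")
sample_vocab = load_vocab(sample_file)

# Candidate strings (for example, produced by an edit-distance generator)
# are kept only if they are real words; set intersection does this in one step.
candidates = {"apple", "applw", "ample", "aple"}
valid = candidates & sample_vocab
print(sorted(valid))
```

Storing the vocabulary as a set rather than a list keeps each membership check cheap, which matters because a single misspelled word can produce hundreds of candidate strings.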


# Generate the correct words at the smallest edit distance.
# 1. Generate the candidate set and its candidate words.
def generate1(wrong_word):
    letters = set('abcdefghijklmnopqrstuvwxyz')
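    # Every way to split the word into a prefix and a suffix,
    # e.g. 'ab' -> [['', 'ab'], ['a', 'b'], ['ab', '']].
    # A list is used here because sets do not support slicing.
    # In the comprehensions below, R is the prefix and L is the suffix.
    right_word_split = [[wrong_word[:i], wrong_word[i:]] for i in range(len(wrong_word) + 1)]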
    # Curly braces build a set; square brackets would build a list.
    insert = {R + M + L for R, L in right_word_split for M in letters}
    replace = {R + M + L[1:] for R, L in right_word_split