NLP-C2-W1-Autocorrect and Dynamic Programming

小鸽的杂货铺

Article link: https://blog.csdn.net/weixin_42928397/article/details/108579614

Autocorrect:

Changes misspelled words to their correct spelling.

How it works:

Identify misspelled words
Find strings n edit distance away
Filter the candidates
Calculate word probabilities
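Steps 1 and 4 are simple enough to sketch directly. The sketch below assumes, as is usual for this kind of autocorrect model, that a word's probability is its relative frequency in the corpus, P(w) = C(w) / V, where C(w) is the word's count and V the total number of word tokens. The names get_probs and is_misspelled and the toy corpus are hypothetical.

```python
from collections import Counter

def get_probs(words):
    # Hypothetical helper: P(w) = C(w) / V, with V = total number of word tokens
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def is_misspelled(word, vocab):
    # Hypothetical helper for step 1: unknown words are treated as misspelled
    return word not in vocab

corpus = "the cat sat on the mat the end".split()  # toy corpus, 8 tokens
probs = get_probs(corpus)
vocab = set(corpus)

print(probs["the"])                  # 0.375 (3 of 8 tokens)
print(is_misspelled("cta", vocab))   # True
```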

Building the model

Identify a misspelled word: match it against the dictionary
If a word is not in the vocabulary, then it is misspelled
Find strings n edit distance away
Edit: an operation performed on a string to change it; each edit operates on a single letter (a switch swaps one pair of adjacent letters)
- Insert (add a letter)
  Add a letter to a string at any position: to ==> top, two, …
- Delete (remove a letter)
  Remove a letter from a string: hat ==> ha, at, ht
- Switch (swap 2 adjacent letters)
  Example: eta ==> eat, tea
- Replace (change 1 letter to another)
  Example: jaw ==> jar, paw, saw, …
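The four edit operations can be sketched in Python by first splitting the word at every position. The helper names below (splits, delete_letter, switch_letter, replace_letter, insert_letter, edit_one_letter, edit_two_letter) are illustrative assumptions; the assignment's own helpers may differ in signature and detail. Only lowercase letters a to z are considered.

```python
import string

LETTERS = string.ascii_lowercase

def splits(word):
    # Every way to cut the word into (left, right), including empty ends
    return [(word[:i], word[i:]) for i in range(len(word) + 1)]

def delete_letter(word):
    # Remove one letter: hat -> at, ht, ha
    return [left + right[1:] for left, right in splits(word) if right]

def switch_letter(word):
    # Swap two adjacent letters: eta -> tea, eat
    return [left + right[1] + right[0] + right[2:]
            for left, right in splits(word) if len(right) >= 2]

def replace_letter(word):
    # Change one letter into a different letter: jaw -> jar, paw, saw, ...
    return [left + c + right[1:]
            for left, right in splits(word) if right
            for c in LETTERS if c != right[0]]

def insert_letter(word):
    # Add one letter at any position: to -> top, two, ...
    return [left + c + right for left, right in splits(word) for c in LETTERS]

def edit_one_letter(word, allow_switches=True):
    # All strings reachable with one edit
    edits = set(delete_letter(word) + replace_letter(word) + insert_letter(word))
    if allow_switches:
        edits.update(switch_letter(word))
    return edits

def edit_two_letter(word, allow_switches=True):
    # Apply one edit, then one more edit to each of those results
    return {e2
            for e1 in edit_one_letter(word, allow_switches)
            for e2 in edit_one_letter(e1, allow_switches)}

print(sorted(switch_letter("eta")))   # ['eat', 'tea']
print(sorted(delete_letter("hat")))   # ['at', 'ha', 'ht']
```

Because edit_two_letter applies edit_one_letter to every one-edit result, its output grows quickly; intersecting it with the vocabulary is what keeps only real words.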

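Minimum edit distance

Measures how similar two strings are: the minimum number of edits needed to transform one string into the other. When different edits carry different costs, the algorithm minimizes the total edit cost.
Image: https://uploader.shimo.im/f/UkmtvG2IRU7qTCfL.png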

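Minimum edit distance algorithm

The source word runs down the first column (one letter per row) and the target word runs along the first row (one letter per column); cell (0,0) stands for the empty string at the start of both words.

D[i, j] is the minimum edit distance between the first i letters of the source word and the first j letters of the target word.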
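The definition of D[i, j] leads to the standard dynamic-programming recurrence below. Here s_i is the i-th letter of the source word and t_j the j-th letter of the target word (counting from 1), and ins, del, rep are the per-edit costs. The values ins = del = 1 and rep = 2 are assumed, as commonly used in this course; with m and n the word lengths, the answer is D[m, n].

```latex
D[0,0] = 0, \qquad D[i,0] = D[i-1,0] + \mathrm{del}, \qquad D[0,j] = D[0,j-1] + \mathrm{ins}

D[i,j] = \min\begin{cases}
D[i-1,j] + \mathrm{del} \\
D[i,j-1] + \mathrm{ins} \\
D[i-1,j-1] + \begin{cases} 0 & \text{if } s_i = t_j \\ \mathrm{rep} & \text{if } s_i \neq t_j \end{cases}
\end{cases}
```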

Image: https://uploader.shimo.im/f/EiPgJ2V2sV8VOZIw.png
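A minimal sketch of filling the table in Python. The function name min_edit_distance and the default costs (insert 1, delete 1, replace 2) are assumptions for illustration. Rows index the source word, columns index the target word, and row 0 / column 0 stand for the empty prefix.

```python
def min_edit_distance(source, target, ins_cost=1, del_cost=1, rep_cost=2):
    m, n = len(source), len(target)
    # D[i][j] = minimum cost of turning source[:i] into target[:j]
    D = [[0] * (n + 1) for _ in range(m + 1)]
    # First column: delete every letter of source[:i]
    for i in range(1, m + 1):
        D[i][0] = D[i - 1][0] + del_cost
    # First row: insert every letter of target[:j]
    for j in range(1, n + 1):
        D[0][j] = D[0][j - 1] + ins_cost
    # Every other cell: cheapest of delete, insert, or replace/keep
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            r = 0 if source[i - 1] == target[j - 1] else rep_cost
            D[i][j] = min(D[i - 1][j] + del_cost,
                          D[i][j - 1] + ins_cost,
                          D[i - 1][j - 1] + r)
    return D[m][n], D

dist, table = min_edit_distance("play", "stay")
print(dist)  # 4: replace p -> s and l -> t, at cost 2 each
```

Passing rep_cost=1 gives the classic Levenshtein distance, under which kitten → sitting costs 3.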

Assignment

Materials: GitHub link
The final backtrace algorithm was not completed.
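Since the backtrace step was left unfinished, here is a minimal sketch of one way to do it; it is not the assignment's reference solution. It refills the table, then walks from D[m][n] back to D[0][0], at each cell stepping to a neighbour whose value plus the matching edit cost equals the current cell (preferring the diagonal, then delete, then insert). The function name edit_alignment and the operation labels are hypothetical, and the costs are the same assumed insert 1 / delete 1 / replace 2.

```python
def edit_alignment(source, target, ins_cost=1, del_cost=1, rep_cost=2):
    """Return (minimum cost, list of (op, source_letter, target_letter))."""
    m, n = len(source), len(target)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = i * del_cost
    for j in range(1, n + 1):
        D[0][j] = j * ins_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            r = 0 if source[i - 1] == target[j - 1] else rep_cost
            D[i][j] = min(D[i - 1][j] + del_cost,
                          D[i][j - 1] + ins_cost,
                          D[i - 1][j - 1] + r)

    # Backtrace from the bottom-right cell to (0, 0)
    ops = []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            r = 0 if source[i - 1] == target[j - 1] else rep_cost
            if D[i][j] == D[i - 1][j - 1] + r:
                # Diagonal step: letters match (keep) or were replaced
                ops.append(("keep" if r == 0 else "replace", source[i - 1], target[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and D[i][j] == D[i - 1][j] + del_cost:
            ops.append(("delete", source[i - 1], ""))
            i -= 1
        else:
            # Only remaining explanation: a letter of target was inserted
            ops.append(("insert", "", target[j - 1]))
            j -= 1
    ops.reverse()
    return D[m][n], ops

cost, ops = edit_alignment("play", "stay")
print(cost)  # 4
print(ops)   # [('replace', 'p', 's'), ('replace', 'l', 't'), ('keep', 'a', 'a'), ('keep', 'y', 'y')]
```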

Part 3-3: suggest spelling suggestions

Tip 1: Short circuit

In Python, the logical operators 'and' and 'or' have two useful properties: they can operate on lists, and they have 'short-circuit' behavior.
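A small demonstration of both properties. 'or' returns the first truthy operand and 'and' returns the first falsy one, and neither evaluates the operands after the one that decides the result; an empty list counts as falsy. The chain of 'or's in get_corrections relies on this. The names expensive_lookup and first_known are hypothetical.

```python
vocab = {"cat", "dog"}

first_non_empty = [] or ["b"] or ["c"]   # ['b']: [] is falsy, ["c"] is never evaluated
both_truthy = ["a"] and ["b"]            # ['b']: all truthy, so the last operand
one_empty = [] and ["b"]                 # []: the first falsy operand

calls = []

def expensive_lookup():
    # Records each call, so we can check whether short-circuiting skipped it
    calls.append("called")
    return ["x"]

skipped = ["cheap"] or expensive_lookup()  # expensive_lookup() never runs

def first_known(word):
    # Same pattern as get_corrections: [word] if the word is known, else fall back.
    # A list keeps the result a list; list() on a bare string would split it into letters.
    return (word in vocab and [word]) or ["fallback"]

print(first_known("cat"))  # ['cat']
print(first_known("cta"))  # ['fallback']
```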

Tip 2: Sort a list of tuples by the second item

https://www.geeksforgeeks.org/python-program-to-sort-a-list-of-tuples-by-second-item/
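A short sketch of sorting (word, probability) pairs by their second item and keeping the n best, as get_corrections needs to do. The candidate words and probabilities are made up for illustration.

```python
import heapq
from operator import itemgetter

# Toy (word, probability) pairs
candidates = [("cot", 0.001), ("cat", 0.02), ("cut", 0.005)]

# Sort by the second item (the probability), highest first
by_prob = sorted(candidates, key=lambda pair: pair[1], reverse=True)
print(by_prob)  # [('cat', 0.02), ('cut', 0.005), ('cot', 0.001)]

# operator.itemgetter(1) is an equivalent key function
same = sorted(candidates, key=itemgetter(1), reverse=True)

# Keep only the n best: slice the sorted list, or use heapq.nlargest
n = 2
top_n = by_prob[:n]
top_n_heap = heapq.nlargest(n, candidates, key=itemgetter(1))
```

heapq.nlargest avoids sorting the whole list, which only matters when there are many candidates and n is small.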

Tip 3: Set intersection
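Intersecting the generated candidate strings with the vocabulary is how the non-words get filtered out. A toy example; the vocabulary and candidates are made up.

```python
vocab = {"cat", "cot", "cut", "dog"}
# Toy subset of strings one edit away from a typo; most are not real words
candidates = {"cat", "cta", "cot", "xat"}

known = candidates.intersection(vocab)  # {'cat', 'cot'} (set order may vary)
same = candidates & vocab               # operator form: both operands must be sets
# The method form also accepts any iterable, e.g. a list
from_list = candidates.intersection(["cat", "dog"])
print(from_list)  # {'cat'}
```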

# UNQ_C10 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
# UNIT TEST COMMENT: Candidate for Table Driven Tests
# GRADED FUNCTION: get_corrections
def get_corrections(word, probs, vocab, n=2, verbose = False):
    '''
    Input: 
        word: a user entered string to check for suggestions
        probs: a dictionary that maps each word to its probability in the corpus
        vocab: a set containing all the vocabulary
        n: number of possible word corrections you want returned in the dictionary
    Output: 
        n_best: a list of tuples with the most probable n corrected words and their probabilities.
    '''
    
    suggestions = []
    n_best = []
    
    ### START CODE HERE ###
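    # Short-circuit chain: the word itself if it is known, else the known words one edit
    # away, else the known words two edits away (an empty list if nothing is found).
    # [word] keeps the result a list; list(word) on a bare string would split it into letters.
    # edit_one_letter / edit_two_letter are defined in earlier parts of the assignment.
    suggestions = list((word in vocab and [word]) or edit_one_letter(word).intersection(vocab) or edit_two_letter(word).intersection(vocab))
    # Pair each suggestion with its probability, sort by probability (highest first), keep the top n
    all_best = [(suggestion, probs[suggestion]) for suggestion in suggestions]
    n_best = sorted(all_best, key=lambda x: x[1], reverse=True)[:n]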

    ### END CODE HERE ###
    
    if verbose: print("entered word = ", word, "\nsuggestions = ", suggestions)

    return n_best
