Autocorrect

寂寞灵魂

Category: NLP

Article link: https://blog.csdn.net/riverflowrand/article/details/53519603

A summary of https://medium.com/@sarthfrey/https-medium-com-prcobol-the-anatomy-of-autocorrect-9671cecad4b1#.gthtpsfo9

background knowledge:

1、Edit distance
2、$P(right|error) = \frac{P(error|right) \cdot P(right)}{P(error)}$
3、Assuming that the correct word lies within edit distance 2 of the error word is not a bad assumption: approximately 75% of errors are within edit distance 1, and nearly all of them are within edit distance 2. As a simple estimate, 75% accuracy for one suggestion gives about 98.4% accuracy for 3 suggestions (100*(1-0.25³)), if each suggestion is treated as an independent 75% chance.
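
Edit distance from item 1 can be computed with dynamic programming. The sketch below assumes plain Levenshtein distance (insert, delete, substitute); the summarized article may also count a transposition of two adjacent letters as a single edit, which this version does not. The function name `edit_distance` is made up for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning a into b."""
    # prev[j] = distance between the processed prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca by cb (or match)
            ))
        prev = curr
    return prev[-1]
```

Under this definition `edit_distance("speling", "spelling")` is 1, while a swapped pair such as "teh" vs "the" counts as 2 substitutions.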

attempt 1:

1、Check whether the error word is a valid English word; if so, return it, otherwise proceed.
2、Find the valid word within edit distance 1 of the error word that occurs most often in the corpus and return it; if none can be found, proceed.
3、Find the valid word within edit distance 2 of the error word that occurs most often in the corpus and return it; if none can be found, proceed.
4、The spelling corrector has failed; return the error word unchanged.
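
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the article's code: it assumes a lowercase a-z alphabet, counts a transposition of adjacent letters as one edit, and uses a tiny made-up word-count table (`TOY_COUNTS`) in place of a real corpus.

```python
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # assumption: lowercase English only

def edits1(word: str) -> set:
    """All strings one edit (delete, transpose, replace, insert) from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in ALPHABET]
    inserts = [l + c + r for l, r in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)

def correct(word: str, counts: Counter) -> str:
    # Step 1: a valid word is returned as-is.
    if word in counts:
        return word
    # Step 2: valid words at edit distance 1.
    one_edit = edits1(word)
    candidates = [w for w in one_edit if w in counts]
    # Step 3: valid words at edit distance 2, only if step 2 found nothing.
    if not candidates:
        candidates = [w2 for w1 in one_edit for w2 in edits1(w1) if w2 in counts]
    # Step 4: give up and return the input unchanged.
    if not candidates:
        return word
    # Most frequent candidate in the corpus wins.
    return max(candidates, key=lambda w: counts[w])

# Made-up counts standing in for corpus frequencies.
TOY_COUNTS = Counter({"the": 50, "then": 10, "they": 8, "spelling": 5})
```

With these toy counts, `correct("teh", TOY_COUNTS)` returns "the" and `correct("spelingg", TOY_COUNTS)` falls through to step 3 and returns "spelling".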

attempt 2:

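Use item 2 above (Bayes' rule): among the candidate words w, return the one that maximizes P(x|w)*P(w), where x is the error word. P(x) is the same for every candidate, so it can be dropped.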
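
A minimal sketch of ranking candidates by P(x|w)*P(w) is given below. The probabilities in `TOY_PRIOR` and `TOY_CHANNEL` are invented purely for illustration; in practice P(w) would come from corpus frequencies and P(x|w) from an error model estimated on real misspellings. All names here are hypothetical.

```python
def rank_candidates(error, candidates, prior, channel):
    """Sort candidates by P(error|w) * P(w), best first.

    prior:   dict word -> P(w)               (language model)
    channel: dict (error, word) -> P(error|w) (error model)
    """
    scored = [(channel[(error, w)] * prior[w], w) for w in candidates]
    return [w for _, w in sorted(scored, reverse=True)]

# Invented numbers, for illustration only.
TOY_PRIOR = {"the": 0.05, "ten": 0.001, "tea": 0.0005}
TOY_CHANNEL = {
    ("teh", "the"): 0.002,
    ("teh", "ten"): 0.004,
    ("teh", "tea"): 0.004,
}

suggestions = rank_candidates("teh", ["ten", "tea", "the"], TOY_PRIOR, TOY_CHANNEL)
```

Here "the" ranks first even though its error-model probability is the lowest of the three, because its prior is much larger. Returning the sorted list rather than a single word also makes it easy to offer the top 3 suggestions.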

attempt 3:

Here we can add a parameter α and raise the language model to the power α, so that we now find the w that maximizes P(x|w)*P(w)^α.
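
One way to sketch the α weighting is to work in log space, where P(x|w)*P(w)^α becomes log P(x|w) + α·log P(w), and to pick α by accuracy on a small labeled set. The function names, the candidate format and the numbers are assumptions made for this example, not taken from the article.

```python
import math

def pick(candidates, alpha):
    """candidates: dict word -> (P(x|w), P(w)). Returns the argmax of
    log P(x|w) + alpha * log P(w), equivalent to P(x|w) * P(w)**alpha."""
    def score(w):
        likelihood, prior = candidates[w]
        return math.log(likelihood) + alpha * math.log(prior)
    return max(candidates, key=score)

def tune_alpha(dev_set, alphas):
    """dev_set: list of (candidates, gold_word). Returns the alpha with
    the highest accuracy (the first such alpha on ties)."""
    def accuracy(alpha):
        hits = sum(pick(cands, alpha) == gold for cands, gold in dev_set)
        return hits / len(dev_set)
    return max(alphas, key=accuracy)

# Invented probabilities: a rare word with a likely typo vs. a common word.
toy = {"rare": (0.01, 1e-6), "common": (0.001, 1e-3)}
```

With these toy numbers, `pick(toy, 1.0)` prefers "common", while a small α such as 0.1 down-weights the language model enough that "rare" wins.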

what's more:

Next, what if the suitable correction for our error word is at edit distance 2? Because the error model multiplies the first edit probability by the second, we would almost never select a correction at more than edit distance 1. We can raise the second edit probability to a power β and choose β by testing, as we did for α.
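
The effect of β can be sketched as below. The sketch assumes the error model supplies one probability per edit (one or two entries), and all numbers and names are made up for illustration. Since each edit probability is below 1, a β below 1 makes the second factor larger and so penalizes two-edit corrections less.

```python
def channel_prob(edit_probs, beta=1.0):
    """P(x|w) for a correction reached by one or two edits.

    edit_probs: per-edit probabilities, e.g. [p1] or [p1, p2].
    Only the second edit probability is raised to the power beta.
    """
    p = edit_probs[0]
    if len(edit_probs) > 1:
        p *= edit_probs[1] ** beta
    return p

def pick(candidates, beta):
    """candidates: dict word -> (edit_probs, P(w))."""
    return max(candidates,
               key=lambda w: channel_prob(candidates[w][0], beta) * candidates[w][1])

# Invented numbers: same prior, one candidate needs one edit, the other two.
toy = {
    "one_edit_word": ([0.001], 1e-3),
    "two_edit_word": ([0.01, 0.01], 1e-3),
}
```

With β = 1 the two-edit candidate scores 1e-7 against 1e-6 and loses; with β = 0.25 its second factor grows to about 0.32 and it wins.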

future attempts:

Use context information, for example with models such as Markov chains or RNNs.
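
As a rough idea of what the Markov chain variant could look like, the sketch below scores each candidate by a first-order (bigram) probability given the previous word, with add-one smoothing. The toy sentence data, the smoothing choice and the function names are assumptions for illustration; the notes above do not specify a particular model.

```python
from collections import Counter

def train_bigrams(tokens):
    """Count single words and adjacent word pairs in a token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(prev, word, unigrams, bigrams):
    """P(word | prev) with add-one smoothing over the vocabulary."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

def pick_with_context(prev, candidates, unigrams, bigrams):
    """Choose the candidate most likely to follow the previous word."""
    return max(candidates, key=lambda w: bigram_prob(prev, w, unigrams, bigrams))

# Made-up training text.
tokens = "i want to eat . i went too far . we want to go".split()
unigrams, bigrams = train_bigrams(tokens)
```

On this toy text, the candidates "to" and "too" are resolved differently depending on the preceding word: "to" after "want", "too" after "went". In a full corrector this context score would be combined with the error model rather than used alone.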
