Autocorrect

A summary of https://medium.com/@sarthfrey/https-medium-com-prcobol-the-anatomy-of-autocorrect-9671cecad4b1#.gthtpsfo9

Prerequisites:

1、Edit distance
2、Bayes' rule:

P(right|error) = P(error|right) * P(right) / P(error)

3、Restricting candidates to words within edit distance 2 of the error word is not a bad assumption: approximately 75% of errors are within edit distance 1, and nearly all of them are within edit distance 2. A simple estimate with 75% accuracy for one suggestion gives 98.4% accuracy for 3 suggestions (100*(1-0.25³)), if each suggestion is treated as missing independently.
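To make prerequisite 1 concrete, here is a minimal sketch of edit distance. It assumes plain Levenshtein distance (insert, delete, substitute); some correctors also count swapping two adjacent letters as a single edit, which this version does not. The function name `edit_distance` is just a name chosen for this note.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions that turn a into b."""
    # prev[j] is the distance between the prefix of a processed so far and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute, or match for free
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("speling", "spelling"))
    print(edit_distance("kitten", "sitting"))
```

Under this definition "teh" is 2 edits away from "the", which is one reason some correctors treat a transposition as one edit.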

Attempt 1:

1、Check whether the error word is a valid English word. If so, return it; otherwise proceed.
2、Among the valid words at edit distance 1 from the error word, return the one that occurs most often in the corpus. If there is none, proceed.
3、Among the valid words within edit distance 2 of the error word, return the one that occurs most often in the corpus. If there is none, proceed.
4、The spelling corrector has failed; return the error word unchanged.
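The four steps above can be sketched roughly as follows. This is only an illustration under assumptions: `WORD_COUNTS` is a tiny made-up corpus count table, "valid English" is taken to mean "appears in that table", and a transposition of adjacent letters is counted as one edit.

```python
from collections import Counter
import string

# Toy corpus counts (made up). A real corrector would count words in a large corpus.
WORD_COUNTS = Counter({"the": 500, "then": 80, "they": 60,
                       "spelling": 12, "spilling": 3})


def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in letters]
    inserts = [l + c + r for l, r in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def known(words):
    """Keep only the words that occur in the corpus."""
    return {w for w in words if w in WORD_COUNTS}


def correct(word):
    if word in WORD_COUNTS:                 # step 1: already a valid word
        return word
    candidates = known(edits1(word))        # step 2: valid words one edit away
    if not candidates:                      # step 3: valid words two edits away
        candidates = known(e2 for e1 in edits1(word) for e2 in edits1(e1))
    if not candidates:                      # step 4: give up
        return word
    return max(candidates, key=WORD_COUNTS.__getitem__)


if __name__ == "__main__":
    for typo in ["teh", "speling", "spelng", "zzzzzz"]:
        print(typo, "->", correct(typo))
```

Note that this attempt uses only word frequency to rank candidates; it has no notion of which typos are more likely than others.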

Attempt 2:

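Use Bayes' rule (item 2 of the prerequisites): among the candidates, return the word w that maximizes P(x|w)*P(w), where x is the error word, P(x|w) is the error model, and P(w) is the language model. P(x) is the same for every candidate, so it can be dropped.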
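A minimal sketch of this scoring, under stated assumptions: x is the typed word, w a candidate, and the candidate with the largest P(x|w)*P(w) wins. The `ERROR_MODEL` table and the counts are made-up numbers for illustration; in practice the error model would be estimated from real typo data.

```python
from collections import Counter

# Toy language model (made-up counts): P(w) = count(w) / total.
WORD_COUNTS = Counter({"the": 500, "ten": 40, "tea": 20})
TOTAL = sum(WORD_COUNTS.values())

# Hypothetical error model: P(x | w), the chance of typing x when w was intended.
ERROR_MODEL = {
    ("teh", "the"): 0.010,
    ("teh", "ten"): 0.002,
    ("teh", "tea"): 0.003,
    ("tne", "the"): 0.0005,
    ("tne", "ten"): 0.020,
}


def language_model(w):
    return WORD_COUNTS[w] / TOTAL


def score(x, w):
    """Unnormalized posterior P(x|w) * P(w); P(x) is constant across candidates."""
    return ERROR_MODEL.get((x, w), 0.0) * language_model(w)


def best_correction(x, candidates):
    return max(candidates, key=lambda w: score(x, w))


if __name__ == "__main__":
    print(best_correction("teh", ["the", "ten", "tea"]))
    print(best_correction("tne", ["the", "ten"]))
```

In the second call the less frequent word "ten" wins, because with these numbers the error model outweighs the frequency gap. That is the behavior attempt 1 could not express.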

Attempt 3:

This is where we can add a parameter α: we raise the language model to the power α, so that we are now finding the w that maximizes P(x|w)*P(w)^α.
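A small sketch of how α shifts the balance between the two models. The probabilities in `CANDIDATES` are made up, and the score is computed in log space (log P(x|w) + α*log P(w)), which ranks candidates the same way as P(x|w)*P(w)^α while avoiding very small products.

```python
import math

# Hypothetical candidates for one typed word x: w -> (P(x|w), P(w)).
CANDIDATES = {
    "the": (0.0005, 0.05),   # unlikely typo of a very common word
    "ten": (0.02, 0.004),    # likely typo of a rarer word
}


def log_score(p_error, p_word, alpha):
    """log of P(x|w) * P(w)**alpha."""
    return math.log(p_error) + alpha * math.log(p_word)


def best(candidates, alpha):
    return max(candidates, key=lambda w: log_score(*candidates[w], alpha))


if __name__ == "__main__":
    for alpha in (0.0, 1.0, 3.0):
        print(alpha, best(CANDIDATES, alpha))
```

With these numbers a small α favors "ten" (the error model dominates) and a large α favors "the" (word frequency dominates), so α has to be chosen by testing on held-out typos.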

What's more:

Next, what if the suitable correction to our error word is at edit distance 2, but multiplying the first edit probability by the second in our error model makes the product so small that we almost never select corrections at more than 1 edit distance? We can raise the second edit probability to the power β, and choose β by testing, as we did for α.
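A rough sketch of the β idea and of picking β by testing. Everything here is an assumption for illustration: the candidate names, the per-edit probabilities, the tiny `DEV_SET`, and the grid of β values are all made up.

```python
def error_prob(edit_probs, beta=1.0):
    """P(x|w) for a path of one or two edits; the second edit is raised to beta."""
    p = edit_probs[0]
    if len(edit_probs) > 1:
        p *= edit_probs[1] ** beta
    return p


# Hypothetical dev set. Each case maps candidate -> (per-edit probabilities, P(w))
# and is paired with the candidate a human judged to be the right correction.
DEV_SET = [
    ({"two_edit_word": ([0.01, 0.01], 0.002),
      "one_edit_word": ([0.01], 0.00005)}, "two_edit_word"),
    ({"common_one_edit": ([0.02], 0.05),
      "rare_two_edit": ([0.01, 0.01], 0.003)}, "common_one_edit"),
]


def correct(candidates, beta):
    return max(candidates,
               key=lambda w: error_prob(candidates[w][0], beta) * candidates[w][1])


def accuracy(beta):
    hits = sum(correct(cands, beta) == truth for cands, truth in DEV_SET)
    return hits / len(DEV_SET)


def choose_beta(grid):
    """Return the beta in grid with the best dev-set accuracy (first one on ties)."""
    return max(grid, key=accuracy)


if __name__ == "__main__":
    print(accuracy(1.0), accuracy(0.5))
    print(choose_beta([1.0, 0.5]))
```

Because the second edit probability is below 1, a β below 1 makes two-edit corrections less heavily penalized. In this toy data β = 1 misses the two-edit correction and β = 0.5 recovers it.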


Future attempts:

Use context information, with models such as Markov chains and RNNs.
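As one possible illustration of using context, here is a sketch of a first-order Markov chain (bigram model) that picks between candidates based on the previous word. The training sentences, the add-k smoothing, and all function names are assumptions made for this note, not something taken from the original article.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count how often each word follows each previous word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split()
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    return counts


def bigram_prob(counts, prev, word, vocab_size, k=1.0):
    """P(word | prev) with add-k smoothing so unseen pairs are not zero."""
    return (counts[prev][word] + k) / (sum(counts[prev].values()) + k * vocab_size)


def pick_with_context(counts, prev, candidates, vocab_size):
    return max(candidates, key=lambda w: bigram_prob(counts, prev, w, vocab_size))


if __name__ == "__main__":
    corpus = ["a piece of cake", "peace and quiet",
              "a piece of pie", "world peace now"]
    counts = train_bigrams(corpus)
    vocab_size = len({w for s in corpus for w in s.lower().split()})
    print(pick_with_context(counts, "a", ["peace", "piece"], vocab_size))
    print(pick_with_context(counts, "world", ["peace", "piece"], vocab_size))
```

In a full corrector this context probability would take the place of the context-free P(w) in the score.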
