Evaluation and Perplexity

Every natural language processing tool has to be evaluated, and language models are no exception.

There are two methods for evaluating a language model.

One is extrinsic evaluation:

The best way to compare two language models, A and B, is to put each model into a task, measure the accuracy of each, and compare the two accuracies. But this is time-consuming in many cases.

The other is intrinsic evaluation. The most common intrinsic evaluation metric is perplexity.

Perplexity is a bad approximation of extrinsic evaluation unless the test data looks a lot like the training data. So perplexity is generally useful only in pilot experiments, but it does help to think about the problem, and it is a useful tool as long as we also use extrinsic evaluation.

Perplexity is the inverse probability of the test set, normalized by the number of words: PP(W) = P(w1 w2 w3 ... wN)^(-1/N)
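
As a minimal sketch of the formula above, the following Python function computes perplexity from a list of per-word probabilities. The function name `perplexity` and the input format (one conditional probability per word, already produced by some model) are assumptions made for illustration. The sum of logs replaces the direct product only to avoid numerical underflow on long test sets; the result is the same quantity.

```python
import math

def perplexity(word_probs):
    """Return P(w1 ... wN)^(-1/N), given one probability per word.

    `word_probs` is a hypothetical input: the probability a model
    assigned to each word of the test set, in order.
    """
    if not word_probs:
        raise ValueError("need at least one word probability")
    # log P(w1 ... wN) is the sum of the per-word log probabilities.
    log_prob = sum(math.log(p) for p in word_probs)
    n = len(word_probs)
    # exp(-(1/N) * log P) is the same as P^(-1/N).
    return math.exp(-log_prob / n)
```

For instance, `perplexity([0.25, 1.0])` evaluates (0.25)^(-1/2), which is 2 up to floating-point rounding.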

We want a normalizing factor so that we can compare test sets of different lengths. Minimizing perplexity is the same as maximizing probability. Perplexity is also related to the average branching factor.
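
The two claims in this paragraph can be checked numerically. The sketch below assumes a hypothetical model that assigns the same probability to every word; the helper name `perplexity_from_log_prob` and all the numbers are made up for illustration.

```python
import math

def perplexity_from_log_prob(log_prob, n_words):
    """Perplexity from the total log probability of an N-word test set."""
    return math.exp(-log_prob / n_words)

# Hypothetical model: every word gets probability 0.2.
short_log_prob = 5 * math.log(0.2)    # 5-word test set
long_log_prob = 50 * math.log(0.2)    # 50-word test set

# The raw probabilities differ by many orders of magnitude,
# but the length-normalized perplexities agree.
pp_short = perplexity_from_log_prob(short_log_prob, 5)
pp_long = perplexity_from_log_prob(long_log_prob, 50)

# On the same 10-word test set, the model that assigns the higher
# probability gets the lower perplexity.
pp_better = perplexity_from_log_prob(10 * math.log(0.5), 10)
pp_worse = perplexity_from_log_prob(10 * math.log(0.1), 10)
```

Here `pp_short` and `pp_long` both come out at about 5, and `pp_better` is smaller than `pp_worse`.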

For example, if ten possible words can come next and all are equally probable, the perplexity is ten. Suppose a sentence consists of N random digits, each with probability 1/10.

PP(W) = P(w1 w2 ... wN)^(-1/N) = ((1/10)^N)^(-1/N) = (1/10)^(-1) = 10
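
The same calculation can be sketched in Python for any number of equally likely choices, which illustrates the branching-factor reading of perplexity. The function name `uniform_perplexity` and its parameters are assumptions for illustration only.

```python
import math

def uniform_perplexity(n_choices, n_words):
    """Perplexity of n_words symbols, each drawn uniformly from n_choices."""
    # Each word has probability 1/n_choices, so the log probability
    # of the whole sequence is n_words * log(1/n_choices).
    log_prob = n_words * math.log(1 / n_choices)
    return math.exp(-log_prob / n_words)

# Random digits: ten equally likely choices at every position.
digit_results = [uniform_perplexity(10, n) for n in (1, 10, 1000)]
```

Every entry of `digit_results` is 10 up to floating-point rounding, so the sentence length does not matter: the perplexity equals the number of equally likely choices.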

Conclusion: Low perplexity = better model

Reposted from: https://www.cnblogs.com/chuanlong/archive/2013/04/22/3035623.html
