Perplexity vs. Cross-Entropy

Evaluating a Language Model: Perplexity

Suppose we have a test set of \(m\) sentences:
\[s_1,s_2,\cdots,s_m\]
We could look at the probability of the whole set under our model, \(\prod_{i=1}^m{p(s_i)}\), or, more conveniently, at the log probability:
\[\log \prod_{i=1}^m{p(s_i)}=\sum_{i=1}^m{\log p(s_i)}\]
where \(p(s_i)\) is the probability of sentence \(s_i\).

In fact, the usual evaluation measure is perplexity:
\[PPL=2^{-l}\]
\[l=\frac{1}{M}\sum_{i=1}^m{\log p(s_i)}\]
where the logarithm is taken base 2 (matching the base of the exponent) and \(M\) is the total number of words in the test data.
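As a concrete illustration, here is a minimal sketch of how corpus-level perplexity follows from these two formulas; the sentence log-probabilities and lengths are made-up values, not real model output:

```python
# Made-up base-2 log-probabilities log2 p(s_i) for three test sentences,
# and the number of words in each sentence.
sentence_log2_probs = [-35.2, -41.7, -28.9]
sentence_lengths = [12, 15, 9]

M = sum(sentence_lengths)             # total number of words in the test data
l = sum(sentence_log2_probs) / M      # l = (1/M) * sum_i log2 p(s_i)
ppl = 2 ** (-l)                       # PPL = 2^(-l)
print(f"perplexity = {ppl:.2f}")
```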

Cross-Entropy

Given words \(x_1,\cdots,x_t\), a language model predicts the next word \(x_{t+1}\) by modeling:
\[P(x_{t+1}=v_j\mid x_t,\cdots,x_1)=\hat y_j^t\]
where \(v_j\) is a word in the vocabulary.

The predicted output vector \(\hat y^t\in \mathbb{R}^{|V|}\) is a probability distribution over the vocabulary, and we optimize the cross-entropy loss:
\[\mathcal{L}^t(\theta)=CE(y^t,\hat y^t)=-\sum_{i=1}^{|V|}{y_i^t\log \hat y_i^t}\]
where \(y^t\) is the one-hot vector corresponding to the target word. This is a point-wise loss, and we sum the cross-entropy loss across all examples in a sequence, and across all sequences in the dataset, in order to evaluate model performance.
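A minimal numeric sketch (toy vocabulary of size 5, made-up probabilities) showing that with a one-hot target the sum collapses to the negative log-probability of the target word:

```python
import torch

# Predicted distribution over a toy vocabulary of 5 words (made-up values),
# and a one-hot target vector for the word at index 2.
y_hat = torch.tensor([0.1, 0.2, 0.5, 0.1, 0.1])
y = torch.zeros(5)
y[2] = 1.0

# CE(y, y_hat) = -sum_i y_i * log(y_hat_i)
ce = -(y * torch.log(y_hat)).sum()

# With a one-hot target this equals -log(y_hat[2]).
print(ce.item(), -torch.log(y_hat[2]).item())
```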

The Relationship Between Cross-Entropy and Perplexity

\[PP^t=\frac{1}{P(x_{t+1}^{pred}=x_{t+1}\mid x_t,\cdots,x_1)}=\frac{1}{\sum_{j=1}^{|V|} {y_j^t\cdot \hat y_j^t}}\]
which is the inverse probability of the correct word, according to the model distribution \(P\).

Suppose \(y_i^t\) is the only nonzero element of \(y^t\), i.e. word \(i\) is the target. Then, note that:
\[CE(y^t,\hat y^t)=-\log \hat y_i^t=\log\frac{1}{\hat y_i^t}\]
\[PP(y^t,\hat y^t)=\frac{1}{\hat y_i^t}\]
Then, it follows that:
\[CE(y^t,\hat y^t)=\log PP(y^t,\hat y^t)\]
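A quick numeric sanity check of this identity, with a made-up probability for the correct word:

```python
import math

p_correct = 0.25              # made-up model probability of the correct word
ce = -math.log(p_correct)     # cross-entropy with a one-hot target
pp = 1.0 / p_correct          # per-step perplexity
assert math.isclose(ce, math.log(pp))
print(ce, math.log(pp))       # both ≈ 1.386
```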

In fact, minimizing the arithmetic mean of the cross-entropy is identical to minimizing the geometric mean of the perplexity. If the model's predictions are completely random, \(E[\hat y_i^t]=\frac{1}{|V|}\), and the expected cross-entropy is \(\log |V|\) (for example, \(\log 10000\approx 9.21\) for a vocabulary of 10,000 words).
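To see the first claim, average the per-step identity \(CE(y^t,\hat y^t)=\log PP(y^t,\hat y^t)\) over \(T\) prediction steps:
\[\frac{1}{T}\sum_{t=1}^{T}CE(y^t,\hat y^t)=\frac{1}{T}\sum_{t=1}^{T}\log PP(y^t,\hat y^t)=\log\left(\prod_{t=1}^{T}PP(y^t,\hat y^t)\right)^{\frac{1}{T}}\]
so the mean cross-entropy is the log of the geometric mean of the per-step perplexities, and minimizing one minimizes the other.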

Reposted from: https://www.cnblogs.com/ZJUT-jiangnan/p/5612096.html

Computing Perplexity in PyTorch

In PyTorch, perplexity is mainly used as a metric for evaluating language-model performance: it measures how well the model predicts unseen data. Given a piece of text, perplexity is the exponential of the negative mean log-probability the model assigns to each word:
\[ \text{Perplexity} = e^{-\frac{1}{N} \sum_{i=1}^{N} \log p(w_i)} \]
where \(N\) is the number of words in the text and \(p(w_i)\) is the probability the model assigns to the \(i\)-th word.

The basic steps for computing perplexity are:

1. **Encode the text**: convert the text into a form the model can consume, e.g. a sequence of vocabulary indices.
2. **Run the model**: obtain the model's predicted probability for each word.
3. **Take logarithms**: take the natural log of each word's probability; the raw probabilities can be very small, so multiplying them directly risks numerical underflow, while summing logs does not.
4. **Average**: sum the log-probabilities over all words and divide by the total word count to get the mean per-word loss.
5. **Exponentiate**: take the exponential of the (negated) mean loss to obtain the perplexity.

In PyTorch this is usually done with `nn.CrossEntropyLoss`, or in a custom function built from `softmax`/`log_softmax`.

```python
import torch
import torch.nn as nn

# model_out: raw (pre-softmax) logits from the model's forward pass,
# with shape (batch_size, seq_length, vocab_size)
model_out = ...
# target: token indices with shape (batch_size, seq_length)
target = ...

# cross_entropy applies log_softmax internally, so it expects raw logits;
# flatten the batch and sequence dimensions into one before calling it.
loss = nn.functional.cross_entropy(
    model_out.reshape(-1, model_out.size(-1)),
    target.reshape(-1),
    reduction='mean',
)

# perplexity is the exponential of the mean per-token cross-entropy
perplexity = torch.exp(loss)
```
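As a sanity check, here is a hedged, self-contained example with placeholder tensors standing in for real model output (the shapes and vocabulary size are made up). Zero logits give a uniform predictive distribution, i.e. the "completely random" case above, so the loss should come out as \(\log|V|\) and the perplexity as \(|V|\):

```python
import torch
import torch.nn as nn

# Made-up shapes; zero logits stand in for a model that predicts a uniform
# distribution over the vocabulary.
batch_size, seq_length, vocab_size = 4, 16, 1000
logits = torch.zeros(batch_size, seq_length, vocab_size)
targets = torch.randint(0, vocab_size, (batch_size, seq_length))

# Flatten batch and sequence dimensions before computing the mean token loss.
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size),
    targets.reshape(-1),
    reduction="mean",
)
perplexity = torch.exp(loss)

# Uniform predictions give loss = log(1000) ≈ 6.91 and perplexity = 1000.
print(loss.item(), perplexity.item())
```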