NLP Study Notes (5): Language Models

Author: 两个幽灵

Category: Deep Learning

Original source: www.bilibili.com

Language Models

Informally, a language model judges how fluent and grammatical a sentence is. Formally, it computes the probability of a sentence or sequence of words.

Noisy Channel Model

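The probability of the target text given the source text follows from Bayes' rule, $\text{p}(text|source)=\frac{\text{p}(source|text)\text{p}(text)}{\text{p}(source)}$. Since $\text{p}(source)$ is fixed for a given input, $\text{p}(text|source)\propto \text{p}(source|text)\text{p}(text)$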

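For example, in spelling correction: $\text{p}(\text{correct word}|\text{misspelled word})\propto\text{p}(\text{misspelled word}|\text{correct word})\text{p}(\text{correct word})$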

Here $\text{p}(text)$ is the language model: it favors candidate target strings that are well-formed.
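A minimal sketch of the noisy-channel decision rule for spelling correction, picking the candidate that maximizes channel probability times language-model probability. The misspelling `appel`, its candidate corrections, and every probability in the two tables are hypothetical toy values; a real system would estimate the channel model from error data and use a real language model instead of a small lookup table.

```python
# Noisy-channel spelling correction: argmax_w p(observed | w) * p(w).
# All probabilities below are HYPOTHETICAL toy values for illustration only.

# Channel model p(observed | intended): how likely the typo is for each intended word.
channel = {
    ("appel", "apple"): 0.30,
    ("appel", "apply"): 0.05,
    ("appel", "appeal"): 0.10,
}

# Language model p(word): here just a unigram prior over the candidate words.
language_model = {
    "apple": 0.0004,
    "apply": 0.0006,
    "appeal": 0.0001,
}


def correct(observed: str) -> str:
    """Return the candidate w maximizing p(observed | w) * p(w)."""
    candidates = [w for (obs, w) in channel if obs == observed]
    if not candidates:
        return observed  # no known candidate: leave the word unchanged
    return max(candidates, key=lambda w: channel[(observed, w)] * language_model[w])


print(correct("appel"))
```

Note that `apply` has the highest prior here, but its low channel probability keeps it from winning; the decision always weighs both factors.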

Unigram

$\text{p}([今天,是,春节,我们,都,休息])=\text{p}(今天)\text{p}(是)\text{p}(春节)\text{p}(我们)\text{p}(都)\text{p}(休息)$

$\text{p}([今天,春节,是,都,我们,休息])=\text{p}(今天)\text{p}(是)\text{p}(春节)\text{p}(我们)\text{p}(都)\text{p}(休息)$

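The tokens spell 今天是春节我们都休息 ("Today is the Spring Festival, and we are all taking the day off"); the second line scrambles them. Both orders receive the same probability, because a unigram model scores each word independently and ignores order.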

Bigram

$\text{p}([今天,是,春节,我们,都,休息])=\text{p}(今天)\text{p}(是|今天)\text{p}(春节|是)\text{p}(我们|春节)\text{p}(都|我们)\text{p}(休息|都)$

$\text{p}([今天,春节,是,都,我们,休息])=\text{p}(今天)\text{p}(春节|今天)\text{p}(是|春节)\text{p}(都|是)\text{p}(我们|都)\text{p}(休息|我们)$
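To make the contrast concrete, the following sketch scores both word orders under a unigram and a bigram model. All probabilities are hypothetical values chosen for illustration, and `FLOOR` is an assumed stand-in for unseen bigrams (smoothing handles this case properly).

```python
from math import isclose, prod

# HYPOTHETICAL toy probabilities, chosen only to illustrate order (in)sensitivity.
unigram = {"今天": 0.02, "是": 0.05, "春节": 0.01, "我们": 0.03, "都": 0.04, "休息": 0.01}
bigram = {  # p(word | previous word)
    ("今天", "是"): 0.2, ("是", "春节"): 0.1, ("春节", "我们"): 0.1,
    ("我们", "都"): 0.3, ("都", "休息"): 0.2,
}
FLOOR = 1e-6  # hypothetical tiny value used for bigrams missing from the table


def unigram_prob(words):
    """Product of p(w) over all words: order does not matter."""
    return prod(unigram[w] for w in words)


def bigram_prob(words):
    """p(w_1) times p(w_i | w_{i-1}) for each following word."""
    p = unigram[words[0]]
    for prev, cur in zip(words, words[1:]):
        p *= bigram.get((prev, cur), FLOOR)
    return p


a = ["今天", "是", "春节", "我们", "都", "休息"]
b = ["今天", "春节", "是", "都", "我们", "休息"]
print(isclose(unigram_prob(a), unigram_prob(b)))  # same score under the unigram model
print(bigram_prob(a) > bigram_prob(b))            # fluent order scores higher under the bigram model
```

`isclose` is used for the unigram comparison because multiplying floats in a different order can change the last bits of the result.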

N-gram

Models with N>2 are called higher-order models; N=3 is called a trigram.
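For reference, unigram, bigram, and trigram models are all special cases of one approximation: the chain rule gives the exact factorization, and an N-gram model keeps only the previous $N-1$ words as context (for the first few words, the context is simply shorter).

$\begin{aligned} P(w_1,\dots,w_n) &= \prod_{i=1}^{n} P(w_i|w_1,\dots,w_{i-1}) \\ &\approx \prod_{i=1}^{n} P(w_i|w_{i-N+1},\dots,w_{i-1}) \end{aligned}$

Applied to the example sentence, a trigram model would read:

$\text{p}([今天,是,春节,我们,都,休息])\approx\text{p}(今天)\text{p}(是|今天)\text{p}(春节|今天,是)\text{p}(我们|是,春节)\text{p}(都|春节,我们)\text{p}(休息|我们,都)$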

Probability Estimation

Unigram

Suppose we have the following corpus (already word-segmented, 19 tokens in total):

5 tokens | 今天 的 天气 很好 啊
5 tokens | 我 很 想 出去 运动
5 tokens | 但 今天 上午 有 课程
4 tokens | 训练营 明天 才 开始

$\begin{aligned} & P([今天,开始,训练营,课程]) \\ = & P(今天)P(开始)P(训练营)P(课程) \\ = & {2 \over 19}\cdot {1 \over 19}\cdot {1 \over 19}\cdot {1 \over 19} \\ = & {2 \over {19}^4} \end{aligned}$
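The unigram estimate can be checked mechanically. This sketch counts tokens in the corpus above, assuming the space-separated words are the intended segmentation, and uses `Fraction` to keep the arithmetic exact.

```python
from collections import Counter
from fractions import Fraction

# The toy corpus from the text, already word-segmented.
corpus = [
    "今天 的 天气 很好 啊",
    "我 很 想 出去 运动",
    "但 今天 上午 有 课程",
    "训练营 明天 才 开始",
]
tokens = [w for line in corpus for w in line.split()]
counts = Counter(tokens)
total = len(tokens)  # total number of tokens in the corpus


def p_unigram(word):
    """MLE estimate p(w) = c(w) / total number of tokens."""
    return Fraction(counts[word], total)


def sentence_prob(words):
    """Unigram sentence probability: the product of p(w) over the words."""
    p = Fraction(1)
    for w in words:
        p *= p_unigram(w)
    return p


print(sentence_prob(["今天", "开始", "训练营", "课程"]))
```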

Bigram

Suppose we have the following corpus (again 19 tokens in total):

5 tokens | 今天 的 天气 很好 啊
5 tokens | 我 很 想 出去 运动
5 tokens | 但 今天 上午 想 上课
4 tokens | 训练营 明天 才 开始

$\begin{matrix} \begin{aligned} P(上午|今天)={1 \over 2}\\ P(的|今天)={1 \over 2}\\ P(出去|想)={1\over 2}\\ P(上课|想)={1\over 2} \end{aligned} & \begin{aligned} &P([今天,上午,想,出去,运动]) \\ =&P(今天)P(上午|今天)P(想|上午)P(出去|想)P(运动|出去)\\ =&{2\over 19}\cdot {1\over 2}\cdot 1\cdot{1 \over 2}\cdot 1\\ =&{1\over 38} \end{aligned} \end{matrix}$
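A similar sketch for the bigram estimates. It counts adjacent word pairs within each sentence only: pairs are not formed across sentence boundaries and no start/end symbols are added, both assumptions consistent with the hand calculation. Exact arithmetic gives 1/38 for the example sentence.

```python
from collections import Counter
from fractions import Fraction

corpus = [
    "今天 的 天气 很好 啊",
    "我 很 想 出去 运动",
    "但 今天 上午 想 上课",
    "训练营 明天 才 开始",
]
sentences = [line.split() for line in corpus]
unigram_counts = Counter(w for s in sentences for w in s)
# Adjacent pairs (prev, word) inside each sentence; no pairs across sentences.
bigram_counts = Counter(pair for s in sentences for pair in zip(s, s[1:]))
total = sum(unigram_counts.values())


def p_bigram(word, prev):
    """MLE estimate p(word | prev) = c(prev, word) / c(prev)."""
    return Fraction(bigram_counts[(prev, word)], unigram_counts[prev])


def sentence_prob(words):
    """p(w_1) from unigram counts, then one bigram factor per following word."""
    p = Fraction(unigram_counts[words[0]], total)
    for prev, cur in zip(words, words[1:]):
        p *= p_bigram(cur, prev)
    return p


print(p_bigram("上午", "今天"))
print(sentence_prob(["今天", "上午", "想", "出去", "运动"]))
```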

Model Evaluation

Perplexity

$\text{perplexity}=2^{-x}$, where $x$ is the average log-likelihood per word, computed with base-2 logarithms.

The lower the perplexity on evaluation text, the better the language model.

Example: suppose we have a trained bigram model with

p(天气|今天)=0.01
p(今天)=0.002
p(很好|天气)=0.1
p(适合|很好)=0.01
p(出去|适合)=0.02
p(运动|出去)=0.1

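The perplexity of the sentence "今天天气很好适合出去运动" ("The weather is nice today, a good day to go out and exercise") under this model is then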

$\begin{aligned} x&=\frac{\log_2 0.002+\log_2 0.01+\log_2 0.1+\log_2 0.01+\log_2 0.02+\log_2 0.1}{6}\\ \text{perplexity}&=2^{-x} \end{aligned}$
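The perplexity calculation can be sketched as follows, using the six model probabilities listed above. The average log-likelihood is a sum of base-2 logs divided by the number of words; the second print checks that $2^{-x}$ equals the inverse geometric mean $(\prod_i p_i)^{-1/6}$.

```python
import math

# Probabilities from the trained bigram model in the example, in sentence order.
probs = [
    0.002,  # p(今天)
    0.01,   # p(天气|今天)
    0.1,    # p(很好|天气)
    0.01,   # p(适合|很好)
    0.02,   # p(出去|适合)
    0.1,    # p(运动|出去)
]


def perplexity(word_probs):
    """2 ** (-x), where x is the average base-2 log-likelihood per word."""
    x = sum(math.log2(p) for p in word_probs) / len(word_probs)
    return 2 ** (-x)


print(perplexity(probs))                        # about 54.07
print(math.prod(probs) ** (-1 / len(probs)))    # same value via the geometric mean
```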

Smoothing

Add-one Smoothing

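Without smoothing, the probability that word $w_i$ follows word $w_{i-1}$ is the maximum-likelihood estimate
$P_{MLE}(w_i|w_{i-1})=\frac{c(w_{i},w_{i-1})}{c(w_{i-1})}$
where $c(w_{i},w_{i-1})$ is the number of times $w_{i-1}$ is immediately followed by $w_i$. Any bigram that never appears in the training corpus gets probability 0, which makes the probability of every sentence containing it 0 as well; smoothing moves some probability mass onto such unseen events.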

With add-one (Laplace) smoothing:
$P_{Add-1}(w_i|w_{i-1})=\frac{c(w_{i}, w_{i-1})+1}{c(w_{i-1})+V}$

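$V$ is the vocabulary size. Since $w_i$ can take any of $V$ values and each one receives an extra count of 1, the denominator must grow by $V$ so that $\sum_{w}P_{Add-1}(w|w_{i-1})=1$.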
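A sketch of add-one smoothing on the toy bigram corpus from the estimation example. One assumption to flag: the denominator here counts how often `prev` is followed by another word. That equals $c(w_{i-1})$ for words that never end a sentence (such as 今天), and it guarantees every conditional distribution sums to exactly 1.

```python
from collections import Counter
from fractions import Fraction

corpus = [
    "今天 的 天气 很好 啊",
    "我 很 想 出去 运动",
    "但 今天 上午 想 上课",
    "训练营 明天 才 开始",
]
sentences = [line.split() for line in corpus]
vocab = sorted({w for s in sentences for w in s})
V = len(vocab)  # vocabulary size
bigram_counts = Counter(pair for s in sentences for pair in zip(s, s[1:]))
# How often each word appears as the first element of a bigram (its "history" count).
history_counts = Counter(s[i] for s in sentences for i in range(len(s) - 1))


def p_add_one(word, prev):
    """Add-one estimate (c(prev, word) + 1) / (c(prev) + V)."""
    return Fraction(bigram_counts[(prev, word)] + 1, history_counts[prev] + V)


print(p_add_one("上午", "今天"))  # seen bigram
print(p_add_one("运动", "今天"))  # unseen bigram, now nonzero
print(sum(p_add_one(w, "今天") for w in vocab))  # distribution sums to 1
```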

Add-K Smoothing

$P_{Add-k}(w_i|w_{i-1})=\frac{c(w_{i}, w_{i-1})+k}{c(w_{i-1})+kV}$

How do we choose the best $k$?

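Try candidate values such as $k=1,2,\dots,100$ and keep the one that minimizes $\text{perplexity}=f(k)$, measured on held-out (validation) data rather than on the training data.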
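Choosing $k$ can be sketched as a grid search. The held-out sentence below is a hypothetical validation example, the grid adds a few fractional values to the $1,\dots,100$ range, and only the bigram transitions are scored (the first word's probability is left out).

```python
import math
from collections import Counter

train = ["今天 的 天气 很好 啊", "我 很 想 出去 运动", "但 今天 上午 想 上课", "训练营 明天 才 开始"]
heldout = ["今天 想 出去 运动"]  # HYPOTHETICAL validation sentence

sentences = [s.split() for s in train]
V = len({w for s in sentences for w in s})
bigrams = Counter(p for s in sentences for p in zip(s, s[1:]))
history = Counter(s[i] for s in sentences for i in range(len(s) - 1))


def p_add_k(word, prev, k):
    """Add-k estimate (c(prev, word) + k) / (c(prev) + k * V)."""
    return (bigrams[(prev, word)] + k) / (history[prev] + k * V)


def heldout_perplexity(k):
    """Perplexity of the held-out bigram transitions under add-k smoothing."""
    log_sum, n = 0.0, 0
    for line in heldout:
        words = line.split()
        for prev, cur in zip(words, words[1:]):
            log_sum += math.log2(p_add_k(cur, prev, k))
            n += 1
    return 2 ** (-log_sum / n)


candidates = [0.01, 0.1, 0.5] + list(range(1, 101))
best_k = min(candidates, key=heldout_perplexity)
print(best_k, heldout_perplexity(best_k))
```

On a corpus this small, a fractional $k$ wins, because a large $k$ spreads too much probability mass over the whole vocabulary.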

Interpolation

When computing a trigram probability, combine the trigram, bigram, and unigram estimates as a weighted sum:

$\begin{aligned} p_{\text{interp}}(w_n|w_{n-1},w_{n-2})&=\lambda_1 p(w_n|w_{n-1},w_{n-2})\\ &+\lambda_2 p(w_n|w_{n-1})\\ &+\lambda_3 p(w_n) \end{aligned}$

where $\lambda_1+\lambda_2+\lambda_3=1$ and each $\lambda_i\ge 0$, so the result is still a probability distribution.
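A sketch of linear interpolation on the toy bigram corpus from the estimation example. The weights in `LAMBDAS` are hypothetical; in practice they would be tuned, for example on held-out data. When a history never occurs, this sketch lets that component contribute 0, so the mixture no longer sums to 1 in that case; a fuller implementation would renormalize or back off.

```python
from collections import Counter

corpus = ["今天 的 天气 很好 啊", "我 很 想 出去 运动", "但 今天 上午 想 上课", "训练营 明天 才 开始"]
sentences = [s.split() for s in corpus]
tokens = [w for s in sentences for w in s]
vocab = sorted(set(tokens))

uni = Counter(tokens)
bi = Counter(p for s in sentences for p in zip(s, s[1:]))
bi_hist = Counter(s[i] for s in sentences for i in range(len(s) - 1))
tri = Counter(t for s in sentences for t in zip(s, s[1:], s[2:]))
tri_hist = Counter((s[i], s[i + 1]) for s in sentences for i in range(len(s) - 2))

LAMBDAS = (0.6, 0.3, 0.1)  # HYPOTHETICAL weights (lambda_1, lambda_2, lambda_3); must sum to 1


def mle(count, total):
    """Relative frequency, or 0 when the history was never seen."""
    return count / total if total else 0.0


def p_interp(w, prev1, prev2, lambdas=LAMBDAS):
    """Interpolated p(w | prev1, prev2), with prev1 = w_{n-1} and prev2 = w_{n-2}."""
    l1, l2, l3 = lambdas
    return (l1 * mle(tri[(prev2, prev1, w)], tri_hist[(prev2, prev1)])
            + l2 * mle(bi[(prev1, w)], bi_hist[prev1])
            + l3 * uni[w] / len(tokens))


print(p_interp("想", "上午", "今天"))    # seen trigram 今天 上午 想
print(p_interp("上课", "上午", "今天"))  # unseen trigram, still nonzero via the unigram term
```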

Good-Turing Smoothing
