I've started Michael Collins's NLP course on Coursera. The lectures are good, but my own understanding is still not very deep, so I'm taking notes here to fill in the gaps.
What is NLP?
A common definition: getting computers to use natural language as input and/or output.
NLU: understanding input
NLG: generating output
NLP tasks include:
the oldest, Machine Translation, as well as Information Extraction, Text Summarization, and Dialogue Systems. The most basic NLP problems fall into two classes, Tagging and Parsing:
Tagging assigns a specific label to each separable unit of a sentence.
For example:
- Part-of-speech tagging: noun, verb, preposition, …
- Profits (N) soared (V) at (P) Boeing (N)
- Named Entity Recognition: companies, locations, people
- Profits (NA) soared (NA) at (NA) Boeing (C)
Parsing is mainly the analysis of a sentence's grammatical (syntactic) structure.
For example:
turning “Boeing is located in Seattle” into a parse tree, sketched below.
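A minimal sketch of such a tree as plain nested tuples; the labels (S, NP, VP, …) are my assumption about the intended analysis, not taken from the lecture:

```python
# A toy parse tree for "Boeing is located in Seattle" as nested tuples
# of the form (label, child, child, ...). Labels are my assumption.
tree = ("S",
        ("NP", "Boeing"),
        ("VP", ("V", "is"),
               ("VP", ("V", "located"),
                      ("PP", ("P", "in"),
                             ("NP", "Seattle")))))

def show(node, depth=0):
    """Print the tree with one level of indentation per depth."""
    if isinstance(node, str):          # leaf: a word
        print("  " * depth + node)
        return
    label, *children = node
    print("  " * depth + label)
    for child in children:
        show(child, depth + 1)

show(tree)
```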
Why is NLP hard?
Ambiguity arises at many levels: syntactic, acoustic, semantic, and discourse (multi-clause).
For example:
- “At last, a computer that understands you like your mother”; three interpretations at the syntactic level.
- But also occurs at an acoustic level: “like your” sounds like “lie cured”.
- One reading is more likely than the other, but without further information it is difficult to tell them apart.
- At semantic level, words often have more than one meaning. Need context to disambiguate.
- “I saw her duck with a telescope”.
- At discourse (multi-clause) level.
- “Alice says they’ve built a computer that understands you like your mother”
- If you start a sentence saying “but she…”, who is she referring to?
The Language Modeling Problem
V: the finite set of words we have (the vocabulary).
e.g. V = {the, a, man, telescope, Beckham, two, ...}
V+ denotes the countably infinite set of strings made up of words from V. The goal of a language model is to judge whether each such string is well-formed, and the most common approach today is to express a sentence's plausibility through a probability distribution.
e.g. V+ = {"the STOP", "a STOP", "the fan STOP", ...}
- We have a training sample of example sentences in English.
- Sentences from the New York Times in the last 10 years.
- Sentences from a large set of web pages.
- In the 1990s, 20 million words was common; by the end of the 90s, 1 billion words.
- Nowadays, hundreds of billions of words.
With this training sample we want to “learn” a probability distribution p, i.e. p is a function that satisfies:
- For any sentence x ∈ V+, p(x) ≥ 0.
- The probabilities over all sentences sum to one: Σ_{x∈V+} p(x) = 1.
- The probability assigned to each sentence or string then gives a relatively principled way to judge how well-formed it is; a toy check follows below.
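This little p is my own illustration, concentrated on just four strings of V+ (a real model spreads mass over all of V+), using the example strings from above:

```python
# A toy distribution p over four strings of V+, checked against the
# two conditions of a probability distribution.
p = {
    "the STOP":     0.4,
    "a STOP":       0.3,
    "the man STOP": 0.2,
    "the fan STOP": 0.1,
}

assert all(prob >= 0 for prob in p.values())   # p(x) >= 0 for every x
assert abs(sum(p.values()) - 1.0) < 1e-12      # probabilities sum to 1
```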
The most widely used model in language modeling is the Markov model. It makes a very strong assumption about language: the probability of a word depends only on the one (or few) words immediately before it. This assumption is clearly wrong, yet it works very well in current applications, mainly because it makes computation much simpler. Expressed in formulas, the chain rule followed by the assumption gives:

$$P(X_1 = x_1, \ldots, X_n = x_n) = P(X_1 = x_1) \prod_{i=2}^{n} P(X_i = x_i \mid X_1 = x_1, \ldots, X_{i-1} = x_{i-1}) = P(X_1 = x_1) \prod_{i=2}^{n} P(X_i = x_i \mid X_{i-1} = x_{i-1})$$

where the last equality uses the assumption.

Because each word's probability depends only on the single preceding word, the above is called a First-Order Markov Process. There are also Second-Order Markov Processes, where a word depends on the two preceding words:

$$P(X_1 = x_1, \ldots, X_n = x_n) = P(X_1 = x_1)\, P(X_2 = x_2 \mid X_1 = x_1) \prod_{i=3}^{n} P(X_i = x_i \mid X_{i-2} = x_{i-2}, X_{i-1} = x_{i-1})$$

For notational convenience we define x0 = x−1 = ∗, where '∗' is a special start symbol, so the Second-Order process can also be written as:

$$P(X_1 = x_1, \ldots, X_n = x_n) = \prod_{i=1}^{n} P(X_i = x_i \mid X_{i-2} = x_{i-2}, X_{i-1} = x_{i-1})$$
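As a concrete instance (my own worked example, not from the notes), for the three-word sentence “the dog STOP” the second-order factorization reads:

$$P(X_1 = \text{the}, X_2 = \text{dog}, X_3 = \text{STOP}) = P(\text{the} \mid \ast, \ast)\; P(\text{dog} \mid \ast, \text{the})\; P(\text{STOP} \mid \text{the}, \text{dog})$$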
Trigram Language Models
- A trigram language model consists of the following components:
- A finite set V (the words, the vocabulary).
- For each trigram u,v,w, a sequence of three words, a parameter q(w|u,v), where w can be any element of V ∪ {STOP} and u,v can be any element of V ∪ {∗}.
- For any sentence x1,…,xn where xi ∈ V for i=1…(n−1), and xn = STOP, the probability of the sentence under the trigram model is:

$$p(x_1, \ldots, x_n) = \prod_{i=1}^{n} q(x_i \mid x_{i-2}, x_{i-1})$$

- where we define x0 = x−1 = ∗.
- i.e. for any sentence, its probability is the product of the second-order Markov probabilities of its constituent trigrams, as sketched in code below.
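A minimal sketch of this computation; sentence_prob is a hypothetical helper of my own, assuming q is given as a dict mapping (u, v, w) to q(w|u,v):

```python
# Probability of a sentence (ending in STOP) under a trigram model.
# `q` is assumed to map (u, v, w) -> q(w | u, v).
def sentence_prob(sentence, q):
    padded = ["*", "*"] + list(sentence)   # define x0 = x_{-1} = *
    prob = 1.0
    for i in range(2, len(padded)):
        u, v, w = padded[i - 2], padded[i - 1], padded[i]
        prob *= q.get((u, v, w), 0.0)      # unseen trigram -> probability 0
    return prob
```

For example, sentence_prob(["the", "dog", "STOP"], q) multiplies q(the|∗,∗) · q(dog|∗,the) · q(STOP|the,dog).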
The Trigram Estimation Problem
- A natural estimate: the maximum likelihood estimate (ML).
- Recall that we have a training set of example sentences in our language, typically millions or billions of sentences.
- From these sentences we can derive counts: how often does each trigram occur? The ML estimate is then

$$q_{ML}(w \mid u, v) = \frac{\mathrm{Count}(u, v, w)}{\mathrm{Count}(u, v)}$$
This runs into the Sparse Data Problem: the denominator Count(u,v) may be 0, which is very likely in practice, and as |V| grows the number of parameters q explodes, since there are |V|^3 of them. This is left to be addressed later; a counting sketch follows below.
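A minimal counting sketch (my own illustration, not course code): derive Count(u,v,w) and Count(u,v) from a training sample and form the ML estimate.

```python
from collections import defaultdict

# ML estimate q_ML(w | u, v) = Count(u, v, w) / Count(u, v).
def train_trigram(sentences):
    """sentences: an iterable of word lists, each ending in 'STOP'."""
    tri = defaultdict(int)   # Count(u, v, w)
    bi = defaultdict(int)    # Count(u, v)
    for s in sentences:
        padded = ["*", "*"] + list(s)
        for i in range(2, len(padded)):
            u, v, w = padded[i - 2], padded[i - 1], padded[i]
            tri[(u, v, w)] += 1
            bi[(u, v)] += 1

    def q(w, u, v):
        # Sparse data problem in action: for an unseen bigram (u, v)
        # the denominator Count(u, v) is 0; here we just return 0.0.
        return tri[(u, v, w)] / bi[(u, v)] if bi[(u, v)] else 0.0

    return q
```

Feeding the returned q into sentence_prob above ties the two pieces together; note how any unseen bigram (u,v) leaves the estimate undefined, which is exactly the sparse-data problem.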
Evaluating Language Models: Perplexity
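Given a test corpus of sentences s1, …, sm containing M words in total, perplexity is defined through the model's average log-probability:

$$l = \frac{1}{M} \sum_{i=1}^{m} \log_2 p(s_i), \qquad \text{perplexity} = 2^{-l}$$

The lower the perplexity, the better the model fits the test data; as a sanity check, a model that assigns uniform probability 1/N to each of N possible words has perplexity exactly N.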
Linear Interpolation
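Linear interpolation addresses the sparse-data problem above by mixing the trigram, bigram, and unigram ML estimates:

$$q(w \mid u, v) = \lambda_1\, q_{ML}(w \mid u, v) + \lambda_2\, q_{ML}(w \mid v) + \lambda_3\, q_{ML}(w)$$

where λ1 + λ2 + λ3 = 1 and each λi ≥ 0, which guarantees the mixture is again a probability distribution. The λ values are typically chosen to maximize the likelihood of held-out data.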
These two sections are excerpted from http://blog.csdn.net/dark_scope/article/details/8616672, which I found to be a good summary.