[coursera/SequenceModels/week2] NLP & Word Embeddings (summary & questions)

2.1 Introduction to Word Embeddings

2.1.1 Word Representation

Featurized representation: word embedding

use a dense n-dimensional feature vector (with n much smaller than the vocabulary size, typically a few hundred) to represent each word, rather than a sparse one-hot vector
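
A minimal numpy sketch (toy sizes, hypothetical word index) contrasting the sparse one-hot representation with a dense featurized embedding:

```python
import numpy as np

vocab_size = 10000   # |V|: vocabulary size (toy assumption)
embed_dim = 300      # n: embedding dimension, n << |V|

# One-hot representation: a 10000-dim vector with a single 1
one_hot = np.zeros(vocab_size)
one_hot[1234] = 1.0                      # word with index 1234

# Featurized representation: a dense 300-dim column of an embedding matrix E
E = np.random.randn(embed_dim, vocab_size) * 0.01
e_1234 = E[:, 1234]                      # the embedding of word 1234

print(one_hot.shape, e_1234.shape)       # (10000,) (300,)
```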

 

2.1.2 Using word embeddings

Transfer learning with word embeddings: learn embeddings from a large unlabeled text corpus (or download pre-trained ones), transfer them to a task with a smaller labeled dataset, and optionally fine-tune the embeddings on the new data.

 

2.1.3 Properties of word embeddings

Cosine similarity
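
Cosine similarity between two embeddings is $\mathrm{sim}(u, v) = \dfrac{u^T v}{\lVert u \rVert_2 \, \lVert v \rVert_2}$. A minimal sketch with toy vectors:

```python
import numpy as np

def cosine_similarity(u, v):
    """cos(theta) = (u . v) / (||u|| * ||v||)"""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Similar directions give a value near 1, orthogonal ~0, opposite ~-1.
u = np.array([1.0, 2.0, 3.0])
v = np.array([2.0, 4.0, 6.1])
print(cosine_similarity(u, v))   # ~1.0
```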

 

2.1.4 Embedding matrix

 

 

2.2 Learning Word Embeddings: Word2Vec & GloVe

2.2.1 Learning word embeddings

 

2.2.2 Word2Vec

 

2.2.3 GloVe word vectors

 

 

 

2.3 Applications using Word Embeddings

2.3.1 Sentiment Classification

RNN for sentiment classification
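
A minimal Keras-style sketch of an RNN sentiment classifier built on word embeddings; the vocabulary size, layer widths, and use of an LSTM are illustrative assumptions, not the course's exact architecture (the Embedding layer could be initialized from pre-trained vectors and frozen):

```python
import tensorflow as tf

vocab_size, embed_dim = 10000, 100   # illustrative sizes

model = tf.keras.Sequential([
    # Maps each word index to its embedding vector
    tf.keras.layers.Embedding(vocab_size, embed_dim),
    # Recurrent layer reads the sequence of word embeddings
    tf.keras.layers.LSTM(128),
    # Binary output: P(happy | text)
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```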

 

2.3.2 Debiasing word embeddings

Bias problem: word embeddings can absorb gender, ethnicity, age, and other biases present in the text they were trained on, so the bias direction should be identified and the embeddings debiased (neutralize and equalize steps).
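
A sketch of the "neutralize" step, assuming a bias direction g (e.g. estimated as e_woman − e_man): project a gender-neutral word's embedding onto g and remove that component. Toy vectors only:

```python
import numpy as np

def neutralize(e, g):
    """Remove the component of embedding e that lies along bias direction g."""
    e_bias = (np.dot(e, g) / np.dot(g, g)) * g   # projection of e onto g
    return e - e_bias                            # orthogonal (debiased) part

g = np.array([1.0, 0.0, 0.0])                    # toy bias direction
e_receptionist = np.array([0.4, 0.2, -0.1])      # toy gender-neutral word
print(neutralize(e_receptionist, g))             # first component becomes 0
```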

 

 

Q&A

Note on Question 5: the correct answer is that computing $E\,e_{1234}$ (multiplying the embedding matrix by a one-hot vector) is computationally wasteful.

 

Question 1

Suppose you learn a word embedding for a vocabulary of 10000 words. Then the embedding vectors should be 10000 dimensional, so as to capture the full range of variation and meaning in those words.

True

False

Correct 

Question 2 (Correct, 1 / 1 points)

What is t-SNE?

A linear transformation that allows us to solve analogies on word vectors

A non-linear dimensionality reduction technique

Correct 

A supervised learning algorithm for learning word embeddings

An open-source sequence modeling library

Question 3 (Correct, 1 / 1 points)

Suppose you download a pre-trained word embedding which has been trained on a huge corpus of text. You then use this word embedding to train an RNN for a language task of recognizing if someone is happy from a short snippet of text, using a small training set.

 

x (input text)                  y (happy?)
I'm feeling wonderful today!    1
I'm bummed my cat is ill.       0
Really enjoying this!           1

Then even if the word “ecstatic” does not appear in your small training set, your RNN might reasonably be expected to recognize “I’m ecstatic” as deserving a label y=1.

 

True

Correct 

False

Question 4 (Incorrect, 0 / 1 points)

Which of these equations do you think should hold for a good word embedding? (Check all that apply)

$e_{boy} - e_{girl} \approx e_{brother} - e_{sister}$

This should be selected 

$e_{boy} - e_{girl} \approx e_{sister} - e_{brother}$

Un-selected is correct 

$e_{boy} - e_{brother} \approx e_{girl} - e_{sister}$

Correct 

$e_{boy} - e_{brother} \approx e_{sister} - e_{girl}$

This should not be selected 

Recall the logic of analogies! The order of the words matters.
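
A quick numeric check of the analogy logic with toy 2-d vectors (coordinates chosen by hand so that one axis encodes "gender" and the other a "sibling" feature): the two correct equations hold because the gender offset is consistent, and swapping the order on only one side flips the sign.

```python
import numpy as np

def cos(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

e_boy, e_girl = np.array([1.0, 1.0]), np.array([-1.0, 1.0])
e_brother, e_sister = np.array([1.0, -1.0]), np.array([-1.0, -1.0])

print(cos(e_boy - e_girl, e_brother - e_sister))   # ~1.0  -> holds
print(cos(e_boy - e_girl, e_sister - e_brother))   # ~-1.0 -> order matters
print(cos(e_boy - e_brother, e_girl - e_sister))   # ~1.0  -> holds
```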

Question 5 (Incorrect, 0 / 1 points)

Let $E$ be an embedding matrix, and let $e_{1234}$ be a one-hot vector corresponding to word 1234. Then to get the embedding of word 1234, why don't we call $E\,e_{1234}$ in Python?

It is computationally wasteful.

The correct formula is $E^T e_{1234}$.

This doesn’t handle unknown words (<UNK>).

None of the above: Calling the Python snippet as described above is fine.

This should not be selected 
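
A small numpy illustration of why the matrix-vector product is wasteful: multiplying $E$ by a one-hot vector performs on the order of 500 × 10000 multiply-adds (almost all against zeros) just to select one column, while indexing reads that column directly.

```python
import numpy as np

vocab_size, embed_dim = 10000, 500
E = np.random.randn(embed_dim, vocab_size)

e_1234 = np.zeros(vocab_size)
e_1234[1234] = 1.0

emb_slow = E @ e_1234     # wasteful: full matrix-vector product
emb_fast = E[:, 1234]     # efficient: direct column lookup

print(np.allclose(emb_slow, emb_fast))   # True: same vector, far less work
```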

Question 6 (Correct, 1 / 1 points)

When learning word embeddings, we create an artificial task of estimating $P(\text{target} \mid \text{context})$. It is okay if we do poorly on this artificial prediction task; the more important by-product of this task is that we learn a useful set of word embeddings.

True

Correct 

False

Question 7 (Correct, 1 / 1 points)

In the word2vec algorithm, you estimate $P(t \mid c)$, where $t$ is the target word and $c$ is a context word. How are $t$ and $c$ chosen from the training set? Pick the best answer.

c is the one word that comes immediately before t.

c is a sequence of several words immediately before t.

c and t are chosen to be nearby words.

Correct 

c is the sequence of all the words in the sentence before t.

Question 8 (Incorrect, 0 / 1 points)

Suppose you have a 10000 word vocabulary, and are learning 500-dimensional word embeddings. The word2vec model uses the following softmax function:

$$P(t \mid c) = \frac{e^{\theta_t^T e_c}}{\sum_{t'=1}^{10000} e^{\theta_{t'}^T e_c}}$$

Which of these statements are correct? Check all that apply.

$\theta_t$ and $e_c$ are both 500-dimensional vectors.

Correct 

$\theta_t$ and $e_c$ are both 10000-dimensional vectors.

Un-selected is correct 

$\theta_t$ and $e_c$ are both trained with an optimization algorithm such as Adam or gradient descent.

Correct 

After training, we should expect $\theta_t$ to be very close to $e_c$ when $t$ and $c$ are the same word.

This should not be selected 
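
A numeric sketch of the softmax above, using the shapes from the question (500-dimensional $\theta_t$ and $e_c$, 10000-word vocabulary); here Theta stacks every $\theta_{t'}$ as a row, and the max-subtraction is only for numerical stability:

```python
import numpy as np

vocab_size, embed_dim = 10000, 500
Theta = np.random.randn(vocab_size, embed_dim) * 0.01   # row t' is theta_{t'}
e_c = np.random.randn(embed_dim) * 0.01                 # embedding of context word c

logits = Theta @ e_c                        # theta_{t'}^T e_c for every candidate target
logits -= logits.max()                      # stabilize the exponentials
p = np.exp(logits) / np.exp(logits).sum()   # P(t | c) over all 10000 targets

t = 4321
print(p[t], p.sum())                        # probability of one target; distribution sums to 1
```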

Question 9 (Correct, 1 / 1 points)

Suppose you have a 10000 word vocabulary, and are learning 500-dimensional word embeddings. The GloVe model minimizes this objective:

$$\min \sum_{i=1}^{10{,}000} \sum_{j=1}^{10{,}000} f(X_{ij}) \left( \theta_i^T e_j + b_i + b'_j - \log X_{ij} \right)^2$$

Which of these statements are correct? Check all that apply.

$\theta_i$ and $e_j$ should be initialized to 0 at the beginning of training.

Un-selected is correct 

$\theta_i$ and $e_j$ should be initialized randomly at the beginning of training.

Correct 

$X_{ij}$ is the number of times word $i$ appears in the context of word $j$.

Correct 

The weighting function $f(\cdot)$ must satisfy $f(0) = 0$.

Correct 

The weighting function helps prevent learning only from extremely common word pairs, and requiring $f(0) = 0$ means that pairs which never co-occur ($X_{ij} = 0$, for which $\log X_{ij}$ is undefined) contribute nothing to the objective.
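
A sketch of one $(i, j)$ term of the objective together with a typical weighting function; the specific form of $f$ (with $x_{\max} = 100$ and $\alpha = 0.75$) follows the GloVe paper and is an assumption, not something given in the quiz:

```python
import numpy as np

def f(x, x_max=100.0, alpha=0.75):
    """GloVe-style weighting: f(0) = 0, capped at 1 for very frequent pairs."""
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0)

def glove_term(theta_i, e_j, b_i, b_j, X_ij):
    """One (i, j) term: f(X_ij) * (theta_i^T e_j + b_i + b_j - log X_ij)^2."""
    if X_ij == 0:
        return 0.0                      # f(0) = 0, so log(0) is never evaluated
    return f(X_ij) * (theta_i @ e_j + b_i + b_j - np.log(X_ij)) ** 2

theta_i, e_j = np.random.randn(500) * 0.01, np.random.randn(500) * 0.01
print(glove_term(theta_i, e_j, 0.0, 0.0, X_ij=42.0))   # weighted squared error
print(glove_term(theta_i, e_j, 0.0, 0.0, X_ij=0.0))    # 0.0
```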

Question 10 (Correct, 1 / 1 points)

You have trained word embeddings using a text dataset of $m_1$ words. You are considering using these word embeddings for a language task, for which you have a separate labeled dataset of $m_2$ words. Keeping in mind that using word embeddings is a form of transfer learning, under which of these circumstances would you expect the word embeddings to be helpful?

$m_1 \gg m_2$

Correct 

$m_1 \ll m_2$

 
