Natural Language Processing & Word Embeddings
- True/False: Suppose you learn a word embedding for a vocabulary of 20000 words. Then the embedding vectors could be 1000 dimensional, so as to capture the full range of variation and meaning in those words.
- False
- True
Explanation: The dimension of word vectors is usually smaller than the size of the vocabulary; the most common sizes for word vectors range between 50 and 1000.
- True/False: t-SNE is a linear transformation that allows us to solve analogies on word vectors.
- False
- True
Explanation: t-SNE is a non-linear dimensionality reduction technique (see the sketch below).
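To make this concrete, here is a minimal visualization sketch, assuming scikit-learn is available and using random noise as a placeholder for real word vectors: t-SNE is handy for plotting embeddings in 2-D, but because its mapping is non-linear it does not preserve the vector arithmetic used to solve analogies.

```python
# Minimal sketch: t-SNE for 2-D visualization of word vectors (placeholder data).
# Assumes scikit-learn; `embeddings` is random noise standing in for a real
# (vocab_size, d) matrix of learned word vectors.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 300))      # placeholder for real word vectors

tsne = TSNE(n_components=2, perplexity=30, random_state=0)
points_2d = tsne.fit_transform(embeddings)    # (200, 2) coordinates, for plotting only
print(points_2d.shape)
```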
- Suppose you download a pre-trained word embedding which has been trained on a huge corpus of text. You then use this word embedding to train an RNN for a language task of recognizing if someone is happy from a short snippet of text, using a small training set.
Then even if the word “ecstatic” does not appear in your small training set, your RNN might reasonably be expected to recognize “I’m ecstatic” as deserving a label $y = 1$.
- False
- True
Explanation: Word vectors empower your model with an incredible ability to generalize. The vector for “ecstatic” carries a positive/happy connotation, which will probably lead your model to classify the sentence as a “1” (improved generalization).
- Which of these equations do you think should hold for a good word embedding? (Check all that apply; an analogy sketch follows the options.)
- $e_{man} - e_{uncle} \approx e_{woman} - e_{aunt}$
- $e_{man} - e_{woman} \approx e_{uncle} - e_{aunt}$
- $e_{man} - e_{woman} \approx e_{aunt} - e_{uncle}$
- $e_{man} - e_{aunt} \approx e_{woman} - e_{uncle}$
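For the analogy question above, here is a minimal sketch with made-up toy 2-D vectors (not real embeddings): if $e_{man} - e_{woman} \approx e_{uncle} - e_{aunt}$ holds, then $e_{uncle} - e_{man} + e_{woman}$ should be closest to $e_{aunt}$, which is how analogies are solved with vector arithmetic and cosine similarity.

```python
# Minimal analogy sketch with toy 2-D "embeddings" (made-up values, not trained).
# If e_man - e_woman ≈ e_uncle - e_aunt, then e_uncle - e_man + e_woman ≈ e_aunt.
import numpy as np

emb = {
    "man":   np.array([1.0, 0.2]),
    "woman": np.array([1.0, 0.9]),
    "uncle": np.array([2.0, 0.2]),
    "aunt":  np.array([2.0, 0.9]),
    "king":  np.array([1.5, 0.1]),   # distractor word
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

query = emb["uncle"] - emb["man"] + emb["woman"]          # should land near e_aunt
best = max((w for w in emb if w not in {"uncle", "man", "woman"}),
           key=lambda w: cosine(query, emb[w]))
print(best)  # -> "aunt"
```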
- Let $A$ be an embedding matrix, and let $o_{4567}$ be a one-hot vector corresponding to word 4567. Then to get the embedding of word 4567, why don’t we call $A * o_{4567}$ in Python?
- It is computationally wasteful.
- The correct formula is $A^T * o_{4567}$
- None of the answers are correct: calling the Python snippet as described above is fine.
- This doesn’t handle unknown words.
Explanation: Multiplying the full embedding matrix by a one-hot vector is extremely inefficient; in practice you simply look up the corresponding column of $A$ (see the sketch below).
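As a sketch of the point (NumPy, with made-up sizes), multiplying by a one-hot vector performs a full matrix-vector product whose terms are almost all zero, while direct indexing retrieves the same embedding:

```python
# Minimal sketch (toy sizes): embedding lookup via a one-hot matrix-vector
# product vs. a direct column read. Both give the same vector; the product
# wastes O(emb_dim * vocab_size) multiply-adds on zeros.
import numpy as np

vocab_size, emb_dim = 10000, 300
A = np.random.randn(emb_dim, vocab_size)   # embedding matrix: one column per word

o_4567 = np.zeros(vocab_size)
o_4567[4567] = 1.0

e_slow = A @ o_4567      # full matrix-vector product
e_fast = A[:, 4567]      # direct lookup of the same column

assert np.allclose(e_slow, e_fast)
```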
- When learning word embeddings, words are automatically generated along with the surrounding words.
- True
- False
Explanation: We pick a given word and try to predict its surrounding words, or vice versa.
- In the word2vec algorithm, you estimate $P(t \mid c)$, where $t$ is the target word and $c$ is a context word. How are $t$ and $c$ chosen from the training set? Pick the best answer. (A sampling sketch follows the options.)
- $c$ is the sequence of all the words in the sentence before $t$
- $c$ and $t$ are chosen to be nearby words.
- $c$ is a sequence of several words immediately before $t$
- $c$ is the one word that comes immediately before $t$
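The following is a minimal sketch of the “nearby words” sampling described in the question above; the sentence and the ±2 window are made-up illustrations of how a (context, target) pair might be drawn.

```python
# Minimal sketch of skip-gram style (context, target) sampling: pick a context
# word, then pick a target uniformly from a small window around it. The
# sentence and window size are made-up for illustration.
import random

sentence = "the quick brown fox jumps over the lazy dog".split()
window = 2

def sample_pair(words, window):
    c_idx = random.randrange(len(words))                  # position of the context word
    offsets = [o for o in range(-window, window + 1)
               if o != 0 and 0 <= c_idx + o < len(words)]
    t_idx = c_idx + random.choice(offsets)                # a nearby target word
    return words[c_idx], words[t_idx]

random.seed(0)
print([sample_pair(sentence, window) for _ in range(3)])
```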
- Suppose you have a 10000 word vocabulary, and are learning 500-dimensional word embeddings. The word2vec model uses the following softmax function (a numerical sketch follows the options):
$$P(t \mid c) = \frac{e^{\theta_t^{T} e_c}}{\sum_{t'=1}^{10000} e^{\theta_{t'}^{T} e_c}}$$
Which of these statements are correct? Check all that apply.
- $\theta_t$ and $e_c$ are both trained with an optimization algorithm such as Adam or gradient descent.
- $\theta_t$ and $e_c$ are both 500 dimensional vectors.
- After training, we should expect $\theta_t$ to be very close to $e_c$ when $t$ and $c$ are the same word.
- $\theta_t$ and $e_c$ are both 10000 dimensional vectors.
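Here is a minimal numerical sketch of that softmax, with randomly initialized parameters standing in for trained values; it illustrates that $\theta_t$ and $e_c$ are both 500-dimensional and that $P(t \mid c)$ is a softmax over $\theta_t^T e_c$ across the 10000-word vocabulary.

```python
# Minimal sketch of the word2vec softmax P(t|c) = exp(theta_t . e_c) / sum_t' exp(theta_t' . e_c).
# Theta and E are randomly initialized stand-ins; in training both are updated
# with gradient descent or Adam.
import numpy as np

vocab_size, emb_dim = 10000, 500
rng = np.random.default_rng(0)
Theta = rng.normal(scale=0.01, size=(vocab_size, emb_dim))   # one theta_t per target word
E = rng.normal(scale=0.01, size=(vocab_size, emb_dim))       # one e_c per context word

c = 4567                                  # index of the context word
logits = Theta @ E[c]                     # theta_t^T e_c for every target word t
probs = np.exp(logits - logits.max())     # subtract max for numerical stability
probs /= probs.sum()                      # P(t | c) over the whole vocabulary
print(probs.shape, probs.sum())           # (10000,) 1.0
```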
- Suppose you have a 10000 word vocabulary, and are learning 500-dimensional word embeddings. The GloVe model minimizes this objective (a numerical sketch follows the options):
$$\min \sum_{i=1}^{10000} \sum_{j=1}^{10000} f(X_{ij}) \left(\theta_i^{T} e_j + b_i + b'_j - \log X_{ij}\right)^2$$
Which of these statements are correct? Check all that apply.
- $\theta_i$ and $e_j$ should be initialized to 0 at the beginning of training.
- $\theta_i$ and $e_j$ should be initialized randomly at the beginning of training.
- $X_{ij}$ is the number of times word $j$ appears in the context of word $i$.
- Theoretically, the weighting function $f(\cdot)$ must satisfy $f(0) = 0$.
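Below is a minimal sketch of evaluating that objective once (NumPy, toy sizes), using the commonly cited weighting $f(x) = \min((x/x_{max})^{0.75}, 1)$ as an assumed example; since $f(0) = 0$, pairs that never co-occur contribute nothing and the $\log X_{ij}$ term never has to be evaluated for them.

```python
# Minimal sketch of the GloVe objective
#   sum_ij f(X_ij) * (theta_i . e_j + b_i + b'_j - log X_ij)^2
# with a weighting f that satisfies f(0) = 0, so zero-count pairs drop out.
# Sizes and counts are toy values for illustration only.
import numpy as np

vocab_size, emb_dim = 50, 10
rng = np.random.default_rng(0)
X = rng.poisson(0.5, size=(vocab_size, vocab_size)).astype(float)  # toy co-occurrence counts
Theta = rng.normal(scale=0.1, size=(vocab_size, emb_dim))
E = rng.normal(scale=0.1, size=(vocab_size, emb_dim))
b = np.zeros(vocab_size)         # b_i
b_prime = np.zeros(vocab_size)   # b'_j

def f(x, x_max=100.0, alpha=0.75):
    return np.minimum((x / x_max) ** alpha, 1.0)   # f(0) = 0

mask = X > 0
log_X = np.where(mask, np.log(np.where(mask, X, 1.0)), 0.0)  # safe log; unused where X_ij = 0
residual = Theta @ E.T + b[:, None] + b_prime[None, :] - log_X
loss = np.sum(f(X) * residual**2)   # f(0) = 0 removes the X_ij = 0 terms
print(loss)
```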
- You have trained word embeddings using a text dataset of $t_1$ words. You are considering using these word embeddings for a language task, for which you have a separate labeled dataset of $t_2$ words. Keeping in mind that using word embeddings is a form of transfer learning, under which of these circumstances would you expect the word embeddings to be helpful?
- When $t_1$ is equal to $t_2$
- When $t_1$ is smaller than $t_2$
- When $t_1$ is larger than $t_2$
Explanation: Transferring embeddings is most helpful for a new task with a smaller training set, i.e., when $t_1$ is larger than $t_2$.