Deep Learning Course 5, Week 2 Quiz Notes: Natural Language Processing & Word Embeddings

This post reviews word vector representations in natural language processing: why the embedding dimension is usually much smaller than the vocabulary size, and the role of t-SNE as a non-linear dimensionality reduction. The use of pre-trained word vectors in a sentiment-analysis task illustrates how they improve generalization. It also discusses the mathematical properties of word vectors, such as the linear representation of gender relationships, and closes with when pre-trained embeddings are effective for datasets of different sizes.

Natural Language Processing & Word Embeddings

  1. True/False: Suppose you learn a word embedding for a vocabulary of 20000 words. Then the embedding vectors could be 1000 dimensional, so as to capture the full range of variation and meaning in those words.
  • False
  • True

Explanation: The dimension of word vectors is usually much smaller than the size of the vocabulary. The most common sizes for word vectors range between 50 and 1000.
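
As a rough sketch of what this looks like in practice (sizes are assumptions for illustration, not part of the quiz): an embedding matrix stores one low-dimensional vector per vocabulary word, so looking up a word is just selecting one row.

```python
import numpy as np

# Assumed sizes for illustration: a 20000-word vocabulary with
# 300-dimensional embeddings (a typical choice in the 50-1000 range).
vocab_size, embed_dim = 20000, 300

rng = np.random.default_rng(0)
E = rng.normal(size=(vocab_size, embed_dim))  # one row per vocabulary word

word_index = 4567               # hypothetical index of some word
e_word = E[word_index]          # its embedding: a 300-dimensional vector
print(E.shape, e_word.shape)    # (20000, 300) (300,)
```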

  2. True/False: t-SNE is a linear transformation that allows us to solve analogies on word vectors.
  • False
  • True

Explanation: t-SNE is a non-linear dimensionality reduction technique, so it does not preserve the linear vector-arithmetic structure used to solve analogies.
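
A minimal sketch of using t-SNE (here via scikit-learn, with random toy vectors standing in for real word embeddings) to project embeddings down to 2-D for visualization; because the mapping is non-linear, analogy arithmetic is not preserved in the 2-D coordinates.

```python
import numpy as np
from sklearn.manifold import TSNE

# Toy data standing in for word embeddings: 8 "words" in 300 dimensions.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(8, 300))

# Non-linear projection down to 2-D; useful for plotting clusters of
# similar words, but not for solving analogies with vector arithmetic.
coords_2d = TSNE(n_components=2, perplexity=3, random_state=0).fit_transform(embeddings)
print(coords_2d.shape)  # (8, 2)
```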

  3. Suppose you download a pre-trained word embedding which has been trained on a huge corpus of text. You then use this word embedding to train an RNN for a language task of recognizing if someone is happy from a short snippet of text, using a small training set.
    [Figure: a few example training sentences labeled y = 1 (happy) or y = 0 (not happy)]
    Then even if the word “ecstatic” does not appear in your small training set, your RNN might reasonably be expected to recognize “I’m ecstatic” as deserving a label $y = 1$.
  • False
  • True

Explanation: Word vectors give your model a strong ability to generalize. The vector for “ecstatic” carries a positive/happy connotation, which will probably lead your model to classify the sentence as a “1”.
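
A toy illustration of why this generalization works: if the pre-trained vector for “ecstatic” lies close to that of “happy”, the classifier treats the two words similarly. The 4-dimensional vectors below are made up purely for illustration; real pre-trained embeddings (GloVe, word2vec, etc.) would be much larger.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Made-up toy vectors; real embeddings would be 50-1000 dimensional.
e_happy    = np.array([0.9, 0.1, 0.0, 0.2])
e_ecstatic = np.array([0.8, 0.2, 0.1, 0.3])   # near "happy" in embedding space
e_table    = np.array([0.0, 0.9, 0.8, 0.1])   # an unrelated word

print(cosine(e_happy, e_ecstatic))  # high similarity -> similar treatment by the model
print(cosine(e_happy, e_table))     # noticeably lower similarity
```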

  4. Which of these equations do you think should hold for a good word embedding? (Check all that apply; a toy numerical check follows the options below.)
  • $e_{man} - e_{uncle} \approx e_{woman} - e_{aunt}$
  • $e_{man} - e_{woman} \approx e_{uncle} - e_{aunt}$
  • $e_{man} - e_{woman} \approx e_{aunt} - e_{uncle}$
  • $e_{man} - e_{aunt} \approx e_{woman} - e_{uncle}$
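
A toy numerical check of the analogy relationships above, using made-up 3-dimensional vectors in which the first axis roughly plays the role of a "gender" direction (all values are assumptions for illustration only):

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Made-up vectors; the first coordinate stands in for a "gender" direction.
e_man,   e_woman = np.array([ 1.0, 0.2, 0.1]), np.array([-1.0, 0.2, 0.1])
e_uncle, e_aunt  = np.array([ 1.0, 0.8, 0.5]), np.array([-1.0, 0.8, 0.5])

# The two relationships that should hold give difference vectors pointing
# in (almost) the same direction ...
print(cosine(e_man - e_uncle, e_woman - e_aunt))   # ~ 1.0
print(cosine(e_man - e_woman, e_uncle - e_aunt))   # ~ 1.0
# ... while a mismatched pairing points the opposite way.
print(cosine(e_man - e_woman, e_aunt - e_uncle))   # ~ -1.0
```
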
  5. Let $A$ be an embedding matrix, and let $o_{4567}$ be a one-hot vector corresponding to word 4567. Then to get the embedding of word 4567, why don’t we call $A * o_{4567}$ in Python?
  • It is computationally wasteful.
  • The correct formula is $A^T * o_{4567}$
  • None of the answers are correct: calling the Python snippet as described above is fine.
  • This doesn’t handle unknown words (<UNK>).

Explanation: The multiplication is extremely inefficient because $o_{4567}$ is almost all zeros; selecting the corresponding column of $A$ directly gives the same embedding without the wasted work.
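
A minimal sketch of the cost difference. The sizes and the column-per-word layout are assumptions for illustration, following the course's convention that the embedding of word $j$ is obtained as the matrix times the one-hot vector $o_j$.

```python
import numpy as np

vocab_size, embed_dim = 10000, 500             # assumed sizes for illustration
rng = np.random.default_rng(0)
A = rng.normal(size=(embed_dim, vocab_size))   # one column per vocabulary word

o_4567 = np.zeros(vocab_size)
o_4567[4567] = 1.0                             # one-hot vector for word 4567

# Mathematically the same embedding, but with very different cost:
via_matmul = A @ o_4567    # multiplies/sums embed_dim * vocab_size numbers, almost all zero
via_lookup = A[:, 4567]    # simply reads one 500-dimensional column

print(np.allclose(via_matmul, via_lookup))     # True
```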

  6. True/False: When learning word embeddings, words are automatically generated along with the surrounding words.
  • True
  • False

Explanation: We pick a given word and try to predict its surrounding words, or vice versa.

  7. In the word2vec algorithm, you estimate $P(t \mid c)$, where $t$ is the target word and $c$ is a context word. How are $t$ and $c$ chosen from the training set? Pick the best answer. (A small sampling sketch follows the options below.)
  • $c$ is the sequence of all the words in the sentence before $t$
  • $c$ and $t$ are chosen to be nearby words.
  • $c$ is a sequence of several words immediately before $t$
  • $c$ is the one word that comes immediately before $t$
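
A small sketch of the skip-gram style (context, target) sampling referred to above: pick a context word, then choose the target uniformly among nearby words within a small window. The example sentence, window size, and random seed are assumptions for illustration.

```python
import random

sentence = "I want a glass of orange juice to go along with my cereal".split()
window = 4          # consider words up to 4 positions before or after the context word
random.seed(0)

pairs = []
for i, context in enumerate(sentence):
    lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
    candidates = [j for j in range(lo, hi) if j != i]   # nearby positions only
    target = sentence[random.choice(candidates)]        # target chosen near the context
    pairs.append((context, target))

print(pairs[:5])    # first few (context, target) pairs
```
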
  8. Suppose you have a 10000 word vocabulary, and are learning 500-dimensional word embeddings. The word2vec model uses the following softmax function:
    $$P(t \mid c) = \frac{e^{\theta_t^T e_c}}{\sum_{t'=1}^{10000} e^{\theta_{t'}^T e_c}}$$
    Which of these statements are correct? Check all that apply. (A numerical sketch of this softmax follows the options below.)
  • $\theta_t$ and $e_c$ are both trained with an optimization algorithm such as Adam or gradient descent.
  • $\theta_t$ and $e_c$ are both 500 dimensional vectors.
  • After training, we should expect $\theta_t$ to be very close to $e_c$ when $t$ and $c$ are the same word.
  • $\theta_t$ and $e_c$ are both 10000 dimensional vectors.
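
A small numerical sketch of this softmax with random parameters, purely for illustration; note that $\theta_t$ and $e_c$ are both 500-dimensional and that the normalization runs over all 10000 words.

```python
import numpy as np

vocab_size, embed_dim = 10000, 500            # sizes from the question
rng = np.random.default_rng(0)

theta = rng.normal(size=(vocab_size, embed_dim)) * 0.01   # one theta_t per target word
E     = rng.normal(size=(vocab_size, embed_dim)) * 0.01   # one e_c per context word

c, t = 123, 4567                              # arbitrary context / target indices
e_c = E[c]                                    # 500-dimensional, same size as theta_t

logits = theta @ e_c                          # theta_{t'}^T e_c for every word t'
p = np.exp(logits) / np.sum(np.exp(logits))   # softmax over the whole 10000-word vocabulary

print(p[t], p.sum())                          # P(t|c) and a total probability of 1.0
```
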
  9. Suppose you have a 10000 word vocabulary, and are learning 500-dimensional word embeddings. The GloVe model minimizes this objective:
    $$\min \sum_{i=1}^{10000} \sum_{j=1}^{10000} f(X_{ij}) \left( \theta_i^T e_j + b_i + b_j' - \log X_{ij} \right)^2$$
    Which of these statements are correct? Check all that apply. (A numerical sketch of this objective follows the options below.)
  • $\theta_i$ and $e_j$ should be initialized to 0 at the beginning of training.
  • $\theta_i$ and $e_j$ should be initialized randomly at the beginning of training.
  • $X_{ij}$ is the number of times word $j$ appears in the context of word $i$.
  • Theoretically, the weighting function $f(\cdot)$ must satisfy $f(0) = 0$
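
A small numerical sketch of this objective with tiny, made-up sizes and counts (the question uses 10000 words and 500 dimensions). The weighting function below is a GloVe-style choice satisfying $f(0) = 0$, so pairs that never co-occur contribute nothing even though $\log X_{ij}$ would be undefined for them.

```python
import numpy as np

# Tiny sizes purely for illustration; the question uses 10000 words, 500 dims.
vocab_size, embed_dim = 50, 8
rng = np.random.default_rng(0)

theta = rng.normal(size=(vocab_size, embed_dim)) * 0.01   # initialized randomly, not to zero
e     = rng.normal(size=(vocab_size, embed_dim)) * 0.01
b, b_prime = np.zeros(vocab_size), np.zeros(vocab_size)

# X[i, j]: number of times word j appears in the context of word i (toy counts).
X = rng.integers(0, 5, size=(vocab_size, vocab_size)).astype(float)

def f(x):
    # GloVe-style weighting with f(0) = 0, so zero-count pairs drop out.
    return np.minimum(x / 100.0, 1.0) ** 0.75

log_X = np.log(np.where(X > 0, X, 1.0))      # placeholder log(1) = 0 where X_ij = 0
diff = theta @ e.T + b[:, None] + b_prime[None, :] - log_X
objective = np.sum(f(X) * diff ** 2)
print(objective)
```
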
  10. You have trained word embeddings using a text dataset of $t_1$ words. You are considering using these word embeddings for a language task, for which you have a separate labeled dataset of $t_2$ words. Keeping in mind that using word embeddings is a form of transfer learning, under which of these circumstances would you expect the word embeddings to be helpful?
  • When $t_1$ is equal to $t_2$
  • When $t_1$ is smaller than $t_2$
  • When $t_1$ is larger than $t_2$

Explanation: Embeddings transfer best to new tasks with smaller training sets, i.e. when $t_1$ is much larger than $t_2$.
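
A minimal sketch of this kind of transfer (the vocabulary, sentences, and sizes below are all made up for illustration): the embedding matrix plays the role of vectors pre-trained on the large corpus and stays frozen, while only a small classifier is fit on the small labeled dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical vocabulary and "pre-trained" vectors standing in for embeddings
# learned on a large corpus (t1 words); they stay frozen below.
vocab = {"i": 0, "am": 1, "happy": 2, "sad": 3, "today": 4}
rng = np.random.default_rng(0)
E_pretrained = rng.normal(size=(len(vocab), 50))

def featurize(sentence):
    # Represent a sentence by the average of its pre-trained word vectors.
    idx = [vocab[w] for w in sentence.lower().split() if w in vocab]
    return E_pretrained[idx].mean(axis=0)

# Tiny labeled dataset for the downstream task (t2 words, with t2 << t1).
X_small = np.stack([featurize("I am happy today"), featurize("I am sad today")])
y_small = np.array([1, 0])

clf = LogisticRegression().fit(X_small, y_small)   # only the classifier is trained
print(clf.predict(np.stack([featurize("I am happy")])))
```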
