Train a model with a 50-dimensional embedding space, a 200-dimensional hidden layer, and the default settings for all other hyperparameters. What is the average validation set cross entropy reported by the training program after 10 epochs? Please provide a numeric answer (three decimal places). [4 points]
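For reference, "average cross entropy" here is the mean of the negative log-probability the model assigns to the correct next word. A minimal sketch (the probabilities below are made up for illustration; the assignment's training program computes this internally):

```python
import math

# Average cross entropy: mean of -log p(correct word) over the examples.
def average_cross_entropy(probs_of_targets):
    return sum(-math.log(p) for p in probs_of_targets) / len(probs_of_targets)

# Made-up predicted probabilities for the correct next word:
print(round(average_cross_entropy([0.2, 0.05, 0.5]), 3))  # → 1.766
```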
3. Question 3
Train a model for 10 epochs with a 50-dimensional embedding space, a 200-dimensional hidden layer, a learning rate of 0.0001, and the default settings for all other hyperparameters. What do you observe? [3 points]
Cross entropy on the training and validation sets decreases very rapidly.
Cross entropy on the validation set fluctuates wildly and eventually diverges.
Cross entropy on the training set fluctuates wildly and eventually diverges.
Cross entropy on the training and validation sets decreases very slowly.
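The effect of the learning rate can be seen on a toy problem. A sketch (gradient descent on f(w) = w², not the assignment's training program): with a tiny learning rate the loss decreases very slowly, while a huge one makes it diverge.

```python
# Gradient descent on f(w) = w^2; the gradient is 2w.
def run(lr, steps=10, w=1.0):
    for _ in range(steps):
        w -= lr * 2 * w
    return w * w  # final loss

print(run(0.0001))  # ~0.996: barely decreased after 10 steps
print(run(0.1))     # ~0.012: converging
print(run(100.0))   # astronomically large: diverged
```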
4. Question 4
If all weights and biases in this network were set to zero and no training is performed, what will be the average cross entropy on the training set? Please provide a numeric answer (three decimal places). [3 points]
The answer you gave is not a number.
If all weights and biases are zero, the output distribution will be uniform for all inputs. The cross entropy will then be log_e(n), where n is the number of words in the vocabulary. In this case it will be log_e(250).
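The quoted value can be checked directly; a quick sketch (standard math, nothing specific to the assignment's code):

```python
import math

# All-zero weights give zero logits, so the softmax output is uniform over
# the 250-word vocabulary; cross entropy is then -log(1/250) = log_e(250).
vocab_size = 250
cross_entropy = -math.log(1.0 / vocab_size)
print(round(cross_entropy, 3))  # → 5.521
```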
5. Question 5
Train three models, each with a 50-dimensional embedding space and a 200-dimensional hidden layer.
- Model A: Learning rate = 0.001
- Model B: Learning rate = 0.1
- Model C: Learning rate = 10.0
Use the default settings for all other hyperparameters. Which model gives the lowest training set cross entropy after 1 epoch? [3 points]
Model C
Model A
Model B (incorrect)
6. Question 6
In the models trained in Question 5, which one gives the lowest training set cross entropy after 10 epochs? [2 points]
Model B
Model A
Model C (tried, incorrect)
7. Question 7
Train each of the following models:
- Model A: 5-dimensional embedding, 100-dimensional hidden layer
- Model B: 50-dimensional embedding, 10-dimensional hidden layer
- Model C: 50-dimensional embedding, 200-dimensional hidden layer
- Model D: 100-dimensional embedding, 5-dimensional hidden layer
Use default values for all other hyperparameters.
Which model gives the best training set cross entropy after 10 epochs of training? [3 points]
Model D
Model C
Model A
Model B
8. Question 8
In the models trained in Question 7, which one gives the best validation set cross entropy after 10 epochs of training? [2 points]
Model D (tried, incorrect)
Model A (tried, incorrect)
Model B
Model C
9. Question 9
Train three models, each with a 50-dimensional embedding space and a 200-dimensional hidden layer.
- Model A: Momentum = 0.0
- Model B: Momentum = 0.5
- Model C: Momentum = 0.9
Use the default settings for all other hyperparameters. Which model gives the lowest validation set cross entropy after 5 epochs? [3 points]
Model C (tried, correct!)
Model B
Model A
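The momentum values the question compares enter through the momentum update rule. A sketch of the classical form (assumed here; the assignment's training program may differ in details):

```python
# Classical momentum: v <- momentum * v - lr * grad; w <- w + v.
def momentum_step(w, v, grad, lr=0.1, momentum=0.9):
    v = momentum * v - lr * grad
    return w + v, v

# Toy run on f(w) = w^2 (gradient 2w): velocity accumulates along a
# consistent gradient direction, so early steps grow in size.
w, v = 1.0, 0.0
for _ in range(5):
    w, v = momentum_step(w, v, 2 * w)
print(w)
```

Higher momentum lets the optimizer make faster progress along consistent gradient directions, which is why it can reach a lower cross entropy within a few epochs.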
10. Question 10
Train a model with a 50-dimensional embedding layer and a 200-dimensional hidden layer for 10 epochs. Use the default values for all other hyperparameters.
Which words are among the 10 closest words to the word 'could'? [2 points]
'can'
'some'
'the'
'should'
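"Closest words" here means nearest neighbours in the learned embedding space. A hypothetical sketch, using Euclidean distance and a made-up embedding table (the real assignment reads the vectors from the trained model):

```python
import math

# Rank words by Euclidean distance to the target word's embedding vector.
def nearest_words(word, embeddings, k=10):
    target = embeddings[word]
    return sorted((w for w in embeddings if w != word),
                  key=lambda w: math.dist(target, embeddings[w]))[:k]

# Made-up 2-D embeddings for illustration only:
emb = {'could': [1.0, 0.0], 'can': [0.9, 0.1],
       'should': [0.8, 0.2], 'the': [-1.0, 5.0]}
print(nearest_words('could', emb, k=2))  # → ['can', 'should']
```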
11. Question 11
In the model trained in Question 10, why is the word 'percent' close to 'dr.' even though they have very different contexts and are not expected to be close in word embedding space? [2 points]
The model is not capable of separating them in embedding space, even if it were given a much larger training set.
Both words occur very rarely, so their embedding weights get updated very few times and remain close to their initialization. (tried, correct!)
We trained the model with too large a learning rate.
Both words occur too frequently.
12. Question 12
In the model trained in Question 10, why is 'he' close to 'she' even though they refer to completely different genders? [2 points]
Both words occur very rarely, so their embedding weights get updated very few times and remain close to their initialization.
They differ by only one letter.
The model does not care about gender. It puts them close because if 'he' occurs in a 4-gram, it is very likely that substituting it by 'she' will also make a sensible 4-gram.
They often occur close by in sentences.
13. Question 13
In conclusion, what kind of words does the model put close to each other in embedding space? Choose the most appropriate answer. [3 points]
Words that belong to similar topics. A topic is a semantic categorization (like 'sports', 'art', 'business', 'computers' etc).
Words that can be substituted for one another and still make up a sensible 4-gram.
Words that occur close to each other (within three words to the left or right) in many sentences.
Words that occur close in an alphabetical sort.
3. Question 3
Train a model for 10 epochs with a 50-dimensional embedding space, a 200-dimensional hidden layer, a learning rate of 100.0, and the default settings for all other hyperparameters. What do you observe? [3 points]
Cross entropy on the training set fluctuates around a large value.
Cross entropy on the training set decreases smoothly but fluctuates around a large value on the validation set.
Cross entropy on the validation set fluctuates around a large value.
Cross entropy on the training set fluctuates wildly and eventually diverges.