Neural Networks for Machine Learning: Lecture 10 Quiz

Article link: https://blog.csdn.net/GarfieldEr007/article/details/50598306

Question 1

When learning a mixture of experts, it is desirable that each expert specializes in a different area of the input space. But then at test time, how will we know which expert to use?

We see which training case the test case is closest to (in input space) and use the model that was used for that training case.

We also learn a "manager" model that sees the input and assigns probabilities for picking each expert. We then get predictions from all the experts and take their weighted average using the probabilities.

We also learn a "manager" model that sees the input and assigns probabilities for picking each expert. We then choose the expert that has the highest probability and use it to make a prediction.

We uniformly average the predictions of each expert.
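The two "manager" options above differ only in how the manager's probabilities are used. The following is a minimal sketch of both combination rules, not the course's implementation; the function names, the softmax manager, and all numbers are assumptions made for illustration.

```python
import math

def softmax(scores):
    """Turn raw manager scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def combine_weighted(expert_preds, gate_probs):
    """Weighted average of all expert predictions, using the manager's probabilities."""
    return sum(p * y for p, y in zip(gate_probs, expert_preds))

def combine_argmax(expert_preds, gate_probs):
    """Use only the expert that the manager considers most likely."""
    best = max(range(len(gate_probs)), key=lambda i: gate_probs[i])
    return expert_preds[best]

# Hypothetical values: three experts' predictions for one test input,
# and the manager's raw scores for that same input.
expert_preds = [1.0, 3.0, 10.0]
gate_probs = softmax([2.0, 1.0, 0.0])

weighted = combine_weighted(expert_preds, gate_probs)  # blends all three experts
hard = combine_argmax(expert_preds, gate_probs)        # picks the first expert only
```

In this toy setup the hard choice returns the first expert's prediction unchanged, while the weighted average is pulled toward the other experts in proportion to the manager's probabilities.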

Question 2

Which data set cannot be classified perfectly with a linear classifier but can be classified perfectly with a mixture of two experts where each expert is a linear classifier? Assume that the manager is also linear, i.e., it can decide which expert to use based on a linear criterion only. (In other words, the manager has a linear function f and, given any input case x, it must decide to apply expert 1 with probability 1 if f(x) > 0 and expert 2 with probability 1 if f(x) ≤ 0.)
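As a rough illustration of the hard-gating rule described in this question, the sketch below applies a linear manager and two linear experts to the four XOR points, a standard example that no single linear classifier separates. The data set, the weights, and the helper names are all assumptions chosen for illustration; they are not taken from the quiz's answer options.

```python
def linear(w, b, x):
    """Value of the linear function w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def mixture_predict(x, manager, expert1, expert2):
    """Hard gating: apply expert 1 if f(x) > 0, otherwise expert 2."""
    w, b = expert1 if linear(*manager, x) > 0 else expert2
    return 1 if linear(w, b, x) > 0 else 0

# XOR-style toy data: (input, label). Hypothetical, for illustration only.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

manager = ((1.0, 0.0), -0.5)   # f(x) = x1 - 0.5
expert1 = ((0.0, -1.0), 0.5)   # used when x1 > 0.5: class 1 if x2 < 0.5
expert2 = ((0.0, 1.0), -0.5)   # used when x1 <= 0.5: class 1 if x2 > 0.5

predictions = [mixture_predict(x, manager, expert1, expert2) for x, _ in data]
labels = [y for _, y in data]
```

Here the manager splits the plane with a vertical line, and each expert only has to separate the two points on its own side, which a linear boundary can do.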

Question 3

Andy has a dataset of points that he wishes to classify. This set is shown below.

Being knowledgeable about bagging, he samples this data set and creates 3 separate ones. He then uses a neural net to learn separate classifiers on each data set. The learned classifier boundaries are shown below. Note that each data set is a subset of the complete dataset.

Which of the following statements is true?

The learned models are different ("high-variance") and do well on their training sets, so model averaging is likely to help in generalization.

All the learned models make a lot of errors ("high-bias"), so model averaging is likely to help in generalization.

All the learned models make a lot of errors ("high-bias"), so model averaging is unlikely to help in generalization.

The learned models are different ("high-variance") and do well on their training sets, so model averaging is unlikely to help in generalization.
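To make the terms in these options concrete, here is a small sketch of bagging with a deliberately trivial "model" (the mean of a resampled data set) standing in for a neural net. It checks a general identity for squared error: the average error of the individual models equals the error of the averaged prediction plus the spread of the predictions around that average. The data, the seed, and the function names are assumptions for illustration only.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement (one bagging replicate)."""
    return [rng.choice(data) for _ in data]

def mean(values):
    return sum(values) / len(values)

rng = random.Random(0)
target = 2.0
# Hypothetical noisy observations of the target value.
data = [target + rng.gauss(0.0, 1.0) for _ in range(20)]

# Each toy "model" is just the mean of its own bootstrap replicate.
predictions = [mean(bootstrap_sample(data, rng)) for _ in range(3)]
averaged = mean(predictions)

avg_individual_error = mean([(p - target) ** 2 for p in predictions])
error_of_average = (averaged - target) ** 2
spread = mean([(p - averaged) ** 2 for p in predictions])
# Identity: avg_individual_error == error_of_average + spread (up to rounding)
```

Because the spread term is never negative, the averaged prediction is never worse in squared error than the individual models are on average, and the gap grows with how much the models disagree.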

Question 4

In Bayesian learning, we learn a probability distribution over parameters of the model. Then at test time, how should this distribution be used to get predictions with the highest possible accuracy?

Sample a lot of parameters using some sampling procedure (such as MCMC) and average the parameters. Then use the averaged parameter setting to obtain a prediction.

Sample the distribution once to get a parameter setting and use it to make a prediction.

Sample a lot of parameters using some sampling procedure (such as MCMC) and average the predictions obtained by using each parameter setting separately.

Pick the parameter setting that has maximum probability and use it to make a prediction.
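Two of the options above sound similar but compute different things: averaging the sampled parameters and then predicting, versus predicting with each sample and then averaging. The toy sketch below shows that the two can disagree when the model is nonlinear in its parameters. The model `predict`, the hand-written list standing in for MCMC samples, and the input value are all hypothetical.

```python
def predict(w, x):
    """Toy model whose output is nonlinear in the parameter w."""
    return w * w * x

# Hypothetical posterior samples (a stand-in for the output of an MCMC run)
# drawn from a distribution with two symmetric modes.
posterior_samples = [-2.0, 2.0, -2.0, 2.0]
x = 1.5

# Option A: average the parameters first, then predict once.
mean_w = sum(posterior_samples) / len(posterior_samples)
prediction_from_mean_w = predict(mean_w, x)

# Option B: predict with every sample, then average the predictions.
mean_prediction = sum(predict(w, x) for w in posterior_samples) / len(posterior_samples)
```

In this example the averaged parameter is 0, a value that none of the samples is near, so the two procedures give 0.0 and 6.0 respectively.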

Question 5

Amy is trying different MCMC samplers to sample from a probability distribution. Each option shows a few samples obtained by running a sampler. It is known that the distribution is multimodal and peaked. Which of the following is the best sampler?

Question 6

Brian wants to learn a classifier to predict if a movie is "good" or "bad" given its review. He has a lot of computational power and wants to use a very powerful model. However, he only has a small dataset of labelled movie reviews. He tried learning a massive neural net and found that it achieves zero training error very easily, but its test performance is much worse. He then trained a small neural net and found that it does not get zero training error, but still the test performance is no better than what the big model got. Neither a big nor a small model works for him! He is completely disappointed with neural nets now. He is willing to spend as much computational power as needed during training and testing. What suggestion can you give to help him?

Look for a better optimization algorithm to help the large neural net.

Train lots of small neural nets of the same architecture on the whole data and average their predictions.

Train the big neural net with dropout in the hidden units.

Train many different models - neural nets, SVMs, decision trees - and average their predictions.
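For readers unfamiliar with the dropout option above, the following is a minimal sketch of dropout applied to one layer of hidden activations. It assumes the "inverted" variant, which rescales the surviving units during training instead of shrinking the weights at test time; the function name, the drop probability, and the activations are illustrative and not taken from the course.

```python
import random

def dropout(hidden, drop_prob, rng, training=True):
    """Inverted dropout (an assumed variant): during training, zero each unit
    with probability drop_prob and rescale the survivors by 1 / (1 - drop_prob);
    at test time, return the activations unchanged."""
    if not training:
        return list(hidden)
    keep = 1.0 - drop_prob
    return [h / keep if rng.random() < keep else 0.0 for h in hidden]

rng = random.Random(0)
hidden = [0.5, 1.0, 1.5, 2.0]  # hypothetical hidden-unit activations

train_out = dropout(hidden, 0.5, rng, training=True)   # random units zeroed
test_out = dropout(hidden, 0.5, rng, training=False)   # all units kept
```

Each training pass samples a different mask, so the large net behaves like many thinned networks that share weights.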