Neural Networks for Machine Learning: Lecture 6 Quiz (Overview of mini-batch gradient descent) - CSDN Blog

Article link: https://blog.csdn.net/GarfieldEr007/article/details/50598147


Question 1

Suppose w is the weight on some connection in a neural network. The network is trained using gradient descent until learning converges. However, the dataset consists of two mini-batches that differ somewhat from each other. As usual, we alternate between the mini-batches for our gradient calculations, and this has implications for what happens after convergence. We plot the change in w as training progresses. Which of the following scenarios shows that convergence has occurred? Notice that we are plotting the change in w, not w itself.
Note that in the plots below, each iteration refers to a single step of steepest descent on a single mini-batch.
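To see why the change in w need not settle at zero when two different mini-batches are alternated, here is a minimal sketch under assumed values that do not come from the quiz: a single linear unit y = w * x with squared error, two hypothetical one-case mini-batches that share the input x = 1 but have targets 1 and 2, and a learning rate of 0.5.

```python
def simulate(steps=40, lr=0.5):
    """Alternate steepest-descent steps on two mini-batches for a single weight."""
    # Hypothetical toy problem: linear unit y = w * x, error E = 0.5 * (w * x - t) ** 2.
    # Both mini-batches have x = 1 but disagree on the target t.
    batches = [(1.0, 1.0), (1.0, 2.0)]  # (input x, target t)
    w = 0.0
    deltas = []
    for step in range(steps):
        x, t = batches[step % 2]  # alternate between the two mini-batches
        grad = (w * x - t) * x    # dE/dw on the current mini-batch
        delta = -lr * grad        # one steepest-descent step
        w += delta
        deltas.append(delta)
    return w, deltas

w, deltas = simulate()
print(round(w, 4))                         # → 1.6667 (weight after the last, batch-2 step)
print([round(d, 4) for d in deltas[-4:]])  # → [-0.3333, 0.3333, -0.3333, 0.3333]
```

Once the transient dies out, the weight cycles between 4/3 and 5/3, so the change in w alternates between about -1/3 and +1/3 instead of going to zero: each mini-batch keeps pulling w toward its own optimum.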

Question 2

Suppose you are using mini-batch gradient descent to train a neural net on a large dataset. You have to decide on the learning rate, the weight initialization, how to preprocess the inputs, and so on. You try some values for these and find that the value of the objective function on the training set decreases smoothly but very slowly. What could be causing this? Check all that apply.

The weights might have been initialized to very large values (hint: think of what this would do to the logistic hidden units).

The learning rate may be too small.

The inputs might have a very large scale (hint: think of what this would do to the logistic hidden units).

The mini-batch size is too small.
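The two hints point at saturation of logistic hidden units. The slope of the logistic function y = 1 / (1 + e^(-z)) is y(1 - y), and gradients flowing back through a hidden unit to its incoming weights are multiplied by that slope. If large weights or a large input scale make the total input z large in magnitude, the slope, and with it the gradient, becomes tiny. The sketch below only uses Python's math module; the z values are arbitrary illustrations.

```python
import math

def logistic(z):
    """Logistic (sigmoid) output for total input z (z >= 0 here, so exp cannot overflow)."""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_slope(z):
    """Derivative dy/dz of the logistic function, y * (1 - y)."""
    y = logistic(z)
    return y * (1.0 - y)

# Hypothetical total inputs to a hidden unit: modest when weights and inputs are
# small, large when either the weights or the input scale is large.
for z in (0.5, 5.0, 20.0):
    print(z, logistic_slope(z))
```

The slope is at most 0.25 (at z = 0) and falls off roughly like e^(-|z|), so a saturated unit passes back almost no gradient, and the objective decreases smoothly but slowly.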

Question 3

Four datasets are shown below. Each dataset has two input values (plotted below) and a target value (not shown). Each point in the plots denotes one training case. Assume that we are solving a classification problem. Which of the following datasets would most likely be easiest to train using neural nets?
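The four plots are not reproduced here, so the following only illustrates one general factor that affects how easy a dataset is to train on: the mean and scale of the inputs. For a linear neuron with squared error, the error surface is a quadratic bowl whose curvature is set by the inputs alone (the Hessian is the sum of x xᵀ over training cases), and the ratio of its largest to smallest curvature measures how elongated the bowl is. The two tiny input sets below are hypothetical.

```python
import math

def hessian(cases):
    """Entries (a, b, c) of the 2x2 Hessian [[a, b], [b, c]] = sum of x x^T for a linear neuron."""
    a = sum(x1 * x1 for x1, _ in cases)
    b = sum(x1 * x2 for x1, x2 in cases)
    c = sum(x2 * x2 for _, x2 in cases)
    return a, b, c

def condition_number(cases):
    """Ratio of the largest to the smallest curvature of the quadratic error surface."""
    a, b, c = hessian(cases)
    mean = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b ** 2)  # eigenvalues are mean +/- radius
    return (mean + radius) / (mean - radius)

raw = [(101.0, 101.0), (101.0, 99.0)]  # inputs far from zero mean
shifted = [(1.0, 1.0), (1.0, -1.0)]    # same spread, shifted toward zero mean
print(condition_number(raw), condition_number(shifted))
```

The first set gives a ratio in the tens of thousands (a long, narrow ravine where steepest descent crawls), while the second gives exactly 1 (a circular bowl where the gradient points straight at the minimum).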

Question 4

Claire is training a neural net using mini-batch gradient descent. She chose a particular learning rate and found that the training error decreased as more iterations of training were performed, as shown here in blue:

She was not sure if this was the best she could do. So she tried a bigger learning rate. Which of the following error curves (shown in red) might she observe now? Select the two most likely plots.
Note that in the plots below, each iteration refers to a single step of steepest descent on a single mini-batch.
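A bigger learning rate can either speed learning up or make it unstable. The sketch below assumes, purely for illustration, a one-weight quadratic error E = 0.5 * k * w² with k = 1, and it ignores mini-batch noise. On that surface each steepest-descent step with learning rate ε multiplies w by (1 - εk), so learning gets faster as ε grows toward 1/k, oscillates but still converges for 1/k < ε < 2/k, and diverges once ε > 2/k.

```python
def error_curve(lr, curvature=1.0, w0=1.0, steps=10):
    """Training error E = 0.5 * curvature * w**2 after each steepest-descent step."""
    w = w0
    errors = []
    for _ in range(steps):
        w -= lr * curvature * w  # dE/dw = curvature * w
        errors.append(0.5 * curvature * w ** 2)
    return errors

for lr in (0.1, 0.5, 2.5):  # hypothetical small, larger, and too-large learning rates
    print(lr, [round(e, 4) for e in error_curve(lr)])
```

With 0.1 the error falls slowly, with 0.5 it falls much faster, and with 2.5 it grows at every step.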

Question 5

In the lectures, we discussed two kinds of gradient descent algorithms: mini-batch and full-batch. For which of the following problems is mini-batch gradient descent likely to be a lot better than full-batch gradient descent?

Disease prediction: Predict if a person will get cancer. The input consists of 1000 medical indicators (blood pressure, family cancer history, etc.); the training set consists of 100 patients who all suffered the same type of cancer, and 100 healthy patients.

Sentiment Analysis: Decide whether a given movie review says that the movie is 'good' or 'bad'. The input consists of the word count in the review, for each of 50,000 words. The training set consists of 1,000,000 movie reviews found on the internet.

Predict if an experiment at the Large Hadron Collider is going to yield positive results. The input consists of 25 experiment parameters (energy level, types of particles, etc). The training set consists of the 200 experiments that have already been completed (some of those yielded positive results; some yielded only negative results).

Language modeling: Predict the next word using the previous 3 words. The vocabulary consists of 100,000 words. The dataset consists of all Wikipedia articles.
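The practical difference behind this question is how much data each weight update consumes. Here is a minimal sketch, assuming a hypothetical one-weight linear model y = w * x and a synthetic, highly redundant dataset (none of it from the quiz): mini-batch gradient descent makes one update per mini-batch, whereas full-batch gradient descent makes one update per pass over all the data.

```python
import random

def minibatches(data, batch_size, rng):
    """Yield shuffled mini-batches that cover the dataset once (one epoch)."""
    order = list(range(len(data)))
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [data[i] for i in order[start:start + batch_size]]

def train(data, batch_size, lr=0.5, epochs=1, seed=0):
    """Fit y = w * x by gradient descent on mean squared error; return (w, number of updates)."""
    rng = random.Random(seed)
    w, updates = 0.0, 0
    for _ in range(epochs):
        for batch in minibatches(data, batch_size, rng):
            grad = sum((w * x - t) * x for x, t in batch) / len(batch)  # mean gradient on the batch
            w -= lr * grad
            updates += 1
    return w, updates

# Hypothetical redundant dataset: 10,000 noisy samples of t = 3x.
rng = random.Random(1)
xs = [rng.uniform(-1.0, 1.0) for _ in range(10_000)]
data = [(x, 3.0 * x + rng.gauss(0.0, 0.1)) for x in xs]

w_mini, n_mini = train(data, batch_size=100)        # 100 updates from one pass
w_full, n_full = train(data, batch_size=len(data))  # 1 update from the same pass
print(w_mini, n_mini, w_full, n_full)
```

After one pass, the mini-batch run is already close to the true slope of 3, while the full-batch run has taken a single step. When the training set is large and redundant, a mini-batch gradient is already a good estimate of the full gradient; with only a few hundred cases, computing the full gradient is cheap and mini-batches offer little advantage.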