Neural Networks for Machine Learning: Lecture 3 Quiz

Article link: https://blog.csdn.net/GarfieldEr007/article/details/50598136

Question 1

Which of the following neural networks are examples of a feed-forward neural network?

Question 2

Consider a neural network with only one training case with input $x = (x_1, x_2, \ldots, x_n)^\top$ and correct output $t$. There is only one output neuron, which is linear, i.e. $y = w^\top x$ (notice that there are no biases). The loss function is squared error. The network has no hidden units, so the inputs are directly connected to the output neuron with weights $w = (w_1, w_2, \ldots, w_n)^\top$. We're in the process of training the neural network with the backpropagation algorithm. What will the algorithm add to $w_i$ for the next iteration if we use a step size (also known as a learning rate) of $\epsilon$?

$\epsilon (t - w^\top x)\, x_i$

$x_i$ if $w^\top x > t$; $-x_i$ if $w^\top x \le t$

$\epsilon (w^\top x - t)\, x_i$
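
The setup in Question 2 can be checked numerically. The sketch below assumes the squared error is written as $E = \frac{1}{2}(t - w^\top x)^2$ (the factor $\frac{1}{2}$ is a common convention, not stated in the question) and that one gradient-descent step adds $-\epsilon \, \partial E / \partial w_i$ to each weight. The function names and the sample values of `x`, `w`, `t` and `eps` are made up for illustration.

```python
import numpy as np

def squared_error(w, x, t):
    """E(w) = 0.5 * (t - w^T x)^2 (the 1/2 factor is an assumed convention)."""
    return 0.5 * (t - w @ x) ** 2

def weight_update(w, x, t, eps):
    """Analytic gradient-descent step: -eps * dE/dw = eps * (t - w^T x) * x."""
    return eps * (t - w @ x) * x

def numerical_update(w, x, t, eps, h=1e-6):
    """The same step, with dE/dw estimated by central finite differences."""
    grad = np.zeros_like(w)
    for i in range(len(w)):
        step = np.zeros_like(w)
        step[i] = h
        grad[i] = (squared_error(w + step, x, t)
                   - squared_error(w - step, x, t)) / (2 * h)
    return -eps * grad

# Illustrative values only; they do not come from the quiz.
x = np.array([1.0, -2.0, 0.5])
w = np.array([0.3, 0.1, -0.4])
t = 2.0
eps = 0.1

delta = weight_update(w, x, t, eps)
print(delta)                              # analytic step
print(numerical_update(w, x, t, eps))     # finite-difference check
```

Under these assumptions the two printed vectors should agree to within floating-point error, which is a quick way to check the sign of the update.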

Question 3

Suppose we have a set of examples and Brian comes in and duplicates every example, then randomly reorders the examples. We now have twice as many examples, but no more information about the problem than we had before. If we do not remove the duplicate entries, which one of the following methods will not be affected by this change, in terms of the computer time (time in seconds, for example) it takes to come close to convergence?

Full-batch learning.

Mini-batch learning, where for every iteration we randomly pick 100 training cases.

Online learning, where for every iteration we randomly pick a training case.
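
The effect of duplication on a full-batch step can be sketched as follows. This assumes a linear model with squared error and a gradient that is averaged over the batch (if the gradient were summed instead, it would double along with the data). The dataset is random and purely illustrative, and `mean_gradient` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))   # 50 made-up training cases
t = rng.normal(size=50)
w = np.zeros(3)

def mean_gradient(w, X, t):
    """Mean gradient of 0.5 * (t - Xw)^2 over every row of X."""
    return -(X.T @ (t - X @ w)) / len(t)

# Duplicate every example, then randomly reorder, as in the question.
perm = rng.permutation(2 * len(X))
X2 = np.concatenate([X, X])[perm]
t2 = np.concatenate([t, t])[perm]

g1 = mean_gradient(w, X, t)
g2 = mean_gradient(w, X2, t2)

# Rows that one full-batch step has to process before and after duplication.
rows_per_full_batch_step = (len(X), len(X2))
print(np.allclose(g1, g2), rows_per_full_batch_step)
```

In this sketch the averaged gradient is the same for both datasets, but a full-batch step on the duplicated data processes twice as many rows. A method that draws a fixed number of cases per iteration samples from the same empirical distribution either way.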

Question 4

Consider a linear output unit versus a logistic output unit for a feed-forward network with no hidden layer shown below. The network has a set of inputs $x$ and an output neuron $y$ connected to the input by weights $w$ and bias $b$.

We're using the squared error cost function even though the task that we care about, in the end, is binary classification. At training time, the target output values are $1$ (for one class) and $0$ (for the other class). At test time we will use the classifier to make decisions in the standard way: the class of an input $x$ according to our model after training is as follows:

$$\text{class of } x = \begin{cases} 1 & \text{if } w^\top x + b \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

Note that we will be training the network using $y$, but that the decision rule shown above will be the same at test time, regardless of the type of output neuron we use for training. Which of the following statements is true?

The error function (the error as a function of the weights) for both types of units will form a quadratic bowl.

For a logistic unit, the derivatives of the error function with respect to the weights can have unbounded magnitude, while for a linear unit they will have bounded magnitude.

At the solution that minimizes the error, the learned weights are always the same for both types of units; they only differ in how they get to this solution.

Unlike a logistic unit, using a linear unit will penalize us for getting the answer right too confidently.
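
To get a feel for the statements in Question 4, it can help to evaluate the squared error and its derivative for both unit types at a few values of the total input $z = w^\top x + b$. The sketch below assumes $E = \frac{1}{2}(t - y)^2$ and a target of $t = 1$. It reports the derivative with respect to $z$ (the derivative with respect to a weight $w_i$ is this value times $x_i$). The helper names and the sample values of `z` are illustrative only.

```python
import math

def logistic(z):
    """Standard logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def squared_error(y, t):
    """E = 0.5 * (t - y)^2 (the 1/2 factor is an assumed convention)."""
    return 0.5 * (t - y) ** 2

def compare(z, t=1.0):
    """Return (E_linear, E_logistic, dE/dz linear, dE/dz logistic) at input z."""
    y_lin = z                 # linear unit: y = z
    y_log = logistic(z)       # logistic unit: y = sigma(z)
    dE_dz_lin = y_lin - t                             # grows with |z|
    dE_dz_log = (y_log - t) * y_log * (1.0 - y_log)   # damped by sigma'(z)
    return squared_error(y_lin, t), squared_error(y_log, t), dE_dz_lin, dE_dz_log

for z in (0.5, 1.0, 5.0, 50.0):
    print(z, compare(z))
```

With $t = 1$, a large positive $z$ is a confident and correct decision under the rule above. Under these assumptions, the linear unit's error grows as $z$ moves past 1, while the logistic unit's error and derivative both shrink toward zero.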

Question 5

Consider a neural network with one layer of logistic hidden units (intended to be fully connected to the input units) and a linear output unit. Suppose there are $n$ input units and $m$ hidden units. Which of the following statements are true? Check all that apply.

Any function that can be learned by such a network can also be learned by a network without any hidden layers (with the same inputs).

A network with $m > n$ has more learnable parameters than a network with $m \le n$ (for a fixed value of $n$).

As long as $m \ge 1$, this network can learn to compute any function that can be learned by a network without any hidden layers (with the same inputs).

If $m > n$, this network can learn more functions than if $m$ is less than $n$ (with $n$ being the same).
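
The parameter count for the architecture in Question 5 can be written out directly. The sketch below assumes full connectivity between layers and, optionally, one bias per hidden unit plus one for the output unit. The question does not say whether biases are present, so `with_biases` is an assumption, and `count_parameters` is a hypothetical helper.

```python
def count_parameters(n, m, with_biases=True):
    """Learnable parameters for n inputs, m logistic hidden units, 1 linear output."""
    hidden_weights = n * m                    # every input connects to every hidden unit
    output_weights = m                        # every hidden unit connects to the output
    biases = (m + 1) if with_biases else 0    # assumed: one per hidden unit, one for the output
    return hidden_weights + output_weights + biases

# Fixed n = 3, growing m: the count increases with m in both conventions.
for m in (1, 2, 3, 5):
    print(m, count_parameters(3, m), count_parameters(3, m, with_biases=False))
```

With biases the total is $m(n + 2) + 1$, and without them it is $m(n + 1)$. For a fixed $n$, both grow strictly with $m$.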

Question 6

Brian wants to make his feed-forward network (with no hidden units) using a linear output neuron more powerful. He decides to combine the predictions of two networks by averaging them. The first network has weights $w_1$ and the second network has weights $w_2$. The predictions of this network for an example $x$ are therefore:

$$y = \frac{1}{2} w_1^\top x + \frac{1}{2} w_2^\top x$$
Can we get the exact same predictions as this combination of networks by using a single feed-forward network (again with no hidden units) using a linear output neuron and weights