Test in Lecture 2 - Neural Network of Hinton

Latest recommended article published on 2022-12-13 22:16:48

SophieCXT

Category: Neural network for machine lea

Article link: https://blog.csdn.net/SophieCXT/article/details/80368476

If the output of a model is given by $y=f(\mathbf{x};W)$ , then which of the following choices for $f$ are most appropriate when the task is binary classification?

Binary threshold

Correct

There can be more than one reasonable choice.

Linear

Correctly left unselected

Linear threshold

Correctly left unselected

Logistic sigmoid

This should have been selected

Question 2

Incorrect

0 / 1 point

Explanation: For every training case, the update to the weight matrix is determined by the output of the perceptron unit, which is 1 bit of information. However, we can also represent the model with one integer per training case that stores whether we added the input vector, subtracted it, or left the weight matrix unchanged when we looked at that example (-1, 0, +1).

After learning using the Perceptron algorithm, how easy is it to express the learned weight vector in terms of the input vectors and the initial weight vector? Assume the input vectors have real-valued components.

It requires one bit per training case.

This should have been selected

It is impossible.

Correctly left unselected

It requires only one integer per training case.

This should have been selected

It requires real numbers.

This option is incorrect

You might reasonably think that you needed real values to describe real-valued weights, but see the correct answer.
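The integer bookkeeping described in the explanation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the toy inputs, the zero initial weights, and the convention that the unit outputs 1 when $\mathbf{w} \cdot \mathbf{x} \geq 0$ are all hypothetical choices, not part of the quiz. Because the sketch makes several passes over the data, each case stores a net integer count rather than a single value from (-1, 0, +1).

```python
def perceptron_with_counts(inputs, targets, w0, epochs=100):
    """Train a bias-free perceptron, recording one integer per training case."""
    w = list(w0)
    # counts[i] = net number of times input i was added (+) or subtracted (-)
    counts = [0] * len(inputs)
    for _ in range(epochs):
        mistakes = 0
        for i, (x, t) in enumerate(zip(inputs, targets)):
            y = 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else 0
            delta = t - y  # -1, 0, or +1
            if delta != 0:
                w = [wj + delta * xj for wj, xj in zip(w, x)]
                counts[i] += delta
                mistakes += 1
        if mistakes == 0:
            break
    return w, counts


def reconstruct(inputs, counts, w0):
    """Rebuild the weights from the initial weights and the integer counts."""
    w = list(w0)
    for x, c in zip(inputs, counts):
        w = [wj + c * xj for wj, xj in zip(w, x)]
    return w


# Hypothetical real-valued inputs and binary targets.
inputs = [(0.5, 1.25), (1.5, -0.75), (-1.0, 0.25)]
targets = [1, 0, 1]
w0 = (0.0, 0.0)

w, counts = perceptron_with_counts(inputs, targets, w0)
w_rebuilt = reconstruct(inputs, counts, w0)
print(w, counts, w_rebuilt)
```

The learned weights are real-valued, yet they are fully determined by the inputs, the initial weights, and the integers in `counts`.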

Question 3

Incorrect

0 / 1 point

Suppose we are given three data points:

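$\mathbf{x} = (1, 0) \rightarrow t = 1$

$\mathbf{x} = (1, 1) \rightarrow t = 1$

$\mathbf{x} = (0, 1) \rightarrow t = 0$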

Furthermore, we are given the following weight vector (where the bias is set to 0):

$\mathbf{w} = (0, -3)$

Let $||\mathbf{w}^{(t)} - \mathbf{w}^{(t-1)}||_2$ be the distance between the weight vectors at iteration $t$ and iteration $t-1$ of the perceptron learning algorithm. Here, for a given 2D vector $v$, $||v||_2 = \sqrt{v_1^2 + v_2^2}$ (this is also called the Euclidean norm). What is the maximum amount by which the weight vectors can change between successive iterations? Note that in this example we are not learning the bias.

Explanation:

Let's say that at time $t$ we observe that we have misclassified some point $\mathbf{\hat{x}}$ with target $\hat{t}$. Then the learning algorithm will proceed as:

$\mathbf{w}^{(t)} = \begin{cases} \mathbf{w}^{(t-1)} + \mathbf{\hat{x}} & \text{if } \hat{t} = 1 \\ \mathbf{w}^{(t-1)} - \mathbf{\hat{x}} & \text{if } \hat{t} = 0 \end{cases}$

In either case, the distance between $\mathbf{w}^{(t)}$ and $\mathbf{w}^{(t-1)}$ will be $||\mathbf{w}^{(t)} - \mathbf{w}^{(t-1)}||_2 = ||(\mathbf{w}^{(t-1)} \pm \mathbf{\hat{x}}) - \mathbf{w}^{(t-1)}||_2 = ||\pm \mathbf{\hat{x}}||_2 = ||\mathbf{\hat{x}}||_2 \leq \sqrt{2}$, since this is the length of the largest input vector (in this case, $(1, 1)$).

Answer: $\sqrt{2}$
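The bound can be checked numerically with a short Python sketch. It uses the three data points and the weight vector from the question; the helper names and the convention that the unit outputs 1 when $\mathbf{w} \cdot \mathbf{x} \geq 0$ are assumptions made for illustration.

```python
import math

# Data points (x, t) and initial weights from the question; the bias is fixed at 0.
points = [((1, 0), 1), ((1, 1), 1), ((0, 1), 0)]
w = (0, -3)


def norm2(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(c * c for c in v))


def perceptron_step(w, x, t):
    """One perceptron update: add x if a target-1 case is missed, subtract x if a target-0 case is missed."""
    y = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0
    if y == t:
        return w
    sign = 1 if t == 1 else -1
    return tuple(wi + sign * xi for wi, xi in zip(w, x))


# Largest possible change: the length of the longest input vector.
max_change = max(norm2(x) for x, _ in points)

# Actual change produced by each point when starting from w = (0, -3).
step_changes = [
    norm2(tuple(a - b for a, b in zip(perceptron_step(w, x, t), w)))
    for x, t in points
]
print(max_change, step_changes)
```

Starting from $\mathbf{w} = (0, -3)$, only the point $(1, 1)$ is misclassified under this convention, and its update moves the weights by exactly $\sqrt{2}$.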

Question 4

Correct

1 / 1 point

Suppose that we have a perceptron with weight vector $\mathbf{w}$ and we create a new set of weights $\mathbf{w}^*=c \mathbf{w}$ by scaling $\mathbf{w}$ by some positive constant $c$ .

Assume that the bias is zero.

True or false: if the perceptron now uses $\mathbf{w}^*$ instead then its classification decisions might change (that is, we have moved the classification boundary).

True

False

Correct

If the bias term is zero, all of the hyperplanes that represent individual cases go through the origin of weight space. So changing the length of the weight vector without changing its direction cannot change which side of the plane it lies on.
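A small Python check illustrates this. The weight vector, the test points, and the scale factors are hypothetical values chosen for the example, and the unit is assumed to output 1 when $\mathbf{w} \cdot \mathbf{x} \geq 0$.

```python
def predict(w, x):
    """Binary threshold unit with zero bias: output 1 if w.x >= 0, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0


# Hypothetical weights, test points, and positive scale factors.
w = (2, -1)
points = [(1, 0), (0, 1), (1, 1), (-1, 3), (1, 2), (-2, -5)]
scales = [0.5, 2, 10]

original = [predict(w, x) for x in points]
scaled = {
    c: [predict(tuple(c * wi for wi in w), x) for x in points]
    for c in scales
}

# Scaling by a positive constant never changes the sign of w.x.
unchanged = all(preds == original for preds in scaled.values())
print(original, unchanged)
```

Multiplying $\mathbf{w}$ by $c > 0$ multiplies every $\mathbf{w} \cdot \mathbf{x}$ by $c$, so its sign, and therefore the decision, stays the same.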

Question 5

Correct

1 / 1 point

Suppose that we have a perceptron with weight vector $\mathbf{w}$ and we create a new set of weights $\mathbf{w}^*=\mathbf{w} + \mathbf{c}$ by adding some constant vector $\mathbf{c}$ to $\mathbf{w}$ . Assume that the bias is zero.

True or false: if the perceptron now uses $\mathbf{w}^*$ instead then its classification decisions might change (that is, we have moved the classification boundary).

False

True

Correct

Adding a constant vector can change the direction of the weight vector. This might change the side on which some data points lie.
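As a concrete illustration, the Python sketch below adds a constant vector that reverses the direction of the weights. The values of `w`, `c`, and the test points are hypothetical, and the unit is assumed to output 1 when $\mathbf{w} \cdot \mathbf{x} \geq 0$.

```python
def predict(w, x):
    """Binary threshold unit with zero bias: output 1 if w.x >= 0, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0


# Hypothetical weights and constant vector; w_star points the opposite way from w.
w = (1, 0)
c = (-2, 0)
w_star = tuple(wi + ci for wi, ci in zip(w, c))

points = [(1, 0), (0, 1), (-1, 0), (2, 3)]
before = [predict(w, x) for x in points]
after = [predict(w_star, x) for x in points]

# Points whose classification changed after adding c.
flipped = [x for x, b, a in zip(points, before, after) if b != a]
print(before, after, flipped)
```

Not every choice of $\mathbf{c}$ flips a decision, which is why the question says the decisions "might" change.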

Question 6

Incorrect

0 / 1 point

Suppose we are given four training cases:

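$\mathbf{x} = (1, 1) \rightarrow t = 1$

$\mathbf{x} = (1, 0) \rightarrow t = 0$

$\mathbf{x} = (0, 1) \rightarrow t = 0$

$\mathbf{x} = (0, 0) \rightarrow t = 1$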

It is impossible for a binary threshold unit to produce the desired target outputs for all four cases. Now suppose that we add an extra input dimension so that each of the four input vectors consists of three numbers instead of two.

Which of the following ways of setting the value of the extra input will create a set of four input vectors that is linearly separable (i.e. that can be given the right target values by a binary threshold unit with appropriate weights and bias)?

Make the third value of each input vector be the same as the target value for that input vector.

Correct

Make the third value of each input vector be the same as the first value.

This option is incorrect

Make the third value of each input vector be the opposite of the first value (i.e. use 1 if the first value is 0 and 0 if the first value is 1)

Correctly left unselected

Make the third value be 1 for one of the four input vectors and 0 for the other three.

This should have been selected
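The four options can be explored empirically with a rough Python sketch. It runs the perceptron learning rule with a learned bias and treats convergence within a fixed number of epochs as evidence of linear separability. The function name, the epoch cap, and the $\geq 0$ threshold convention are assumptions; failing to converge within the cap is only suggestive, not a proof of non-separability.

```python
def is_separable_by_perceptron(inputs, targets, max_epochs=1000):
    """Return True if the perceptron rule (with bias) fits all cases within max_epochs."""
    w = [0] * len(inputs[0])
    b = 0
    for _ in range(max_epochs):
        mistakes = 0
        for x, t in zip(inputs, targets):
            y = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0
            if y != t:
                sign = t - y  # +1: add the input, -1: subtract it
                w = [wi + sign * xi for wi, xi in zip(w, x)]
                b += sign
                mistakes += 1
        if mistakes == 0:
            return True
    return False


# The four training cases from the question.
base = [(1, 1), (1, 0), (0, 1), (0, 0)]
targets = [1, 0, 0, 1]

# Candidate values for the extra (third) input, one list per option.
options = {
    "same as target": list(targets),
    "same as first value": [x[0] for x in base],
    "opposite of first value": [1 - x[0] for x in base],
    "1 for one vector, 0 for the rest": [1, 0, 0, 0],
}

results = {
    name: is_separable_by_perceptron(
        [x + (e,) for x, e in zip(base, extra)], targets
    )
    for name, extra in options.items()
}

# Try the last option with the 1 placed on each of the four vectors in turn.
one_hot_results = [
    is_separable_by_perceptron(
        [x + (1 if i == k else 0,) for i, x in enumerate(base)], targets
    )
    for k in range(4)
]
print(results, one_hot_results)
```

Copying or negating the first input adds no new information, so those two options leave the original conflict in place, while the other two options lift the points out of a common plane.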

Question 7

Correct

1 / 1 point

Brian wants to use a neural network to predict the price of a stock tomorrow given today's price and the prices over the last 10 days. The inputs to this network are the prices over the last 10 days and the output is tomorrow's price. The hidden units in this network receive information from the layer below, transmit information to the layer above and do not send information within the same layer. Is this an example of a feed-forward network or a recurrent network?

Recurrent

Feed-forward

Correct

Even though Brian's network is modelling a sequence, it is doing this in an entirely feed-forward fashion. Another name for this kind of model is a nonlinear autoregressive process. Recurrent networks are much more powerful for this task and can do a much better job; however, they are also more difficult to train.

Question 8

Correct

1 / 1 point

Brian and Andy are having an argument about the perceptron algorithm. They have a dataset that the perceptron cannot seem to classify (that is, it fails to converge to a solution). Andy reasons that if he could collect more examples, that might solve the problem by making the data set linearly separable and then the perceptron algorithm will converge. Brian claims that collecting more examples will not help. Which one of them is correct?

Andy

Brian

Correct

If any set A of points is not linearly separable from set B, then adding more examples to either set cannot make them linearly separable.