If the output of a model is given by $y = f(\mathbf{x}; W)$, then which of the following choices for $f$ are most appropriate when the task is binary classification?
Binary threshold
There can be more than one reasonable choice.
Linear
Linear threshold
Logistic sigmoid
2. Question 2
Explanation: For every training case, the update to the weight matrix is determined by the output of the perceptron unit, which is 1 bit of information. However, we can also represent the model with one integer per training case that stores whether we added, subtracted, or left the weight matrix unchanged when we looked at that example (+1, -1, or 0).
After learning using the Perceptron algorithm, how easy is it to express the learned weight vector in terms of the input vectors and the initial weight vector? Assume the input vectors have real-valued components.
It requires one bit per training case.
It is impossible.
It requires only one integer per training case.
It requires real numbers.
You might reasonably think that you needed real values to describe real-valued weights, but see the correct answer.
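The integer-per-case idea can be checked with a small sketch. Everything here is illustrative: the helper names `train_with_counts` and `reconstruct`, the toy data, and the convention that the unit outputs 1 when the weighted sum is at least 0 are assumptions, not part of the quiz.

```python
import math

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_with_counts(inputs, targets, w0, max_epochs=100):
    """Perceptron learning that also keeps one integer per training case."""
    w = list(w0)
    counts = [0] * len(inputs)  # net number of times each input was added (+) or subtracted (-)
    for _ in range(max_epochs):
        mistakes = 0
        for i, (x, t) in enumerate(zip(inputs, targets)):
            y = 1 if dot(w, x) >= 0 else 0  # assumed threshold convention
            if y != t:
                sign = 1 if t == 1 else -1
                w = [wi + sign * xi for wi, xi in zip(w, x)]
                counts[i] += sign
                mistakes += 1
        if mistakes == 0:
            break
    return w, counts

def reconstruct(inputs, counts, w0):
    """Rebuild the weights as w0 + sum_i counts[i] * inputs[i]."""
    w = list(w0)
    for x, c in zip(inputs, counts):
        w = [wi + c * xi for wi, xi in zip(w, x)]
    return w

# Made-up real-valued inputs and an arbitrary initial weight vector.
inputs = [(0.5, 1.25), (-1.5, 0.75), (2.0, -0.5), (-0.25, -1.0)]
targets = [1, 0, 1, 0]
w0 = [0.25, -0.5]

learned, counts = train_with_counts(inputs, targets, w0)
rebuilt = reconstruct(inputs, counts, w0)
print(counts, learned, rebuilt)
```

The rebuilt vector should match the learned one, even though only integers were stored alongside the inputs and the initial weights.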
3. Question 3
Suppose we are given three data points:
x         t
(1, 0)    1
(1, 1)    1
(0, 1)    0
Furthermore, we are given the following weight vector (where the bias is set to 0):
$\mathbf{w} = (0, -3)$
Let $\|\mathbf{w}^{(t)} - \mathbf{w}^{(t-1)}\|_2$ be the distance between the weight vectors at iteration $t$ and iteration $t-1$ of the perceptron learning algorithm. Here, for a given 2D vector $\mathbf{v}$, $\|\mathbf{v}\|_2 = \sqrt{v_1^2 + v_2^2}$ (this is also called the Euclidean norm). What is the maximum amount by which the weight vectors can change between successive iterations? Note that in this example we are not learning the bias.
Let's say that at time $t$ we observe that we have misclassified some point $\mathbf{\hat{x}}$ with target $\hat{t}$. Then the learning algorithm will proceed as:
$$\mathbf{w}^{(t)} = \begin{cases} \mathbf{w}^{(t-1)} + \mathbf{\hat{x}} & \text{if } \hat{t} = 1 \\ \mathbf{w}^{(t-1)} - \mathbf{\hat{x}} & \text{if } \hat{t} = 0 \end{cases}$$
In either case, the distance between $\mathbf{w}^{(t)}$ and $\mathbf{w}^{(t-1)}$ will be $\|\mathbf{w}^{(t-1)} - \mathbf{w}^{(t-1)} \pm \mathbf{\hat{x}}\|_2 = \|\pm \mathbf{\hat{x}}\|_2 = \|\mathbf{\hat{x}}\|_2 \leq \sqrt{2}$, since this is the length of the largest input vector (in this case, $(1, 1)$).
Answer: $\sqrt{2}$
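As a rough check of this bound, the sketch below runs the perceptron rule on the three data points starting from w = (0, -3) and records how far the weights move at each update. The convention that the unit outputs 1 when the weighted sum is at least 0, and the cap of 10 passes over the data, are assumptions made for illustration.

```python
import math

# Data points from the question as (x, t) pairs; the bias is fixed at 0.
points = [((1, 0), 1), ((1, 1), 1), ((0, 1), 0)]
w = (0.0, -3.0)

def predict(w, x):
    # Assumed convention: output 1 when the weighted sum is >= 0.
    return 1 if w[0] * x[0] + w[1] * x[1] >= 0 else 0

steps = []  # Euclidean size of every weight change
for _ in range(10):  # arbitrary cap on the number of passes
    for x, t in points:
        if predict(w, x) != t:
            sign = 1 if t == 1 else -1
            new_w = (w[0] + sign * x[0], w[1] + sign * x[1])
            steps.append(math.hypot(new_w[0] - w[0], new_w[1] - w[1]))
            w = new_w

# The largest possible step is the norm of the longest input vector.
bound = max(math.hypot(*x) for x, _ in points)
print(steps, bound, w)
```

Under these assumptions every recorded step has the length of the misclassified input, so none exceeds the norm of (1, 1).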
4. Question 4
Suppose that we have a perceptron with weight vector $\mathbf{w}$ and we create a new set of weights $\mathbf{w}^* = c\,\mathbf{w}$ by scaling $\mathbf{w}$ by some positive constant $c$.
Assume that the bias is zero.
True or false: if the perceptron now uses $\mathbf{w}^*$ instead then its classification decisions might change (that is, we have moved the classification boundary).
True
False
If the bias term is zero, all of the hyperplanes that represent individual cases go through the origin of weight space. So changing the length of the weight vector without changing its direction cannot change which side of the plane it lies on.
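A small numeric sketch of this argument: with zero bias, multiplying the weights by a positive constant multiplies the weighted sum by the same constant, so its sign stays the same. The weight vector, test inputs, and scale factors below are made up for illustration.

```python
def predict(w, x):
    # Zero-bias threshold unit; output 1 when the weighted sum is >= 0 (assumed convention).
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0

w = (2.0, -1.0)                                           # hypothetical weight vector
xs = [(1.0, 0.5), (-1.0, 3.0), (0.5, 2.0), (-2.0, -2.0)]  # hypothetical inputs

original = [predict(w, x) for x in xs]
scaled_decisions = {}
for c in (0.5, 3.0, 100.0):                               # positive scale factors
    w_star = tuple(c * wi for wi in w)
    scaled_decisions[c] = [predict(w_star, x) for x in xs]

print(original, scaled_decisions)
```

A negative value of `c` would flip the sign of the weighted sum, which is why the question restricts `c` to positive constants.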
5. Question 5
Suppose that we have a perceptron with weight vector $\mathbf{w}$ and we create a new set of weights $\mathbf{w}^* = \mathbf{w} + \mathbf{c}$ by adding some constant vector $\mathbf{c}$ to $\mathbf{w}$. Assume that the bias is zero.
True or false: if the perceptron now uses $\mathbf{w}^*$ instead then its classification decisions might change (that is, we have moved the classification boundary).
False
True
Adding a constant vector can change the direction of the weight vector. This might change the side on which some data points lie.
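One concrete case is enough to show that the decisions might change. In the sketch below the weight vector, the input, and the two added vectors are all hypothetical values picked for illustration: one addition reverses the direction of the weights and flips the decision, the other keeps the direction and leaves it alone.

```python
def predict(w, x):
    # Zero-bias threshold unit; output 1 when the weighted sum is >= 0 (assumed convention).
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0

def add(w, c):
    return tuple(wi + ci for wi, ci in zip(w, c))

w = (1.0, 0.0)   # hypothetical weight vector
x = (1.0, 0.0)   # hypothetical input

before = predict(w, x)
after_flip = predict(add(w, (-2.0, 0.0)), x)  # w* = (-1, 0): direction reversed
after_same = predict(add(w, (1.0, 0.0)), x)   # w* = (2, 0): same direction, longer

print(before, after_flip, after_same)
```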
6. Question 6
Suppose we are given four training cases:
x         t
(1, 1)    1
(1, 0)    0
(0, 1)    0
(0, 0)    1
It is impossible for a binary threshold unit to produce the desired target outputs for all four cases. Now suppose that we add an extra input dimension so that each of the four input vectors consists of three numbers instead of two.
Which of the following ways of setting the value of the extra input will create a set of four input vectors that is linearly separable (i.e. that can be given the right target values by a binary threshold unit with appropriate weights and bias)?
Make the third value of each input vector be the same as the target value for that input vector.
Make the third value of each input vector be the same as the first value.
Make the third value of each input vector be the opposite of the first value (i.e. use 1 if the first value is 0 and 0 if the first value is 1).
Make the third value be 1 for one of the four input vectors and 0 for the other three.
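One way to explore these options is to build each augmented data set and run perceptron learning with a learned bias, watching for a pass with no mistakes. This is only a sketch: the helper `perceptron_converges`, the 1000-epoch cap, and the choice of which vector gets the 1 in the last option are assumptions. A mistake-free pass shows the set is separable, whereas hitting the cap is suggestive rather than a proof that it is not.

```python
def perceptron_converges(inputs, targets, max_epochs=1000):
    """Return True if perceptron learning reaches a mistake-free pass within the cap."""
    w = [0.0] * len(inputs[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, t in zip(inputs, targets):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            y = 1 if z >= 0 else 0  # assumed threshold convention
            if y != t:
                sign = 1 if t == 1 else -1
                w = [wi + sign * xi for wi, xi in zip(w, x)]
                b += sign
                mistakes += 1
        if mistakes == 0:
            return True
    return False

base = [(1, 1), (1, 0), (0, 1), (0, 0)]
targets = [1, 0, 0, 1]

options = {
    "third = target": [x + (t,) for x, t in zip(base, targets)],
    "third = first": [x + (x[0],) for x in base],
    "third = opposite of first": [x + (1 - x[0],) for x in base],
    "third = 1 for the first vector only": [
        x + (1 if i == 0 else 0,) for i, x in enumerate(base)
    ],
}

results = {name: perceptron_converges(vectors, targets) for name, vectors in options.items()}
for name, ok in results.items():
    print(name, "->", "converged" if ok else "no mistake-free pass within the cap")
```

The second and third options only add a value that is an affine function of the first input, so a threshold unit gains nothing it could not already compute from the first input and the bias.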
7. Question 7
Brian wants to use a neural network to predict the price of a stock tomorrow given today's price and the price over the last 10 days. The inputs to this network are the prices over the last 10 days and the output is tomorrow's price. The hidden units in this network receive information from the layer below, transmit information to the layer above and do not send information within the same layer. Is this an example of a feed-forward network or a recurrent network?
Recurrent
Feed-forward
Even though Brian's network is modelling a sequence, it is doing this in an entirely feed-forward fashion. Another name for this kind of model is a nonlinear autoregressive process. Recurrent networks are much more powerful for this task and can do a much better job; however, they are also more difficult to train.
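To make the feed-forward structure concrete, here is a minimal sketch of a network like Brian's: a fixed window of 10 prices goes in, one hidden layer transforms it, and a single number comes out. The layer size, the tanh nonlinearity, the random untrained weights, and the made-up prices are all assumptions chosen for illustration.

```python
import math
import random

def feed_forward(prices, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: inputs -> hidden layer -> output, with no connections within a layer."""
    hidden = [
        math.tanh(sum(w * p for w, p in zip(row, prices)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

rng = random.Random(0)   # fixed seed so the sketch is repeatable
n_in, n_hidden = 10, 4   # 10 past prices in, 4 hidden units (arbitrary)
w_hidden = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
b_hidden = [0.0] * n_hidden
w_out = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
b_out = 0.0

prices = [100.0 + i for i in range(n_in)]  # made-up prices for the last 10 days
prediction = feed_forward(prices, w_hidden, b_hidden, w_out, b_out)
print(prediction)
```

The function keeps no state between calls, so the same window always gives the same output. A recurrent network would instead carry a hidden state forward from one time step to the next.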
8. Question 8
Brian and Andy are having an argument about the perceptron algorithm. They have a dataset that the perceptron cannot seem to classify (that is, it fails to converge to a solution). Andy reasons that if he could collect more examples, that might solve the problem by making the data set linearly separable and then the perceptron algorithm will converge. Brian claims that collecting more examples will not help. Which one of them is correct?
Andy
Brian
If any set A of points is not linearly separable from set B, then adding more examples to either set cannot make them linearly separable.
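This can be illustrated with a data set that is known not to be linearly separable. The sketch below uses XOR-style points, then adds more labelled examples and tries again. The helper `perceptron_converges`, the extra points, and the 1000-epoch cap are assumptions made for illustration. Any weights that classified the larger set correctly would also classify the original four points correctly, so no mistake-free pass is possible in either case.

```python
def perceptron_converges(inputs, targets, max_epochs=1000):
    """Return True if perceptron learning reaches a mistake-free pass within the cap."""
    w = [0.0] * len(inputs[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, t in zip(inputs, targets):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            y = 1 if z >= 0 else 0  # assumed threshold convention
            if y != t:
                sign = 1 if t == 1 else -1
                w = [wi + sign * xi for wi, xi in zip(w, x)]
                b += sign
                mistakes += 1
        if mistakes == 0:
            return True
    return False

# XOR-style data: not linearly separable.
inputs = [(0, 0), (1, 1), (0, 1), (1, 0)]
targets = [0, 0, 1, 1]

# Hypothetical extra examples collected later.
more_inputs = inputs + [(2, 2), (0, 2), (3, 0)]
more_targets = targets + [0, 1, 1]

original_ok = perceptron_converges(inputs, targets)
extended_ok = perceptron_converges(more_inputs, more_targets)
print(original_ok, extended_ok)
```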