
# Neural Networks: Representation

## 1. Which of the following statements are true? Check all that apply.

• (OK) In a neural network with many layers, we think of each successive layer as being able to use the earlier layers as features, so as to be able to compute increasingly complex functions.
• (OK) If a neural network is overfitting the data, one solution would be to increase the regularization parameter λ.
• If a neural network is overfitting the data, one solution would be to decrease the regularization parameter λ.
• Suppose you have a multi-class classification problem with three classes, trained with a 3-layer network. Let $a^{(3)}_1 = (h_\Theta(x))_1$ be the activation of the first output unit, and similarly $a^{(3)}_2 = (h_\Theta(x))_2$ and $a^{(3)}_3 = (h_\Theta(x))_3$. Then for any input $x$, it must be the case that $a^{(3)}_1 + a^{(3)}_2 + a^{(3)}_3 = 1$.
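The last statement is false because each output unit applies its own sigmoid independently; nothing in this network forces the three activations to sum to 1. A minimal sketch in plain Python, using hypothetical pre-activation values $z^{(3)}$ for the three output units:

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical pre-activations z^(3) of the three output units.
z3 = [2.0, 1.0, 0.5]

# Each output unit is squashed independently by its own sigmoid.
a3 = [sigmoid(z) for z in z3]
total = sum(a3)

print([round(a, 3) for a in a3], round(total, 3))
```

Here every activation lies in (0, 1), yet their sum is well above 1.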

## 2. Consider the following neural network, which takes two binary-valued inputs $x_1, x_2 \in \{0, 1\}$ and outputs $h_\Theta(x)$. Which of the following logical functions does it (approximately) compute?

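[Figure: a single sigmoid unit with weight -20 on the bias unit +1 and weight 30 on each of $x_1$ and $x_2$, i.e. $h_\Theta(x) = g(-20 + 30x_1 + 30x_2)$.]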
• (OK) OR
• AND
• NAND (meaning “NOT AND”)
• XOR (exclusive OR)
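Evaluating the unit on all four inputs confirms the OR answer: $g(-20) \approx 0$, while $g(10)$ and $g(40)$ are both close to 1. A small truth-table check in plain Python (the function name `h` is just for illustration):

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def h(x1: int, x2: int) -> float:
    # Single sigmoid unit: bias weight -20, input weights 30 and 30.
    return sigmoid(-20 + 30 * x1 + 30 * x2)

for x1 in (0, 1):
    for x2 in (0, 1):
        out = h(x1, x2)
        # Threshold at 0.5 to read the output as a logical value.
        print(x1, x2, round(out, 4), int(out >= 0.5))
```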

## 3. Consider the neural network given below. Which of the following equations correctly computes the activation $a^{(3)}_1$? Note: $g(z)$ is the sigmoid activation function.

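[Figure: a four-layer network. Layer 1: bias unit +1 and inputs $x_1, x_2$; layer 2: bias unit +1 and two hidden units; layer 3: bias unit +1 and hidden unit(s); layer 4: output.]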
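• (OK) $a^{(3)}_1 = g(\Theta^{(2)}_{1,0}a^{(2)}_0 + \Theta^{(2)}_{1,1}a^{(2)}_1 + \Theta^{(2)}_{1,2}a^{(2)}_2)$
• $a^{(3)}_1 = g(\Theta^{(1)}_{1,0}a^{(1)}_0 + \Theta^{(1)}_{1,1}a^{(1)}_1 + \Theta^{(1)}_{1,2}a^{(1)}_2)$
• $a^{(3)}_1 = g(\Theta^{(1)}_{1,0}a^{(2)}_0 + \Theta^{(1)}_{1,1}a^{(2)}_1 + \Theta^{(1)}_{1,2}a^{(2)}_2)$
• The activation $a^{(3)}_1$ is not present in this network.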
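The correct option applies row 1 of $\Theta^{(2)}$ to the layer-2 activations, including the bias $a^{(2)}_0 = 1$; $\Theta^{(1)}$ only maps layer 1 to layer 2. A minimal sketch with hypothetical weights and activations (the quiz gives no numbers), using 0-based Python indices so that `theta2[0][k]` plays the role of $\Theta^{(2)}_{1,k}$:

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical layer-2 activations, with the bias unit a^(2)_0 = 1 first.
a2 = [1.0, 0.3, 0.8]

# Hypothetical Theta^(2): row i holds the weights feeding unit i+1 of layer 3;
# column k multiplies a^(2)_k (column 0 is the bias weight).
theta2 = [
    [0.5, -1.0, 2.0],
    [0.1, 0.4, -0.7],
]

# a^(3)_1 = g(Theta^(2)_{1,0} a^(2)_0 + Theta^(2)_{1,1} a^(2)_1 + Theta^(2)_{1,2} a^(2)_2)
a3_1 = sigmoid(sum(theta2[0][k] * a2[k] for k in range(3)))
print(a3_1)
```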

## 4. You have the following neural network:

[Figure: a three-layer network. Layer 1: bias unit +1 and inputs $x_1, x_2$; layer 2: hidden layer; layer 3: output $h_\Theta(x)$.]

You’d like to compute the activations of the hidden layer, $a^{(2)} \in \mathbb{R}^3$. One way to do so is the following Octave code:

```octave
% Theta1 is Theta with superscript "(1)" from lecture,
% i.e., the matrix of parameters for the mapping from layer 1 (input) to layer 2.
% Theta1 has size 3x3.
% x is assumed to be the 3x1 input vector including the bias unit, [1; x1; x2].
% Assume 'sigmoid' is a built-in function to compute 1 / (1 + exp(-z)).

a2 = zeros(3, 1);
for i = 1:3
  for j = 1:3
    a2(i) = a2(i) + x(j) * Theta1(i, j);   % row i of Theta1 dotted with x
  end
  a2(i) = sigmoid(a2(i));
end
```

You want a vectorized implementation of this (i.e., one that does not use for loops). Which of the following implementations correctly compute $a^{(2)}$? Check all that apply.

• (OK) a2 = sigmoid (Theta1 * x);
• a2 = sigmoid (x * Theta1);
• a2 = sigmoid (Theta2 * x);
• z = sigmoid(x); a2 = Theta1 * z;
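The inner loop computes the dot product of row `i` of `Theta1` with `x`, which is exactly entry `i` of the matrix-vector product `Theta1 * x`; the sigmoid is then applied elementwise. (For a 3x1 `x`, `x * Theta1` is not even defined, because the inner dimensions 1 and 3 do not match.) A small equivalence check in plain Python, with hypothetical values for `Theta1` and for `x = [1, x1, x2]`:

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical 3x3 Theta1 and input x = [1, x1, x2] (bias unit first).
theta1 = [
    [0.2, -0.5, 1.0],
    [1.5, 0.3, -2.0],
    [-0.4, 0.9, 0.6],
]
x = [1.0, 0.0, 1.0]

# Loop version, mirroring the Octave double loop (0-based indices here).
a2_loop = [0.0] * 3
for i in range(3):
    for j in range(3):
        a2_loop[i] += x[j] * theta1[i][j]
    a2_loop[i] = sigmoid(a2_loop[i])

# "Vectorized" version: matrix-vector product Theta1 * x, then elementwise sigmoid.
z2 = [sum(row[j] * x[j] for j in range(3)) for row in theta1]
a2_vec = [sigmoid(z) for z in z2]
```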

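## 5. You are using the neural network pictured below and have learned the parameters $\Theta^{(1)} = \begin{bmatrix} 1 & 1 & 2.4 \\ 1 & 1.7 & 3.2 \end{bmatrix}$ (used to compute $a^{(2)}$) and $\Theta^{(2)} = \begin{bmatrix} 1 & 0.3 & -1.2 \end{bmatrix}$ (used to compute $a^{(3)}$ as a function of $a^{(2)}$). Suppose you swap the parameters for the first hidden layer between its two units, so $\Theta^{(1)} = \begin{bmatrix} 1 & 1.7 & 3.2 \\ 1 & 1 & 2.4 \end{bmatrix}$, and also swap the output layer, so $\Theta^{(2)} = \begin{bmatrix} 1 & -1.2 & 0.3 \end{bmatrix}$. How will this change the value of the output $h_\Theta(x)$?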

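[Figure: a three-layer network. Layer 1: bias unit +1 and inputs $x_1, x_2$; layer 2: bias unit and two hidden units; layer 3: output $h_\Theta(x)$.]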
• (OK) It will stay the same.
• It will increase.
• It will decrease.
• Insufficient information to tell: it may increase or decrease.
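Swapping the two hidden units only relabels them: each hidden activation is still computed from the same weights, and the output layer still pairs each activation with its original weight, so the weighted sum feeding the output unit is unchanged. A quick numerical check in plain Python using the question's $\Theta$ values, evaluated on the four binary inputs (the choice of inputs is an assumption; the question does not fix $x$):

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def forward(theta1, theta2, x1, x2):
    """Forward pass for a 2-input, 2-hidden-unit, 1-output network."""
    a1 = [1.0, x1, x2]                                   # bias unit + inputs
    hidden = [sigmoid(sum(w * a for w, a in zip(row, a1))) for row in theta1]
    a2 = [1.0] + hidden                                  # bias unit + hidden activations
    return sigmoid(sum(w * a for w, a in zip(theta2, a2)))

theta1 = [[1, 1, 2.4], [1, 1.7, 3.2]]
theta2 = [1, 0.3, -1.2]

# Swap the two hidden units: swap the rows of Theta1 and the matching
# (non-bias) entries of Theta2.
theta1_swapped = [theta1[1], theta1[0]]
theta2_swapped = [theta2[0], theta2[2], theta2[1]]

for x1 in (0, 1):
    for x2 in (0, 1):
        before = forward(theta1, theta2, x1, x2)
        after = forward(theta1_swapped, theta2_swapped, x1, x2)
        print(x1, x2, before, after)
```

The two outputs agree up to floating-point rounding, since only the order of the terms in the final sum changes.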

# Programming Assignment: Multi-class Classification and Neural Networks

lrCostFunction.m

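```octave
% Regularized logistic regression cost and gradient.
% m = length(y) and sigmoid() come from the assignment template.
T = theta;
T(1) = 0;                  % do not regularize the bias term theta_0
S = sigmoid(X * theta);    % hypothesis h_theta(x) for every example (m x 1)
J = ((-y' * log(S)) - ((1 - y') * log(1 - S))) / m + lambda / (2 * m) * sum(T .^ 2);
grad = X' * (S - y) / m + lambda / m * T;   % column vector, same size as theta
```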
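The vectorized lines compute $J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left[-y^{(i)}\log h_\theta(x^{(i)}) - (1 - y^{(i)})\log(1 - h_\theta(x^{(i)}))\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$, with $\theta_0$ left out of the penalty, together with its gradient. As a sketch only, here is the same computation with explicit sums in plain Python; the function name `lr_cost_function` and the toy data are hypothetical, not part of the assignment:

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def lr_cost_function(theta, X, y, lam):
    """Regularized logistic regression cost and gradient (hypothetical helper).

    X is a list of rows that already include the bias feature 1.0 in
    column 0; theta[0] (the bias weight) is not regularized.
    """
    m, n = len(X), len(theta)
    h = [sigmoid(sum(t * xj for t, xj in zip(theta, row))) for row in X]
    # Unregularized cross-entropy term.
    J = -sum(yi * math.log(hi) + (1 - yi) * math.log(1 - hi)
             for yi, hi in zip(y, h)) / m
    # Penalty skips theta[0], like T(1) = 0 in the Octave code.
    J += lam / (2 * m) * sum(t * t for t in theta[1:])
    grad = [sum((hi - yi) * row[j] for hi, yi, row in zip(h, y, X)) / m
            for j in range(n)]
    for j in range(1, n):
        grad[j] += lam / m * theta[j]
    return J, grad

# Hypothetical toy data: 4 examples, bias column plus 2 features.
X = [[1.0, 0.5, 1.2], [1.0, -1.0, 0.3], [1.0, 2.0, -0.7], [1.0, 0.1, 0.1]]
y = [1, 0, 1, 0]
theta = [0.1, -0.2, 0.3]
J, grad = lr_cost_function(theta, X, y, lam=1.0)
```

Comparing `grad` with a finite-difference estimate of `J` is a cheap way to confirm the two formulas agree.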

oneVsAll.m

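```octave
% X already has the column of ones prepended (done earlier in the template).
% fmincg is the optimizer supplied with the assignment.
initial_theta = zeros(n + 1, 1);
options = optimset('GradObj', 'on', 'MaxIter', 50);
for c = 1:num_labels
  % Train a binary classifier for "class c vs. the rest".
  theta = fmincg(@(t)(lrCostFunction(t, X, (y == c), lambda)), ...
                 initial_theta, options);
  all_theta(c, :) = theta';
end
```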
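The expression `(y == c)` turns the multi-class labels into a 0/1 vector for class `c`, so each pass of the loop trains an ordinary logistic regression classifier for "class `c` vs. everything else". A small illustration of that recoding in plain Python, with hypothetical labels:

```python
# Hypothetical labels for a 3-class problem (classes numbered 1..num_labels,
# as in the Octave code).
y = [1, 3, 2, 3, 1]
num_labels = 3

# Equivalent of (y == c) for each class c: 1 where the label matches, else 0.
binary_targets = {c: [int(label == c) for label in y]
                  for c in range(1, num_labels + 1)}

for c, targets in binary_targets.items():
    print(c, targets)
```

Each example is a positive example for exactly one of the `num_labels` classifiers.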

predictOneVsAll.m

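```octave
% X already includes the bias column (added earlier in the template).
% p(i) is the class whose classifier gives the highest probability for example i.
[max_value, p] = max(sigmoid(X * all_theta'), [], 2);
```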
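`max(..., [], 2)` returns, for each row, the largest value and its column index, and that 1-based column index is the predicted class. Because the sigmoid is monotonically increasing, taking the maximum of the raw scores `X * all_theta'` would give the same `p`. A sketch of the row-wise argmax in plain Python with hypothetical scores (the helper name `predict_one_vs_all` is illustrative):

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical scores X * all_theta' for 3 examples (rows) and 4 classes (columns).
scores = [
    [-2.0, 0.5, 1.3, -0.1],
    [3.0, -1.0, 0.0, 2.9],
    [-0.5, -0.4, -3.0, -0.6],
]

def predict_one_vs_all(scores):
    """Return the 1-based column index of the largest sigmoid output per row."""
    preds = []
    for row in scores:
        probs = [sigmoid(s) for s in row]
        preds.append(max(range(len(probs)), key=lambda k: probs[k]) + 1)
    return preds

p = predict_one_vs_all(scores)
print(p)
```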

predict.m

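```octave
% Forward propagation through the 3-layer network (m = size(X, 1) in the template).
a1 = [ones(m, 1) X];                                 % input layer + bias column
a2 = [ones(m, 1) sigmoid(a1 * Theta1')];             % hidden layer + bias column
[max_value, p] = max(sigmoid(a2 * Theta2'), [], 2);  % output layer: pick the best class
```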
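The same forward pass written one example at a time in plain Python makes the role of the two bias columns explicit. This is only a sketch: the helper names and the weights below are hypothetical toy values chosen so that each input maps to a different class, not the assignment's trained `Theta1`/`Theta2`.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def matvec(theta, a):
    """Multiply each row of theta with vector a (one layer's weighted sums)."""
    return [sum(w * v for w, v in zip(row, a)) for row in theta]

def predict(theta1, theta2, X):
    """Forward-propagate each example and return 1-based class predictions."""
    preds = []
    for x in X:
        a1 = [1.0] + list(x)                                   # add bias unit
        a2 = [1.0] + [sigmoid(z) for z in matvec(theta1, a1)]  # hidden layer + bias
        a3 = [sigmoid(z) for z in matvec(theta2, a2)]          # output layer
        preds.append(a3.index(max(a3)) + 1)                    # 1-based argmax
    return preds

# Hypothetical weights: 2 inputs -> 2 hidden units -> 3 classes.
theta1 = [[0.0, 4.0, -4.0], [0.0, -4.0, 4.0]]
theta2 = [[-2.0, 4.0, 0.0], [-2.0, 0.0, 4.0], [3.0, -2.0, -2.0]]
X = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
p = predict(theta1, theta2, X)
print(p)
```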

-eof-
