The code of assignment 1
In this assignment, there are some algorithms that people rarely use in practice. So, this article only discusses the components that are also used in CNNs: the softmax classifier, the affine layer, and the ReLU activation.
Softmax classifier
Formula:
P(Y = k \mid X = x_i) = \frac{e^{s_k}}{\sum_j e^{s_j}}
Here, s is the score vector, k is the index of the correct class, and j runs over all classes.
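As a quick sanity check of the formula, here is a minimal sketch (the scores below are made-up numbers) that computes the softmax probabilities for a single example:

import numpy as np

# made-up scores for one example over 3 classes
s = np.array([3.2, 5.1, -1.7])
# subtract the max for numerical stability (does not change the result)
s_stab = s - np.max(s)
# P(Y=k | X=x_i) = e^{s_k} / sum_j e^{s_j}
probs = np.exp(s_stab) / np.sum(np.exp(s_stab))
print(probs)        # approximately [0.130, 0.869, 0.001]
print(probs.sum())  # 1.0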
There are two ways to implement the softmax loss: a naive version using loops and a vectorized version.
1. Naive implementation
PIPELINE:
- Loop over all examples and compute the scores (X·W).
- Compute each example's loss; don't forget to subtract the max score for numerical stability (so the largest shifted score is 0).
- Compute the probability of each class.
- Compute the gradient of W (the correct class and the other classes are handled separately).
- Average the loss and gradient and add regularization.
'''
X: (N, D)
y: (N,)
W: (D, C)
Here, D is the number of features, N the number of examples, C the number of classes.
'''
import numpy as np

# initialize the loss and the gradient
loss = 0.0
dW = np.zeros_like(W)
# get the number of examples and classes
num_example = X.shape[0]
num_classes = W.shape[1]
# loop over the examples to accumulate the loss and gradient
for i in range(num_example):
    # calculate this example's scores, shape (C,)
    scores = X[i].dot(W)
    # shift the scores so the max is 0, for numerical stability
    scores_stab = scores - np.max(scores)
    # loss_i = -s_{y_i} + log(sum_j e^{s_j})
    loss_i = -scores_stab[y[i]] + np.log(np.sum(np.exp(scores_stab)))
    # sum all the losses
    loss += loss_i
    # calculate the probability of each class and the gradient
    for j in range(num_classes):
        # prob = e^{s_j} / sum_k e^{s_k}
        prob = np.exp(scores_stab[j]) / np.sum(np.exp(scores_stab))
        if j == y[i]:
            # correct class: dW[:, j] += (p_j - 1) * x_i
            dW[:, j] += (prob - 1) * X[i]
        else:
            # other classes: dW[:, j] += p_j * x_i
            dW[:, j] += prob * X[i]
# average the loss
loss /= num_example
# add L2 regularization to loss and dW
loss += 0.5 * reg * np.sum(W * W)
dW = dW / num_example + reg * W
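As a quick sanity check (a sketch, assuming the loop above is wrapped in a function named softmax_loss_naive(W, X, y, reg), a name not shown in the snippet itself), the loss with small random weights and no regularization should be close to log(C), since every class gets roughly the same probability 1/C:

import numpy as np

# made-up shapes: 5 examples, 4 features, 3 classes
np.random.seed(0)
X = np.random.randn(5, 4)
y = np.random.randint(3, size=5)
W = 0.0001 * np.random.randn(4, 3)

loss, dW = softmax_loss_naive(W, X, y, reg=0.0)
print(loss)       # should be close to log(3) ≈ 1.0986
print(dW.shape)   # (4, 3), same shape as W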
2. Vectorized implementation
The pipeline is the same as the naive implementation, just without the loops.
'''
X: (N, D)
y: (N,)
W: (D, C)
Here, D is the number of features, N the number of examples, C the number of classes.
'''
import numpy as np

# get the number of classes and examples
num_classes = W.shape[1]
num_example = X.shape[0]
# calculate all scores at once, shape (N, C)
scores = X.dot(W)
# numeric stability
# np.max(scores, axis=1) has shape (N,), so reshape it to (N, 1) for broadcasting
scores_stab = scores - np.max(scores, axis=1).reshape(-1, 1)
# softmax output (probabilities), shape (N, C)
softmax_output = np.exp(scores_stab) / np.sum(np.exp(scores_stab), axis=1).reshape(-1, 1)
# loss: average of -log(probability of the correct class)
loss = -np.sum(np.log(softmax_output[range(num_example), list(y)]))
loss /= num_example
# add reg to loss
loss += 0.5 * reg * np.sum(W * W)
# grad: dScores = probabilities, with 1 subtracted at the correct class
dScores = softmax_output.copy()
dScores[range(num_example), list(y)] -= 1
dW = (X.T).dot(dScores)
# average and add reg to dW
dW = dW / num_example + reg * W
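To make sure the two versions agree, a minimal check could look like the following (assuming the two snippets above are wrapped as softmax_loss_naive and softmax_loss_vectorized; the names are assumptions for this post):

import numpy as np

np.random.seed(0)
X = np.random.randn(10, 6)
y = np.random.randint(4, size=10)
W = 0.001 * np.random.randn(6, 4)

loss_naive, dW_naive = softmax_loss_naive(W, X, y, reg=0.1)
loss_vec, dW_vec = softmax_loss_vectorized(W, X, y, reg=0.1)

print(np.abs(loss_naive - loss_vec))       # should be ~0
print(np.linalg.norm(dW_naive - dW_vec))   # should be ~0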
Affine layer
In a word, the affine forward pass is a linear layer with a weight and a bias:
y = f(Wx + b)
1. Forward
'''
x: (N, d_1, ..., d_k), needs to be reshaped into (N, D)
w: (D, M)
b: (M,)
'''
# reshape x into (N, D)
x_reshape = x.reshape(x.shape[0], -1)
# out = x·w + b
out = x_reshape.dot(w) + b
# keep the inputs for the backward pass
cache = (x, w, b)
2. Backward
'''
dout: upstream derivative, of shape (N, M)
cache: (x, w, b) from the forward pass
'''
# unpack the cache and reshape x as in the forward pass
x, w, b = cache
x_reshape = x.reshape(x.shape[0], -1)
# out = x·w + b
# dx = dout·w.T: (N, M)·(M, D) = (N, D), then reshape back to (N, d_1, ..., d_k)
dx = dout.dot(w.T).reshape(x.shape)
# dw = x_reshape.T·dout: (D, N)·(N, M) = (D, M)
dw = x_reshape.T.dot(dout)
# db = sum of dout over the rows, shape (M,)
db = np.sum(dout, axis=0)
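A quick shape check (a sketch, assuming the two snippets above are wrapped as affine_forward(x, w, b) and affine_backward(dout, cache), the names used in the sandwich-layer section below):

import numpy as np

np.random.seed(0)
x = np.random.randn(2, 4, 5)     # N=2, d_1=4, d_2=5, so D=20
w = np.random.randn(20, 3)       # D=20, M=3
b = np.random.randn(3)

out, cache = affine_forward(x, w, b)
dout = np.random.randn(*out.shape)
dx, dw, db = affine_backward(dout, cache)

print(out.shape)  # (2, 3)
print(dx.shape)   # (2, 4, 5), same as x
print(dw.shape)   # (20, 3), same as w
print(db.shape)   # (3,), same as b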
ReLU activation
ReLU:
\text{if } x \leq 0: y = 0; \quad \text{if } x > 0: y = x
1. Forward
"""
Input:
- x: Inputs, of any shape
Returns a tuple of:
- out: Output, of the same shape as x
- cache: x
"""
# out = x if x > 0, otherwise 0
out = np.maximum(0, x)
cache = x
2. Backward
"""
Input:
- dout: Upstream derivatives, of any shape
- cache: Input x, of same shape as dout
Returns:
- dx: Gradient with respect to x
"""
# recover x from the cache
x = cache
out = np.maximum(0, x)
# if x <= 0, y'(x) = 0; otherwise, y'(x) = 1
# set the out > 0 part to 1
out[out > 0] = 1
# multiply by the upstream derivative
dx = out * dout
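A tiny worked example of the two passes (made-up numbers, assuming the snippets above are wrapped as relu_forward(x) and relu_backward(dout, cache), the names used in the sandwich-layer section below):

import numpy as np

x = np.array([[-2.0, 0.0, 3.0]])
dout = np.array([[10.0, 10.0, 10.0]])

out, cache = relu_forward(x)
dx = relu_backward(dout, cache)

print(out)  # [[0. 0. 3.]]  negative and zero inputs are clamped to 0
print(dx)   # [[0. 0. 10.]] gradient flows only where x > 0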
Affine_ReLU ("Sandwich Layers")
This changes the f() in the affine formula above to the ReLU function.
1. Forward
"""
1. compute the forward pass with 'affine_forward'
2. pass the output to 'relu_forward'
"""
score, fc_cache = affine_forward(x,w,b)
# pass the score to relu_forward
out, relu_cache = relu_forward(score)
2. Backward
"""
From the chain rule, we know we need to calculate the derivatives in reverse order:
relu_backward -> affine_backward
"""
daffine = relu_backward(dout, relu_cache)
dx, dw, db = affine_backward(daffine, fc_cache)
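As a usage sketch, chaining these helpers gives one hidden layer of a small network. The wrappers affine_relu_forward/affine_relu_backward are assumed here to wrap the two snippets above, with the forward pass returning (fc_cache, relu_cache) as a single cache; the shapes are made up:

import numpy as np

np.random.seed(0)
x = np.random.randn(4, 10)          # 4 examples, 10 features
w = 0.01 * np.random.randn(10, 5)   # hidden size 5
b = np.zeros(5)

# forward: affine -> ReLU
out, cache = affine_relu_forward(x, w, b)

# backward: ReLU -> affine, given some upstream gradient
dout = np.random.randn(*out.shape)
dx, dw, db = affine_relu_backward(dout, cache)

print(out.shape, dx.shape, dw.shape, db.shape)  # (4, 5) (4, 10) (10, 5) (5,)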