cs231n-AssignmentNote00-Assignment 1

Notes on the code for Assignment 1.

Some of the algorithms in this assignment are rarely used in practice, so this article only discusses the components that carry over to CNNs.

Softmax classifier

Formula:
$$P(Y = k \mid X = x_i) = \frac{e^{s_k}}{\sum_j e^{s_j}}$$
Here, s is the vector of class scores, k is the index of the correct label, and j runs over all classes.
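For intuition, here is a minimal sketch (the toy scores are made up, not from the assignment) that evaluates the formula directly, subtracting the max score first for numerical stability:

import numpy as np

scores = np.array([3.2, 5.1, -1.7])    # toy scores for 3 classes (assumed values)
scores_stab = scores - np.max(scores)  # shift so the largest score becomes 0
probs = np.exp(scores_stab) / np.sum(np.exp(scores_stab))
print(probs)        # roughly [0.13, 0.87, 0.00]
print(probs.sum())  # 1.0, probabilities always sum to one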

There are two ways to implement softmax: one uses explicit loops, the other is fully vectorized.

1. Naive implementation

PIPELINE:

  1. Loop over all examples to calculate the scores (X*W).
  2. Calculate each example's loss; don't forget to shift the scores so the max is zero for numerical stability.
  3. Calculate the class probabilities.
  4. Compute the gradient of W (the true class and the other classes get different terms).
  5. Average the loss and gradient, then add L2 regularization.
'''
X: (N, D)
y: (N,)
W: (D, C)
Here D is the number of features, N the number of examples, C the number of classes.
'''
import numpy as np

# get the number of examples and classes
num_example = X.shape[0]
num_classes = W.shape[1]
loss = 0.0
dW = np.zeros_like(W)

# loop over all examples to accumulate the loss and gradient
for i in range(num_example):
  # calculate this example's scores
  scores = X[i].dot(W)
  # subtract the max score for numerical stability (the max becomes 0)
  scores_stab = scores - np.max(scores)
  # loss_i = -s_{y_i} + log(sum_j e^{s_j})
  loss_i = -scores_stab[y[i]] + np.log(np.sum(np.exp(scores_stab)))
  # sum all the losses
  loss += loss_i

  # gradient: dW[:, j] += (prob_j - 1{j == y_i}) * X[i]
  for j in range(num_classes):
    # prob_j = e^{s_j} / sum_k e^{s_k}
    prob = np.exp(scores_stab[j]) / np.sum(np.exp(scores_stab))
    if j == y[i]:
      # true class: dW[:, j] += (prob - 1) * X[i]
      dW[:, j] += (prob - 1) * X[i]
    else:
      # other classes: dW[:, j] += prob * X[i]
      dW[:, j] += prob * X[i]

# average the loss and add L2 regularization
loss /= num_example
loss += 0.5 * reg * np.sum(W * W)
# average the gradient and add the regularization gradient
dW = dW / num_example + reg * W
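A common sanity check in this assignment is to compare the analytic gradient against a numerical one. Below is a minimal sketch, assuming the loops above are wrapped in a function softmax_loss_naive(W, X, y, reg) that returns (loss, dW); the wrapper name and the random shapes are assumptions, not code from the assignment:

import numpy as np

def numeric_grad(f, W, h=1e-5):
  # centered finite differences of a scalar function f at W
  grad = np.zeros_like(W)
  it = np.nditer(W, flags=['multi_index'])
  while not it.finished:
    idx = it.multi_index
    old = W[idx]
    W[idx] = old + h; fp = f(W)
    W[idx] = old - h; fm = f(W)
    W[idx] = old
    grad[idx] = (fp - fm) / (2 * h)
    it.iternext()
  return grad

# toy data: 5 examples, 4 features, 3 classes (assumed shapes)
X = np.random.randn(5, 4)
y = np.random.randint(3, size=5)
W = 0.01 * np.random.randn(4, 3)

loss, dW = softmax_loss_naive(W, X, y, reg=0.1)
dW_num = numeric_grad(lambda w: softmax_loss_naive(w, X, y, reg=0.1)[0], W)
print(np.max(np.abs(dW - dW_num)))  # should be very small, around 1e-7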
    
  

2. Vectorized implementation

The pipeline is the same as in the naive implementation, just without explicit loops.

'''
X: (N, D)
y: (N,)
W: (D, C)
Here D is the number of features, N the number of examples, C the number of classes.
'''
import numpy as np

# get the number of classes and examples
num_classes = W.shape[1]
num_example = X.shape[0]
# calculate the scores, shape (N, C)
scores = X.dot(W)
# numeric stability:
# np.max(scores, axis=1) has shape (N,), so we need to reshape it to (N, 1)
scores_stab = scores - np.max(scores, axis=1).reshape(-1, 1)
# softmax output (probabilities), shape (N, C)
softmax_output = np.exp(scores_stab) / np.sum(np.exp(scores_stab), axis=1).reshape(-1, 1)
# loss: average negative log probability of the correct class
loss = -np.sum(np.log(softmax_output[range(num_example), list(y)]))
loss /= num_example
# add L2 regularization to the loss
loss += 0.5 * reg * np.sum(W * W)
# gradient: dScores = softmax_output, with 1 subtracted at the true label
# (this is the derivative of the loss w.r.t. the scores)
dScores = softmax_output.copy()
dScores[range(num_example), list(y)] += -1
dW = (X.T).dot(dScores)
# average dW and add the regularization gradient
dW = dW / num_example + reg * W
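To check that the two versions agree, a minimal sketch (assuming the loop and vectorized code are wrapped as softmax_loss_naive and softmax_loss_vectorized; the names and shapes are assumptions) can compare their outputs on random data:

import numpy as np

X = np.random.randn(50, 10)
y = np.random.randint(5, size=50)
W = 0.01 * np.random.randn(10, 5)

loss_naive, dW_naive = softmax_loss_naive(W, X, y, reg=0.1)
loss_vec, dW_vec = softmax_loss_vectorized(W, X, y, reg=0.1)
print(abs(loss_naive - loss_vec))                # should be ~0
print(np.linalg.norm(dW_naive - dW_vec, 'fro'))  # should be ~0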

Affine layer

In a word, the affine forward pass is a linear layer with a weight and a bias:
$$y = f(Wx + b)$$

1. Forward

'''
x: (N, d_1,...d_k) need to reshape into (N, D)
w: (D, M)
b: (M,)
'''
# reshape x into (N, D)
x_reshape = x.reshape(x.shape[0], -1)
# out = x·w + b, shape (N, M)
out = x_reshape.dot(w) + b
# keep the inputs around for the backward pass
cache = (x, w, b)
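A minimal usage sketch (the random shapes are assumptions) showing how a multi-dimensional input is flattened and what shape comes out:

import numpy as np

x = np.random.randn(2, 4, 5, 6)   # N = 2, d_1..d_k = 4x5x6, so D = 120
w = np.random.randn(120, 3)       # M = 3
b = np.random.randn(3)

x_reshape = x.reshape(x.shape[0], -1)
out = x_reshape.dot(w) + b
print(out.shape)  # (2, 3)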

2. Backward

'''
dout: upstream derivative, of shape (N, M)
cache: the tuple (x, w, b) saved in the forward pass
'''
# unpack the cache from the forward pass
x, w, b = cache
# out = x·w + b
# dx = dout * w^T: (N, M) x (M, D) = (N, D), then reshape to (N, d_1, ..., d_k)
dx = dout.dot(w.T).reshape(x.shape)
# dw = x_reshape^T * dout: (D, N) x (N, M) = (D, M)
x_reshape = x.reshape(x.shape[0], -1)
dw = x_reshape.T.dot(dout)
# db = sum of dout over the rows; shape (M,)
db = np.sum(dout, axis=0)
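A minimal shape check for the backward pass (self-contained, with the same assumed toy shapes as above): each gradient should have the same shape as the input it corresponds to.

import numpy as np

x = np.random.randn(2, 4, 5, 6)   # N = 2, D = 4*5*6 = 120
w = np.random.randn(120, 3)
dout = np.random.randn(2, 3)      # upstream gradient, same shape as out

dx = dout.dot(w.T).reshape(x.shape)
x_reshape = x.reshape(x.shape[0], -1)
dw = x_reshape.T.dot(dout)
db = np.sum(dout, axis=0)
print(dx.shape, dw.shape, db.shape)  # (2, 4, 5, 6) (120, 3) (3,)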

ReLU activation

ReLU:
$$y = \begin{cases} 0 & \text{if } x \le 0 \\ x & \text{if } x > 0 \end{cases}$$

1. Forward

"""
Input:
  - x: Inputs, of any shape
Returns a tuple of:
  - out: Output, of the same shape as x
  - cache: x
"""
# if x > 0, = x; otherwise = 0
out = np.maximum(0, x)
cache = x
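A minimal sketch (the toy values are assumptions) of the forward pass on a small array:

import numpy as np

x = np.array([[-2.0,  0.0, 3.0],
              [ 1.5, -0.5, 2.0]])
out = np.maximum(0, x)   # negative entries are clamped to 0
print(out)  # [[0.  0.  3. ], [1.5 0.  2. ]]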

2. Backward

"""
Input:
  - dout: Upstream derivatives, of any shape
  - cache: Input x, of same shape as dout

Returns:
  - dx: Gradient with respect to x
"""
# unpack the cached input
x = cache
# if x <= 0, dy/dx = 0; otherwise, dy/dx = 1
mask = (x > 0)
# multiply the upstream derivative by the local gradient
dx = dout * mask
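And a matching sketch of the backward pass (same assumed toy values, with an all-ones upstream gradient for illustration):

import numpy as np

x = np.array([[-2.0,  0.0, 3.0],
              [ 1.5, -0.5, 2.0]])
dout = np.ones_like(x)    # pretend upstream gradient of all ones
dx = dout * (x > 0)       # gradient flows only where x was positive
print(dx)  # [[0. 0. 1.], [1. 0. 1.]]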

Affine_ReLU(“Sandwich Layers”)

This layer simply sets the f(·) in y = f(Wx + b) to the ReLU function.

1. Forward

"""
1. calculate the forward by 'affine_forward'
2. pass the out to 'relu_forward'
"""
score, fc_cache = affine_forward(x,w,b)
# pass the score to relu_forward
out, relu_cache = relu_forward(score)

2. Backward

"""
from the chain rule, we know we need to calulate the derviative from the back function 
relu_backward -> affine_backward
"""
daffine = relu_backward(dout, relu_cache)
dx,dw,db = affine_backward(daffine, fc_cache)
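A minimal end-to-end sketch chaining the two helpers, assuming affine_forward, relu_forward, relu_backward, and affine_backward are defined as in the sections above (the random shapes are assumptions):

import numpy as np

x = np.random.randn(2, 4, 5, 6)   # N = 2, D = 120
w = np.random.randn(120, 3)
b = np.random.randn(3)

# forward: affine, then ReLU
score, fc_cache = affine_forward(x, w, b)
out, relu_cache = relu_forward(score)

# backward: ReLU first, then affine (reverse order, chain rule)
dout = np.random.randn(*out.shape)
daffine = relu_backward(dout, relu_cache)
dx, dw, db = affine_backward(daffine, fc_cache)
print(out.shape, dx.shape, dw.shape, db.shape)  # (2, 3) (2, 4, 5, 6) (120, 3) (3,)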


