The code of assignment 1
In this assignment, there are some algorithms that people rarely use in practice. So, this article only discusses the components that are also used in CNNs: the softmax classifier, the affine layer, and the ReLU activation.
Softmax classifier
Formula:
P(Y = k \mid X = x_i) = \frac{e^{s_k}}{\sum_j e^{s_j}}
Here, s is the score vector, k is the index of the correct class, and j runs over all classes.
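As a quick sanity check of the formula, here is a minimal sketch (the scores below are made-up numbers) that computes the softmax probabilities for a single example:

import numpy as np

# made-up scores for one example over 3 classes
s = np.array([3.2, 5.1, -1.7])
# subtract the max for numerical stability (does not change the result)
s_stab = s - np.max(s)
# P(Y=k | X=x_i) = e^{s_k} / sum_j e^{s_j}
probs = np.exp(s_stab) / np.sum(np.exp(s_stab))
print(probs)        # approximately [0.130, 0.869, 0.001]
print(probs.sum())  # 1.0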
There are two ways to implement the softmax loss: a naive version using loops and a vectorized version.
1. Naive implementation
PIPELINE:
- Loop over all examples and compute the scores (X·W).
- Compute each example's loss; don't forget to subtract the max score for numerical stability (so the largest shifted score is 0).
- Compute the probability of each class.
- Compute the gradient of W (the correct class and the other classes are handled separately).
- Average the loss and gradient and add regularization.
'''
X: (N, D)
y: (N,)
W: (D, C)
Here, D is the number of features, N the number of examples, C the number of classes.
'''
import numpy as np

# initialize the loss and the gradient
loss = 0.0
dW = np.zeros_like(W)
# get the number of examples and classes
num_example = X.shape[0]
num_classes = W.shape[1]
# loop over the examples to accumulate the loss and gradient
for i in range(num_example):
    # calculate this example's scores, shape (C,)
    scores = X[i].dot(W)
    # shift the scores so the max is 0, for numerical stability
    scores_stab = scores - np.max(scores)
    # loss_i = -s_{y_i} + log(sum_j e^{s_j})
    loss_i = -scores_stab[y[i]] + np.log(np.sum(np.exp(scores_stab)))
    # sum all the losses
    loss += loss_i
    # calculate the probability of each class and the gradient
    for j in range(num_classes):
        # prob = e^{s_j} / sum_k e^{s_k}
        prob = np.exp(scores_stab[j]) / np.sum(np.exp(scores_stab))
        if j == y[i]:
            # correct class: dW[:, j] += (p_j - 1) * x_i
            dW[:, j] += (prob - 1) * X[i]
        else:
            # other classes: dW[:, j] += p_j * x_i
            dW[:, j] += prob * X[i]
# average the loss
loss /= num_example
# add L2 regularization to loss and dW
loss += 0.5 * reg * np.sum(W * W)
dW = dW / num_example + reg * W
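As a quick sanity check (a sketch, assuming the loop above is wrapped in a function named softmax_loss_naive(W, X, y, reg), a name not shown in the snippet itself), the loss with small random weights and no regularization should be close to log(C), since every class gets roughly the same probability 1/C:

import numpy as np

# made-up shapes: 5 examples, 4 features, 3 classes
np.random.seed(0)
X = np.random.randn(5, 4)
y = np.random.randint(3, size=5)
W = 0.0001 * np.random.randn(4, 3)

loss, dW = softmax_loss_naive(W, X, y, reg=0.0)
print(loss)       # should be close to log(3) ≈ 1.0986
print(dW.shape)   # (4, 3), same shape as W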
2. Vectorized implementation
The pipeline is the same as the naive implementation, just without the loops.
'''
X: (N, D)
y: (N,)
W: (D, C)
Here, D is the number of features, N the number of examples, C the number of classes.
'''
import numpy as np

# get the number of classes and examples
num_classes = W.shape[1]
num_example = X.shape[0]
# calculate all scores at once, shape (N, C)
scores = X.dot(W)
# numeric stability
# np.max(scores, axis=1) has shape (N,), so reshape it to (N, 1) for broadcasting
scores_stab = scores - np.max(scores, axis=1).reshape(-1, 1)
# softmax output (probabilities), shape (N, C)
softmax_output = np.exp(scores_stab) / np.sum(np.exp(scores_stab), axis=1).reshape(-1, 1)
# loss: average of -log(probability of the correct class)
loss = -np.sum(np.log(softmax_output[range(num_example), list(y)]))
loss /= num_example
# add reg to loss
loss += 0.5 * reg * np.sum(W * W)
# grad: dScores = probabilities, with 1 subtracted at the correct class
dScores = softmax_output.copy()
dScores[range(num_example), list(y)] -= 1
dW = (X.T).dot(dScores)
# average and add reg to dW
dW = dW / num_example + reg * W
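To make sure the two versions agree, a minimal check could look like the following (assuming the two snippets above are wrapped as softmax_loss_naive and softmax_loss_vectorized; the names are assumptions for this post):

import numpy as np

np.random.seed(0)
X = np.random.randn(10, 6)
y = np.random.randint(4, size=10)
W = 0.001 * np.random.randn(6, 4)

loss_naive, dW_naive = softmax_loss_naive(W, X, y, reg=0.1)
loss_vec, dW_vec = softmax_loss_vectorized(W, X, y, reg=0.1)

print(np.abs(loss_naive - loss_vec))       # should be ~0
print(np.linalg.norm(dW_naive - dW_vec))   # should be ~0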
Affine layer
In a word, the affine forward pass is a linear layer with a weight and a bias:
y = f(Wx + b)
1. Forward
'''
x: (N, d_1, ..., d_k), needs to be reshaped into (N, D)
w: (D, M)
b: (M,)
'''
# reshape x into (N, D)
x_reshape = x.reshape(x.shape[0], -1)
# out = x·w + b
out = x_reshape.dot(w) + b
# keep the inputs for the backward pass
cache = (x, w, b)
2. Backward
'''
dout: upstream derivative, of shape (N, M)
cache: (x, w, b) from the forward pass
'''
# unpack the cache and reshape x as in the forward pass
x, w, b = cache
x_reshape = x.reshape(x.shape[0], -1)
# out = x·w + b
# dx = dout·w.T: (N, M)·(M, D) = (N, D), then reshape back to (N, d_1, ..., d_k)
dx = dout.dot(w.T).reshape(x.shape)
# dw = x_reshape.T·dout: (D, N)·(N, M) = (D, M)
dw = x_reshape.T.dot(dout)
# db = sum of dout over the rows, shape (M,)
db = np.sum(dout, axis=0)
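A quick shape check (a sketch, assuming the two snippets above are wrapped as affine_forward(x, w, b) and affine_backward(dout, cache), the names used in the sandwich-layer section below):

import numpy as np

np.random.seed(0)
x = np.random.randn(2, 4, 5)     # N=2, d_1=4, d_2=5, so D=20
w = np.random.randn(20, 3)       # D=20, M=3
b = np.random.randn(3)

out, cache = affine_forward(x, w, b)
dout = np.random.randn(*out.shape)
dx, dw, db = affine_backward(dout, cache)

print(out.shape)  # (2, 3)
print(dx.shape)   # (2, 4, 5), same as x
print(dw.shape)   # (20, 3), same as w
print(db.shape)   # (3,), same as b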
ReLU activation
ReLU:
\text{if } x \leq 0: y = 0; \quad \text{if } x > 0: y = x
1. Forward
"""
Input:
- x: Inputs, of any shape
Returns a tuple of:
- out: Output, of the same shape as x
- cache: x
"""
# out = x if x > 0, otherwise 0
out = np.maximum(0, x)
cache = x
2. Backward
"""
Input:
- dout: Upstream derivatives, of any shape
- cache: Input x, of same shape as dout
Returns:
- dx: Gradient with respect to x
"""
# recover x from the cache
x = cache
out = np.maximum(0, x)
# if x <= 0, y'(x) = 0; otherwise, y'(x) = 1
# set the out > 0 part to 1
out[out > 0] = 1
# multiply by the upstream derivative
dx = out * dout
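A tiny worked example of the two passes (made-up numbers, assuming the snippets above are wrapped as relu_forward(x) and relu_backward(dout, cache), the names used in the sandwich-layer section below):

import numpy as np

x = np.array([[-2.0, 0.0, 3.0]])
dout = np.array([[10.0, 10.0, 10.0]])

out, cache = relu_forward(x)
dx = relu_backward(dout, cache)

print(out)  # [[0. 0. 3.]]  negative and zero inputs are clamped to 0
print(dx)   # [[0. 0. 10.]] gradient flows only where x > 0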
Affine_ReLU ("Sandwich Layers")
This changes the f() in the affine formula above to the ReLU function.
1. Forward
"""
1. compute the forward pass with 'affine_forward'
2. pass the output to 'relu_forward'
"""
score, fc_cache = affine_forward(x,w,b)
# pass the score to relu_forward
out, relu_cache = relu_forward(score)
2. Backward
"""
From the chain rule, we know we need to calculate the derivatives in reverse order:
relu_backward -> affine_backward
"""
daffine = relu_backward(dout, relu_cache)
dx, dw, db = affine_backward(daffine, fc_cache)
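As a usage sketch, chaining these helpers gives one hidden layer of a small network. The wrappers affine_relu_forward/affine_relu_backward are assumed here to wrap the two snippets above, with the forward pass returning (fc_cache, relu_cache) as a single cache; the shapes are made up:

import numpy as np

np.random.seed(0)
x = np.random.randn(4, 10)          # 4 examples, 10 features
w = 0.01 * np.random.randn(10, 5)   # hidden size 5
b = np.zeros(5)

# forward: affine -> ReLU
out, cache = affine_relu_forward(x, w, b)

# backward: ReLU -> affine, given some upstream gradient
dout = np.random.randn(*out.shape)
dx, dw, db = affine_relu_backward(dout, cache)

print(out.shape, dx.shape, dw.shape, db.shape)  # (4, 5) (4, 10) (10, 5) (5,)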