# 李理: Automatic Gradient Computation (cs231n Notes)

## Optimization

### The Chain Rule for Compound Expressions

```python
# set the input values
x = -2; y = 5; z = -4

# "forward" pass: compute f
q = x + y # q becomes 3
f = q * z # f becomes -12

# "backward" pass: work from the output back to the inputs
# first, f = q * z
dfdz = q # df/dz = q, so the gradient of f with respect to z is 3
dfdq = z # df/dq = z, so the gradient of f with respect to q is -4
# then, q = x + y
dfdx = 1.0 * dfdq # dq/dx = 1, so by the chain rule dfdx = -4
dfdy = 1.0 * dfdq # dq/dy = 1, so by the chain rule dfdy = -4
```
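As a quick sanity check (a sketch added here, not part of the original notes), the analytic gradients above can be compared against centered finite differences; the helper `f` and the `num_*` names are for illustration only:

```python
# f(x, y, z) = (x + y) * z, the same circuit as above
def f(x, y, z):
    return (x + y) * z

x, y, z = -2.0, 5.0, -4.0
h = 1e-5
# centered differences approximate each partial derivative
num_dfdx = (f(x + h, y, z) - f(x - h, y, z)) / (2 * h)
num_dfdy = (f(x, y + h, z) - f(x, y - h, z)) / (2 * h)
num_dfdz = (f(x, y, z + h) - f(x, y, z - h)) / (2 * h)
print(num_dfdx, num_dfdy, num_dfdz)  # ≈ -4.0, -4.0, 3.0
```

The numerical values should match the backpropagated gradients dfdx = -4, dfdy = -4, dfdz = 3.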


### Sigmoid Module Example

We have already derived the derivative of σ(x) once before; here it is again: dσ(x)/dx = (1 − σ(x))σ(x).
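Spelled out, the derivation is one line of standard algebra (reproduced here for completeness):

```latex
\sigma(x) = \frac{1}{1+e^{-x}}
\quad\Rightarrow\quad
\frac{d\sigma(x)}{dx} = \frac{e^{-x}}{(1+e^{-x})^2}
= \left(\frac{1+e^{-x}-1}{1+e^{-x}}\right)\left(\frac{1}{1+e^{-x}}\right)
= \bigl(1-\sigma(x)\bigr)\,\sigma(x)
```

This is why the backward pass below can reuse the forward output `f` to get the local gradient `(1 - f) * f`.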

```python
import math

w = [2, -3, -3] # assume some random weights and data
x = [-1, -2]

# forward pass
dot = w[0]*x[0] + w[1]*x[1] + w[2]
f = 1.0 / (1 + math.exp(-dot)) # sigmoid function

# backward pass through the neuron (backpropagation)
ddot = (1 - f) * f # gradient on dot variable, using the sigmoid gradient derivation
dx = [w[0] * ddot, w[1] * ddot] # backprop into x
dw = [x[0] * ddot, x[1] * ddot, 1.0 * ddot] # backprop into w
# we're done! we have the gradients on the inputs to the circuit
```
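To verify the analytic `dw` above, a centered-difference check can be run on the same neuron (this check is a sketch added here, not in the original notes; the helper `neuron` is for illustration only):

```python
import math

def neuron(w, x):
    # forward pass of the same 2-input sigmoid neuron as above
    return 1.0 / (1 + math.exp(-(w[0]*x[0] + w[1]*x[1] + w[2])))

w = [2.0, -3.0, -3.0]
x = [-1.0, -2.0]
f = neuron(w, x)
ddot = (1 - f) * f
dw = [x[0]*ddot, x[1]*ddot, 1.0*ddot]  # analytic gradient on w, as above

h = 1e-5
for i in range(3):
    # perturb one weight at a time and compare with the analytic gradient
    wp = list(w); wp[i] += h
    wm = list(w); wm[i] -= h
    num = (neuron(wp, x) - neuron(wm, x)) / (2 * h)
    assert abs(num - dw[i]) < 1e-6
```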


### Staged Computation Exercise

#### Forward Pass

```python
import math

x = 3 # example values
y = -4

# forward pass
sigy = 1.0 / (1 + math.exp(-y)) # sigmoid in the numerator    #(1)
num = x + sigy # numerator                                    #(2)
sigx = 1.0 / (1 + math.exp(-x)) # sigmoid in the denominator  #(3)
xpy = x + y                                                   #(4)
xpysqr = xpy**2                                               #(5)
den = sigx + xpysqr # denominator                             #(6)
invden = 1.0 / den                                            #(7)
f = num * invden # done!                                      #(8)
```

#### Backward Pass

```python
# backprop f = num * invden
dnum = invden # gradient on numerator                             #(8)
dinvden = num                                                     #(8)
# backprop invden = 1.0 / den
dden = (-1.0 / (den**2)) * dinvden                                #(7)
# backprop den = sigx + xpysqr
dsigx = (1) * dden                                                #(6)
dxpysqr = (1) * dden                                              #(6)
# backprop xpysqr = xpy**2
dxpy = (2 * xpy) * dxpysqr                                        #(5)
# backprop xpy = x + y
dx = (1) * dxpy                                                   #(4)
dy = (1) * dxpy                                                   #(4)
# backprop sigx = 1.0 / (1 + math.exp(-x))
dx += ((1 - sigx) * sigx) * dsigx # Notice += !! See notes below  #(3)
# backprop num = x + sigy
dx += (1) * dnum                                                  #(2)
dsigy = (1) * dnum                                                #(2)
# backprop sigy = 1.0 / (1 + math.exp(-y))
dy += ((1 - sigy) * sigy) * dsigy                                 #(1)
# done! phew
```

For reference, here are the forward-pass steps again, numbered to match the comments above and listed in the order they are backpropagated:

```python
#(8) f = num * invden
#(7) invden = 1.0 / den
#(6) den = sigx + xpysqr
#(5) xpysqr = xpy**2
#(4) xpy = x + y
#(3) sigx = 1.0 / (1 + math.exp(-x))
#(2) num = x + sigy
#(1) sigy = 1.0 / (1 + math.exp(-y))
```
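To convince ourselves the staged backward pass is correct, the whole computation can be re-run in condensed form and checked against centered differences (the condensed layout and the helper `fn` are for illustration, not part of the original notes):

```python
import math

x, y = 3.0, -4.0

# forward pass: the same staged expression as above
sigy = 1.0 / (1 + math.exp(-y))
num = x + sigy
sigx = 1.0 / (1 + math.exp(-x))
xpy = x + y
xpysqr = xpy**2
den = sigx + xpysqr
invden = 1.0 / den
f = num * invden

# backward pass: the same steps as above, condensed
dnum, dinvden = invden, num
dden = (-1.0 / den**2) * dinvden
dsigx = dden
dxpysqr = dden
dxpy = 2 * xpy * dxpysqr
dx = dxpy
dy = dxpy
dx += (1 - sigx) * sigx * dsigx
dx += dnum
dsigy = dnum
dy += (1 - sigy) * sigy * dsigy

# numerical check: f = (x + sigmoid(y)) / (sigmoid(x) + (x + y)**2)
def fn(x, y):
    return (x + 1.0 / (1 + math.exp(-y))) / (1.0 / (1 + math.exp(-x)) + (x + y)**2)

h = 1e-5
assert abs((fn(x + h, y) - fn(x - h, y)) / (2 * h) - dx) < 1e-6
assert abs((fn(x, y + h) - fn(x, y - h)) / (2 * h) - dy) < 1e-6
```

Note in particular that the check only passes because of the `+=` accumulation: x feeds into the output along three paths (num, sigx, and xpy), and its gradients along each path must be summed.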

### Gradients of Matrix Operations

```python
import numpy as np

# forward pass
W = np.random.randn(5, 10)
X = np.random.randn(10, 3)
D = W.dot(X)

# now suppose we had the gradient on D from above in the circuit
dD = np.random.randn(*D.shape) # same shape as D
dW = dD.dot(X.T) # .T gives the transpose of the matrix
dX = W.T.dot(dD)
```
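The two formulas above can be verified numerically by picking a scalar surrogate loss whose gradient on D is exactly the upstream gradient. This check is a sketch added here (the names `G` and `loss` are for illustration only, and `np.sum(D * G)` is just one convenient choice of scalar):

```python
import numpy as np

np.random.seed(0)
W = np.random.randn(5, 10)
X = np.random.randn(10, 3)
G = np.random.randn(5, 3)  # plays the role of dD, the upstream gradient

# scalar surrogate loss L = sum(D * G), so dL/dD = G exactly
def loss(W, X):
    return np.sum(W.dot(X) * G)

dW = G.dot(X.T)   # analytic gradient, same formula as above
dX = W.T.dot(G)

# numerical check of one entry of dW with centered differences
h = 1e-5
Wp = W.copy(); Wp[2, 3] += h
Wm = W.copy(); Wm[2, 3] -= h
num = (loss(Wp, X) - loss(Wm, X)) / (2 * h)
assert abs(num - dW[2, 3]) < 1e-4
```

A useful rule of thumb from the notes: the gradient of a matrix always has the same shape as the matrix itself, and the shapes alone (here dW is 5×10 like W, dX is 10×3 like X) usually dictate which operand to transpose and in what order to multiply.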
