# TensorFlow Internals (Part 10): Computing Gradients

# Definition of the Gradient

$\nabla f(x_1, x_2, ..., x_n) = grad\space f(x_1, x_2, ..., x_n) = (\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, ..., \frac{\partial f}{\partial x_n})$
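As a small worked example (the function is chosen here only for illustration), take $f(x_1,x_2)=x_1^2x_2+\sin(x_2)$. Differentiating with respect to each variable in turn, while holding the other fixed, gives

$\nabla f(x_1,x_2)=(2x_1x_2,\ x_1^2+\cos(x_2))$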

## Computing Derivatives

There are three common ways to compute derivatives:

• Numerical Differentiation
• Symbolic Differentiation
• Automatic Differentiation

### Numerical Differentiation

$\frac{\partial f}{\partial x}=f'(x)=\lim_{h\rightarrow0}\frac{f(x+h)-f(x)}{h}$

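Numerical differentiation approximates this limit with a small but finite $h$. For example, for $f(x)=x^3$ the exact derivative is $3x^2$, and the forward difference has error

$error=\left|\frac{f(x+h)-f(x)}{h} - 3x^2\right|=|3xh+h^2|$

The truncation error shrinks as $h$ shrinks, but an $h$ that is too small amplifies floating-point round-off in $f(x+h)-f(x)$.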
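A minimal Python sketch of the forward difference for $f(x)=x^3$, comparing the measured error with $|3xh+h^2|$. The evaluation point and step sizes are illustrative choices, not values from the article.

```python
# Forward-difference derivative of f(x) = x**3, compared with the exact 3*x**2.
# x and the list of step sizes h are illustrative assumptions.


def f(x: float) -> float:
    return x ** 3


def forward_difference(func, x: float, h: float) -> float:
    """Approximate func'(x) with (func(x + h) - func(x)) / h."""
    return (func(x + h) - func(x)) / h


def error(x: float, h: float) -> float:
    """|forward difference - exact derivative|; in exact arithmetic this is |3*x*h + h**2|."""
    return abs(forward_difference(f, x, h) - 3 * x ** 2)


if __name__ == "__main__":
    x = 2.0
    for h in (1e-1, 1e-3, 1e-5, 1e-8, 1e-12):
        predicted = abs(3 * x * h + h * h)
        print(f"h={h:.0e}  measured={error(x, h):.3e}  truncation-only={predicted:.3e}")
```

For moderate $h$ the measured error tracks $|3xh+h^2|$; for very small $h$ it stops improving, because round-off starts to dominate.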

### Symbolic Differentiation

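Symbolic differentiation applies differentiation rules to the expression itself and returns a closed-form expression. For example, for $f(x)=x\sin(x)$ the product rule gives

$\frac{\partial f}{\partial x}=\sin(x)+x\cos(x)$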
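To make the contrast with numerical differentiation concrete, here is a toy symbolic differentiator in Python. It is a sketch under assumptions: the tuple encoding of expressions and the helper names (`const`, `add`, `mul`, `d`, `to_str`) are hypothetical, and no simplification is attempted, so the result is correct but verbose.

```python
import math

# Expressions are nested tuples: ("x",), ("const", c), ("add", a, b),
# ("mul", a, b), ("sin", a), ("cos", a). This encoding is illustrative only.
X = ("x",)


def const(c): return ("const", c)
def add(a, b): return ("add", a, b)
def mul(a, b): return ("mul", a, b)
def sin(a): return ("sin", a)
def cos(a): return ("cos", a)


def d(e):
    """Return the derivative of expression e with respect to x, as a new expression."""
    op = e[0]
    if op == "x":
        return const(1.0)
    if op == "const":
        return const(0.0)
    if op == "add":  # sum rule
        return add(d(e[1]), d(e[2]))
    if op == "mul":  # product rule: (uv)' = u'v + uv'
        u, v = e[1], e[2]
        return add(mul(d(u), v), mul(u, d(v)))
    if op == "sin":  # chain rule: sin(u)' = cos(u) * u'
        return mul(cos(e[1]), d(e[1]))
    if op == "cos":  # chain rule: cos(u)' = -sin(u) * u'
        return mul(mul(const(-1.0), sin(e[1])), d(e[1]))
    raise ValueError(f"unknown op {op!r}")


def evaluate(e, x):
    """Evaluate expression e numerically at the point x."""
    op = e[0]
    if op == "x":
        return x
    if op == "const":
        return e[1]
    if op == "add":
        return evaluate(e[1], x) + evaluate(e[2], x)
    if op == "mul":
        return evaluate(e[1], x) * evaluate(e[2], x)
    if op == "sin":
        return math.sin(evaluate(e[1], x))
    if op == "cos":
        return math.cos(evaluate(e[1], x))
    raise ValueError(f"unknown op {op!r}")


def to_str(e):
    """Render an expression as a fully parenthesized string."""
    op = e[0]
    if op == "x":
        return "x"
    if op == "const":
        return f"{e[1]:g}"
    if op in ("add", "mul"):
        sym = "+" if op == "add" else "*"
        return f"({to_str(e[1])} {sym} {to_str(e[2])})"
    return f"{op}({to_str(e[1])})"


if __name__ == "__main__":
    df = d(mul(X, sin(X)))  # f(x) = x * sin(x)
    print(to_str(df))       # ((1 * sin(x)) + (x * (cos(x) * 1)))
```

Once simplified, the printed expression is exactly $\sin(x)+x\cos(x)$.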

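### Automatic Differentiation

Automatic differentiation breaks a function into elementary operations whose derivatives are known and combines those derivatives with the chain rule, which takes the following forms: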

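• Case 1: a chain of single-variable functions $x=w_0 \rightarrow w_1 \rightarrow w_2 \rightarrow w_3=y$, where each $w_i$ depends only on $w_{i-1}$: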

$\frac{dy}{dx}=\frac{dy}{dw_2}\frac{dw_2}{dw_1}\frac{dw_1}{dw_0}\frac{dw_0}{dx}$

• Case 2: $f(x,y)$, where $x=x(t)$ and $y=y(t)$ both depend on a single variable $t$:

$\frac{df}{dt}=\frac{\partial f}{\partial x}\frac{dx}{dt}+\frac{\partial f}{\partial y}\frac{dy}{dt}=\nabla f(x,y) \cdot \left[ \begin{matrix} \frac{dx}{dt} \\ \frac{dy}{dt} \end{matrix} \right]$

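• Case 3: $f(x,y)$, where $x=x(s,t)$ and $y=y(s,t)$ depend on two variables; the gradient with respect to $(s,t)$ is the gradient with respect to $(x,y)$ multiplied by the Jacobian matrix: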

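$\nabla f(s,t)=\left[\frac{\partial f}{\partial s}, \frac{\partial f}{\partial t}\right]=\left[\frac{\partial f}{\partial x}\frac{\partial x}{\partial s}+\frac{\partial f}{\partial y}\frac{\partial y}{\partial s},\ \frac{\partial f}{\partial x}\frac{\partial x}{\partial t}+\frac{\partial f}{\partial y}\frac{\partial y}{\partial t}\right]=\nabla f(x,y)\cdot \left[ \begin{matrix} \frac{\partial x}{\partial s} & \frac{\partial x}{\partial t} \\ \frac{\partial y}{\partial s} & \frac{\partial y}{\partial t} \end{matrix} \right]$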

• Case 4 (general): $f(x_1,\dots,x_n)$, where each $x_j$ depends on $t_1,\dots,t_m$:

$\frac{\partial f}{\partial t_i} = \sum_{j=1}^{n}\frac{\partial f}{\partial x_j}\frac{\partial x_j}{\partial t_i}$
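The general form above is a vector-Jacobian product: the row vector $\nabla f(x)$ times the $n \times m$ matrix $\frac{\partial x_j}{\partial t_i}$. Cases 1 to 3 are special cases ($n=m=1$, $n=2, m=1$, and $n=m=2$). A minimal Python sketch, assuming an illustrative example that is not from the article: polar coordinates $x=s\cos t$, $y=s\sin t$ with $f=x^2+y^2$, so that $f=s^2$ and the exact gradient is $(2s, 0)$.

```python
import math


def grad_f_xy(x, y):
    """Gradient of the illustrative f(x, y) = x**2 + y**2 with respect to (x, y)."""
    return [2 * x, 2 * y]


def jacobian_xy_st(s, t):
    """J[j][i] = d x_j / d t_i for (x, y) = (s*cos(t), s*sin(t)) as functions of (s, t)."""
    return [
        [math.cos(t), -s * math.sin(t)],  # dx/ds, dx/dt
        [math.sin(t), s * math.cos(t)],   # dy/ds, dy/dt
    ]


def chain_rule(grad_x, jac):
    """df/dt_i = sum_j (df/dx_j) * (dx_j/dt_i): row vector times n-by-m Jacobian."""
    n, m = len(jac), len(jac[0])
    return [sum(grad_x[j] * jac[j][i] for j in range(n)) for i in range(m)]


s, t = 1.5, 0.3
x, y = s * math.cos(t), s * math.sin(t)
grad_st = chain_rule(grad_f_xy(x, y), jacobian_xy_st(s, t))
print(grad_st)  # approximately [3.0, 0.0], i.e. (2s, 0)
```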

• Forward mode, accumulating derivatives from the input toward the output:

$\frac{dw_i}{dx}=\frac{dw_i}{dw_{i-1}}\frac{dw_{i-1}}{dx}, w_3=y$

• Reverse mode, accumulating derivatives from the output back toward the input:

$\frac{dy}{dw_i}=\frac{dy}{dw_{i+1}}\frac{dw_{i+1}}{dw_{i}}, w_0=x$
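Both accumulation orders can be sketched in Python on a concrete chain $w_0=x \rightarrow w_1 \rightarrow w_2 \rightarrow w_3=y$. The chain itself ($w_1=w_0^2$, $w_2=\sin w_1$, $w_3=e^{w_2}$) is an illustrative assumption. Forward mode carries $\frac{dw_i}{dx}$ along with each value; reverse mode first stores every $w_i$, then walks back from $\frac{dy}{dw_3}=1$. In general, forward mode needs one sweep per input and reverse mode one sweep per output, which is why reverse mode suits a scalar loss with many parameters.

```python
import math

# Each step is (value function, local derivative dw_i/dw_{i-1} evaluated at w_{i-1}).
# The concrete chain below is illustrative only.
STEPS = [
    (lambda w: w * w, lambda w: 2 * w),  # w1 = w0**2
    (math.sin, math.cos),                # w2 = sin(w1)
    (math.exp, math.exp),                # w3 = exp(w2) = y
]


def forward_mode(x):
    """Carry (w_i, dw_i/dx) forward using dw_i/dx = dw_i/dw_{i-1} * dw_{i-1}/dx."""
    w, dw_dx = x, 1.0  # w0 = x, dw0/dx = 1
    for f, df in STEPS:
        # The right-hand side is evaluated first, so df sees the old w (= w_{i-1}).
        w, dw_dx = f(w), df(w) * dw_dx
    return w, dw_dx


def reverse_mode(x):
    """Store all w_i, then use dy/dw_i = dy/dw_{i+1} * dw_{i+1}/dw_i going backwards."""
    ws = [x]
    for f, _ in STEPS:
        ws.append(f(ws[-1]))
    dy_dw = 1.0  # dy/dw3 = 1 because w3 = y
    for (_, df), w_prev in zip(reversed(STEPS), reversed(ws[:-1])):
        dy_dw *= df(w_prev)
    return ws[-1], dy_dw  # the last value is dy/dw0 = dy/dx


if __name__ == "__main__":
    print(forward_mode(0.8))
    print(reverse_mode(0.8))  # same derivative, accumulated in the opposite order
```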

## Computing Gradients

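Consider a chain $x \rightarrow y \rightarrow z$ whose final output is the loss, $L=z$. Backpropagation starts from $\frac{dL}{dz}=1$ and applies the chain rule backwards one node at a time, reusing the gradient already computed for the downstream node:

$$\begin{aligned}
grad\space L(z) &= \frac{dL}{dz} = 1 \\
grad\space L(y) &= \frac{dL}{dy} = \frac{dL}{dz}\frac{dz}{dy} = grad\space L(z)\cdot\frac{dz}{dy} \\
grad\space L(x) &= \frac{dL}{dx} = \frac{dL}{dy}\frac{dy}{dx} = grad\space L(y)\cdot\frac{dy}{dx}
\end{aligned}$$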
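The same backward recurrence can be sketched as a toy graph in which each op has a registered gradient function mapping $grad\space L(\text{output})$ to $grad\space L(\text{input})$. This loosely mirrors the idea behind TensorFlow's `tf.RegisterGradient`, but `GRADIENTS`, `register_gradient`, `Node`, and the op names here are hypothetical, not TensorFlow APIs. The chain used, $y=\sin x$ and $z=y^2$, is also an illustrative assumption.

```python
import math

GRADIENTS = {}  # hypothetical registry: op name -> gradient function


def register_gradient(op_name):
    """Decorator that records fn as the gradient function for op_name."""
    def decorator(fn):
        GRADIENTS[op_name] = fn
        return fn
    return decorator


class Node:
    """A value in a single-input chain, remembering the op and input that produced it."""
    def __init__(self, op, value, inp=None):
        self.op, self.value, self.inp = op, value, inp


@register_gradient("Square")
def _square_grad(node, grad):
    return grad * 2 * node.inp.value        # grad L(y) = grad L(z) * dz/dy, dz/dy = 2y


@register_gradient("Sin")
def _sin_grad(node, grad):
    return grad * math.cos(node.inp.value)  # grad L(x) = grad L(y) * dy/dx, dy/dx = cos(x)


def square(n): return Node("Square", n.value ** 2, n)
def sin(n): return Node("Sin", math.sin(n.value), n)


def backprop(output):
    """Return {node: grad L(node)}, starting from grad L(z) = dL/dz = 1."""
    grads = {output: 1.0}
    node = output
    while node.inp is not None:
        grads[node.inp] = GRADIENTS[node.op](node, grads[node])
        node = node.inp
    return grads


x = Node("Input", 0.5)
y = sin(x)
z = square(y)        # L = z = sin(x)**2
grads = backprop(z)  # grads[x] = 2*sin(x)*cos(x) = sin(2x)
```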



## Gradients of Complex Functions

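• For a complex function $f:\mathbb{C}\rightarrow\mathbb{C}$, $z=f(w)$, with $z=z_{real}+z_{imag}\,i$ and $w=w_{real}+w_{imag}\,i$, we define two gradients: the real gradient $grad\_real\space f(w)=\frac{\partial z_{real}}{\partial w_{real}}+ i\,\frac{\partial z_{real}}{\partial w_{imag}}$ and the imaginary gradient $grad\_imag\space f(w)=\frac{\partial z_{imag}}{\partial w_{real}}+ i\,\frac{\partial z_{imag}}{\partial w_{imag}}$. As in the previous section, consider the chain $x \rightarrow y \rightarrow z$ with output $L=z$.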
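The two definitions can be evaluated numerically by perturbing $w$ along the real axis and along the imaginary axis. A minimal Python sketch, assuming the illustrative choice $f(w)=w^2$ at $w=1+2i$ and a central-difference step of $10^{-6}$ (neither comes from the article). For this holomorphic $f$, the results agree with $\mathrm{conj}(f'(w))$ and $i\cdot\mathrm{conj}(f'(w))$.

```python
def partials(f, w, h=1e-6):
    """Return (dz/dw_real, dz/dw_imag) as complex numbers via central differences."""
    d_re = (f(w + h) - f(w - h)) / (2 * h)             # perturb w_real
    d_im = (f(w + 1j * h) - f(w - 1j * h)) / (2 * h)   # perturb w_imag
    return d_re, d_im


def grad_real(f, w):
    """dz_real/dw_real + i * dz_real/dw_imag."""
    d_re, d_im = partials(f, w)
    return d_re.real + 1j * d_im.real


def grad_imag(f, w):
    """dz_imag/dw_real + i * dz_imag/dw_imag."""
    d_re, d_im = partials(f, w)
    return d_re.imag + 1j * d_im.imag


def f(w):
    return w * w  # illustrative holomorphic function, f'(w) = 2w


w = 1 + 2j
print(grad_real(f, w))  # close to (2-4j) = conj(2w)
print(grad_imag(f, w))  # close to (4+2j) = i * conj(2w)
```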

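The backward pass starts at the output:

$grad\_real\space L(z) = \frac{\partial z_{real}}{\partial z_{real}} + i\,\frac{\partial z_{real}}{\partial z_{imag}} = 1$

$grad\_imag\space L(z) = \frac{\partial z_{imag}}{\partial z_{real}} + i\,\frac{\partial z_{imag}}{\partial z_{imag}} = i$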

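$grad\_real\space L(y) = \frac{\partial z_{real}}{\partial y_{real}} + i\,\frac{\partial z_{real}}{\partial y_{imag}} = grad\_real\space L(z)\cdot\left(\frac{\partial z_{real}}{\partial y_{real}} + i\,\frac{\partial z_{real}}{\partial y_{imag}}\right)$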

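For an analytic (holomorphic) function $z=z(y)$, the Cauchy–Riemann equations hold:

$\frac{\partial z_{real}}{\partial y_{real}} = \frac{\partial z_{imag}}{\partial y_{imag}}, \qquad \frac{\partial z_{real}}{\partial y_{imag}} = -\frac{\partial z_{imag}}{\partial y_{real}}$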

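Its complex derivative can be taken along the real axis, and the Cauchy–Riemann equations then give three equivalent forms:

$$\begin{aligned}
\frac{dz}{dy} &= \frac{\partial z_{real}}{\partial y_{real}} + i\,\frac{\partial z_{imag}}{\partial y_{real}} \\
&= \frac{\partial z_{real}}{\partial y_{real}} - i\,\frac{\partial z_{real}}{\partial y_{imag}} \\
&= \frac{\partial z_{imag}}{\partial y_{imag}} + i\,\frac{\partial z_{imag}}{\partial y_{real}} \\
&= \frac{\partial z_{imag}}{\partial y_{imag}} - i\,\frac{\partial z_{real}}{\partial y_{imag}}
\end{aligned}$$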
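A quick numerical check of the Cauchy–Riemann equations and of the four forms of $\frac{dz}{dy}$, as a Python sketch. The test functions are illustrative assumptions: $e^y$ is holomorphic, while $y \mapsto \overline{y}$ is not and should fail the check. The point $y$, the step size, and the tolerance are also arbitrary choices.

```python
import cmath


def real_partials(f, y, h=1e-6):
    """Return (dz_real/dy_real, dz_real/dy_imag, dz_imag/dy_real, dz_imag/dy_imag)."""
    d_re = (f(y + h) - f(y - h)) / (2 * h)             # dz/dy_real
    d_im = (f(y + 1j * h) - f(y - 1j * h)) / (2 * h)   # dz/dy_imag
    return d_re.real, d_im.real, d_re.imag, d_im.imag


def satisfies_cauchy_riemann(f, y, tol=1e-6):
    """Check zr_yr == zi_yi and zr_yi == -zi_yr up to tol."""
    zr_yr, zr_yi, zi_yr, zi_yi = real_partials(f, y)
    return abs(zr_yr - zi_yi) < tol and abs(zr_yi + zi_yr) < tol


y = 0.3 + 0.7j
zr_yr, zr_yi, zi_yr, zi_yi = real_partials(cmath.exp, y)
forms = [
    zr_yr + 1j * zi_yr,  # dz_real/dy_real + i dz_imag/dy_real
    zr_yr - 1j * zr_yi,  # dz_real/dy_real - i dz_real/dy_imag
    zi_yi + 1j * zi_yr,  # dz_imag/dy_imag + i dz_imag/dy_real
    zi_yi - 1j * zr_yi,  # dz_imag/dy_imag - i dz_real/dy_imag
]
print(satisfies_cauchy_riemann(cmath.exp, y))                # True
print(satisfies_cauchy_riemann(lambda v: v.conjugate(), y))  # False
print(forms)  # all four close to cmath.exp(y), since d/dy exp(y) = exp(y)
```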

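Therefore, using the second form of $\frac{dz}{dy}$:

$$\begin{aligned}
grad\_real\space L(y) &= grad\_real\space L(z)\cdot\left(\frac{\partial z_{real}}{\partial y_{real}} + i\,\frac{\partial z_{real}}{\partial y_{imag}}\right) \\
&= grad\_real\space L(z)\cdot\mathrm{conj}\left(\frac{\partial z_{real}}{\partial y_{real}} - i\,\frac{\partial z_{real}}{\partial y_{imag}}\right) \\
&= grad\_real\space L(z)\cdot\mathrm{conj}\left(\frac{dz}{dy}\right)
\end{aligned}$$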

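Similarly, using $grad\_imag\space L(z)=i$ and the third form of $\frac{dz}{dy}$:

$$\begin{aligned}
grad\_imag\space L(y) &= \frac{\partial z_{imag}}{\partial y_{real}} + i\,\frac{\partial z_{imag}}{\partial y_{imag}} \\
&= grad\_imag\space L(z)\cdot\left(\frac{\partial z_{imag}}{\partial y_{imag}} - i\,\frac{\partial z_{imag}}{\partial y_{real}}\right) \\
&= grad\_imag\space L(z)\cdot\mathrm{conj}\left(\frac{\partial z_{imag}}{\partial y_{imag}} + i\,\frac{\partial z_{imag}}{\partial y_{real}}\right) \\
&= grad\_imag\space L(z)\cdot\mathrm{conj}\left(\frac{dz}{dy}\right)
\end{aligned}$$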

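Going one step further back to $x$, and assuming $y=y(x)$ is also holomorphic so that $\frac{dy}{dx}$ has the same equivalent forms, the chain rule for real partial derivatives gives:

$$\begin{aligned}
grad\_real\space L(x) &= \frac{\partial z_{real}}{\partial x_{real}} + i\,\frac{\partial z_{real}}{\partial x_{imag}} \\
&= \left(\frac{\partial z_{real}}{\partial y_{real}}\frac{\partial y_{real}}{\partial x_{real}} + \frac{\partial z_{real}}{\partial y_{imag}}\frac{\partial y_{imag}}{\partial x_{real}}\right) + i\left(\frac{\partial z_{real}}{\partial y_{real}}\frac{\partial y_{real}}{\partial x_{imag}} + \frac{\partial z_{real}}{\partial y_{imag}}\frac{\partial y_{imag}}{\partial x_{imag}}\right) \\
&= \frac{\partial z_{real}}{\partial y_{real}}\left(\frac{\partial y_{real}}{\partial x_{real}} + i\,\frac{\partial y_{real}}{\partial x_{imag}}\right) + \frac{\partial z_{real}}{\partial y_{imag}}\left(\frac{\partial y_{imag}}{\partial x_{real}} + i\,\frac{\partial y_{imag}}{\partial x_{imag}}\right) \\
&= \frac{\partial z_{real}}{\partial y_{real}}\left(\frac{\partial y_{real}}{\partial x_{real}} + i\,\frac{\partial y_{real}}{\partial x_{imag}}\right) + i\,\frac{\partial z_{real}}{\partial y_{imag}}\left(\frac{\partial y_{imag}}{\partial x_{imag}} - i\,\frac{\partial y_{imag}}{\partial x_{real}}\right) \\
&= \frac{\partial z_{real}}{\partial y_{real}}\,\mathrm{conj}\left(\frac{\partial y_{real}}{\partial x_{real}} - i\,\frac{\partial y_{real}}{\partial x_{imag}}\right) + i\,\frac{\partial z_{real}}{\partial y_{imag}}\,\mathrm{conj}\left(\frac{\partial y_{imag}}{\partial x_{imag}} + i\,\frac{\partial y_{imag}}{\partial x_{real}}\right) \\
&= \frac{\partial z_{real}}{\partial y_{real}}\,\mathrm{conj}\left(\frac{dy}{dx}\right) + i\,\frac{\partial z_{real}}{\partial y_{imag}}\,\mathrm{conj}\left(\frac{dy}{dx}\right) \\
&= \left(\frac{\partial z_{real}}{\partial y_{real}} + i\,\frac{\partial z_{real}}{\partial y_{imag}}\right)\mathrm{conj}\left(\frac{dy}{dx}\right) \\
&= grad\_real\space L(y)\cdot\mathrm{conj}\left(\frac{dy}{dx}\right)
\end{aligned}$$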

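Likewise, for the imaginary gradient:

$$\begin{aligned}
grad\_imag\space L(x) &= \frac{\partial z_{imag}}{\partial x_{real}} + i\,\frac{\partial z_{imag}}{\partial x_{imag}} \\
&= \left(\frac{\partial z_{imag}}{\partial y_{real}}\frac{\partial y_{real}}{\partial x_{real}} + \frac{\partial z_{imag}}{\partial y_{imag}}\frac{\partial y_{imag}}{\partial x_{real}}\right) + i\left(\frac{\partial z_{imag}}{\partial y_{real}}\frac{\partial y_{real}}{\partial x_{imag}} + \frac{\partial z_{imag}}{\partial y_{imag}}\frac{\partial y_{imag}}{\partial x_{imag}}\right) \\
&= \frac{\partial z_{imag}}{\partial y_{real}}\left(\frac{\partial y_{real}}{\partial x_{real}} + i\,\frac{\partial y_{real}}{\partial x_{imag}}\right) + \frac{\partial z_{imag}}{\partial y_{imag}}\left(\frac{\partial y_{imag}}{\partial x_{real}} + i\,\frac{\partial y_{imag}}{\partial x_{imag}}\right) \\
&= \frac{\partial z_{imag}}{\partial y_{real}}\left(\frac{\partial y_{real}}{\partial x_{real}} + i\,\frac{\partial y_{real}}{\partial x_{imag}}\right) + i\,\frac{\partial z_{imag}}{\partial y_{imag}}\left(\frac{\partial y_{imag}}{\partial x_{imag}} - i\,\frac{\partial y_{imag}}{\partial x_{real}}\right) \\
&= \frac{\partial z_{imag}}{\partial y_{real}}\,\mathrm{conj}\left(\frac{\partial y_{real}}{\partial x_{real}} - i\,\frac{\partial y_{real}}{\partial x_{imag}}\right) + i\,\frac{\partial z_{imag}}{\partial y_{imag}}\,\mathrm{conj}\left(\frac{\partial y_{imag}}{\partial x_{imag}} + i\,\frac{\partial y_{imag}}{\partial x_{real}}\right) \\
&= \frac{\partial z_{imag}}{\partial y_{real}}\,\mathrm{conj}\left(\frac{dy}{dx}\right) + i\,\frac{\partial z_{imag}}{\partial y_{imag}}\,\mathrm{conj}\left(\frac{dy}{dx}\right) \\
&= \left(\frac{\partial z_{imag}}{\partial y_{real}} + i\,\frac{\partial z_{imag}}{\partial y_{imag}}\right)\mathrm{conj}\left(\frac{dy}{dx}\right) \\
&= grad\_imag\space L(y)\cdot\mathrm{conj}\left(\frac{dy}{dx}\right)
\end{aligned}$$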

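In both cases, the gradient with respect to a node's input equals the gradient with respect to its output multiplied by the complex conjugate of the local derivative. Writing $Grad$ for either $grad\_real$ or $grad\_imag$, the backward rule for holomorphic functions is:

$Grad\space L(y) = Grad\space L(z)\cdot\mathrm{conj}\left(\frac{dz}{dy}\right)$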
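An end-to-end numerical check of this rule, as a Python sketch. It assumes an illustrative chain that is not from the article, $y=\sin x$ and $z=y^2$ (both holomorphic). It compares the real and imaginary gradients of $z$ with respect to $x$, estimated directly by central differences, against two backward steps of $Grad\space L(\text{input}) = Grad\space L(\text{output})\cdot\mathrm{conj}(\text{local derivative})$, started from $grad\_real\space L(z)=1$ and $grad\_imag\space L(z)=i$.

```python
import cmath


def real_and_imag_grads(f, w, h=1e-6):
    """Return (grad_real, grad_imag) of z = f(w), estimated with central differences."""
    d_re = (f(w + h) - f(w - h)) / (2 * h)             # dz/dw_real
    d_im = (f(w + 1j * h) - f(w - 1j * h)) / (2 * h)   # dz/dw_imag
    return d_re.real + 1j * d_im.real, d_re.imag + 1j * d_im.imag


def backprop_complex(x, upstream):
    """Two backward steps, each multiplying by the conjugate of the local derivative."""
    y = cmath.sin(x)
    grad_y = upstream * (2 * y).conjugate()     # dz/dy = 2y
    grad_x = grad_y * cmath.cos(x).conjugate()  # dy/dx = cos(x)
    return grad_x


def composite(w):
    return cmath.sin(w) ** 2  # z as a function of x


x = 0.4 - 1.1j
direct_real, direct_imag = real_and_imag_grads(composite, x)
bp_real = backprop_complex(x, 1)   # start from grad_real L(z) = 1
bp_imag = backprop_complex(x, 1j)  # start from grad_imag L(z) = i
print(abs(direct_real - bp_real), abs(direct_imag - bp_imag))  # both tiny
```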