Deep learning 2023/07/08~2023/07/10
Backpropagation
- By the chain rule:
$$\frac{\partial C}{\partial w}=\frac{\partial z}{\partial w}\frac{\partial C}{\partial z}$$
- Forward pass:
$$\frac{\partial z}{\partial w} \tag{1}$$
Compute (1) for all parameters:
$$\frac{\partial z}{\partial w_1}=? \tag{1a}$$
$$\frac{\partial z}{\partial w_2}=? \tag{1b}$$
The answer is simply the value of the input connected to that weight.
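A one-line worked example, assuming a neuron with pre-activation $z = w_1x_1 + w_2x_2 + b$:
$$z = w_1x_1+w_2x_2+b \quad\Rightarrow\quad \frac{\partial z}{\partial w_1}=x_1,\qquad \frac{\partial z}{\partial w_2}=x_2$$
so the forward pass only needs to remember each weight's input.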
- Backward pass:
$$\frac{\partial C}{\partial z} \tag{2}$$
Compute (2) for every activation-function input z, starting from the output layer and moving backward.
$$\frac{\partial C}{\partial z}=\frac{\partial a}{\partial z}\frac{\partial C}{\partial a} \tag{2a}$$
$$\frac{\partial C}{\partial a}=\frac{\partial z'}{\partial a}\frac{\partial C}{\partial z'} + \frac{\partial z''}{\partial a}\frac{\partial C}{\partial z''} \tag{2b}$$
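Concretely, suppose the neuron's output $a=\sigma(z)$ feeds two neurons in the next layer through weights $w_3$ and $w_4$ (illustrative names), so $z'=w_3a+\cdots$ and $z''=w_4a+\cdots$. Then $\partial z'/\partial a=w_3$ and $\partial z''/\partial a=w_4$, and combining (2a) with (2b):
$$\frac{\partial C}{\partial z}=\sigma'(z)\left[w_3\frac{\partial C}{\partial z'}+w_4\frac{\partial C}{\partial z''}\right]$$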
- Backpropagation Summary
$$\text{Forward pass} \Rightarrow \frac{\partial z}{\partial w} = a$$
$$\text{Backward pass} \Rightarrow \frac{\partial C}{\partial z}$$
$$\text{Forward pass} \times \text{Backward pass} \Rightarrow \frac{\partial z}{\partial w}\frac{\partial C}{\partial z} = \frac{\partial C}{\partial w}$$
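A minimal numpy sketch of the two passes on a toy network (one sigmoid hidden neuron feeding a linear output, squared-error cost); all names and values are illustrative, not from the notes:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data and parameters (illustrative values)
x, y_hat = np.array([1.0, -2.0]), 1.0
w, b = np.array([0.5, -0.3]), 0.1      # hidden neuron weights/bias
v, c = 0.8, 0.0                        # output neuron weight/bias

# Forward pass: compute and cache each weight's input (dz/dw = a)
z = w @ x + b          # hidden pre-activation; dz/dw_i = x_i
a = sigmoid(z)         # hidden activation
z_out = v * a + c      # output pre-activation; dz_out/dv = a
y = z_out
C = 0.5 * (y - y_hat) ** 2

# Backward pass: compute dC/dz starting from the output layer
dC_dzout = y - y_hat                   # dC/dz_out
dC_da = v * dC_dzout                   # eq. (2b): sum over next-layer terms
dC_dz = a * (1 - a) * dC_da            # eq. (2a): da/dz = sigma'(z)

# Forward * Backward: dC/dw = (dz/dw) * (dC/dz)
dC_dw = x * dC_dz
dC_dv = a * dC_dzout
print(dC_dw, dC_dv)
```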
Regression
- Stock Market Forecast
- Self-driving Car
- Recommendation
- Estimating the Combat Power (CP) of a Pokémon after evolution
Estimating the Combat Power (CP) of a Pokémon after evolution
- Step 1: Model
- Step 2: Goodness of Function
- Loss function L:
- Input: a function; output: how bad it is
$$L(f)=L(w,b)$$
$$y=b+\sum_i w_i x_i$$
$$L(w,b)=\sum_n \left(\hat{y}^n-\left(b+\sum_i w_i x_i^n\right)\right)^2$$
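A short numpy sketch of this loss, assuming (as is standard for this example) a squared-error term per training pair; the data below are synthetic placeholders:

```python
import numpy as np

def loss(w, b, X, y_hat):
    """L(w, b) = sum over n of (y_hat^n - (b + w . x^n))^2."""
    y_pred = X @ w + b            # model: y = b + sum_i w_i * x_i
    return np.sum((y_hat - y_pred) ** 2)

X = np.random.rand(10, 1) * 100   # 10 examples, 1 feature (CP before evolution)
y_hat = 2.5 * X[:, 0] + 30        # synthetic targets, illustrative only
print(loss(np.array([2.0]), 10.0, X, y_hat))
```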
- Step 3: Best Function & Gradient Descent
- Best Function
$$f^*=\arg\min_f L(f)$$
$$w^*,b^*=\arg\min_{w,b}L(w,b)$$
- Gradient Descent
- Consider a loss function L(w) with one parameter w:
$$w^*=\arg\min_w L(w)$$
- Pick an initial value $w^0$
- Compute (1) and update:
$$\left.\frac{dL}{dw}\right|_{w=w^0} \tag{1}$$
$$w^1 \leftarrow w^0-\eta\left.\frac{dL}{dw}\right|_{w=w^0}$$
- η is called the “learning rate”
- Compute (2) and take the next step:
$$\left.\frac{dL}{dw}\right|_{w=w^1} \tag{2}$$
$$w^2 \leftarrow w^1-\eta\left.\frac{dL}{dw}\right|_{w=w^1}$$
- What about two parameters?
$$w^*,b^*=\arg\min_{w,b}L(w,b)$$
Compute the partial derivatives with respect to both w and b, and update the two parameters together.
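A minimal gradient-descent sketch for the one-feature case of $L(w,b)$; the learning rate and step count are arbitrary choices, not values from the notes:

```python
import numpy as np

def grad_descent(X, y_hat, eta=1e-5, steps=10_000):
    """Gradient descent on L(w, b) = sum_n (y_hat^n - (b + w * x^n))^2."""
    w, b = 0.0, 0.0
    x = X[:, 0]
    for _ in range(steps):
        err = y_hat - (b + w * x)        # residuals
        grad_w = -2.0 * np.sum(err * x)  # dL/dw
        grad_b = -2.0 * np.sum(err)      # dL/db
        w -= eta * grad_w                # w^{t+1} <- w^t - eta * dL/dw
        b -= eta * grad_b
    return w, b

X = np.random.rand(10, 1) * 100
y_hat = 2.5 * X[:, 0] + 30
print(grad_descent(X, y_hat))
```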
- Worry
- In linear regression the loss function L is convex, so there are no local optima.
- A more complex model does not always lead to better performance on testing data.
- This is overfitting.
How to do Classification
- Training data for Classification
- Classification as Regression?
- Take binary classification as an example.
- Training: Class 1 means the target is 1; Class 2 means the target is -1.
- Testing: output closer to 1 → class 1; closer to -1 → class 2 (see the sketch below).
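A small numpy sketch of this regression-based classifier on synthetic 2-D data (everything here is illustrative). One known drawback: squared error also penalizes examples that are "too correct" (outputs far beyond ±1), which motivates the ideal alternatives below.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 2-D data: class 1 -> target +1, class 2 -> target -1
X1 = rng.normal([2, 2], 1, (50, 2))
X2 = rng.normal([-2, -2], 1, (50, 2))
X = np.vstack([X1, X2])
t = np.concatenate([np.ones(50), -np.ones(50)])

# Least-squares fit of g(x) = w . x + b to the +/-1 targets
A = np.hstack([X, np.ones((100, 1))])
w_b, *_ = np.linalg.lstsq(A, t, rcond=None)

# Testing rule: output closer to +1 -> class 1, closer to -1 -> class 2
pred = np.where(A @ w_b > 0, 1, -1)
print("training accuracy:", np.mean(pred == t))
```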
- Ideal Alternatives
- Function (Model):
$$x \Rightarrow f(x)=\begin{cases} g(x)>0 & \text{Output = class 1} \\ \text{else} & \text{Output = class 2} \end{cases}$$
- Loss function:
$$L(f)=\sum_n \delta\left(f(x^n) \neq \hat{y}^n\right)$$
The number of times f gets incorrect results on the training data.
- Find the best function:
- Example: Perceptron, SVM
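A sketch of this 0-1 loss as code; because it is a count, it is not differentiable, which is why gradient descent cannot minimize it directly and methods like the Perceptron and SVM work with surrogates. The classifier `f` below is a hypothetical stand-in:

```python
import numpy as np

def zero_one_loss(f, X, y_hat):
    """L(f) = number of training examples that f classifies incorrectly."""
    return int(np.sum(f(X) != y_hat))

# Illustrative use: a trivial classifier that predicts class 1 when x_1 > 0
X = np.array([[1.0, 2.0], [-1.0, 0.5], [3.0, -1.0]])
y_hat = np.array([1, -1, -1])
f = lambda X: np.where(X[:, 0] > 0, 1, -1)
print(zero_one_loss(f, X, y_hat))   # -> 1 (the third example is wrong)
```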
- Gaussian Distribution
$$f_{\mu,\Sigma}(x)=\frac{1}{(2\pi)^{D/2}}\frac{1}{|\Sigma|^{1/2}}\exp\left\{-\frac{1}{2}(x-\mu)^T \Sigma^{-1}(x-\mu)\right\}$$
- Input: vector x; output: the probability density of sampling x
- The shape of the function is determined by the mean μ and the covariance matrix Σ
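A direct numpy transcription of the density above (D is inferred from x; the values are illustrative):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """f_{mu,Sigma}(x): multivariate Gaussian density evaluated at vector x."""
    D = x.shape[0]
    diff = x - mu
    norm = 1.0 / ((2 * np.pi) ** (D / 2) * np.linalg.det(sigma) ** 0.5)
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff)

x = np.array([0.5, -0.5])
mu = np.array([0.0, 0.0])
sigma = np.eye(2)
print(gaussian_pdf(x, mu, sigma))
```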
- Maximum Likelihood
$$f_{\mu,\Sigma}(x)=\frac{1}{(2\pi)^{D/2}}\frac{1}{|\Sigma|^{1/2}}\exp\left\{-\frac{1}{2}(x-\mu)^T \Sigma^{-1}(x-\mu)\right\}$$
A Gaussian with any mean μ and covariance matrix Σ could have generated these points; maximum likelihood chooses the μ and Σ under which the observed data are most probable.
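Spelling this out for N training points $x^1,\dots,x^N$ of one class, the likelihood and its standard closed-form maximizers are:
$$L(\mu,\Sigma)=\prod_{n=1}^{N}f_{\mu,\Sigma}(x^n),\qquad \mu^*=\frac{1}{N}\sum_{n=1}^{N}x^n,\qquad \Sigma^*=\frac{1}{N}\sum_{n=1}^{N}(x^n-\mu^*)(x^n-\mu^*)^T$$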
$$x \Rightarrow P(C_1|x)=\frac{P(x|C_1)P(C_1)}{P(x|C_1)P(C_1)+P(x|C_2)P(C_2)}$$
- Three Steps
- Function Set (Model):
$$x \Rightarrow \begin{cases} P(C_1|x)>0.5 & \text{output: class 1} \\ \text{otherwise} & \text{output: class 2} \end{cases}$$
- Goodness of a function
- The mean μ and covariance Σ that maximize the likelihood (the probability of generating the data)
- Find the best function: easy
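Putting the three steps together as a numpy sketch (per-class ML estimates, then Bayes' rule; scipy's `multivariate_normal` evaluates the Gaussian density, and the data are synthetic):

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_class(X):
    """ML estimates for one class: sample mean and (biased) sample covariance."""
    mu = X.mean(axis=0)
    diff = X - mu
    return mu, diff.T @ diff / len(X)

def posterior_c1(x, X1, X2):
    """P(C1|x) via Bayes' rule with class-conditional Gaussians."""
    (mu1, s1), (mu2, s2) = fit_class(X1), fit_class(X2)
    n1, n2 = len(X1), len(X2)       # priors N1/(N1+N2), N2/(N1+N2); shared factor cancels
    l1 = multivariate_normal.pdf(x, mean=mu1, cov=s1)
    l2 = multivariate_normal.pdf(x, mean=mu2, cov=s2)
    return l1 * n1 / (l1 * n1 + l2 * n2)

rng = np.random.default_rng(1)
X1 = rng.normal([2, 2], 1, (80, 2))     # class 1 samples (illustrative)
X2 = rng.normal([-1, 0], 1, (40, 2))    # class 2 samples
print(posterior_c1(np.array([1.5, 1.0]), X1, X2))  # > 0.5 -> class 1
```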
- Posterior Probability
$$P(C_1|x)=\sigma(z) \qquad \text{(sigmoid)} \qquad z=\ln\frac{P(x|C_1)P(C_1)}{P(x|C_2)P(C_2)}$$
$$z=\ln\frac{P(x|C_1)}{P(x|C_2)}+\ln\frac{P(C_1)}{P(C_2)}, \qquad \frac{P(C_1)}{P(C_2)}=\frac{\frac{N_1}{N_1+N_2}}{\frac{N_2}{N_1+N_2}}=\frac{N_1}{N_2}$$
$$P(x|C_1)=\frac{1}{(2\pi)^{D/2}}\frac{1}{|\Sigma^1|^{1/2}}\exp\left\{-\frac{1}{2}(x-\mu^1)^T (\Sigma^{1})^{-1}(x-\mu^1)\right\} \tag{3a}$$
$$P(x|C_2)=\frac{1}{(2\pi)^{D/2}}\frac{1}{|\Sigma^2|^{1/2}}\exp\left\{-\frac{1}{2}(x-\mu^2)^T (\Sigma^{2})^{-1}(x-\mu^2)\right\} \tag{3b}$$
- Substituting (3a) and (3b) into the first term of z:
$$\ln\frac{\frac{1}{(2\pi)^{D/2}}\frac{1}{|\Sigma^1|^{1/2}}\exp\left\{-\frac{1}{2}(x-\mu^1)^T (\Sigma^{1})^{-1}(x-\mu^1)\right\}}{\frac{1}{(2\pi)^{D/2}}\frac{1}{|\Sigma^2|^{1/2}}\exp\left\{-\frac{1}{2}(x-\mu^2)^T (\Sigma^{2})^{-1}(x-\mu^2)\right\}}$$
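Expanding the logarithm (the next step of the derivation), the $(2\pi)^{D/2}$ factors cancel and:
$$\ln\frac{P(x|C_1)}{P(x|C_2)}=\ln\frac{|\Sigma^2|^{1/2}}{|\Sigma^1|^{1/2}}-\frac{1}{2}(x-\mu^1)^T(\Sigma^1)^{-1}(x-\mu^1)+\frac{1}{2}(x-\mu^2)^T(\Sigma^2)^{-1}(x-\mu^2)$$
If the two classes share a covariance matrix ($\Sigma^1=\Sigma^2=\Sigma$), the quadratic terms in x cancel, z becomes linear in x, and the decision boundary is therefore linear.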