Machine Learning (Hung-yi Lee, 李宏毅)
Machine Learning and Deep Learning
1. Functions
- Regression: e.g., predicting PM2.5 values
- Classification: e.g., choosing a move in chess
- Others: structured learning (the output is a structured object)
2. The procedure for finding the function
- Functions with unknown parameters
- Define loss from training data
- Optimization
- gradient descent
A) randomly set an initial value w
A) randomly set an initial value $w$ -- wait, step A is on the previous line; B) compute the gradient $\partial L/\partial w$
C) update w iteratively
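The three steps above can be sketched on a toy 1-D loss; the loss function $L(w) = (w-3)^2$, learning rate, and iteration count here are illustrative assumptions, not from the lecture.

```python
# Toy loss L(w) = (w - 3)^2 (assumed for illustration); its minimum is at w = 3.
def dL_dw(w):
    # Analytic derivative of L(w) = (w - 3)^2.
    return 2 * (w - 3)

w = 0.0          # A) set an initial value w
eta = 0.1        # learning rate (hyperparameter)
for _ in range(100):
    g = dL_dw(w)       # B) compute dL/dw
    w = w - eta * g    # C) update w iteratively

print(round(w, 4))  # converges toward the minimizer w = 3
```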
3. Models
- linear model
- sophisticated model
- linear curves:
any piecewise linear curve = constant + a sum of hard-sigmoid functions
activation function:
1. hard sigmoid: can be represented by a sum of two ReLUs
2. rectified linear unit (ReLU): $\max(0, wx+b)$
3. soft sigmoid: $\cfrac{c}{1+e^{-(wx+b)}} = c \cdot \mathrm{sigmoid}(wx+b)$
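A quick numeric check that a hard sigmoid can be built from two ReLUs (the second entering with a negative coefficient); the ramp placement between 0 and 1 is an assumed choice for illustration.

```python
def relu(x):
    return max(0.0, x)

def hard_sigmoid(x):
    # 0 for x <= 0, linear ramp x on (0, 1), saturates at 1 for x >= 1;
    # built as relu(x) - relu(x - 1), i.e., a weighted sum of two ReLUs.
    return relu(x) - relu(x - 1)

print([hard_sigmoid(x) for x in (-1.0, 0.5, 2.0)])  # [0.0, 0.5, 1.0]
```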
- Beyond piecewise curves
approximate a continuous curve by a piecewise linear curve;
to get a good approximation, we need sufficiently many pieces
- New model: More Features
$y = b + \sum_{i}{c_i \cdot \mathrm{sigmoid}\left(\sum_{j}w_{ij}x_j+b_i\right)}$
$r_i = W_i x + b_i,\quad a_i = \mathrm{sigmoid}(r_i)$
$y = b + CA$
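The forward pass of this new model can be sketched in a few lines of NumPy; the shapes (3 input features, 4 sigmoid units) and random weights are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))   # w_ij: weights into each sigmoid unit
b_vec = rng.normal(size=4)    # b_i: per-unit biases
C = rng.normal(size=4)        # c_i: output weights
b = 0.5                       # scalar output bias

def forward(x):
    r = W @ x + b_vec              # r_i = W_i x + b_i
    a = 1.0 / (1.0 + np.exp(-r))   # a_i = sigmoid(r_i)
    return b + C @ a               # y = b + CA

x = np.ones(3)
print(float(forward(x)))
```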
optimization of new model:
$\varTheta = [W\ B\ C]$ (all unknown parameters collected into one vector)
$gradient = \begin{bmatrix} \cfrac{\partial L}{\partial\varTheta_1} \\ \cfrac{\partial L}{\partial\varTheta_2} \\ \vdots \\ \cfrac{\partial L}{\partial\varTheta_n} \end{bmatrix}$
$g = \nabla{L(\varTheta^0)}$
$\begin{bmatrix}\varTheta_1^1 \\ \varTheta_2^1 \\ \vdots \\ \varTheta_n^1 \end{bmatrix}=\begin{bmatrix} \varTheta_1^0 \\ \varTheta_2^0 \\ \vdots \\ \varTheta_n^0 \end{bmatrix} - \eta \cdot g$
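The vector update rule can be sketched with a finite-difference estimate of the gradient over the whole parameter vector $\varTheta$; the toy quadratic loss and its minimizer are assumptions for illustration.

```python
import numpy as np

def loss(theta):
    # Toy loss with assumed minimum at theta = [1, 2, 3].
    return float(np.sum((theta - np.array([1.0, 2.0, 3.0])) ** 2))

def numerical_gradient(theta, eps=1e-6):
    # Central finite differences: one partial derivative per component.
    g = np.zeros_like(theta)
    for i in range(theta.size):
        d = np.zeros_like(theta)
        d[i] = eps
        g[i] = (loss(theta + d) - loss(theta - d)) / (2 * eps)
    return g

theta = np.zeros(3)   # theta^0
eta = 0.1
for _ in range(200):
    theta = theta - eta * numerical_gradient(theta)  # theta^{t+1} = theta^t - eta * g

print(np.round(theta, 3))  # approaches [1, 2, 3]
```

In practice the gradient is computed with backpropagation rather than finite differences, but the update rule is the same.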
- epoch | batch | update | iteration
number of samples: 1000
batch size: 10
updates (iterations) per epoch: 1000 / 10 = 100
epochs: 1
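The batching arithmetic above is a one-liner: 1000 samples split into batches of 10 give 100 parameter updates in one epoch.

```python
num_samples = 1000
batch_size = 10
# One update (iteration) per batch, so one epoch = num_samples / batch_size updates.
updates_per_epoch = num_samples // batch_size
print(updates_per_epoch)  # 100
```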
video link: https://speech.ee.ntu.edu.tw/~hylee/ml/2021-spring.html