ML Basic Concepts
Note: if the formulas don't render properly, copy the text into Typora to view them; the main thing is to follow the text ~~~
-
ML = looking for a function (f)
-
Different types of functions
-
Regression: f outputs a scalar
-
Classification: given classes, f outputs the correct one
-
Structured Learning: create something with structure (image, document)
-
-
How to find such an f? => training
-
step1: f with unknown parameters
-
step2: define loss(L) from training data
-
loss is a function of the parameters
-
loss measures how good a set of parameter values is
eg:
L=\frac{1}{N}\sum_{n}e_n\\MAE:e=|y-\hat y|\\MSE:e=(y-\hat y)^2
MAE: L is mean absolute error
MSE: L is mean square error
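A quick sketch of computing these two losses, assuming numpy and made-up values for the predictions y and labels ŷ (names are only illustrative):

```python
import numpy as np

def mae(y, y_hat):
    # mean absolute error: average of |y - y_hat|
    return np.mean(np.abs(y - y_hat))

def mse(y, y_hat):
    # mean squared error: average of (y - y_hat)^2
    return np.mean((y - y_hat) ** 2)

y = np.array([1.0, 2.0, 3.0])        # model outputs
y_hat = np.array([1.5, 2.0, 2.0])    # labels
print(mae(y, y_hat), mse(y, y_hat))  # 0.5  0.4166...
```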
-
optimization: w^*,b^*=\arg\min_{w,b}L
method: Gradient Descent
-
randomly pick an initial value w_0
-
compute \frac{\partial L}{\partial w}|_{w=w_0}
if negative => increase w
elif positive => decrease w
so w_0 \to w_1
how big should the increment be?
\textcolor{red}{\eta}\cdot\frac{\partial L}{\partial w}|_{w=w_0} (\eta is the learning rate)
\eta: a parameter that you have to set yourself => hyperparameter
in conclusion, w_1\leftarrow w_0-\eta\frac{\partial L}{\partial w}|_{w=w_0}
-
update w iteratively
Hence gradient descent has a potential issue with local minima (though in practice this doesn't actually cause a problem)
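A minimal sketch of the 1-D update rule above, assuming a toy loss L(w) = (w - 3)^2 whose derivative is known; the loss and the numbers are only for illustration:

```python
# gradient descent on a toy loss L(w) = (w - 3)^2
def dL_dw(w):
    return 2 * (w - 3)          # derivative of the toy loss

eta = 0.1                       # learning rate (hyperparameter)
w = 0.0                         # arbitrarily picked w_0
for step in range(100):         # update w iteratively
    w = w - eta * dL_dw(w)      # w_{t+1} <- w_t - eta * dL/dw
print(w)                        # converges towards 3
```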
-
With multiple parameters the procedure is the same as with a single parameter: each parameter is updated with its own partial derivative.
-
prediction (then adjust the model based on the prediction results, and repeat...)
The example above is based on a linear model.
But a linear model has inherent limitations (model bias).
solution: sum up a set of piecewise linear functions
You can modify the parameters (c, b, w) of each function to adjust its shape.
So the new model can also take more features as input.
$$
y=b+\sum_i c_i\,\mathrm{sigmoid}(b_i+w_i x_1)\\
y=b+\sum_i c_i\,\mathrm{sigmoid}(b_i+\textcolor{green}{\sum_j w_{ij}x_j})
$$
i: index over the sigmoid functions; j: index over the features; sigmoid() = \sigma()
so this time the loss is L(\theta), where \theta collects all the unknown parameters (w, b, c)
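A rough sketch of this model's forward pass, assuming numpy and arbitrary shapes (3 sigmoids, 2 features); all names here are illustrative, and θ would collect b, c, the b_i, and the w_ij:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def model(x, b, c, b_vec, W):
    # x: (num_features,), W: (num_sigmoids, num_features)
    # y = b + sum_i c_i * sigmoid(b_i + sum_j W_ij * x_j)
    return b + c @ sigmoid(b_vec + W @ x)

x = np.array([1.0, 2.0])      # 2 features
W = np.random.randn(3, 2)     # 3 sigmoids x 2 features
b_vec = np.random.randn(3)
c = np.random.randn(3)
print(model(x, 0.5, c, b_vec, W))
```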
-
-
step3: optimization
-
\vec\theta^*=\arg\min_{\vec\theta}L
-
randomly pick initial values \vec\theta_0
gradient: \vec g=\nabla L(\vec\theta_0)
-
update \vec\theta iteratively
-
$$
\vec\theta_1\leftarrow\vec\theta_0-\eta\vec g\\
\vec\theta_2\leftarrow\vec\theta_1-\eta\vec g\\
\cdots
$$
-
if N = 10000 and batch size = 10, how many updates in 1 epoch?
answer: 10000 / 10 = 1000 updates (one update per batch)
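A sketch of where that number comes from: every batch produces one parameter update, so one epoch (one pass over all batches) gives N / batch_size updates; the numbers are taken from the question above:

```python
N, B = 10000, 10                 # dataset size, batch size
print(N // B)                    # 1000 updates per epoch

# the training-loop shape that produces those updates
updates = 0
for epoch in range(1):           # 1 epoch = one pass over all batches
    for start in range(0, N, B): # each batch -> one parameter update
        updates += 1
print(updates)                   # 1000
```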
-
-
sigmoid \to ReLU (Rectified Linear Unit): c\max(0,\,b+wx_1)
$$
\mathrm{sigmoid}: y=b+\sum_i c_i\,\mathrm{sigmoid}(b_i+\sum_j w_{ij}x_j)\\
\mathrm{ReLU}: y=b+\sum_{\textcolor{red}{2i}} c_i\max(0,\,b_i+\sum_j w_{ij}x_j)
$$
which one is better? => ReLU
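The same forward-pass sketch with sigmoid swapped for ReLU (again assuming numpy and arbitrary shapes); the red 2i hints that it takes roughly two ReLUs to piece together one hard sigmoid, so twice as many terms are used:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)    # element-wise max(0, z)

def model_relu(x, b, c, b_vec, W):
    # y = b + sum_i c_i * max(0, b_i + sum_j W_ij * x_j)
    return b + c @ relu(b_vec + W @ x)

x = np.array([1.0, 2.0])
W = np.random.randn(6, 2)        # 6 ReLUs ~ 3 hard sigmoids
b_vec = np.random.randn(6)
c = np.random.randn(6)
print(model_relu(x, 0.5, c, b_vec, W))
```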
-
multiple hidden layers
Increasing this hyperparameter (the number of layers) can reduce the loss, but it also increases the complexity of the model.
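A rough sketch of stacking hidden layers: the output of one layer of ReLUs feeds the next (numpy, arbitrary layer sizes, all names illustrative):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def deep_forward(x, layers):
    # layers: list of (W, b) pairs; ReLU on hidden layers, linear output layer
    a = x
    for W, b in layers[:-1]:
        a = relu(W @ a + b)
    W_out, b_out = layers[-1]
    return W_out @ a + b_out

x = np.random.randn(2)                                  # 2 input features
layers = [(np.random.randn(4, 2), np.random.randn(4)),  # hidden layer 1
          (np.random.randn(4, 4), np.random.randn(4)),  # hidden layer 2
          (np.random.randn(1, 4), np.random.randn(1))]  # output layer
print(deep_forward(x, layers))
```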
-
-
deep means many hidden layers, but why do we want "deep" rather than "fat" (just putting all the neurons in one layer)??? --hhh