01 - ML Basic Concepts

This article introduces fundamental concepts in machine learning, including different types of functions (regression and classification), model building for structured learning, and the loss functions (such as MSE and MAE) used in gradient-descent optimization. It explains how to improve a model by tuning hyperparameters and by using Sigmoid and ReLU activation functions, and also discusses the relationship between depth and width in deep learning.

ML Basic Concepts

Note: if the formulas don't display properly, copy the text into Typora and open it there; otherwise just follow the prose ~~~

  • ML = Looking for function(f)

  • Different types of functions

    • Regression: f outputs a scalar

    • Classification: given classes, f outputs the correct one

    • Structured Learning: create something with structure (image, document)

  • How to find f? => training

    • step1: f with unknown parameters

    • step2: define loss(L) from training data

      • loss is a function of the parameters

      • loss measures how good a set of parameter values is

        eg:

        $$
        L=\frac{1}{N}\sum_{n}e_n\\ \text{MAE}: e=|y-\hat y|\\ \text{MSE}: e=(y-\hat y)^2
        $$

        MAE: L is mean absolute error

        MSE: L is mean square error
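
        As a quick check, here is a minimal NumPy sketch of both errors; the sample arrays are made-up values for illustration, not from the notes:

        ```python
        import numpy as np

        y_hat = np.array([1.0, 2.0, 3.0, 4.0])   # ground-truth labels \hat{y} (made-up)
        y     = np.array([1.2, 1.8, 3.5, 3.9])   # model predictions y (made-up)

        mae = np.mean(np.abs(y - y_hat))   # L with e = |y - \hat{y}|
        mse = np.mean((y - y_hat) ** 2)    # L with e = (y - \hat{y})^2
        print(mae, mse)
        ```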

      • optimization: w^*,b^*=argmin_{w,b}L

        method: Gradient Descent

        • randomly pick an initial value w_0

        • compute \frac{\partial L}{\partial w}|_{w=w_0}

          if negative => increase w

          elif positive => decrease w

          so w_0 \to w_1

          what about the increment ?

          \textcolor{red}{\eta}\cdot\frac{\partial L}{\partial w}|_{w=w_0} (\eta is the learning rate)

          \eta : a parameter that needs to be set manually => hyperparameter

          in conclusion, w_1\leftarrow w_0-\eta\frac{\partial L}{\partial w}|_{w=w_0}

        • update w iteratively

          So gradient descent suffers from the local-minimum problem (which actually won't cause a problem in practice)

        • In cases with multiple parameters, the procedure is the same as with a single parameter (see the sketch below).
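
        Below is a minimal sketch of the whole loop for a one-feature linear model y = b + wx trained with MSE loss; the data, learning rate, and iteration count are assumptions chosen for illustration:

        ```python
        import numpy as np

        # made-up training data for y = b + w*x
        x     = np.array([1.0, 2.0, 3.0, 4.0])
        y_hat = np.array([3.1, 5.0, 6.9, 9.2])

        w, b = np.random.randn(), np.random.randn()   # randomly pick initial values
        eta = 0.01                                    # learning rate (hyperparameter)

        for _ in range(1000):                         # update w, b iteratively
            y = b + w * x                             # model prediction
            grad_w = np.mean(2 * (y - y_hat) * x)     # dL/dw for L = mean((y - y_hat)^2)
            grad_b = np.mean(2 * (y - y_hat))         # dL/db
            w -= eta * grad_w                         # w1 <- w0 - eta * dL/dw
            b -= eta * grad_b
        print(w, b)
        ```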

      prediction (then adjust the model based on the prediction results, and repeat ...)

      The example above is built on a linear model.

      However, linear models have inherent limitations (model bias).

      solution: add a set of piecewise linear functions

      You can modify the parameters (c, b, w) in the function to adjust its shape.

      So the new model is more flexible and can take in more features.

      $$
      y=b+\sum_i c_i\,\mathrm{sigmoid}(b_i+w_i x_i)\\ y=b+\sum_i c_i\,\mathrm{sigmoid}\Bigl(b_i+\textcolor{green}{\sum_j w_{ij}x_j}\Bigr)
      $$

       

      i : indexes the sigmoid functions; j : indexes the features; sigmoid() = \sigma()

      so this time Loss = L(\theta), where \theta collects all the unknown parameters
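
      Here is a minimal sketch of this model with made-up shapes (3 sigmoid functions, 2 input features); \theta collects all of W, the b_i, the c_i, and b:

      ```python
      import numpy as np

      def sigmoid(z):
          return 1.0 / (1.0 + np.exp(-z))

      def model(x, W, b_vec, c, b):
          """y = b + sum_i c_i * sigmoid(b_i + sum_j W[i, j] * x[j])"""
          return b + c @ sigmoid(W @ x + b_vec)

      rng = np.random.default_rng(0)
      x     = rng.normal(size=2)        # features x_j (made-up)
      W     = rng.normal(size=(3, 2))   # weights w_ij
      b_vec = rng.normal(size=3)        # biases b_i
      c     = rng.normal(size=3)        # coefficients c_i
      b     = rng.normal()              # overall bias b
      print(model(x, W, b_vec, c, b))
      ```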

    • step3: optimization

      • \vec{\theta^*}=argmin_{\vec\theta}L

        • randomly pick initial values \vec\theta_0

        • compute the gradient \vec g=\nabla L(\vec\theta_0)

        • update \vec\theta iteratively

          • $$
            \vec\theta_1\leftarrow\vec\theta_0-\eta\vec g\\ \vec\theta_2\leftarrow\vec\theta_1-\eta\vec g\\ ......
            $$

        • if N = 10000, batch size = 10, how many updates in 1 epoch?

          answer: 1000 updates
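
          A small sketch of the counting (shuffling per epoch is an assumption; each batch gives one gradient update):

          ```python
          import numpy as np

          N, batch_size = 10000, 10
          indices = np.random.permutation(N)         # shuffle the examples each epoch
          batches = indices.reshape(-1, batch_size)  # N / batch_size batches
          print(len(batches), "updates in 1 epoch")  # -> 1000 updates in 1 epoch
          ```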

      • sigmoid \to ReLU (Rectified Linear Unit): c\cdot\max(0,b+wx_0)

        $$
        \text{sigmoid}: y=b+\sum_i c_i\,\mathrm{sigmoid}\Bigl(b_i+\sum_j w_{ij}x_j\Bigr)\\ \text{ReLU}: y=b+\sum_{\textcolor{red}{2i}} c_i\max\Bigl(0,\,b_i+\sum_j w_{ij}x_j\Bigr)
        $$

         

        which one is better? =>ReLU
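
        A minimal sketch of the ReLU version (shapes are made-up; 2i ReLUs stand in for i hard sigmoids, hence twice as many terms as the sigmoid model above):

        ```python
        import numpy as np

        def relu(z):
            return np.maximum(0.0, z)

        def relu_model(x, W, b_vec, c, b):
            """y = b + sum over 2i terms of c_i * max(0, b_i + sum_j W[i, j] * x[j])"""
            return b + c @ relu(W @ x + b_vec)

        rng = np.random.default_rng(1)
        x     = rng.normal(size=2)        # 2 features (made-up)
        W     = rng.normal(size=(6, 2))   # 2i = 6 ReLUs in place of i = 3 sigmoids
        b_vec = rng.normal(size=6)
        c     = rng.normal(size=6)
        b     = rng.normal()
        print(relu_model(x, W, b_vec, c, b))
        ```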

      • multiple hidden layers

        Increasing this hyperparameter (the number of hidden layers) can reduce the loss, but it also increases the complexity of the model.

  • deep means many hidden layers, but why do we want "deep" rather than "fat" (just putting all the neurons in one layer)??? --hhh
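
    A minimal sketch of going "deep": the sigmoid/ReLU layer above is stacked several times, each hidden layer feeding the next (the depth and layer widths here are made-up for illustration):

    ```python
    import numpy as np

    def relu(z):
        return np.maximum(0.0, z)

    def deep_forward(x, layers, c_out, b_out):
        """Pass x through several hidden layers: a = relu(W @ a + b) per layer."""
        a = x
        for W, b in layers:
            a = relu(W @ a + b)
        return b_out + c_out @ a

    rng = np.random.default_rng(2)
    dims = [2, 4, 4, 4]                     # 2 features, then 3 hidden layers of width 4
    layers = [(rng.normal(size=(dims[k + 1], dims[k])), rng.normal(size=dims[k + 1]))
              for k in range(len(dims) - 1)]
    c_out, b_out = rng.normal(size=dims[-1]), rng.normal()
    print(deep_forward(rng.normal(size=2), layers, c_out, b_out))
    ```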
