Machine learning consists of three steps:
1. Define a function (a model with unknown parameters)
![](https://pic4.zhimg.com/v2-f0c1f907a9a350111b1c3b66af925837_b.jpg)
2. Define a loss (itself a function of the parameters)
![](https://pic4.zhimg.com/v2-a249bf10a488f26eb83e6d7660e34aa3_b.jpg)
![](https://pic2.zhimg.com/v2-7bbda24b418da0f08fcbec1ba591da81_b.jpg)
![](https://pic3.zhimg.com/v2-193b0a1ddddff56d9a0a6a452f7d01ee_b.jpg)
3. Optimization
argmin F(x, y) denotes the values of the variables x and y at which F(x, y) attains its minimum.
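As a concrete illustration of argmin, here is a minimal sketch: a brute-force grid search over a hypothetical function F(x, y) = (x − 1)² + (y + 2)², whose true minimizer is (1, −2). (The function and grid are made up for demonstration; in practice the loss is minimized by gradient descent, as below.)

```python
import numpy as np

# Hypothetical function with known minimizer (1, -2)
def F(x, y):
    return (x - 1) ** 2 + (y + 2) ** 2

# Evaluate F on a grid and locate the (x, y) achieving the minimum value
xs = np.linspace(-5, 5, 101)
ys = np.linspace(-5, 5, 101)
X, Y = np.meshgrid(xs, ys)
Z = F(X, Y)
i, j = np.unravel_index(np.argmin(Z), Z.shape)
print(X[i, j], Y[i, j])  # the grid point closest to the argmin (1, -2)
```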
![](https://pic2.zhimg.com/v2-a58612afb9de2e0b58bd43de07d834f1_b.jpg)
![](https://pic1.zhimg.com/v2-bde212d5c98db342216e3aac9ba2d158_b.jpg)
The size of each step depends on two things: the magnitude of the slope, and the learning rate, a hyperparameter you set yourself.
![](https://pic2.zhimg.com/v2-1b85e56d7f0e9dc2e49ad2f9a9552265_b.jpg)
The minus sign in the update is what guarantees correct behavior: when L decreases as w increases, the slope dL/dw is negative, so subtracting it makes w increase.
Gradient descent ends its update loop in one of two ways: 1. the preset maximum number of iterations is reached; 2. the derivative dL/dw reaches 0 → drawback: it may stop at a local minimum of the loss rather than the global minimum.
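The update rule and both stopping conditions can be sketched as follows (toy loss L(w) = (w − 3)², chosen only so the answer is known):

```python
# Gradient descent on a toy loss L(w) = (w - 3)**2.
def dL_dw(w):
    return 2 * (w - 3)  # derivative of (w - 3)**2

w = 0.0           # initial guess
eta = 0.1         # learning rate (hyperparameter)
max_iters = 1000  # stopping condition 1: iteration cap

for _ in range(max_iters):
    g = dL_dw(w)
    if abs(g) < 1e-8:    # stopping condition 2: derivative ~ 0
        break
    w = w - eta * g      # minus sign: step opposite to the slope
print(w)  # converges to w = 3
```

On this convex toy loss the zero-derivative stop is the global minimum; on a non-convex loss the same condition can trigger at a local minimum, which is exactly the drawback noted above.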
![](https://pic3.zhimg.com/v2-9666237684d87b237f96d6feb2f0ac22_b.jpg)
![](https://pic3.zhimg.com/v2-68083c46ec89ca6bbf747d7c2ce34596_b.jpg)
Analyzing the results:
![](https://pic2.zhimg.com/v2-95f531acbb7e0a0cf06141b73a14f4a9_b.jpg)
![](https://pic2.zhimg.com/v2-6210fbf0f6f4905398f1ed64fb4a63cd_b.jpg)
However, the real data is clearly periodic, with peaks and troughs recurring at regular intervals. Modifying the model should be grounded in an understanding of the problem: domain knowledge.
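A minimal sketch of using this domain knowledge: feed the previous 7 days as features of a linear model, y = b + Σⱼ wⱼ xⱼ. The data below is synthetic (a made-up 7-day cycle plus noise), and for brevity the fit uses a least-squares closed form instead of gradient descent:

```python
import numpy as np

# Hypothetical daily counts with a 7-day period plus noise
rng = np.random.default_rng(0)
t = np.arange(140)
views = 1000 + 300 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 20, t.size)

# Features x_1..x_7 = the previous 7 days; target y = today's value
X = np.stack([views[i:i + 7] for i in range(len(views) - 7)])
y = views[7:]
A = np.hstack([X, np.ones((X.shape[0], 1))])   # append bias column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
pred = A @ coef
print(np.abs(pred - y).mean())  # mean absolute error on training data
```

Because the signal repeats every 7 days, the model can lean heavily on the value from exactly one week earlier, so the residual error drops to roughly the noise level.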
![](https://pic2.zhimg.com/v2-be1ab494a3f147fd6ab93e8319b92b49_b.jpg)
Linear model
There may be a very complex relationship between x and y, but to a linear model the relationship is always just a straight line.
![](https://pic2.zhimg.com/v2-6712f2a3a494f7a2bf8189a3a745c01d_b.jpg)
![](https://pic3.zhimg.com/v2-52296e73e73583d7ab2929492ec917b2_b.jpg)
![](https://pic3.zhimg.com/v2-348ad356ba6f1e4af048293ed0c52372_b.jpg)
How do we obtain the blue curve?
![](https://pic1.zhimg.com/v2-f7568e25bc9e7ab5a46d873e75bd8a7c_b.jpg)
The principle is exactly what you would expect: first a layer outputs y = b + wx, then an activation function is applied: z = sigmoid(y) = sigmoid(b + wx).
When many sigmoids are needed for the approximation, sigmoids of all shapes and positions are produced.
The more sigmoids, the more complex (the more curve-like) the functions that can be approximated.
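That one step, z = sigmoid(b + wx), is a few lines of code (the parameter values below are arbitrary, for illustration only):

```python
import math

# Sigmoid applied to a linear layer output: z = sigmoid(b + w * x)
def sigmoid(y):
    return 1 / (1 + math.exp(-y))

b, w = 0.5, 2.0   # hypothetical parameters
x = 1.0
z = sigmoid(b + w * x)
print(z)  # sigmoid(2.5), roughly 0.924
```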
![](https://pic3.zhimg.com/v2-81893f8df1211636cd9ea0edab0de1ba_b.jpg)
By combining various sigmoid functions, we can obtain all sorts of piecewise linear functions that approximate the desired curve.
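The combination above is y = b + Σᵢ cᵢ · sigmoid(bᵢ + wᵢ x). A minimal sketch with three hand-picked sigmoids (all parameter values are invented for illustration):

```python
import numpy as np

def sigmoid(v):
    return 1 / (1 + np.exp(-v))

# y = b + sum_i c_i * sigmoid(b_i + w_i * x), with three sigmoids
b = 0.0
c = np.array([1.0, -2.0, 1.5])    # heights (can be negative)
bi = np.array([0.0, -4.0, -8.0])  # horizontal shifts
wi = np.array([1.0, 1.0, 1.0])    # slopes

x = np.linspace(-5, 15, 200)
y = b + (c[:, None] * sigmoid(bi[:, None] + wi[:, None] * x[None, :])).sum(axis=0)
print(y.shape)  # one output value per input point: (200,)
```

Varying c, bᵢ, and wᵢ moves, flips, and stretches each sigmoid, which is how the sum can trace out an arbitrary (smoothed) piecewise linear curve.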
![](https://pic4.zhimg.com/v2-a98b5a0f0f5b42067a9ad755cb3fb977_b.jpg)
![](https://pic1.zhimg.com/v2-d4f7d72c6b73d9873a2b83186bcd91fc_b.jpg)
![](https://pic2.zhimg.com/v2-c9d21fe295a52805604790103e0ab265_b.jpg)
![](https://pic4.zhimg.com/v2-cffd64e410516820d692b001f5e7d0df_b.jpg)
![](https://pic1.zhimg.com/v2-053370ee269815a6cf16da1f609b4758_b.jpg)
![](https://pic2.zhimg.com/v2-ff402fc815f1e41df3cf3adf90ae5bf5_b.jpg)
![](https://pic3.zhimg.com/v2-a3cfd7f3f85e73fab78d64c24901d122_b.jpg)
![](https://pic4.zhimg.com/v2-1d662e04aebdd6bbf5cea9ebd75be68f_b.jpg)
![](https://pic2.zhimg.com/v2-14c7057b2a441cc961fe33463412d691_b.jpg)
Gradient descent comes in two flavors: full-batch gradient descent (using all the data for each update) and mini-batch gradient descent (one update per small batch).
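A minimal sketch of mini-batch gradient descent fitting a linear model y = b + wx on synthetic data (the data, learning rate, and batch size below are all invented for illustration):

```python
import numpy as np

# Synthetic data from y = 2x + 0.5 plus a little noise
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 256)
y = 2.0 * x + 0.5 + rng.normal(0, 0.05, x.size)

w, b, eta, batch = 0.0, 0.0, 0.1, 32
for _ in range(100):                        # epochs
    idx = rng.permutation(x.size)           # shuffle each epoch
    for s in range(0, x.size, batch):       # one update per mini-batch
        i = idx[s:s + batch]
        err = (b + w * x[i]) - y[i]
        w -= eta * 2 * (err * x[i]).mean()  # dL/dw of MSE on this batch
        b -= eta * 2 * err.mean()           # dL/db of MSE on this batch
print(w, b)  # close to the true 2.0 and 0.5
```

Full-batch descent would compute the gradient over all 256 points once per update; the mini-batch version trades a noisier gradient for many more (cheaper) updates per epoch.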
![](https://pic2.zhimg.com/v2-7883ae833ab556757a56ac6e3e058f35_b.jpg)
![](https://pic4.zhimg.com/v2-f366ce63607ac098c85604334ddfd1bb_b.jpg)
![](https://pic3.zhimg.com/v2-bc801620661d129766228ebf20cfcef2_b.jpg)
Why must it be sigmoids that piece together the curves (first assembling a hard sigmoid)?
It isn't mandatory! Two ReLU segments can also compose a hard sigmoid.
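One way to see this (a sketch with a unit ramp; shifts and scales generalize it): the difference of two ReLUs, relu(x) − relu(x − 1), is exactly a hard sigmoid that is flat at 0, ramps up linearly, then is flat at 1.

```python
def relu(v):
    return max(0.0, v)

# Hard sigmoid (0 -> linear ramp -> 1) as the difference of two ReLUs
def hard_sigmoid(x):
    return relu(x) - relu(x - 1)

print(hard_sigmoid(-2))   # flat left segment: 0
print(hard_sigmoid(0.5))  # on the ramp: 0.5
print(hard_sigmoid(3))    # flat right segment: 1
```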
![](https://pic2.zhimg.com/v2-c55ffba59ab4fc6b2ddfc82424015f15_b.jpg)
![](https://pic2.zhimg.com/v2-c7ebe802a98dcf8281f803be4e650819_b.jpg)
![](https://pic2.zhimg.com/v2-f3644414e9f7f07b64b73ae67cb1a825_b.jpg)
![](https://pic3.zhimg.com/v2-fc3dabbfd93bf16ae0fcce9875cb3566_b.jpg)
![](https://pic4.zhimg.com/v2-10c5d8631ef4e759e3d8c3db41f70b47_b.jpg)
![](https://pic1.zhimg.com/v2-b1ac08ef8b10acbff4479c217b892190_b.jpg)
![](https://pic3.zhimg.com/v2-a0a24ac91ed682585e471950cc54b44a_b.jpg)
![](https://pic3.zhimg.com/v2-7207a820842da508259cae1d37f14caa_b.jpg)
Reposted from: https://zhuanlan.zhihu.com/p/549552812