Notes from Hung-yi Lee's machine learning lectures. Video link: 李宏毅2021/2022春机器学习课程 (Hung-yi Lee, 2021/2022 Spring Machine Learning course)
1. Machine Learning
1.1 Basic Concepts
![image-20230221091008223](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302210910334.png)
Machine learning is about finding a function.
Regression: the function outputs a numerical value.
Classification: given a set of options, the function outputs the correct one.
![image-20230221091121519](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302210911442.png)
1.2 The Training Process
1. Write a function with unknown parameters
![image-20230221091744303](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302210917612.png)
2. Define the loss
![image-20230221092128407](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302210921532.png)
The loss is a function of the parameters, e.g. $L(b, w)$.
Suppose $b = 0.5\mathrm{k}$ and $w = 1$; substitute them into the prediction function, compute predictions on the training data, and compare each prediction with the true value.
![image-20230221092230690](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302210922712.png)
The loss is the sum of the errors between the predicted values and the true values.
![image-20230221092345017](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302210923331.png)
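As a minimal sketch (the data values below are made up, not the course's view counts), the loss $L(b, w)$ for a linear model $y = b + wx$ with mean absolute error can be computed as:

```python
import numpy as np

# Hypothetical training data: x = value on day n, y = value on day n+1
x = np.array([4.8, 4.9, 7.5, 6.9, 9.0])
y = np.array([4.9, 7.5, 6.9, 9.0, 7.6])

def loss(b, w):
    """L(b, w): mean absolute error between predictions and labels."""
    y_pred = b + w * x
    return np.mean(np.abs(y - y_pred))

# Evaluate the example parameters from the notes: b = 0.5, w = 1
print(loss(0.5, 1.0))
```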
3. Optimization
Gradient descent (find the $w$ that minimizes the loss):
- Randomly initialize a value $w_0$
- Compute the derivative (the slope of the tangent) of the loss at the current $w$
  - If it is positive: decrease $w$
  - If it is negative: increase $w$
  - How much $w$ increases or decreases is determined by the learning rate
- Repeat the steps above
Gradient descent finds a local minimum, which is not necessarily the global minimum.
![image-20230221094022140](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302210940232.png)
![image-20230221094116595](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302210941445.png)
![image-20230221094211174](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302210942576.png)
1.3 Improving the Model
![image-20230221095618664](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302210956654.png)
As the figure shows, the red curve can be composed of a constant term plus a set of blue segments.
![image-20230221095706477](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302210957297.png)
For an arbitrary curve, pick points on it and approximate the curve with the line segments between them.
![image-20230221095903948](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302210959352.png)
Writing the blue curve as a function:
![image-20230221100132300](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302211001401.png)
Adjusting $b$, $w$, and $c$ yields different sigmoid functions.
![image-20230221100223203](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302211002253.png)
Computing the function for the red curve above, in the single-feature case:
![image-20230221100439216](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302211004979.png)
With more features — the figure below has 3 features $x_1, x_2, x_3$:
![image-20230221100810111](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302211008087.png)
![image-20230221100918359](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302211009383.png)
![image-20230221101038927](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302211010063.png)
Definition of the parameters:
![image-20230221101223084](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302211012556.png)
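Putting the pieces together, the model $y = b + \sum_i c_i\,\mathrm{sigmoid}(b_i + \sum_j w_{ij} x_j)$ can be sketched in numpy; the shapes and the random parameter values below are hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def model(x, W, b_vec, c, b):
    """y = b + c^T sigmoid(b_vec + W @ x): the piecewise model above."""
    r = b_vec + W @ x        # one pre-activation per sigmoid unit
    a = sigmoid(r)           # activations
    return b + c @ a

# Hypothetical shapes: 3 features, 4 sigmoid units
rng = np.random.default_rng(0)
x = rng.normal(size=3)
W = rng.normal(size=(4, 3))
b_vec = rng.normal(size=4)
c = rng.normal(size=4)
b = 0.1
print(model(x, W, b_vec, c, b))
```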
Computing the loss:
![image-20230221101509906](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302211015648.png)
Optimization:
![image-20230221101658087](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302211016095.png)
In practice:
The samples are randomly split into batches. Compute $L^1$ on the first batch and update the parameters, then compute $L^2$ on the second batch and update again, and so on.
![image-20230221102849523](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302211028728.png)
![image-20230221103131834](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302211031841.png)
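A sketch of this batching scheme, here for a one-feature linear model with squared error (the data and hyperparameters are made up):

```python
import numpy as np

def sgd_linear(x, y, w=0.0, b=0.0, lr=0.05, batch_size=2, epochs=2000, seed=0):
    """Mini-batch gradient descent for y = w*x + b with squared error.
    Each epoch: shuffle, split into batches, update after every batch."""
    rng = np.random.default_rng(seed)
    n = len(x)
    for _ in range(epochs):
        idx = rng.permutation(n)
        for start in range(0, n, batch_size):
            bi = idx[start:start + batch_size]
            err = (w * x[bi] + b) - y[bi]          # prediction error on this batch
            w -= lr * np.mean(2 * err * x[bi])     # gradient of batch loss w.r.t. w
            b -= lr * np.mean(2 * err)             # gradient of batch loss w.r.t. b
    return w, b

# Noiseless toy data from y = 2x + 1; SGD recovers w ~ 2, b ~ 1
x = np.linspace(0, 1, 8)
y = 2 * x + 1
w, b = sgd_linear(x, y)
print(w, b)
```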
Choosing the model (activation function):
![image-20230221103354147](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302211033147.png)
![image-20230221103428194](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302211034291.png)
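For instance, two ReLUs can be summed to reproduce a hard sigmoid (the piecewise-linear blue curve) exactly; the breakpoints here are chosen by hand:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

# A hard sigmoid (flat, then a linear ramp, then flat) as the sum of two ReLUs:
# relu(x + 1) ramps up from x = -1; subtracting relu(x - 1) flattens it after x = 1
x = np.linspace(-3, 3, 7)
hard_sigmoid = relu(x + 1) - relu(x - 1)   # 0 for x <= -1, then ramps to 2 at x = 1
print(hard_sigmoid)
```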
2. Deep Learning
![image-20230221103822780](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302211038307.png)
2.1 Backpropagation
Chain rule:
![image-20230222085408420](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302220854658.png)
![image-20230222085755469](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302220857611.png)
![image-20230222085821451](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302220858648.png)
2.1.1 Forward Pass
![image-20230222090016650](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302220900527.png)
![image-20230222090042882](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302220900850.png)
2.1.2 Backward Pass
![image-20230222090642496](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302220906493.png)
Suppose $\frac{\partial C}{\partial z'}$ and $\frac{\partial C}{\partial z''}$ are already known. Taking them as inputs gives:
![image-20230222091436037](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302220914444.png)
Computing $\frac{\partial C}{\partial z'}$ and $\frac{\partial C}{\partial z''}$:
1. The next layer is the output layer
![image-20230222091536522](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302220915693.png)
2. The next layer is not the output layer
Keep moving forward until the output layer is reached
![image-20230222091732134](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302220917260.png)
Instead, compute in the other direction: from the output layer $y_1, y_2$ back toward the input layer.
![image-20230222091935198](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302220919089.png)
Summary:
Forward pass vs. backward pass
![image-20230222092026570](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302220920517.png)
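A minimal sketch for a one-neuron chain (the network shape is hypothetical): the forward pass caches every intermediate value, and the backward pass applies the chain rule from the cost back to each parameter.

```python
import math

# Tiny network: z = w1*x + b1; a = sigmoid(z); y = w2*a + b2; C = (y - t)^2
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward_backward(x, t, w1, b1, w2, b2):
    # Forward pass: compute and cache each intermediate value
    z = w1 * x + b1
    a = sigmoid(z)
    y = w2 * a + b2
    C = (y - t) ** 2
    # Backward pass: chain rule from the cost back to the parameters
    dC_dy = 2 * (y - t)
    dC_da = dC_dy * w2              # dy/da = w2
    dC_dz = dC_da * a * (1 - a)     # da/dz = sigmoid'(z) = a(1 - a)
    grads = {
        "w2": dC_dy * a, "b2": dC_dy,   # dy/dw2 = a, dy/db2 = 1
        "w1": dC_dz * x, "b1": dC_dz,   # dz/dw1 = x, dz/db1 = 1
    }
    return C, grads

C, g = forward_backward(x=1.0, t=0.0, w1=0.5, b1=0.0, w2=1.0, b2=0.0)
print(C, g)
```

The gradients can be checked against finite differences of the forward pass, which is a standard sanity test for a backward pass.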
3. Regression
Applications of regression:
![image-20230222092654986](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302220926764.png)
3.1 Linear Regression
1. Define a linear model
![image-20230222093059632](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302220931161.png)
2. Evaluate how good the model is
Define a loss function: measure how good the function is by the difference between the true value and the predicted value.
![image-20230222093540294](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302220935242.png)
![image-20230222093639192](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302221025190.png)
3. Optimize
![image-20230222093815587](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302220938892.png)
Compute with gradient descent (the procedure is the same as described above):
![image-20230222094301277](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302220943146.png)
For linear regression the loss function is convex, so there are no spurious local minima.
Results of the model:
![image-20230222095116538](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302220951506.png)
Choosing a more complex model:
![image-20230222095539298](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302220955489.png)
On the training data, a more complex model gives a smaller error.
![image-20230222095629226](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302220956319.png)
But on the test data it does not keep performing better; this is overfitting, and we need to pick the most suitable model.
3.2 Improving the Model
Taking hidden factors into account:
![image-20230222102039650](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302221020608.png)
![image-20230222102110493](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302221021134.png)
![image-20230222102223173](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302221022194.png)
Note, however, that throwing in every feature we can think of may cause overfitting; regularization can help here.
![image-20230222102324952](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302221023233.png)
Regularization: makes the model smoother.
![image-20230222101120705](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302221011942.png)
![image-20230222101247039](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302221012417.png)
The larger $\lambda$ is, the smoother the model and the less weight the training error carries in the loss; correspondingly, the training error grows, but the test error can actually shrink (up to a point). So a suitable $\lambda$ must be chosen.
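As a sketch of the idea, ridge regression adds $\lambda \sum_i w_i^2$ to the squared error; the closed-form solution and the toy data below are illustrative, not from the lecture:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimise sum((y - Xw)^2) + lam * sum(w^2) in closed form:
    w = (X^T X + lam*I)^{-1} X^T y.  Larger lam -> smaller weights -> smoother model."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Toy data: 20 samples, 5 features, known weights plus a little noise
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=20)

w_small = ridge_fit(X, y, lam=0.01)
w_big = ridge_fit(X, y, lam=100.0)
# Heavier regularisation shrinks the weights toward zero
print(np.linalg.norm(w_small), np.linalg.norm(w_big))
```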
4. Classification
Applications of classification:
![image-20230222102749953](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302221027109.png)
How classification is done:
![image-20230222103745192](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302221037603.png)
Treating classification as regression:
- During training, define class 1 samples as 1 and class 2 samples as -1
- During testing, an output close to 1 is assigned to class 1 and an output close to -1 to class 2
![image-20230222104224367](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302221042365.png)
This approach runs into trouble: it penalizes examples that are "too correct". In the figure on the right, regression minimizes the error and produces the purple line, while the correct classification boundary is the green one.
4.1 Classification Algorithm
The pipeline:
![image-20230222104450452](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302221044384.png)
4.2 Generative Model
Suppose there are two classes $C_1$ and $C_2$. Then

$$P(C_1|x) = \frac{P(x|C_1)P(C_1)}{P(x|C_1)P(C_1) + P(x|C_2)P(C_2)}$$

where $P(C_1)$ and $P(C_2)$ are the prior probabilities, and $P(x|C_1)$, $P(x|C_2)$ are assumed to be Gaussian distributions.
![image-20230222104934071](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302221049146.png)
Gaussian (normal) distribution:
![image-20230222105538370](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302221055639.png)
Assuming the samples follow Gaussian distributions, the labeled data gives estimates of each class's mean and variance:
![image-20230222105622466](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302221056805.png)
Maximum likelihood estimation:
![image-20230222105905037](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302221059137.png)
This gives the estimates of the mean $\mu^*$ and covariance $\Sigma^*$:
![image-20230222110139512](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302221101622.png)
Computed mean and covariance:
![image-20230222110247302](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302221102717.png)
Computing the classification:
![image-20230222110358258](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302221103536.png)
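A sketch of the whole generative pipeline with a shared covariance, as in the lecture; the toy 2-D data here is hypothetical:

```python
import numpy as np

def fit_gaussian_generative(X1, X2):
    """Maximum-likelihood estimates for a two-class model with a shared covariance."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    n1, n2 = len(X1), len(X2)
    S1 = (X1 - mu1).T @ (X1 - mu1) / n1
    S2 = (X2 - mu2).T @ (X2 - mu2) / n2
    sigma = (n1 * S1 + n2 * S2) / (n1 + n2)   # weighted average of the two covariances
    prior1 = n1 / (n1 + n2)
    return mu1, mu2, sigma, prior1

def posterior_c1(x, mu1, mu2, sigma, prior1):
    """P(C1|x) via Bayes' rule with Gaussian class-conditional densities."""
    inv = np.linalg.inv(sigma)
    def unnorm_pdf(x, mu):
        d = x - mu
        return np.exp(-0.5 * d @ inv @ d)     # shared normalising constant cancels
    l1 = unnorm_pdf(x, mu1) * prior1
    l2 = unnorm_pdf(x, mu2) * (1 - prior1)
    return l1 / (l1 + l2)

# Toy 2-D data: class 1 centred at (0,0), class 2 at (3,3)
rng = np.random.default_rng(1)
X1 = rng.normal(loc=0.0, size=(50, 2))
X2 = rng.normal(loc=3.0, size=(50, 2))
mu1, mu2, sigma, prior1 = fit_gaussian_generative(X1, X2)
print(posterior_c1(np.array([0.0, 0.0]), mu1, mu2, sigma, prior1))
```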
Summary:
![image-20230222112010822](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302221120651.png)
4.3 Deriving the Posterior Probability
![image-20230222112446507](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302221124433.png)
![image-20230222112622163](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302221126018.png)
![image-20230222112712923](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302221127936.png)
![image-20230222112757589](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302221127699.png)
![image-20230222113047866](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302221130890.png)
So the model now simplifies to estimating only $w$ and $b$.
4.4 Logistic Regression
1. The model:
![image-20230222140038101](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302221400161.png)
![image-20230222140138302](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302221401315.png)
2. Evaluate how good the model is
Cross-entropy loss
Let $f_{w,b}(x) = \sigma(wx+b)$; the loss function of logistic regression is $L(w,b)$:
![image-20230222140438406](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302221404466.png)
![image-20230222140801527](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302221408766.png)
![image-20230222140854983](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302221408134.png)
$H(p,q)$ is the cross-entropy loss.
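The cross-entropy loss for logistic regression can be sketched as follows, assuming labels $\hat{y} \in \{0, 1\}$; the data and parameter values are made up:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy_loss(w, b, X, y):
    """L(w,b) = -sum[ y*ln f(x) + (1-y)*ln(1-f(x)) ] with f(x) = sigmoid(w.x + b)."""
    f = sigmoid(X @ w + b)
    eps = 1e-12                                # avoid log(0)
    return -np.sum(y * np.log(f + eps) + (1 - y) * np.log(1 - f + eps))

# Toy 1-D data: class 0 on the left, class 1 on the right
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
print(cross_entropy_loss(np.array([2.0]), -3.0, X, y))
```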
3. Optimize
![image-20230222141225559](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302221412533.png)
![image-20230222141303794](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302221413250.png)
![image-20230222141435104](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302221414300.png)
Logistic regression vs. linear regression:
![image-20230222141503612](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302221415780.png)
Logistic regression with mean squared error:
![image-20230222142354699](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302221423692.png)
When doing gradient descent with this loss, the derivative is close to 0 both when the prediction is very close to the target and when it is very far from it, as the plot below shows:
![image-20230222142647453](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302221426496.png)
Limitations of logistic regression:
![image-20230222144608482](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302221446417.png)
![image-20230222144656818](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302221446919.png)
The decision boundary of logistic regression is a straight line, so for the data above it cannot separate the red points from the blue points.
The fix: transform the features.
![image-20230222144834652](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302221448561.png)
But a good feature transformation cannot always be found by hand, so a different approach is needed.
Cascade logistic regression models:
the earlier models perform the feature transformation, and the final one does the classification.
![image-20230222145042960](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302221450126.png)
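A hand-crafted example of this cascade on the XOR-style data above; the weights below are picked by hand (hypothetical), not learned, to show that two logistic units can transform the inputs into a linearly separable space for a third unit:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def xor_net(x1, x2, k=20.0):
    """Two logistic units do the feature transform, a third does the classification.
    k sharpens the sigmoids so the outputs saturate near 0 or 1."""
    h1 = sigmoid(k * (x1 + x2 - 0.5))   # ~ OR(x1, x2)
    h2 = sigmoid(k * (x1 + x2 - 1.5))   # ~ AND(x1, x2)
    return sigmoid(k * (h1 - h2 - 0.5)) # linearly separable in (h1, h2)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, round(xor_net(a, b)))
```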
4.5 Discriminative vs. Generative Models
![image-20230222142900601](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302221429698.png)
In general, the two do not learn the same $w$ and $b$.
Advantages of the generative model:
- Because the generative model makes a prior assumption about the data distribution, it needs less training data and is more robust to noise.
- The priors and the class-dependent probabilities can be estimated from different sources.
4.6 Multi-class Classification
![image-20230222144357187](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302221443362.png)
![image-20230222144311715](https://images-1314224954.cos.ap-beijing.myqcloud.com/202302221443019.png)
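The softmax step in the figures above turns class scores into probabilities; a minimal sketch:

```python
import numpy as np

def softmax(z):
    """Exponentiate the scores and normalise so they sum to 1.
    Subtracting max(z) first keeps exp from overflowing."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([3.0, 1.0, -3.0])
probs = softmax(scores)
print(probs, probs.sum())
```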