Linear Regression Is Just That Easy (with Python and Sklearn Code)

1. Introduction to Linear Regression

1.1 Linear Regression

Take height as an example. Intuition tells us that both parents' heights jointly influence their child's height. To account for both parents at once, we can use their average height as the factor of interest. Here the parents' average height is the independent variable x, the child's height is the dependent variable y, and y and x are related linearly: y = wx + b.
So how do we find the parameters w and b? We collect enough (x, y) pairs, and a linear-regression algorithm can then fit the data and solve for w and b.

1.2 Loss Function

So how do we find the line that best fits the existing data? We introduce a loss function: the line with the smallest total loss over all points is the best-fitting one. To make the later derivations and matrix operations easier, we rewrite y = wx + b in the following form:

Hypothesis function: $h_{\theta}(x)=\sum_{i=0}^{n} \theta_{i} x_{i}=\theta^{T} x$, where $x_{0}=1$
Loss function (here the squared loss): $L(\theta)=\left(h_{\theta}(x)-y\right)^{2}$
Cost function: $J(\theta)=\frac{1}{2m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right)^{2}$
Square the difference between each prediction and its actual value and sum over all samples (the factor of 1/2 is included so that the 2 produced by differentiation cancels later).
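As a quick check of the formula, the cost function translates almost line for line into NumPy. This is a minimal sketch: compute_cost is an illustrative name, and X is assumed to already contain the leading column of ones (x0 = 1).

import numpy as np

def compute_cost(X, y, theta):
    # X: (m, n+1) design matrix whose first column is all ones
    # y: (m,) target vector, theta: (n+1,) parameter vector
    m = X.shape[0]
    residuals = X.dot(theta) - y             # h_theta(x^(i)) - y^(i) for every sample
    return (residuals ** 2).sum() / (2 * m)  # J(theta) = 1/(2m) * sum of squared errors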

2. Gradient Descent

2.1 What Is a Gradient

The gradient is obtained by taking the partial derivative with respect to each variable and collecting the results, separated by commas, inside angle brackets ⟨ ⟩, which shows that the gradient is really a vector. For example, for $f(x, y)=x^{2}+y^{2}$ the gradient is $\nabla f=\langle 2x,\, 2y\rangle$.
The direction of the gradient is the direction in which the function increases fastest at that point! The descent direction is therefore the negative gradient direction.

2.2 Solving with Gradient Descent

We use gradient descent to minimize the cost function.
$$
\begin{aligned}
\frac{\partial}{\partial \theta_{j}} J(\theta) &= \frac{\partial}{\partial \theta_{j}} \frac{1}{2m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right)^{2} \\
&= 2 \cdot \frac{1}{2m} \sum_{i=1}^{m}\left[\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) \frac{\partial}{\partial \theta_{j}}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right)\right] \\
&= \frac{1}{m} \sum_{i=1}^{m}\left[\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) \frac{\partial}{\partial \theta_{j}}\left(\sum_{f=0}^{n} \theta_{f} x_{f}^{(i)}-y^{(i)}\right)\right] \\
&= \frac{1}{m} \sum_{i=1}^{m}\left[\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{j}^{(i)}\right]
\end{aligned}
$$
Parameter update:
$$
\begin{aligned}
\frac{\partial J(\theta)}{\partial \theta_{j}} &= -\frac{1}{m} \sum_{i=1}^{m}\left(y^{(i)}-h_{\theta}\left(x^{(i)}\right)\right) x_{j}^{(i)} \quad \text{(pulling out a minus sign)} \\
\theta_{j} &:= \theta_{j}-\alpha \frac{\partial J(\theta)}{\partial \theta_{j}} \\
\theta_{j} &:= \theta_{j}+\alpha \frac{1}{m} \sum_{i=1}^{m}\left(y^{(i)}-h_{\theta}\left(x^{(i)}\right)\right) x_{j}^{(i)}
\end{aligned}
$$
In gradient descent, α is called the learning rate or step size. It should be neither too small nor too large: too small and the algorithm may take forever to reach the minimum; too large and it may overshoot the minimum!
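To make the update rule concrete, here is a minimal batch gradient-descent sketch for linear regression. The function name and the default values of alpha and num_iters are illustrative choices, and X is again assumed to contain the leading column of ones.

import numpy as np

def gradient_descent(X, y, alpha=0.01, num_iters=1500):
    m, n = X.shape
    theta = np.zeros(n)               # start from theta = 0
    for _ in range(num_iters):
        error = X.dot(theta) - y      # h_theta(x^(i)) - y^(i), shape (m,)
        grad = X.T.dot(error) / m     # (1/m) * sum[(h_theta(x^(i)) - y^(i)) * x_j^(i)]
        theta = theta - alpha * grad  # theta_j := theta_j - alpha * dJ/dtheta_j
    return theta

On the data in the appendix (with the column of ones inserted as in Section 4), this should converge to roughly the same θ as the closed-form solution, provided α and the number of iterations are chosen sensibly.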

3. Ridge Regression

Objective function:
$$
J(\theta)=\frac{1}{2m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right)^{2}+\lambda\|\theta\|_{2}^{2}
=\frac{1}{2m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right)^{2}+\lambda \sum_{j=1}^{n} \theta_{j}^{2}
$$
Gradient of the ridge objective:
$$
\frac{\partial}{\partial \theta_{j}} J(\theta)=\frac{\partial}{\partial \theta_{j}}\left[\frac{1}{2m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right)^{2}+\lambda \sum_{j=1}^{n} \theta_{j}^{2}\right]
=\frac{1}{m} \sum_{i=1}^{m}\left[\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{j}^{(i)}\right]+2 \lambda \theta_{j}
$$
Update rule:
$$
\theta_{j}:=\theta_{j}+\alpha\left[\frac{1}{m} \sum_{i=1}^{m}\left(y^{(i)}-h_{\theta}\left(x^{(i)}\right)\right) x_{j}^{(i)}-2 \lambda \theta_{j}\right]
$$
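The only change from plain gradient descent is the extra 2λθ_j term in the gradient. Below is a minimal sketch under the same assumptions as before; lam is an illustrative regularization strength, and θ0 is left unpenalized to match the sum over j = 1…n in the objective.

import numpy as np

def ridge_gradient_descent(X, y, alpha=0.01, lam=0.1, num_iters=1500):
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        error = X.dot(theta) - y
        grad = X.T.dot(error) / m + 2 * lam * theta  # data term plus 2*lambda*theta_j
        grad[0] = X[:, 0].dot(error) / m             # do not penalize the bias theta_0
        theta = theta - alpha * grad
    return theta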

4. Solving Linear Regression by Least Squares (Normal Equation)

$$
\begin{aligned}
J(\theta) &= \frac{1}{2} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right)^{2} \quad \text{(here the } \tfrac{1}{m} \text{ factor is omitted)} \\
&= \frac{1}{2}(X \theta-y)^{T}(X \theta-y) \\
&= \frac{1}{2}\left(\theta^{T} X^{T}-y^{T}\right)(X \theta-y) \\
&= \frac{1}{2}\left(\theta^{T} X^{T} X \theta-\theta^{T} X^{T} y-y^{T} X \theta+y^{T} y\right)
\end{aligned}
$$

Taking the gradient with respect to θ:
$$
\nabla_{\theta} J(\theta)=\frac{1}{2}\left(2 X^{T} X \theta-X^{T} y-\left(y^{T} X\right)^{T}\right)=X^{T} X \theta-X^{T} y
$$
Setting the gradient to zero and solving for θ gives the closed-form (normal-equation) solution:
$$
\theta=\left(X^{T} X\right)^{-1} X^{T} y
$$
Matrix identities used above (lowercase letters denote vectors, uppercase letters denote matrices):
$$
\begin{aligned}
&(a b)^{T}=b^{T} a^{T} \\
&\frac{\partial\left(x^{T} A x\right)}{\partial x}=2 A x \quad (A \text{ symmetric}) \\
&\frac{\partial\left(x^{T} A\right)}{\partial x}=A \\
&\frac{\partial(A x)}{\partial x}=A^{T} \\
&A^{-1} A=I
\end{aligned}
$$

import numpy as np
import matplotlib.pyplot as plt

def loaddata():
    data = np.loadtxt('data/data1.txt',delimiter=',')
    n = data.shape[1]-1 # number of features
    X = data[:,0:n] # feature columns
    y = data[:,-1] # target column
    return X,y
    
X_orgin,y = loaddata()
X = np.insert(X_orgin, 0, values=1, axis=1) # prepend a column of ones (x0 = 1) so that theta[0] is the bias term

theta = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y) # normal equation: theta = (X^T X)^(-1) X^T y; np.linalg.inv computes the matrix inverse

# Plot the data and the fitted line
plt.scatter(X_orgin,y)
h_theta = theta[0]+theta[1]*X_orgin
plt.plot(X_orgin,h_theta)
plt.show()

Output: a scatter plot of the data points with the fitted regression line.
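As an aside, explicitly inverting X^T X works here, but it can be numerically fragile when X^T X is ill-conditioned. A common alternative (a one-line sketch using the same X and y) solves the normal equations directly:

theta = np.linalg.solve(X.T.dot(X), X.T.dot(y)) # solves (X^T X) theta = X^T y without forming the inverse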

5. Sklearn Implementation

5.1 Linear Regression

import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
# Data-loading function
def loaddata():
    data = np.loadtxt('data/data1.txt',delimiter=',')
    n = data.shape[1]-1 # number of features
    X = data[:,0:n]
    y = data[:,-1].reshape(-1,1)
    return X,y
X,y = loaddata()
# linear_model's LinearRegression() implements linear regression
model1 = linear_model.LinearRegression()
# Call the fit method
model1.fit(X,y) # fit() loads the training data and trains the model
print(model1.coef_) # coef_ holds the values of θ1 ... θn
print(model1.intercept_) # intercept_ holds the value of θ0
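Once the model is fitted, predictions come from the same object. A minimal usage sketch, where the value 7.0 is just an illustrative input in the same units as the training feature:

x_new = np.array([[7.0]]) # one new sample with a single feature
print(model1.predict(x_new)) # equals intercept_ + coef_ * 7.0 for this one-feature model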

5.2 Ridge Regression

# linear_model's Ridge method implements Ridge regression
# Note: alpha here is the regularization strength
# (older scikit-learn versions offered normalize=True to standardize the training data; it is not used here)
model2 = linear_model.Ridge(alpha=0.01)
model2.fit(X,y)
print(model2.coef_)
print(model2.intercept_)
plt.scatter(X,y)
y_hat = model2.predict(X)
plt.plot(X,y_hat)

5.3 LASSO Regression

# linear_model's Lasso implements LASSO (L1-regularized) regression; alpha is again the regularization strength
model3 = linear_model.Lasso(alpha=0.01)
model3.fit(X,y)
print(model3.coef_)
print(model3.intercept_)
plt.scatter(X,y)
y_hat = model3.predict(X)
plt.plot(X,y_hat)
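Since all three models were fitted on the same data, it can be instructive to print their parameters side by side and then display the plots. A small sketch reusing the model1, model2, and model3 objects from above:

for name, model in [('LinearRegression', model1), ('Ridge', model2), ('Lasso', model3)]:
    print(name, model.coef_.ravel(), model.intercept_)
plt.show() # display the scatter plots and fitted lines drawn above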


Appendix

Data points in data1.txt:
6.1101,17.592
5.5277,9.1302
8.5186,13.662
7.0032,11.854
5.8598,6.8233
8.3829,11.886
7.4764,4.3483
8.5781,12
6.4862,6.5987
5.0546,3.8166
5.7107,3.2522
14.164,15.505
5.734,3.1551
8.4084,7.2258
5.6407,0.71618
5.3794,3.5129
6.3654,5.3048
5.1301,0.56077
6.4296,3.6518
7.0708,5.3893
6.1891,3.1386
20.27,21.767
5.4901,4.263
6.3261,5.1875
5.5649,3.0825
18.945,22.638
12.828,13.501
10.957,7.0467
13.176,14.692
22.203,24.147
5.2524,-1.22
6.5894,5.9966
9.2482,12.134
5.8918,1.8495
8.2111,6.5426
7.9334,4.5623
8.0959,4.1164
5.6063,3.3928
12.836,10.117
6.3534,5.4974
5.4069,0.55657
6.8825,3.9115
11.708,5.3854
5.7737,2.4406
7.8247,6.7318
7.0931,1.0463
5.0702,5.1337
5.8014,1.844
11.7,8.0043
5.5416,1.0179
7.5402,6.7504
5.3077,1.8396
7.4239,4.2885
7.6031,4.9981
6.3328,1.4233
6.3589,-1.4211
6.2742,2.4756
5.6397,4.6042
9.3102,3.9624
9.4536,5.4141
8.8254,5.1694
5.1793,-0.74279
21.279,17.929
14.908,12.054
18.959,17.054
7.2182,4.8852
8.2951,5.7442
10.236,7.7754
5.4994,1.0173
20.341,20.992
10.136,6.6799
7.3345,4.0259
6.0062,1.2784
7.2259,3.3411
5.0269,-2.6807
6.5479,0.29678
7.5386,3.8845
5.0365,5.7014
10.274,6.7526
5.1077,2.0576
5.7292,0.47953
5.1884,0.20421
6.3557,0.67861
9.7687,7.5435
6.5159,5.3436
8.5172,4.2415
9.1802,6.7981
6.002,0.92695
5.5204,0.152
5.0594,2.8214
5.7077,1.8451
7.6366,4.2959
5.8707,7.2029
5.3054,1.9869
8.2934,0.14454
13.394,9.0551
5.4369,0.61705
