Multivariate Linear Regression

爱彤彤的小鱼

Last modified 2022-07-31 17:22:00

Category: Machine Learning. Tags: linear regression, machine learning, algorithms

First published 2022-07-31 17:20:47

Article link: https://blog.csdn.net/qq_59676027/article/details/126087715


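More than ten days have passed since I finished the post on univariate linear regression. I had meant to continue sooner, but I kept failing to read the data correctly; fortunately I found a fix. The data set is given at the end of this post.

There are two data files. The first is ex2x.dat: its first column is the area of the house and its second column is the number of bedrooms.

The second is ex2y.dat, which gives the price of the house in the corresponding row of ex2x.dat.

We study the linear relationship between the house price and the two features, area and number of bedrooms.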

1. Gradient Descent

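Our goal is to make the gap between the predicted values and the true values as small as possible, so we take the following objective function, where $h(x)=\theta^Tx$ is the prediction and $m$ is the number of training examples:

$J(\theta )=\frac{1}{2m}\sum_{i=1}^{m} (y^i-h(x^i))^2$

Batch gradient descent

The partial derivative with respect to $\theta_j$ is

$\frac{\partial J(\theta )}{\partial \theta _j}=-\frac{1}{m}\sum_{i=1}^{m} (y^i-h(x^i))x_j^i$

and each step moves $\theta_j$ against the gradient, scaled by the learning rate $\alpha$:

$\theta _j:=\theta _j+\alpha\frac{1}{m}\sum_{i=1}^{m} (y^i-h(x^i))x_j^i$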
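The sum over training examples can be written as one matrix product, which is the form the code in this post uses. Below is a minimal sketch of a single update step with a learning rate `alpha`, run on a small made-up data set; the names `cost` and `gradient` and the toy arrays are illustrative assumptions, not part of the original code. The finite-difference loop at the end is only a sanity check that the analytic gradient agrees with the cost function.

```python
import numpy as np

def cost(X, y, theta):
    """J(theta) = 1/(2m) * sum of squared residuals."""
    m = X.shape[0]
    r = X @ theta - y
    return np.sum(r ** 2) / (2 * m)

def gradient(X, y, theta):
    """Vectorized partial derivatives: (1/m) * X^T (X theta - y)."""
    m = X.shape[0]
    return X.T @ (X @ theta - y) / m

# Made-up toy data: a column of ones for the intercept plus two features.
X = np.array([[1.0, -1.0,  0.5],
              [1.0,  0.0, -1.0],
              [1.0,  1.0,  0.5],
              [1.0,  0.5,  1.0],
              [1.0, -0.5, -1.0]])
y = np.array([[1.0], [2.0], [3.0], [2.5], [1.5]])
theta = np.zeros((3, 1))

# One batch gradient descent step.
alpha = 0.1
cost_before = cost(X, y, theta)
theta_new = theta - alpha * gradient(X, y, theta)
cost_after = cost(X, y, theta_new)
print(cost_before, cost_after)

# Central finite differences as an independent check of the gradient.
eps = 1e-6
numeric = np.zeros_like(theta)
for j in range(theta.shape[0]):
    step = np.zeros_like(theta)
    step[j] = eps
    numeric[j] = (cost(X, y, theta + step) - cost(X, y, theta - step)) / (2 * eps)
print(np.allclose(numeric, gradient(X, y, theta), atol=1e-6))
```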

For the underlying principle, refer to gradient descent.

The code is shown next.

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Read the whitespace-separated .dat files (raw string avoids an invalid-escape warning)
x=pd.read_csv('C:\\Users\\zhanghongqing\\Desktop\\python\\逻辑回归\\ex2Data\\ex2x.dat',sep=r'\s+',header=None)
y=pd.read_csv('C:\\Users\\zhanghongqing\\Desktop\\python\\逻辑回归\\ex2Data\\ex2y.dat',sep=r'\s+',header=None)
x=np.array(x)
y=np.array(y)
a=np.ones((x.shape[0],1))
# Prepend a column of ones to x to act as the constant (intercept) term
x=np.hstack((a,x))
print(x)
# m and n are the numbers of rows and columns
m=x.shape[0]
n=x.shape[1]
# One cost history (50 iterations) per run
J=np.zeros((50,1))
J1=np.zeros((50,1))
J2=np.zeros((50,1))
J3=np.zeros((50,1))
J4=np.zeros((50,1))
def deal():# preprocessing: standardize every column except the intercept
    sigma=np.std(x,axis=0)
    mu=np.mean(x,axis=0)
    for i in range(1,x.shape[1]):
        x[:,i]=(x[:,i]-mu[i])/sigma[i]
def lostfuntion(theta):# cost function J(theta), returned as a plain float
    r=np.dot(x,theta)-y
    return (1/(2*m)*np.dot(r.T,r)).item()
def h(x,theta):# prediction for the given theta
    return np.dot(x,theta)
def gradplus(alpha,nn,Jn):# batch gradient descent
    # Start from zero on every call; otherwise each run would continue
    # from the previous run's theta and the curves could not be compared
    theta=np.zeros((n,1))
    for k in range(nn):
        Jn[k]=lostfuntion(theta)
        theta=theta-alpha/m*np.dot(x.T,h(x,theta)-y)
    return theta
deal()
# Compute J for learning rates 0.01, 0.03, 0.1 and 0.3
gradplus(0.01,50,J1)
gradplus(0.03,50,J2)
gradplus(0.1,50,J3)
gradplus(0.3,50,J4)
theta=gradplus(0.1,50,J)
print(theta)
xx=np.arange(50)
# Plotting
plt.figure(num=1)
plt.plot(xx,J,'b-')
plt.legend(['0.1'])
plt.xlabel('number of iterations')
plt.ylabel('cost j')
plt.show()
plt.figure(num=2)
plt.plot(xx,J1,'b-')
plt.plot(xx,J2,'r-')
plt.plot(xx,J3,'k-')
plt.plot(xx,J4,'y-')
plt.legend(['0.01','0.03','0.1','0.3'])
plt.xlabel('number of iterations')
plt.ylabel('cost j')
plt.show()

In the end we obtain the fitted theta values; the figures show how the cost J changes over the iterations during fitting.
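To use a fitted theta on a new house, the input has to be scaled with the same mean and standard deviation that were computed on the training data, because theta was learned on standardized features. The sketch below illustrates this under stated assumptions: it uses only the first six rows of the data listed at the end of this post as a stand-in training set, `fit` and `predict` are hypothetical helper names, and the query house (area 1650, 3 bedrooms) and the 2000 iterations are made up for illustration.

```python
import numpy as np

# First six rows of ex2x.dat / ex2y.dat, used here only as a small stand-in.
X_raw = np.array([[2104., 3.], [1600., 3.], [2400., 3.],
                  [1416., 2.], [3000., 4.], [1985., 4.]])
y = np.array([[399900.], [329900.], [369000.],
              [232000.], [539900.], [299900.]])

def fit(X_raw, y, alpha=0.1, iters=2000):
    """Standardize the features, then run batch gradient descent from theta = 0."""
    mu = X_raw.mean(axis=0)
    sigma = X_raw.std(axis=0)
    X = np.hstack([np.ones((X_raw.shape[0], 1)), (X_raw - mu) / sigma])
    m = X.shape[0]
    theta = np.zeros((X.shape[1], 1))
    for _ in range(iters):
        theta = theta - alpha / m * X.T @ (X @ theta - y)
    return theta, mu, sigma

def predict(x_new, theta, mu, sigma):
    """Scale a raw feature row with the training mu/sigma, then apply theta."""
    z = (np.asarray(x_new, dtype=float) - mu) / sigma
    return (np.hstack([1.0, z]) @ theta).item()

theta, mu, sigma = fit(X_raw, y)
price = predict([1650, 3], theta, mu, sigma)
print(theta.ravel(), price)
```

Returning `mu` and `sigma` together with `theta` keeps the scaling and the coefficients from drifting apart; feeding raw, unscaled values into a theta fitted on standardized data would give meaningless prices.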

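Summary:

1. Within the range tried here (0.01 to 0.3), the larger the learning rate, the faster the cost decreases.

2. The plots show that with too small a learning rate and too few iterations, gradient descent does not get close enough to the minimum to give an accurate result.

3. Both of my files are .dat files; if sep='\s+' is omitted when reading them, the data is read as str.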
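The reading problem from point 3 can be reproduced without the files. This is a small sketch, assuming two rows copied from the ex2x.dat listing and fed through `io.StringIO` instead of a path on disk: with the default comma separator each line becomes a single text field, while a whitespace regex yields two numeric columns.

```python
import io
import pandas as pd

# Two rows in the same layout as ex2x.dat (fields separated by spaces).
sample = "2.1040000e+03 3.0000000e+00\n1.6000000e+03 3.0000000e+00\n"

# The default separator is a comma, so each line is kept as one string field.
as_text = pd.read_csv(io.StringIO(sample), header=None)

# A whitespace regex splits each line into two numeric columns.
as_numbers = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=None)

print(as_text.shape, type(as_text.iloc[0, 0]))
print(as_numbers.shape, as_numbers.dtypes.tolist())
```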

ex2x.dat

2.1040000e+03 3.0000000e+00
1.6000000e+03 3.0000000e+00
2.4000000e+03 3.0000000e+00
1.4160000e+03 2.0000000e+00
3.0000000e+03 4.0000000e+00
1.9850000e+03 4.0000000e+00
1.5340000e+03 3.0000000e+00
1.4270000e+03 3.0000000e+00
1.3800000e+03 3.0000000e+00
1.4940000e+03 3.0000000e+00
1.9400000e+03 4.0000000e+00
2.0000000e+03 3.0000000e+00
1.8900000e+03 3.0000000e+00
4.4780000e+03 5.0000000e+00
1.2680000e+03 3.0000000e+00
2.3000000e+03 4.0000000e+00
1.3200000e+03 2.0000000e+00
1.2360000e+03 3.0000000e+00
2.6090000e+03 4.0000000e+00
3.0310000e+03 4.0000000e+00
1.7670000e+03 3.0000000e+00
1.8880000e+03 2.0000000e+00
1.6040000e+03 3.0000000e+00
1.9620000e+03 4.0000000e+00
3.8900000e+03 3.0000000e+00
1.1000000e+03 3.0000000e+00
1.4580000e+03 3.0000000e+00
2.5260000e+03 3.0000000e+00
2.2000000e+03 3.0000000e+00
2.6370000e+03 3.0000000e+00
1.8390000e+03 2.0000000e+00
1.0000000e+03 1.0000000e+00
2.0400000e+03 4.0000000e+00
3.1370000e+03 3.0000000e+00
1.8110000e+03 4.0000000e+00
1.4370000e+03 3.0000000e+00
1.2390000e+03 3.0000000e+00
2.1320000e+03 4.0000000e+00
4.2150000e+03 4.0000000e+00
2.1620000e+03 4.0000000e+00
1.6640000e+03 2.0000000e+00
2.2380000e+03 3.0000000e+00
2.5670000e+03 4.0000000e+00
1.2000000e+03 3.0000000e+00
8.5200000e+02 2.0000000e+00
1.8520000e+03 4.0000000e+00
1.2030000e+03 3.0000000e+00

ex2y.dat

3.9990000e+05
3.2990000e+05
3.6900000e+05
2.3200000e+05
5.3990000e+05
2.9990000e+05
3.1490000e+05
1.9899900e+05
2.1200000e+05
2.4250000e+05
2.3999900e+05
3.4700000e+05
3.2999900e+05
6.9990000e+05
2.5990000e+05
4.4990000e+05
2.9990000e+05
1.9990000e+05
4.9999800e+05
5.9900000e+05
2.5290000e+05
2.5500000e+05
2.4290000e+05
2.5990000e+05
5.7390000e+05
2.4990000e+05
4.6450000e+05
4.6900000e+05
4.7500000e+05
2.9990000e+05
3.4990000e+05
1.6990000e+05
3.1490000e+05
5.7990000e+05
2.8590000e+05
2.4990000e+05
2.2990000e+05
3.4500000e+05
5.4900000e+05
2.8700000e+05
3.6850000e+05
3.2990000e+05
3.1400000e+05
2.9900000e+05
1.7990000e+05
2.9990000e+05
2.3950000e+05
