吴恩达ex1——多元线性回归实现

本文通过实例展示了如何使用Python进行数据预处理、梯度下降求解线性回归模型参数,并与Scikit-learn库的LinearRegression进行比较。内容包括数据标准化、成本函数计算、梯度下降算法实现和不同学习率对收敛速度的影响。
摘要由CSDN通过智能技术生成
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Load the housing data: living area, number of bedrooms, sale price.
# (Original label typo 'siez of housing ' fixed; all downstream access is
# positional via iloc, so renaming the column is safe.)
df = pd.read_csv('ex1data2.txt', header=None,
                 names=['size of housing', 'numbers of bedroom', 'house price'])
df.head()
   siez of housing   numbers of bedroom   house price
0             2104                    3        399900
1             1600                    3        329900
2             2400                    3        369000
3             1416                    2        232000
4             3000                    4        539900
# Z-score standardization: every column (features AND target) is centred on
# its mean and scaled to unit variance, so gradient descent converges fast.
df=(df-df.mean())/df.std()
df.head()
   siez of housing   numbers of bedroom   house price
0         0.130010            -0.223675      0.475747
1        -0.504190            -0.223675     -0.084074
2         0.502476            -0.223675      0.228626
3        -0.735723            -1.537767     -0.867025
4         1.257476             1.090417      1.595389
# Design matrix: the two feature columns plus a leading column of ones so the
# intercept is learned as theta[0].
X = df.iloc[:, 0:2].values          # (m, 2)
X = np.insert(X, 0, 1, axis=1)      # prepend bias column -> (m, 3)
X = np.matrix(X)

# Target as an (m, 1) column vector; -1 infers m instead of hard-coding 47.
y = np.matrix(df.iloc[:, -1].values.reshape(-1, 1))

# Parameters initialised to zero, one per design-matrix column.
theta = np.matrix(np.zeros((3, 1)))
def cost(X, y, theta):
    """Squared-error cost J(theta) = (X@theta - y)^T (X@theta - y) / (2m).

    X, y, theta are np.matrix objects of shapes (m, n), (m, 1), (n, 1);
    the result is a 1x1 matrix (as the original returned).
    """
    residual = X * theta - y               # prediction errors, (m, 1)
    return residual.T * residual / (2 * len(y))
# Cost at the all-zero starting point (~0.4894 on the standardized data).
# The stray repr line from the notebook output is removed so the file runs.
J_0 = cost(X, y, theta)
J_0
def GredientDescent(X, y, theta, iters, a):
    """Batch gradient descent for linear regression.

    (Name kept for caller compatibility; 'GradientDescent' is the usual
    spelling.)

    Parameters
    ----------
    X : np.matrix, shape (m, n) -- design matrix, first column all ones.
    y : np.matrix, shape (m, 1) -- target column vector.
    theta : np.matrix, shape (n, 1) -- starting parameters (NOT mutated;
        the original updated the caller's matrix in place).
    iters : int -- number of update steps.
    a : float -- learning rate (alpha).

    Returns
    -------
    (theta, loss) : final parameters and a float array of iters+1 costs,
        loss[0] being the cost of the INITIAL theta. The original read the
        global J_0 for loss[0], which was only correct for a zero start.
    """
    m = len(y)

    def _cost(t):
        # J(t) = (Xt - y)^T (Xt - y) / (2m), as a plain float.
        r = X * t - y
        return float(r.T * r) / (2 * m)

    loss = np.zeros(iters + 1)
    loss[0] = _cost(theta)
    for i in range(iters):
        # Simultaneous update of ALL parameters from the full gradient.
        # The original updated theta[j] one component at a time, so later
        # components saw already-updated earlier ones -- not true batch GD.
        theta = theta - (a / m) * (X.T * (X * theta - y))
        loss[i + 1] = _cost(theta)
    return theta, loss
# Fit: 1000 iterations at learning rate 0.01.
# theta ends up (3, 1); loss is (1001,) -- initial cost plus one per step.
theta, loss = GredientDescent(X, y, theta, iters=1000, a=0.01)
# Plot training loss against iteration number (loss[0], the pre-training
# cost, is skipped so the x-axis matches the 1000 update steps).
plt.plot(range(1000),loss[1:],color='r')
plt.xlabel('iters')
plt.ylabel('loss')
plt.show()

在这里插入图片描述

# Compare convergence speed for several learning rates on the same axes.
learning_rates = [0.01, 0.03, 0.09, 0.18, 0.54]  # hoisted: no need to rebuild per pass
for i, rate in enumerate(learning_rates):
    # Re-initialise theta so each run starts from zero, independent of the last.
    theta = np.matrix(np.zeros((3, 1)))
    theta, loss = GredientDescent(X, y, theta, iters=100, a=rate)
    plt.plot(range(101), loss, label='{}{}{}{}'.format('α', i, '=', rate))

plt.legend(loc='best')
plt.show()

在这里插入图片描述

sklearn实现

from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
# --- Same regression with scikit-learn, for comparison ---
df1 = pd.read_csv('ex1data2.txt', header=None,
                  names=['size of housing', 'numbers of bedroom', 'house price'])

X1 = df1.iloc[:, [0, 1]].values             # (m, 2) feature matrix
y1 = df1.iloc[:, -1].values.reshape(-1, 1)  # (m, 1) target; -1 infers m

# One scaler PER array: the original reused a single StandardScaler for both
# fit_transform calls, overwriting its fitted mean/std with y's statistics,
# which would break any later inverse_transform on the features.
x_scaler = StandardScaler()
y_scaler = StandardScaler()
X1_std = x_scaler.fit_transform(X1)
y1_std = y_scaler.fit_transform(y1)

# Dataset is tiny, so it is deliberately NOT split into train/test sets.
lr = LinearRegression()
lr.fit(X1_std, y1_std)

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值