Univariate Linear Regression (House Price Prediction) in Python

襟铭心缘

Article link: https://blog.csdn.net/weixin_41070133/article/details/108885062

Preface

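This article covers linear regression with a single variable: given house prices for the years 2000 to 2013, predict the price for 2014. The problem is solved in two ways, with gradient descent and with least squares.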

I. Training Data

year price
2000 2.0
2001 2.5
2002 2.9
2003 3.147
2004 4.515
2005 4.903
2006 5.365
2007 5.704
2008 6.853
2009 7.971
2010 8.561
2011 10.0
2012 11.280
2013 12.900
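Both scripts below read this table from `./inputdata.txt`. Assuming the file is exactly the table above (a `year price` header line followed by space-separated rows, which is what `loadData` appears to expect), here is a minimal sketch for creating it; the helper name `write_training_file` is hypothetical and not part of the original post.

```python
import pandas as pd

# Training data copied from the table above (years 2000-2013).
YEARS = list(range(2000, 2014))
PRICES = [2.0, 2.5, 2.9, 3.147, 4.515, 4.903, 5.365,
          5.704, 6.853, 7.971, 8.561, 10.0, 11.280, 12.900]


def write_training_file(path="./inputdata.txt"):
    """Hypothetical helper: write the table as a space-separated file
    with a 'year price' header line."""
    with open(path, "w") as f:
        f.write("year price\n")
        for year, price in zip(YEARS, PRICES):
            f.write(f"{year} {price}\n")


write_training_file()
# Read it back the same way loadData does (space separator, header row).
df = pd.read_csv("./inputdata.txt", sep=" ")
print(df.shape)  # (14, 2)
```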

II. Gradient Descent

1. Code

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

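# Load data: prepend a column of ones (bias term) to the year column
def loadData(filename):
    # The file is space-separated with a "year price" header; sep is passed
    # by keyword because pandas 2.x no longer accepts it positionally
    traindata = pd.read_csv(filename, sep=' ')
    traindata.insert(0, 'Ones', 1)
    x = traindata.iloc[:, [0, 1]]   # columns: Ones, year
    y = traindata.iloc[:, 2]        # column: price
    return x.values, y.values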
# Cost function: J(Theta) = sum((X @ Theta - Y)^2) / (2m)
def costFunction(X,Y,Theta):
    m=len(Y)
    inner=np.power((X@Theta-Y),2)
    return np.sum(inner)/(2*m)
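# Batch gradient descent
def gradientDescent(X, Y, Theta, alpha, iters):
    m = len(Y)
    Theta = Theta.astype(float)      # float copy of the initial parameters
    costs = np.zeros(iters)          # cost after each iteration, for plotting
    for i in range(iters):
        dis = X @ Theta - Y          # residuals under the current parameters
        # Update all parameters simultaneously: Theta_j -= alpha/m * sum(dis * X[:, j])
        Theta = Theta - alpha * (X.T @ dis) / m
        costs[i] = costFunction(X, Y, Theta)
    return Theta, costs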

def linear_regression(predictYear=2014):
    X, Y = loadData('./inputdata.txt')
    X[:, 1] -= 2000          # use the offset from 2000 as the feature
    theta = np.zeros(2)      # initial parameters [theta0, theta1]
    iters = 3000
    alpha = 0.0001
    w, costs = gradientDescent(X, Y, theta, alpha, iters)
    print(w)

    # Fitted line over 2000-2018 and the predicted price for predictYear
    x = np.arange(0, 20, 2)
    f = w[0] + w[1] * x
    pyear = predictYear - 2000
    pprice = w[0] + w[1] * pyear
    x_ticks=x+2000
    plt.xticks(x_ticks)
    plt.xlabel('Year')
    plt.ylabel('price')
    plt.title('Price of house')
    plt.scatter(X[:,1]+2000,Y,color='red')
    plt.scatter(predictYear,pprice,color='green')
    plt.plot(x_ticks,f)
    
    plt.figure()
    plt.title('Cost')
    plt.xlabel('iterations')
    plt.ylabel('cost')
    plt.plot(costs)
    plt.show()


if __name__ == '__main__':
    linear_regression()

Learn more about gradient descent.
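For reference, this is the cost that `costFunction` computes and the update that `gradientDescent` applies, written with $m$ training samples, $x_0^{(i)} = 1$ (the `Ones` column) and $x_1^{(i)}$ the year offset from 2000. Both parameters are updated simultaneously from the residuals of the previous iteration:

$$
J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(\theta_0 + \theta_1 x_1^{(i)} - y^{(i)}\right)^2
$$

$$
\theta_j := \theta_j - \frac{\alpha}{m}\sum_{i=1}^{m}\left(\theta_0 + \theta_1 x_1^{(i)} - y^{(i)}\right)x_j^{(i)}, \qquad j = 0, 1
$$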

2. Results

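In the figure, the red points are the training samples, the green point is the predicted value, and the blue line is the fitted line.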

The figure above plots the cost against the number of iterations; the curve levels off after about 500 iterations.
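A cost curve that looks flat on this scale does not necessarily mean the parameters have converged. The sketch below assumes the same data, the same start `[0, 0]`, `alpha = 0.0001` and 3000 iterations as the code above, and compares the result with the exact least-squares minimizer from `np.linalg.lstsq`. The names `cost` and `run_gd` are hypothetical; `run_gd` is a vectorized equivalent of the loop in `gradientDescent`, and the second learning rate, `0.01`, is an assumption chosen for comparison.

```python
import numpy as np

# Training data: x is the year offset from 2000, y is the price.
x = np.arange(14, dtype=float)
y = np.array([2.0, 2.5, 2.9, 3.147, 4.515, 4.903, 5.365,
              5.704, 6.853, 7.971, 8.561, 10.0, 11.280, 12.900])
X = np.column_stack([np.ones_like(x), x])  # design matrix with bias column


def cost(theta):
    """Same cost as costFunction: sum of squared residuals / (2m)."""
    r = X @ theta - y
    return r @ r / (2 * len(y))


def run_gd(alpha, iters):
    """Hypothetical helper: batch gradient descent starting from [0, 0]."""
    theta = np.zeros(2)
    m = len(y)
    for _ in range(iters):
        theta = theta - alpha * X.T @ (X @ theta - y) / m
    return theta


theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)  # exact minimizer
theta_slow = run_gd(alpha=0.0001, iters=3000)      # settings used in the post
theta_fast = run_gd(alpha=0.01, iters=3000)        # assumed larger step

for name, t in [("lstsq", theta_ls), ("gd 1e-4", theta_slow), ("gd 1e-2", theta_fast)]:
    print(f"{name:8s} theta={t.round(4)} cost={cost(t):.4f}")
```

Because the bias column is constant while the year offset runs from 0 to 13, the problem is poorly conditioned: a step small enough for the slope direction makes the intercept move very slowly. With `alpha = 0.0001` one should expect the intercept to remain noticeably off after 3000 iterations, while `alpha = 0.01` (still stable for this data) lands close to the least-squares solution. Centering or scaling the feature is another common remedy.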

III. Least Squares

1. Code

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

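# Load data: prepend a column of ones (bias term) to the year column
def loadData(filename):
    # Space-separated file with a "year price" header; sep passed by keyword
    traindata = pd.read_csv(filename, sep=' ')
    traindata.insert(0, 'Ones', 1)
    x = traindata.iloc[:, [0, 1]]   # columns: Ones, year
    y = traindata.iloc[:, 2]        # column: price
    return x.values, y.values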

# Cost function: J(Theta) = sum((X @ Theta - Y)^2) / (2m)
def costFunction(X,Y,Theta):
    m=len(Y)
    inner=np.power((X@Theta-Y),2)
    return np.sum(inner)/(2*m)

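# Closed-form least-squares solution for one variable:
#   theta1 = sum((x - mean(x)) * y) / (sum(x^2) - (sum(x))^2 / m)
#   theta0 = mean(y - theta1 * x)
def calculateTheta(X, Y):
    theta = np.zeros(2)
    m = len(X)
    x_mean = np.sum(X) / m
    theta[1] = np.sum((X - x_mean) * Y) / (np.sum(X**2) - (np.sum(X))**2 / m)
    theta[0] = np.sum(Y - theta[1] * X) / m
    return theta


def linear_regression(predictYear=2014):
    X, Y = loadData('./inputdata.txt')
    X[:, 1] -= 2000
    theta = calculateTheta(X[:, 1], Y)   # X[:, 1]: year offset, without the bias column
    print(theta, costFunction(X, Y, theta))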

    x=np.arange(0,20,2)
    f=theta[0]+theta[1]*x
    pyear=predictYear-2000
    pprice=theta[0]+theta[1]*pyear
    x_ticks=x+2000
    plt.xticks(x_ticks)
    plt.xlabel('Year')
    plt.ylabel('price')
    plt.title('Price of house')
    plt.scatter(X[:,1]+2000,Y,color='red')
    plt.scatter(predictYear,pprice,color='green')
    plt.plot(x_ticks,f)
    plt.show()


if __name__ == '__main__':
    linear_regression()

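Unlike gradient descent, least squares needs no iteration: the parameters come directly from a closed-form formula. If you are interested in the derivation of the formula, see this blog post.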
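As a sanity check on the closed-form formula used in the least-squares code, here is a sketch that recomputes it on the same 14 data points and compares it with the normal equation $(X^\top X)\theta = X^\top y$ and with NumPy's `np.polyfit`, then predicts the 2014 price. The function name `closed_form` is hypothetical.

```python
import numpy as np

# Year offsets (year - 2000) and prices from the training table.
x = np.arange(14, dtype=float)
y = np.array([2.0, 2.5, 2.9, 3.147, 4.515, 4.903, 5.365,
              5.704, 6.853, 7.971, 8.561, 10.0, 11.280, 12.900])


def closed_form(x, y):
    """Hypothetical helper using the same formula as the post:
    slope     = sum((x - mean(x)) * y) / (sum(x^2) - (sum(x))^2 / m)
    intercept = mean(y - slope * x)
    """
    m = len(x)
    slope = np.sum((x - x.mean()) * y) / (np.sum(x ** 2) - np.sum(x) ** 2 / m)
    intercept = np.mean(y - slope * x)
    return np.array([intercept, slope])


theta = closed_form(x, y)

# Normal equation with design matrix [1, x] should give the same parameters.
X = np.column_stack([np.ones_like(x), x])
theta_ne = np.linalg.solve(X.T @ X, X.T @ y)

prediction_2014 = theta[0] + theta[1] * (2014 - 2000)
print("closed form:", theta)
print("normal equation:", theta_ne)
print("2014 prediction:", prediction_2014)
```

All three routes minimize the same squared error, so they should agree up to floating-point rounding; `np.polyfit(x, y, 1)` returns the coefficients highest degree first, i.e. `[slope, intercept]`.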