ML线性回归 python实现

最新推荐文章于 2024-02-01 15:51:09 发布

lcjia_you

最新推荐文章于 2024-02-01 15:51:09 发布

阅读量388

点赞数

分类专栏：机器学习 python 文章标签：线性回归 Linear Regression Python实现

本文链接：https://blog.csdn.net/u011864852/article/details/86419292

版权

python 同时被 2 个专栏收录

1 篇文章 0 订阅

订阅专栏

机器学习

0 篇文章 0 订阅

订阅专栏

最近在学习Andrew Ng的课程，仅做完Octave的编程作业感觉并没有理解算法的精髓。故打算从今天开始，将重要的ML算法用Python重新实现一遍，加深印象，提高自己实际应用的能力。

编程环境：Windows10+Jupyter Lab+Python3.6+Numpy+Matplotlib+scipy+sklearn;数据：ex1data1.txt,ex1data2.txt(文中有下载链接)

Linear Regression With One variable

线性回归解决的是连续的预测问题，例如房价预测。

costFunction（代价函数）：

$J\left (\Theta \right )=\frac{1}{2m}\sum_{i=1}^{m}\left ( h_{\Theta }\left ( x^{(i)} \right ) -y\right )^{2}$

Gradient Descent(梯度下降法)：

$Repeat \,\, until \, \, convergence \left \{\! \Theta _{i}:=\Theta_{i}- a\frac{\partial J\left ( \Theta \right )}{\partial \Theta_{i}}\! \right \}$

ex1data1.txt

这个数据文件中包含两列，第一列是input，第二列是Output。下载链接：https://pan.baidu.com/s/1Y_T7WotaN74AVULlWKn1yw 提取码: j9b8

代码实现

cost:

def costFunction(X,y,theta):
    length=X.shape[0];
    error=np.dot(X,theta)-y;
    return np.sum(np.square(error))/(2*length)

gradient descent:

def gradientDescent(X,y,theta,alpha,numIters):
    length=y.shape[0];
    J_history=[];
    for i in range(numIters):
        theta=theta-((X*theta-y).T*X).T*alpha/length;
        J_history.append(costFunction(X,y,theta));
    J_history=np.mat(J_history).T;
    return (theta,J_history)

linear regression:

import numpy as np
import matplotlib.pylab as pl
from mpl_toolkits.mplot3d import Axes3D
data=np.mat(np.loadtxt('../data/ex1data1.txt',delimiter=','))
X=data[:,0]
y=data[:,1]
pl.plot(X,y,'rx',label='first')
pl.xlabel('population')
pl.ylabel('revenue')
X=np.column_stack((np.mat(np.ones((y.shape[0],1))),X))
print("testCostFunction:J="+str(costFunction(X,y,np.mat('0;0'))))
alpha=0.01
iterations=1500
(theta,J_history)=gradientDescent(X,y,np.mat('0;0'),alpha,iterations)
pl.plot(X[:,1],X*theta,label='model')
pl.legend()
pl.show()
pl.plot(range(iterations),J_history,'y-')
pl.xlabel('iterations')
pl.ylabel('value of costFuntion')
pl.show()
predict1=np.mat('1,3.5')*theta
print('For population = 35,000, we predict a profit of '+str(predict1[0,0]*10000))
predict2=np.mat('1,7')*theta
print('For population = 70,000, we predict a profit of '+str(predict2[0,0]*10000))

输出：

Linear Regression With Multi Variable

ex1data2.txt:

第一列和第二列为input，第三列为output。下载链接：https://pan.baidu.com/s/1s1xM5N2fcbqmKKPXkDJkfA 提取码: axbs

数据预处理：min-max标准化和Z-score标准化

min-max标准化(Min-Max Normalization)，也称线性函数归一化

定义: 对原始数据进行线性变换，使数据结果映射到0-1之间。

本质: 把数变为[0,1]之间的小数。

转换函数: (X-Min)/(Max-Min)

如果将数据映射到[-1,1]范围，则将转换公式写成 :

$\frac{X-Mean}{max-min}$

其中max为样本数据的最大值，min为样本数据的最小值，Mean表示数据的均值。

缺陷:

(1). 当有新数据加入时，可导致max和min的变化，需要重新定义。

(2). 若数据出现较大或较小的异常值，则该标准化方法存在较大的误差。

0均值标准化(Z-score standardization)

定义：通过原始数据的均值(mean)和标准差(standard deviation)进行标准化，经过处理的数据符合标准正态分布，即均值为0，标准差为1。

本质：把有量纲表达式变成无量纲表达式。

转换函数：（X-Mean）/(Standard deviation)

其中，Mean为所有样本数据的均值。Standard deviation为所有样本数据的标准差。

（转自https://blog.csdn.net/program_developer/article/details/78637711）

代码实现：

featureNormalize(Z-score standardization):

def featureNormalize(X):
    mu=X.mean(axis=0)
    sigma=X.std(axis=0)#此为总体标准差，Octave/MatLab的Std()得到的样本标准差
    X_norm=np.true_divide((X-mu),sigma)
    return (mu,sigma,X_norm)

normalEqn(标准方程法求θ)：

def normalEqn(X,y):
    return (X.T*X).I*X.T*y;

linea regression:

import numpy as np
import matplotlib.pylab as pl
data=np.mat(np.loadtxt('../data/ex1data2.txt',delimiter=','))
X=data[:,[0,1]]
y=data[:,2]
(mu,sigma,X_norm)=featureNormalize(X)
X_norm=np.column_stack((np.mat(np.ones((y.shape[0],1))),X_norm))
alpha=0.01
iterations=400
(theta,J_history)=gradientDescent(X_norm,y,np.mat('0;0;0'),alpha,iterations)
pl.plot(range(iterations),J_history,'b-')
pl.xlabel('iterations')
pl.ylabel('Cost J')
pl.show()
price=np.mat([1,np.true_divide((1650-mu[0,0]),sigma[0,0]),np.true_divide((3-mu[0,1]),sigma[0,1])])*theta;
print('Predicted price of a 1650 sq-ft, 3 br house (using gradient descent):'+str(price))
X=np.column_stack((np.mat(np.ones((y.shape[0],1))),X))
theta=normalEqn(X,y)
price=np.mat([1,1650,3])*theta;
print('Predicted price of a 1650 sq-ft, 3 br house (using normal equations):'+str(price))

输出：

Linear Regression (使用sklearn):

代码实现：

import numpy as np
import matplotlib.pylab as pl
import sklearn
import scipy.optimize as op
data=np.mat(np.loadtxt('../data/ex1data1.txt',delimiter=','))
X=data[:,0]
y=data[:,1]
#sklearn
lin_reg_model=sklearn.linear_model.LinearRegression()
lin_reg_model.fit(X,y)
print('X=3.5, we predict y=(using sklearn linear regression)'+str(lin_reg_model.predict([[3.5]])))
print('X=7, we predict y=(using sklearn linear regression)'+str(lin_reg_model.predict([[7]])))

输出：

sklearn输出

输出虽然成功了，但是与之前linear regression with one variable的输出结果并不一样。为了方便对照，下面贴上linear regression with one variable的输出。

可以看到在population=35000的时候相差非常大，猜测有可能是因为gradient descent的学习速率或迭代次数有问题，sklearn中实现的线性回归算法应该会取最佳（opt）的θ值，所以想用MatLab中的fminunc()寻找最优θ值来验证这个想法。在python的scipy库中存在minimize函数与fminunc有相同作用，故使用它来验证。代码如下：

import numpy as np
import matplotlib.pylab as pl
import sklearn
import scipy.optimize as op
data=np.mat(np.loadtxt('../data/ex1data1.txt',delimiter=','))
X=data[:,0]
y=data[:,1]
#sklearn
lin_reg_model=sklearn.linear_model.LinearRegression()
lin_reg_model.fit(X,y)
print('For population = 35,000, we predict a profit of(using sklearn linear regression) '+str(lin_reg_model.predict([[3.5]])[0,0]*10000))
print('For population = 70,000, we predict a profit of(using sklearn linear regression) '+str(lin_reg_model.predict([[7]])[0,0]*10000))
#scipy minimize = matlab fminunc
initial_theta=np.zeros((2,1))
X=np.column_stack((np.mat(np.ones((y.shape[0],1))),X))
result = op.minimize(fun=costFunction, x0=initial_theta.flatten(), args=(X, y), method='TNC', jac=gradient)
theta_opt=result.x
print('For population = 35,000, we predict a profit of(using minimize(fminunc)) '+str((np.mat(theta_opt)*np.mat('1;3.5'))[0,0]*10000))
print('For population = 70,000, we predict a profit of(using minimize(fminunc)) '+str((np.mat(theta_opt)*np.mat('1;7'))[0,0]*10000))

实验输出：

两者的输出基本相同，从而验证了我的假设是正确的！

lcjia_you

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
ML线性回归 python实现

最近在学习Andrew Ng的课程，仅做完Octave的编程作业感觉并没有理解算法的精髓。故打算从今天开始，将重要的ML算法用Python重新实现一遍，加深印象，提高自己实际应用的能力。编程环境：Windows10+Jupyter Lab+Python3.6+Numpy+Matplotlib+scipy+sklearn;数据：ex1data1.txt,ex1data2.txt(文中有下载链接)...
复制链接

扫一扫

专栏目录