Linear Regression: How to Get Each Sample's Residual for Each Year as a Variable - CSDN Blog

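I. Simple Linear Regression

Simple linear regression models the relationship between one response variable and one explanatory variable.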

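1. Analyze the linear relationship between pizza diameter and price. The data are as follows:

Training sample    Diameter (inches)    Price (USD)
1                  6                    7
2                  8                    9
3                  10                   13
4                  14                   17.5
5                  18                   18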

2. Draw a scatter plot of the samples:

Program:

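import matplotlib.pyplot as plt
from matplotlib.font_manager import FontProperties

# Font that can render the Chinese labels below (Windows path to Microsoft YaHei)
font = FontProperties(fname=r"c:\windows\fonts\msyh.ttc", size=10)

def runplt():
    """Create an empty, labelled figure for the pizza data and return plt."""
    plt.figure()
    plt.title('匹萨价格与直径数据', fontproperties=font)  # "Pizza price vs. diameter"
    plt.xlabel('直径（英寸）', fontproperties=font)  # "Diameter (inches)"
    plt.ylabel('价格（美元）', fontproperties=font)  # "Price (USD)"
    plt.axis([0, 25, 0, 25])
    plt.grid(True)
    return plt

plt = runplt()
X = [[6], [8], [10], [14], [18]]  # diameters
y = [[7], [9], [13], [17.5], [18]]  # prices
plt.plot(X, y, 'k.')
plt.show()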

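Python 2.7 reports an error when running this program (the error message appeared as an image in the original post).

Solution: add the following three lines before the imports. This workaround applies to Python 2 only.

import sys
reload(sys)  # restores sys.setdefaultencoding, which site.py removes at startup
sys.setdefaultencoding('utf-8')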

3. Build the model with scikit-learn

Program:

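from sklearn.linear_model import LinearRegression

# Create and fit the model
X = [[6], [8], [10], [14], [18]]
y = [[7], [9], [13], [17.5], [18]]
model = LinearRegression()
model.fit(X, y)
# predict() returns a 2-D array here because y is 2-D, so index down to the scalar
print('Predicted price of a 12-inch pizza: $%.2f' % model.predict([[12]])[0][0])
plt = runplt()
plt.plot(X, y, 'k.')
# Draw the fitted line
X2 = [[0], [10], [14], [25]]
y2 = model.predict(X2)
plt.plot(X2, y2, 'g-')
# Predicted values at the training inputs, used to draw the residuals
yr = model.predict(X)
for idx, x in enumerate(X):
    # Vertical red segment from the observed price to the predicted price
    plt.plot([x, x], [y[idx], yr[idx]], 'r-')
plt.show()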
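The red segments drawn above are the residuals: the differences between each observed price and the model's prediction. As a minimal sketch (the variable name `residuals` is introduced here for illustration and is not from the original post), they can also be collected into an array with one value per training sample:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[6], [8], [10], [14], [18]])        # diameters
y = np.array([7, 9, 13, 17.5, 18], dtype=float)   # prices, 1-D here for simplicity

model = LinearRegression().fit(X, y)

# Residual = observed value minus predicted value, one per sample
residuals = y - model.predict(X)
for diameter, r in zip(X[:, 0], residuals):
    print('diameter %2d: residual %+.3f' % (diameter, r))
```

Because the model includes an intercept, the residuals on the training data sum to zero up to floating-point error.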

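In the code above, sklearn.linear_model.LinearRegression is an estimator. In scikit-learn every estimator has a fit() method, and predictors such as LinearRegression also have predict(). fit() learns the model parameters from the training data; predict() applies the model built from those parameters to explanatory variables and returns the predicted response. The fit() method of LinearRegression learns the following simple linear regression model:

y = α + βx

where y is the predicted response (price), x is the explanatory variable (diameter), α is the intercept, and β is the coefficient.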
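To make the roles of fit() and predict() concrete, the sketch below reads the learned parameters back from the fitted estimator and reproduces a prediction by hand. `intercept_` and `coef_` are scikit-learn attributes; the names `alpha`, `beta`, and `manual` are chosen here for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[6], [8], [10], [14], [18]])
y = np.array([7, 9, 13, 17.5, 18], dtype=float)

model = LinearRegression().fit(X, y)

alpha = model.intercept_   # learned intercept
beta = model.coef_[0]      # learned slope for the single feature

# predict() for a 12-inch pizza should equal intercept + slope * 12
manual = alpha + beta * 12
print('intercept=%.4f slope=%.4f' % (alpha, beta))
print('manual=%.2f predict=%.2f' % (manual, model.predict([[12]])[0]))
```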

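The parameters of a simple linear regression model are usually estimated by ordinary least squares (OLS), also called linear least squares. The cost function, also called the loss function, defines the error between the model and the observations.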
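For a single explanatory variable, ordinary least squares has a closed-form solution: the slope is the covariance of x and y divided by the variance of x, and the intercept follows from the means. The sketch below computes both with NumPy for the pizza data, together with the residual sum of squares as the cost. The variable names are illustrative.

```python
import numpy as np

x = np.array([6, 8, 10, 14, 18], dtype=float)
y = np.array([7, 9, 13, 17.5, 18], dtype=float)

# Slope: covariance(x, y) / variance(x); the 1/(n-1) factors cancel
beta = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# Intercept: the fitted line passes through the point of means
alpha = y.mean() - beta * x.mean()

# Cost: residual sum of squares on the training data
rss = np.sum((y - (alpha + beta * x)) ** 2)
print('beta=%.4f alpha=%.4f rss=%.4f' % (beta, alpha, rss))
```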

4. Model evaluation

R-squared, also called the coefficient of determination, measures how well the model fits the observed data.

Add the model evaluation code:

X_test = [[8], [9], [11], [16], [12]]
y_test = [[11], [8.5], [15], [18], [11]]
model = LinearRegression()
model.fit(X, y)
# score() returns R-squared on the test set
print(model.score(X_test, y_test))

0.662005
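The score above can be reproduced from the definition R-squared = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals on the test set and SS_tot is the sum of squared deviations of the test responses from their mean. A minimal sketch, with illustrative variable names:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = [[6], [8], [10], [14], [18]]
y = [7, 9, 13, 17.5, 18]
X_test = [[8], [9], [11], [16], [12]]
y_test = np.array([11, 8.5, 15, 18, 11], dtype=float)

model = LinearRegression().fit(X, y)
y_pred = model.predict(X_test)

ss_res = np.sum((y_test - y_pred) ** 2)          # unexplained variation
ss_tot = np.sum((y_test - y_test.mean()) ** 2)   # total variation
r_squared = 1 - ss_res / ss_tot
print('%.6f' % r_squared)
```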

II. Polynomial Regression

The PolynomialFeatures transformer can be used to solve polynomial regression problems.
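As a small illustration of what the transformer does, the sketch below expands a single feature x into the columns [1, x, x²] with degree=2. A linear model fitted on these columns is a quadratic in x.

```python
from sklearn.preprocessing import PolynomialFeatures

X = [[6], [8], [10]]
quadratic_featurizer = PolynomialFeatures(degree=2)
# Each row [x] becomes [1, x, x**2]: bias column, original feature, square
X_quadratic = quadratic_featurizer.fit_transform(X)
print(X_quadratic)
```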

Program:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

X_train = [[6], [8], [10], [14], [18]]
y_train = [[7], [9], [13], [17.5], [18]]
X_test = [[6], [8], [11], [16]]
y_test = [[8], [12], [15], [18]]

# Baseline: simple linear regression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
xx = np.linspace(0, 26, 100)
yy = regressor.predict(xx.reshape(xx.shape[0], 1))
plt = runplt()  # plotting helper defined in section I
plt.plot(X_train, y_train, 'k.')
plt.plot(xx, yy)

# Quadratic regression: expand x to [1, x, x^2], then fit a linear model
quadratic_featurizer = PolynomialFeatures(degree=2)
X_train_quadratic = quadratic_featurizer.fit_transform(X_train)
X_test_quadratic = quadratic_featurizer.transform(X_test)
regressor_quadratic = LinearRegression()
regressor_quadratic.fit(X_train_quadratic, y_train)
xx_quadratic = quadratic_featurizer.transform(xx.reshape(xx.shape[0], 1))
plt.plot(xx, regressor_quadratic.predict(xx_quadratic), 'r-')

# Cubic regression: expand x to [1, x, x^2, x^3], then fit a linear model
cubic_featurizer = PolynomialFeatures(degree=3)
X_train_cubic = cubic_featurizer.fit_transform(X_train)
X_test_cubic = cubic_featurizer.transform(X_test)
regressor_cubic = LinearRegression()
regressor_cubic.fit(X_train_cubic, y_train)
xx_cubic = cubic_featurizer.transform(xx.reshape(xx.shape[0], 1))
plt.plot(xx, regressor_cubic.predict(xx_cubic), 'b*')
plt.show()
print("Simple linear regression r-squared: %.2f" % regressor.score(X_test, y_test))
print("Quadratic regression r-squared: %.2f" % regressor_quadratic.score(X_test_quadratic, y_test))
print("Cubic regression r-squared: %.2f" % regressor_cubic.score(X_test_cubic, y_test))

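Output: [figure: the training points with the fitted linear, quadratic, and cubic curves, followed by the three printed r-squared values]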
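The featurize-then-fit steps above can also be chained. The following is an alternative sketch, not the approach used in the original program: it uses scikit-learn's make_pipeline so that the same transformation is applied automatically at fit and score time. The `scores` dictionary is introduced here for illustration.

```python
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X_train = [[6], [8], [10], [14], [18]]
y_train = [7, 9, 13, 17.5, 18]
X_test = [[6], [8], [11], [16]]
y_test = [8, 12, 15, 18]

scores = {}
for degree in (1, 2, 3):
    # PolynomialFeatures expands x; LinearRegression fits the expanded columns
    pipeline = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    pipeline.fit(X_train, y_train)
    scores[degree] = pipeline.score(X_test, y_test)
    print('degree %d r-squared: %.2f' % (degree, scores[degree]))
```

Keeping the transformer and the regressor in one object avoids accidentally scoring on test data that was never transformed.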

III. Multiple Linear Regression

Training samples:

X = [[6, 2], [8, 1], [10, 0], [14, 2], [18, 0]]

y = [[7], [9], [13], [17.5], [18]]

Test samples:

X_test = [[8, 2], [9, 0], [11, 2], [16, 2], [12, 0]]

y_test = [[11], [8.5], [15], [18], [11]]

Program:

from sklearn.linear_model import LinearRegression

# Each training row has two explanatory variables
X = [[6, 2], [8, 1], [10, 0], [14, 2], [18, 0]]
y = [[7], [9], [13], [17.5], [18]]
model = LinearRegression()
model.fit(X, y)
X_test = [[8, 2], [9, 0], [11, 2], [16, 2], [12, 0]]
y_test = [[11], [8.5], [15], [18], [11]]

# Compare each prediction with its target, then report R-squared on the test set
predictions = model.predict(X_test)
for i, prediction in enumerate(predictions):
    print('Predicted: %s, Target: %s' % (prediction, y_test[i]))
print('R-squared: %.2f' % model.score(X_test, y_test))

Output: [the printed predictions and R-squared value from the original post are not preserved]
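With more than one explanatory variable, the least-squares coefficients can be obtained by solving a linear system over a design matrix with a leading column of ones. The sketch below uses NumPy's lstsq for this and compares the result with LinearRegression; the names `X_design` and `beta` are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[6, 2], [8, 1], [10, 0], [14, 2], [18, 0]], dtype=float)
y = np.array([7, 9, 13, 17.5, 18], dtype=float)

# Prepend a column of ones so the first coefficient acts as the intercept
X_design = np.hstack([np.ones((X.shape[0], 1)), X])
# Least-squares solution of X_design @ beta ~= y
beta, _, _, _ = np.linalg.lstsq(X_design, y, rcond=None)

model = LinearRegression().fit(X, y)
print('lstsq:   intercept=%.4f coefs=%s' % (beta[0], beta[1:]))
print('sklearn: intercept=%.4f coefs=%s' % (model.intercept_, model.coef_))
```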