Linear regression:
1. Model assumptions
A linear model and a linear relationship are not the same thing: a linear relationship is always a linear model, but a linear model is not necessarily a linear relationship, because a model only needs to be linear in its weights, not in its input features.
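For instance, y = w1·x + w2·x² + b traces a curve in x (not a linear relationship), yet it is still a linear model because it is linear in the weights. A minimal sketch with made-up coefficients:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# y = 2x + 3x^2 + 1: nonlinear in x, but linear in the weights (2, 3)
x = np.linspace(-3, 3, 50).reshape(-1, 1)
y = 2 * x.ravel() + 3 * x.ravel() ** 2 + 1

# Expanding x into [x, x^2] turns the curve fit into ordinary linear regression
x_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)
model = LinearRegression().fit(x_poly, y)
print(model.coef_, model.intercept_)  # recovers [2, 3] and 1
```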
2. Optimization algorithms
Normal equation
The normal equation can be compared to a genius: it computes all the weights and the bias in a single closed-form step.
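Concretely, the closed-form step is w = (XᵀX)⁻¹Xᵀy. A minimal NumPy sketch on synthetic data (the coefficients here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 4.0  # bias of 4.0, no noise

# Append a column of ones so the bias is learned as one more weight
Xb = np.hstack([X, np.ones((X.shape[0], 1))])

# Normal equation: solve (X^T X) w = X^T y in a single step
w = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)
print(w)  # recovers [1.5, -2.0, 0.5, 4.0]
```

Using `np.linalg.solve` rather than explicitly inverting XᵀX is numerically safer, but the cost is still roughly cubic in the number of features.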
Gradient descent
Gradient descent can be compared to a diligent, hard-working ordinary person: it reaches the solution through repeated iteration and trial and error.
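The iterate-and-correct loop can be sketched in a few lines of NumPy (learning rate and iteration count below are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([3.0, -1.0]) + 2.0  # true weights [3, -1], bias 2

w = np.zeros(2)
b = 0.0
lr = 0.1  # learning rate: how big a step each update takes

for _ in range(500):  # many small steps toward the minimum of the MSE loss
    err = X @ w + b - y
    # Gradients of the MSE loss with respect to w and b
    grad_w = 2 * X.T @ err / len(y)
    grad_b = 2 * err.mean()
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)  # approaches [3.0, -1.0] and 2.0
```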
3. sklearn implementation
LinearRegression
LinearRegression uses the normal equation. Solving it is expensive (roughly cubic in the number of features), so it is generally avoided on large datasets.
SGDRegressor
SGDRegressor uses (stochastic) gradient descent. When the dataset exceeds roughly 1000K (one million) samples, SGDRegressor is the recommended choice. Its tunable hyperparameters include the learning-rate schedule (learning_rate), the initial step size (eta0), and the maximum number of iterations (max_iter), so we can tune them with grid search plus cross-validation.
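A sketch of that tuning, assuming synthetic data from `make_regression` in place of a real dataset, and a small illustrative grid over eta0 and max_iter:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler

# Synthetic regression data stands in for a real dataset here
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X = StandardScaler().fit_transform(X)  # SGD is sensitive to feature scale

# Grid over the tunable quantities mentioned above
param_grid = {
    'eta0': [0.001, 0.01, 0.1],
    'max_iter': [1000, 5000],
}
search = GridSearchCV(
    SGDRegressor(learning_rate='constant', random_state=0),
    param_grid, cv=3, scoring='neg_mean_squared_error',
)
search.fit(X, y)
print(search.best_params_)
```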
4. Model evaluation: use MSE (mean squared error)
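MSE is the average of the squared prediction errors, MSE = (1/m) Σᵢ (ŷᵢ − yᵢ)². A quick check with made-up values that the manual formula matches sklearn's `mean_squared_error`:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3.0, 5.0, 2.5])
y_pred = np.array([2.5, 5.0, 4.0])

# Mean of the squared errors: (0.25 + 0 + 2.25) / 3
mse_manual = np.mean((y_pred - y_true) ** 2)
print(mse_manual)  # 0.8333...
print(mean_squared_error(y_true, y_pred))  # same value
```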
Predicting Boston housing prices with linear regression
from sklearn.linear_model import LinearRegression, SGDRegressor
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error
# 1) Load the data (note: load_boston was removed in scikit-learn 1.2)
data = load_boston()
# 2) Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(data.data, data.target, random_state=22)
# 3) Feature engineering: standardize the features
transfer = StandardScaler()
x_train = transfer.fit_transform(x_train)
x_test = transfer.transform(x_test)
# 4) LinearRegression model (normal equation)
estimator = LinearRegression()
estimator.fit(x_train, y_train)
print('Normal equation: weights\n', estimator.coef_)
print('Normal equation: intercept\n', estimator.intercept_)
# 5) Model evaluation
y_predict = estimator.predict(x_test)
print('Predictions\n', y_predict)
print('MSE\n', mean_squared_error(y_test, y_predict))
# 4) SGDRegressor model (gradient descent)
estimator = SGDRegressor(learning_rate='constant', eta0=0.01, max_iter=10000)
estimator.fit(x_train, y_train)
print('Gradient descent: weights\n', estimator.coef_)
print('Gradient descent: intercept\n', estimator.intercept_)
# 5) Model evaluation
y_predict = estimator.predict(x_test)
print('Predictions\n', y_predict)
print('MSE\n', mean_squared_error(y_test, y_predict))
The results are as follows:
Normal equation: weights
 [-0.64817766  1.14673408 -0.05949444  0.74216553 -1.95515269  2.70902585
  -0.07737374 -3.29889391  2.50267196 -1.85679269 -1.75044624  0.87341624
  -3.91336869]
Normal equation: intercept
 22.62137203166228
Predictions
 [28.22944896 31.5122308  21.11612841 32.6663189  20.0023467  ...]
MSE
 20.6275137630954
Gradient descent: weights
 [-0.20772372  0.92645947  0.08913743  0.67508683 -1.81886124  3.31301882
  -0.09589654 -3.41350815  2.40213736 -1.81839293 -1.99261014  0.19576098
  -3.96829135]
Gradient descent: intercept
 [22.90221188]
Predictions
 [28.69926108 32.0915701  20.79223736 32.8937585  19.84098535 ...]
MSE
 24.03864194527857