Basic Concepts of Linear Regression
- What is "linear"?
  The relationship between the variables is a first-degree function, whose graph is a straight line.
- What is "regression"?
  Reducing the relationship between the variables to a single value (a line).
- Linear regression predicts with a function that is a linear combination of the sample's features, i.e. it uses several variables $X$ to predict $Y$.
- The target is assumed to be linearly related to the features.
- Basic form:
  $f(x) = \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \dots + \theta_n x_n + \theta_0$
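The basic form is just a dot product between the feature vector and the coefficients, plus an intercept. A minimal sketch (the parameter values and sample below are made up for illustration):

```python
import numpy as np

# Hypothetical parameters θ1..θ3 and intercept θ0 (made-up numbers)
theta = np.array([2.0, -1.0, 0.5])
theta0 = 4.0

# One sample with three features
x = np.array([1.0, 3.0, 2.0])

# f(x) = θ1·x1 + θ2·x2 + θ3·x3 + θ0
f_x = theta.dot(x) + theta0
print(f_x)  # 2·1 + (-1)·3 + 0.5·2 + 4 = 4.0
```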
Single-Variable Linear Regression
- Each sample has exactly one feature.
- Suppose we have found the best-fit line: $y = wx + b$.
- For each sample point $x_i$, the line predicts $y_i = wx_i + b$, while the sample's true value is $\hat{y}_i$ (this note puts the hat on the true value). We want the gap between $y_i$ and $\hat{y}_i$ to be as small as possible. How do we measure the gap? With the squared error: $(\hat{y}_i - y_i)^2$.
- For the whole training set, consider all samples: $\sum_{i=1}^m (\hat{y}_i - y_i)^2$, usually also multiplied by $\frac{1}{m}$.
- Goal: find the best $w$ and $b$ that make $\sum_{i=1}^m (\hat{y}_i - wx_i - b)^2$ as small as possible. This expression is called the loss function $J(w, b)$.
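The loss function above translates directly into NumPy. A sketch (the helper name `loss` and the sample data are made up; we keep the note's convention that $\hat{y}$ denotes the true values):

```python
import numpy as np

def loss(w, b, x, y_true):
    # J(w, b) = (1/m) · Σ (ŷ_i − w·x_i − b)²
    y_pred = w * x + b
    return np.mean((y_true - y_pred) ** 2)

x = np.array([1.0, 2.0, 3.0])
y_true = 2.0 * x + 1.0            # data generated by y = 2x + 1

print(loss(2.0, 1.0, x, y_true))  # perfect fit -> 0.0
print(loss(1.0, 0.0, x, y_true))  # worse parameters -> larger loss
```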
Normal Equation
- Closed-form solution:
  $w = \frac{\sum_{i=1}^m (x_i - \tilde{x})(\hat{y}_i - \tilde{y})}{\sum_{i=1}^m (x_i - \tilde{x})^2}$, $b = \tilde{y} - w\tilde{x}$ (where $\hat{y}_i$ is the true value of sample $i$, and $\tilde{x}$, $\tilde{y}$ are the means).
- Derivation:
  - $J(w,b) = \sum_{i=1}^m (\hat{y}_i - wx_i - b)^2$. This is a convex quadratic in $w$ and $b$, so setting both partial derivatives to zero yields the minimum of $J(w,b)$.
  - $\frac{\partial J(w,b)}{\partial b} = \sum_{i=1}^m 2(\hat{y}_i - wx_i - b)(-1) = 0$, i.e. $\sum_{i=1}^m \hat{y}_i - w\sum_{i=1}^m x_i - mb = 0$
    $\Rightarrow \sum_{i=1}^m \hat{y}_i - w\sum_{i=1}^m x_i = mb$
    $\Rightarrow b = \tilde{y} - w\tilde{x}$
  - $\frac{\partial J(w,b)}{\partial w} = \sum_{i=1}^m 2(\hat{y}_i - wx_i - b)(-x_i) = 0$, i.e. $\sum_{i=1}^m (\hat{y}_i - wx_i - b)\,x_i = 0$
  - Substituting the $b$ found above: $\sum_{i=1}^m (\hat{y}_i - wx_i - \tilde{y} + w\tilde{x})\,x_i = \sum_{i=1}^m (x_i\hat{y}_i - x_i\tilde{y}) - w\sum_{i=1}^m (x_i^2 - x_i\tilde{x}) = 0$
    $\Rightarrow w = \frac{\sum_{i=1}^m (x_i\hat{y}_i - x_i\tilde{y})}{\sum_{i=1}^m (x_i^2 - x_i\tilde{x})}$. Moreover, $\sum_{i=1}^m x_i\tilde{y} = m\tilde{x}\tilde{y} = \sum_{i=1}^m \hat{y}_i\tilde{x} = \sum_{i=1}^m \tilde{x}\tilde{y}$, and $\sum_{i=1}^m x_i\tilde{x}$ has the same property.
    $\Rightarrow w = \frac{\sum_{i=1}^m (x_i\hat{y}_i - x_i\tilde{y} - \hat{y}_i\tilde{x} + \tilde{x}\tilde{y})}{\sum_{i=1}^m (x_i^2 - x_i\tilde{x} - x_i\tilde{x} + \tilde{x}^2)} = \frac{\sum_{i=1}^m (x_i - \tilde{x})(\hat{y}_i - \tilde{y})}{\sum_{i=1}^m (x_i - \tilde{x})^2}$
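The closed-form solution derived above is a few lines of NumPy. A sketch (the helper name `fit_simple` and the noiseless sample data are made up for illustration):

```python
import numpy as np

def fit_simple(x, y):
    # w = Σ(x_i − x̃)(ŷ_i − ỹ) / Σ(x_i − x̃)²,  b = ỹ − w·x̃
    x_mean, y_mean = x.mean(), y.mean()
    w = ((x - x_mean) * (y - y_mean)).sum() / ((x - x_mean) ** 2).sum()
    b = y_mean - w * x_mean
    return w, b

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0                 # noiseless data from y = 2x + 1
w, b = fit_simple(x, y)
print(w, b)                       # recovers w ≈ 2.0, b ≈ 1.0
```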
Multivariable Linear Regression
- Each sample has $n$ features.
- As before, find the best-fit linear equation: $y = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$.
- As before, for the training set, find the best $\theta_0, \theta_1, \theta_2, \dots, \theta_n$ that make $\sum_{i=1}^m (\hat{y}_i - y_i)^2$ as small as possible.

Normal Equation
- For a training set of $m$ samples with $n$ features each, write the model in matrix form $y = X_b \cdot \theta$, where $X_b$ is the $m \times (n+1)$ matrix whose first column is all ones (absorbing the intercept $\theta_0$) and whose remaining columns are the features.
- Final solution:
  $\theta = (X_b^T \cdot X_b)^{-1} \cdot X_b^T \cdot \hat{y}$
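The final solution can be checked numerically: on noiseless data the normal equation recovers the parameters exactly. A sketch (the parameter values below are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 100, 3
X = rng.normal(size=(m, n))
true_theta = np.array([4.0, 2.0, -1.0, 0.5])   # [θ0, θ1, θ2, θ3], made up
y = true_theta[0] + X.dot(true_theta[1:])      # noiseless targets

# X_b: prepend a column of ones so the intercept θ0 is absorbed into θ
X_b = np.hstack([np.ones((m, 1)), X])

# θ = (X_bᵀ · X_b)⁻¹ · X_bᵀ · y
theta = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)
print(theta)   # recovers [4.0, 2.0, -1.0, 0.5]
```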
Linear Regression in Scikit-learn
- Normal-equation linear regression (ordinary least squares).
- API: sklearn.linear_model.LinearRegression
```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

# Boston housing data: 506 samples, 13 features each
# (note: load_boston was removed in scikit-learn 1.2)
boston = datasets.load_boston()
X = boston.data
Y = boston.target

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=666)

# Standardize to zero mean and unit variance
scaler = StandardScaler()
scaler.fit(X_train)
X_train_standard = scaler.transform(X_train)
X_test_standard = scaler.transform(X_test)

# Linear regression model (normal equation)
lin_reg = LinearRegression()
lin_reg.fit(X_train_standard, y_train)

# Coefficients w
lin_reg.coef_
# Intercept b
lin_reg.intercept_

# Predictions and score (R²)
lin_reg.predict(X_test_standard)
lin_reg.score(X_test_standard, y_test)
```
Implementing Linear Regression from Scratch

```python
import numpy as np
from sklearn.metrics import mean_squared_error


class LinearRegression:

    def __init__(self):
        # coefficients
        self.coef_ = None
        # intercept
        self.interception_ = None
        # θ, i.e. intercept + coefficients
        self._theta = None

    def fit(self, x_train, y_train):
        assert x_train.shape[0] == y_train.shape[0], "the size of x must equal y"
        # prepend a column of ones for the intercept
        X_b = np.hstack([np.ones((len(x_train), 1)), x_train])
        # np.linalg.inv() computes the matrix inverse
        self._theta = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y_train)
        self.coef_ = self._theta[1:]
        self.interception_ = self._theta[0]
        return self

    def predict(self, x_test):
        assert self.interception_ is not None and self.coef_ is not None, "must fit before predict"
        assert x_test.shape[1] == len(self.coef_), "the feature number of x_test must be equal to x_train"
        X_b = np.hstack([np.ones((len(x_test), 1)), x_test])
        return X_b.dot(self._theta)

    # score via R² (R Squared)
    def score(self, x_test, y_test):
        y_predict = self.predict(x_test)
        return 1 - mean_squared_error(y_test, y_predict) / np.var(y_test)

    def __repr__(self):
        return "Linear Regression"
```
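As a sanity check, the normal-equation solve used in `fit()` above should agree with scikit-learn's `LinearRegression`. A sketch on made-up synthetic data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression as SkLinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = 1.0 + X.dot(np.array([2.0, -3.0, 0.5])) + rng.normal(scale=0.1, size=50)

# The same normal-equation solve as in the fit() method above
X_b = np.hstack([np.ones((len(X), 1)), X])
theta = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)

sk = SkLinearRegression().fit(X, y)
print(np.allclose(theta[0], sk.intercept_))   # True
print(np.allclose(theta[1:], sk.coef_))       # True
```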