- 解决回归问题
- 思想简单,实现容易
- 许多强大的非线性模型的基础
- 结果具有很好的可解释性
- 蕴含机器学习中的很多重要思想
回归问题:连续值
如果样本 特征 只有一个 称为简单线性回归 y=ax + b
通过 训练 数据集 预测出来的值我们希望它和真实值 之间差距尽可能的小
y
(
i
)
−
y
^
y^{(i)} - \hat{y}
y(i)−y^
如果想要计算距离 我们自然会想到可以使用绝对值
∣
y
(
i
)
−
y
^
(
i
)
∣
|y^{(i)} - \hat{y}^{(i)}|
∣y(i)−y^(i)∣
绝对值在计算中不是特别好的方式 方程不可导 没法求最优解
(
y
(
i
)
−
y
^
(
i
)
)
2
(y^{(i)} - \hat{y}^{(i)})^2
(y(i)−y^(i))2
另外计算距离 还可以想到的是 使用平方计算 并且他是一个凸函数 连续且处处可导在后续计算中会比较方便求极值。(凸优化中的最小二乘法)
考虑所有样本:
:
(
i
)
(i)
(i) 特征个数
:
(
m
)
(m)
(m) 为样本点个数
a
r
g
m
i
n
∑
i
=
1
m
(
y
(
i
)
−
y
^
(
i
)
)
2
{argmin}\sum_{i=1}^{m}(y^{(i)} - \hat{y}^{(i)})^2
argmin∑i=1m(y(i)−y^(i))2
由于
y
^
(
i
)
=
a
x
(
i
)
+
b
\hat{y}^{(i)} = ax^{(i)} + b
y^(i)=ax(i)+b
带入得
a
r
g
m
i
n
∑
i
=
1
m
(
y
(
i
)
−
a
x
(
i
)
−
b
)
2
{argmin}\sum_{i=1}^{m}(y^{(i)} - ax^{(i)} - b)^2
argmin∑i=1m(y(i)−ax(i)−b)2
目标找到 a 和 b 使 方程
∑
i
=
1
m
(
y
(
i
)
−
a
x
(
i
)
−
b
)
2
\sum_{i=1}^{m}(y^{(i)} - ax^{(i)} - b)^2
∑i=1m(y(i)−ax(i)−b)2尽可能的小.
∑ i = 1 m ( y ( i ) − a x ( i ) − b ) 2 \sum_{i=1}^{m}(y^{(i)} - ax^{(i)} - b)^2 ∑i=1m(y(i)−ax(i)−b)2 一般称为 损失函数(loss function).也可以成为 效用函数(utility function)
由线性回归 可以了解到机器学习算法的解决问题的一般思路:
通过分析问题,确定问题的损失偶函数或效用函数。
通过优化损失函数或者效用函数,获得机器学习的模型。
包括但不限于以下算法都是使用这种思路:
-
线性回归
-
SVM
-
多项式回归
-
神经网络
-
逻辑回归
-
…
最小二乘法 求导简单证明:
∑ i = 1 m ( y ( i ) − a x ( i ) − b ) 2 \sum_{i=1}^{m}(y^{(i)} - ax^{(i)} - b)^2 ∑i=1m(y(i)−ax(i)−b)2
对a求导: ∂ J ( a , b ) ∂ a = 0 \frac{\partial{J(a,b)}}{\partial{a}} = 0 ∂a∂J(a,b)=0
对b求导: ∂ J ( a , b ) ∂ b = 0 \frac{\partial{J(a,b)}}{\partial{b}} = 0 ∂b∂J(a,b)=0
∂ J ( a , b ) ∂ b = 0 \frac{\partial{J(a,b)}}{\partial{b}} = 0 ∂b∂J(a,b)=0 = ∑ i = 1 m 2 ( y ( i ) − a x ( i ) − b ) ( − 1 ) \sum_{i=1}^m2(y^{(i)}-ax^{(i)}-b)(-1) ∑i=1m2(y(i)−ax(i)−b)(−1)
∑ i = 1 m 2 ( y ( i ) − a x ( i ) − b ) ( − 1 ) \sum_{i=1}^m2(y^{(i)}-ax^{(i)}-b)(-1) ∑i=1m2(y(i)−ax(i)−b)(−1) = 0
等式左右2便 除以 -2得:
∑ i = 1 m y ( i ) − a ∑ i = 1 m x ( i ) − ∑ i = 1 m b \sum_{i=1}^my^{(i)}-a\sum_{i=1}^mx^{(i)}-\sum_{i=1}^mb ∑i=1my(i)−a∑i=1mx(i)−∑i=1mb = 0
由于b是常数 mb = ∑ i = 1 m b \sum_{i=1}^mb ∑i=1mb
∑ i = 1 m y ( i ) − a ∑ i = 1 m x ( i ) − m b \sum_{i=1}^my^{(i)}-a\sum_{i=1}^mx^{(i)}-mb ∑i=1my(i)−a∑i=1mx(i)−mb = 0
m b mb mb = ∑ i = 1 m y ( i ) − a ∑ i = 1 m x ( i ) \sum_{i=1}^my^{(i)} - a\sum_{i=1}^mx^{(i)} ∑i=1my(i)−a∑i=1mx(i)
由于 ∑ i = 1 m y ( i ) = m y ˉ \sum_{i=1}^my^{(i)}=m\bar{y} ∑i=1my(i)=myˉ
a ∑ i = 1 m x ( i ) m = a x ˉ a\frac{\sum_{i=1}^mx^{(i)}}{m}=a\bar{x} am∑i=1mx(i)=axˉ m个x的和除以m得到的是x的均值
可得:
b = y ˉ − a x ˉ b =\bar{y}-a\bar{x} b=yˉ−axˉ
J所以对b求导得: b = y ˉ − a x ˉ b =\bar{y}-a\bar{x} b=yˉ−axˉ
再求 J对a求导:
∂
J
(
a
,
b
)
∂
a
=
0
\frac{\partial{J(a,b)}}{\partial{a}} = 0
∂a∂J(a,b)=0 =
∑
i
=
1
m
2
(
y
(
i
)
−
a
x
(
i
)
−
b
)
(
−
x
(
i
)
)
\sum_{i=1}^m2(y^{(i)}-ax^{(i)}-b)(-x^{(i)})
∑i=1m2(y(i)−ax(i)−b)(−x(i))
提出-2得:
∑
i
=
1
m
(
y
(
i
)
−
a
x
(
i
)
−
b
)
x
(
i
)
=
0
\sum_{i=1}^m(y^{(i)}-ax^{(i)}-b)x^{(i)}=0
∑i=1m(y(i)−ax(i)−b)x(i)=0
把上面求出来的
b
=
y
ˉ
−
a
x
ˉ
b =\bar{y}-a\bar{x}
b=yˉ−axˉ带入得:
∑
i
=
1
m
(
y
(
i
)
−
a
x
(
i
)
−
y
ˉ
+
a
x
ˉ
)
x
(
i
)
=
0
\sum_{i=1}^m(y^{(i)}-ax^{(i)}-\bar{y}+a\bar{x})x^{(i)}=0
∑i=1m(y(i)−ax(i)−yˉ+axˉ)x(i)=0
∑
i
=
1
m
(
x
(
i
)
y
(
i
)
−
a
x
(
i
)
x
(
i
)
−
y
ˉ
x
(
i
)
+
a
x
ˉ
x
(
i
)
)
=
0
\sum_{i=1}^m(x^{(i)}y^{(i)}-ax^{(i)}x^{(i)}-\bar{y}x^{(i)}+a\bar{x}x^{(i)})=0
∑i=1m(x(i)y(i)−ax(i)x(i)−yˉx(i)+axˉx(i))=0
∑
i
=
1
m
(
x
(
i
)
y
(
i
)
−
y
ˉ
x
(
i
)
−
a
x
(
i
)
x
(
i
)
+
a
x
ˉ
x
(
i
)
)
=
0
\sum_{i=1}^m(x^{(i)}y^{(i)}-\bar{y}x^{(i)}-ax^{(i)}x^{(i)}+a\bar{x}x^{(i)})=0
∑i=1m(x(i)y(i)−yˉx(i)−ax(i)x(i)+axˉx(i))=0
∑
i
=
1
m
(
x
(
i
)
y
(
i
)
−
x
(
i
)
y
ˉ
)
−
∑
i
=
1
m
(
a
(
x
(
i
)
)
2
−
a
x
ˉ
x
(
i
)
)
\sum_{i=1}^m(x^{(i)}y{(i)}-x^{(i)}\bar{y}) - \sum_{i=1}^m(a(x^{(i)})^2 - a\bar{x}x^{(i)})
∑i=1m(x(i)y(i)−x(i)yˉ)−∑i=1m(a(x(i))2−axˉx(i))
化简可得:
∑
i
=
1
m
(
x
(
i
)
y
(
i
)
−
x
(
i
)
y
ˉ
)
−
a
∑
i
=
1
m
(
(
x
(
i
)
)
2
−
x
ˉ
x
(
i
)
)
=
0
\sum_{i=1}^m(x^{(i)}y{(i)}-x^{(i)}\bar{y}) - a\sum_{i=1}^m((x^{(i)})^2 - \bar{x}x^{(i)})=0
∑i=1m(x(i)y(i)−x(i)yˉ)−a∑i=1m((x(i))2−xˉx(i))=0
a
=
∑
i
=
1
m
(
x
(
i
)
y
(
i
)
−
x
(
i
)
y
ˉ
)
∑
i
=
1
m
(
(
x
(
i
)
)
2
−
x
ˉ
x
(
i
)
)
a = \frac{\sum_{i=1}^m(x^{(i)}y{(i)}-x^{(i)}\bar{y})}{\sum_{i=1}^m((x^{(i)})^2 - \bar{x}x^{(i)})}
a=∑i=1m((x(i))2−xˉx(i))∑i=1m(x(i)y(i)−x(i)yˉ)
变换公式:
∑
i
=
1
m
x
(
i
)
y
ˉ
=
y
ˉ
∑
i
=
1
m
x
(
i
)
=
m
y
ˉ
x
ˉ
=
x
ˉ
∑
i
=
1
m
y
(
i
)
=
∑
i
=
1
m
x
ˉ
y
(
i
)
\sum_{i=1}^mx^{(i)}\bar{y}=\bar{y}\sum_{i=1}^{m}x^{(i)}=m\bar{y}\bar{x}=\bar{x}\sum_{i=1}^{m}y^{(i)} =\sum_{i=1}^{m} \bar{x}y^{(i)}
i=1∑mx(i)yˉ=yˉi=1∑mx(i)=myˉxˉ=xˉi=1∑my(i)=i=1∑mxˉy(i)
m
y
ˉ
x
ˉ
=
∑
i
=
1
m
y
ˉ
x
ˉ
m\bar{y}\bar{x}=\sum_{i=1}^{m}\bar{y}\bar{x}
myˉxˉ=i=1∑myˉxˉ
公式a 可变为(变换后的公式能用矩阵表示在计算机中是非常方便的):
a
=
∑
i
=
1
m
(
x
(
i
)
y
(
i
)
−
x
(
i
)
y
ˉ
−
x
ˉ
y
(
i
)
+
x
ˉ
⋅
y
ˉ
)
∑
i
=
1
m
(
(
x
(
i
)
)
−
x
ˉ
x
(
i
)
−
x
ˉ
x
i
−
x
ˉ
2
)
a =\frac{\sum_{i=1}^{m}(x^{(i)}y^{(i)}-x^{(i)}\bar{y}-\bar{x}y^{(i)}+\bar{x}\cdot \bar{y})}{\sum_{i=1}^{m}((x^{(i)})-\bar xx^{(i)}-\bar xx^{i}-\bar {x}^{2} )}
a=∑i=1m((x(i))−xˉx(i)−xˉxi−xˉ2)∑i=1m(x(i)y(i)−x(i)yˉ−xˉy(i)+xˉ⋅yˉ)=
∑
i
=
1
m
(
x
(
i
)
−
x
ˉ
)
(
y
(
i
)
−
y
ˉ
)
∑
i
=
1
m
(
x
(
i
)
−
x
ˉ
)
2
\frac{\sum_{i=1}^{m}(x^{(i)}-\bar{x})(y^{(i)}-\bar{y})}{\sum_{i=1}^{m}(x^{(i)}-\bar{x})^2}
∑i=1m(x(i)−xˉ)2∑i=1m(x(i)−xˉ)(y(i)−yˉ)
最后可得:
b
=
y
ˉ
−
a
x
ˉ
b =\bar{y}-a\bar{x}
b=yˉ−axˉ
a
=
∑
i
=
1
m
(
x
(
i
)
−
x
ˉ
)
(
y
(
i
)
−
y
ˉ
)
∑
i
=
1
m
(
x
(
i
)
−
x
ˉ
)
2
a =\frac{\sum_{i=1}^{m}(x^{(i)}-\bar{x})(y^{(i)}-\bar{y})}{\sum_{i=1}^{m}(x^{(i)}-\bar{x})^2}
a=∑i=1m(x(i)−xˉ)2∑i=1m(x(i)−xˉ)(y(i)−yˉ)
变换成矩阵表示:
x
=
(
x
1
,
x
2
,
x
3
,
.
.
.
.
.
,
x
m
)
T
x =(x_1,x_2,x_3,.....,x_m)^T
x=(x1,x2,x3,.....,xm)T
x
d
=
(
x
1
−
x
ˉ
,
x
2
−
x
ˉ
,
x
3
−
x
ˉ
,
.
.
.
.
.
,
x
m
−
x
ˉ
)
T
x_d =(x_1 - \bar x,x_2- \bar x,x_3- \bar x,.....,x_m- \bar x)^T
xd=(x1−xˉ,x2−xˉ,x3−xˉ,.....,xm−xˉ)T
y
=
(
y
1
,
y
2
,
y
3
,
.
.
.
.
.
,
y
m
)
T
y =(y_1,y_2,y_3,.....,y_m)^T
y=(y1,y2,y3,.....,ym)T
y
d
=
(
y
1
−
y
ˉ
,
y
2
−
y
ˉ
,
y
3
−
y
ˉ
,
.
.
.
.
.
,
y
m
−
y
ˉ
)
T
y_d =(y_1 - \bar y,y_2 - \bar y,y_3 - \bar y,.....,y_m - \bar y)^T
yd=(y1−yˉ,y2−yˉ,y3−yˉ,.....,ym−yˉ)T
a = x d T y d x d T x d a = \frac {x_d^Ty_d}{x_d^Tx_d} a=xdTxdxdTyd b= y d − a x d y_d-ax_d yd−axd
代码实现:
导入需要的模块
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
#随机创建 25个点
x = np.arange(25)
#将x映射成x^2 +1 上的点y
y = x ** 2 + 1
画出图形
plt.scatter(x,y)
plt.axis([0,50,0,600])
plt.show()
#求x,y的均值
x_mean = np.mean(x)
y_mean = np.mean(y)
非向量化计算a,b需要循环m个样本遍历 效率较低
num = 0.0
d = 0.0
for x_i, y_i in zip(x,y):
#遍历计算a 公式的分子部分
num +=(x_i - x_mean) *(y_i -y_mean)
#计算a 分母部分
d += (x_i - x_mean) ** 2
#根据公式计算出a 和 b
a = num / d
b = y_mean - a * x_mean
使用循环 1.1m 和 使用矩阵 20ms 性能差距 50倍
向量计算方式
num = (x_train - x_mean).dot(y_train - y_mean)
d = (x_train - x_mean).dot (x_train - x_mean)
计算出来a和b 来看看训练效果
y_predict = a * x + b
plt.scatter(x,y)
plt.plot(x,y_predict,color ='r')
一元线性回归是一条 直线 对于高次方程效果较差。图中对二次方程的拟合
封装算法
import numpy as np
class SimpleLinearRegression:
def __init__(self):
"""初始化 Simple Linear Regression 模型"""
self.a_ =None
self.b_ =None
def fit(self,x_train,y_train):
"""根据数据及x_train,y_train 训练Simple Linear Regression 模型"""
assert x_train.ndim == 1,\
"Simple Linear Regressor can only only solve single feature traing data."
assert len(x_train) == len(y_train),\
"the size of x_train must be equal to the size of y_train"
x_mean = np.mean(x_train)
y_mean = np.mean(y_train)
num = (x_train - x_mean).dot(y_train - y_mean)
d = (x_train - x_mean).dot (x_train - x_mean)
self.a_ = num / d
self.b_ = y_mean - self.a_ * x_mean
return self
def predict(self,x_predict):
"""给定带预测数据集x_predict"""
assert x_predict.ndim == 1, \
"Simple Linear Regressor can only only solve single feature traing data."
assert self.b_ is not None and self.a_ is not None, \
"must fit before predict!"
return [self._predict(x) for x in x_predict ]
def _predict(self,x):
return self.a_ * x + self.b_
def __repr__(self):
return "SimpleLinearRegression()"
评测标准
训练完 要怎么评价训练模型的好坏?
分类的准确度:accuracy
按照训练损失函数的公式
∑
i
=
1
m
(
y
t
e
s
t
(
i
)
−
y
^
t
e
t
s
(
i
)
)
2
\sum_{i=1}^m(y^{(i)}_{test} - \hat {y}^{(i)}_{tets})^2
∑i=1m(ytest(i)−y^tets(i))2 去计算 预测的值和真实值之间距离平方这样可行吗?
想一下这样一个场景如果有2个人 都训练自己的模型 但是人家误差只有几十 你却有几百 难道他的模型就一定比你好吗?
和m有关? 想象以下 你的测试集有100 个 而他的测试集只有10 差距能不大吗。那怎么解决呢?平均下?
-
MSE: 1 m ∑ i = 1 m ( y t e s t ( i ) − y ^ t e t s ( i ) ) 2 \frac {1}{m}\sum_{i=1}^m(y^{(i)}_{test} - \hat {y}^{(i)}_{tets})^2 m1∑i=1m(ytest(i)−y^tets(i))2 均方误差MSE(Mean Squared Error)
-
RMSE: 1 m ∑ i = 1 m ( y t e s t ( i ) − y ^ t e t s ( i ) ) 2 \sqrt{\frac {1}{m}\sum_{i=1}^m(y^{(i)}_{test} - \hat {y}^{(i)}_{tets})^2} m1∑i=1m(ytest(i)−y^tets(i))2均方根误差RMSE(Root Mean Squared Error) 由于计算差距平方了 相当于 单位平方了 比如你要预测 的是价格(万元)但是 计算的误差是 万元方 所以在观测的时候 数据可能不直观。
-
1 m ∑ i = 1 m ∣ y t e s t ( i ) − y ^ t e t s ( i ) ∣ \frac {1}{m}\sum_{i=1}^m|y^{(i)}_{test} - \hat {y}^{(i)}_{tets}| m1∑i=1m∣ytest(i)−y^tets(i)∣ 平均绝对误差MAE(Mean Absolute Error)
实现代码
def root_mean_squared_error(y_true,y_predict):
assert len(y_true) == len(y_predict),\
"the size of y_true must be equal to the size of y_predict"
return np.sqrt(np.sum((y_predict - y_true) ** 2) / len(y_true))
def mean_squared_error(y_true,y_predict):
"""MSE"""
assert len(y_true) == len(y_predict), \
"the size of y_true must be equal to the size of y_predict"
return np.sum((y_predict - y_true) ** 2) / len(y_true)
def mean_absolute_error(y_true,y_predict):
"""MAE"""
assert len(y_true) == len(y_predict), \
"the size of y_true must be equal to the size of y_predict"
return np.sum(np.absolute((y_predict - y_true))) / len(y_true)
测试代码
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from ml_utils.data_split import train_test_split
from sklearn import datasets
from LinearRegression.SimpleLinearRegression import SimpleLinearRegression
from ml_utils.metrics import mean_absolute_error,mean_squared_error,root_mean_squared_error
if __name__ == '__main__':
boston = datasets.load_boston()
x = boston.data[:, 5]
y = boston.target
x = x[y < 50.0]
y = y[y < 50.0]
x_train, y_train, x_test, y_test = train_test_split(x, y,seed = 666);
reg = SimpleLinearRegression()
reg.fit(x_train,y_train)
print(reg.a_)
print(reg.b_)
plt.scatter(x_train,y_train)
plt.plot(x_train,reg.predict(x_train),color ='r')
plt.show()
predict_y = reg.predict(x_test)
print(mean_squared_error(predict_y,y_test))
print(mean_absolute_error(predict_y,y_test))
print(root_mean_squared_error(predict_y,y_test))
R Squared
R
2
=
1
−
s
s
r
e
s
i
d
u
a
l
S
S
t
o
t
a
l
R^2 = 1- \frac {ss_{residual}}{SStotal}
R2=1−SStotalssresidual
R
2
=
1
−
∑
i
(
y
^
(
i
)
−
y
(
i
)
)
2
∑
i
(
y
^
−
y
(
i
)
)
2
R^2 = 1 - \frac {\sum_i (\hat {y} ^{(i)}-y^{(i)})^2}{\sum_i(\hat{y} - y^{(i)})^2}
R2=1−∑i(y^−y(i))2∑i(y^(i)−y(i))2
使用我们的模型预测产生的错误
使用 y =
y
^
\hat y
y^预测产生的错误
Baseline Model
- R^2 <= 1
- R 2 R^2 R2 越大越好。当我们的预测模型不犯任何错误, R 2 R^2 R2得到最大值1
- 当我们的模型等于基准模型时, R 2 R^2 R2 为 0
- 如果
R
2
R^2
R2 < 0,说明我们学习到的模型还不如基准模型。此时,很有可能我们的数据不存在任何线性关系
R 2 = 1 − ∑ i ( y ^ ( i ) − y ( i ) ) 2 / m ∑ i ( y ^ − y ˉ ) / m → 1 − M S E ( y ^ , y ) V a r ( y ) R^2 = 1 - \frac {\sum_i (\hat {y} ^{(i)}-y^{(i)})^2/m}{\sum_i(\hat{y} - \bar y)/m} →1 - \frac {MSE(\hat y,y)}{Var(y)} R2=1−∑i(y^−yˉ)/m∑i(y^(i)−y(i))2/m→1−Var(y)MSE(y^,y)
def r_Squared(y_true,y_predict):
assert len(y_true) == len(y_predict), \
"the size of y_true must be equal to the size of y_predict"
return 1- (mean_squared_error(y_true,y_predict) / np.var(y_true))