Predictive Models
A predictive model is a model used to forecast future events or unknown data. By learning the patterns in historical data, it infers future outcomes.
Regression Models
A regression model is a statistical model that captures the relationship between a dependent variable (the target) and one or more independent variables (the features). It is typically used to predict continuous values.
Simple Linear Regression
Suppose we have two samples, (1, 2) and (2, 3). Obviously, the line through both of them has slope k = 1 and intercept d = 1:

$$y = x + 1$$
Now suppose I add a third sample, (3, 6), which does not satisfy $y = x + 1$. How can the model then predict where a fourth sample is most likely to appear?

The model must minimize the total distance to the three existing samples along the y-axis. (Note that this is not the perpendicular distance to the fitted line, but the distance along the y-axis: y is the value the model predicts from the feature x, so the closer the prediction is along the y-axis, the more accurate the model.)
Since each distance is a non-negative number, we can use the squared distance instead. Suppose that for each sample, feeding the feature $x_i$ into the model yields the prediction $\hat y_i$, while the observed value is $y_i$. What we need to do is find a pair of values k and d that minimizes the (squared) y-axis distance summed over all samples:

$$\arg\min_{k,d} \sum_{i=1}^m (y_i - \hat y_i)^2$$
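As a quick numeric sketch of this objective (the sample data and function name here are mine, purely for illustration), here is the sum of squared y-axis errors for two candidate lines over the three samples above:

```python
# Sum of squared y-axis errors for a candidate line y = k*x + d,
# evaluated on the three samples from the text: (1, 2), (2, 3), (3, 6).
samples = [(1, 2), (2, 3), (3, 6)]

def sse(k, d):
    """Sum of squared vertical (y-axis) errors of y = k*x + d."""
    return sum((y - (k * x + d)) ** 2 for x, y in samples)

print(sse(1, 1))  # the old line y = x + 1: errors 0, 0, 2 -> prints 4
print(sse(2, 0))  # candidate y = 2x: errors 0, -1, 0 -> prints 1
```

The minimization below finds the k and d that make this quantity as small as possible.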
Because the feature $x_i$ is the same for both $y_i$ and $\hat y_i$, we can substitute $\hat y_i = kx_i + d$ to get

$$\arg\min_{k,d} \sum_{i=1}^m (y_i - kx_i - d)^2$$
Treat the sum of squared errors as a function of two variables, f(k, d).

(1) Partial derivative with respect to d
Let $f(k,d)=\sum_{i=1}^m (y_i - kx_i - d)^2$. Take the partial derivative with respect to d and set it to zero. By the chain rule, each squared term contributes a factor of $-2$:

$$\frac{\partial f}{\partial d_0} = \sum_{i=1}^m -2(y_i - kx_i - d_0) = 0$$

Dividing both sides by $-2$:

$$\sum_{i=1}^m (y_i - kx_i - d_0) = 0$$
Divide both sides by the sample count m to bring in the sample means:

$$\frac{\sum_{i=1}^m (y_i - kx_i - d_0)}{m} = 0$$

$$\frac{\sum_{i=1}^m y_i - k\sum_{i=1}^m x_i - \sum_{i=1}^m d_0}{m} = 0$$

Since $\sum_{i=1}^m y_i = m\overline y$, $\sum_{i=1}^m x_i = m\overline x$, and $\sum_{i=1}^m d_0 = md_0$:

$$\frac{m\overline y - km\overline x - md_0}{m} = 0$$

$$\overline y - \overline x k - d_0 = 0$$

$$d_0 = \overline y - \overline x k$$
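This first-order condition has a consequence worth checking numerically: once d takes the value $\overline y - \overline x k$, the residuals sum to zero for any slope k. A small sketch (the sample data is the three points from the text):

```python
# With d0 = ybar - k*xbar, the residuals y_i - (k*x_i + d0) sum to zero
# for ANY slope k: this is exactly the first-order condition in d.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 3.0, 6.0]
xbar = sum(xs) / len(xs)
ybar = sum(ys) / len(ys)

for k in (0.0, 1.0, 2.0, -3.5):
    d0 = ybar - k * xbar
    residual_sum = sum(y - (k * x + d0) for x, y in zip(xs, ys))
    print(abs(residual_sum) < 1e-9)  # prints True for every k
```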
(2) Partial derivative with respect to k

Still working with $f(k,d)=\sum_{i=1}^m (y_i - k_0x_i - d_0)^2$, take the partial derivative with respect to $k_0$.
First expand the square in the error term, grouping it as $(a-b)^2$ with $a = y_i - d_0$ and $b = k_0x_i$:

$$(y_i - k_0x_i - d_0)^2 = (y_i - d_0)^2 - 2(y_i - d_0)(k_0x_i) + (k_0x_i)^2$$
Differentiate each term separately with respect to $k_0$. The first term does not involve $k_0$, so its derivative is zero:

$$\frac{\partial \left((y_i - d_0)^2\right)}{\partial k_0} = 0$$

$$\frac{\partial \left(-2(y_i - d_0)(k_0x_i)\right)}{\partial k_0} = -2(y_i - d_0)x_i$$

$$\frac{\partial \left((k_0x_i)^2\right)}{\partial k_0} = 2k_0x_i^2$$
Combining the results:

$$\frac{\partial f}{\partial k_0} = \sum_{i=1}^m \left(-2(y_i - d_0)x_i + 2k_0x_i^2\right)$$
Set the partial derivative to zero:

$$\sum_{i=1}^m \left(-2(y_i - d_0)x_i + 2k_0x_i^2\right) = 0$$
Divide both sides by 2:

$$\sum_{i=1}^m \left(-(y_i - d_0)x_i + k_0x_i^2\right) = 0$$

$$-\sum_{i=1}^m (y_i - d_0)x_i + \sum_{i=1}^m k_0x_i^2 = 0$$

$$\sum_{i=1}^m (y_i - d_0)x_i = \sum_{i=1}^m k_0x_i^2$$

Substitute $d_0 = \overline y - \overline x k_0$:

$$\sum_{i=1}^m (y_i - \overline y + \overline x k_0)x_i = \sum_{i=1}^m k_0x_i^2$$
Expand the parentheses:

$$\sum_{i=1}^m (y_i - \overline y)x_i + \sum_{i=1}^m (\overline x k_0)x_i = \sum_{i=1}^m k_0x_i^2$$
Collect the $k_0$ terms on one side:

$$\sum_{i=1}^m k_0x_i^2 - \sum_{i=1}^m (\overline x k_0)x_i = \sum_{i=1}^m (y_i - \overline y)x_i$$
$$k_0\sum_{i=1}^m x_i^2 - k_0\sum_{i=1}^m \overline x x_i = \sum_{i=1}^m (y_i - \overline y)x_i$$

$$k_0\left(\sum_{i=1}^m x_i^2 - \overline x\sum_{i=1}^m x_i\right) = \sum_{i=1}^m (y_i - \overline y)x_i$$

$$k_0\sum_{i=1}^m \left(x_i^2 - \overline x x_i\right) = \sum_{i=1}^m (y_i - \overline y)x_i$$
Solve for $k_0$, and keep an eye on the denominator in the following steps:

$$k_0 = \frac{\sum_{i=1}^m (y_i - \overline y)x_i}{\sum_{i=1}^m (x_i - \overline x)x_i}$$
In the denominator, rewrite $x_i$ as $x_i - \overline x + \overline x$:

$$k_0 = \frac{\sum_{i=1}^m (y_i - \overline y)x_i}{\sum_{i=1}^m (x_i - \overline x)(x_i - \overline x + \overline x)}$$
$$k_0 = \frac{\sum_{i=1}^m (y_i - \overline y)x_i}{\sum_{i=1}^m \left((x_i - \overline x)^2 + \overline x(x_i - \overline x)\right)}$$
$$k_0 = \frac{\sum_{i=1}^m (y_i - \overline y)x_i}{\sum_{i=1}^m (x_i - \overline x)^2 + \sum_{i=1}^m (x_i - \overline x)\overline x}$$

$$k_0 = \frac{\sum_{i=1}^m (y_i - \overline y)x_i}{\sum_{i=1}^m (x_i - \overline x)^2 + \overline x\sum_{i=1}^m (x_i - \overline x)}$$
Now look closely at the term $\sum_{i=1}^m (x_i - \overline x)$ in the denominator: it is always equal to zero, because deviations from the mean sum to zero. Therefore

$$k_0 = \frac{\sum_{i=1}^m (y_i - \overline y)x_i}{\sum_{i=1}^m (x_i - \overline x)^2}$$
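The claim that $\sum_{i=1}^m (x_i - \overline x) = 0$, and hence that the two denominator forms agree, is easy to confirm numerically (the sample values here are arbitrary):

```python
# Deviations from the mean sum to zero, so the two denominator forms agree:
#   sum((x_i - xbar) * x_i) == sum((x_i - xbar) ** 2)
xs = [1.0, 2.0, 3.0, 7.5, 11.0]
xbar = sum(xs) / len(xs)

dev_sum = sum(x - xbar for x in xs)
form_a = sum((x - xbar) * x for x in xs)   # original denominator
form_b = sum((x - xbar) ** 2 for x in xs)  # simplified denominator
print(abs(dev_sum) < 1e-9, abs(form_a - form_b) < 1e-9)  # prints True True
```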
Put another way, this shows that the earlier denominator $\sum_{i=1}^m (x_i - \overline x)x_i$ equals the current denominator $\sum_{i=1}^m (x_i - \overline x)^2$: replacing $x_i$ with $x_i - \overline x$ inside the sum changes nothing. The same trick works in the numerator, because $\sum_{i=1}^m (y_i - \overline y)\overline x = \overline x\sum_{i=1}^m (y_i - \overline y) = 0$, so replacing $x_i$ with $x_i - \overline x$ there changes nothing either:

$$k_0 = \frac{\sum_{i=1}^m (y_i - \overline y)(x_i - \overline x)}{\sum_{i=1}^m (x_i - \overline x)^2}$$
This completes the derivation. The slope k and intercept d of the regression model are summarized below; these expressions are also known as the least-squares formulas:

$$k_0 = \frac{\sum_{i=1}^m (y_i - \overline y)(x_i - \overline x)}{\sum_{i=1}^m (x_i - \overline x)^2}$$

$$d_0 = \overline y - \overline x k_0$$
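The two formulas translate directly into code. A minimal sketch (the function name is mine), applied to the three samples from the text:

```python
# Closed-form least-squares fit of y = k*x + d, using the formulas above:
#   k = sum((y_i - ybar) * (x_i - xbar)) / sum((x_i - xbar) ** 2)
#   d = ybar - xbar * k
def fit_line(xs, ys):
    m = len(xs)
    xbar = sum(xs) / m
    ybar = sum(ys) / m
    k = (sum((y - ybar) * (x - xbar) for x, y in zip(xs, ys))
         / sum((x - xbar) ** 2 for x in xs))
    d = ybar - xbar * k
    return k, d

# The three samples from the text: (1, 2), (2, 3), (3, 6).
k, d = fit_line([1, 2, 3], [2, 3, 6])
print(k, d)       # k = 2.0, d = -1/3
print(k * 4 + d)  # where sample 4 is most likely to appear
```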
To put it another way: as new samples come in, we can train on them to recover the optimal parameters k and d of the predictive model. If you think about it, several of the probabilistic models covered earlier also have parameters of their own, and it is by adjusting those parameter values that their predictions become more accurate.

The derivation is fairly simple, but it still took me two days to work through. Time to open the champagne.