Define the linear regression hypothesis $h_w(x)$:
$$
\begin{align}
h_w(x) &= w_0x_0 + w_1x_1 + w_2x_2 + \dots + w_nx_n \\
&= \sum_{i=0}^{n} w_ix_i \\
&= w^Tx
\end{align}
$$
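As a quick sketch of the vectorized form (the numeric values are my own), $h_w(x) = w^Tx$ is a single dot product, with $x_0 = 1$ carrying the bias weight $w_0$:

```python
import numpy as np

def h(w, x):
    """Linear hypothesis h_w(x) = w^T x."""
    return np.dot(w, x)

# x_0 = 1 carries the bias weight w_0
w = np.array([0.5, 1.2, 2.3])   # w_0, w_1, w_2
x = np.array([1.0, 2.0, 3.0])   # x_0 = 1, x_1, x_2
print(h(w, x))                  # 0.5*1 + 1.2*2 + 2.3*3 = 9.8
```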
Note: $w^Tx$ uses the transpose of the vector formed by all the weights $w_i$; the matrix form makes the computation more convenient.
Take two-feature linear regression as an example. Let the full sample set be $S$ with sample size $n$, and let the hypothesis be $h_w(x)$:
$$
\begin{align}
&h_w(x) = w_0x_0 + w_1x_1 + w_2x_2 \\
&S = \{(x^1,y^1),(x^2,y^2),(x^3,y^3),\dots,(x^n,y^n)\}
\end{align}
$$
Then for any point $(x^i, y^i)$ the error is:
$$
\varepsilon^i = y^i - w^Tx^i
$$
To measure the overall error across all samples, use the mean squared error ($MSE$) as the loss function:
$$
J(w_0, w_1) = \frac{1}{n} \sum_{i=1}^{n}\left(y^i - w^Tx^i\right)^2
$$
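A minimal sketch of evaluating this loss (the toy samples below are invented for illustration):

```python
import numpy as np

def mse_loss(w, X, y):
    """J(w) = (1/n) * sum_i (y^i - w^T x^i)^2."""
    residuals = y - X @ w          # y^i - w^T x^i for every sample
    return np.mean(residuals ** 2)

# toy data: an x_0 = 1 bias column plus one feature
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([2.0, 3.0, 4.0])

# w = (1, 1) reproduces y exactly, so the loss is zero
print(mse_loss(np.array([1.0, 1.0]), X, y))  # 0.0
```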
Looking at this equation, $w_0, w_1$ are its variables. Since the samples are fixed, $n$ and the $y^i$ are constants, and each $x^i$ is a vector of constants.
So $J(w_0, w_1)$ is just an ordinary function of two variables, with independent variables $w_0, w_1$.
For $h_w(x)$ to fit the data as well as possible, we need a $w$ that minimizes the $MSE$.
That is, we need the minimum of $J(w_0, w_1)$, together with the values of $w_0, w_1$ at which that minimum is attained.
To keep the notation simple, consider a single weight $w$ (one feature); expanding the square:
$$
\begin{align}
J(w) &= \frac{1}{n} \sum_{i=1}^{n}(y_i - x_iw)^2 \\
&= \frac{1}{n} \sum_{i=1}^{n}(y_i^2 - 2x_iy_iw + x_i^2w^2) \\
&= \frac{1}{n} \sum_{i=1}^{n}x_i^2w^2 - \frac{2}{n} \sum_{i=1}^{n}x_iy_iw + \frac{1}{n} \sum_{i=1}^{n}y_i^2
\end{align}
$$
Let
$$
a = \frac{1}{n} \sum_{i=1}^{n}x_i^2,\quad b = -\frac{2}{n} \sum_{i=1}^{n}x_iy_i,\quad c = \frac{1}{n} \sum_{i=1}^{n}y_i^2
$$
Then
$$
J(w) = aw^2 + bw + c \quad (a, b, c \text{ are constants})
$$
Differentiating gives:
$$
\begin{align}
J'(w) &= \frac{2}{n} \sum_{i=1}^{n}x_i^2w - \frac{2}{n} \sum_{i=1}^{n}x_iy_i
\end{align}
$$
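Because $J(w) = aw^2 + bw + c$ with $a > 0$, setting $J'(w) = 0$ gives the minimizer in closed form: $w = -\frac{b}{2a} = \frac{\sum x_iy_i}{\sum x_i^2}$. A small numeric check (the data is invented):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])   # generated exactly from y = 2x
n = len(x)

# coefficients of J(w) = a*w^2 + b*w + c
a = np.sum(x**2) / n
b = -2 * np.sum(x * y) / n
c = np.sum(y**2) / n

w_opt = -b / (2 * a)            # minimizer of the parabola
print(w_opt)                    # 2.0

# the derivative J'(w) = 2a*w + b vanishes at the minimizer
print(2 * a * w_opt + b)        # 0.0
```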
Solving by gradient descent
1. First pick an arbitrary set of parameters, say 1.2 and 2.3; the regression equation is then:
$$
h(x) = 1.2x_1 + 2.3x_2
$$
Recall the loss function:
$$
J(w_0, w_1) = \frac{1}{n} \sum_{i=1}^{n}\left(y^i - w^Tx^i\right)^2
$$
The current loss value is then:
$$
J(w_0, w_1) = \frac{1}{n} \sum_{i=1}^{n}\left(y^i - (1.2x_1^i + 2.3x_2^i)\right)^2
$$
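To make this step concrete, here is a sketch that evaluates the loss at the initial guess $w = (1.2,\ 2.3)$; the sample data is invented for illustration:

```python
import numpy as np

# invented two-feature samples: columns are x_1, x_2
X = np.array([[1.0, 1.0],
              [2.0, 0.5],
              [0.5, 2.0]])
y = np.array([4.0, 3.0, 5.0])

w = np.array([1.2, 2.3])          # the arbitrary initial guess
loss = np.mean((y - X @ w) ** 2)  # J(w) at the current parameters
print(loss)                       # 0.1975
```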
The direction of steepest descent at the current point is found from the slope there, i.e., the partial derivatives with respect to $w_0, w_1$. The partial derivative with respect to $w_j$ is:
$$
\begin{align}
\frac{\partial J(w)}{\partial w_j} &= -\frac{2}{n} \sum_{i=1}^{n}\left(y^i - h_w(x^i)\right)x^i_j
\end{align}
$$
Scaling the loss function by $\frac{1}{2n}$ instead of $\frac{1}{n}$ makes no difference to the minimizer, so take $\frac{1}{2n}$; the partial derivative and the update rule are then:
$$
\begin{align}
\frac{\partial J(w)}{\partial w_j} &= -\frac{1}{n} \sum_{i=1}^{n}\left(y^i - h_w(x^i)\right)x^i_j \\
w_j' &= w_j + \alpha \cdot \frac{1}{n} \sum_{i=1}^{n}\left(y^i - h_w(x^i)\right)x^i_j
\end{align}
$$
where $\alpha$ is the learning rate (step size); each update moves $w_j$ against the gradient.
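The update rule above can be put into a loop. A minimal sketch, where the learning rate, iteration count, and toy data are my own choices:

```python
import numpy as np

def gradient_descent(X, y, lr=0.1, steps=1000):
    """Minimize the MSE loss by repeatedly applying
    w_j <- w_j + lr * (1/n) * sum_i (y^i - h_w(x^i)) * x_j^i."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        residuals = y - X @ w               # y^i - h_w(x^i) for every sample
        w = w + lr * (X.T @ residuals) / n  # update every w_j simultaneously
    return w

# toy data generated from y = 1 + 2*x, with a bias column x_0 = 1
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

w = gradient_descent(X, y)
print(w)  # close to [1.0, 2.0]
```

With this small, well-conditioned problem the iterates converge to the true weights $(1, 2)$; on real data the learning rate usually needs tuning.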