Why the regression coefficient update formula $weights = weights + \alpha \times dataMatrix.transpose() \times error$ works
Definition of the loss function
$$J(\theta)=\frac{1}{2m}\sum_{i=1}^{m}(\widehat{y_i}-y_i)^{2}=\frac{1}{2m}\sum_{i=1}^{m}(h_\theta(x_i)-y_i)^{2}$$
That is, the sum of squared differences between the predicted values ($\widehat{y}$) and the true values ($y$), where $\theta$ is the regression-coefficient vector currently used by the prediction function.
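The cost $J(\theta)$ above can be sketched in a few lines of NumPy (the data and names `cost`, `X`, `y` here are illustrative, not from the book):

```python
import numpy as np

def cost(theta, X, y):
    """Squared-error cost J(theta) = 1/(2m) * sum((h_theta(x_i) - y_i)^2)."""
    m = len(y)
    residual = X @ theta - y          # h_theta(x_i) - y_i for every sample
    return (residual @ residual) / (2 * m)

# Tiny made-up example: first column is the bias feature x_0 = 1.
X = np.array([[1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 2.0])
print(cost(np.array([0.0, 1.0]), X, y))   # perfect fit -> 0.0
print(cost(np.array([0.0, 0.0]), X, y))   # (1 + 4) / (2*2) = 1.25
```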
Gradient ascent (descent) update formula for the regression coefficients
The gradient-ascent iteration formula given in the book is:
$$w = w \pm \alpha \nabla_w f(w)$$
That is:
$$\theta_j = \theta_j \pm \alpha \frac{\partial}{\partial\theta_j}J(\theta)$$
Deriving $\frac{\partial}{\partial\theta_j}J(\theta)$ (for a single sample, so the constant factor $\frac{1}{m}$ is dropped):
$$\begin{aligned} \frac{\partial}{\partial\theta_j}J(\theta)&=\frac{\partial}{\partial\theta_j}\frac{1}{2}(h_\theta(x)-y)^{2}\\ &=\frac{1}{2}\times2\times(h_\theta(x)-y) \times \frac{\partial}{\partial\theta_j}(h_\theta(x)-y)\\ &=(h_\theta(x)-y)\times \frac{\partial}{\partial\theta_j}\bigl((\theta_0x_0+\theta_1x_1+\cdots+\theta_nx_n)-y\bigr)\\ &=(h_\theta(x)-y) \times x_j \end{aligned}$$
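The result $(h_\theta(x)-y)\,x_j$ can be sanity-checked numerically. A small sketch (the sample values are made up) comparing the analytic partial derivative with a central finite difference of $J(\theta)=\frac{1}{2}(h_\theta(x)-y)^2$:

```python
import numpy as np

def grad_j(theta, x, y, j):
    # Analytic partial derivative from the derivation: (h_theta(x) - y) * x_j
    return (x @ theta - y) * x[j]

def numeric_grad_j(theta, x, y, j, eps=1e-6):
    # Central finite difference of J(theta) = 1/2 * (x . theta - y)^2
    tp, tm = theta.copy(), theta.copy()
    tp[j] += eps
    tm[j] -= eps
    Jp = 0.5 * (x @ tp - y) ** 2
    Jm = 0.5 * (x @ tm - y) ** 2
    return (Jp - Jm) / (2 * eps)

theta = np.array([0.5, -1.0])
x = np.array([1.0, 2.0])   # x_0 = 1 (bias), x_1 = 2; hypothetical sample
y = 3.0
# Both should give (0.5 - 2.0 - 3.0) * 2 = -9.0
print(grad_j(theta, x, y, 1), numeric_grad_j(theta, x, y, 1))
```

Since $J$ is quadratic in each $\theta_j$, the central difference matches the analytic value essentially exactly.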
The regression coefficient $\theta_j$ is then updated as:
$$\theta_j = \theta_j \pm \alpha \times (h_\theta(x) - y)\,x_j$$
And the regression-coefficient vector $\theta$:
$$\theta = \theta \pm \alpha \times (h_\theta(x) - y)\,x$$
Here $\theta$ (the regression-coefficient vector) is the `weights` in the book, $\alpha$ is the step size, and $x$ is the input data `dataMatrix`.
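Under these identifications, updating each $\theta_j$ separately and the book's vectorized one-liner give the same result. A minimal sketch with made-up data (using NumPy's `@` for the matrix product, and the gradient-ascent sign convention `error = y - h_theta(x)`):

```python
import numpy as np

# Hypothetical data: 3 samples, 2 features (first column is the bias x_0 = 1)
dataMatrix = np.array([[1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
labels = np.array([1.0, 2.0, 3.0])
weights = np.zeros(2)
alpha = 0.01

error = labels - dataMatrix @ weights          # y - h_theta(x) for every sample

# Per-component update: theta_j += alpha * sum_i (y_i - h(x_i)) * x_{i,j}
elementwise = weights + alpha * np.array(
    [np.sum(error * dataMatrix[:, j]) for j in range(2)])

# The book's vectorized form: weights + alpha * dataMatrix.transpose() * error
vectorized = weights + alpha * dataMatrix.transpose() @ error

print(np.allclose(elementwise, vectorized))    # True
```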
Taking $\theta=\begin{bmatrix}\theta_0 \\ \theta_1 \end{bmatrix}$ as an example, the update over all $m$ samples is:
$$\begin{bmatrix}\theta_0 \\ \theta_1 \end{bmatrix}\to\begin{bmatrix}\theta_0\pm\alpha\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x_i)-y_i)\,x_{i,0} \\ \theta_1\pm\alpha\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x_i)-y_i)\,x_{i,1} \end{bmatrix}$$
Note that, per the derivation above, the update for $\theta_j$ carries the factor $x_{i,j}$ (the $j$-th feature of sample $i$).
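Putting the pieces together, a minimal NumPy sketch of batch gradient ascent in the style of the book's `gradAscent` (the function name, toy data, and hyperparameters here are assumptions for illustration; the book applies this same update with a sigmoid hypothesis for logistic regression):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_ascent(dataMatrix, labels, alpha=0.01, max_iter=5000):
    """Batch gradient ascent: weights += alpha * dataMatrix.T @ (y - h)."""
    m, n = dataMatrix.shape
    weights = np.ones(n)
    for _ in range(max_iter):
        h = sigmoid(dataMatrix @ weights)   # predictions for all m samples
        error = labels - h                  # y - h_theta(x)
        weights = weights + alpha * dataMatrix.transpose() @ error
    return weights

# Hypothetical toy data: bias column x_0 = 1; class 1 when x_1 > 2.5
data = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
labels = np.array([0.0, 0.0, 1.0, 1.0])
w = grad_ascent(data, labels)
print((sigmoid(data @ w) > 0.5).astype(int))
```

The `dataMatrix.transpose() @ error` line is exactly the update formula this post set out to explain: each component accumulates $\sum_i (y_i - h_\theta(x_i))\,x_{i,j}$.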