Linear Regression Algorithm
Find the straight line that best "fits" the relationship between the sample features and the sample labels.
Formula
$$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$$
The above is the basic form, where $x_0 = 1$ so that $\theta_0$ acts as the intercept.
$$h_\theta(x) = \theta^T x$$
The above is the vector form.
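As a minimal NumPy sketch (all names here are illustrative, not from the text), the vector form is a single dot product once a constant feature $x_0 = 1$ is prepended:

```python
import numpy as np

# Hypothesis in vector form: h_theta(x) = theta^T x
def h(theta, x):
    return theta @ x

# theta = [theta_0, theta_1]; x_0 = 1 absorbs the intercept theta_0
theta = np.array([1.0, 2.0])
x = np.array([1.0, 3.0])
print(h(theta, x))  # 1*1 + 2*3 = 7.0
```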
Loss Function
$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$
where:
$$h_\theta(x^{(i)}) = \theta_0 x_0^{(i)} + \theta_1 x_1^{(i)} + \theta_2 x_2^{(i)} + \dots + \theta_n x_n^{(i)}$$
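A minimal sketch of the loss function above (function and variable names are illustrative):

```python
import numpy as np

def loss(theta, X, y):
    """Mean squared error: J(theta) = 1/(2m) * sum((h_theta(x^(i)) - y^(i))^2)."""
    m = len(y)
    residuals = X @ theta - y          # h_theta(x^(i)) - y^(i) for every sample
    return (residuals ** 2).sum() / (2 * m)

# Tiny example: 2 samples, first column is x_0 = 1 for the intercept
X = np.array([[1.0, 1.0],
              [1.0, 2.0]])
y = np.array([2.0, 4.0])
theta = np.array([0.0, 0.0])
print(loss(theta, X, y))  # (4 + 16) / (2*2) = 5.0
```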
Differentiating the Loss Function
Taking the partial derivative with respect to $\theta$ (written here for the single-feature case, $h_\theta(x^{(i)}) = \theta x^{(i)} + \theta_0$):
$$\frac{\partial J(\theta)}{\partial \theta} = \frac{1}{m} \sum_{i=1}^{m} \left(y^{(i)} - \theta x^{(i)} - \theta_0\right) \cdot \left(-x^{(i)}\right)$$
Taking the partial derivative with respect to $\theta_0$:
$$\frac{\partial J(\theta)}{\partial \theta_0} = \frac{1}{m} \sum_{i=1}^{m} \left(y^{(i)} - \theta x^{(i)} - \theta_0\right) \cdot (-1)$$
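These two partial derivatives can be computed directly. A minimal sketch for the single-feature case (names are illustrative):

```python
import numpy as np

def gradients(theta, theta0, x, y):
    """Partial derivatives of J = 1/(2m) * sum((theta*x + theta0 - y)^2)."""
    m = len(y)
    err = y - theta * x - theta0              # y^(i) - theta*x^(i) - theta_0
    d_theta = (err * (-x)).sum() / m          # dJ/dtheta
    d_theta0 = (err * (-1.0)).sum() / m       # dJ/dtheta_0
    return d_theta, d_theta0

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
# At the perfect fit y = 2x both gradients vanish
print(gradients(2.0, 0.0, x, y))
```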
Gradient Descent Formula
$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$$
$$\frac{\partial}{\partial \theta_j} J(\theta) = \frac{\partial}{\partial \theta_j} \frac{1}{2m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$
$$= \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) \cdot x_j^{(i)}$$
- where $\alpha$ is the learning rate.
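As a sketch, one simultaneous update of all $\theta_j$ can be written with the vectorized gradient (the names and the learning-rate value are illustrative):

```python
import numpy as np

def gd_step(theta, X, y, alpha=0.1):
    """One gradient descent update: theta_j := theta_j - alpha * dJ/dtheta_j."""
    m = len(y)
    grad = X.T @ (X @ theta - y) / m   # (1/m) * sum((h - y) * x_j) for each j
    return theta - alpha * grad

X = np.array([[1.0, 1.0],
              [1.0, 2.0]])
y = np.array([2.0, 4.0])
theta = np.array([0.0, 0.0])
theta = gd_step(theta, X, y)
print(theta)  # [0.3 0.5]
```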
Derivation of the Gradient Descent Formula
$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$$
$$\frac{\partial}{\partial \theta_j} J(\theta) = \frac{\partial}{\partial \theta_j} \frac{1}{2m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$
$$= \frac{1}{2m} \sum_{i=1}^{m} \frac{\partial}{\partial \theta_j} \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$
- Chain rule:
$$z = f(y), \quad y = g(x), \quad z = f(g(x))$$
- Differentiating $z$ with respect to $x$:
$$z' = f'(g(x)) \cdot g'(x)$$
Applying it with the outer function $u^2$ and the inner function $u = h_\theta(x^{(i)}) - y^{(i)}$:

$$= \frac{1}{2m} \sum_{i=1}^{m} \frac{\mathrm{d}}{\mathrm{d}u}\left(u^2\right)\bigg|_{u = h_\theta(x^{(i)}) - y^{(i)}} \cdot \frac{\partial}{\partial \theta_j}\left(h_\theta(x^{(i)}) - y^{(i)}\right)$$
- Applying the power rule:
$$= \frac{1}{2m} \sum_{i=1}^{m} 2\left(h_\theta(x^{(i)}) - y^{(i)}\right) \cdot \frac{\partial}{\partial \theta_j}\left(\sum_{k=0}^{n} \theta_k x_k^{(i)} - y^{(i)}\right)$$
Only the $k = j$ term depends on $\theta_j$, and $y^{(i)}$ is a constant:

$$= \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) \cdot \left(\frac{\partial}{\partial \theta_j}\,\theta_j x_j^{(i)} - \frac{\partial}{\partial \theta_j}\,y^{(i)}\right)$$
$$= \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) \cdot x_j^{(i)}$$
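A standard way to verify the derived formula is to compare it against a finite-difference approximation of the gradient. The sketch below (all names illustrative) does exactly that on random data:

```python
import numpy as np

def J(theta, X, y):
    """Loss J(theta) = 1/(2m) * sum((h - y)^2)."""
    m = len(y)
    return ((X @ theta - y) ** 2).sum() / (2 * m)

def analytic_grad(theta, X, y):
    """The derived formula: (1/m) * sum((h - y) * x_j^(i)) for each j."""
    m = len(y)
    return X.T @ (X @ theta - y) / m

def numeric_grad(theta, X, y, eps=1e-6):
    """Central finite differences, component by component."""
    g = np.zeros_like(theta)
    for j in range(len(theta)):
        e = np.zeros_like(theta)
        e[j] = eps
        g[j] = (J(theta + e, X, y) - J(theta - e, X, y)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y = rng.normal(size=5)
theta = rng.normal(size=3)
print(np.allclose(analytic_grad(theta, X, y), numeric_grad(theta, X, y), atol=1e-5))
```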
How the Algorithm Works
- Feed the input data into the model and compute the predicted value $h_\theta(x^{(i)})$.
- Compare the predicted value with the true value; the goal is to make the error between them as small as possible.
- Define a loss function, take its partial derivatives, and look for the minimum.
- Use gradient descent to iteratively update the parameters $\theta_j$.
- Stop iterating once the change in $\theta$ falls below a preset threshold.
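The steps above can be sketched as a small training loop. Everything here (function name, learning rate, stopping tolerance) is illustrative rather than taken from the text:

```python
import numpy as np

def fit_linear_regression(X, y, alpha=0.1, tol=1e-8, max_iter=10_000):
    """Batch gradient descent for linear regression.
    Stops once the parameter update falls below the threshold tol."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(max_iter):
        grad = X.T @ (X @ theta - y) / m           # (1/m) * sum((h - y) * x_j)
        new_theta = theta - alpha * grad           # theta_j := theta_j - alpha * grad_j
        if np.abs(new_theta - theta).max() < tol:  # change in theta below threshold
            return new_theta
        theta = new_theta
    return theta

# Fit y = 1 + 2x on noiseless data; first column is x_0 = 1 for the intercept
x = np.linspace(0, 1, 20)
X = np.column_stack([np.ones_like(x), x])
y = 1 + 2 * x
theta = fit_linear_regression(X, y)
print(theta)  # approximately [1.0, 2.0]
```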
Questions and discussion are welcome; feel free to leave a comment.