Gradient Descent for Logistic Regression
The gradient descent update rule for logistic regression is:
$$\theta_{j}:=\theta_{j}-\alpha \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{j}^{(i)}$$
where:
$$h_{\theta}\left(x^{(i)}\right)=g\left(\theta^{T} x^{(i)}\right)=\frac{1}{1+e^{-\theta^{T} x^{(i)}}}$$
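As a minimal sketch (not from the original notes), the two formulas above can be implemented directly as a double loop over parameters and training examples; the function and variable names below are illustrative, and `X` is assumed to already contain the bias column $x_0 = 1$:

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

def gradient_step_loop(theta, X, y, alpha):
    # One simultaneous update of every theta_j, following the summation
    # formula literally. X is m x (n+1) with x_0 = 1, y has m entries.
    m, n1 = X.shape
    new_theta = theta.copy()
    for j in range(n1):
        total = 0.0
        for i in range(m):
            h = sigmoid(X[i] @ theta)       # h_theta(x^(i))
            total += (h - y[i]) * X[i, j]   # (h_theta(x^(i)) - y^(i)) * x_j^(i)
        new_theta[j] = theta[j] - alpha * total / m
    return new_theta
```

Note that all `theta_j` are updated from the same old `theta` (simultaneous update), which is why the result is accumulated into `new_theta` rather than modified in place.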
The vectorized form of the update is:
$$\theta:=\theta-\frac{\alpha}{m} X^{T}(g(X \theta)-\vec{y})$$
where:
$$\vec{y}=\left(\begin{array}{c} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{array}\right) \qquad \theta=\left(\begin{array}{c} \theta_{0} \\ \theta_{1} \\ \vdots \\ \theta_{n} \end{array}\right) \qquad X=\left[\begin{array}{cccc} x_{0}^{(1)} & x_{1}^{(1)} & \cdots & x_{n}^{(1)} \\ x_{0}^{(2)} & x_{1}^{(2)} & \cdots & x_{n}^{(2)} \\ \vdots & \vdots & & \vdots \\ x_{0}^{(m)} & x_{1}^{(m)} & \cdots & x_{n}^{(m)} \end{array}\right]_{m \times(n+1)}$$
$$X \theta=\left[\begin{array}{c} \theta_{0} x_{0}^{(1)}+\theta_{1} x_{1}^{(1)}+\theta_{2} x_{2}^{(1)}+\cdots+\theta_{n} x_{n}^{(1)} \\ \theta_{0} x_{0}^{(2)}+\theta_{1} x_{1}^{(2)}+\theta_{2} x_{2}^{(2)}+\cdots+\theta_{n} x_{n}^{(2)} \\ \vdots \\ \theta_{0} x_{0}^{(m)}+\theta_{1} x_{1}^{(m)}+\theta_{2} x_{2}^{(m)}+\cdots+\theta_{n} x_{n}^{(m)} \end{array}\right] \qquad g(X \theta)=\left[\begin{array}{c} h_{\theta}\left(x^{(1)}\right) \\ h_{\theta}\left(x^{(2)}\right) \\ \vdots \\ h_{\theta}\left(x^{(m)}\right) \end{array}\right]$$
Detailed vectorization derivation
$$\begin{aligned} &\sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{j}^{(i)} \\ =&\left[h_{\theta}\left(x^{(1)}\right)-y^{(1)}\right] x_{j}^{(1)}+\left[h_{\theta}\left(x^{(2)}\right)-y^{(2)}\right] x_{j}^{(2)}+\cdots+\left[h_{\theta}\left(x^{(m)}\right)-y^{(m)}\right] x_{j}^{(m)} \\ =&\left(x_{j}^{(1)}, x_{j}^{(2)}, \cdots, x_{j}^{(m)}\right) \cdot\left(\begin{array}{c} h_{\theta}\left(x^{(1)}\right)-y^{(1)} \\ h_{\theta}\left(x^{(2)}\right)-y^{(2)} \\ \vdots \\ h_{\theta}\left(x^{(m)}\right)-y^{(m)} \end{array}\right) \\ =&\left(x_{j}^{(1)}, x_{j}^{(2)}, \cdots, x_{j}^{(m)}\right) \cdot\left[\left(\begin{array}{c} h_{\theta}\left(x^{(1)}\right) \\ h_{\theta}\left(x^{(2)}\right) \\ \vdots \\ h_{\theta}\left(x^{(m)}\right) \end{array}\right)-\left(\begin{array}{c} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{array}\right)\right] \\ =&\ x_{j} \cdot[g(X \theta)-\vec{y}] \end{aligned}$$
Here $x_{j}=\left(x_{j}^{(1)}, x_{j}^{(2)}, \cdots, x_{j}^{(m)}\right)$ denotes the row vector collecting the $j$-th feature of all $m$ examples, i.e. the $j$-th column of $X$ transposed.
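The equality derived above can be checked numerically. The snippet below (an illustrative sketch with made-up data, not from the original notes) compares the example-by-example sum for one feature $j$ against the dot product $x_{j} \cdot (g(X\theta)-\vec{y})$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Numeric check of the derivation: for a fixed feature j, the scalar sum
# sum_i (h_theta(x^(i)) - y^(i)) * x_j^(i) equals x_j . (g(X theta) - y),
# where x_j is column j of X. Data is random, for illustration only.
rng = np.random.default_rng(0)
m, n1 = 5, 3
X = rng.normal(size=(m, n1))
X[:, 0] = 1.0                                   # bias column x_0 = 1
y = rng.integers(0, 2, size=m).astype(float)
theta = rng.normal(size=n1)

j = 1
loop_sum = sum((sigmoid(X[i] @ theta) - y[i]) * X[i, j] for i in range(m))
dot_form = X[:, j] @ (sigmoid(X @ theta) - y)
assert np.isclose(loop_sum, dot_form)
```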
Then:
$$\theta_{j}:=\theta_{j}-\frac{\alpha}{m} x_{j} \cdot[g(X \theta)-\vec{y}]$$
$$\left[\begin{array}{c} \theta_{0} \\ \theta_{1} \\ \vdots \\ \theta_{n} \end{array}\right]:=\left[\begin{array}{c} \theta_{0} \\ \theta_{1} \\ \vdots \\ \theta_{n} \end{array}\right]-\frac{\alpha}{m}\left[\begin{array}{c} x_{0} \\ x_{1} \\ \vdots \\ x_{n} \end{array}\right]\left[g(X \theta)-\vec{y}\right]$$
Stacking the row vectors $x_{0}, x_{1}, \ldots, x_{n}$ gives exactly $X^{T}$, so finally:
$$\theta:=\theta-\frac{\alpha}{m} X^{T}(g(X \theta)-\vec{y})$$
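This final vectorized update maps directly onto NumPy. The sketch below (illustrative names and hyperparameters, not from the original notes) iterates the update from a zero initialization; `X` is assumed to include the bias column $x_0 = 1$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, iters=1000):
    # Repeated vectorized update: theta := theta - (alpha/m) * X^T (g(X theta) - y)
    m, n1 = X.shape
    theta = np.zeros(n1)
    for _ in range(iters):
        theta -= (alpha / m) * (X.T @ (sigmoid(X @ theta) - y))
    return theta
```

Compared with the per-parameter loop, the whole gradient is computed in two matrix products, which is both shorter and far faster for large $m$ and $n$.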