Logistic Regression

Theory

Logistic function

$$g(z)=\frac{1}{1+e^{-z}}\tag{1}$$

Visualizing the logistic function

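import matplotlib.pyplot as plt
import numpy as np

# Sample the logistic function on [-10, 10].
# The number of points passed to np.linspace must be an integer, not the float 1e6.
x = np.linspace(-10, 10, 1000)
y = 1 / (1 + np.exp(-x))

plt.plot(x, y)
plt.xlabel("z")
plt.ylabel("g(z)")
plt.show()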


Hypothesis function

$$h_{\theta}(x)=g(\theta^T x)=\frac{1}{1+e^{-\theta^T x}}\tag{2}$$
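As a minimal sketch of equations (1) and (2), the hypothesis can be evaluated for several inputs at once by stacking them as rows of a matrix. The names `sigmoid`, `hypothesis`, `X_demo` and `theta_demo` are illustrative, and the parameter values are hand-picked rather than fitted.

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z)), equation (1)."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, X):
    """h_theta(x) = g(theta^T x) for every row x of X, equation (2)."""
    return sigmoid(X @ theta)

# Hypothetical inputs: each row is [1, x1, x2]; the leading 1 multiplies theta[0].
X_demo = np.array([[1.0, 1.0, 2.0],
                   [1.0, 10.0, 11.0]])
theta_demo = np.array([13.0, -1.0, -1.0])  # hand-picked, not a fitted result

h_demo = hypothesis(theta_demo, X_demo)
print(h_demo)  # one value in (0, 1) per row
```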

Cost function

$$J(\theta)=\frac{1}{m}\sum_{i=1}^{m}Cost\left(h_{\theta}(x^{(i)}),y^{(i)}\right)\tag{3}$$

where

$$Cost\left(h_{\theta}(x^{(i)}),y^{(i)}\right)=\begin{cases}-\log\left(h_{\theta}(x^{(i)})\right), & \text{if } y^{(i)}=1 \\ -\log\left(1-h_{\theta}(x^{(i)})\right), & \text{if } y^{(i)}=0\end{cases}\tag{4}$$

Because $y^{(i)}$ is either 0 or 1, the two cases can be combined into a single expression:

$$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\left(h_\theta(x^{(i)})\right)+\left(1-y^{(i)}\right)\log\left(1-h_\theta(x^{(i)})\right)\right]\tag{5}$$
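Equation (5) translates almost directly into NumPy. The following is a minimal sketch, assuming the design matrix already contains a leading column of ones; the function name `cost` and the small data set are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """Cost J(theta) from equation (5), averaged over the m rows of X."""
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

# Hypothetical data: rows are [1, x1, x2], labels are 0 or 1.
X_demo = np.array([[1.0, 1.0, 2.0],
                   [1.0, 3.0, 2.0],
                   [1.0, 10.0, 11.0],
                   [1.0, 9.0, 10.0]])
y_demo = np.array([1.0, 1.0, 0.0, 0.0])

# With theta = 0 every prediction is 0.5, so the cost equals log(2).
j_zero = cost(np.zeros(3), X_demo, y_demo)
print(j_zero)
```

This version does not guard against predictions that are exactly 0 or 1, where the logarithm diverges; clipping `h` slightly away from the endpoints is one common workaround.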

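Plotting the per-example cost against the prediction $h_{\theta}(x)$ shows that:

  • When $y=1$ and the prediction is $h_{\theta}(x)=1$, the cost is 0. This is what we want: a perfectly correct prediction has the minimum cost. The further the prediction is from 1, the larger the cost, which is also what we want.
  • Likewise, when $y=0$ the cost reaches its minimum at a prediction of 0, and grows as the prediction moves away from 0.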
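The two observations above can be checked numerically. This is a small sketch that tabulates the per-example cost of equation (4) for a few prediction values; `example_cost` and the chosen predictions are illustrative.

```python
import numpy as np

def example_cost(h, y):
    """Per-example cost from equation (4)."""
    return -np.log(h) if y == 1 else -np.log(1 - h)

predictions = np.array([0.01, 0.25, 0.5, 0.75, 0.99])
cost_y1 = np.array([example_cost(h, 1) for h in predictions])
cost_y0 = np.array([example_cost(h, 0) for h in predictions])

# For y = 1 the cost falls as h approaches 1; for y = 0 it rises.
for h, c1, c0 in zip(predictions, cost_y1, cost_y0):
    print(f"h={h:.2f}  cost(y=1)={c1:.3f}  cost(y=0)={c0:.3f}")
```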

Deriving the cost function (by maximum likelihood estimation)

Let the hypothesis $h_{\theta}(x)$ be the probability that the label is 1. Then:

$$\begin{aligned}P(y=1\mid x;\theta)&=h_{\theta}(x)\\ P(y=0\mid x;\theta)&=1-h_{\theta}(x)\end{aligned}\tag{6}$$

The two cases in equation (6) can be merged into one formula:

$$P(y\mid x;\theta)=h_{\theta}(x)^{y}\left(1-h_{\theta}(x)\right)^{1-y}\tag{7}$$

Assuming the $m$ samples are independent, the likelihood function is:

$$L(\theta)=\prod_{i=1}^{m}P(y^{(i)}\mid x^{(i)};\theta)=\prod_{i=1}^{m}\left(h_{\theta}(x^{(i)})\right)^{y^{(i)}}\left(1-h_{\theta}(x^{(i)})\right)^{1-y^{(i)}}\tag{8}$$

Log-likelihood function:

$$l(\theta)=\log L(\theta)=\sum_{i=1}^{m}\left[y^{(i)}\log\left(h_\theta(x^{(i)})\right)+\left(1-y^{(i)}\right)\log\left(1-h_\theta(x^{(i)})\right)\right]\tag{9}$$

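Maximum likelihood estimation chooses the $\theta$ that maximizes the likelihood. Because the logarithm is monotonically increasing, the same $\theta$ also maximizes $l(\theta)$. Defining the cost function as $J(\theta)=-\frac{1}{m}l(\theta)$, maximizing $l(\theta)$ is equivalent to minimizing $J(\theta)$, which is exactly equation (5).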
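The relationship between the likelihood, the log-likelihood and the cost can be verified on a small example. This is a sketch under stated assumptions: the data and the parameter vector `theta_demo` are hypothetical, chosen only so that none of the probabilities underflow.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical data (rows are [1, x1, x2]) and hand-picked parameters.
X_demo = np.array([[1.0, 1.0, 2.0],
                   [1.0, 3.0, 2.0],
                   [1.0, 10.0, 11.0],
                   [1.0, 9.0, 10.0]])
y_demo = np.array([1.0, 1.0, 0.0, 0.0])
theta_demo = np.array([6.0, -0.5, -0.5])

h = sigmoid(X_demo @ theta_demo)
m = len(y_demo)

# Equation (8): product of the per-sample probabilities.
likelihood = np.prod(h ** y_demo * (1 - h) ** (1 - y_demo))
# Equation (9): sum of the per-sample log-probabilities.
log_likelihood = np.sum(y_demo * np.log(h) + (1 - y_demo) * np.log(1 - h))
# Cost function: negative mean log-likelihood.
J = -log_likelihood / m

print(np.log(likelihood), log_likelihood, J)
```

In practice the sum in equation (9) is preferred over the product in equation (8), because a product of many probabilities quickly underflows in floating point.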

Gradient descent update rule

$$\theta_j:=\theta_j-\alpha\frac{\partial}{\partial\theta_j}J(\theta)=\theta_j-\alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)x_j^{(i)}\tag{10}$$
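The partial derivative used in equation (10) is stated without proof. A sketch of the standard calculation follows; it relies on the identity $g'(z)=g(z)\left(1-g(z)\right)$ for the logistic function, which gives $\frac{\partial h_\theta(x)}{\partial\theta_j}=h_\theta(x)\left(1-h_\theta(x)\right)x_j$. Writing $h^{(i)}$ as shorthand for $h_\theta(x^{(i)})$ and differentiating equation (5):

$$\begin{aligned}\frac{\partial J(\theta)}{\partial\theta_j}&=-\frac{1}{m}\sum_{i=1}^{m}\left[\frac{y^{(i)}}{h^{(i)}}-\frac{1-y^{(i)}}{1-h^{(i)}}\right]h^{(i)}\left(1-h^{(i)}\right)x_j^{(i)}\\&=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\left(1-h^{(i)}\right)-\left(1-y^{(i)}\right)h^{(i)}\right]x_j^{(i)}\\&=\frac{1}{m}\sum_{i=1}^{m}\left(h^{(i)}-y^{(i)}\right)x_j^{(i)}\end{aligned}$$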

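Matrix form

$$\theta:=\theta-\alpha\frac{1}{m}X^T\left(g(X\theta)-y\right)\tag{11}$$

Here $X$ is the $m\times(n+1)$ design matrix whose $i$-th row is $x^{(i)}$, $y$ is the column vector of labels, and $g$ is applied element-wise.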

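Equation (11) follows by stacking the component-wise updates of equation (10), one for each $j$, into a single vector: the $j$-th entry of $X^T\left(g(X\theta)-y\right)$ is $\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)x_j^{(i)}$.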
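One way to gain confidence in the matrix-form gradient is to compare it with a finite-difference approximation of the cost. The sketch below does this on hypothetical data; `gradient`, `numerical_gradient` and the step size `eps` are illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def gradient(theta, X, y):
    """Analytic gradient in matrix form: X^T (g(X theta) - y) / m."""
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

def numerical_gradient(theta, X, y, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (cost(theta + step, X, y) - cost(theta - step, X, y)) / (2 * eps)
    return grad

# Hypothetical data and parameters, used only for the check.
X_demo = np.array([[1.0, 1.0, 2.0],
                   [1.0, 3.0, 2.0],
                   [1.0, 10.0, 11.0],
                   [1.0, 9.0, 10.0]])
y_demo = np.array([1.0, 1.0, 0.0, 0.0])
theta_demo = np.array([0.1, -0.2, 0.05])

analytic = gradient(theta_demo, X_demo, y_demo)
numeric = numerical_gradient(theta_demo, X_demo, y_demo)
print(analytic)
print(numeric)
```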

Logistic regression with a penalty term

$$J(\theta)=\frac{1}{m}\sum_{i=1}^{m}\left[-y^{(i)}\log\left(h_\theta(x^{(i)})\right)-\left(1-y^{(i)}\right)\log\left(1-h_\theta(x^{(i)})\right)\right]+\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2\tag{12}$$

Repeat until convergence:

$$\theta_0:=\theta_0-\alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_0^{(i)}$$

$$\theta_j:=\theta_j-\alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}+\frac{\lambda}{m}\theta_j\right],\quad j=1,2,\ldots,n$$

The intercept $\theta_0$ is not penalized, so its update has no $\lambda$ term.
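The regularized update can be sketched in NumPy as follows. The function names, the learning rate `alpha = 0.01`, the penalty `lam = 1.0` and the iteration count are assumptions chosen for illustration; the data is the toy set used in this post's Python implementation. The clipping inside the cost is an added safeguard against `log(0)`, not part of equation (12).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_cost(theta, X, y, lam):
    """Equation (12); theta[0] is excluded from the penalty."""
    m = len(y)
    h = np.clip(sigmoid(X @ theta), 1e-12, 1 - 1e-12)  # safeguard, an assumption
    data_term = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
    penalty = lam / (2 * m) * np.sum(theta[1:] ** 2)
    return data_term + penalty

def regularized_step(theta, X, y, alpha, lam):
    """One gradient descent step with the penalty applied to theta[1:] only."""
    m = len(y)
    grad = X.T @ (sigmoid(X @ theta) - y) / m
    grad[1:] += lam / m * theta[1:]
    return theta - alpha * grad

X_raw = np.array([[1, 2], [3, 2], [1, 3], [2, 3], [3, 3], [3, 4],
                  [10, 11], [9, 10], [12, 13], [14, 14], [13, 12]], dtype=float)
y = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0], dtype=float)
X = np.hstack([np.ones((len(X_raw), 1)), X_raw])  # add the intercept column

theta = np.zeros(X.shape[1])
alpha, lam = 0.01, 1.0  # illustrative values
initial_cost = regularized_cost(theta, X, y, lam)
for _ in range(1000):
    theta = regularized_step(theta, X, y, alpha, lam)
final_cost = regularized_cost(theta, X, y, lam)

print(initial_cost, final_cost)
print(theta)
```

Larger values of `lam` pull the non-intercept weights more strongly toward zero, trading some fit on the training data for smaller coefficients.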

Python implementation

import numpy as np

# Toy data set: two features per sample, binary labels
X = np.array([[1, 2], [3, 2], [1, 3], [2, 3], [3, 3], [3, 4], [10, 11], [9, 10], [12, 13], [14, 14], [13, 12]])
y = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
n_samples, n_features = X.shape

# Prepend a column of ones so that theta[0] acts as the intercept
X = np.concatenate((np.ones((n_samples, 1)), X), axis=1)
y = y.reshape((n_samples, 1))

max_iter = 10000  # maximum number of iterations
epsilon = 1e-4  # stop once the summed absolute change in theta falls below epsilon
theta = np.zeros((n_features + 1, 1))  # initialize theta
alpha = 0.0001  # learning rate

for i in range(max_iter):
    # Matrix-form gradient descent update, equation (11)
    theta_next = theta - alpha * X.T @ (1 / (1 + np.exp(-X @ theta)) - y) / n_samples
    if np.abs(theta - theta_next).sum() < epsilon:
        theta = theta_next
        print("converged")
        break
    theta = theta_next
else:
    print("reached max_iter, stopping iteration.")

print(theta)
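Once a parameter vector is available, class labels can be obtained by thresholding the hypothesis. The sketch below assumes the common 0.5 threshold, which the post itself does not specify. To keep the example deterministic, `theta_demo` is a hand-picked vector that happens to separate the toy data, not the output of the training loop above.

```python
import numpy as np

def predict(theta, X, threshold=0.5):
    """Return 1 where h_theta(x) >= threshold, otherwise 0."""
    probs = 1.0 / (1.0 + np.exp(-X @ theta))
    return (probs >= threshold).astype(int)

X_raw = np.array([[1, 2], [3, 2], [1, 3], [2, 3], [3, 3], [3, 4],
                  [10, 11], [9, 10], [12, 13], [14, 14], [13, 12]], dtype=float)
y_true = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
X_design = np.hstack([np.ones((len(X_raw), 1)), X_raw])  # add the intercept column

# Hypothetical parameters: the decision boundary is x1 + x2 = 13.
theta_demo = np.array([13.0, -1.0, -1.0])

y_pred = predict(theta_demo, X_design)
accuracy = np.mean(y_pred == y_true)
print(y_pred, accuracy)
```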