Logistic Regression Derivation (3): Newton's Method and a Pure Python Implementation

This post introduces Newton's method and its iteration for the single-feature and multi-feature cases, explains in detail how the method applies to optimizing logistic regression, walks through a pure-Python implementation, and shows how the loss changes during training.

1. Newton's Method Illustrated

Newton's method is commonly used to find the roots of an equation or the extrema of a function. The basic idea: around the current estimate of the extremum, take a second-order Taylor expansion of $f(x)$ and use it to compute the next estimate.
The figure below illustrates the process:
[Figure: Newton's method iterating on the first derivative]
The horizontal axis is the parameter $\theta$ and the vertical axis is the first derivative $f'(\theta)$. Newton's method then iterates as follows:
(1) Draw the tangent line at the point $(\theta_0, f'(\theta_0))$; its equation is $y = f'(\theta_0) + f''(\theta_0)(\theta - \theta_0)$;
(2) the tangent crosses the horizontal axis at $\theta_1 = \theta_0 - \frac{f'(\theta_0)}{f''(\theta_0)}$;
(3) draw the tangent line at $(\theta_1, f'(\theta_1))$; its equation is $y = f'(\theta_1) + f''(\theta_1)(\theta - \theta_1)$;
(4) the tangent crosses the horizontal axis at $\theta_2 = \theta_1 - \frac{f'(\theta_1)}{f''(\theta_1)}$;
(5) repeating the iteration converges to $\theta^*$, the point where $f'(\theta) = 0$.
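As a concrete one-step check (this quadratic example is my own, chosen so the answer is exact): for $f(\theta) = \theta^2 - 4\theta$ we have $f'(\theta) = 2\theta - 4$ and $f''(\theta) = 2$, so starting from $\theta_0 = 0$,
$$\theta_1 = \theta_0 - \frac{f'(\theta_0)}{f''(\theta_0)} = 0 - \frac{-4}{2} = 2,$$
which is already the exact minimizer: for a quadratic, the Taylor expansion is not an approximation, and a single Newton step suffices.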

2. How Newton's Method Works

(1) When samples have a single feature

Assume $f(x)$ is twice differentiable and attains its minimum at $x = x_{min}$; our goal is to find $x_{min}$.
Take $x_k$ as the current estimate of $x_{min}$ and expand $f(x)$ to second order at $x = x_k$:
$$f(x) \approx f(x_k) + f'(x_k)(x - x_k) + \frac{1}{2}f''(x_k)(x - x_k)^2$$
The extremum of $f(x)$ lies where the derivative vanishes, so differentiate:
$$f'(x) = 0 + f'(x_k)(1 - 0) + \frac{1}{2}f''(x_k) \cdot 2(x - x_k) = f'(x_k) + f''(x_k)(x - x_k)$$
Setting this to zero gives the next estimate $x = x_k - \frac{f'(x_k)}{f''(x_k)}$.
Hence the iteration formula:
$$x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)}$$
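A minimal sketch of this scalar iteration in Python (the test function, its derivatives, and the starting point are my own choices for illustration):

def newton_1d(df, d2f, x0, iters=20, tol=1e-10):
    # Scalar Newton's method: x_{k+1} = x_k - f'(x_k) / f''(x_k)
    x = x0
    for _ in range(iters):
        step = df(x) / d2f(x)
        x -= step
        if abs(step) < tol:  # stop once the update is negligible
            break
    return x

# Example: f(x) = x**4 - 3*x**2, so f'(x) = 4*x**3 - 6*x and f''(x) = 12*x**2 - 6
x_min = newton_1d(lambda x: 4*x**3 - 6*x, lambda x: 12*x**2 - 6, x0=2.0)
print(x_min)  # converges to sqrt(3/2) ≈ 1.2247, a local minimum of f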

(2) When samples have multiple features

The second-order Taylor expansion generalizes to:
$$\varphi(X) \approx f(X_k) + \nabla f(X_k) \cdot (X - X_k) + \frac{1}{2}(X - X_k)^T \nabla^2 f(X_k) (X - X_k)$$
where $\nabla f$ is the gradient vector of $f$,
$$\nabla f = \left[\frac{\partial f}{\partial x_1} \quad \frac{\partial f}{\partial x_2} \quad \dots \quad \frac{\partial f}{\partial x_N}\right]^T$$
and $\nabla^2 f$ is the Hessian matrix of $f$.
Setting $\nabla \varphi(X) = 0$, i.e.
$$\nabla f(X_k) + \nabla^2 f(X_k)(X - X_k) = 0,$$
gives the next estimate of the extremum:
$$X = X_k - \nabla^2 f(X_k)^{-1} \nabla f(X_k)$$
Writing $\nabla f(X_k) = g_k$ and $\nabla^2 f(X_k) = H_k$, the iteration formula becomes:
$$X_{k+1} = X_k - H_k^{-1} g_k$$
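In code, one such step is best written as a linear solve rather than an explicit matrix inverse. A small sketch (the quadratic test function is my own, chosen because Newton's method finds its minimum in one step):

import numpy as np

def newton_step(grad, hess, x):
    # One Newton update: x_{k+1} = x_k - H_k^{-1} g_k.
    # Solving H s = g is cheaper and more stable than inverting H.
    return x - np.linalg.solve(hess(x), grad(x))

# Example: f(X) = 0.5 * X^T A X - b^T X, so grad = A X - b and hess = A
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x = newton_step(lambda x: A @ x - b, lambda x: A, np.zeros(2))
print(x)  # equals A^{-1} b: for a quadratic, one step lands on the minimizer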

3. Solving Logistic Regression with Newton's Method

The logistic regression hypothesis gives:
$$P(y|x;\theta) = [h_\theta(x)]^y[1 - h_\theta(x)]^{1-y} = \left(\frac{1}{1+e^{-\theta^T x}}\right)^y \left(1 - \frac{1}{1+e^{-\theta^T x}}\right)^{1-y}$$
The objective is to maximize the average log-likelihood:
$$\arg\max_\theta \frac{1}{N}\sum_{i=1}^{N}\left[y^{(i)} \ln h_\theta(x^{(i)}) + (1 - y^{(i)}) \ln\left(1 - h_\theta(x^{(i)})\right)\right]$$
Equivalently, we minimize its negative, the loss $J(\theta)$, whose gradient is:
$$\nabla J(\theta) = \frac{1}{N}\sum_{i=1}^{N}\left[h_\theta(x^{(i)}) - y^{(i)}\right] x^{(i)}$$
The Hessian follows by differentiating again (the $y^{(i)}x^{(i)}$ term drops out because it does not depend on $\theta$):
$$\begin{aligned} H = \nabla^2 J(\theta) &= \nabla \frac{1}{N}\sum_{i=1}^{N}\left[h_\theta(x^{(i)}) - y^{(i)}\right] x^{(i)} \\ &= \nabla \frac{1}{N}\sum_{i=1}^{N} h_\theta(x^{(i)}) \, x^{(i)} \\ &= \frac{1}{N}\sum_{i=1}^{N} \frac{\partial h_\theta(x^{(i)})}{\partial \theta} \cdot (x^{(i)})^{T} \\ &= \frac{1}{N}\sum_{i=1}^{N} h_\theta(x^{(i)}) \left(1 - h_\theta(x^{(i)})\right) x^{(i)} (x^{(i)})^{T} \end{aligned}$$
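Stacking the samples row-wise into $X \in \mathbb{R}^{N \times M}$, with prediction vector $h$ and label vector $y$, the same quantities take the compact matrix form used by the implementation below ($S$ is a notation I introduce here for the diagonal weight matrix):
$$\nabla J(\theta) = \frac{1}{N} X^{T}(h - y), \qquad H = \frac{1}{N} X^{T} S X, \qquad S = \mathrm{diag}\!\left(h^{(i)}\left(1 - h^{(i)}\right)\right)$$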
Newton's method then updates the weights as:
$$\theta^{(t+1)} := \theta^{(t)} - H^{-1}\nabla J(\theta^{(t)})$$

4. Pure Python Implementation

The code is as follows:

import numpy as np
import matplotlib.pyplot as plt
import time


# Load data
def load_data():
    X_train = np.loadtxt("./Exam/train/x.txt")
    Y_train = np.loadtxt("./Exam/train/y.txt", dtype=int)
    X_test = np.loadtxt("./Exam/test/x.txt")
    Y_test = np.loadtxt("./Exam/test/y.txt", dtype=int)
    return X_train, Y_train, X_test, Y_test


# Logistic regression class
class Logistic(object):
    def __init__(self, X_train, Y_train):
        self.X_train = X_train
        self.Y_train = Y_train
        # M: number of features, N: number of samples
        self.M = X_train.shape[1]
        self.N = X_train.shape[0]
        self.train()

    def normalization(self):
        # Standardize each feature column: zero mean, unit variance
        mean = np.mean(self.X_train, axis=0)
        std = np.std(self.X_train, axis=0)
        self.X_train = (self.X_train - mean) / std
        # Prepend a bias column of ones
        self.X_train = np.insert(self.X_train, 0, values=1.0, axis=1)
        self.Y_train = self.Y_train.reshape(self.N, 1)
        self.M += 1
        
    def sigmoid(self, X):
        # h = 1 / (1 + exp(-X @ theta)), applied row-wise
        eta = -np.dot(X, self.theta)  # N*1
        return 1.0 / (1.0 + np.exp(eta))

    def Newton_method(self):
        self.theta = -np.ones((self.M, 1))
        for i in range(100):
            self.H = self.sigmoid(self.X_train)  # N*1 predictions
            # Gradient of the loss: (1/N) * X^T (h - y), shape M*1
            self.J = np.dot(self.X_train.T, (self.H - self.Y_train)) / self.N
            # Hessian: (1/N) * X^T S X with S = diag(h * (1 - h)), shape M*M
            self.Hession = np.dot(self.X_train.T, (self.H * (1.0 - self.H)) * self.X_train) / self.N
            # Newton update: theta := theta - H^{-1} g
            self.theta -= np.dot(np.linalg.inv(self.Hession), self.J)
            self.loss = -np.sum(self.Y_train * np.log(self.H) + (1.0 - self.Y_train) * np.log(1.0 - self.H)) / self.N
            print("iter: %d, loss: %f" % (i, self.loss))
        print(self.theta)

    def train(self):
        self.normalization()
        self.Newton_method()


if __name__ == "__main__":
    X_train, Y_train, X_test, Y_test = load_data()
    Logistic(X_train, Y_train)    
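The script expects the Exam dataset under ./Exam/train/ and ./Exam/test/. If those files are not available, here is a sketch of a synthetic stand-in for smoke-testing the class, run in place of the load_data() call above (the cluster means and sizes are arbitrary choices of mine):

rng = np.random.RandomState(0)
X0 = rng.randn(50, 2) + np.array([-1.0, -1.0])  # class-0 cluster
X1 = rng.randn(50, 2) + np.array([1.0, 1.0])    # class-1 cluster
X_train = np.vstack([X0, X1])
Y_train = np.hstack([np.zeros(50, dtype=int), np.ones(50, dtype=int)])
Logistic(X_train, Y_train)  # the printed loss should drop within a few iterations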

The loss during training and the resulting classification are shown below:
[Figure: training loss curve and classification result]

(These are my own machine-learning study notes; if you find any mistakes, please point them out so I can correct them.)
