Linear Regression and Logistic Regression (LR)

Linear Regression

Given a dataset $\{(\boldsymbol x_1,y_1),\cdots,(\boldsymbol x_N,y_N)\}$ with $\boldsymbol x\in\mathbb R^n$, find $\boldsymbol w=(w_1, w_2, \cdots, w_n)$ and $b$ such that the regression model

$$y \approx f(\boldsymbol x)=\boldsymbol w^T \boldsymbol x+b$$

holds, where $y$ is the true value and $f(\boldsymbol x)$ the predicted value.


The matrix form of the model uses

$$X=\left[\begin{array}{ccc|c} x_{11} &\cdots &x_{1n} &1\\ x_{21} &\cdots &x_{2n} &1\\ \vdots &\ddots &\vdots &\vdots\\ x_{N1} &\cdots &x_{Nn} &1 \end{array}\right],\quad \boldsymbol w=\left[\begin{array}{c} w_1 \\ w_2 \\ \vdots \\ b \end{array}\right]$$

Solving in the least-squares sense (minimizing the sum of squared errors), with $\boldsymbol y=(y_1,\cdots,y_N)^T$:

$$\boldsymbol w^*=\arg \min_{\boldsymbol w}\,(\boldsymbol y-X\boldsymbol w)^T(\boldsymbol y-X\boldsymbol w)$$

Let $E=(\boldsymbol y-X\boldsymbol w)^T(\boldsymbol y-X\boldsymbol w)$. Differentiating with respect to the vector $\boldsymbol w$,

$$\frac{\partial E}{\partial\boldsymbol w}=2X^T(X\boldsymbol w-\boldsymbol y)$$

Setting the derivative to $\boldsymbol 0$ at the extremum gives

$$\boldsymbol w^*=(X^T X)^{-1}X^T\boldsymbol y$$

When $X^T X$ is not full rank (for example when the augmented $X$ has more columns than rows, i.e., more features than samples), the inverse does not exist and a regularization term must be introduced.
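
As a quick numerical check of the closed-form solution, the sketch below fits synthetic data (made up for illustration) with the normal equations; the ridge variant adds $\lambda I$ so the system stays solvable when $X^TX$ is singular:

import numpy as np

rng = np.random.default_rng(0)
N, n = 100, 3
X = np.hstack([rng.normal(size=(N, n)), np.ones((N, 1))])  # constant column for b
w_true = np.array([2.0, -1.0, 0.5, 3.0])                   # last entry acts as b
y = X @ w_true + 0.01 * rng.normal(size=N)

# Closed-form least squares: w* = (X^T X)^{-1} X^T y
w_ls = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge-regularized variant for the rank-deficient case: (X^T X + lam*I)^{-1} X^T y
lam = 1e-3
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(n + 1), X.T @ y)

print(w_ls)      # close to w_true
print(w_ridge)   # slightly shrunk toward zero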


More generally, a mapping $\boldsymbol x\to y$ can be expressed as

$$y=g^{-1}(\boldsymbol w^T\boldsymbol x+b)$$

where $g(\cdot)$ is monotonic and differentiable; the model above is equivalent to $g(y)=\boldsymbol w^T\boldsymbol x+b$.
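For example, taking $g(\cdot)=\ln(\cdot)$ gives log-linear regression, $\ln y=\boldsymbol w^T\boldsymbol x+b$, which reduces to ordinary least squares on the log-transformed targets. A minimal sketch on synthetic, noise-free data (illustrative only):

import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 2.0, size=(200, 1))
y = np.exp(1.5 * x[:, 0] + 0.3)              # y = exp(w*x + b)

X_aug = np.hstack([x, np.ones((200, 1))])
# g(y) = ln y = w*x + b: least squares on ln(y)
w, b = np.linalg.lstsq(X_aug, np.log(y), rcond=None)[0]
print(w, b)                                  # approximately 1.5 and 0.3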


Logistic Regression

A binary classification task requires the model output $y\in\{0,1\}$, while linear regression outputs continuous values, so a suitable mapping function is needed to turn the model output into a 0/1 value.

Let $z=\boldsymbol w^T \boldsymbol x+b$. Ideally, $g^{-1}(z)$ would have the unit-step property:

$$g^{-1}(z)=\begin{cases} 0, & z<0 \\ 0.5, & z=0 \\ 1, & z>0\end{cases}$$

For the binary classification task, let $y$ denote the probability that sample $\boldsymbol x$ is predicted positive, and define the model as

$$\ln\frac{y}{1-y}=\boldsymbol w^T\boldsymbol x + b \implies y=\dfrac{1}{1+e^{-(\boldsymbol w^T\boldsymbol x+b)}}$$

$\ln\dfrac{y}{1-y}$ is the log odds (logit), and the resulting function $y=1/(1+e^{-z})$ is the logistic (sigmoid) function; its output can be read as an estimate of the class posterior probability $p(y=1\mid\boldsymbol x)$.


The logistic function closely approximates the unit step function while being monotonic and differentiable, as shown below:

Figure 1. Unit step function and logistic function
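
A short matplotlib sketch that reproduces this comparison (assumed figure layout):

import numpy as np
import matplotlib.pyplot as plt

z = np.linspace(-10, 10, 500)
sigmoid = 1.0 / (1 + np.exp(-z))
step = np.where(z > 0, 1.0, np.where(z < 0, 0.0, 0.5))

plt.plot(z, step, 'r--', label='unit step')
plt.plot(z, sigmoid, 'b-', label='logistic (sigmoid)')
plt.xlabel('z')
plt.legend()
plt.show()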

Suppose we mimicked the approach used for linear regression to solve for the model parameters $(\boldsymbol w, b)$, taking the sum of squared errors as the cost function:

$$J(\boldsymbol w,b)=\sum_{i=1}^N (f_i - y_i)^2$$

Because of the sigmoid, this function is non-convex with many local minima, which makes it ill-suited for optimization.
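
A small illustration of this claim on made-up 1-D data: sweeping $w$ (bias fixed at 0 for simplicity) and plotting the squared-error cost shows flat plateaus and a clearly non-convex shape:

import numpy as np
import matplotlib.pyplot as plt

# Toy 1-D samples, chosen only to make the shape visible
x = np.array([-4.0, -0.5, 1.0, 3.0])
y = np.array([0.0, 1.0, 0.0, 1.0])

ws = np.linspace(-10, 10, 400)
J = [np.sum((1.0 / (1 + np.exp(-w * x)) - y) ** 2) for w in ws]

plt.plot(ws, J)
plt.xlabel('w')
plt.ylabel('squared-error cost')
plt.show()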


The sigmoid output is an estimate of the posterior probability of class 1. To distinguish the model prediction $\hat y$ from the true label $y$, write

$$\phi(z)=\phi(\boldsymbol w^T \boldsymbol x + b)=p(y=1\mid\boldsymbol x;\boldsymbol w, b)=\frac{1}{1+e^{-(\boldsymbol w^T \boldsymbol x + b)}}$$
Since $y\in\{0,1\}$ follows a Bernoulli (0-1) distribution, letting $z=\boldsymbol w^T \boldsymbol x + b$, the probability of a label is

$$p(y\mid\boldsymbol x;\boldsymbol w, b)=\phi(z)^{y}\,(1-\phi(z))^{1-y}$$

The model parameters are estimated by maximum likelihood:

$$L(\boldsymbol w,b)=\prod_{i=1}^N p(y_i\mid\boldsymbol x_i;\boldsymbol w,b)=\prod_{i=1}^N\phi(z_i)^{y_i}(1-\phi(z_i))^{1-y_i}$$

Taking the logarithm,

$$\ln L(\boldsymbol w,b)=\sum_{i=1}^N \big[y_i\ln \phi(z_i) + (1-y_i)\ln(1-\phi(z_i))\big]$$

Maximizing the log-likelihood $\ln L$ is equivalent to minimizing $-\ln L$:

$$J(\boldsymbol w, b)=-\ln L(\boldsymbol w,b)=-\sum_{i=1}^N \big[y_i\ln \phi(z_i) + (1-y_i)\ln(1-\phi(z_i))\big]$$

w ^ = ( w , b ) \boldsymbol {\hat w}=(\boldsymbol w, b) w^=(w,b), x ^ = ( x , 1 ) \boldsymbol {\hat x}=(\boldsymbol x, 1) x^=(x,1), 则 w ^ T x ^ = w T x + b \boldsymbol{\hat w}^T\boldsymbol{\hat x}=\boldsymbol w^T\boldsymbol x + b w^Tx^=wTx+b. 由 ϕ ( z ) = 1 / ( 1 + e − z ) \phi(z)=1/(1+e^{-z}) ϕ(z)=1/(1+ez), 可推得 ϕ ′ ( z ) = ϕ ( z ) ( 1 − ϕ ( z ) ) \phi'(z)=\phi(z)(1-\phi(z)) ϕ(z)=ϕ(z)(1ϕ(z)), 故
∂ J ∂ w ^ = − ∑ i = 1 N [ y i 1 ϕ ( z i ) − ( 1 − y i ) 1 1 − ϕ ( z i ) ] ∂ ϕ ( z i ) ∂ w ^ = − ∑ i = 1 N [ y i 1 ϕ ( z i ) − ( 1 − y i ) 1 1 − ϕ ( z i ) ] ϕ ( z i ) ( 1 − ϕ ( z i ) ) ∂ z i ∂ w ^ = − ∑ i = 1 N ( y i − ϕ ( z i ) ) x ^ i \begin{aligned} \frac{\partial J}{\partial \boldsymbol {\hat w}} & =-\sum_{i=1}^N{\big[}y_i \frac{1}{\phi(z_i)} - (1-y_i) \frac{1}{1-\phi(z_i)}{\big]}\frac{\partial \phi(z_i)}{\partial \boldsymbol{\hat w}} \\ & =-\sum_{i=1}^N{\big[}y_i \frac{1}{\phi(z_i)} - (1-y_i) \frac{1}{1-\phi(z_i)}{\big]}\phi(z_i)(1-\phi(z_i)) \frac{\partial z_i}{\partial \boldsymbol{\hat w}} \\ & = -\sum_{i=1}^N(y_i-\phi(z_i))\boldsymbol {\hat x_i} \end{aligned} w^J=i=1N[yiϕ(zi)1(1yi)1ϕ(zi)1]w^ϕ(zi)=i=1N[yiϕ(zi)1(1yi)1ϕ(zi)1]ϕ(zi)(1ϕ(zi))w^zi=i=1N(yiϕ(zi))x^i
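
A finite-difference sanity check of this gradient on random data (the setup here is illustrative):

import numpy as np

rng = np.random.default_rng(2)
N, n = 20, 3
X_hat = np.hstack([rng.normal(size=(N, n)), np.ones((N, 1))])  # x_hat = (x, 1)
y = rng.integers(0, 2, size=N).astype(float)
w_hat = rng.normal(size=n + 1)

def phi(z):
    return 1.0 / (1 + np.exp(-z))

def J(w):
    p = phi(X_hat @ w)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Analytic gradient: -sum_i (y_i - phi(z_i)) * x_hat_i
grad = -X_hat.T @ (y - phi(X_hat @ w_hat))

# Central finite differences along each coordinate
eps = 1e-6
num = np.array([(J(w_hat + eps * e) - J(w_hat - eps * e)) / (2 * eps)
                for e in np.eye(n + 1)])
print(np.max(np.abs(grad - num)))  # should be tiny, e.g. < 1e-6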


Understanding the Cost Function

Since $y \in\{0, 1\}$, the cost function has the equivalent per-sample form

$$J(\boldsymbol w,b)=\begin{cases} -\ln \phi(z), & y=1 \\ -\ln(1-\phi(z)), & y=0 \end{cases}$$

If the true label is 1, the cost decreases as the estimate $\phi(z)$ approaches 1; if the true label is 0, the cost decreases as $\phi(z)$ approaches 0.
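
The two branches are easy to visualize; a minimal sketch plotting cost against the predicted probability $\phi(z)$:

import numpy as np
import matplotlib.pyplot as plt

p = np.linspace(0.01, 0.99, 200)
plt.plot(p, -np.log(p), label='y = 1: -ln(phi)')
plt.plot(p, -np.log(1 - p), label='y = 0: -ln(1 - phi)')
plt.xlabel('phi(z)')
plt.ylabel('cost')
plt.legend()
plt.show()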


Loss Function and Cross-Entropy

For a true distribution $P(X)$ and a predicted distribution $Q(X)$, the cross-entropy is defined as

$$H(P,Q) = -\sum_i P(\boldsymbol x_i)\ln Q(\boldsymbol x_i)$$

The cross-entropy is minimized if and only if $Q(X)$ matches $P(X)$. Since $y \in\{0, 1\}$ and $\phi(z) \in [0, 1]$, the loss obtained from the maximum-likelihood estimate of logistic regression is exactly a cross-entropy.
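
A one-sample numeric check of the equivalence, with arbitrary example values $y=1$ and $\phi=0.8$: the Bernoulli cross-entropy of $P=(1-y,\,y)$ against $Q=(1-\phi,\,\phi)$ matches the per-sample term of $-\ln L$:

import numpy as np

y, p = 1.0, 0.8                        # arbitrary label and predicted probability
P = np.array([1 - y, y])               # true distribution over {0, 1}
Q = np.array([1 - p, p])               # predicted distribution

# H(P, Q) = -sum_i P(x_i) ln Q(x_i), with 0*ln(.) treated as 0
H = -np.sum(np.where(P > 0, P * np.log(Q), 0.0))
nll = -(y * np.log(p) + (1 - y) * np.log(1 - p))
print(H, nll)                          # both approximately 0.2231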


Model Parameter Update

The model parameters are found by steepest (gradient) descent. Writing $t$ for the iteration index (to avoid clashing with the sample index $i$), the update is

$$\hat{\boldsymbol w}^{(t+1)}=\hat{\boldsymbol w}^{(t)} + \Delta \hat{\boldsymbol w}^{(t)},\quad \Delta\hat{\boldsymbol w}^{(t)}=-\eta\,\frac{\partial J(\hat{\boldsymbol w}^{(t)})}{\partial \hat{\boldsymbol w}}$$

where $\eta$ is the learning rate. Substituting the gradient derived above,

$$\hat{\boldsymbol w}^{(t+1)}=\hat{\boldsymbol w}^{(t)}+\eta\sum_{i=1}^N(y_i-\phi(z_i))\,\hat{\boldsymbol x}_i$$

In matrix form,

$$\hat{\boldsymbol w}^{(t+1)} = \hat{\boldsymbol w}^{(t)}+\eta \sum_{i=1}^N e_i\,\hat{\boldsymbol x}_i = \hat{\boldsymbol w}^{(t)} + \eta\,\hat X^T E$$

where $e_i=y_i-\phi(z_i)$ is the difference between the true and predicted values, $E=(e_1,\cdots,e_N)^T$, and $\hat X^T =(\hat{\boldsymbol x}_1,\cdots, \hat{\boldsymbol x}_N)$.


Python Implementation

#!/usr/bin/python
import numpy as np
import matplotlib.pyplot as plt


def load_data_sets():
    """
    Load the dataset.

    n samples with m features each.
    X: n*m (n rows, m columns); y: length-n label vector.
    """
    X, y = [], []
    with open('testSet.txt', encoding='utf8') as f:
        for line in f:
            line = line.strip().split()
            if len(line):
                X.append(line[:-1])
                y.append(line[-1])
    return np.array(X, dtype=float), np.array(y, dtype=float)


def plt_data_sets(X, y, w):
    """
    Scatter plot of the dataset with the decision boundary.

    X: inputs, n*2
    y: labels, length n
    w: model parameters (b, w1, w2)
    """
    true = y == 1
    x1, x2 = X[:, 0], X[:, 1]
    plt.scatter(x1[true], x2[true], marker='+', c='b')
    plt.scatter(x1[~true], x2[~true], marker='o', c='r')

    # Decision boundary: w^T * x + b = 0
    # With two features: w1*x1 + w2*x2 + b = 0, where x2 is the vertical axis,
    # i.e. x2 = -(w1*x1 + b) / w2
    x1 = np.arange(x1.min(), x1.max(), 0.1)
    x2 = -(w[0] + w[1] * x1) / w[2]
    plt.plot(x1, x2)
    plt.show()


class LogisticRegression:
    """Logistic regression classifier."""

    def __init__(self, max_iter=300, tol=1e-3, alpha=0.001):
        """
        Constructor.

        :param max_iter: maximum number of iterations
        :param tol: tolerance on the squared error
        :param alpha: learning rate
        """
        self.max_iter = max_iter
        self.tol = tol
        self.alpha = alpha

        self.w = None
        self.b = None

    def train(self, X, y):
        """
        Train the model.

        :param X: inputs, n*m (n rows, m columns)
        :param y: labels, length-n vector
        :return: None; the fitted parameters are stored in self.w and self.b
        """
        n, m = np.shape(X)
        # Augment: X_hat = (X, 1), w_hat = (w; b),
        # so that w^T*x + b => w_hat^T * x_hat
        X = np.concatenate((X, np.ones((n, 1))), axis=1)
        y = np.reshape(y, (n, 1))
        w = np.ones((m + 1, 1))

        alpha = self.alpha
        tol = self.tol
        records = []

        for _ in range(self.max_iter):
            # Difference between true and predicted values
            error = y - 1.0 / (1 + np.exp(-X @ w))
            # Record the squared-error history
            records.append((error.T @ error)[0, 0])
            if records[-1] <= tol:
                break
            # Update along the negative gradient direction
            w += alpha * X.T @ error

            # Optionally update the error curve live:
            """
            plt.title('Error Curve')
            plt.xlabel('iterations')
            plt.ylabel('error')
            plt.plot(range(len(records)), records, 'b')
            plt.pause(0.01)
            """

        self.w = w[:-1]
        self.b = w[-1, 0]

    def prob(self, X):
        """Probability of class 1."""
        if self.w is None or self.b is None:
            raise Exception('Please train the model.')
        # (n, 1) column of probabilities
        results = 1.0 / (1 + np.exp(-np.asarray(X) @ self.w - self.b))
        # flatten to a length-n array
        return results.ravel()

    def predict(self, X):
        """Predict class labels."""
        prob = self.prob(X)
        # Class-1 probability below 0.5 means class 0, otherwise class 1
        return np.where(prob < 0.5, 0, 1)

    def score(self, X, y):
        """
        Accuracy on the given inputs.

        :param X: inputs
        :param y: true labels
        :return: fraction of correctly classified samples
        """
        predict = self.predict(X)
        # Count all misclassified samples
        error_num = np.sum(predict != y)
        return 1 - 1.0 * error_num / len(y)


if __name__ == '__main__':
    # Binary classification test
    X, y = load_data_sets()

    # Split into training and test sets
    num = int(len(X) * 0.7)
    LR = LogisticRegression()
    LR.train(X[:num], y[:num])

    # Predict on the held-out samples
    predict = LR.predict(X[num:])

    # Accuracy on the test split
    score = LR.score(X[num:], y[num:])
    print(score)

    # Plot the data and the decision boundary
    plt_data_sets(X, y, (LR.b, LR.w[0, 0], LR.w[1, 0]))
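
As a sanity check, one can compare against scikit-learn (assuming it is installed); appending the lines below to the `__main__` block should give a similar accuracy and a similar boundary:

    # Optional cross-check with scikit-learn
    from sklearn.linear_model import LogisticRegression as SkLogisticRegression
    sk = SkLogisticRegression()
    sk.fit(X[:num], y[:num])
    print('sklearn accuracy:', sk.score(X[num:], y[num:]))
    print('sklearn coef:', sk.coef_, 'intercept:', sk.intercept_)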

The training results are shown below.

Figure 2. Error versus iteration count (left) and the decision boundary learned by logistic regression (right)

Dataset download link: https://pan.baidu.com/s/1Xd-DXpw8xvm9mVWuQ0CU8Q
