Logistic Regression
1.1 Purpose
An algorithm for solving classification problems (despite the "regression" in its name).
1.2 Classes
Goal: find a model that separates the two tumor classes, benign and malignant (i.e., fit a classification boundary).
Q: Why is the boundary a straight line rather than a curve?
A: It can be a curve in general, but logistic regression produces a linear boundary.
$x_1$: size
$x_2$: probability
$h(x,\theta)$: the "behind-the-scenes" function that actually determines whether the tumor is malignant or benign.
$$h(x) = w_1x_1 + w_2x_2 + b = \boldsymbol{\theta}^T\boldsymbol{x}$$
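As a side note, the compact form $\boldsymbol{\theta}^T\boldsymbol{x}$ absorbs the bias $b$ by prepending a constant 1 to the input. A minimal sketch of my own (the weight values are arbitrary examples):

```python
import numpy as np

w1, w2, b = 0.5, -1.2, 0.3           # example weights (arbitrary values)
x = np.array([2.0, 4.0])             # one sample with features x1, x2

h_explicit = w1 * x[0] + w2 * x[1] + b

theta = np.array([b, w1, w2])        # theta = [b, w1, w2]
x_aug = np.concatenate(([1.0], x))   # x     = [1, x1, x2]
h_compact = theta @ x_aug            # theta^T x

assert np.isclose(h_explicit, h_compact)
```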
We want to map this output into $[0,1]$ to make classification convenient.
Sigmoid function:
$$y = \frac{1}{1+e^{-x}}$$
Range: $(0,1)$
Derivative: $y'=y(1-y)$
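A quick numerical sanity check of these two properties (a sketch of my own, not part of the original notes):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Check the derivative identity y' = y(1 - y) against a central finite difference.
x = np.linspace(-5, 5, 11)
y = sigmoid(x)
eps = 1e-6
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
analytic = y * (1 - y)
assert np.allclose(numeric, analytic, atol=1e-8)
print(y.min(), y.max())  # outputs stay strictly inside (0, 1)
```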
1.3 Model selection:
$$h_\theta(x) = \frac{1}{1+e^{-\boldsymbol{\theta}^T\boldsymbol{x}}}$$
Determine the form of the loss function (goal: make the predictions as close as possible to the ground truth).
Derivation:
Take the maximum likelihood function and add a negative sign:
$$P(y=1|x;\theta)=h_\theta(x)$$
$$P(y=0|x;\theta)=1-h_\theta(x)$$
Trick: combine the two cases into a single expression:
$$L=\prod_{i=1}^m P^{(i)}=\prod_{i=1}^m \left\{h_\theta(x)^{y}\,[1-h_\theta(x)]^{1-y}\right\}$$
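To see why the exponent trick works: $h^y(1-h)^{1-y}$ reduces to $h$ when $y=1$ and to $1-h$ when $y=0$. A one-line check with an illustrative value of my own:

```python
import numpy as np

h = 0.8  # some predicted probability
for y in (0, 1):
    combined = h**y * (1 - h)**(1 - y)   # the single-expression form
    expected = h if y == 1 else 1 - h    # the two separate cases
    assert np.isclose(combined, expected)
```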
$$\log(L)=\sum_{i=1}^m \left\{y\log\big(h_\theta(x)\big)+(1-y)\log\big[1-h_\theta(x)\big]\right\}$$
$$Loss=-\log(L)=-\sum_{i=1}^m \left\{y\log\big(h_\theta(x)\big)+(1-y)\log\big[1-h_\theta(x)\big]\right\}$$
Note: this is the result for the binary case; with more classes it becomes the (general) cross-entropy.
Q: Why can't we use a non-convex loss?
A: Gradient-based methods such as SGD are only guaranteed to find the global optimum on a convex function; on a non-convex loss they can get stuck in local minima.
1.4 Derivation of the gradient:
Combining the form derived above (with the logs) and averaging over the $m$ samples gives the final cost function:
$$J(\theta) = -\frac{1}{m}\sum_{i=1}^m\left[y^{(i)} \cdot \log\big(h_\theta(x^{(i)})\big)+(1-y^{(i)}) \cdot \log\big(1-h_\theta(x^{(i)})\big)\right]$$
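A minimal NumPy sketch of this cost function (the names `cost` and `eps` are mine; the clipping guards against $\log(0)$):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y, eps=1e-12):
    """Binary cross-entropy J(theta); X carries a leading column of ones for the bias."""
    h = sigmoid(X @ theta)
    h = np.clip(h, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

# Tiny example with made-up numbers:
X = np.array([[1.0, 0.5], [1.0, 2.0], [1.0, -1.0]])
y = np.array([1.0, 1.0, 0.0])
print(cost(np.zeros(2), X, y))  # log(2) ≈ 0.693 for an uninformed model
```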
Take the partial derivative $\frac{\partial J}{\partial \theta}$.
The first factor $y^{(i)}$ is the label value, so it carries through directly; then apply the chain rule to the log terms (the constant $\frac{1}{m}$ is dropped below, since it can be absorbed into the learning rate):
$$\frac{\partial J}{\partial \theta}=-\sum_{i=1}^m\left[y^{(i)}\cdot\frac{1}{h_\theta(x^{(i)})}\cdot \frac{\partial h_\theta(x^{(i)})}{\partial\theta}+(1-y^{(i)})\cdot\frac{-1}{1-h_\theta(x^{(i)})}\cdot\frac{\partial h_\theta(x^{(i)})}{\partial\theta}\right]$$
$$=-\sum_{i=1}^m\left[\frac{y^{(i)}}{h_\theta(x^{(i)})}-\frac{1-y^{(i)}}{1-h_\theta(x^{(i)})}\right]\cdot\frac{\partial h_\theta(x^{(i)})}{\partial\theta}$$
By the sigmoid derivative property $y'=y(1-y)$, combined with the model function $h_\theta(x) = \frac{1}{1+e^{-\boldsymbol{\theta}^T\boldsymbol{x}}}$, we obtain:
$$\frac{\partial h_\theta(x^{(i)})}{\partial\theta}=h_\theta(x^{(i)})\cdot\big[1-h_\theta(x^{(i)})\big]\cdot x^{(i)}$$
Note that the partial derivative here is taken with $\theta$ as the variable, so the trailing factor $x^{(i)}$ must not be forgotten.
Plugging this result into the derivative above and combining over a common denominator gives:
$$\frac{\partial J}{\partial\theta}=\sum_{i=1}^m\big(h_\theta(x^{(i)})-y^{(i)}\big)\cdot x^{(i)}$$
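A quick way to trust this closed-form gradient is to compare it against finite differences of $J$ on random data. A sketch of my own (all names and data are illustrative; it uses the un-averaged sum form to match the formula above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def J(theta, X, y):
    h = np.clip(sigmoid(X @ theta), 1e-12, 1 - 1e-12)
    return -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

def grad(theta, X, y):
    return X.T @ (sigmoid(X @ theta) - y)   # sum_i (h - y) * x

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = (rng.random(50) < 0.5).astype(float)
theta = rng.normal(size=3)

# Central finite difference along each coordinate of theta.
eps = 1e-6
numeric = np.array([
    (J(theta + eps * e, X, y) - J(theta - eps * e, X, y)) / (2 * eps)
    for e in np.eye(3)
])
assert np.allclose(numeric, grad(theta, X, y), atol=1e-4)
```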
This gradient has the same form as the result for linear regression, whose gradient is:
$$\frac{\partial J}{\partial\theta}=\frac{1}{m}\cdot\sum_{i=1}^m\big(h_\theta(x^{(i)})-y^{(i)}\big)\cdot x^{(i)}$$
2. Multi-Class Case
- one vs all (one-vs-rest):
  model 1: split off the triangle class; the remaining two classes are treated as one
  model 2: …
  model 3: …
  Train 3 models in total.
  Advantage: only K models are needed.
  Disadvantage: the samples are imbalanced; each model mostly optimizes accuracy on the majority class, so the triangle class is trained poorly because its gradient contribution is small.
- one vs one:
  Train a model for every pair of classes.
  Advantage: each training run uses less data (only two classes at a time), so there is no imbalance problem.
  Disadvantage: far too many training runs: $C_K^2$ models.
- Cross entropy, for $k$ classes (a minimal sketch follows this list):
  $$Loss = -\sum_{j=1}^m\sum_{i=1}^k y_i^{(j)}\log\big(P_i^{(j)}\big)$$
  where $y_i^{(j)}$ is the one-hot label and $P_i^{(j)}$ the predicted probability of class $i$ for sample $j$.
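A minimal sketch of this $k$-class cross-entropy, assuming the probabilities $P$ come from a softmax over some scores (the function names and example logits are mine):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(P, y_onehot, eps=1e-12):
    # Loss = -sum_j sum_i y_i^(j) * log(P_i^(j))
    return -np.sum(y_onehot * np.log(np.clip(P, eps, 1.0)))

# Tiny example: m = 2 samples, k = 3 classes, made-up logits.
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2,  3.0]])
y_onehot = np.array([[1, 0, 0],
                     [0, 0, 1]], dtype=float)
print(cross_entropy(softmax(logits), y_onehot))
```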
3. Implementation from Scratch (python)
```python
import numpy as np
import matplotlib.pyplot as plt
import random

# Create the dataset: two Gaussian blobs, 100 negatives and 200 positives
data_0 = np.random.multivariate_normal(mean=[3,4], cov=[[3,0],[0,1]], size=100)
data_1 = np.random.multivariate_normal(mean=[7,6], cov=[[3,0],[0,2]], size=200)
y_0 = np.array([0]*100)
y_1 = np.array([1]*200)
data_x = np.vstack((data_0, data_1))
data_y = np.hstack((y_0, y_1))
plt.scatter(data_x[:,0], data_x[:,1], c=data_y)
plt.show()

# Shuffle the samples
data = list(zip(data_x, data_y))  # zip pairs each feature row with its label
random.shuffle(data)
data_x, data_y = zip(*data)
data_x = np.array(data_x)
data_y = np.array(data_y)

# Accuracy on the training set
def acc(y_pred):
    y_predict = np.around(y_pred)  # rounding trick: the default threshold is 0.5
    ## data_y: (300,)  y_predict: (300,1)
    correct_rate = np.mean(np.equal(y_predict, data_y[:,np.newaxis]))  # element-wise comparison
    return correct_rate

# sigmoid
def sigmoid(x):
    return np.array(1/(1+np.exp(-x)))

# Forward pass of the model
def predict(w, b):
    # data_x: (300,2) -- pay close attention to the shapes
    # w: (1,2)
    # b: (1,1)
    hidden = np.dot(data_x, w.T) + b
    return sigmoid(hidden)

# Backward pass: gradients of the loss
def gradients(y_pred):
    # data_y: (300,) -> (300,1) via data_y[:,np.newaxis]
    # y_pred: (300,1)
    # data_x: (300,2)
    grad_w = np.sum((y_pred - data_y[:,np.newaxis]) * data_x, axis=0, keepdims=True)
    grad_b = np.sum((y_pred - data_y[:,np.newaxis]), axis=0, keepdims=True)
    assert grad_w.shape == (1,2)
    assert grad_b.shape == (1,1)
    return grad_w, grad_b

learning_rate = 1e-2  # too large and exp overflows; too small and training stalls
iterations = 1000
w = np.random.rand(1,2)
b = np.random.rand(1,1)

# Visualization
from IPython import display
for i in range(iterations):
    y_pred = predict(w, b)
    grad_w, grad_b = gradients(y_pred)
    w = w - learning_rate * grad_w
    b = b - learning_rate * grad_b
    if i % 10 == 0:
        w1 = w[0][0]
        w2 = w[0][1]
        b1 = b[0][0]  # read out the scalar without overwriting the parameter b
        line_x = np.linspace(-10, 20, 100)
        # decision boundary: theta^T x = 0, i.e. w1*x1 + w2*x2 + b = 0
        line_y = (-b1 - w1*line_x) / w2
        display.clear_output(wait=True)  # clear the previous frame before redrawing
        plt.plot(line_x, line_y)
        plt.xlim((-5, 15))  # fix the axes so the plot doesn't jump around
        plt.ylim((0, 10))
        plt.scatter(data_x[:,0], data_x[:,1], c=data_y)
        plt.title(f'iteration: {i}, acc: {round(acc(y_pred),3)}')
        plt.pause(0.1)
plt.show()
```
Boundary-plotting formula: the decision boundary is where $h_\theta(x)=0.5$, i.e. $w_1x_1+w_2x_2+b=0$, which gives
$$y= \frac{-b-w_1 x}{w_2}$$
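As an optional cross-check (assuming scikit-learn is installed, and continuing from the script above), the learned parameters can be compared against its `LogisticRegression` fit on the same data:

```python
from sklearn.linear_model import LogisticRegression

# The fitted coefficients should roughly match w and b from the loop above
# (up to optimizer and regularization differences).
clf = LogisticRegression().fit(data_x, data_y)
print(clf.coef_, clf.intercept_)
```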