Logistic Regression

一个聪明的女人

Category: Machine Learning. Tags: LR, logistic regression implementation

Article link: https://blog.csdn.net/xiaomuworld/article/details/51957759

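Summary (generated automatically by CSDN): Logistic regression is a widely used classification algorithm. It adds a sigmoid function on top of linear regression to solve 0-1 binary classification problems, is optimized with a loss function and gradient descent, and usually handles multi-class problems with a one-vs-rest strategy. This post follows teacher Han's course and covers the principles of logistic regression and a Python implementation.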

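Logistic regression (LR) adds a sigmoid mapping on top of linear regression. It is one of the most widely used classification algorithms in industry.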

Linear regression

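The linear regression model is:

$h_{\theta}(x) = \theta^{T}x$

The loss function is:

$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\frac{1}{2}(h_\theta(x^{(i)})-y^{(i)})^2$

It measures the gap between the output $h_{\theta}(x)$ and the true value, and it is minimized with gradient descent (GD). For the linear model this loss is convex. If the same squared loss is applied to a sigmoid output $g(\theta^{T}x)$, however, $J(\theta)$ becomes non-convex, and gradient descent can easily end up in a local optimum.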
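As a small illustration of the squared-error loss, the sketch below evaluates $J(\theta)$ for a plain linear hypothesis on a tiny made-up dataset. The function names and the toy data are assumptions introduced here; they are not part of the original notes.

```python
import numpy as np

def linear_hypothesis(theta, X):
    """Linear model h_theta(x) = theta^T x, applied to every row of X."""
    return X @ theta

def squared_error_cost(theta, X, y):
    """J(theta) = (1/m) * sum_i 0.5 * (h_theta(x_i) - y_i)^2."""
    m = y.size
    residual = linear_hypothesis(theta, X) - y
    return float(residual @ residual) / (2.0 * m)

# Toy data (made up for illustration): y = 1 + 2*x exactly.
x = np.array([0.0, 1.0, 2.0, 3.0])
X_toy = np.c_[np.ones_like(x), x]   # prepend the intercept column
y_toy = 1.0 + 2.0 * x

cost_at_truth = squared_error_cost(np.array([1.0, 2.0]), X_toy, y_toy)
cost_at_zero = squared_error_cost(np.zeros(2), X_toy, y_toy)
print(cost_at_truth, cost_at_zero)  # 0.0 10.5
```

The cost is zero at the parameters that generated the data and grows as the predictions move away from the targets.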

Definition of LR

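LR adds a sigmoid function on top of linear regression, turning the problem into a 0-1 binary classification problem. The sigmoid function equals 0.5 at 0, approaches 1 for large positive inputs, and approaches 0 for large negative inputs.

$h_{\theta}(x) = g(\theta^{T}x)$

$g(z)=\frac{1}{1+e^{-z}}$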
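The properties of the sigmoid listed above can be checked numerically. The sketch below does not use the formula directly; it uses the equivalent identity $g(z) = \frac{1}{2}(1 + \tanh(z/2))$, a swapped-in formulation chosen here because it avoids overflow warnings for very negative inputs. The name `sigmoid_tanh` is hypothetical.

```python
import numpy as np

def sigmoid_tanh(z):
    """Sigmoid g(z) = 1 / (1 + exp(-z)), evaluated via the tanh identity."""
    z = np.asarray(z, dtype=float)
    return 0.5 * (1.0 + np.tanh(0.5 * z))

values = sigmoid_tanh(np.array([-1000.0, -2.0, 0.0, 2.0, 1000.0]))
print(values)

# g(0) = 0.5, and g(-z) + g(z) = 1 for every z
midpoint = values[2]
symmetry_sum = values[1] + values[3]
```

Because the output always lies between 0 and 1, it can be read as the predicted probability of the positive class.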

Loss function

To turn the optimization into a convex problem, LR uses the cross-entropy from statistics, which measures the gap between the predicted and the actual outcomes. The LR loss function is defined as:

$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\big[-y^{(i)}\log(h_\theta(x^{(i)}))-(1-y^{(i)})\log(1-h_\theta(x^{(i)}))\big]$

Regularization adds a penalty on the weights, with $\lambda$ balancing fit against model complexity. The sum starts at $j=1$, so the intercept $\theta_0$ is not penalized:

$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\big[-y^{(i)}\log(h_\theta(x^{(i)}))-(1-y^{(i)})\log(1-h_\theta(x^{(i)}))\big] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_{j}^{2}$

The vectorized loss function:

$J(\theta) = -\frac{1}{m}\big(\log(g(X\theta))^{T}y+\log(1-g(X\theta))^{T}(1-y)\big) + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_{j}^{2}$
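To make the relation between the summed and the vectorized form concrete, the sketch below implements both and compares them on random data. The function names, the random data, and the choice $\lambda = 1$ are assumptions for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_loop(theta, X, y, lam):
    """Regularized cross-entropy, written as the explicit sum over samples."""
    m = y.size
    total = 0.0
    for x_i, y_i in zip(X, y):
        h_i = sigmoid(x_i @ theta)
        total += -y_i * np.log(h_i) - (1.0 - y_i) * np.log(1.0 - h_i)
    penalty = lam / (2.0 * m) * np.sum(theta[1:] ** 2)  # theta_0 is not penalized
    return total / m + penalty

def cost_vectorized(theta, X, y, lam):
    """Same loss using matrix-vector products; note the leading minus sign."""
    m = y.size
    h = sigmoid(X @ theta)
    data_term = -(np.log(h) @ y + np.log(1.0 - h) @ (1.0 - y)) / m
    penalty = lam / (2.0 * m) * np.sum(theta[1:] ** 2)
    return data_term + penalty

# Random demo data (hypothetical, not the post's data files)
rng = np.random.default_rng(0)
X_demo = np.c_[np.ones(20), rng.normal(size=(20, 2))]
y_demo = (rng.random(20) < 0.5).astype(float)
theta_demo = np.array([0.1, -0.3, 0.5])

loop_value = cost_loop(theta_demo, X_demo, y_demo, lam=1.0)
vec_value = cost_vectorized(theta_demo, X_demo, y_demo, lam=1.0)
cost_at_zero = cost_vectorized(np.zeros(3), X_demo, y_demo, lam=1.0)
print(loop_value, vec_value, cost_at_zero)
```

At $\theta = 0$ every prediction is 0.5, so the loss equals $\log 2 \approx 0.693$ whatever the labels are. This is a handy sanity check for an implementation.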

Optimization

Gradient descent (GD) is used to minimize the loss function: start from a randomly chosen $\theta$ and compute the partial derivatives. For $j \geq 1$:

$\frac{\partial J(\theta)}{\partial\theta_{j}} = \frac{1}{m}\sum_{i=1}^{m} ( h_\theta (x^{(i)})-y^{(i)})x^{(i)}_{j} + \frac{\lambda}{m}\theta_{j}$

For $j = 0$ the regularization term is absent, because $\theta_0$ is not penalized.
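For readers who want the missing step, the derivative can be recovered with the identity $g'(z) = g(z)(1-g(z))$. The following is a sketch for a single sample $(x, y)$ with $h = g(\theta^{T}x)$; the shorthand $\ell$ for the per-sample loss is introduced here only for illustration.

$\ell(\theta) = -y\log h - (1-y)\log(1-h)$

$\frac{\partial \ell}{\partial\theta_j} = \Big(-\frac{y}{h} + \frac{1-y}{1-h}\Big)\,h(1-h)\,x_j = \big(-y(1-h) + (1-y)h\big)\,x_j = (h - y)\,x_j$

Averaging over the $m$ samples gives the first term of the gradient, and differentiating the penalty $\frac{\lambda}{2m}\theta_j^2$ gives $\frac{\lambda}{m}\theta_j$.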
The vectorized gradient:

$\frac{\partial J(\theta)}{\partial\theta} = \frac{1}{m} X^T(g(X\theta)-y) + \frac{\lambda}{m}\theta$

where the first component of the regularization term (the one for $\theta_0$) is set to 0.
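A common way to gain confidence in a gradient formula like this one is to compare it with central finite differences of the loss. The sketch below does that on random data. All names, the data, and the value $\lambda = 2$ are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y, lam):
    """Regularized cross-entropy loss (theta_0 is not penalized)."""
    m = y.size
    h = sigmoid(X @ theta)
    data_term = -(np.log(h) @ y + np.log(1.0 - h) @ (1.0 - y)) / m
    return data_term + lam / (2.0 * m) * np.sum(theta[1:] ** 2)

def gradient(theta, X, y, lam):
    """(1/m) X^T (g(X theta) - y) + (lam/m) theta, with no penalty on theta_0."""
    m = y.size
    grad = X.T @ (sigmoid(X @ theta) - y) / m
    reg = lam / m * theta
    reg[0] = 0.0
    return grad + reg

def numerical_gradient(f, theta, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    approx = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        approx[j] = (f(theta + step) - f(theta - step)) / (2.0 * eps)
    return approx

# Random demo data (hypothetical)
rng = np.random.default_rng(1)
X_demo = np.c_[np.ones(30), rng.normal(size=(30, 3))]
y_demo = (rng.random(30) < 0.5).astype(float)
theta_demo = rng.normal(scale=0.5, size=4)

analytic = gradient(theta_demo, X_demo, y_demo, lam=2.0)
numeric = numerical_gradient(lambda t: cost(t, X_demo, y_demo, 2.0), theta_demo)
max_abs_diff = np.max(np.abs(analytic - numeric))
print(max_abs_diff)
```

If the analytic and numerical gradients disagree by more than a small tolerance, the usual suspects are a sign error in the loss or a penalty accidentally applied to $\theta_0$.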
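The gradient descent update for $\theta$:

$\theta_j:=\theta_j-\alpha\frac{\partial}{\partial\theta_j}J(\theta)$

$\alpha$ is the step size (learning rate). It is usually tuned by trying a set of exponentially increasing values, such as 0.01, 0.1, …. If the step size is too large, the iterates oscillate and fail to converge.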
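The update rule can be sketched as a short loop. The example below runs plain batch gradient descent (without regularization) for a few step sizes on synthetic data and records the loss after every step. The data, the step sizes, and the iteration count are assumptions chosen for the demo, not values from the course.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """Unregularized cross-entropy loss."""
    h = sigmoid(X @ theta)
    return -(np.log(h) @ y + np.log(1.0 - h) @ (1.0 - y)) / y.size

def gradient_descent(X, y, alpha, n_iters=500):
    """Repeat theta := theta - alpha * dJ/dtheta, recording the cost after each step."""
    theta = np.zeros(X.shape[1])
    history = []
    for _ in range(n_iters):
        grad = X.T @ (sigmoid(X @ theta) - y) / y.size
        theta = theta - alpha * grad
        history.append(cost(theta, X, y))
    return theta, np.array(history)

# Synthetic, noisy data (an assumption for the demo; not the post's data1.txt)
rng = np.random.default_rng(0)
X_demo = np.c_[np.ones(200), rng.normal(size=(200, 2))]
true_theta = np.array([-0.5, 2.0, -1.0])
y_demo = (rng.random(200) < sigmoid(X_demo @ true_theta)).astype(float)

initial_cost = cost(np.zeros(3), X_demo, y_demo)  # log(2) at theta = 0
histories = {}
for alpha in (0.01, 0.1, 1.0):
    _, histories[alpha] = gradient_descent(X_demo, y_demo, alpha)
    print(alpha, histories[alpha][-1])
```

On roughly standardized features like these, all three step sizes should decrease the loss steadily. On unscaled features such as raw exam scores, a much smaller step or feature scaling is typically needed, which is one reason the post's implementation hands the optimization to `scipy.optimize.minimize` instead.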

Multi-class classification

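The one-vs-rest method is commonly used: it converts a multi-class problem into a set of binary classification problems, one per class.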
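A minimal sketch of one-vs-rest, assuming three well-separated synthetic classes: one binary logistic regression is trained per class (that class against all others), and a sample is assigned to the class whose classifier gives the highest probability. The helper names, the data, and the training settings are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_binary(X, y, alpha=0.1, n_iters=2000):
    """Plain batch gradient descent for one binary logistic regression."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        theta -= alpha * X.T @ (sigmoid(X @ theta) - y) / y.size
    return theta

def fit_one_vs_rest(X, labels):
    """Train one 'class c vs. everything else' classifier per class."""
    classes = np.unique(labels)
    thetas = np.vstack([fit_binary(X, (labels == c).astype(float)) for c in classes])
    return classes, thetas

def predict_one_vs_rest(classes, thetas, X):
    """Pick the class whose binary classifier is most confident."""
    scores = sigmoid(X @ thetas.T)  # shape (n_samples, n_classes)
    return classes[np.argmax(scores, axis=1)]

# Three synthetic Gaussian blobs (made up for illustration)
rng = np.random.default_rng(0)
centers = np.array([[0.0, 4.0], [-4.0, -3.0], [4.0, -3.0]])
points = np.vstack([c + 0.5 * rng.normal(size=(40, 2)) for c in centers])
labels_demo = np.repeat(np.array([0, 1, 2]), 40)
X_demo = np.c_[np.ones(len(points)), points]

classes, thetas = fit_one_vs_rest(X_demo, labels_demo)
predicted = predict_one_vs_rest(classes, thetas, X_demo)
train_accuracy = np.mean(predicted == labels_demo)
print(train_accuracy)
```

With $K$ classes this trains $K$ independent binary models, so the binary loss and gradient from the previous sections are reused unchanged.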

Python implementation

# %load ../../standard_import.txt
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt

from scipy.optimize import minimize

from sklearn.preprocessing import PolynomialFeatures

pd.set_option('display.notebook_repr_html', False)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 150)
pd.set_option('display.max_seq_items', None)

#%config InlineBackend.figure_formats = {'pdf',}
%matplotlib inline

import seaborn as sns
sns.set_context('notebook')
sns.set_style('white')

def loaddata(file, delimiter):
    # Load a delimited numeric text file, then print its shape and a few rows
    data = np.loadtxt(file, delimiter=delimiter)
    print('Dimensions: ', data.shape)
    print(data[1:6,:])
    return data
def plotData(data, label_x, label_y, label_pos, label_neg, axes=None):
    # Boolean masks marking which rows are negative and which are positive samples
    neg = data[:,2] == 0
    pos = data[:,2] == 1

    if axes is None:
        axes = plt.gca()
    axes.scatter(data[pos][:,0], data[pos][:,1], marker='+', c='k', s=60, linewidth=2, label=label_pos)
    axes.scatter(data[neg][:,0], data[neg][:,1], c='y', s=60, label=label_neg)
    axes.set_xlabel(label_x)
    axes.set_ylabel(label_y)
    axes.legend(frameon=True, fancybox=True)
data = loaddata('data1.txt', ',')
X = np.c_[np.ones((data.shape[0],1)), data[:,0:2]]
y = np.c_[data[:,2]]
plotData(data, 'Exam 1 score', 'Exam 2 score', 'Pass', 'Fail')
# Define the sigmoid function
def sigmoid(z):
    return(1 / (1 + np.exp(-z)))
# Define the loss function
def costFunction(theta, X, y):
    m = y.size
    h = sigmoid(X.dot(theta))

    J = -1.0*(1.0/m)*(np.log(h).T.dot(y)+np.log(1-h).T.dot(1-y))

    if np.isnan(J[0]):
        return(np.inf)
    return J[0]
# Compute the gradient
def gradient(theta, X, y):
    m = y.size
    h = sigmoid(X.dot(theta.reshape(-1,1)))

    grad =(1.0/m)*X.T.dot(h-y)

    return(grad.flatten())
initial_theta = np.zeros(X.shape[1])
cost = costFunction(initial_theta, X, y)
grad = gradient(initial_theta, X, y)
print('Cost: \n', cost)
print('Grad: \n', grad)
# Minimize the loss function
res = minimize(costFunction, initial_theta, args=(X,y), jac=gradient, options={'maxiter':400})
# Prediction
def predict(theta, X, threshold=0.5):
    p = sigmoid(X.dot(theta.T)) >= threshold
    return(p.astype('int'))
# Predicted probability of the positive class for exam scores 45 and 85
print(sigmoid(np.array([1, 45, 85]).dot(res.x.T)))
# Plot the decision boundary
plt.scatter(45, 85, s=60, c='r', marker='v', label='(45, 85)')
plotData(data, 'Exam 1 score', 'Exam 2 score', 'Admitted', 'Not admitted')
x1_min, x1_max = X[:,1].min(), X[:,1].max()
x2_min, x2_max = X[:,2].min(), X[:,2].max()
xx1, xx2 = np.meshgrid(np.linspace(x1_min, x1_max), np.linspace(x2_min, x2_max))
h = sigmoid(np.c_[np.ones((xx1.ravel().shape[0],1)), xx1.ravel(), xx2.ravel()].dot(res.x))
h = h.reshape(xx1.shape)
plt.contour(xx1, xx2, h, [0.5], linewidths=1, colors='b')

# LR with regularization from here on
data2 = loaddata('data2.txt', ',')
# Extract X and y
y = np.c_[data2[:,2]]
X = data2[:,0:2]
# Plot the data
plotData(data2, 'Microchip Test 1', 'Microchip Test 2', 'y = 1', 'y = 0')
# Polynomial features
poly = PolynomialFeatures(6)
XX = poly.fit_transform(data2[:,0:2])
# Check the shape (how many dimensions x has after the feature mapping)
print(XX.shape)
# Define the regularized loss function
def costFunctionReg(theta, reg, XX, y):
    # Use the XX and y passed in by minimize() rather than the globals
    m = y.size
    h = sigmoid(XX.dot(theta))

    J = -1.0*(1.0/m)*(np.log(h).T.dot(y)+np.log(1-h).T.dot(1-y)) + (reg/(2.0*m))*np.sum(np.square(theta[1:]))

    if np.isnan(J[0]):
        return(np.inf)
    return(J[0])
def gradientReg(theta, reg, XX, y): # Note: the intercept theta_0, which we added ourselves, is not regularized
    m = y.size
    h = sigmoid(XX.dot(theta.reshape(-1,1)))

    grad = (1.0/m)*XX.T.dot(h-y) + (reg/m)*np.r_[[[0]],theta[1:].reshape(-1,1)]

    return(grad.flatten())
initial_theta = np.zeros(XX.shape[1])
costFunctionReg(initial_theta, 1, XX, y)
fig, axes = plt.subplots(1,3, sharey = True, figsize=(17,5))

# Decision boundaries: see what happens when the regularization coefficient lambda is too small or too large
# Lambda = 0 : no regularization, so the model overfits
# Lambda = 1 : a reasonable setting
# Lambda = 100 : the penalty is too aggressive, so the decision boundary is barely fitted

for i, C in enumerate([0.0, 1.0, 100.0]):
    # Minimize costFunctionReg
    res2 = minimize(costFunctionReg, initial_theta, args=(C, XX, y), jac=gradientReg, options={'maxiter':3000})

    # Training accuracy
    accuracy = 100.0*sum(predict(res2.x, XX) == y.ravel())/y.size

    # Scatter plot of X, y
    plotData(data2, 'Microchip Test 1', 'Microchip Test 2', 'y = 1', 'y = 0', axes.flatten()[i])

    # Draw the decision boundary (reuse the already fitted polynomial mapping)
    x1_min, x1_max = X[:,0].min(), X[:,0].max()
    x2_min, x2_max = X[:,1].min(), X[:,1].max()
    xx1, xx2 = np.meshgrid(np.linspace(x1_min, x1_max), np.linspace(x2_min, x2_max))
    h = sigmoid(poly.transform(np.c_[xx1.ravel(), xx2.ravel()]).dot(res2.x))
    h = h.reshape(xx1.shape)
    axes.flatten()[i].contour(xx1, xx2, h, [0.5], linewidths=1, colors='g')
    axes.flatten()[i].set_title('Train accuracy {}% with Lambda = {}'.format(np.round(accuracy, decimals=2), C))

These are my notes summarizing the course taught by teacher Han, whom I greatly admire.
