Logistic Regression

【Introduction】

Logistic regression solves classification problems. The canonical setting is binary classification, which can be generalized to multi-class. In linear regression the output y is continuous; passing it through the nonlinear sigmoid function constrains the value to the interval (0, 1). By setting a threshold and comparing the output against it, an input can be assigned to one of two classes.

Sigmoid function

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

Curve

(Figure: the S-shaped sigmoid curve, increasing monotonically from 0 to 1.)
Properties

$$\sigma(-z) = 1 - \sigma(z)$$

$$\sigma'(z) = \sigma(z)\,\bigl(1 - \sigma(z)\bigr)$$

$$0 < \sigma(z) < 1$$

Since its output lies in (0, 1), $\sigma(z)$ can be interpreted as a probability.

Taking 0.5 as the threshold for binary classification: when $\sigma(z) \ge 0.5$ the input is assigned to one class, and when $\sigma(z) < 0.5$ to the other.
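The symmetry and derivative properties above are easy to check numerically; a minimal sketch (the test point z = 2.0 is arbitrary):

```python
import numpy as np

def sigmoid(z):
    #logistic function: maps any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

z = 2.0                     #arbitrary test point
s = sigmoid(z)
#symmetry: sigmoid(-z) == 1 - sigmoid(z)
print(sigmoid(-z), 1 - s)
#derivative: sigmoid'(z) == sigmoid(z) * (1 - sigmoid(z)), checked by central differences
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
print(numeric, s * (1 - s))
```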

【Model】

Let y = 1 for one class and y = 0 for the other. Then

$$P(y = 1 \mid x; \theta) = h_\theta(x) = \sigma(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}, \qquad P(y = 0 \mid x; \theta) = 1 - h_\theta(x)$$

which can be written compactly as

$$P(y \mid x; \theta) = h_\theta(x)^{y}\,\bigl(1 - h_\theta(x)\bigr)^{1 - y}$$
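As a sketch of the combined form (θ and x here are made-up numbers, with x[0] = 1 playing the role of the bias term):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

#made-up parameters and one sample; x[0] = 1 is the bias term
theta = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 3.0, 1.5])

h = sigmoid(theta @ x)                                #h = P(y = 1 | x; theta)
probs = {y: h**y * (1 - h)**(1 - y) for y in (0, 1)}  #combined form covers both cases
print(probs)
```

For y = 1 the combined expression reduces to h, and for y = 0 to 1 - h, so the two probabilities always sum to one.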

【Loss function】

For m independent samples, the likelihood function is

$$L(\theta) = \prod_{i=1}^{m} h_\theta(x^{(i)})^{\,y^{(i)}} \bigl(1 - h_\theta(x^{(i)})\bigr)^{1 - y^{(i)}}$$

The log-likelihood is:

$$\ell(\theta) = \sum_{i=1}^{m} \Bigl[ y^{(i)} \log h_\theta(x^{(i)}) + \bigl(1 - y^{(i)}\bigr) \log\bigl(1 - h_\theta(x^{(i)})\bigr) \Bigr]$$

To find the maximum of $\ell(\theta)$, use gradient ascent.
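A minimal numerical sketch of the log-likelihood, using made-up toy data (the first column of X is the bias term of ones):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

#made-up toy data: m = 4 samples, first column is the bias term
X = np.array([[1., 0., 1.],
              [1., 2., 0.],
              [1., 1., 1.],
              [1., 3., 2.]])
y = np.array([0., 1., 0., 1.])
theta = np.array([0.1, 0.2, -0.3])

h = sigmoid(X @ theta)
#log-likelihood: sum over samples of y*log(h) + (1-y)*log(1-h)
ll = np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
print(ll)
```

Each term is a log-probability, so $\ell(\theta) \le 0$; gradient ascent pushes it toward 0.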

Loss function:

$$J(\theta) = -\ell(\theta)$$

【Solution】

Gradient ascent

Gradient

$$\frac{\partial \ell(\theta)}{\partial \theta_j} = \sum_{i=1}^{m} \bigl(y^{(i)} - h_\theta(x^{(i)})\bigr)\, x_j^{(i)}$$

Matrix form (X is the m × n design matrix, y the label vector, h = σ(Xθ)):

$$\nabla_\theta \ell(\theta) = X^T (y - h), \qquad \theta \leftarrow \theta + \alpha \, X^T (y - h)$$
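The matrix-form gradient can be verified against central differences of the log-likelihood; a sketch with made-up toy data:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(theta, X, y):
    h = sigmoid(X @ theta)
    return np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

#made-up toy data, bias column of ones included in X
X = np.array([[1., 0., 1.],
              [1., 2., 0.],
              [1., 1., 1.],
              [1., 3., 2.]])
y = np.array([0., 1., 0., 1.])
theta = np.array([0.1, 0.2, -0.3])

#analytic gradient in matrix form: X^T (y - h)
grad = X.T @ (y - sigmoid(X @ theta))

#central-difference check of each component
eps = 1e-6
numeric = np.array([
    (log_likelihood(theta + eps * np.eye(3)[j], X, y)
     - log_likelihood(theta - eps * np.eye(3)[j], X, y)) / (2 * eps)
    for j in range(3)])
print(grad)
print(numeric)
```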

Simulation

Code:

# -*- coding: utf-8 -*-
"""
logistic regression
Date:2017/9/18
@author: xulu
"""
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def sigmoid(x):
    return 1.0/(1+np.exp(-x))

def loadDataSet():
    #testSet.txt: tab-separated, two feature columns (0, 1) and a 0/1 label column (2)
    fil=pd.read_csv("testSet.txt",encoding="utf-8",header=None,delimiter='\t')
    dataMat=fil.loc[:,0:1]
    labelMat=fil[2]
    negSampleMat=fil[fil[2]==0].loc[:,0:1]
    posSampleMat=fil[fil[2]==1].loc[:,0:1]
    return dataMat,labelMat,negSampleMat,posSampleMat

def plotDataSet(negSampleMat,posSampleMat):
    plt.figure()
    #fil.plot(x=0,y=1,kind='scatter')#DataFrame's plot
    plt.scatter(negSampleMat[0],negSampleMat[1],c='g',marker='o')#g--0
    plt.scatter(posSampleMat[0],posSampleMat[1],c='r',marker='s')#r--1
    plt.show()

def handleDataMat(dataMat):
    #prepend a column of ones so weights[0] acts as the bias (intercept) term
    dataMat['a']=1
    a=dataMat.pop('a')
    dataMat.insert(0,'a',a)
    return dataMat

def gradient(dataMatrix,labelMat,weights):
    h = sigmoid(dataMatrix*weights)     #matrix mult
    error = (labelMat - h)              #(y - h) term of the gradient
    return dataMatrix.transpose()* error

def gradAscent(dataMatrix,labelMat,weights,iters,alpha):
    #batch gradient ascent: take iters full-gradient steps of size alpha
    for _ in range(iters):
        grad=gradient(dataMatrix,labelMat,weights)             
        weights = weights + alpha * grad
    return weights

def params_init(param_nums):
    alpha = 0.001
    iters = 500
    weights = np.ones((param_nums,1))
    return alpha,iters,weights

def train(dataMatIn, classLabels):
    handleDataMat(dataMatIn)
    dataMatrix = np.mat(dataMatIn)             #convert to NumPy matrix
    labelMat = np.mat(classLabels).transpose() 
    m,n = np.shape(dataMatrix)
    alpha,iters,weights=params_init(n)
    weights=gradAscent(dataMatrix, labelMat,weights,iters,alpha)
    return weights.getA()

def plotBestFit(negSampleMat,posSampleMat,weights):
    plt.figure()
    plt.scatter(negSampleMat[0],negSampleMat[1],c='g',marker='o')#g--0
    plt.scatter(posSampleMat[0],posSampleMat[1],c='r',marker='s')#r--1
    x = np.arange(-3.0, 3.0, 0.1)
    y = (-weights[0]-weights[1]*x)/weights[2]   #decision boundary: w0 + w1*x + w2*y = 0
    plt.plot(x, y)
    plt.xlabel('X1'); plt.ylabel('X2')
    plt.show()

def predict(x,weights):
    #classify by comparing sigmoid(x*weights) against the 0.5 threshold
    if sigmoid(x*weights)>0.5:
        return 1
    else:
        return 0

if __name__=='__main__':
    dataMat,labelMat,negSampleMat,posSampleMat=loadDataSet()
    plotDataSet(negSampleMat,posSampleMat)
    
    weights=train(dataMat,labelMat)
    plotBestFit(negSampleMat,posSampleMat,weights)
    
    x=np.matrix([1,0,0])#predict data (0,0)
    print(predict(x,weights))
    x=np.matrix([1,1,8])#predict data (1,8)
    print(predict(x,weights))

Result:

(Figures: the scatter plot of the data set and the fitted decision boundary; the two predicted labels are printed to the console.)