Logistic Regression

【Introduction】

Logistic regression solves classification problems. The canonical setting is binary classification, which can be generalized to multi-class. In linear regression the output y is continuous; passing it through the nonlinear sigmoid function constrains the value to the interval (0, 1). By setting a threshold and comparing the output against it, an input can be assigned to one of two classes.

Sigmoid function

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

Curve

(Figure: the S-shaped sigmoid curve, increasing monotonically from 0 to 1.)
Properties

$$\sigma(-z) = 1 - \sigma(z)$$

$$\sigma'(z) = \sigma(z)\,\bigl(1 - \sigma(z)\bigr)$$

$$0 < \sigma(z) < 1$$

Since its output lies in (0, 1), $\sigma(z)$ can be interpreted as a probability.

Taking 0.5 as the threshold for binary classification: when $\sigma(z) \ge 0.5$ the input is assigned to one class, and when $\sigma(z) < 0.5$ to the other.
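The symmetry and derivative properties above are easy to check numerically; a minimal sketch (the test point z = 2.0 is arbitrary):

```python
import numpy as np

def sigmoid(z):
    #logistic function: maps any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

z = 2.0                     #arbitrary test point
s = sigmoid(z)
#symmetry: sigmoid(-z) == 1 - sigmoid(z)
print(sigmoid(-z), 1 - s)
#derivative: sigmoid'(z) == sigmoid(z) * (1 - sigmoid(z)), checked by central differences
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
print(numeric, s * (1 - s))
```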

【Model】

Let y = 1 for one class and y = 0 for the other. Then

$$P(y = 1 \mid x; \theta) = h_\theta(x) = \sigma(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}, \qquad P(y = 0 \mid x; \theta) = 1 - h_\theta(x)$$

which can be written compactly as

$$P(y \mid x; \theta) = h_\theta(x)^{y}\,\bigl(1 - h_\theta(x)\bigr)^{1 - y}$$
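As a sketch of the combined form (θ and x here are made-up numbers, with x[0] = 1 playing the role of the bias term):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

#made-up parameters and one sample; x[0] = 1 is the bias term
theta = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 3.0, 1.5])

h = sigmoid(theta @ x)                                #h = P(y = 1 | x; theta)
probs = {y: h**y * (1 - h)**(1 - y) for y in (0, 1)}  #combined form covers both cases
print(probs)
```

For y = 1 the combined expression reduces to h, and for y = 0 to 1 - h, so the two probabilities always sum to one.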

【Loss function】

For m independent samples, the likelihood function is

$$L(\theta) = \prod_{i=1}^{m} h_\theta(x^{(i)})^{\,y^{(i)}} \bigl(1 - h_\theta(x^{(i)})\bigr)^{1 - y^{(i)}}$$

The log-likelihood is:

$$\ell(\theta) = \sum_{i=1}^{m} \Bigl[ y^{(i)} \log h_\theta(x^{(i)}) + \bigl(1 - y^{(i)}\bigr) \log\bigl(1 - h_\theta(x^{(i)})\bigr) \Bigr]$$

To find the maximum of $\ell(\theta)$, use gradient ascent.
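A minimal numerical sketch of the log-likelihood, using made-up toy data (the first column of X is the bias term of ones):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

#made-up toy data: m = 4 samples, first column is the bias term
X = np.array([[1., 0., 1.],
              [1., 2., 0.],
              [1., 1., 1.],
              [1., 3., 2.]])
y = np.array([0., 1., 0., 1.])
theta = np.array([0.1, 0.2, -0.3])

h = sigmoid(X @ theta)
#log-likelihood: sum over samples of y*log(h) + (1-y)*log(1-h)
ll = np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
print(ll)
```

Each term is a log-probability, so $\ell(\theta) \le 0$; gradient ascent pushes it toward 0.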

Loss function:

$$J(\theta) = -\ell(\theta)$$

【Solution】

Gradient ascent

Gradient

$$\frac{\partial \ell(\theta)}{\partial \theta_j} = \sum_{i=1}^{m} \bigl(y^{(i)} - h_\theta(x^{(i)})\bigr)\, x_j^{(i)}$$

Matrix form (X is the m × n design matrix, y the label vector, h = σ(Xθ)):

$$\nabla_\theta \ell(\theta) = X^T (y - h), \qquad \theta \leftarrow \theta + \alpha \, X^T (y - h)$$
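The matrix-form gradient can be verified against central differences of the log-likelihood; a sketch with made-up toy data:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(theta, X, y):
    h = sigmoid(X @ theta)
    return np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

#made-up toy data, bias column of ones included in X
X = np.array([[1., 0., 1.],
              [1., 2., 0.],
              [1., 1., 1.],
              [1., 3., 2.]])
y = np.array([0., 1., 0., 1.])
theta = np.array([0.1, 0.2, -0.3])

#analytic gradient in matrix form: X^T (y - h)
grad = X.T @ (y - sigmoid(X @ theta))

#central-difference check of each component
eps = 1e-6
numeric = np.array([
    (log_likelihood(theta + eps * np.eye(3)[j], X, y)
     - log_likelihood(theta - eps * np.eye(3)[j], X, y)) / (2 * eps)
    for j in range(3)])
print(grad)
print(numeric)
```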

Simulation

Code:

# -*- coding: utf-8 -*-
"""
logistic regression
Date:2017/9/18
@author: xulu
"""
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def sigmoid(x):
    return 1.0/(1+np.exp(-x))

def loadDataSet():
    #testSet.txt: tab-separated, two feature columns (0, 1) and a 0/1 label column (2)
    fil=pd.read_csv("testSet.txt",encoding="utf-8",header=None,delimiter='\t')
    dataMat=fil.loc[:,0:1]
    labelMat=fil[2]
    negSampleMat=fil[fil[2]==0].loc[:,0:1]
    posSampleMat=fil[fil[2]==1].loc[:,0:1]
    return dataMat,labelMat,negSampleMat,posSampleMat

def plotDataSet(negSampleMat,posSampleMat):
    plt.figure()
    #fil.plot(x=0,y=1,kind='scatter')#DataFrame's plot
    plt.scatter(negSampleMat[0],negSampleMat[1],c='g',marker='o')#g--0
    plt.scatter(posSampleMat[0],posSampleMat[1],c='r',marker='s')#r--1
    plt.show()

def handleDataMat(dataMat):
    #prepend a column of ones so weights[0] acts as the bias (intercept) term
    dataMat['a']=1
    a=dataMat.pop('a')
    dataMat.insert(0,'a',a)
    return dataMat

def gradient(dataMatrix,labelMat,weights):
    h = sigmoid(dataMatrix*weights)     #matrix mult
    error = (labelMat - h)              #(y - h) term of the gradient
    return dataMatrix.transpose()* error

def gradAscent(dataMatrix,labelMat,weights,iters,alpha):
    #batch gradient ascent: take iters full-gradient steps of size alpha
    for _ in range(iters):
        grad=gradient(dataMatrix,labelMat,weights)             
        weights = weights + alpha * grad
    return weights

def params_init(param_nums):
    alpha = 0.001
    iters = 500
    weights = np.ones((param_nums,1))
    return alpha,iters,weights

def train(dataMatIn, classLabels):
    handleDataMat(dataMatIn)
    dataMatrix = np.mat(dataMatIn)             #convert to NumPy matrix
    labelMat = np.mat(classLabels).transpose() 
    m,n = np.shape(dataMatrix)
    alpha,iters,weights=params_init(n)
    weights=gradAscent(dataMatrix, labelMat,weights,iters,alpha)
    return weights.getA()

def plotBestFit(negSampleMat,posSampleMat,weights):
    plt.figure()
    plt.scatter(negSampleMat[0],negSampleMat[1],c='g',marker='o')#g--0
    plt.scatter(posSampleMat[0],posSampleMat[1],c='r',marker='s')#r--1
    x = np.arange(-3.0, 3.0, 0.1)
    y = (-weights[0]-weights[1]*x)/weights[2]   #decision boundary: w0 + w1*x + w2*y = 0
    plt.plot(x, y)
    plt.xlabel('X1'); plt.ylabel('X2')
    plt.show()

def predict(x,weights):
    #classify by comparing sigmoid(x*weights) against the 0.5 threshold
    if sigmoid(x*weights)>0.5:
        return 1
    else:
        return 0

if __name__=='__main__':
    dataMat,labelMat,negSampleMat,posSampleMat=loadDataSet()
    plotDataSet(negSampleMat,posSampleMat)
    
    weights=train(dataMat,labelMat)
    plotBestFit(negSampleMat,posSampleMat,weights)
    
    x=np.matrix([1,0,0])#predict data (0,0)
    print(predict(x,weights))
    x=np.matrix([1,1,8])#predict data (1,8)
    print(predict(x,weights))

Result:

(Figures: the scatter plot of the data set and the fitted decision boundary; the two predicted labels are printed to the console.)