MachineLearning: Logistic Regression (4) - Applying Logistic Regression to Handwritten Digit Recognition

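    Earlier articles covered logistic regression in detail, so here we give only a brief review, enough to understand how it is applied to handwritten digit recognition. Logistic regression is normally used for binary classification, but it can be extended to multi-class problems; for that extension, see my article "Machine Learning - Softmax Regression".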


(1) Review of logistic regression principles


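The main goal of logistic regression is to find the best decision boundary (the blue line in the figure). Here x1 and x2 can be understood as two features, each with a corresponding weight Θ, so the boundary is given by Θ0*x0+Θ1*x1+Θ2*x2=Θ0+Θ1*x1+Θ2*x2=0, where x0=1 is a constant feature. In general the boundary can be written uniformly as f(x)=ΘT*x=0. The sigmoid function h(x)=1/(1+exp(-f(x))) maps f(x) to a value between 0 and 1, and a sample is assigned to class 1 when h(x)>0.5, that is, when f(x)>0.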
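
To make the decision rule concrete, here is a minimal sketch assuming NumPy. The weights in `theta` are made-up values chosen for illustration (they describe the boundary -3 + x1 + x2 = 0), and `predict` is a hypothetical helper, not part of the article's code.

```python
import numpy as np

def sigmoid(z):
    # Map a real-valued score to the interval (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def predict(theta, x):
    # x carries the constant feature x0 = 1 as its first entry.
    score = float(np.dot(theta, x))   # f(x) = theta^T x
    return 1 if sigmoid(score) > 0.5 else 0

# Hypothetical weights: the boundary is -3 + x1 + x2 = 0.
theta = np.array([-3.0, 1.0, 1.0])

print(predict(theta, np.array([1.0, 2.0, 2.0])))  # score is 1.0, class 1
print(predict(theta, np.array([1.0, 0.5, 1.0])))  # score is -1.5, class 0
```

A point lying exactly on the boundary has score 0 and sigmoid value 0.5, so with the strict comparison used here it falls into class 0.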


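Keeping the gradient ascent weight update formula in mind is important for understanding the code. In the matrix form used by the code in section (3), it is Θ := Θ + α*XT*(y - h), where X is the m*n sample matrix, y is the m*1 label vector, h = sigmoid(X*Θ) is the m*1 vector of predictions, and α is the learning rate.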
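
The update Θ := Θ + α*XT*(y - h), with h = sigmoid(X*Θ), can be tried on a tiny problem. The following is only a sketch assuming NumPy: the four-sample dataset, the learning rate of 0.1, the 2000 iterations, and the helper name `grad_ascent_step` are all illustrative choices, not values from the article.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_ascent_step(X, y, theta, alpha):
    # One gradient ascent step on the log-likelihood:
    # theta := theta + alpha * X^T (y - h)
    h = sigmoid(X.dot(theta))            # m*1 predicted probabilities
    return theta + alpha * X.T.dot(y - h)

# Made-up data: the first column is the constant feature x0 = 1,
# the second column is a single feature x1.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([[0.0], [0.0], [1.0], [1.0]])

theta = np.zeros((2, 1))
for _ in range(2000):
    theta = grad_ascent_step(X, y, theta, alpha=0.1)

pred = (sigmoid(X.dot(theta)) > 0.5).astype(int)
print(pred.ravel())
```

Each step moves the weights in the direction that raises the likelihood of the observed labels; samples whose prediction is far from the label contribute the largest share of the update.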




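(2) Data description

For this experiment we again use the data from the kNN article; a sample is shown below. As the loading code in section (3) assumes, each sample is a text file holding a 32*32 grid of the characters 0 and 1, and the digit label is the part of the file name before the underscore. For simplicity, we only classify the handwritten digits 0 and 1.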
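
As a rough illustration of this data layout, the sketch below flattens one 32*32 text image into a 1024-element vector and reads the label from a file name. It assumes NumPy; the synthetic image, the file name `1_7.txt`, and both helper names are hypothetical and stand in for the real data files.

```python
import numpy as np

def image_lines_to_vector(lines):
    # Flatten 32 lines of 32 '0'/'1' characters into one 1024-element vector,
    # row by row.
    vec = np.zeros(1024)
    for j in range(32):
        for k in range(32):
            vec[32 * j + k] = int(lines[j][k])
    return vec

def label_from_filename(filename):
    # The label is the part of the file name before the underscore.
    return int(filename.split('.')[0].split('_')[0])

# Synthetic image for illustration: a vertical stroke two pixels wide.
lines = ['0' * 15 + '11' + '0' * 15 for _ in range(32)]
vec = image_lines_to_vector(lines)

print(vec.shape, int(vec.sum()))        # (1024,) 64
print(label_from_filename('1_7.txt'))   # 1
```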



(3) Understanding the source code:

#!/usr/bin/python
# coding=utf-8
from numpy import *
import os
import sys

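def loadData(direction):    # load the data: build the sample matrix and the label matrix
    trainfileList=os.listdir(direction)  # list of all file names under the directory
    m=len(trainfileList)
    dataArray= zeros((m,1024))  # sample matrix, one 1024-dimensional row per file
    labelArray= zeros((m,1))    # label matrix
    for i in range(m):
        returnArray=zeros((1,1024))  # feature vector built from one txt file
        filename=trainfileList[i]    # name of the current file
        with open('%s/%s' %(direction,filename)) as fr:  # the file is closed automatically
            for j in range(32):      # read the 32 rows one at a time
                lineStr=fr.readline()
                for k in range(32):
                    returnArray[0,32*j+k]=int(lineStr[k])   # store each pixel in the feature vector
        dataArray[i,:]=returnArray   # store the feature vector in the sample matrix

        filename0=filename.split('.')[0]
        label=filename0.split('_')[0]
        labelArray[i]=int(label)     # the class label is the part of the file name before '_'
    return dataArray,labelArray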


def sigmoid(inX):  
    return 1.0/(1+exp(-inX))  


def gradAscent(dataArray,labelArray,alpha,maxCycles):    # gradient ascent
    dataMat=mat(dataArray)    # size: m*n
    labelMat=mat(labelArray)      # size: m*1
    m,n=shape(dataMat)
    weigh=ones((n,1))         # initialize the weights
    for i in range(maxCycles):
        h=sigmoid(dataMat*weigh)
        error=labelMat-h    # size: m*1, this is the y-h term in the formula
        weigh=weigh+alpha*dataMat.transpose()*error    # gradient ascent weight update
    return weigh


def classfy(testdir,weigh):    # classify the test set
    dataArray,labelArray=loadData(testdir)
    dataMat=mat(dataArray)
    labelMat=mat(labelArray)
    h=sigmoid(dataMat*weigh)  # size: m*1 (m rows, 1 column)
    m=len(h)
    error=0.0
    for i in range(m):
        label=int(labelMat[i,0])
        if h[i,0]>0.5:     # above 0.5 means digit 1, otherwise digit 0; compare h itself, since int() would truncate it
            print('%d is classfied as: 1' %label)
            if label!=1:  # the true label is not 1, so the prediction is wrong
                error+=1
                print('error')
        else:
            print('%d is classfied as: 0' %label)
            if label!=0:  # likewise, the true label is not 0, so the prediction is wrong
                error+=1
                print('error')
    print('error rate is: %.4f' %(error/m))


# Call the helper functions
def digitRecognition(trainDir,testDir,alpha=0.07,maxCycles=10):
    data,label=loadData(trainDir)
    weigh=gradAscent(data,label,alpha,maxCycles)
    classfy(testDir,weigh)

# Run the digit recognition function digitRecognition()
digitRecognition('C:\\Anaconda\\LRnum\\train','C:\\Anaconda\\LRnum\\test')

The output is as follows:
Python 2.7.8 |Anaconda 2.1.0 (64-bit)| (default, Jul  2 2014, 15:12:11) [MSC v.1500 64 bit (AMD64)]
Type "copyright", "credits" or "license" for more information.

In [2]: run C:\\Anaconda\\LRnum.py
C:\\Anaconda\\LRnum.py:29: RuntimeWarning: overflow encountered in exp
  return 1.0/(1+exp(-inX))
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 1
error
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
error rate is: 0.0118
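
The RuntimeWarning at the top of the output comes from exp(-inX): in float64, exp overflows once its argument exceeds roughly 709, which can happen here because each score sums up to 1024 weighted pixels. The result still saturates to 0 or 1, so classification is unaffected, but the warning can be avoided. The following is a sketch of one common approach, assuming NumPy; `stable_sigmoid` is a hypothetical name and is not part of the code above.

```python
import numpy as np

def stable_sigmoid(z):
    # Evaluate the sigmoid without ever calling exp on a large positive number.
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    # For z >= 0, exp(-z) is at most 1, so it cannot overflow.
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    # For z < 0, use the equivalent form exp(z) / (1 + exp(z)).
    ez = np.exp(z[~pos])
    out[~pos] = ez / (1.0 + ez)
    return out

print(stable_sigmoid(np.array([-1000.0, 0.0, 1000.0])))
```

Note that this version returns a plain NumPy array rather than a matrix, so it should be checked against the rest of the code before being swapped in. SciPy also offers scipy.special.expit, which computes the same function.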



Reference: http://blog.csdn.net/u012162613/article/details/41844495
