A previous article covered logistic regression in detail, so to get quickly to its application in handwritten digit recognition, we only give a brief review here. Logistic regression is normally used for binary classification, though it can be extended to multi-class problems; for that, see my article 机器学习-Softmax Regression.
(1) A Quick Review of Logistic Regression
The goal of logistic regression is to find the best decision boundary (the blue line in the figure). Here x1 and x2 are two features, and each feature has a corresponding weight θ, so the boundary is θ0*x0 + θ1*x1 + θ2*x2 = θ0 + θ1*x1 + θ2*x2 = 0 (taking x0 = 1). In general, the decision boundary can be written as f(x) = θ^T * x = 0.
Keeping the gradient ascent weight-update formula in mind is important for understanding the code. In vector form it is θ := θ + α * X^T * (y − h), where α is the learning rate, X is the sample matrix, y is the label vector, and h = sigmoid(X*θ) is the vector of predicted probabilities.
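To make the update concrete, here is a minimal sketch of one gradient ascent step on a tiny made-up data set (the numbers are fabricated for illustration and are not from the digit data):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# toy data: 3 samples, 2 features (first column is the bias term x0 = 1)
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([[0.0], [0.0], [1.0]])
theta = np.ones((2, 1))     # weights initialized to ones, as in the code below
alpha = 0.1                 # learning rate

h = sigmoid(X @ theta)                   # predicted probabilities, shape (3, 1)
theta = theta + alpha * (X.T @ (y - h))  # one gradient ascent update
print(theta.ravel())
```

Running many such steps drives θ toward values that make h agree with y; the program below simply repeats this update `maxCycles` times on the full sample matrix.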
(2) The Data
For this experiment we reuse the data from the kNN article: each handwritten digit is stored as a 32x32 text file of 0 and 1 characters. For simplicity, we only classify the digits 0 and 1 here.
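To make the file format concrete, here is a small sketch (with a fabricated 32x32 "image", not one of the actual data files) of how a grid of '0'/'1' characters is flattened row by row into a 1x1024 feature vector, exactly as loadData below does:

```python
import numpy as np

# fabricate a 32x32 image: a two-pixel-wide vertical stroke, roughly a digit 1
lines = ['0' * 15 + '11' + '0' * 15 for _ in range(32)]

vector = np.zeros((1, 1024))
for j, line in enumerate(lines):            # row j of the image
    for k in range(32):                     # column k of the image
        vector[0, 32 * j + k] = int(line[k])  # pixel (j, k) -> position 32*j+k
print(vector.shape, int(vector.sum()))
```

Each of the 1024 positions becomes one feature, so every image is a point in a 1024-dimensional space and the classifier learns one weight per pixel.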
(3) Understanding the Source Code
#!/usr/bin/python
# coding=utf-8
from numpy import *
import os
import sys

def loadData(direction):                    # load data: build the sample matrix and the label matrix
    trainfileList = os.listdir(direction)   # list of all file names under the path
    m = len(trainfileList)
    dataArray = zeros((m, 1024))            # empty sample matrix, one row per file
    labelArray = zeros((m, 1))              # empty label matrix
    for i in range(m):
        returnArray = zeros((1, 1024))      # feature vector for one txt file
        filename = trainfileList[i]
        fr = open('%s/%s' % (direction, filename))
        for j in range(32):                 # read the file line by line, 32 lines
            lineStr = fr.readline()
            for k in range(32):
                returnArray[0, 32*j+k] = int(lineStr[k])  # store the pixels one by one
        fr.close()
        dataArray[i, :] = returnArray       # store the feature vector into the sample matrix
        filename0 = filename.split('.')[0]
        label = filename0.split('_')[0]
        labelArray[i] = int(label)          # store the class label
    return dataArray, labelArray
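The label comes purely from the file name. Assuming names of the form class_index.txt (e.g. a hypothetical '0_13.txt' for the 13th sample of digit 0, the convention used by the kNN data set), the two splits in loadData work like this:

```python
filename = '0_13.txt'            # hypothetical file name: digit 0, sample 13
stem = filename.split('.')[0]    # strip the extension -> '0_13'
label = int(stem.split('_')[0])  # take the part before '_' -> 0
print(label)
```

Because the label is encoded in the name, no separate label file is needed; renaming a file would silently change its class, so the naming convention matters.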
def sigmoid(inX):
    return 1.0/(1 + exp(-inX))

def gradAscent(dataArray, labelArray, alpha, maxCycles):   # gradient ascent
    dataMat = mat(dataArray)        # size: m*n
    labelMat = mat(labelArray)      # size: m*1
    m, n = shape(dataMat)
    weigh = ones((n, 1))            # initialize the weights
    for i in range(maxCycles):
        h = sigmoid(dataMat*weigh)
        error = labelMat - h        # size: m*1, this is y - h in the formula
        weigh = weigh + alpha*dataMat.transpose()*error    # gradient ascent update
    return weigh
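The run below prints "RuntimeWarning: overflow encountered in exp", because exp(-inX) overflows for large-magnitude inputs. The result still saturates correctly to 0 or 1, so it is harmless here, but a numerically stable variant (a sketch, not part of the original program) avoids the warning by only ever calling exp on non-positive values:

```python
import numpy as np

def stable_sigmoid(x):
    # for x >= 0 use 1/(1+exp(-x)); for x < 0 use exp(x)/(1+exp(x))
    # both branches pass a non-positive argument to exp, so it never overflows
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    ex = np.exp(x[~pos])
    out[~pos] = ex / (1.0 + ex)
    return out

print(stable_sigmoid(np.array([-1000.0, 0.0, 1000.0])))  # -> [0.  0.5 1. ]
```

Dropping this in for sigmoid would leave the classifier's behavior unchanged while silencing the warning.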
def classfy(testdir, weigh):        # classify the test set
    dataArray, labelArray = loadData(testdir)
    dataMat = mat(dataArray)
    labelMat = mat(labelArray)
    h = sigmoid(dataMat*weigh)      # size: m*1, predicted probabilities
    m = len(h)
    error = 0.0
    for i in range(m):
        if h[i] > 0.5:              # probability above 0.5 -> predict digit 1
            print int(labelMat[i]), 'is classfied as: 1'
            if int(labelMat[i]) != 1:   # true label is not 1 -> misclassified
                error += 1
                print 'error'
        else:                       # otherwise -> predict digit 0
            print int(labelMat[i]), 'is classfied as: 0'
            if int(labelMat[i]) != 0:   # true label is not 0 -> misclassified
                error += 1
                print 'error'
    print 'error rate is:', '%.4f' % (error/m)
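The decision rule is just a 0.5 threshold on the sigmoid output. Note that the original line read `if int(h[i])>0.5:`, which is a bug: int() truncates any probability strictly between 0 and 1 down to 0. It only appeared to work because the saturated sigmoid returns exactly 0.0 or 1.0 on this data; comparing h[i] directly is the fix. A minimal sketch of the thresholding on made-up probabilities:

```python
def predict(probs, threshold=0.5):
    # map sigmoid outputs to class labels: 1 if p > threshold, else 0
    return [1 if p > threshold else 0 for p in probs]

probs = [0.02, 0.97, 0.5, 0.51]   # fabricated sigmoid outputs
print(predict(probs))             # -> [0, 1, 0, 1]
```

With `int(p) > 0.5` instead, every value in this list except an exact 1.0 would be classified as 0.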
def digitRecognition(trainDir, testDir, alpha=0.07, maxCycles=10):  # driver: train, then test
    data, label = loadData(trainDir)
    weigh = gradAscent(data, label, alpha, maxCycles)
    classfy(testDir, weigh)

# run the digit recognition
digitRecognition('C:\\Anaconda\\LRnum\\train', 'C:\\Anaconda\\LRnum\\test')
Python 2.7.8 |Anaconda 2.1.0 (64-bit)| (default, Jul 2 2014, 15:12:11) [MSC v.1500 64 bit (AMD64)]
Type "copyright", "credits" or "license" for more information.
In [2]: run C:\\Anaconda\\LRnum.py
C:\\Anaconda\\LRnum.py:29: RuntimeWarning: overflow encountered in exp
return 1.0/(1+exp(-inX))
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 1
error
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
0 is classfied as: 0
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
1 is classfied as: 1
error rate is: 0.0118
Reference: http://blog.csdn.net/u012162613/article/details/41844495