Writing an SVM with the SMO Algorithm to Classify CIFAR-10 Data

The formulas were too tedious to type out, so they were originally included as images; this may not look as polished, but the content is unchanged.
(Formula images not reproduced here.)
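For reference, a standard summary of the soft-margin SVM problem that SMO solves (this is textbook material, not transcribed from the original images). The dual problem is

$$\max_{\alpha}\ \sum_{i=1}^{m}\alpha_i-\frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_j y_i y_j K(x_i,x_j)\qquad\text{s.t.}\quad 0\le\alpha_i\le C,\quad \sum_{i=1}^{m}\alpha_i y_i=0,$$

and the resulting decision function is

$$f(x)=\operatorname{sign}\Big(\sum_{i=1}^{m}\alpha_i y_i K(x_i,x)+b\Big).$$

SMO optimizes two multipliers at a time while keeping the equality constraint satisfied; this is what the innerL and smoP functions in the appendix implement.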

3: Experiments

3.1 Extracting HOG Features

The core of this experiment is the design of the SVM algorithm itself, so feature extraction is done with library functions. The key code is:

from skimage import feature as ft
ft.hog(data[i],feature_vector=True,block_norm='L2-Hys',transform_sqrt=True)
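As a minimal sketch (assuming each row of trainImg['Data'] is a 1x3072 CIFAR-10 vector laid out as three 32x32 color planes, as in data2image in the appendix), a single image is converted to grayscale and passed to ft.hog like this:

import numpy as np
from skimage import feature as ft

raw = trainImg['Data'][0]                         # one 1x3072 sample, assumed layout 3 x 32 x 32
img = np.asarray(raw, dtype=float).reshape((3, 32, 32))
gray = img[0]*0.2990 + img[1]*0.5870 + img[2]*0.1140   # same grayscale weights as data2image
hog_vec = ft.hog(gray, feature_vector=True, block_norm='L2-Hys', transform_sqrt=True)
print(hog_vec.shape)                              # 324-dim vector for a 32x32 image with default cells/blocks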

3.2 Verifying the Classification Performance of the Extracted Features with an SVM Library

The core code using the library is as follows:

trainmatrix=data2image(trainImg['Data'])
hogtrain=meanlie(feature_hog(trainmatrix))
testmatrix=data2image(testImg['Data'])
hogtest=meanlie(feature_hog(testmatrix))
from sklearn import svm
from skimage import feature as ft
clf=svm.SVC()
clf.fit(hogtrain,trainImg['Label'])
pre=clf.predict(hogtest)
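The accuracy reported below was obtained by counting matches between pre and the test labels; a minimal sketch of that count (assuming testImg['Label'] is an n x 1 array, as the appendix code suggests):

import numpy as np
true_labels = np.asarray(testImg['Label']).ravel()   # flatten the (n, 1) label array
count = int(np.sum(pre == true_labels))              # number of correctly classified samples
print('count = %d, accuracy = %.3f' % (count, count / float(len(true_labels))))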

The classification results are as follows.
Here count, the number of correctly classified samples, is 4075 out of 5000 test samples. The classification accuracy is quite high, 0.815, and that is without any careful parameter tuning. This is a very encouraging result: it confirms that the HOG-features-plus-SVM approach is feasible, so the same pipeline is applied next to the hand-written SVM.

3.3 Verifying the Correctness of the Hand-Written Binary Classifier

This experiment is a multi-class problem, so the hand-written SVM is built in two steps: first a binary classification algorithm, then multi-class classification on top of it using the multi-class strategy chosen earlier.
So, step one: verify the hand-written binary classifier.
Here I take the two classes 6 and 9 from the training and test sets, with 500 samples per class from the training set (just to make the run faster). The binary classifier class is named PlattSMO and saved as the standalone module plattSMO so that it can be imported easily.
The complete SMO code is too long and is given in the appendix; the main calling code is as follows:

trainmatrix=data2image(trainImg['Data'])
hogtrain=meanlie(feature_hog(trainmatrix))
testmatrix=data2image(testImg['Data'])
hogtest=meanlie(feature_hog(testmatrix))
hogtraindata,hogtrainlabel=extractClass(hogtrain,trainImg['Label'],6,9)
hogtraindata,hogtrainlabel=extractPart(hogtraindata,hogtrainlabel,500)
hogtestdata,hogtestlabel=extractClass(hogtest,testImg['Label'],6,9)
smo = plattSMO.PlattSMO(hogtraindata, hogtrainlabel, 0.05, 0.0001, 200, name='rbf', theta=20)
smo.smoP()
testResult = smo.predict(hogtestdata)
count=0
for i in range(len(testResult)):
    if testResult[i]==hogtestlabel[i]:
        count+=1
print('right rate:%f'%(float(count)/len(hogtestlabel)))

The smoP function is the complete Platt SMO algorithm.
The results are as follows.
Here count, the number of correctly classified samples, is 1724, out of a total of 2000 test samples in these two classes, giving a binary classification accuracy of 0.862. This verifies that the hand-written binary classification code is correct.

3.4 Verifying the Correctness of the Hand-Written Multi-Class Classifier

For multi-class classification we need to build n(n-1)/2 binary models, each simply an instance of the PlattSMO class. Once the 10 models are built, every test sample is classified by each model, votes are accumulated, and the final label is decided by majority vote. The multi-class class is named LibSVM and saved as the module libsvm, which is called from the main script. The multi-class code consists mainly of a training function and a prediction function, train and predict: train trains the models and stores them in self.classfy, while predict implements the multi-class strategy. One point worth noting: the votes of the 10 models may leave several classes tied with the same maximum vote count. In that case I take the tied classes with the largest (equal) vote counts, classify the sample again among only those classes, accumulate votes, and vote once more. In effect the one-vs-one voting strategy is applied twice, with fewer classes the second time (the classes with fewer first-round votes having been eliminated).
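As a side note, the tie-breaking code below looks up the classifier for a class pair through a hard-coded table that assumes exactly 5 classes. A hypothetical, more general helper (not part of the original code) could compute the position of the one-vs-one model for the pair (i, j) in self.classfy directly, given that train builds the models in the order (0,1), (0,2), ..., (n-2, n-1):

def pair_index(i, j, num_classes):
    # Position of the one-vs-one classifier trained on (class i, class j), with i < j,
    # when classifiers are created in the order (0,1), (0,2), ..., (n-2, n-1).
    assert 0 <= i < j < num_classes
    return i * num_classes - i * (i + 1) // 2 + (j - i - 1)

# e.g. with 5 classes: pair_index(0, 1, 5) == 0, pair_index(0, 4, 5) == 3, pair_index(3, 4, 5) == 9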
The complete code is given in the appendix; the main code is as follows:

    def __init__(self,data=[],label=[],C=0,toler=0,maxIter=0,**kernelargs):
        self.classlabel = unique(label)
        self.classNum = len(self.classlabel)
        self.classfyNum = (self.classNum * (self.classNum-1))/2
        self.classfy = []
        self.dataSet={}
        self.kernelargs = kernelargs
        self.C = C
        self.toler = toler
        self.maxIter = maxIter
        m = shape(data)[0]
        for i in range(m):
            label[i]=int(label[i])
            if label[i] not in self.dataSet.keys():
                self.dataSet[int(label[i])] = []
                self.dataSet[int(label[i])].append(data[i][:])
            else:
                self.dataSet[int(label[i])].append(data[i][:])
    def train(self):
        num = self.classNum
        for i in range(num):
            for j in range(i+1,num):
                data = []
                label = [1.0]*shape(self.dataSet[self.classlabel[i]])[0]
                label.extend([-1.0]*shape(self.dataSet[self.classlabel[j]])[0])
                data.extend(self.dataSet[self.classlabel[i]])
                data.extend(self.dataSet[self.classlabel[j]])
                svm = PlattSMO(array(data),array(label),self.C,self.toler,self.maxIter,**self.kernelargs)
                svm.smoP()
                self.classfy.append(svm)
        self.dataSet = None
    def predict(self,data,label):
        m = shape(data)[0]
        num = self.classNum
        classlabel = []
        count = 0.0
        for n in range(m):
            result = [0] * num
            index = -1
            for i in range(num):
                for j in range(i + 1, num):
                    index += 1
                    s = self.classfy[index]
                    t = s.predict([data[n]])[0]
                    if t > 0.0:
                        result[i] +=1
                    else:
                        result[j] +=1
            #classlabel.append(self.classlabel[result.index(max(result))])
            
            resultmax=max(result)
            maxindex=result.index(resultmax)
            index1=[maxindex]
            for i in range(maxindex+1,5):
                if result[i]==resultmax:
                    index1.append(i)
            index2 = [0 for _ in range(len(index1))]
            if len(index1) > 1:
                
                for i in range(len(index1)):
                    for j in range(i+1,len(index1)):
                        if index1[i]==0:
                            s = self.classfy[index1[j]-1]   # one-vs-one model for the pair (0, index1[j])
                        elif index1[i]==3:
                            s=self.classfy[9]
                        else:
                            s=self.classfy[2*index1[i]+index1[j]]
                        t = s.predict([data[n]])[0]
                        if t > 0.0:
                            index2[i]+=1
                        else:
                            index2[j]+=1
            classlabel.append(self.classlabel[index1[index2.index(max(index2))]])
                        
            if classlabel[-1] != label[n]:
                count +=1
                print(label[n], classlabel[n])
        #print classlabel
        countright=m-count
        print("right rate:", countright / m)
        return classlabel

Core code of the call in the main function. The libSVM.LibSVM parameters matter a great deal: here the penalty parameter C is 10, the tolerance toler is 0.0001, the maximum number of iterations maxIter is 200, the kernel is the Gaussian kernel 'rbf', and the corresponding bandwidth theta is 20. These values can be tuned further.
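For reference, the Gaussian kernel implemented in kernelTrans with this theta is

$$K(x, z) = \exp\!\left(-\frac{\lVert x - z\rVert^{2}}{\theta^{2}}\right),$$

so a larger theta corresponds to a wider, smoother kernel.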

trainImg = loadData(file)
testImg=loadData(file1)
traindata,trainlabel=extractData(trainImg['Data'],trainImg['Label'],400)
trainmatrix=data2image(traindata)
hogtrain=meanlie(feature_hog(trainmatrix))
testmatrix=data2image(testImg['Data'])
hogtest=meanlie(feature_hog(testmatrix))
# C = 10 works best: accuracy 0.678
svm = libSVM.LibSVM(hogtrain, trainlabel, 10, 0.0001, 200, name='rbf', theta=20)
svm.train()
svm.predict(hogtest,testImg['Label'])

When training the models, only part of the training set is used: 200 or 400 samples per class, i.e. 1000 or 2000 samples in total, which keeps the running time to roughly 5 minutes. Training on all samples would take an unbearably long time. Of course, training on so few samples inevitably lowers the classification accuracy, but even with only 400 samples per class the accuracy already reaches 0.678, which is a decent result; it is reasonable to expect that training on the full set would approach the 0.815 accuracy obtained earlier with the SVM library.
The experimental results are as follows.
With 200 samples per class:
Here countright, the number of correctly classified samples, is 3282, giving a classification accuracy of 0.6564.
With 400 samples per class:
countright is 3473, giving a classification accuracy of 0.6946, about 4 percentage points higher than with 200 samples per class. The full training set contains 14968 samples, so using all of the data should yield a further noticeable improvement.

Appendix:

Description of the functions in the code

trialsvm.py
def meanlie(data):
Divides each element of data by the mean of its column. This step replaces normalization: the HOG features need to be scaled for the subsequent classification, but experiments showed that accuracy after ordinary normalization was not very good. The reason is that each feature column of data tends to contain a few very large values while the rest are small, so dividing each element by its column mean is used instead of normalization.
def loadData(file):
Loads the dataset; straightforward.
def data2image(data):
Converts each sample's row vector into an image and converts it to grayscale. HOG features are extracted from images, while each image in the dataset is stored as a 1x3072 row vector.
def feature_hog(data):
Extracts HOG features from the given data and returns the feature matrix.
def extractData(data,label,num):
Extracts part of the samples from the large dataset data; num is the number of samples to take per class. Returns the extracted samples and their labels.
plattSMO.py
class PlattSMO:
The binary classification class.
def __init__(self,dataMat,classlabels,C,toler,maxIter,**kernelargs):
Initialization of the basic variables: dataMat - data matrix, C - penalty parameter, classLabels - data labels, toler - tolerance, maxIter - maximum number of iterations, **kernelargs - kernel-related parameters.
def kernelTrans(self,x,z):
Maps the data into a higher-dimensional space through the kernel function.
def calcEK(self,k):
Computes the error Ek.
def updateEK(self,k):
Computes Ek and updates the error cache.
def selectJ(self,i,Ei):
Heuristic 2 for choosing the second alpha in the inner loop.
def innerL(self,i):
The optimized inner loop of the SMO algorithm.
def smoP(self):
The complete Platt SMO algorithm.
def calcw(self):
Computes the weight vector w.
def predict(self,testData):
Prediction function; predicts the class of each test sample.
libsvm.py
class LibSVM:
The multi-class classification class.
def train(self):
Training function; trains the 10 binary classification models.
def predict(self,data,label):
Prediction function; implements the multi-class strategy, using the trained models to predict labels for the test data and report the results.

trialsvm.py

# -*- coding: utf-8 -*-
"""
Created on Fri Dec 14 19:15:13 2018

@author: Administrator
"""
from mysvm import plattSMO,libSVM
import matplotlib.pyplot as plt
import numpy as np
import random
from numpy import *
import scipy.io as sio
from sklearn.decomposition import PCA
from sklearn import preprocessing
from skimage import feature as ft
def meanlie(data):
    # divide each element by the mean of its column; this replaces normalization, which did not work well
    m,n=data.shape
    meandata=np.mean(data,axis=0)  # axis=0: mean of each column
    for i in range(n):
        data[:,i]/=meandata[i]
    return data
    
def loadData(file):
    #file='G:/lecture of grade one/pattern recognition/data/train_data.mat'
    trainImg=sio.loadmat(file)
    return trainImg
def data2image(data):
    newdata=[]
    m,n=data.shape
    for i in range(m):
        img=data[i,:].reshape((3,32,32))
        #gray=img.convert('L')
        gray = img[0,:, :]*0.2990+img[1,:, :]*0.5870+img[2,:, :]*0.1140
        newdata.append(gray)
    #np.array(newdata)
    return newdata    
def feature_hog(data):
    # extract HOG features
    fea=[]
    for i in range(len(data)):
        #data[i]=Image.fromarray(data[i][0])
        fea.append(ft.hog(data[i],feature_vector=True,block_norm='L2-Hys',transform_sqrt=True))
    fea=np.array(fea)
    return fea
'''
def extractClass(data,label,class1,class2):
    # extract two classes and relabel them as +1/-1 for the binary SVM
    m,n=data.shape
    index=[]
    for i in range(m):
        if label[i][0]!=class1 and label[i][0]!=class2:
            index.append(i)
    data=np.delete(data,index,0)
    label=np.delete(label,index,0)
    min_max_scaler=preprocessing.MinMaxScaler()
    data=min_max_scaler.fit_transform(data)
    Y=[]        
    for i in label:
        if i[0]==class1:  # class1 maps to +1
            Y.append(1)
        else:
            Y.append(-1)  # class2 maps to -1
    Y=np.array(Y)
    return data,Y
def extractPart(data,label,nums):
    m,n=data.shape
    index=[]
    a=0;b=0
    for i in range(m):
        if label[i]==1 and a<nums:
            index.append(i)
            a+=1
        if label[i]==-1 and b<nums:
            index.append(i)
            b+=1
        if a>nums and b>nums:
            break
    data=data[index]
    label=label[index]    
    return data,label
'''
def extractData(data,label,num):
    m,n=data.shape
    count=[0,0,0,0,0]
    cla=[0,6,7,8,9]
    index=[]
    for i in range(m):
        for j in range(5):
            if label[i]==cla[j] and count[j]<num:
                count[j]+=1
                index.append(i)
    data=data[index]
    label=label[index]
    return data,label
                
            
if __name__ == '__main__':
    
    file='G:/lecture of grade one/pattern recognition/trial three/data3/train_data3.mat'
    file1='G:/lecture of grade one/pattern recognition/trial three/data3/test_data3.mat'
    trainImg = loadData(file)
    testImg=loadData(file1)
    traindata,trainlabel=extractData(trainImg['Data'],trainImg['Label'],400)
    #testdata,testlabel=extractData(testImg['Data'],testImg['Label'],1000)
    trainmatrix=data2image(traindata)
    hogtrain=meanlie(feature_hog(trainmatrix))
    testmatrix=data2image(testImg['Data'])
    hogtest=meanlie(feature_hog(testmatrix))
    # C = 10 works best: accuracy 0.678
    svm = libSVM.LibSVM(hogtrain, trainlabel, 10, 0.0001, 200, name='rbf', theta=20)
    svm.train()
    svm.predict(hogtest,testImg['Label'])
    
    '''
    trainmatrix=data2image(trainImg['Data'])
    hogtrain=meanlie(feature_hog(trainmatrix))
    testmatrix=data2image(testImg['Data'])
    hogtest=meanlie(feature_hog(testmatrix))
    hogtraindata,hogtrainlabel=extractClass(hogtrain,trainImg['Label'],6,9)
    hogtraindata,hogtrainlabel=extractPart(hogtraindata,hogtrainlabel,500)
    hogtestdata,hogtestlabel=extractClass(hogtest,testImg['Label'],6,9)
    smo = plattSMO.PlattSMO(hogtraindata, hogtrainlabel, 0.05, 0.0001, 200, name='rbf', theta=20)
    smo.smoP()
    testResult = smo.predict(hogtestdata)
    count=0
    for i in range(len(testResult)):
        if testResult[i]==hogtestlabel[i]:
            count+=1
    print('right rate:%f'%(float(count)/len(hogtestlabel)))
    '''

plattSMO.py

import sys
from numpy import *
from svm import *   # provides the selectJrand and clipAlpha helpers used below
from os import listdir
class PlattSMO:
    def __init__(self,dataMat,classlabels,C,toler,maxIter,**kernelargs):
        self.x = array(dataMat)
        self.label = array(classlabels).transpose()
        self.C = C
        self.toler = toler
        self.maxIter = maxIter
        self.m = shape(dataMat)[0]
        self.n = shape(dataMat)[1]
        self.alpha = array(zeros(self.m),dtype='float64')
        self.b = 0.0
        self.eCache = array(zeros((self.m,2)))
        self.K = zeros((self.m,self.m),dtype='float64')
        self.kwargs = kernelargs
        self.SV = ()
        self.SVIndex = None
        for i in range(self.m):
            for j in range(self.m):
                self.K[i,j] = self.kernelTrans(self.x[i,:],self.x[j,:])
    def calcEK(self,k):
        fxk = dot(self.alpha*self.label,self.K[:,k])+self.b
        Ek = fxk - float(self.label[k])
        return Ek
    def updateEK(self,k):
        Ek = self.calcEK(k)

        self.eCache[k] = [1 ,Ek]
    def selectJ(self,i,Ei):
        maxE = 0.0
        selectJ = 0
        Ej = 0.0
        validECacheList = nonzero(self.eCache[:,0])[0]
        if len(validECacheList) > 1:
            for k in validECacheList:
                if k == i:continue
                Ek = self.calcEK(k)
                deltaE = abs(Ei-Ek)
                if deltaE > maxE:
                    selectJ = k
                    maxE = deltaE
                    Ej = Ek
            return selectJ,Ej
        else:
            selectJ = selectJrand(i,self.m)
            Ej = self.calcEK(selectJ)
            return selectJ,Ej

    def innerL(self,i):
        Ei = self.calcEK(i)
        if (self.label[i] * Ei < -self.toler and self.alpha[i] < self.C) or \
                (self.label[i] * Ei > self.toler and self.alpha[i] > 0):
            self.updateEK(i)
            j,Ej = self.selectJ(i,Ei)
            alphaIOld = self.alpha[i].copy()
            alphaJOld = self.alpha[j].copy()
            if self.label[i] != self.label[j]:
                L = max(0,self.alpha[j]-self.alpha[i])
                H = min(self.C,self.C + self.alpha[j]-self.alpha[i])
            else:
                L = max(0,self.alpha[j]+self.alpha[i] - self.C)
                H = min(self.C,self.alpha[i]+self.alpha[j])
            if L == H:
                return 0
            eta = 2*self.K[i,j] - self.K[i,i] - self.K[j,j]
            if eta >= 0:
                return 0
            self.alpha[j] -= self.label[j]*(Ei-Ej)/eta
            self.alpha[j] = clipAlpha(self.alpha[j],H,L)
            self.updateEK(j)
            if abs(alphaJOld-self.alpha[j]) < 0.00001:
                return 0
            self.alpha[i] +=  self.label[i]*self.label[j]*(alphaJOld-self.alpha[j])
            self.updateEK(i)
            b1 = self.b - Ei - self.label[i] * self.K[i, i] * (self.alpha[i] - alphaIOld) - \
                 self.label[j] * self.K[i, j] * (self.alpha[j] - alphaJOld)
            b2 = self.b - Ej - self.label[i] * self.K[i, j] * (self.alpha[i] - alphaIOld) - \
                 self.label[j] * self.K[j, j] * (self.alpha[j] - alphaJOld)
            if 0<self.alpha[i] and self.alpha[i] < self.C:
                self.b = b1
            elif 0 < self.alpha[j] and self.alpha[j] < self.C:
                self.b = b2
            else:
                self.b = (b1 + b2) /2.0
            return 1
        else:
            return 0

    def smoP(self):
        iter = 0
        entrySet = True
        alphaPairChanged = 0
        while iter < self.maxIter and ((alphaPairChanged > 0) or (entrySet)):
            alphaPairChanged = 0
            if entrySet:
                for i in range(self.m):
                    alphaPairChanged+=self.innerL(i)
                iter += 1
            else:
                nonBounds = nonzero((self.alpha > 0)*(self.alpha < self.C))[0]
                for i in nonBounds:
                    alphaPairChanged+=self.innerL(i)
                iter+=1
            if entrySet:
                entrySet = False
            elif alphaPairChanged == 0:
                entrySet = True
        self.SVIndex = nonzero(self.alpha)[0]
        self.SV = self.x[self.SVIndex]
        self.SVAlpha = self.alpha[self.SVIndex]
        self.SVLabel = self.label[self.SVIndex]
        self.x = None
        self.K = None
        self.label = None
        self.alpha = None
        self.eCache = None
#   def K(self,i,j):
#       return self.x[i,:]*self.x[j,:].T
    def kernelTrans(self,x,z):
        if array(x).ndim != 1 or array(z).ndim != 1:
            raise Exception("input vector is not 1 dim")
        if self.kwargs['name'] == 'linear':
            return sum(x*z)
        elif self.kwargs['name'] == 'rbf':
            theta = self.kwargs['theta']
            return exp(sum((x-z)*(x-z))/(-1*theta**2))

    def calcw(self):
        # w = sum_i alpha_i * y_i * x_i (only meaningful for the linear kernel)
        self.w = zeros(self.n)
        for i in range(self.m):
            self.w += self.alpha[i]*self.label[i]*self.x[i,:]

    def predict(self,testData):
        test = array(testData)
        #return (test * self.w + self.b).getA()
        result = []
        m = shape(test)[0]
        for i in range(m):
            tmp = self.b
            for j in range(len(self.SVIndex)):
                tmp += self.SVAlpha[j] * self.SVLabel[j] * self.kernelTrans(self.SV[j],test[i,:])
            while tmp == 0:
                tmp = random.uniform(-1,1)
            if tmp > 0:
                tmp = 1
            else:
                tmp = -1
            result.append(tmp)
        return result
def plotBestfit(data,label,w,b):
    import matplotlib.pyplot as plt
    n = shape(data)[0]
    fig = plt.figure()
    ax = fig.add_subplot(111)
    x1 = []
    x2 = []
    y1 = []
    y2 = []
    for i in range(n):
        if int(label[i]) == 1:
            x1.append(data[i][0])
            y1.append(data[i][1])
        else:
            x2.append(data[i][0])
            y2.append(data[i][1])
    ax.scatter(x1,y1,s=10,c='red',marker='s')
    ax.scatter(x2,y2, s=10, c='green', marker='s')
    x = arange(-2,10,0.1)
    y = ((-b-w[0]*x)/w[1])
    plt.plot(x,y)
    plt.xlabel('X')
    plt.ylabel('y')
    plt.show()
def loadImage(dir,maps = None):
    dirList = listdir(dir)
    data = []
    label = []
    for file in dirList:
        label.append(file.split('_')[0])
        lines = open(dir +'/'+file).readlines()
        row = len(lines)
        col = len(lines[0].strip())
        line = []
        for i in range(row):
            for j in range(col):
                line.append(float(lines[i][j]))
        data.append(line)
        if maps != None:
            label[-1] = float(maps[label[-1]])
        else:
            label[-1] = float(label[-1])
    return array(data),array(label)

def main():
    '''
    data,label = loadDataSet('testSetRBF.txt')
    smo = PlattSMO(data,label,200,0.0001,10000,name = 'rbf',theta = 1.3)
    smo.smoP()
    smo.calcw()
    print smo.predict(data)
    '''
    maps = {'1':1.0,'9':-1.0}
    data,label = loadImage("digits/trainingDigits",maps)
    smo = PlattSMO(data, label, 200, 0.0001, 10000, name='rbf', theta=20)
    smo.smoP()
    print(len(smo.SVIndex))
    test,testLabel = loadImage("digits/testDigits",maps)
    testResult = smo.predict(test)
    m = shape(test)[0]
    count  = 0.0
    for i in range(m):
        if testLabel[i] != testResult[i]:
            count += 1
    print("classified error rate is:", count / m)
    #smo.kernelTrans(data,smo.SV[0])

if __name__ == "__main__":
    sys.exit(main())
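As a quick, hypothetical smoke test of the PlattSMO interface (assuming plattSMO.py and the helper module svm providing selectJrand and clipAlpha are importable), a tiny linearly separable toy set can be used:

import numpy as np
from plattSMO import PlattSMO

# two well-separated clusters labelled -1 and +1
X = np.array([[0.0, 0.0], [0.1, 0.2], [2.0, 2.1], [2.2, 1.9]])
y = np.array([-1.0, -1.0, 1.0, 1.0])

smo = PlattSMO(X, y, C=1.0, toler=0.001, maxIter=100, name='linear')
smo.smoP()
print(smo.predict(X))   # expected: [-1, -1, 1, 1] on this separable toy set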

libsvm.py

import sys
from numpy import *
from svm import *
from os import listdir
from plattSMO import PlattSMO
import pickle
class LibSVM:

    def __init__(self,data=[],label=[],C=0,toler=0,maxIter=0,**kernelargs):
        self.classlabel = unique(label)
        self.classNum = len(self.classlabel)
        self.classfyNum = (self.classNum * (self.classNum-1))/2
        self.classfy = []
        self.dataSet={}
        self.kernelargs = kernelargs
        self.C = C
        self.toler = toler
        self.maxIter = maxIter
        m = shape(data)[0]
        for i in range(m):
            label[i]=int(label[i])
            if label[i] not in self.dataSet.keys():
                self.dataSet[int(label[i])] = []
                self.dataSet[int(label[i])].append(data[i][:])
            else:
                self.dataSet[int(label[i])].append(data[i][:])
    def train(self):
        num = self.classNum
        for i in range(num):
            for j in range(i+1,num):
                data = []
                label = [1.0]*shape(self.dataSet[self.classlabel[i]])[0]
                label.extend([-1.0]*shape(self.dataSet[self.classlabel[j]])[0])
                data.extend(self.dataSet[self.classlabel[i]])
                data.extend(self.dataSet[self.classlabel[j]])
                svm = PlattSMO(array(data),array(label),self.C,self.toler,self.maxIter,**self.kernelargs)
                svm.smoP()
                self.classfy.append(svm)
        self.dataSet = None
    def predict(self,data,label):
        m = shape(data)[0]
        num = self.classNum
        classlabel = []
        count = 0.0
        for n in range(m):
            result = [0] * num
            index = -1
            for i in range(num):
                for j in range(i + 1, num):
                    index += 1
                    s = self.classfy[index]
                    t = s.predict([data[n]])[0]
                    if t > 0.0:
                        result[i] +=1
                    else:
                        result[j] +=1
            #classlabel.append(self.classlabel[result.index(max(result))])
            
            resultmax=max(result)
            maxindex=result.index(resultmax)
            index1=[maxindex]
            for i in range(maxindex+1,5):
                if result[i]==resultmax:
                    index1.append(i)
            index2 = [0 for _ in range(len(index1))]
            if len(index1) > 1:
                
                for i in range(len(index1)):
                    for j in range(i+1,len(index1)):
                        if index1[i]==0:
                            s = self.classfy[index1[j]-1]   # one-vs-one model for the pair (0, index1[j])
                        elif index1[i]==3:
                            s=self.classfy[9]
                        else:
                            s=self.classfy[2*index1[i]+index1[j]]
                        t = s.predict([data[n]])[0]
                        if t > 0.0:
                            index2[i]+=1
                        else:
                            index2[j]+=1
            classlabel.append(self.classlabel[index1[index2.index(max(index2))]])
                        
            if classlabel[-1] != label[n]:
                count +=1
                #print label[n],classlabel[n]
        #print classlabel
        countright=m-count
        print("right rate:", countright / m)
        return classlabel
    def save(self,filename):
        fw = open(filename,'wb')
        pickle.dump(self,fw,2)
        fw.close()

    @staticmethod
    def load(filename):
        fr = open(filename,'rb')
        svm = pickle.load(fr)
        fr.close()
        return svm

def loadImage(dir,maps = None):
    dirList = listdir(dir)
    data = []
    label = []
    for file in dirList:
        label.append(file.split('_')[0])
        lines = open(dir +'/'+file).readlines()
        row = len(lines)
        col = len(lines[0].strip())
        line = []
        for i in range(row):
            for j in range(col):
                line.append(float(lines[i][j]))
        data.append(line)
        if maps != None:
            label[-1] = float(maps[label[-1]])
        else:
            label[-1] = float(label[-1])
    return data,label
def main():
    '''
    data,label = loadImage('trainingDigits')
    svm = LibSVM(data, label, 200, 0.0001, 10000, name='rbf', theta=20)
    svm.train()
    svm.save("svm.txt")
    '''
    svm = LibSVM.load("svm.txt")
    test,testlabel = loadImage('testDigits')
    svm.predict(test,testlabel)

if __name__ == "__main__":
    sys.exit(main())