A Python Implementation of the Fisher Criterion Linear Classifier

浮舟

Category: Pattern Recognition. Tags: python, Fisher classifier

Article link: https://blog.csdn.net/fzch_struggling/article/details/45011121

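About this post: this write-up is adapted from a homework assignment for the pattern recognition course I took last semester. The second problem asks for a linear classifier on the Iris dataset. The class labels come from the K-means clustering result of the previous problem; each cluster is split into a training set and a test set, and the classifier's output on the test set is compared against those labels.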

Training and test sets

Training set (taken from the first result of the previous problem; for each class i, the first l = int(0.67 * len(dataset[i])) sample indices are used):
Class 1 (33 samples): [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32]
Class 2 (25 samples): [52, 77, 100, 102, 103, 104, 105, 107, 108, 109, 110, 111, 112, 115, 116, 117, 118, 120, 122, 124, 125, 128, 129, 130, 131]
Class 3 (41 samples): [50, 51, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92]
Test set (the remaining indices of each class):
Class 1 (17 samples): [33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]
Class 2 (13 samples): [132, 134, 135, 136, 137, 139, 140, 141, 143, 144, 145, 147, 148]
Class 3 (21 samples): [93, 94, 95, 96, 97, 98, 99, 101, 106, 113, 114, 119, 121, 123, 126, 127, 133, 138, 142, 146, 149]
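The 67% split described above can be sketched as follows. This is a minimal illustration, assuming the indices of one class are already in a list; the helper name `split_cluster` is hypothetical and does not appear in the original code.

```python
def split_cluster(indices, ratio=0.67):
    """Split one class's sample indices into (train, test) by position."""
    cut = int(ratio * len(indices))  # int() truncates: 0.67 * 50 -> 33
    return indices[:cut], indices[cut:]

# Class 1 in this post covers samples 0..49.
train_idx, test_idx = split_cluster(list(range(50)))
print(len(train_idx), len(test_idx))  # 33 17
```

Because `int()` truncates, classes of 50, 38 and 62 samples yield 33, 25 and 41 training samples, which matches the counts listed above.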

Decision rule and classifier

Decision strategy: one-vs-one
Classifier: Fisher linear classifier
$g_{12}(x)=W_{12}^Tx-W_{012}$, $g_{13}(x)=W_{13}^Tx-W_{013}$, $g_{23}(x)=W_{23}^Tx-W_{023}$, with $g_{21}(x)=-g_{12}(x)$, $g_{31}(x)=-g_{13}(x)$, $g_{32}(x)=-g_{23}(x)$
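Parameters (shown for the class pair 1, 2; the other pairs are analogous)
Between-class scatter matrix: $S_{12}=(m_1-m_2)(m_1-m_2)^T$
Within-class scatter matrix: $S_i=\sum_{x \in \omega_i}(x-m_i)(x-m_i)^T$ for each class, and $S_w=S_1+S_2$ for the pair
Optimal projection direction: $W^*=S_w^{-1}(m_1-m_2)$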
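As a small worked example of the projection direction, the sketch below computes $S_w^{-1}(m_1-m_2)$ for two classes. It uses plain NumPy arrays and hypothetical toy data (not Iris); the function names `scatter` and `fisher_direction` are illustrative and not part of the original code.

```python
import numpy as np

def scatter(X):
    """Within-class scatter: sum of (x - m)(x - m)^T over the rows of X."""
    centered = X - X.mean(axis=0)
    return centered.T @ centered

def fisher_direction(X1, X2):
    """Return w = S_w^{-1} (m1 - m2) for two classes given as (n, d) arrays."""
    Sw = scatter(X1) + scatter(X2)
    diff = X1.mean(axis=0) - X2.mean(axis=0)
    # Solving the linear system avoids forming the explicit inverse.
    return np.linalg.solve(Sw, diff)

# Hypothetical toy data: two unit squares, centred at (2.5, 2.5) and (0.5, 0.5).
X1 = np.array([[2.0, 2.0], [3.0, 2.0], [2.0, 3.0], [3.0, 3.0]])
X2 = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
w = fisher_direction(X1, X2)
print(w)
```

Here each class has scatter equal to the identity matrix, so $S_w=2I$ and the direction is simply half of the mean difference $(2, 2)$.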
Decision function
Class 1: g12(x)>0 and g13(x)>0
Class 2: g12(x)<0 and g23(x)>0
Class 3: g13(x)<0 and g23(x)<0
Reject: all other cases
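The one-vs-one rule above can be written as a small standalone function. This is a sketch that takes the three pairwise discriminant values as plain numbers; the name `decide` and the use of `None` for a rejection are choices made here, not taken from the original code.

```python
def decide(g12, g13, g23):
    """Map the pairwise discriminant values to class 1, 2 or 3, or None (reject)."""
    if g12 > 0 and g13 > 0:
        return 1
    if g12 < 0 and g23 > 0:
        return 2
    if g13 < 0 and g23 < 0:
        return 3
    return None  # the pairwise decisions contradict each other

print(decide(0.8, 0.3, -0.2))   # 1
print(decide(-0.5, 0.1, 0.4))   # 2
print(decide(0.2, -0.1, 0.3))   # None
```

The last call shows a rejection: class 1 beats class 2, class 3 beats class 1, and class 2 beats class 3, so no class wins both of its pairwise comparisons.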
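In this problem the classifiers above are written as follows, where u01, u02, u03 are the class mean row vectors, l1, l2, l3 are the training-set sizes, and sw12, sw13, sw23 are the pairwise within-class scatter matrices:

g12(x) = (W12.T)*(X.T)-W012, W12=la.inv(sw12)*((u01-u02).T)
W012=(l1*(W12.T)*(u01.T)+l2*(W12.T)*(u02.T))/(l1+l2)
g13(x) = (W13.T)*(X.T)-W013, W13=la.inv(sw13)*((u01-u03).T)
W013=(l1*(W13.T)*(u01.T)+l3*(W13.T)*(u03.T))/(l1+l3)
g23(x) = (W23.T)*(X.T)-W023, W23=la.inv(sw23)*((u02-u03).T)
W023=(l2*(W23.T)*(u02.T)+l3*(W23.T)*(u03.T))/(l3+l2)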
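Read as a formula, each threshold is the average of the two projected class means, weighted by training-set size. For the pair (1, 2), with $l_1$ and $l_2$ the number of training samples, the expression for W012 corresponds to:

$$W_{012}=\frac{l_1\,W_{12}^T m_1+l_2\,W_{12}^T m_2}{l_1+l_2}=W_{12}^T\,\frac{l_1 m_1+l_2 m_2}{l_1+l_2}$$

The second form follows by linearity: the threshold is the projection of the pooled mean of the two training sets, so $g_{12}(x)=0$ exactly at that projected point. The thresholds $W_{013}$ and $W_{023}$ have the same form with the corresponding indices.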

Code

# coding=gbk
# Python version: 3.4.1 (2014-10-17)
import numpy as np
from numpy import linalg as la

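def read_points():
    """Read Iris.txt: one sample per line, feature values separated by whitespace."""
    dataset=[]
    with open('Iris.txt','r') as file:
        for line in file:
            # skip blank lines
            if not line.strip():
                continue
            dataset.append(list(map(float,line.split())))
    # the with-block closes the file, so no explicit close() is needed
    return  dataset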

def generate_traineddata():
    arr=[[] for i in range(3)]
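    with open('setbase.txt','r') as file:
        index=0
        for line in file:
            if line=='\n' :
                continue
            elif line[0]=='C':
                # header line: the character just before the newline is the class number
                index=int(line[-2])-1
                continue
            # every other line holds one sample index for the current class
            arr[index].append(int(line))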
    train=[[] for i in range(3)]
    test=[[] for i in range(3)]
    for i in range(len(arr)):
        tr=int(0.67* len(arr[i]))
        train[i]=arr[i][:tr]
        test[i]=arr[i][tr:]
    print(train)
    print(test)
    # write one index per line, with a blank line after each class
    with open('trained.txt','w') as f1, open('tested.txt','w') as f2:
        for i in range(3):
            for j in train[i]:
                f1.write("%d\n"%j)
            f1.write('\n')
            for k in test[i]:
                f2.write("%d\n"%k)
            f2.write('\n')
    return train,test

def createMatrix(train,test,dataset):
    trainmat=[[] for i in range(3)]
    testmat=[[] for i in range(3)]
    for i in range(3):
        for j in train[i]:
            trainmat[i].append(dataset[j])
        for k in test[i]:
            testmat[i].append(dataset[k])
    return   trainmat,testmat

def classify(trainmat,testmat,test):
    # mean vector of each class's training set
    # (np.asmatrix is used because the np.mat alias was removed in NumPy 2.0)
    tr1,tr2,tr3=np.asmatrix(trainmat[0]),np.asmatrix(trainmat[1]),np.asmatrix(trainmat[2])
    te=[np.asmatrix(testmat[i]) for i in range(3)]
    u01=np.mean(tr1,axis=0)
    u02=np.mean(tr2,axis=0)
    u03=np.mean(tr3,axis=0)
    # number of training samples in each class
    l1,l2,l3=len(trainmat[0]),len(trainmat[1]),len(trainmat[2])
    # within-class scatter matrix of each class
    s1,s2,s3=0,0,0
    for i in range(l1):
        s1=s1+(tr1[i]-u01).T*(tr1[i]-u01)
    for i in range(l2):
        s2=s2+ (tr2[i]-u02).T*(tr2[i]-u02)
    for i in range(l3):
        s3=s3+ (tr3[i]-u03).T*(tr3[i]-u03)
    # total within-class scatter matrix for each pair of classes
    sw12,sw13,sw23=s1+s2,s1+s3, s2+s3
    # projection vectors W* and decision thresholds
    W12=la.inv(sw12)*((u01-u02).T)
    W012=(l1*(W12.T)*(u01.T)+l2*(W12.T)*(u02.T))/(l1+l2)
    W13=la.inv(sw13)*((u01-u03).T)
    W013=(l1*(W13.T)*(u01.T)+l3*(W13.T)*(u03.T))/(l1+l3)
    W23=la.inv(sw23)*(u02-u03).T
    W023=(l2*(W23.T)*(u02.T)+l3*(W23.T)*(u03.T))/(l3+l2)
    result=[[] for i in range(4)]
    for i in range(3):
        testset=te[i]
        count=0
        for X in testset:
            if ((W12.T)*(X.T)-W012>0) and   ((W13.T)*(X.T)-W013>0):
                result[0].append(test[i][count])
            elif  ((W12.T)*(X.T)-W012<0) and   ((W23.T)*(X.T)-W023>0):
                result[1].append(test[i][count])
            elif    ((W13.T)*(X.T)-W013<0) and   ((W23.T)*(X.T)-W023<0):
                result[2].append(test[i][count])
            else:
                result[3].append(test[i][count])
            count=count+1
    str1='Within-class scatter matrix:'
    str2='Projection direction:'
    str3='Threshold:'
    str4='Fisher classification result:'
    # write the parameters and the classification result to fisher.txt
    with open('fisher.txt','w') as fisher:
        print(str1,'s1:',file=fisher)
        print(s1,file=fisher)
        print(str1,'s2:',file=fisher)
        print(s2,file=fisher)
        print(str1,'s3:',file=fisher)
        print(s3,file=fisher)
        print(str2,'\n','W12:\n',W12,file=fisher)
        print('W13:\n',W13,file=fisher)
        print('W23:\n',W23,file=fisher)
        print(str3,'\n','W012:',W012,file=fisher)
        print('W013:',W013,file=fisher)
        print('W023:',W023,file=fisher)
        print(str4,file=fisher)
        for i in range(4):
            # result[3] (printed as class 4) holds the rejected samples
            print('Class %d'%(i+1),result[i],file=fisher)
    return    result


def main():
    dataset=read_points()
    train,test= generate_traineddata()
    trainmat,testmat=createMatrix(train,test,dataset)
    result= classify(trainmat,testmat,test)
    print(result)

if __name__=='__main__':
    main()
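The parser in `generate_traineddata` implies a particular layout for `setbase.txt`: a header line starting with `C` and ending in the class number, followed by one sample index per line. The sketch below illustrates that layout on an in-memory string. The sample text (including the word "Cluster") and the helper name `parse_clusters` are assumptions made for illustration; the original file is not shown in the post.

```python
import io

# Hypothetical file contents. Only the leading 'C' and the trailing class
# digit are implied by the original parser; the word "Cluster" is assumed.
sample = "Cluster 1\n0\n1\n\nCluster 2\n52\n77\n\nCluster 3\n50\n51\n"

def parse_clusters(stream, n_classes=3):
    """Group sample indices under the most recent header line."""
    groups = [[] for _ in range(n_classes)]
    index = 0
    for line in stream:
        line = line.strip()
        if not line:
            continue  # blank separator line
        if line.startswith('C'):
            index = int(line[-1]) - 1  # last character is the class number
            continue
        groups[index].append(int(line))
    return groups

print(parse_clusters(io.StringIO(sample)))  # [[0, 1], [52, 77], [50, 51]]
```

Stripping each line first means the class digit is read from the last character, so the sketch does not depend on the header ending in exactly one newline character.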

Results on the test set

Parameters measured in the experiment:
Within-class scatter matrix s1:
[[4.23636364 3.10090909 0.50545455 0.50545455]
 [3.10090909 4.10060606 0.02030303 0.47363636]
 [0.50545455 0.02030303 1.01515152 0.16181818]
 [0.50545455 0.47363636 0.16181818 0.34181818]]
Within-class scatter matrix s2:
[[6.12   0.948  5.132  0.046 ]
 [0.948  2.5144 0.4916 0.6928]
 [5.132  0.4916 7.0024 1.2592]
 [0.046  0.6928 1.2592 1.6136]]
Within-class scatter matrix s3:
[[9.60097561 3.05414634 5.14512195 1.73707317]
 [3.05414634 4.27512195 2.67926829 1.69756098]
 [5.14512195 2.67926829 7.34439024 2.53463415]
 [1.73707317 1.69756098 2.53463415 1.56878049]]
Projection directions (column vectors):
W12 = [0.00467303, 0.21663412, -0.43497883, -0.71989691]^T
W13 = [-0.01511481, 0.3200074, -0.25327497, -0.55700484]^T
W23 = [0.02024091, -0.05645396, 0.0580067, 0.17693648]^T
Thresholds:
W012 = -1.44922042, W013 = -0.34557374, W023 = 0.53145747
Classification results (no sample was rejected):
Class 1: [33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]
Class 2: [132, 134, 135, 136, 137, 139, 140, 141, 143, 144, 145, 147, 148, 101, 113, 114, 119, 121, 123, 126, 127, 138, 142, 146, 149]
Misclassified: samples 101, 113, 114, 119, 121, 123, 126, 127, 138, 142, 146 and 149 (12 in total) belong to class 3 but were assigned to class 2.
Class 3: [93, 94, 95, 96, 97, 98, 99, 106, 133]
Accuracy:
39/51 × 100% = 76.47%
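The accuracy figure can be checked directly from the index lists reported in this post. The sketch below counts, for each class, how many test indices ended up in the matching predicted list; the variable names are illustrative.

```python
# Test-set indices per class, as listed in the post.
test_sets = [
    list(range(33, 50)),
    [132, 134, 135, 136, 137, 139, 140, 141, 143, 144, 145, 147, 148],
    [93, 94, 95, 96, 97, 98, 99, 101, 106, 113, 114, 119, 121, 123, 126, 127,
     133, 138, 142, 146, 149],
]
# Indices assigned to each class by the classifier, as listed in the post.
predicted_sets = [
    list(range(33, 50)),
    [132, 134, 135, 136, 137, 139, 140, 141, 143, 144, 145, 147, 148,
     101, 113, 114, 119, 121, 123, 126, 127, 138, 142, 146, 149],
    [93, 94, 95, 96, 97, 98, 99, 106, 133],
]

correct = sum(len(set(t) & set(p)) for t, p in zip(test_sets, predicted_sets))
total = sum(len(t) for t in test_sets)
# Class-3 samples that were not predicted as class 3.
misclassified = sorted(set(test_sets[2]) - set(predicted_sets[2]))

print(correct, total, "{:.2%}".format(correct / total))  # 39 51 76.47%
print(misclassified)
```

All errors come from class 3 samples being assigned to class 2; classes 1 and 2 are classified without error on this test set.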
