[Machine Learning 3] SVM: Face Recognition

Latest recommended article published on 2024-05-01 16:26:56

空则无心

Categories: Machine Learning, Python

Article link: https://blog.csdn.net/qq_26489165/article/details/80572185

http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html#sklearn.model_selection.train_test_split

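1. Importing from __future__ import print_function

Python provides the __future__ module, which brings features of a newer Python version into the current one, so they can be tried out before upgrading. Here, print_function makes print behave as the Python 3 function when the script runs under Python 2.

Reference: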

https://www.liaoxuefeng.com/wiki/001374738125095c955c1e6d8bb493182103fac9270762a000/001386820023084e5263fe54fde4e4e8616597058cc4ba1000
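As a minimal sketch of what this import provides: each feature in the __future__ module records the release in which it became optional and the release in which it became the default. The example inspects that record instead of using the from __future__ import statement itself, because that statement must be the first statement of a file and is a no-op on Python 3. The values printed to the buffer are illustrative only.

```python
import __future__
import io

# Every future feature records when it became available and when it became the default.
feature = __future__.print_function
optional_in = feature.getOptionalRelease()[:2]    # first release where the import works
mandatory_in = feature.getMandatoryRelease()[:2]  # release where it is always on
print("optional since", optional_in, "mandatory since", mandatory_in)

# With print as a function, keyword arguments such as sep and file are available.
buffer = io.StringIO()
print("n_samples", 1288, sep=": ", file=buffer)
captured = buffer.getvalue()
```

On Python 2.6 and 2.7, the same print call only works after from __future__ import print_function has been placed at the top of the script.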

2. logging.basicConfig

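import logging

log_level = logging.DEBUG  # assumed value; the original snippet leaves log_level undefined
logging.basicConfig(level=log_level,
                    format='%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s',
                    datefmt='%a, %d %b %Y %H:%M:%S',
                    filename='parser_result.log',
                    filemode='w')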
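Parameters of logging.basicConfig:
filename: name of the log file
filemode: mode used to open the log file, with the same meaning as the mode argument of open(); 'w' or 'a' (the default is 'a')
format: format and content of each record; it can carry a lot of useful information, as in the example above:
 %(levelno)s: numeric value of the log level
 %(levelname)s: name of the log level
 %(pathname)s: full path of the source file that issued the logging call
 %(filename)s: file name of the source file that issued the logging call
 %(funcName)s: name of the function that issued the logging call
 %(lineno)d: line number of the logging call
 %(asctime)s: time of the log record
 %(thread)d: thread ID
 %(threadName)s: thread name
 %(process)d: process ID
 %(message)s: the log message
datefmt: time format, same as time.strftime()
level: log level of the root logger, which defaults to logging.WARNING
stream: stream that receives the log output, such as sys.stderr, sys.stdout or an open file; the default is sys.stderr. In Python 2, stream is ignored when filename is also given; since Python 3.3, passing both raises ValueError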

Functions that emit log messages:

logging.debug('This is debug message')
logging.info('This is info message')
logging.warning('This is warning message')

Reference: https://www.cnblogs.com/felixzh/p/6072417.html
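The interaction between level and format can be checked with a small sketch. It assumes Python 3.8 or later (for the force argument) and writes to an in-memory stream instead of a file so that the records can be inspected; the function name load_faces is made up for the illustration.

```python
import io
import logging

# Send records to an in-memory stream so the result can be inspected.
# force=True (Python 3.8+) replaces any handlers that were configured earlier.
log_stream = io.StringIO()
logging.basicConfig(
    level=logging.INFO,
    format="%(levelname)s %(funcName)s %(message)s",
    stream=log_stream,
    force=True,
)


def load_faces():
    """Hypothetical function that logs at three different levels."""
    logging.debug("This is debug message")      # below INFO, so it is dropped
    logging.info("This is info message")
    logging.warning("This is warning message")


load_faces()
log_lines = log_stream.getvalue().splitlines()
print(log_lines)
```

Only the INFO and WARNING records reach the stream, and %(funcName)s is filled with the name of the function that made the call.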

3. fetch_lfw_people

fetch_lfw_people(data_home=None, funneled=True, resize=0.5, min_faces_per_person=0, color=False, slice_=(slice(70, 195, None), slice(78, 172, None)), download_if_missing=True)

Reference: http://lijiancheng0614.github.io/scikit-learn/modules/generated/sklearn.datasets.fetch_lfw_people.html
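The slice_ and resize arguments together determine the image size, and therefore the number of features per face. The sketch below works through that arithmetic without downloading anything. It assumes the library truncates the scaled size with int(); resized_shape is a hypothetical helper, not part of scikit-learn.

```python
# Default crop from the signature above: rows 70..195 and columns 78..172.
slice_ = (slice(70, 195, None), slice(78, 172, None))


def resized_shape(slice_, resize):
    """Height and width after cropping and scaling (assumes truncation with int)."""
    h_slice, w_slice = slice_
    h = h_slice.stop - h_slice.start   # 125 rows
    w = w_slice.stop - w_slice.start   # 94 columns
    return int(resize * h), int(resize * w)


# resize=0.4 is the value used in the source program of this article.
h, w = resized_shape(slice_, resize=0.4)
n_features = h * w
print(h, w, n_features)
```

With resize=0.4 this gives 50 x 37 pixels, i.e. 1850 features, which agrees with the n_features value reported in the results of this article.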

4. train_test_split

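train_test_split is a function commonly used in cross-validation workflows. It randomly splits the samples into training data and test data according to a given proportion. It is called as follows: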

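# sklearn.cross_validation was removed in scikit-learn 0.20; the function now lives in sklearn.model_selection
X_train, X_test, y_train, y_test = model_selection.train_test_split(train_data, train_target, test_size=0.4, random_state=0)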

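Parameters:

train_data: the feature set to be split

train_target: the labels to be split

test_size: proportion of the samples assigned to the test set; if it is an integer, it is the absolute number of test samples

random_state: the seed of the random number generator.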
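The parameters above can be tried on a small array. This is a minimal sketch that assumes scikit-learn 0.20 or later (where the function is imported from sklearn.model_selection); the toy data is made up so that row i of the features always starts with 2*i, which makes it easy to see that features and labels stay aligned.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples with 2 features each, and one label per sample.
train_data = np.arange(20).reshape(10, 2)
train_target = np.arange(10)

# A float test_size is a proportion: 40% of the 10 samples go to the test set.
X_train, X_test, y_train, y_test = train_test_split(
    train_data, train_target, test_size=0.4, random_state=0)
print(X_train.shape, X_test.shape)

# An integer test_size is an absolute number of test samples.
X_train3, X_test3, y_train3, y_test3 = train_test_split(
    train_data, train_target, test_size=3, random_state=0)
print(X_train3.shape, X_test3.shape)

# Rows of the features and the labels are shuffled together.
aligned = bool(np.array_equal(X_test[:, 0], 2 * y_test))
```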

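Random seed: in effect an identifier for one particular sequence of random numbers. It guarantees the same random numbers whenever an experiment has to be repeated. For example, if you pass 1 every time and keep the other parameters unchanged, you get the same split every time. The same holds for any fixed integer, including 0. Only when random_state is left unset (None) does the split differ from run to run.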

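The random numbers produced depend on the seed, and the relationship between them follows two rules:

Different seeds produce different random numbers; the same seed produces the same random numbers, even across different generator instances.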
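The two rules can be illustrated with Python's standard random module, used here as a stand-in for the generator inside scikit-learn. The helper draw is hypothetical and creates a new generator instance on every call.

```python
import random


def draw(seed, count=5):
    """Return `count` pseudo-random integers from a fresh generator instance."""
    rng = random.Random(seed)  # a separate instance on every call
    return [rng.randint(0, 99) for _ in range(count)]


same_a = draw(seed=1)
same_b = draw(seed=1)   # different instance, same seed
zero_a = draw(seed=0)
zero_b = draw(seed=0)   # 0 is a fixed seed like any other
other = draw(seed=2)    # different seed

print(same_a == same_b, zero_a == zero_b, same_a == other)
```

The two draws with seed 1 match, and so do the two draws with seed 0, while the draw with seed 2 gives a different sequence.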

5. plt.figure() & plt.subplot()

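plt.figure(): creates a figure; its figsize argument sets a custom canvas size

plt.subplot(): divides the canvas into a grid and selects the position on the canvas where the image is drawn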

Reference: https://www.cnblogs.com/laoniubile/p/5893286.html
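A short sketch of how the two calls work together, using the same figure-size formula as the plot_gallery function in the source program. It assumes matplotlib is installed and selects the non-interactive Agg backend so that no window is needed; the 2 x 3 grid and the cell titles are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt

n_row, n_col = 2, 3
# figsize is given in inches as (width, height).
fig = plt.figure(figsize=(1.8 * n_col, 2.4 * n_row))
for i in range(n_row * n_col):
    # subplot(rows, cols, index) uses a 1-based index and fills the grid row by row.
    ax = plt.subplot(n_row, n_col, i + 1)
    ax.set_title("cell %d" % (i + 1), size=12)
    ax.set_xticks(())   # empty tuple: no ticks on the x axis
    ax.set_yticks(())   # empty tuple: no ticks on the y axis

n_axes = len(fig.axes)
width, height = fig.get_size_inches()
plt.close(fig)
```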

6. Source code:

from __future__ import print_function
# time() is used to time some of the steps
from time import time
# Used to print progress messages
import logging
# Plotting package, used at the end to display the predicted faces
import matplotlib.pyplot as plt

# sklearn.cross_validation and sklearn.grid_search were removed in scikit-learn 0.20;
# train_test_split and GridSearchCV now live in sklearn.model_selection
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_lfw_people
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn import svm

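# Print log messages
logging.basicConfig(level=logging.INFO, format='%(asctime)s%(message)s')
# Download the dataset: LFW (Labeled Faces in the Wild)
# min_faces_per_person: int, optional; keep only the people who have at least min_faces_per_person different pictures
# resize: ratio used to rescale each face picture, default 0.5
lfw_people = fetch_lfw_people(min_faces_per_person=70, resize=0.4)
# Number of instances in the dataset, plus the image height h and width w
n_samples, h, w = lfw_people.images.shape
# X holds the feature vectors of all instances in the dataset:
# each row is an instance and each column is a feature
X = lfw_people.data
# X.shape returns the number of rows and columns;
# X.shape[1] is the number of columns, i.e. the dimension of each feature vector
n_features = X.shape[1]
# y holds the class label of each instance,
# i.e. the identity of the person in the picture
y = lfw_people.target
# Names of the people in the dataset, returned as an array
target_names = lfw_people.target_names
# shape[0] is the number of rows, i.e. the number of people (classes)
n_classes = target_names.shape[0]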

print("Total dataset size:")  # summary of the dataset
print("n_samples:%d" % n_samples)  # number of samples, 1288
print("n_features:%d" % n_features)  # number of features (dimension), 1850
print("n_classes:%d" % n_classes)  # number of classes, i.e. how many people

# Split the data into a training set and a test set with train_test_split
# test_size=0.25 randomly assigns 25% of the samples to the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

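# Reduce the dimensionality with PCA: the raw feature vectors are very high-dimensional, which makes training the model very expensive
# Number of components to keep, i.e. the number of features retained
n_components = 150

print("Extracting the top %d eigenfaces from %d faces" % (n_components, X_train.shape[0]))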
# Start time
t0 = time()
# Fit the dimensionality reduction on the training set
pca = PCA(n_components=n_components, whiten=True).fit(X_train)
print("pca done in %0.3fs" % (time() - t0))
# The principal components, reshaped back to the image size, are called eigenfaces
eigenfaces = pca.components_.reshape((n_components, h, w))

print("projecting the input data on the eigenfaces orthonormal basis")
t0 = time()
# Project the training feature vectors onto the lower-dimensional space
X_train_pca = pca.transform(X_train)
# Project the test feature vectors onto the lower-dimensional space
X_test_pca = pca.transform(X_test)
print("done in %0.3fs" % (time() - t0))

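# Train a support vector machine classification model, i.e. build the classifier
print("Fitting the classifier to the training set")
t0 = time()

# C is the penalty applied to misclassified samples
# gamma is the coefficient of the RBF kernel: it controls how far the influence of a single training sample reaches
# Try several values of C and gamma, search over them, and pick the model with the highest accuracy
param_grid = {
    'C': [1e3, 5e3, 1e4, 5e4, 1e5],
    'gamma': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.1]
    }
# Use the SVM classifier and search for the parameter combination with the best classification accuracy
# kernel: rbf, the Gaussian radial basis function kernel; class_weight: per-class weights
# Every combination of the listed parameters is evaluated with SVC to find the best-performing one
clf = GridSearchCV(svm.SVC(kernel='rbf', class_weight='balanced'), param_grid=param_grid)
clf = clf.fit(X_train_pca, y_train)
print("fit done in %0.3fs" % (time() - t0))
print("Best estimator found by grid search:")
print(clf.best_estimator_)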

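################## Evaluate the accuracy ######################
print("Predicting people's names on the test set")
t0 = time()
# Predict the class of each test sample
y_pred = clf.predict(X_test_pca)
print("done in %0.3fs" % (time() - t0))
# classification_report shows the per-class precision, recall and f1-score of the predictions
print(classification_report(y_test, y_pred, target_names=target_names))
# confusion_matrix builds an n*n grid: rows are the true labels and columns are the predicted labels
# The diagonal holds the correctly classified counts; the larger the diagonal values, the higher the accuracy
print(confusion_matrix(y_test, y_pred, labels=range(n_classes)))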

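# Show the labelled test samples; first define a generic image gallery function:
def plot_gallery(images, titles, h, w, n_row=3, n_col=4):
    """Helper function to plot a gallery of portraits"""
    # Create the figure that serves as the background,
    # with a custom canvas size
    plt.figure(figsize=(1.8 * n_col, 2.4 * n_row))
    # Adjust the subplot layout
    plt.subplots_adjust(bottom=0, left=.01, right=.99, top=.90, hspace=.35)
    for i in range(n_row * n_col):
        # Divide the canvas into a grid and select where the image is drawn
        plt.subplot(n_row, n_col, i + 1)
        # Display the image on the axes
        plt.imshow(images[i].reshape((h, w)), cmap=plt.cm.gray)
        # Title of the current subplot
        plt.title(titles[i], size=12)
        # Passing empty tuples removes the ticks and labels of the x and y axes
        plt.xticks(())
        plt.yticks(())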
        
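# Build a title that compares the predicted label with the true label,
# using the surnames of the predicted person and the actual person
def title(y_pred, y_test, target_names, i):

    # rsplit(' ', 1) splits once at the last space, counting from the right,
    # and returns a list of two strings
    # Here it separates the given names from the surname
    # The trailing [-1] takes the last element of the list, i.e. the surname
    pred_name = target_names[y_pred[i]].rsplit(' ', 1)[-1]
    true_name = target_names[y_test[i]].rsplit(' ', 1)[-1]
    return 'predicted:%s\ntrue:  %s' % (pred_name, true_name)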

# Titles with the predicted and true names
prediction_titles = [title(y_pred, y_test, target_names, i)
                     for i in range(y_pred.shape[0])]
# Plot the test-set faces together with their predicted and true names
plot_gallery(X_test, prediction_titles, h, w)
# Titles for the eigenfaces
eigenface_titles = ["eigenface %d" % i for i in range(eigenfaces.shape[0])]
# Plot the eigenfaces, i.e. the components extracted by PCA
plot_gallery(eigenfaces, eigenface_titles, h, w)

plt.show()
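The same PCA, SVM and grid search steps can be tried quickly without downloading LFW. The sketch below is a variation, not the program above: it assumes scikit-learn 0.20 or later, uses the bundled 8x8 digits dataset as a stand-in for the faces, and chains PCA and SVC in a Pipeline so that PCA is refitted inside every cross-validation fold. The component count and the small parameter grid are arbitrary choices made to keep the run short.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Assumption: the bundled digits dataset stands in for the LFW faces.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Chaining PCA and SVC lets the grid search refit PCA on each training fold.
pipeline = Pipeline([
    ("pca", PCA(n_components=20, whiten=True, random_state=0)),
    ("svc", SVC(kernel="rbf", class_weight="balanced")),
])
# A deliberately small, hypothetical grid so the sketch runs in seconds.
param_grid = {"svc__C": [1, 10, 100], "svc__gamma": [0.01, 0.1]}

search = GridSearchCV(pipeline, param_grid=param_grid, cv=3)
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)
print(best_params, round(test_accuracy, 3))
```

Fixing random_state in both train_test_split and PCA makes the run repeatable, which the program above does not do.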

7. Results:

2018-06-04 19:52:25,306Loading LFW people faces from C:\Users\Administer\scikit_learn_data\lfw_home
Total dataset size:
n_samples:1288
n_features:1850
n_classes:7
Exreacting the top 150 eigenfaces from 966.000000aces
pca done in 0.332s
projecting the input data on the eigenfaces orthonormal basis
done in 0.294s
Fitting the classifier to the training set
fit done in 13.494s
Best estimator found by grid search:
SVC(C=1000.0, cache_size=200, class_weight='balanced', coef0=0.0,
  decision_function_shape=None, degree=3, gamma=0.001, kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)
Predicting people's names on the test set
done in 0.038s
                   precision    recall  f1-score   support

     Ariel Sharon       0.80      0.84      0.82        19
     Colin Powell       0.77      0.91      0.83        55
  Donald Rumsfeld       0.88      0.79      0.84        29
    George W Bush       0.87      0.89      0.88       138
Gerhard Schroeder       0.88      0.79      0.84        29
      Hugo Chavez       0.90      0.64      0.75        14
       Tony Blair       0.82      0.74      0.78        38

      avg / total       0.85      0.84      0.84       322

[[ 16   2   1   0   0   0   0]
 [  0  50   0   5   0   0   0]
 [  1   2  23   3   0   0   0]
 [  2   6   1 123   1   1   4]
 [  0   2   0   4  23   0   0]
 [  0   2   0   1   0   9   2]
 [  1   1   1   5   2   0  28]]
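The classification report and the confusion matrix describe the same predictions, so the report can be recomputed from the matrix. The sketch below assumes NumPy is installed and copies the matrix from the output above, with rows as true labels and columns as predicted labels.

```python
import numpy as np

# Confusion matrix copied from the output above:
# rows are the true labels, columns are the predicted labels.
cm = np.array([
    [16,  2,  1,   0,  0, 0,  0],
    [ 0, 50,  0,   5,  0, 0,  0],
    [ 1,  2, 23,   3,  0, 0,  0],
    [ 2,  6,  1, 123,  1, 1,  4],
    [ 0,  2,  0,   4, 23, 0,  0],
    [ 0,  2,  0,   1,  0, 9,  2],
    [ 1,  1,  1,   5,  2, 0, 28],
])

correct = np.diag(cm)                  # correctly classified samples per class
support = cm.sum(axis=1)               # true samples per class (row sums)
precision = correct / cm.sum(axis=0)   # correct / everything predicted as the class
recall = correct / support             # correct / everything truly of the class
accuracy = correct.sum() / cm.sum()    # 272 correct out of 322 test samples

print(np.round(precision, 2))
print(np.round(recall, 2))
print(round(accuracy, 3))
```

The row sums reproduce the support column of the report, and the rounded precision and recall values match its first two columns. The overall accuracy is 272/322, about 0.845.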