Keras/TensorFlow multi-input Siamese network for image/face recognition in Python


Fu_Connor



Original link: https://blog.csdn.net/qq_42686550/article/details/117124480


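This post walks through using a Siamese network for face recognition: why multi-class classification is not a good fit, how a Siamese network works, and the problems with the official Keras example. The data handling is improved, for example by reading images from disk in batches instead of loading everything into memory, and a double generator is built to keep positive and negative samples balanced. A pretrained ResNet model is used to speed up convergence, and the trained model is validated. Finally, the post shows the model's performance on unseen faces and the gap between the scores of same-person and different-person pairs.

(Summary generated automatically by CSDN.)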

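I have recently been studying Siamese networks in Keras. Keras provided an official example, but its GitHub page now returns a 404, so I had to obtain the example elsewhere. This post builds on that example with substantial modifications to bring it closer to real-world conditions. The full code is available on my GitHub:
https://github.com/Connor666/Face_Siamese_network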

Why use a Siamese network

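First, image matching is not a multi-class classification problem.

Multi-class classification is, at its core, a softmax: given an input image, the model computes a probability for each class and, following maximum likelihood, outputs the most probable class as the result.

That does not match the logic of image matching. A query image need not belong to any of the trained classes; what we actually need is its distance to, or similarity with, each reference image.

This is why we use a Siamese network as the model.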
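To make the argument concrete, here is a small NumPy sketch (not from the original project). The logits are made-up numbers standing in for a 3-class classifier's output on a face that belongs to none of the three enrolled people: softmax still spreads all probability over the known classes, so argmax always names one of them. The hypothetical verify() helper instead compares an embedding distance with a threshold (0.5, the value this post uses later), which can answer "no match".

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

# Made-up logits of a 3-class classifier for the face of an unknown 4th person
logits_unknown = np.array([1.2, 0.9, 1.0])
probs = softmax(logits_unknown)
print(probs.sum())            # sums to 1 (up to rounding): all mass goes to known classes
print(int(np.argmax(probs)))  # still picks one of the 3 known classes

def verify(emb_a, emb_b, threshold=0.5):
    # Hypothetical distance-based matcher: "same person" only if embeddings are close
    return float(np.linalg.norm(emb_a - emb_b)) < threshold

print(verify(np.zeros(3), np.ones(3)))  # False: distance sqrt(3) is above 0.5
```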

What is a Siamese network

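A Siamese network passes both inputs through the same network, with shared weights, to compress them into feature vectors.

The input is a pair of images, and the Euclidean distance between the two resulting feature vectors is computed.

Training minimizes a contrastive loss by gradient descent.

The network itself is not complicated; the key is the contrastive loss.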
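A minimal NumPy sketch of the idea, assuming a toy "shared network" (one random linear layer with ReLU, the hypothetical W and embed()) in place of the real CNN: both inputs go through the same function with the same weights, and only then is the Euclidean distance taken. Because the weights are shared, identical inputs always give distance 0 and the distance is symmetric in its two arguments.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical shared weights: ONE matrix used by both branches
W = rng.normal(size=(16, 4))

def embed(x):
    # Toy feature extractor standing in for the shared CNN
    return np.maximum(x @ W, 0.0)

def siamese_distance(x1, x2):
    # Both inputs pass through the same embed(), then Euclidean distance
    return float(np.linalg.norm(embed(x1) - embed(x2)))

a = rng.normal(size=16)
b = rng.normal(size=16)
print(siamese_distance(a, a))                            # 0.0
print(siamese_distance(a, b) == siamese_distance(b, a))  # True
```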

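[Figure: the contrastive loss formula]
Core formula: the contrastive loss, where d is the predicted Euclidean distance and y is the label (1 for a same-person pair, 0 for a different-person pair).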
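The formula image is not available in this copy. The following is a reconstruction written to agree with the contrastive_loss function used later in this post (margin m = 1, mean over a batch of N pairs, f the shared network); the original paper's notation differs slightly, for example in constant factors, so treat it as the form the code implements rather than a verbatim copy of the figure.

```latex
L = \frac{1}{N}\sum_{n=1}^{N}\Big[\, y_n\, d_n^{2} + (1 - y_n)\,\max(m - d_n,\ 0)^{2} \Big],
\qquad d_n = \big\lVert f(x_{1,n}) - f(x_{2,n}) \big\rVert_2
```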

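When y = 1, i.e. for a same-person pair, the right-hand term is 0, so the smaller the distance, the smaller the loss, which is what we want.

When y = 0, i.e. for a different-person pair, the left-hand term is 0, so the larger the distance, the smaller the loss (once the distance exceeds the margin, that term is 0).

In short, the contrastive loss rewards small distances for same-person pairs and large distances for different-person pairs.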
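This behaviour can be checked numerically. The NumPy function below is a sketch that mirrors the contrastive_loss Keras function used later in this post (margin 1, mean over the batch); the name contrastive_loss_np and the sample distances are illustrative assumptions.

```python
import numpy as np

def contrastive_loss_np(y, d, margin=1.0):
    # y: 1 for same-person pairs, 0 for different-person pairs
    # d: predicted Euclidean distances
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=float)
    same_term = y * d ** 2                                   # left-hand term
    diff_term = (1 - y) * np.maximum(margin - d, 0.0) ** 2   # right-hand term
    return float(np.mean(same_term + diff_term))

print(contrastive_loss_np([1], [0.1]))  # same pair, small distance: about 0.01
print(contrastive_loss_np([1], [0.9]))  # same pair, large distance: about 0.81
print(contrastive_loss_np([0], [0.1]))  # different pair, small distance: about 0.81
print(contrastive_loss_np([0], [1.5]))  # different pair beyond the margin: 0.0
```

A different-person pair that is already farther apart than the margin contributes nothing, so the loss stops pushing such pairs apart.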

Problems with the official Keras example

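First, for a recognition system the official Keras example does not follow real-world logic. In practice the model is trained beforehand and then deployed, not trained on the end device. Taking faces as an example, the training set should contain many people's faces and the test set should contain other people's faces. Many online tutorials instead hold out some images from every class as the test set and train on the rest. That inevitably gives high scores, because every class has been seen during training; a realistic setup trains on some classes and tests on entirely different classes.
In addition, the online examples share another serious problem: all images are loaded into memory. This is unsuitable, because a large image set should be read from disk in batches rather than held in memory, first because it wastes memory, and second because a large dataset simply will not fit.
With these two points in mind, let us start the project.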
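Splitting by identity rather than by image can be sketched as follows. The folder names s1 to s40 are a hypothetical stand-in for ORL's 40 subjects, and split_by_identity is an illustrative helper, not part of the project; it holds out the last n classes in list order, which matches the 37/3 split used later in this post.

```python
def split_by_identity(class_names, n_test_classes):
    # Hold out whole identities: no person appears in both sets
    train = list(class_names[:-n_test_classes])
    test = list(class_names[-n_test_classes:])
    return train, test

# Hypothetical class folders, one per person (ORL has 40 subjects)
classes = [f"s{i}" for i in range(1, 41)]
train_cls, test_cls = split_by_identity(classes, 3)
print(len(train_cls), len(test_cls))   # 37 3
print(set(train_cls) & set(test_cls))  # set(): the two sets share no person
```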

Image preprocessing

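The images come from the ORL dataset. ORL is not a difficult dataset, and most networks can reach over 90% accuracy on it. Our goal is not only accuracy but also learning and implementing the approach end to end, so we train the network on this dataset.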

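The images are in PGM format. The format affects neither training nor loading, but its biggest drawback is that PGM files cannot be previewed in a folder on Windows, which is inconvenient. We therefore convert the images to PNG so they can be previewed directly in the folder.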

import os
import cv2 as cv

train_dir = 'data/faces_siamese/Training'
test_dir = 'data/faces_siamese/Testing'

orig_train = 'data/faces/Training'
orig_test = 'data/faces/Testing'


def pgmtopng(originalpath, path):
    # makedirs also creates missing parent folders such as data/faces_siamese
    os.makedirs(path, exist_ok=True)
    for i in os.listdir(originalpath):
        # Create one sub-folder per class (person) in the target directory
        os.makedirs(os.path.join(path, i), exist_ok=True)

        # Convert every image of this class to PNG in the target directory
        for j in os.listdir(os.path.join(originalpath, i)):
            img = cv.imread(os.path.join(originalpath, i, j))
            if img is None:  # skip files OpenCV cannot read
                continue
            imgname = os.path.splitext(j)[0]
            cv.imwrite(os.path.join(path, i, imgname + '.png'), img)


pgmtopng(orig_train, train_dir)
pgmtopng(orig_test, test_dir)

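The images are converted to PNG and simply split into a training set (the first 37 classes) and a test set (the last 3 classes). There is no validation set; in principle one is needed, but with so little data I left it out for simplicity. Feel free to try adding one.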

Reading data from disk in batches

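The data-loading code is based on the official Keras Siamese example. Personally I don't think it is well written either: it does not iterate over all possible pair combinations, although this does not affect the results. We keep its structure and modify it, using yield so that images are read from disk batch by batch, which avoids running out of memory once the image set becomes large.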

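Why not use flow_from_directory from ImageDataGenerator?
Keras already has a built-in way to read images from disk in batches, so why write our own? The built-in generator does read images from disk, but it yields single images, which cannot be fed directly into a two-input Siamese model. Combining two such generators into one works in principle, but the built-in generators shuffle randomly, and randomly pairing their outputs produces unbalanced samples: the proportion of different-person pairs will be very high, which is bad for learning. If the Keras generators are set to a non-random order instead, the images come out in a fixed consecutive sequence and the number of distinct pairs is very limited; combined with our small dataset, this is not suitable either.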
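A quick calculation shows how unbalanced random pairing would be. Assuming the training set used here (37 classes with 10 images each, 370 images in total), the chance that a randomly drawn second image shows the same person as the first is (10 - 1) / (370 - 1); positive_pair_rate is an illustrative helper.

```python
from fractions import Fraction

def positive_pair_rate(num_classes, imgs_per_class):
    # Probability that a second image drawn at random (without replacement)
    # from the whole set has the same identity as the first one
    total = num_classes * imgs_per_class
    return Fraction(imgs_per_class - 1, total - 1)

rate = positive_pair_rate(37, 10)
print(rate, float(rate))  # 1/41, about 2.4%
```

That is roughly 2.4% positive pairs, compared with the 50% that an alternating positive/negative scheme produces by construction.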

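import os
import random

import cv2 as cv
import numpy as np

train_dir = 'data/faces_siamese/Training'
test_dir = 'data/faces_siamese/Testing'

def read_image(imageName):
    # Read one image from disk, resize to 64x64 and scale pixels to [0, 1]
    im = cv.imread(imageName)
    im = cv.resize(im, (64, 64)) / 255.
    return im.astype('float32')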

def loadData(datadir):
    images_path=[]
    labels=[]
    count=0
    for i in os.listdir(datadir):
        count+=1
        for fn in os.listdir(os.path.join(datadir, str(i))):
            if fn.endswith('.png'):
                fd = os.path.join(datadir, str(i), fn)
                images_path.append(fd)
                labels.append(str(i))
    return images_path,np.array(labels),count


def doubleGenerator(x, digit_indices, num_classes, batch_size):
    '''Positive and negative pair creation.
    Alternates between positive and negative pairs.
    Note: count is incremented once per (positive, negative) couple,
    so each yielded batch actually contains 2 * batch_size pairs.
    '''
    img1list = []
    img2list = []
    labels = []
    count = 0
    # Pairs per class are limited by the smallest class
    n = min([len(digit_indices[d]) for d in range(num_classes)]) - 1
    while True:
        for d in range(num_classes):
            for i in range(n):
                # Positive pair: two consecutive images of class d
                z1, z2 = digit_indices[d][i], digit_indices[d][i + 1]
                img1, img2 = read_image(x[z1]), read_image(x[z2])
                img1list.append(img1)
                img2list.append(img2)
                # Negative pair: image i of class d and image i of a random other class dn
                inc = random.randrange(1, num_classes)
                dn = (d + inc) % num_classes
                z1, z2 = digit_indices[d][i], digit_indices[dn][i]
                img1, img2 = read_image(x[z1]), read_image(x[z2])
                img1list.append(img1)
                img2list.append(img2)
                labels += [1., 0.]  # always one positive and one negative pair
                count += 1
                if count == batch_size:  # yield once batch_size couples are collected
                    count = 0
                    yield [np.array(img1list), np.array(img2list)], np.array(labels)
                    img1list = []
                    img2list = []
                    labels = []

# Path of every image, its label, and the total number of classes
x_train, y_train, train_classes = loadData(train_dir)
x_test, y_test, test_classes = loadData(test_dir)

# For each class, the indices of the images carrying that label
digit_indices_train = [np.where(y_train == i)[0] for i in os.listdir(train_dir)]
digit_indices_test = [np.where(y_test == i)[0] for i in os.listdir(test_dir)]

With that, our generator is ready.
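To see the pairing scheme of doubleGenerator without any image I/O, here is a sketch that applies the same index logic to a toy label array and returns index pairs instead of images. pair_indices, the toy labels and the fixed seed are illustrative assumptions, not part of the project.

```python
import random

import numpy as np

def pair_indices(digit_indices, num_classes, seed=0):
    # Same pairing logic as doubleGenerator, but returns index pairs and labels
    rng = random.Random(seed)
    n = min(len(digit_indices[d]) for d in range(num_classes)) - 1
    pairs, labels = [], []
    for d in range(num_classes):
        for i in range(n):
            # Positive pair: two consecutive images of class d
            pairs.append((digit_indices[d][i], digit_indices[d][i + 1]))
            # Negative pair: image i of class d and image i of another class dn
            dn = (d + rng.randrange(1, num_classes)) % num_classes
            pairs.append((digit_indices[d][i], digit_indices[dn][i]))
            labels += [1.0, 0.0]
    return pairs, labels

# Toy labels: 3 classes ("a", "b", "c") with 4 images each
y = np.array(["a"] * 4 + ["b"] * 4 + ["c"] * 4)
digit_indices = [np.where(y == c)[0] for c in ["a", "b", "c"]]
pairs, labels = pair_indices(digit_indices, 3)
print(len(pairs), sum(labels))  # 18 pairs, 9 of them positive
```

Every positive pair shares a label and every negative pair does not: the offset is drawn from 1 to num_classes - 1, so dn can never equal d.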

Building the Siamese network

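Our training set is small, 37 classes with 370 images in total, so we build the network on a pretrained model (ResNet50). This greatly speeds up convergence and gives good results on a small dataset. Training without a pretrained model can also converge to a decent result, but here we use the pretrained one.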

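from tensorflow.keras import backend as K
from tensorflow.keras import layers
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import Input, Lambda
from tensorflow.keras.models import Model

def euclidean_distance(vects):
    x, y = vects
    sum_square = K.sum(K.square(x - y), axis=1, keepdims=True)
    # The epsilon floor keeps sqrt differentiable when the distance is 0
    return K.sqrt(K.maximum(sum_square, K.epsilon()))

def eucl_dist_output_shape(shapes):
    shape1, shape2 = shapes
    return (shape1[0], 1)

def contrastive_loss(y_true, y_pred):
    '''Contrastive loss from Hadsell-et-al.'06
    http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf
    '''
    margin = 1.
    y_true = K.cast(y_true, y_pred.dtype)
    square_pred = K.square(y_pred)
    margin_square = K.square(K.maximum(margin - y_pred, 0))
    return K.mean(y_true * square_pred + (1 - y_true) * margin_square)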


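def base_net(input_tensor_shape):
    '''Base network to be shared (eq. to feature extraction).
    '''
    inputs = Input(input_tensor_shape)
    # ImageNet-pretrained ResNet50 without its classification head, frozen.
    # Note: read_image only scales pixels to [0, 1]; it does not apply
    # resnet50.preprocess_input, which the ImageNet weights were trained with.
    conv_base = ResNet50(weights='imagenet',
                         include_top=False,
                         input_shape=input_tensor_shape)
    conv_base.trainable = False
    net = conv_base(inputs)
    net = layers.Flatten()(net)
    # net = layers.Dropout(0.1)(net)
    net = layers.Dense(512, activation='relu')(net)
    return Model(inputs, net)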


def accuracy(y_true, y_pred):  # operates on Tensors
    '''Compute classification accuracy with a fixed threshold on distances.
    '''
    return K.mean(K.equal(y_true, K.cast(y_pred < 0.5, y_true.dtype)))

def compute_accuracy(y_true, y_pred):  # operates on NumPy arrays
    '''Compute classification accuracy with a fixed threshold on distances.
    '''
    pred = y_pred.ravel() < 0.5
    return np.mean(pred == y_true)

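A word on this accuracy: the model does not output a label but a distance. The smaller the distance, the more likely the two images belong to the same class, so a Euclidean distance below 0.5 is treated as "same class". This accuracy does not affect training; only the loss function does.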
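A tiny worked example of the thresholding, using a NumPy version of compute_accuracy from above with the threshold exposed as a parameter (that parameter is an addition; the post hard-codes 0.5). The distances are made-up values.

```python
import numpy as np

def compute_accuracy(y_true, y_pred, threshold=0.5):
    # A distance below the threshold means "same class" (True), otherwise "different"
    pred = y_pred.ravel() < threshold
    return np.mean(pred == y_true)

y_true = np.array([1.0, 1.0, 0.0, 0.0])             # 1 = same person, 0 = different
distances = np.array([[0.2], [0.7], [1.3], [0.4]])  # shape (N, 1), like the model output
print(compute_accuracy(y_true, distances))  # 0.5: pairs 1 and 3 are right, 2 and 4 wrong
```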

input_a=Input(shape=(64,64,3))
input_b=Input(shape=(64,64,3))

base_network=base_net((64,64,3))
processed_a=base_network(input_a)
processed_b=base_network(input_b)

distance = Lambda(euclidean_distance,output_shape=eucl_dist_output_shape)([processed_a, processed_b])
model = Model([input_a, input_b], distance)
model.summary()

A very simple Siamese network, yet it performs well.

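from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau
from tensorflow.keras.optimizers import Adam

callbacks_list = [
    ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=10, verbose=1),
    ModelCheckpoint(filepath='model_save.h5', monitor='val_loss', save_best_only=True),
]

# TF 2.x uses learning_rate instead of the deprecated lr argument
model.compile(optimizer=Adam(learning_rate=0.001), loss=contrastive_loss, metrics=[accuracy])
# No batch_size argument: with a generator, the generator sets the batch size
# (8 couples, i.e. 16 pairs per batch here)
history = model.fit(
    doubleGenerator(x_train, digit_indices_train, train_classes, 8),
    steps_per_epoch=20,
    validation_data=doubleGenerator(x_test, digit_indices_test, test_classes, 8),
    validation_steps=4,
    callbacks=callbacks_list,
    epochs=40)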

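Once training starts, the test set reaches high accuracy very early, most likely thanks to the pretrained model. The test loss here is lower than the training loss because Keras reports the training loss as the average over all batches of the epoch, whereas the test (validation) loss is computed once, with the weights at the end of the epoch.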
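The averaging effect can be illustrated with made-up numbers. Assuming the per-batch training loss falls steadily during one epoch, the epoch average is larger than the loss at the end of the epoch, which is closer to what the validation pass sees. Using the last batch's loss as a proxy for the end-of-epoch weights is a simplification for illustration only.

```python
import numpy as np

# Made-up per-batch training losses within one epoch, falling steadily
batch_losses = np.array([0.30, 0.25, 0.20, 0.15, 0.10])

# Training loss as reported for the epoch: the mean over all its batches
reported_train_loss = batch_losses.mean()
# Rough proxy for the end-of-epoch model that the validation pass uses
end_of_epoch_loss = batch_losses[-1]

print(reported_train_loss, end_of_epoch_loss)  # about 0.2 vs 0.1
```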

Validating the model

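On unseen faces the model still reaches 95% accuracy. The model and the matching logic could be optimized further; I'll stop here and leave some room for improvement.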

# evaluate_generator is deprecated in TF 2.x; model.evaluate accepts generators directly
test_loss, test_acc = model.evaluate(doubleGenerator(x_test, digit_indices_test, test_classes, 16), steps=2)
print('test acc:', test_acc)

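The histogram shows a clear gap between the scores (predicted distances) of same-person and different-person pairs, so the model is effective.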
[Figure: histogram of predicted distances for within-class and between-class pairs]

import matplotlib.pyplot as plt
import numpy as np

count = 0
NIRA = []  # distances of different-person (between-class) pairs
NGRA = []  # distances of same-person (within-class) pairs
for (img1, img2), labels in doubleGenerator(x_test, digit_indices_test, test_classes, 16):
    # Predict all pairs of the batch at once instead of one pair at a time
    result = model.predict([img1, img2], verbose=0).ravel()
    NGRA.extend(result[labels == 1])
    NIRA.extend(result[labels == 0])
    count += 1
    if count == 2:  # use only two batches
        break

bins = np.linspace(0, 2, 100)
plt.hist([NIRA, NGRA], bins, label=['Between_classes', 'Within_classes'])
plt.legend(loc='upper right')
plt.show()
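As one possible direction for improving the matching logic, the fixed 0.5 threshold could be chosen from the two distance lists instead. The sketch below is not from the original project: best_threshold is a hypothetical helper that sweeps candidate thresholds and keeps the one with the highest accuracy on genuine (NGRA) and impostor (NIRA) distances, and the toy numbers are illustrative. In practice the threshold should be picked on a separate validation set rather than on the pairs used for reporting.

```python
import numpy as np

def best_threshold(genuine, impostor, candidates):
    # genuine: distances of same-person pairs (like NGRA)
    # impostor: distances of different-person pairs (like NIRA)
    genuine = np.asarray(genuine)
    impostor = np.asarray(impostor)
    best_t, best_acc = None, -1.0
    for t in candidates:
        # A pair is classified correctly if genuine < t or impostor >= t
        correct = np.sum(genuine < t) + np.sum(impostor >= t)
        acc = correct / (len(genuine) + len(impostor))
        if acc > best_acc:
            best_t, best_acc = float(t), float(acc)
    return best_t, best_acc

# Toy distances standing in for the NGRA and NIRA lists
t, acc = best_threshold([0.1, 0.2, 0.3], [0.8, 0.9, 1.2], np.linspace(0, 2, 101))
print(t, acc)
```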