Knowledge Distillation with Keras: Classification and Regression

Latest recommended article published on 2025-04-13 17:59:05

炸鸡配啤酒


Tags: keras, classification, artificial intelligence

Article link: https://blog.csdn.net/qq_40037127/article/details/129778897


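This article introduces knowledge distillation, in which a small student model is trained to reproduce the behavior of a large teacher model. In TensorFlow and Keras, a custom Distiller class uses loss functions and a temperature parameter to soften the probability distributions and thereby transfer the teacher's knowledge to the student. In the experiment, both the teacher and the student are convolutional neural networks trained and evaluated on the MNIST dataset, showing that knowledge distillation can improve the student model's performance.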


Knowledge Distillation

Introduction to Knowledge Distillation

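Knowledge distillation is a model compression procedure in which a small (student) model is trained to match a large, pre-trained (teacher) model. Knowledge is transferred from the teacher to the student by minimizing a loss function that aims to match both the softened teacher logits and the ground-truth labels.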

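The logits are softened by applying a "temperature" scaling function inside the softmax, which smooths the probability distribution and reveals the inter-class relationships the teacher has learned.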
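To make the effect of temperature concrete, here is a minimal sketch using only the Python standard library. The function name `softmax_with_temperature` and the logit values are made up for illustration and are not taken from the article's code.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Return softmax(logits / temperature) as a list of probabilities."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical teacher logits for one 3-class example.
teacher_logits = [8.0, 2.0, 0.5]

hard = softmax_with_temperature(teacher_logits, temperature=1.0)
soft = softmax_with_temperature(teacher_logits, temperature=3.0)

print("T=1:", [round(p, 4) for p in hard])
print("T=3:", [round(p, 4) for p in soft])
```

With a temperature above 1 the top class keeps the largest probability, but the other classes receive noticeably more mass, which is the "softened" signal the student learns from.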

Hinton et al. (2015)

Import the base libraries

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np

Build the Distiller class

The custom Distiller() class overrides the Model methods train_step, test_step, and compile(). To use the distiller, we need:

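A trained teacher model
A student model to train
A student loss function on the difference between the student predictions and the ground truth
A distillation loss function, along with a temperature, on the difference between the soft student predictions and the soft teacher labels
An alpha factor to weight the student loss and the distillation loss
An optimizer for the student and (optionally) metrics to evaluate performance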

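In the train_step method, we perform a forward pass of both the teacher and the student, compute the loss by weighting student_loss and distillation_loss by alpha and 1 - alpha respectively, and then perform the backward pass. Note: only the student weights are updated, so we compute gradients only for the student weights.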
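The weighting described above can be sketched without any framework, so the arithmetic is visible. This is an illustration under stated assumptions, not the article's train_step: cross-entropy is assumed for the student loss, KL divergence for the distillation loss, and the helper names and logit values are hypothetical. The defaults alpha=0.1 and temperature=3 mirror the compile() defaults.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(true_class, probs):
    """Assumed student loss: negative log-probability of the true class."""
    return -math.log(probs[true_class])

def kl_divergence(p, q):
    """Assumed distillation loss: KL(p || q) between two distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def combined_loss(student_logits, teacher_logits, true_class,
                  alpha=0.1, temperature=3.0):
    # Student loss compares unsoftened student predictions with the label.
    student_loss = cross_entropy(true_class, softmax(student_logits))
    # Distillation loss compares softened teacher and student distributions.
    distillation_loss = kl_divergence(
        softmax(teacher_logits, temperature),
        softmax(student_logits, temperature),
    )
    return alpha * student_loss + (1 - alpha) * distillation_loss

# Hypothetical logits for a single 3-class example whose true class is 0.
teacher_logits = [8.0, 2.0, 0.5]
student_logits = [3.0, 1.0, 0.2]
loss = combined_loss(student_logits, teacher_logits, true_class=0)
print("combined loss:", loss)
```

With alpha=1 the result reduces to the plain student loss, and with alpha=0 it reduces to the distillation term alone, which is zero when the student reproduces the teacher's logits exactly.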

In the test_step method, we evaluate the student model on the provided dataset.

class Distiller(keras.Model):
    def __init__(self, student, teacher):
        super().__init__()
        self.teacher = teacher
        self.student = student

    def compile(
        self,
        optimizer,
        metrics,
        student_loss_fn,
        distillation_loss_fn,
        alpha=0.1,
        temperature=3,
    ):
        """ Configure the distiller.

        Args:
            optimizer: Keras optimizer for the student weights
            metrics: Keras metrics for evaluation
            student_loss_fn: Loss function of difference between