Abstract
Knowledge distillation (KD) is a successful approach for deep neural network acceleration, with which a compact network (the student) is trained by mimicking the softmax output of a pre-trained high-capacity network (the teacher). Traditionally, KD relies on access to the training samples and to the parameters of the white-box teacher to acquire the transferred knowledge. However, these prerequisites are not always realistic due to storage costs or privacy issues in real-world applications. Here we propose the concept of decision-based black-box (DB3) knowledge distillation, with which the student is trained by distilling the knowledge from a black-box teacher that returns only its class decisions, rather than softmax outputs or parameters.
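For context on the conventional white-box setting that the DB3 formulation departs from, the following is a minimal sketch of the classic softmax-matching distillation loss (Hinton-style soft labels), assuming PyTorch; the function name and the temperature value are illustrative, not taken from the paper.

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Soft-label distillation: the student mimics the teacher's
    temperature-softened softmax output (white-box setting)."""
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    # KL divergence, scaled by T^2 to keep gradient magnitudes comparable
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2
```

In the DB3 setting described above, these teacher logits are unavailable: only the teacher's hard class decision can be queried.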