NeurIPS 2023 Distillation Papers: Abstract Collection and Notes

Author: ygz
Date: 2024-07-21
Location: hby
Motto: Carry your own philosophy through to the end. In my view, a life should be lived as a book.

Collection method:

https://neurips.cc/virtual/2023/papers.html?filter=titles&search=distill
Searching that page directly turns up the papers that use distillation; since this is a keyword search, some papers will inevitably be missed.

Papers

Concept Distillation: Leveraging Human-Centered Explanations for Model Improvement

Abstract:
Humans use abstract concepts for understanding instead of hard features. Recent interpretability research has focused on human-centered concept explanations of neural networks. Concept Activation Vectors (CAVs) estimate a model's sensitivity and possible biases to a given concept. In this paper, we extend CAVs from post-hoc analysis to ante-hoc training in order to reduce model bias through fine-tuning using an additional Concept Loss. Concepts were defined on the final layer of the network in the past. We generalize it to intermediate layers using class prototypes. This facilitates class learning in the last convolution layer, which is known to be most informative. We also introduce Concept Distillation to create richer concepts using a pre-trained knowledgeable model as the teacher. Our method can sensitize or desensitize a model towards concepts. We show applications of concept-sensitive training to debias several classification problems. We also use concepts to induce prior knowledge into IID, a reconstruction problem. Concept-sensitive training can improve model interpretability, reduce biases, and induce prior knowledge. Please visit https://avani17101.github.io/Concept-Distilllation/ for code and more details.

Comment: The paper introduces a concept loss to reduce model bias and distills the model through concept distillation.
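To make the concept-loss idea concrete, below is a minimal PyTorch sketch of CAV-based concept (de)sensitization. It is my own illustration, not the paper's implementation: the CAV is approximated by a mean-difference direction rather than a linear probe, and the toy model, `lam`, and `concept_sensitivity` are assumed names.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

feature_extractor = nn.Sequential(nn.Linear(32, 64), nn.ReLU())  # toy backbone
head = nn.Linear(64, 10)                                         # toy classifier head

def compute_cav(concept_acts, random_acts):
    # Crude CAV: a direction separating concept activations from random ones.
    # (The original CAV method fits a linear probe; a mean difference is a simplification.)
    v = concept_acts.mean(0) - random_acts.mean(0)
    return v / v.norm()

def concept_sensitivity(feats, logits, cav, cls):
    # Directional derivative of the class logit along the CAV: a large magnitude
    # means the prediction is sensitive to the concept.
    grads = torch.autograd.grad(logits[:, cls].sum(), feats, create_graph=True)[0]
    return (grads @ cav).abs().mean()

x, y = torch.randn(16, 32), torch.randint(0, 10, (16,))
concept_x, random_x = torch.randn(8, 32), torch.randn(8, 32)  # concept vs. random probe inputs

with torch.no_grad():
    cav = compute_cav(feature_extractor(concept_x), feature_extractor(random_x))

feats = feature_extractor(x)
logits = head(feats)
lam = 0.1  # weight of the concept term (assumed hyperparameter)
# Task loss plus a penalty that desensitizes the model to the concept direction.
loss = F.cross_entropy(logits, y) + lam * concept_sensitivity(feats, logits, cav, cls=0)
loss.backward()
```

Sensitizing toward a concept would flip the sign of the penalty; the paper's teacher-based concept distillation would additionally supply the concept activations from a pre-trained model.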

RD-Suite: A Benchmark for Ranking Distillation
Link: https://proceedings.neurips.cc/paper_files/paper/2023/file/701eba0f98c6f28ffee0de5969d8d034-PaperDatasets_and_Benchmarks.pdf
Abstract:
The distillation of ranking models has become an important topic in both academia and industry. In recent years, several advanced methods have been proposed to tackle this problem, often leveraging ranking information from teacher rankers that is absent in traditional classification settings. To date, there is no well-established consensus on how to evaluate this class of models. Moreover, inconsistent benchmarking on a wide range of tasks and datasets makes it difficult to assess or invigorate advances in this field. This paper first examines representative prior arts on ranking distillation, and raises three questions to be answered around methodology and reproducibility. To that end, we propose a systematic and unified benchmark, Ranking Distillation Suite (RD-Suite), which is a suite of tasks with 4 large real-world datasets, encompassing two major modalities (textual and numeric) and two applications (standard distillation and distillation transfer). RD-Suite consists of benchmark results that challenge some of the common wisdom in the field, and the release of datasets with teacher scores and evaluation scripts for future research. RD-Suite paves the way towards better understanding of ranking distillation, facilitates more research in this direction, and presents new challenges.

Comment: There has been no unified standard for benchmarking the distillation of ranking models; different researchers evaluate their own models on different benchmarks, which weakens comparability and makes it hard to judge the state of the art in this field. This paper proposes a unified evaluation standard. [P.S. Whether it is truly "standard" is hard to say, but it is certainly a concrete evaluation tool. There is no free lunch.]
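For readers unfamiliar with what is being benchmarked, here is a generic listwise ranking-distillation loss of the kind RD-Suite evaluates, sketched in PyTorch. This is not RD-Suite code; `temperature` and the toy tensors are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def listwise_distill_loss(student_scores, teacher_scores, temperature=1.0):
    # KL divergence between the teacher's and student's softmax distributions
    # over the candidate documents of each query (shape: [num_queries, list_size]).
    t = F.softmax(teacher_scores / temperature, dim=-1)
    s = F.log_softmax(student_scores / temperature, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2

# Example: 4 queries, each with a list of 10 candidate documents.
teacher_scores = torch.randn(4, 10)                       # scores from a teacher ranker
student_scores = torch.randn(4, 10, requires_grad=True)   # scores from the student ranker
loss = listwise_distill_loss(student_scores, teacher_scores)
loss.backward()
```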

Module-wise Adaptive Distillation for Multimodality Foundation Models

https://proceedings.neurips.cc/paper_files/paper/2023/file/dc9544b26ad3579477e567588db18cfc-Paper-Conference.pdf

Abstract:
Pre-trained multimodal foundation models have demonstrated remarkable generalizability but pose challenges for deployment due to their large sizes. One effective approach to reducing their sizes is layerwise distillation, wherein small student models are trained to match the hidden representations of large teacher models at each layer. Motivated by our observation that certain architecture components, referred to as modules, contribute more significantly to the student's performance than others, we propose to track the contributions of individual modules by recording the loss decrement after distilling each module, and to choose the module with a greater contribution to distill more frequently. Such an approach can be naturally formulated as a multi-armed bandit (MAB) problem, where modules and loss decrements are considered as arms and rewards, respectively. We then develop a modified Thompson sampling algorithm named OPTIMA to address the non-stationarity of module contributions resulting from model updating. Specifically, we leverage the observed contributions in recent history to estimate the changing contribution of each module and select modules based on these estimations to maximize the cumulative contribution. We evaluate the effectiveness of OPTIMA through distillation experiments on various multimodal understanding and image captioning tasks, using the CoCa-Large model [48] as the teacher model.

Comment: This paper refines model distillation down to the level of individual sub-modules (as defined by the authors), using the OPTIMA algorithm to choose which sub-module should be distilled more frequently. [Tip: one could try an adaptive distillation scheme: attach a flag to each sub-module, and if updating this part proves ineffective, switch to updating another part, picking the most suitable sub-module in O(1).]
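A rough NumPy sketch of the bandit view described in the abstract, not the authors' OPTIMA implementation: modules are arms, the loss decrement after distilling a module is the reward, and old observations are discounted to track non-stationarity. The Gaussian posterior and the `discount` value are assumptions.

```python
import numpy as np

class ModuleBandit:
    """Modules are arms; the observed loss decrement after distilling a module
    is the reward. Old observations are discounted so the estimates can follow
    the non-stationary contribution of each module."""

    def __init__(self, num_modules, discount=0.9):
        self.discount = discount
        self.mean = np.zeros(num_modules)   # running reward estimate per module
        self.count = np.ones(num_modules)   # effective number of observations

    def select(self):
        # Thompson-style step: sample a plausible contribution for every module
        # from a Gaussian posterior and distill the module with the largest draw.
        samples = np.random.normal(self.mean, 1.0 / np.sqrt(self.count))
        return int(np.argmax(samples))

    def update(self, module, loss_decrement):
        # Discount all counts (forget old rewards), then fold in the new reward.
        self.count *= self.discount
        self.count[module] += 1.0
        self.mean[module] += (loss_decrement - self.mean[module]) / self.count[module]

# Usage sketch: at each step, distill only the selected module and reward the
# bandit with the resulting drop in distillation loss.
bandit = ModuleBandit(num_modules=6)
rng = np.random.default_rng(0)
for step in range(200):
    m = bandit.select()
    loss_decrement = rng.normal(0.1 * (m + 1), 0.05)  # placeholder for the measured decrement
    bandit.update(m, loss_decrement)
print("estimated module contributions:", np.round(bandit.mean, 3))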

DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning

https://proceedings.neurips.cc/paper_files/paper/2023/file/b6404bf461c3c3186bdf5f55756af908-Paper-Conference.pdf

Abstract:
In this paper, we introduce self-distillation and online clustering for self-supervised speech representation learning (DinoSR) which combines masked language modeling, self-distillation, and online clustering. We show that these concepts complement each other and result in a strong representation learning model for speech. DinoSR first extracts contextualized embeddings from the input audio with a teacher network, then runs an online clustering system on the embeddings to yield a machine-discovered phone inventory, and finally uses the discretized tokens to guide a student network. We show that DinoSR surpasses previous state-of-the-art performance in several downstream tasks, and provide a detailed analysis of the model and the learned discrete units. Code available at https://github.com/Alexander-H-Liu/dinosr.

Comment: Distillation for a speech model: vector representations of the audio are obtained from a large teacher model, online clustering is then used to build an inventory of speech units, and this processed information guides the training of the student model. [I have not read the paper; the abstract's description is rather abstract.]
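Below is a toy PyTorch sketch of the training loop the abstract describes, not the released DinoSR code: an EMA teacher embeds the unmasked audio, a codebook of centroids discretizes the embeddings into pseudo-phone units, and the student predicts those units at masked frames. The GRU encoder and all sizes are made-up placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

dim, num_units = 64, 32
student = nn.GRU(40, dim, batch_first=True)      # toy speech encoder (stand-in for a transformer)
teacher = nn.GRU(40, dim, batch_first=True)      # EMA copy of the student
teacher.load_state_dict(student.state_dict())
codebook = torch.randn(num_units, dim)           # online cluster centroids ("phone inventory")
head = nn.Linear(dim, num_units)                 # student's unit-prediction head

def ema_update(teacher, student, decay=0.999):
    # Self-distillation: the teacher tracks an exponential moving average of the student.
    with torch.no_grad():
        for pt, ps in zip(teacher.parameters(), student.parameters()):
            pt.mul_(decay).add_(ps, alpha=1 - decay)

x = torch.randn(2, 50, 40)                       # (batch, frames, filterbank features)
mask = torch.rand(2, 50) < 0.3                   # frames masked for the student

with torch.no_grad():
    t_emb, _ = teacher(x)                        # teacher embeds the unmasked audio
    # Nearest centroid per frame = discrete pseudo-phone unit; real online clustering
    # would also move the centroids toward the embeddings assigned to them.
    units = torch.cdist(t_emb, codebook.expand(x.size(0), -1, -1)).argmin(dim=-1)

s_emb, _ = student(x * (~mask).unsqueeze(-1))    # student sees the masked input
logits = head(s_emb)
loss = F.cross_entropy(logits[mask], units[mask])  # predict the teacher's units at masked frames
loss.backward()
ema_update(teacher, student)
```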

DFRD: Data-Free Robustness Distillation for Heterogeneous Federated Learning

https://proceedings.neurips.cc/paper_files/paper/2023/file/39ca8893ea38905a9d2ffe786e85af0f-Paper-Conference.pdf

Abstract:
Federated Learning (FL) is a privacy-constrained decentralized machine learning paradigm in which clients enable collaborative training without compromising private data. However, how to learn a robust global model in the data-heterogeneous and model-heterogeneous FL scenarios is challenging. To address it, we resort to data-free knowledge distillation to propose a new FL method (namely DFRD). DFRD equips a conditional generator on the server to approximate the training space of the local models uploaded by clients, and systematically investigates its training in terms of fidelity, transferability and diversity. To overcome the catastrophic forgetting of the global model caused by the distribution shifts of the generator across communication rounds, we maintain an exponential moving average copy of the generator on the server. Additionally, we propose dynamic weighting and label sampling to accurately extract knowledge from local models. Finally, our extensive experiments on various image classification tasks illustrate that DFRD achieves significant performance gains compared to SOTA baselines. Our code is here: https://anonymous.4open.science/r/DFRD-0C83/.

Comment: Federated learning is a machine-learning paradigm born of the compromise that data cannot be shared; the status quo is to train decentralized models on different data. Its hardest problem is how to share the patterns in the data while still protecting the data itself. The simplest form of federated learning is straightforward and amounts roughly to data parallelism, but once the data formats differ (one form of data heterogeneity) and the models are heterogeneous as well, things become far more complex. I have not read the paper, so I cannot go into detail.
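To make the data-free part concrete, here is a heavily simplified PyTorch sketch in the spirit of the abstract, not DFRD itself: a conditional generator on the server synthesizes pseudo-data, the uploaded client models act as an ensemble teacher on it, and the global model distills from them. The EMA generator copy is only hinted at, and all modules, sizes, and the uniform ensemble average (in place of the paper's dynamic weighting) are toy assumptions.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes, z_dim, feat_dim = 10, 16, 32
generator = nn.Sequential(nn.Linear(z_dim + num_classes, 64), nn.ReLU(),
                          nn.Linear(64, feat_dim))          # conditional generator on the server
ema_generator = copy.deepcopy(generator)                    # EMA copy kept against forgetting
global_model = nn.Linear(feat_dim, num_classes)             # toy global (student) model
client_models = [nn.Linear(feat_dim, num_classes) for _ in range(3)]  # uploaded local models

def sample_pseudo_data(gen, batch=8):
    # Condition the generator on random labels to synthesize pseudo-inputs.
    y = torch.randint(0, num_classes, (batch,))
    z = torch.randn(batch, z_dim)
    x = gen(torch.cat([z, F.one_hot(y, num_classes).float()], dim=-1))
    return x, y

# Distillation step for the global model: match the averaged client ensemble on generated data.
x, _ = sample_pseudo_data(generator)
x = x.detach()                                              # the generator is trained in its own step
with torch.no_grad():
    teacher_logits = torch.stack([m(x) for m in client_models]).mean(0)
student_logits = global_model(x)
kd_loss = F.kl_div(F.log_softmax(student_logits, dim=-1),
                   F.softmax(teacher_logits, dim=-1), reduction="batchmean")
kd_loss.backward()

# After each generator update, refresh the server-side EMA copy.
with torch.no_grad():
    for p_ema, p in zip(ema_generator.parameters(), generator.parameters()):
        p_ema.mul_(0.95).add_(p, alpha=0.05)
```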

[Unfinished: to be continued]
