Hierarchical Nearest Neighbor Graph Embedding for Efficient Dimensionality Reduction (CVPR2022)
This paper proposes h-NNE, a dimensionality reduction method that runs faster than methods such as t-SNE and UMAP while maintaining performance.
The projection algorithm consists of three main steps: building a tree hierarchy based on 1-NNGs, computing a preliminary projection with an approximate version of PCA, and adjusting the projected point locations based on the constructed tree. The projected point location adjustment can be enhanced with an optional inflation step which can be used to improve visualization. In the following sections, we will elaborate on each step and provide some evidence of their validity.
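To make the first step more concrete, here is a minimal sketch of how a tree hierarchy can be built from repeated 1-nearest-neighbor graphs. It is my own illustration, not the authors' implementation: the function names (`one_nn_components`, `centroids`, `build_hierarchy`) are hypothetical, the neighbor search is brute force, and I assume each level clusters the centroids of the previous level. The PCA projection and the point adjustment steps are not covered.

```python
from math import dist


def one_nn_components(points):
    """Label points by connected component of their 1-nearest-neighbor graph."""
    n = len(points)
    parent = list(range(n))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        # Brute-force nearest neighbor of point i, excluding i itself.
        j = min((k for k in range(n) if k != i),
                key=lambda k: dist(points[i], points[k]))
        parent[find(i)] = find(j)  # link i with its nearest neighbor

    roots = sorted({find(i) for i in range(n)})
    label_of = {root: label for label, root in enumerate(roots)}
    return [label_of[find(i)] for i in range(n)]


def centroids(points, labels):
    """Mean of the points in each cluster, ordered by cluster label."""
    k = max(labels) + 1
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for point, label in zip(points, labels):
        counts[label] += 1
        for d in range(dim):
            sums[label][d] += point[d]
    return [tuple(s / counts[label] for s in sums[label]) for label in range(k)]


def build_hierarchy(points):
    """Return one label list per level, from the finest level to the coarsest."""
    levels = []
    current = [tuple(p) for p in points]
    while len(current) > 1:
        labels = one_nn_components(current)
        levels.append(labels)
        # Assumption: the next level clusters the centroids of this level.
        current = centroids(current, labels)
    return levels


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
levels = build_hierarchy(points)
print(levels)
```

Every point is linked to some other point, so each component holds at least two points and the number of clusters at least halves per level. That is why the loop terminates after roughly log2(n) levels.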
Method figure:
Results figure:
KNOWLEDGE REMOVAL IN SAMPLING-BASED BAYESIAN INFERENCE (ICLR2022)
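I only skimmed this paper. It works on machine unlearning, but the set of baselines looks small: it does not compare against the existing mainstream machine unlearning methods.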
Existing works propose methods to remove knowledge learned from data for explicitly parameterized models, which however are not applicable to the sampling-based Bayesian inference, i.e., Markov chain Monte Carlo (MCMC), as MCMC can only infer implicit distributions. In this paper, we propose the first machine unlearning algorithm for MCMC. We first convert the MCMC unlearning problem into an explicit optimization problem. Based on this problem conversion, an MCMC influence function is designed to provably characterize the learned knowledge from data, which then delivers the MCMC unlearning algorithm. Theoretical analysis shows that MCMC unlearning would not compromise the generalizability of the MCMC models. Experiments on Gaussian mixture models and Bayesian neural networks confirm the effectiveness of the proposed algorithm.
DECOUPLE-AND-SAMPLE: PROTECTING SENSITIVE INFORMATION IN TASK AGNOSTIC DATA RELEASE
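The method splits the information in the data into sensitive and non-sensitive parts, sanitizes the sensitive part, and then merges it back with the non-sensitive part.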
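A toy sketch of the split, sanitize and merge idea, under strong assumptions of my own: the data is already encoded as a latent vector, the indices of the sensitive dimensions are known, and "sanitizing" means resampling those dimensions from a standard normal. The function name `sanitize` and all of these choices are hypothetical. The paper's actual model has to learn the decoupling and is not reproduced here.

```python
import random


def sanitize(latent, sensitive_idx, rng):
    """Return a copy of `latent` with its sensitive dimensions resampled."""
    sensitive = set(sensitive_idx)
    return [
        rng.gauss(0.0, 1.0) if i in sensitive else value  # assumption: N(0, 1) prior
        for i, value in enumerate(latent)
    ]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
z = [0.3, -1.2, 0.8, 2.0]  # hypothetical latent code of one sample
z_sanitized = sanitize(z, sensitive_idx=[1, 3], rng=rng)
print(z_sanitized)
```

The non-sensitive dimensions pass through unchanged, so downstream tasks that depend only on them are unaffected, while the sensitive dimensions no longer carry information about the original sample.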
Use cases:
UC1: A crowd-sourcing company can build a facial recognition model for medical diagnostics from the sanitized dataset. This model will be deployed on cloud, therefore the prediction will be performed over sanitized images.
UC2: A group of researchers can develop a model of capturing keypoints from face images. Unlike UC1, they want the model to predict over unsanitized images. Hence sanitized images should be photo-realistic to prevent a domain mismatch.
UC3: The hospital wants to share a sanitized dataset with a company to build an ML model to predict “age”. Similar to UC2, the hospital would perform prediction on unsanitized images hence the sanitized dataset should be photo-realistic. However, unlike UC2, prediction attribute “age” is also a sensitive attribute requiring privacy.
Methodology: