2021 WSB Day 4-1: Professor Patel on Federated Learning for Biometrics Applications

MrCharles

Category: WinterSchoolBiometrics2021

Article link: https://blog.csdn.net/MrCharles/article/details/113237269

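This post introduces the basic concepts of federated learning: its motivation, key techniques such as FedAvg, and the challenges it faces. It also covers how methods such as differential privacy protect user data, and shows applications in scenarios such as biometric authentication.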

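Listen to many voices, gather their wisdom, and climb on the shoulders of giants.

The lecturer is Professor Patel from Johns Hopkins University.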

Vishal M. Patel is an Assistant Professor in the Department of Electrical and Computer Engineering (ECE) at Johns Hopkins University.

Contents

Federated learning for biometrics application
AlexNet vs LeNet
Data is not easy to come by
Federated learning
Federated Learning - Applications
Federated Learning - Challenges
FL with differential privacy
Tools
Applications
Detecting whether the person is real (face anti-spoofing)
Active authentication
One-class classification problem
Summary
Q&A

Federated learning for biometrics application

This is a topic we do not come across very often, so it gets its own session today.

Agenda

Part 1
Motivation
Federated learning
- FedAvg
- SplitNN
Privacy-enhancing methods for federated learning

Part 2
Applications
- Face anti-spoofing
- Active authentication
Open problems

AlexNet vs LeNet

AlexNet works because of:

Availability of large annotated data
More layers: capture more invariances
More computing
- Availability and affordability of GPUs
Better regularization
- Dropout
New nonlinearities
- Rectified Linear Unit (ReLU)
- Parametric Rectified Linear Unit (PReLU)

The most important factor is still the dataset.

Data is not easy to come by

Collecting and annotating datasets

Expensive
Labor intensive
User privacy issues
- GDPR: General Data Protection Regulation
- HIPAA: Health Insurance Portability and Accountability Act, 1996
- SHIELD: Stop Hacks and Improve Electronic Data Security Act, Jan 1, 2019
- PCI: Payment Card Industry Data Security Standard, 2004
- IRB: Institutional Review Board

So the question is how to protect user privacy.

Data privacy (protect the data)

Cancelable biometrics
- Modify data through revocable and non-invertible transformations
BioHashing
- Random projections are used to generate templates
Differential privacy
- An algorithm is differentially private if its behavior hardly changes when a single individual joins or leaves the dataset
- Hide unique samples (add noise to data)
Homomorphic encryption
- Perform calculations on encrypted data
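The random-projection idea behind BioHashing listed above can be sketched roughly as follows. This is a simplified illustration, not the published scheme: it skips the orthonormalization of the random vectors that BioHashing variants typically apply, and the names `make_projection`, `biohash`, the seed values, and the feature vector are all hypothetical.

```python
import random

def make_projection(seed, in_dim, out_dim):
    """Build a random projection matrix from a secret seed.

    The seed plays the role of the user's revocable token (an assumption
    for this sketch): changing it yields an unrelated matrix.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]

def biohash(features, seed, out_dim=16):
    """Project the features and keep only the sign of each projection as one bit."""
    matrix = make_projection(seed, len(features), out_dim)
    return [1 if sum(w * x for w, x in zip(row, features)) > 0 else 0 for row in matrix]

def hamming(a, b):
    """Number of differing bits between two templates."""
    return sum(x != y for x, y in zip(a, b))

# Hypothetical biometric feature vector (e.g. a face embedding).
features = [0.8, -1.2, 0.3, 2.1, -0.5, 0.9, -1.7, 0.4]

template = biohash(features, seed=1234)    # enrolled template
same_again = biohash(features, seed=1234)  # same user, same token
reissued = biohash(features, seed=9999)    # token revoked and replaced

print(hamming(template, same_again))  # same token reproduces the template
print(hamming(template, reissued))    # a new token gives a different bit pattern
```

Only the bit string is stored, and a compromised template can be revoked by issuing a new seed, which is the "revocable" property mentioned in the list.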

Federated learning (build protection into the models)

Machine learning on decentralized data
Communication-efficient learning of deep networks from decentralized data, AISTATS 2017, McMahan et al. (Google)
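As a rough illustration of the averaging idea in the paper cited above (FedAvg), the sketch below trains a toy one-dimensional linear model on three simulated clients. Each client runs a few local gradient steps on its own data, and the server combines the results with a sample-size-weighted average. The model, the data, the learning rate, and the function names `local_update` and `fed_avg` are assumptions chosen for illustration, not taken from the lecture.

```python
def local_update(weights, data, lr=0.1, epochs=5):
    """Run a few epochs of gradient descent on y = w*x + b using only local data."""
    w, b = weights
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]

def fed_avg(client_weights, client_sizes):
    """Average client parameters, weighting each client by its number of samples."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]

# Toy clients: every sample follows y = 2x + 1, but each client sees different x values.
clients = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (-1.0, -1.0), (0.5, 2.0)],
    [(1.5, 4.0), (-0.5, 0.0)],
]

global_weights = [0.0, 0.0]
for round_idx in range(50):
    # Raw data never leaves a client; only the updated parameters are sent back.
    updates = [local_update(global_weights, data) for data in clients]
    global_weights = fed_avg(updates, [len(d) for d in clients])

print(global_weights)  # approaches [2.0, 1.0]
```

Running several local epochs per round is what reduces the number of communication rounds compared with sending a gradient after every step.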

Federated learning

[Slide image]

Federated Learning - Applications

Learning over smart phones
- Mobile-based biometrics applications
- Active authentication
Learning across organizations
- Multi-institutional collaboration
Internet of things
- Wearable devices, autonomous vehicles, smart homes, …

Federated Learning - Challenges

Communication

Federated networks comprise a massive number of devices, which makes communication in the network slower than local computation (i.e., communication is expensive)
Need communication-efficient methods that iteratively send model updates as part of the training process

Systems heterogeneity

Storage, computational, and communication capabilities of each device in federated networks may differ due to variability in hardware (CPU, memory), network connectivity (3G, 4G, 5G, Wi-Fi), and power (battery level)
Stragglers and faults are significantly more prevalent

Non-IID data

Devices frequently generate and collect data in a non-identically distributed manner across the network.
Unbalanced data
Increases the likelihood of stragglers, and may add complexity in terms of modeling, analysis, and evaluation
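One common way to imitate non-IID clients in experiments is a label-skewed split: sort the samples by label, cut them into shards, and give each client only a few shards. The sketch below follows that idea on a toy label list. The function name `label_skew_partition`, the shard counts, and the dataset are assumptions for illustration, and the split is balanced in size, so it does not model the unbalanced-data issue.

```python
import random
from collections import Counter

def label_skew_partition(labels, num_clients, shards_per_client=2, seed=0):
    """Return one list of sample indices per client, each covering only a few labels."""
    rng = random.Random(seed)
    # Sort indices by label so that each contiguous shard is dominated by one class.
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    num_shards = num_clients * shards_per_client
    shard_size = len(order) // num_shards
    shards = [order[s * shard_size:(s + 1) * shard_size] for s in range(num_shards)]
    rng.shuffle(shards)  # deal the shards to clients in random order
    return [
        [i for shard in shards[c * shards_per_client:(c + 1) * shards_per_client] for i in shard]
        for c in range(num_clients)
    ]

# Toy dataset: 10 classes with 100 samples each (labels only).
labels = [i % 10 for i in range(1000)]
parts = label_skew_partition(labels, num_clients=5)

for client_id, indices in enumerate(parts):
    # Each client ends up with only 2 of the 10 classes.
    print(client_id, dict(Counter(labels[i] for i in indices)))
```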

Privacy issues

An attacker can reconstruct user data from the model parameters:
[Slide image]

FL with differential privacy

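The idea is simply to add noise.

Before transmitting parameters, threshold (truncate) them and add noise. This leaves an attacker with very little to work with.
[Slide image]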
Three key properties

There is a tradeoff between convergence performance and privacy protection levels, i.e., better convergence performance leads to a lower protection level
Given a fixed privacy protection level, increasing the number $N$ of overall clients participating in FL can improve the convergence performance
There is an optimal number of aggregation times (communication rounds) in terms of convergence performance for a given protection level

[Slide image]
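The truncate-then-add-noise step described in this section can be sketched as below: each client clips its update to a maximum L2 norm and adds Gaussian noise before sending it, and the server averages what it receives. The values of `clip_norm` and `noise_std` are arbitrary assumptions here. In a real system the noise scale would be derived from a target privacy budget, which this sketch does not do, so it should not be read as providing any formal guarantee.

```python
import math
import random

def clip_update(update, clip_norm):
    """Scale the update down so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(u * u for u in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [u * scale for u in update]

def privatize(update, clip_norm, noise_std, rng):
    """Clip the update, then add independent Gaussian noise to every coordinate."""
    clipped = clip_update(update, clip_norm)
    return [u + rng.gauss(0.0, noise_std) for u in clipped]

rng = random.Random(0)

# Hypothetical parameter updates from three clients.
client_updates = [
    [0.5, -2.0, 1.0],
    [3.0, 0.1, -0.4],
    [-0.2, 0.3, 0.1],
]

# Each client perturbs its own update before transmission.
noisy = [privatize(u, clip_norm=1.0, noise_std=0.1, rng=rng) for u in client_updates]

# The server only ever sees the noisy updates and averages them.
aggregate = [sum(col) / len(noisy) for col in zip(*noisy)]
print(aggregate)
```

Averaging over more clients shrinks the noise in the aggregate, which is consistent with the second property listed above: for a fixed protection level, more participating clients improve convergence.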

Tools

[Slide image]

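Applications

[Slide image]

Detecting whether the person is real (face anti-spoofing)

[Slide image]

Active authentication

[Slide image]
Active authentication means verifying the user continuously. While the user holds the phone, walks, takes photos, types, or touches the screen, each of these signals can be used for ongoing verification and authorization.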

One-class classification problem

[Slide image]
Find a boundary around the single known class. This can be used to tackle one-class classification problems.

[Slide image]
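To make "find a boundary" concrete, the sketch below uses a deliberately simple stand-in for one-class methods: it fits a center and a radius to the enrolled user's samples and accepts a new sample only if it falls inside that sphere. This is not the method from the lecture. The feature values, the `quantile` parameter, and the function names `fit_one_class` and `is_genuine` are assumptions for illustration.

```python
import math

def fit_one_class(samples, quantile=0.95):
    """Fit a spherical boundary (center, radius) using samples from one class only."""
    dim = len(samples[0])
    center = [sum(s[d] for s in samples) / len(samples) for d in range(dim)]
    dists = sorted(math.dist(s, center) for s in samples)
    # Radius is the distance at the chosen quantile of the training samples.
    radius = dists[min(len(dists) - 1, int(quantile * len(dists)))]
    return center, radius

def is_genuine(sample, center, radius):
    """Accept the sample if it lies inside the learned boundary."""
    return math.dist(sample, center) <= radius

# Hypothetical 2-D behavioral features of the enrolled user (e.g. typing speed, touch pressure).
owner_samples = [(1.0, 2.0), (1.1, 1.9), (0.9, 2.1), (1.0, 2.2), (1.2, 2.0), (0.8, 1.8)]
center, radius = fit_one_class(owner_samples)

print(is_genuine((1.05, 2.05), center, radius))  # close to the owner's samples
print(is_genuine((3.0, 0.5), center, radius))    # far from the owner's samples
```

No impostor data is needed at training time, which matches the active authentication setting where a phone only ever sees its owner's data.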

Summary

Federated learning promises to be an active area of research

Open problems

Domain adaptive FL methods
Benchmarks
Unsupervised and semi-supervised FL
Privacy preserving FL methods
Novel FL models for biometrics and surveillance applications

[Slide image]

Q&A

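Q: On safety: what if a user keeps deliberately typing incorrectly on their phone?

Data from other users helps compensate.
What we need is the averaged model, not the local model.

Q: How can we do FL research when we do not have that many devices?

Many people are working on this, and the same idea can be applied in different domains.
You can try it on any problem, provided you have that much data.
If you do not have enough data, split it up and treat the parts as different data centers.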
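The last suggestion, splitting one dataset and treating the parts as separate data centers, can be sketched in a few lines. The function name `split_into_clients`, the toy dataset, and the number of clients are assumptions for illustration. Note that a random split like this yields IID clients, which is easier than the non-IID setting real devices produce.

```python
import random

def split_into_clients(dataset, num_clients, seed=0):
    """Shuffle the dataset and deal it out into num_clients disjoint parts."""
    rng = random.Random(seed)
    indices = list(range(len(dataset)))
    rng.shuffle(indices)
    # Client c takes every num_clients-th shuffled index, starting at position c.
    return [[dataset[i] for i in indices[c::num_clients]] for c in range(num_clients)]

# Toy centralized dataset of (x, y) pairs.
dataset = [(x, 2 * x + 1) for x in range(20)]
simulated_clients = split_into_clients(dataset, num_clients=4)

print([len(c) for c in simulated_clients])  # [5, 5, 5, 5]
```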

Q: Hi, Vishal, great work! FedPAD averages the weights, which is a linear solution. Since a CNN is a non-linear model, do you have any non-linear solution to combine the parameters from different data centers?

The aggregation of the parameters can also be non-linear.
You can add non-linearity there.

Q: Thank you, professor. Are there any other privacy protection methods besides differential privacy?

Differential privacy is used because it comes with direct proofs.
There are others, for example cancelable biometrics.
Nobody has done these yet, so you can try them.

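Q: And how should we experiment when there are not so many mobile clients?

Yes, the more clients the better.
If you do not have many, you can still try it and see what happens.

Q: 100 phones versus only 1 phone?

With local data and a local model, it is not affected.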

Q: Hello professor. Thank you for your presentation. The data should be synchronized when they are uploaded to the server. Is there a particular strategy on servers about data synchronization and integrity for federated learning?

The data always stays local; the server only has the model parameters.

Q: Thank you for the presentation. I want to know if an owner of the data also owns part of the copyright of the trained model (parameters) according to some laws such as GDPR.

For GDPR, the question is whether you can identify the people.
It may be possible to obtain user data, for example by reconstructing images.
No, that is not the case.