The speaker is Professor Patel from Johns Hopkins University.
Vishal M. Patel is an Assistant Professor in the Department of Electrical and Computer Engineering (ECE) at Johns Hopkins University.
Federated learning for biometrics application
This is a topic we do not come across often, so these notes cover it today.
Agenda
Part 1
Motivation
Federated learning
- FedAvg
- SplitNN
- Privacy-enhancing methods for federated learning
Part 2
- Applications
- Face anti-spoofing
- Active authentication
- Open problems
AlexNet vs LeNet
AlexNet succeeded because of:
- Availability of large annotated data
- More layers: capture more invariances
- More computing: availability and affordability of GPUs
- Better regularization: Dropout
- New nonlinearities: Rectified Linear Unit (ReLU), Parametric Rectified Linear Unit (PReLU)
The most important factor is still the dataset.
Data is not easy to come by
Collecting and annotating datasets
- Expensive
- Labor intensive
- User privacy issues
- GDPR: General Data Protection Regulation
- HIPAA: Health Insurance Portability and Accountability Act, 1996
- SHIELD: Stop Hacks and Improve Electronic Data Security Act, Jan 1, 2019
- PCI: Payment Card Industry Data Security Standard, 2004
- IRB: Institutional Review Board
So we want to know how to protect user privacy.
Data privacy (protect the data)
- Cancelable biometrics
- Modify data through revocable and non-invertible transformations
- BioHashing
- Random projections are used to generate templates
- Differential privacy
- An algorithm is differentially private if its behavior hardly changes when a single individual joins or leaves the dataset
- Hide unique samples (add noise to data)
- Homomorphic encryption
- Perform calculations on encrypted data
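The random-projection idea behind BioHashing can be sketched roughly as follows. This is a minimal illustration, not the published BioHashing algorithm: the function names `make_template` and `hamming`, the user-specific `seed` acting as a revocable token, and thresholding at zero are all assumptions made for this example (the original method also orthonormalizes the random vectors, which is skipped here).

```python
import random

def make_template(features, seed, n_bits=16):
    """Project a feature vector onto seeded random directions and binarize.

    `seed` plays the role of a user-specific token (an assumption of this
    sketch): issuing a new seed yields a new template for the same biometric.
    """
    rng = random.Random(seed)  # same seed -> same random directions
    bits = []
    for _ in range(n_bits):
        direction = [rng.gauss(0.0, 1.0) for _ in features]
        score = sum(d * f for d, f in zip(direction, features))
        bits.append(1 if score > 0 else 0)  # keep only the sign
    return bits

def hamming(a, b):
    """Number of positions where two bit templates differ."""
    return sum(x != y for x, y in zip(a, b))

features = [0.8, -1.2, 0.3, 2.1, -0.5, 0.9]   # made-up feature vector
enrolled = make_template(features, seed=1234)
probe = make_template(features, seed=1234)     # same user, same token
reissued = make_template(features, seed=9999)  # same user, new token
print(hamming(enrolled, probe), hamming(enrolled, reissued))
```

The same features with the same seed always give an identical template, while changing the seed produces a different projection, which is what makes the template revocable in this sketch.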
Federated learning (build protection into the models)
- Machine learning on decentralized data
- Communication-efficient learning of deep networks from decentralized data, AISTATS 2017, McMahan et al. (Google)
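The aggregation step of FedAvg from the paper above can be sketched as a weighted average of client parameters, with each client weighted by its share of the total training data. The sketch below is only the server-side averaging; local training on each client is omitted, and the name `fed_avg` and the flat-list parameter format are assumptions made for illustration.

```python
def fed_avg(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local dataset size."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    averaged = [0.0] * n_params
    for weights, size in zip(client_weights, client_sizes):
        for i, w in enumerate(weights):
            averaged[i] += (size / total) * w
    return averaged

# Three hypothetical clients with two parameters each.
client_weights = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
client_sizes = [10, 30, 60]
global_weights = fed_avg(client_weights, client_sizes)
print(global_weights)
```

With these made-up numbers the global parameters come out at approximately `[4.0, 5.0]`, pulled toward the client that holds the most data.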
Federated learning
Federated Learning - Applications
- Learning over smart phones
- Mobile-based biometrics applications
- Active authentication
- Learning across organizations
- Multi-institutional collaboration
- Internet of things
- Wearable devices, autonomous vehicles, smart homes, …
Federated Learning - Challenges
Communication
- Federated networks comprise a massive number of devices, which makes communication in the network slower than local computation (i.e., communication is expensive)
- Need communication-efficient methods that iteratively send model updates as part of the training process
Systems heterogeneity
- Storage, computational, and communication capabilities of each device in federated networks may differ due to variability in hardware (CPU, memory), network connectivity (3G, 4G, 5G, Wi-Fi), and power (battery level)
- Stragglers and device failures are significantly more prevalent
Non-IID data
- Devices frequently generate and collect data in a non-identically distributed manner across the network.
- Unbalanced data
- Increases the likelihood of stragglers, and may add complexity in terms of modeling, analysis, and evaluation
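For experiments without real devices, non-IID and unbalanced client data is often simulated by splitting a labeled dataset with label skew. The sketch below uses a Dirichlet split per class, which is one common choice rather than something prescribed in the talk; the function name `dirichlet_partition` and the parameters `alpha` and `seed` are assumptions made for illustration.

```python
import random

def dirichlet_partition(labels, n_clients, alpha, seed=0):
    """Assign sample indices to clients with label skew.

    For each class, client shares are drawn from Dirichlet(alpha);
    a smaller alpha gives more skewed (less IID) clients.
    """
    rng = random.Random(seed)
    clients = [[] for _ in range(n_clients)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # A Dirichlet sample is a normalized vector of Gamma(alpha, 1) draws.
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(n_clients)]
        total = sum(gammas)
        start, acc = 0, 0.0
        for c in range(n_clients):
            acc += gammas[c] / total
            # The last client takes whatever remains of this class.
            end = len(idx) if c == n_clients - 1 else round(acc * len(idx))
            clients[c].extend(idx[start:end])
            start = end
    return clients

labels = [i % 3 for i in range(300)]  # toy dataset with 3 balanced classes
parts = dirichlet_partition(labels, n_clients=4, alpha=0.3)
print([len(p) for p in parts])
```

Every sample lands on exactly one client, but client sizes and class mixes differ, which mimics both the unbalanced and the non-identically distributed conditions listed above.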
Privacy issues
An attacker can reconstruct user data from the model parameters.
FL with differential privacy
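You simply add noise to the parameters.
When transmitting parameters, you threshold (clip) them, truncate them, and add noise. After that, an attacker can do very little.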
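The clip-then-add-noise step applied to an update before it leaves the device can be sketched as below. This is a minimal illustration of the mechanics only: the name `privatize_update` and the values of `clip_norm` and `noise_std` are assumptions, and the sketch does not derive the noise scale needed for any particular formal privacy guarantee.

```python
import math
import random

def privatize_update(update, clip_norm, noise_std, rng):
    """Clip an update to L2 norm `clip_norm`, then add Gaussian noise."""
    norm = math.sqrt(sum(u * u for u in update))
    # Scale down only if the update is larger than the clipping threshold.
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = [u * scale for u in update]
    return [c + rng.gauss(0.0, noise_std) for c in clipped]

rng = random.Random(0)
update = [3.0, 4.0]  # hypothetical local update with L2 norm 5.0
noisy = privatize_update(update, clip_norm=1.0, noise_std=0.1, rng=rng)
clipped_only = privatize_update(update, clip_norm=1.0, noise_std=0.0, rng=rng)
print(noisy, clipped_only)
```

Clipping bounds how much any single client can influence the aggregate, and the noise then masks what remains; a larger `noise_std` hides more but slows convergence, which is the tradeoff listed under the key properties.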
Three key properties
- There is a tradeoff between convergence performance and privacy protection levels, i.e., better convergence performance leads to a lower protection level
- Given a fixed privacy protection level, increasing the number N of overall clients participating in FL can improve the convergence performance
- There is an optimal number of aggregation times (communication rounds) in terms of convergence performance for a given protection level
Tools
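Applications
Detecting whether the subject is a real person (face anti-spoofing)
Active authentication
This means continuously verifying the user. While the user holds the phone, walks, takes photos, types, or touches the screen, the device can keep verifying and authorizing them.
One-class classification problem
Find a boundary. You can try this to tackle some one-class classification problems.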
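One very simple way to "find a boundary" from a single user's data is to fit a center and a radius and reject anything outside. The sketch below is an assumption-laden toy, not the method presented in the talk: the names `fit_one_class` and `is_genuine`, the centroid-plus-radius model, and the 0.95 quantile are all chosen for illustration.

```python
import math

def fit_one_class(samples, quantile=0.95):
    """Fit a centroid and a radius covering `quantile` of the training samples."""
    dim = len(samples[0])
    center = [sum(s[i] for s in samples) / len(samples) for i in range(dim)]
    dists = sorted(math.dist(s, center) for s in samples)
    radius = dists[min(len(dists) - 1, int(quantile * len(dists)))]
    return center, radius

def is_genuine(sample, center, radius):
    """Accept a sample only if it falls inside the learned boundary."""
    return math.dist(sample, center) <= radius

# Made-up 2-D feature vectors from the enrolled user only (no impostor data).
user_samples = [[1.0, 1.1], [0.9, 1.0], [1.1, 0.9], [1.0, 0.95], [0.95, 1.05]]
center, radius = fit_one_class(user_samples)
print(is_genuine([1.02, 1.0], center, radius))   # close to the user's data
print(is_genuine([3.0, -2.0], center, radius))   # far from the user's data
```

Only the enrolled user's samples are needed for training, which matches the active authentication setting where impostor data is not available on the device.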
Summary
Federated learning promises to be an active area of research
Open problems
- Domain adaptive FL methods
- Benchmarks
- Unsupervised and semi-supervised FL
- Privacy preserving FL methods
- Novel FL models for biometrics and surveillance applications
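Q&A
Q: On safety, what if a user keeps deliberately typing wrong input on the phone?
- Data from other users helps compensate
- What we need is the average model, not the local model
Q: How can we do FL research when we do not have that many devices?
- Many people work on this; we can apply the same idea in different domains
- You can try any problem as long as you have that much data
- If you do not have enough data, split it and treat the parts as different data centers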
Q: Hi, Vishal, great work! FedPAD is to average the weights, and is a linear solution. Since CNN is a non-linear model, do you have any non-linear solution to combine the parameters from different data centers?
- Parameter aggregation can also be non-linear; you can add non-linearity.
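Q: Thank you, professor. Are there any other privacy protection methods besides differential privacy?
- Differential privacy is used because it comes with direct proofs
- Others exist, for example cancelable biometrics
- Nobody has worked on these yet, so you can give them a try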
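Q: And how should we experiment when there are not so many mobile clients?
- Yes, the more clients the better
- If you do not have them, you can still try and see what happens
Q: What about 100 phones versus only one phone?
- With local data and a local model, it is not affected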
Q: Hello professor. Thank you for your presentation. The data should be synchronized when they are uploaded to the server. Is there a particular strategy on servers about data synchronization and integrity for federated learning?
- The data always stays on the device; the server only holds model parameters
Q: Thank you for the presentation. I want to know if an owner of the data also owns part of the copyright of the trained model (parameters) according to some laws such as GDPR?
- Under GDPR, the question is whether you can identify the people
- It may be possible to obtain user data, for example by reconstructing images
- No, that is not the case