CLASS 6
Lesson 6.1
Applications
In the context of Federated Learning, there are also other applications that are very interesting.
Predictive Maintenance
One interesting case is predictive maintenance. For example, if you own a car, every few thousand miles you have to take it in to get an oil change, right? We typically just have a fixed number of miles for this. But Federated Learning could be used to predict ahead of time when your car needs to go in for maintenance, by having a model that actually lives within all the cars on the road and studies when they actually start to break down.
So in this way, using things like your climate, how you drive, and the kind of engine you have, these predictive models could become quite intelligent at doing this. The nice thing is that this wouldn’t actually require you to upload any data from your car to some sort of central server. In theory, a model just comes down to your car, learns how to predict when your car is going to require maintenance, and uploads that ability to predict, as an update to the model, back up to the cloud.
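To make the "model comes down, learns, uploads an update" idea concrete, here is a minimal sketch of one device's side of a single round. It is a toy in pure Python, not a real federated learning library: the function names, the one-feature linear model, and the sensor readings are all assumptions made up for illustration.

```python
# Toy sketch of one device's part in a federated round (hypothetical names and data).

def mse(weights, data):
    """Mean squared error of the linear model y = w * x + b on the given data."""
    w, b = weights
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def local_update(global_weights, local_data, lr=0.01, epochs=50):
    """Train on the device's own data and return only the weight delta."""
    w, b = global_weights
    n = len(local_data)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in local_data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in local_data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w - global_weights[0], b - global_weights[1])

# Made-up on-device readings: (thousands of miles since last service, wear score)
car_data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 8.1)]

global_weights = (0.0, 0.0)                      # the model "comes down" to the car
delta = local_update(global_weights, car_data)   # only this delta would be uploaded
```

The point of the sketch is that car_data never leaves the function; the only thing handed back for upload is the pair of numbers in delta.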
Wearable Medical Devices
Another interesting use case is wearable medical devices. So you can imagine that if you have one of those fancy watches or wristbands that track your movement, your heart rate, or perhaps even your sleep cycles, or if you log your diet online in one of these diet-logging apps, Federated Learning will allow you to learn a machine learning model which could help you optimize certain parts of your health, whether it’s your diet for having a better sleep cycle, or how much you move during the day for accomplishing some sort of weight goal.
Again, the nice thing here is that you by yourself are not actually generating enough training data to be able to train a model like this. But if you collaboratively train a single machine learning model with thousands or millions of other people, then everyone can benefit from this machine learning model without anyone having to upload their data to a central cloud.
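The lecture does not say how the server combines what the participants send back. One common choice is a weighted average of the locally trained weights, with each participant weighted by how many examples it trained on. The sketch below assumes that rule; the function name and the three sets of weights are made up for illustration.

```python
# Toy sketch of server-side aggregation by example-weighted averaging (assumed rule).

def federated_average(updates):
    """updates: list of (weights, num_examples) pairs, one per participant.
    Returns the example-weighted mean of the weight vectors."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]

# Made-up locally trained weights from three wearables, with their example counts
updates = [
    ([1.0, 0.0], 10),
    ([3.0, 2.0], 30),
    ([2.0, 4.0], 60),
]

global_weights = federated_average(updates)
```

Participants with more data pull the shared model further toward their own weights, and the server only ever sees weight vectors and counts, not the underlying readings.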
Browser: ad-blocking or auto-complete
Finally, one of the other widely deployed use cases of Federated Learning is inside mobile browsers, often in the context of ad blocking or autocomplete. So when you go to your browser and start to type some website, and it tries to predict which website it thinks you’re trying to refer to, that’s another example where Federated Learning has been used out in the world to allow large groups of individuals to enjoy the benefits of ML models without having to upload all of their data to the cloud.
So I hope these examples give you a sense of the kinds of settings where Federated Learning is useful, and really a context for the setup that leads people to want to use Federated Learning.
Why Use Federated Learning (Properties)
Privacy
The general challenge is that you have a scenario where a dataset is distributed amongst a large number of sources, usually because it’s being produced by those sources: phones, browsers, wearable medical devices, or automobiles, for example. What we want to do is train a machine learning model on that dataset. But you aren’t able to aggregate that dataset, for a variety of reasons. Sometimes it’s for privacy, right?
There might be a legal restriction that prevents you from aggregating the data on a central server. Medical data, for example, has a lot of these kinds of legal restrictions. There might be user discomfort. So in the case of the ad blocker or autocomplete use case within browsers, people don’t really want to upload every webpage URL they visit to some central server if they don’t need to.
Then of course, there are competitive dynamics. So if you think in the context of predictive maintenance, automobile companies might not want to upload to some central server every time one of their automobiles breaks down. Now, if it’s just within one brand, maybe just Ford or something like that, that’d be great. But you can imagine that if Toyota could see when Honda cars break down because of some central server, there could be some strange competitive dynamics or leaks of potentially negative information.
Engineering Constraints
But Federated Learning isn’t just about privacy. Often it’s used because of engineering constraints. As it turns out, Federated Learning can actually be used to reduce the bandwidth cost of having to upload datasets to the Cloud by allowing training to happen locally inside of these devices.
So, this is why Federated Learning is an incredibly powerful tool. It’s already been widely deployed in some very substantial use cases, and it’s one of the things that’s really advancing the fastest in terms of privacy-preserving technologies.
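As a rough back-of-the-envelope illustration of the bandwidth argument, the sketch below compares uploading a raw dataset with uploading model updates over several rounds. Every number in it is hypothetical and chosen only to show the arithmetic; the function names are made up as well.

```python
# Back-of-the-envelope upload comparison; all sizes and counts are hypothetical.

def upload_bytes_centralized(num_records, bytes_per_record):
    """Bytes sent if the device uploads its whole raw dataset once."""
    return num_records * bytes_per_record

def upload_bytes_federated(num_params, bytes_per_param, rounds):
    """Bytes sent if the device uploads one full model update per round."""
    return num_params * bytes_per_param * rounds

raw = upload_bytes_centralized(num_records=1_000_000, bytes_per_record=2_000)
fed = upload_bytes_federated(num_params=100_000, bytes_per_param=4, rounds=50)

ratio = raw / fed   # how many times larger the raw upload is in this scenario
```

With these made-up numbers the raw upload is 2 GB and the federated uploads total 20 MB. The saving is not guaranteed: with a very large model or many rounds, the model updates can cost more than the data would have.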
Lesson 6.2 PySyft
PySyft is an extension to the major deep learning toolkits. In particular, we are going to be looking at its extension to the PyTorch framework, which will allow you to do the kinds of remote executions necessary for federated learning in a deep learning context.
As you can imagine, if I’m a central server and I’m trying to orchestrate millions of different devices, training models in the correct way so that I can aggregate all these gradients, I need a special set of tools, because standard deep learning toolkits by nature assume that you have the dataset locally, inside the local framework, where you can perform operations on it.
So what we’re going to be looking at here is a new set of tools that gives us an interface nearly identical to PyTorch, but allows us to execute commands on remote machines and coordinate how data is moved amongst different machines. That’s really what’s going to form the foundation for us to be able to do federated learning on an actually distributed dataset.
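The idea of "same interface, but the data lives on another machine" can be sketched with a toy in pure Python. To be clear, this is not PySyft’s API: ToyWorker, ToyPointer, store, and get are hypothetical names invented here, and the "remote machine" is just an object in the same process.

```python
# Toy model of remote execution through pointers (hypothetical classes, not PySyft).

class ToyWorker:
    """Stands in for a remote machine that owns objects, keyed by id."""

    def __init__(self, name):
        self.name = name
        self.objects = {}
        self._next_id = 0

    def store(self, value):
        """Keep the value on this worker and hand back a pointer to it."""
        obj_id = self._next_id
        self._next_id += 1
        self.objects[obj_id] = value
        return ToyPointer(self, obj_id)


class ToyPointer:
    """Local handle to data that lives on a worker."""

    def __init__(self, worker, obj_id):
        self.worker = worker
        self.obj_id = obj_id

    def __add__(self, other):
        # The addition runs "on the worker"; the result stays there too,
        # and the caller only receives a new pointer.
        if not isinstance(other, ToyPointer) or other.worker is not self.worker:
            raise ValueError("both operands must live on the same worker")
        a = self.worker.objects[self.obj_id]
        b = self.worker.objects[other.obj_id]
        return self.worker.store([p + q for p, q in zip(a, b)])

    def get(self):
        """Move the data back to the caller and remove it from the worker."""
        return self.worker.objects.pop(self.obj_id)


bob = ToyWorker("bob")
x = bob.store([1, 2, 3])
y = bob.store([10, 20, 30])
z = x + y            # looks like a local operation, but the data stays with bob
result = z.get()     # explicitly request the result back
```

The caller writes x + y as if the lists were local, yet only pointers move around until get is called, which is the property the paragraph above is describing.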
6.8 Hands-on (excerpt)
Occasionally, if you run a few commands, they’ll get cached in such a way that garbage collection will no longer collect those objects. So just by virtue of the fact that you’re working in a Jupyter Notebook, you might end up with a couple of extra objects being kept around.
PS: I tested this on my own server and did not see this problem. Different Jupyter configurations may behave differently.
So let me show you what I mean specifically. If we create another one of these, and then we evaluate x and call its double-underscore repr (__repr__) here, then even if I reassign x, say x = "asdf", there’s still a reference to the original object.
Then I just want to show you that this garbage collection works for things like for loops too.
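A likely explanation is that IPython keeps displayed results in its output history (the Out dictionary and the _ variables), so the object still has a reference even after x is rebound. The sketch below imitates that with a plain dictionary; Payload and output_cache are hypothetical stand-ins, and the immediate collection it shows relies on CPython’s reference counting.

```python
# Toy imitation of a notebook output cache keeping an object alive (CPython behaviour).
import gc
import weakref


class Payload:
    """Placeholder object standing in for a tensor or pointer."""


output_cache = {}          # plays the role of the notebook's output history

x = Payload()
alive = weakref.ref(x)     # lets us check whether the object still exists
output_cache[1] = x        # like displaying x as the result of a cell

x = "asdf"                 # rebinding x drops only one of the two references
gc.collect()
still_cached = alive() is not None   # the cache still holds the object

output_cache.clear()       # drop the cached reference as well
gc.collect()
collected = alive() is None          # now nothing refers to the object
```

Until the cached reference is dropped, rebinding x is not enough for the object to be collected, which matches what the notebook shows.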
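The for loop case can be checked with the same kind of toy. Each iteration rebinds x, so the object from the previous iteration loses its last reference. Payload is again a hypothetical placeholder, and the count relies on CPython’s reference counting.

```python
# Toy check that rebinding a name inside a for loop releases the previous object.
import gc
import weakref


class Payload:
    """Placeholder object standing in for a tensor or pointer."""


refs = []
for i in range(5):
    x = Payload()                  # rebinding x releases the previous iteration's object
    refs.append(weakref.ref(x))    # weak references do not keep objects alive

gc.collect()
alive_count = sum(r() is not None for r in refs)   # only the last object is still bound to x
```

Five objects are created, but only the one still bound to x survives the loop.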