Decentralizing AI & Championing Privacy: The Genius of Federated Learning

Chances are, your phone is within five feet of you right now.

In fact, a 2013 study found that 72% of people were within that distance of their phone most of the time. A 2015 study found that 63% of participants were away from their phone less than one hour a day. An additional quarter could not remember being away from their phones at all.

It’s obvious that our devices, particularly our phones, are constantly cranking out data. They’re chock-full of sensors — both digital, following your touch movements and browsing history, and physical, tracking your physical locations and movements. Your phone is the data embodiment of you, and it’s remarkably, and perhaps disturbingly, easy to access.

From a machine learning perspective, this is incredibly helpful. With more data comes a golden opportunity to create smarter models that produce more engaging and personal results. Whether it’s curating customized search results based on your location or monitoring your browsing and social media activity to recommend appropriate products to buy, the applications of more data are countless.

Ethically, however, the prospect that you — your activity, your network, your conversations, your movements — don’t truly belong to you should be troubling, if not horrifying. Organizations that work with user data in the digital age need to be very cautious with it, lest it be exploited and manipulated for malicious purposes. These days, many digital users are, rightfully, stringent about how their data is being used.

Now, an important question must be asked: how can the benefits of big data for more personalized and engaging experiences be realized while championing — not merely acknowledging— the privacy of users’ data?

Google came up with one approach to solving this difficult task: federated learning. Introduced in a 2016 paper, federated learning is not a specific algorithm but a structured strategy for mining insights from user data in an ethical, privacy-upholding manner.

To understand this, however, first we must understand how other conventional methods of deploying AI models work. Let’s say that we are using YouTube’s app, of which a core functionality is its video recommendation algorithm.

Usually, these types of heavy models are stored in the cloud, which makes sense — they have billions of parameters and look over countless potential videos. It would be more efficient to store the model in the cloud, and to have users connect to it via requests.

[Image: created by author]

The requests sent by users come in the form of data, and are returned with responses. For instance, while you are browsing YouTube your phone has sent your data — your browsing history, topics you find interesting, etc. — to the cloud, which returned possible videos you may like (response).

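To make the request/response loop concrete, here is a minimal Python sketch. The cloud_recommend function, the tiny catalogue, and the request fields are all invented stand-ins for whatever a real recommendation service exposes; in a real deployment the payload would travel over the network to a server rather than to a local function.

```python
# Illustrative only: a stand-in for a cloud-hosted recommendation model.
# A real client would send this payload over HTTPS to a server endpoint;
# here the "cloud" is just a local function.

def cloud_recommend(request):
    """Rank a tiny catalogue by overlap with the topics in the request."""
    catalogue = {
        "intro_to_python": {"programming", "tutorial"},
        "cat_compilation": {"cats", "funny"},
        "gradient_descent_explained": {"machine-learning", "math"},
    }
    user_topics = set(request["recent_topics"])
    ranked = sorted(
        catalogue,
        key=lambda video: len(catalogue[video] & user_topics),
        reverse=True,
    )
    return ranked[: request["max_results"]]


# The "request" a phone might send, and the "response" it gets back.
request = {"recent_topics": ["machine-learning", "tutorial"], "max_results": 2}
print(cloud_recommend(request))
# ['intro_to_python', 'gradient_descent_explained']
```

Note that the user's raw browsing signals are part of every request: that is exactly the data flow the rest of this article is concerned with.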

Based on the newly collected data and whether you actually clicked on the recommended videos, YouTube’s recommendation algorithm constantly receives new input and feedback on how well it is doing. It takes in the new training data and hence can generate predictions that evolve with new trends, ideas, and movements.

Obviously, there are many privacy concerns here. All the data is centralized in one spot — the cloud — and every time a machine learning model is needed, which is quite often, your data is being sent there. Many data breaches are deadly because the data is so centralized in one location. It is the same concern with banks — centralized money stashes — that was the driving rationale behind creating Bitcoin.

On the other hand, offline inference eliminates such a heavy dependency on the cloud. When the phone is in a comfortable state — e.g. plugged in, high battery, connected to secure Wi-Fi — it downloads a downsized, mobile-optimized version of the original model (a minimal sketch of this setup follows the list below). There are several benefits to using offline inference:

  • Privacy. Clearly, the constant requests to the cloud aren’t needed. Your data is localized on your phone, which directly receives responses from the downloaded model.

  • Less risk of harmful data breaches. When the data of each user is localized, even if a hacker were to gain access to the cloud, the information — versions of models, performance of downloaded models on users’ phones — wouldn’t be of much use. This reduces the incentive to perform data breaches in the first place.

  • Less bandwidth and battery are needed to run the model. Constantly maintaining a request-response dialogue with the cloud, especially as thousands, if not tens of thousands, of users are doing the same thing at the same moment, is computationally expensive. Offline inference is well developed, with methods like model compression yielding powerful predictions while taking up little storage space.

  • Lower latency. Because the model is localized and hence only serves one client, the response time for predictions is much faster. In the world of digital interfaces, every fraction of a second is worth considerable amounts of money.

  • The model can be run even when there is no internet connection. This is a valuable feature for any product. Consider, for example, Google Translate: would it be as popular if you couldn’t pop it out and translate languages, both in text and visual form, anywhere?

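Here is the minimal sketch of the on-device setup referenced before the list, assuming the “comfortable state” check and a compressed model that has already been published for download; every name, threshold, and the toy model format are hypothetical.

```python
import json

# Illustrative only: the device checks that it is in a "comfortable" state,
# fetches a downsized model once, then serves predictions locally with no
# further network traffic.

def device_is_comfortable(status):
    return status["plugged_in"] and status["battery"] > 0.8 and status["on_trusted_wifi"]

def download_compact_model():
    # Stand-in for fetching a compressed (e.g. quantized or pruned) model;
    # here it is just a bag-of-words score table serialized as JSON.
    payload = json.dumps({"cats": 0.9, "machine-learning": 0.7, "news": 0.1})
    return json.loads(payload)

def predict_locally(model, topics):
    # Inference runs entirely on-device against the cached parameters.
    return sum(model.get(topic, 0.0) for topic in topics)

status = {"plugged_in": True, "battery": 0.93, "on_trusted_wifi": True}
if device_is_comfortable(status):
    local_model = download_compact_model()
    print(predict_locally(local_model, ["cats", "news"]))  # 1.0
```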

So online and offline inference are the norm when deploying models in distributed systems. Clearly, offline inference is more secure and private in terms of data management, but you can’t simultaneously train the model. On the other hand, online inference is less secure, but the model can be updated.

Any company knows that if it remains static, it will quickly fall out of favor with the dynamic nature of demand. With the limitations of offline inference, that’s not a big win for championing private and secure data.

With more knowledge, a more specific question can be asked: how can deep models be trained — updated with new data — while championing the privacy of users’ data?

In the initial concept, Google prescribes one solution framework to this problem (a minimal code sketch follows the list):

  1. A sample of devices is selected to participate in training. These are usually phones that are currently in a ‘comfortable’ and active state.

  2. The current model parameters are downloaded onto each device.

  3. The devices train their offline models for a preset period of time (e.g. 20 minutes), using local training data to update model parameters.

  4. After the preset period of time, each phone uploads its model parameters to the cloud.

  5. An algorithm takes the parameters for each model — which have developed differently depending on how the user interacted with it (the training data that was provided) — and aggregates them to form an updated global model.

  6. Initiate another round of model updating.

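Putting the six steps together, here is a minimal, self-contained sketch of one federated round for a small linear model trained with local gradient descent. The synthetic data, the number of clients, and the hyperparameters are invented for illustration; the point is that only model parameters, never raw data, travel between the “devices” and the “server”.

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])

def make_local_data(n):
    # Each "device" holds its own private (x, y) pairs; they never leave it.
    x = rng.normal(size=(n, 2))
    y = x @ true_w + rng.normal(scale=0.1, size=n)
    return x, y

def local_update(w, x, y, lr=0.1, epochs=5):
    # Step 3: the device trains its copy of the model on local data only.
    w = w.copy()
    for _ in range(epochs):
        grad = 2 * x.T @ (x @ w - y) / len(y)
        w = w - lr * grad
    return w

# Step 1: a sample of "comfortable" devices joins the round.
clients = [make_local_data(n) for n in (30, 50, 20)]

# Step 2: the current global parameters are downloaded by each device.
global_w = np.zeros(2)

# Steps 3-4: each device trains locally, then uploads only its parameters.
client_params = [local_update(global_w, x, y) for x, y in clients]
client_sizes = [len(y) for _, y in clients]

# Step 5: the server aggregates the uploads into an updated global model
# via a weighted average (the formula is given in the next section).
shares = np.array(client_sizes) / sum(client_sizes)
global_w = sum(share * w_k for share, w_k in zip(shares, client_params))

# Step 6: the new global_w is sent out again for another round.
print(global_w)
```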
[Image: created by author]

Federated learning is so brilliant because it solves the problems of both offline and online inference. It allows companies to champion the privacy of users while updating their models in step with the dynamics of their market.

The key to federated learning — distributed learning that champions privacy while continually optimizing models with the power of big data — is the aggregation algorithm. The initially proposed algorithm, ‘Federated Averaging’, which essentially performs a weighted average of the received parameters, works well enough.

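Roughly in the notation of the original paper, the aggregation forms the new global parameters as an average of each client's parameters, weighted by how much local data that client trained on:

$$
w_{t+1} = \sum_{k=1}^{K} \frac{n_k}{n} \, w_{t+1}^{k}, \qquad n = \sum_{k=1}^{K} n_k
$$

where $w_{t+1}^{k}$ are the parameters uploaded by client $k$ after local training and $n_k$ is the number of local training examples it used. This is the same weighted average computed in step 5 of the sketch above.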

Let’s take an example of federated learning. Suppose a texting service is testing out their language model, which recommends words a user will type next based on what they have already typed.

One hundred users are selected for training, and the current model parameters are sent to their phones. Then, throughout the next forty-eight hours, the local models receive feedback and training based on how well they perform. For instance, say a model recommends ‘you’ after a user has already typed ‘how are’.

If the user indeed does type or select ‘you’, the model receives reinforcement that its current decision-making process is correct. If the user doesn’t type ‘you’, this becomes new training data for the model to learn from. Federated learning can only operate on models like this, which are able to collect labels purely from local data.

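As a rough illustration of how such labels can be collected purely locally, the sketch below turns the user's own typed text into (context, next-word) training pairs without anything leaving the device; the helper name and the two-word context window are invented.

```python
# Illustrative only: harvest (context, next-word) training pairs from the
# user's own typing, so both features and labels stay on the phone.

def harvest_examples(typed_text, context_size=2):
    words = typed_text.lower().split()
    examples = []
    for i in range(context_size, len(words)):
        context = tuple(words[i - context_size:i])
        label = words[i]  # the word the user actually typed next
        examples.append((context, label))
    return examples

# If the model predicted "you" after "how are" and the user typed "you",
# the pair reinforces the prediction; any other word becomes fresh local
# training data for the on-device copy of the model.
print(harvest_examples("how are you doing today"))
# [(('how', 'are'), 'you'), (('are', 'you'), 'doing'), (('you', 'doing'), 'today')]
```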

We all type differently — a teenager whose texting is chock-full of acronyms (“how r u”, “brb”, “lol/lmao”, etc.) has a different style than a distinguished writer. A company that wants to succeed should have a model that can handle all of these audiences, yet through texting we communicate some of our most personal information — secrets, passwords, stories. Through federated learning, the unique insights of each consumer segment can be aggregated in a private and secure fashion.

One issue identified with the naïve form of federated learning is that the models, especially very deep ones, can essentially memorize their training data, and their parameters are still sent to the central server. Another aggregation algorithm, differentially-private federated averaging, has been proposed to further secure the information contained in individual model parameters.

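The exact mechanism differs between papers, but a common ingredient of differentially-private aggregation is to clip each client's contribution to a maximum norm and add calibrated noise, so that no individual update can be read off the aggregate. The sketch below only illustrates that idea; the clip norm and noise scale are arbitrary and not calibrated to any formal privacy budget.

```python
import numpy as np

def dp_average(updates, clip_norm=1.0, noise_multiplier=0.5, rng=None):
    """Clip each client's update to clip_norm, average, then add Gaussian
    noise scaled to the clipping bound. Illustrative values only."""
    if rng is None:
        rng = np.random.default_rng()
    clipped = [u * min(1.0, clip_norm / np.linalg.norm(u)) for u in updates]
    mean = np.mean(clipped, axis=0)
    noise = rng.normal(scale=noise_multiplier * clip_norm / len(updates),
                       size=mean.shape)
    return mean + noise

client_updates = [np.array([0.4, -0.2]), np.array([3.0, 1.0]), np.array([0.1, 0.3])]
print(dp_average(client_updates, rng=np.random.default_rng(0)))
```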

As big data continues to expand rapidly, privacy is increasingly becoming a primary concern. Continued research in federated learning and establishing it as a standard in artificial intelligence is key.

Key Points

  • In the digital age, your phone and other devices are increasingly attached to you. Your activity, network, movements, and information need to be secure and private.

  • Online learning centralizes AI. Users send their data to the model in the cloud, which returns predictions and is updated. Offline learning decentralizes AI, at the cost of not being able to continuously train a global model with new insights.

  • Federated learning combines the best of online and offline learning by training models on local data, then aggregating the parameters to form an updated global model. A global AI can be trained without touching personal data through a decentralized system.

Thanks for reading!

Translated from: https://towardsdatascience.com/decentralizing-ai-championing-privacy-the-genius-of-federated-learning-3760a613ac70
