2023-10 Decentralized Federated Averaging (Writing Study Notes)

Original title: Decentralized Federated Averaging

Authors: Tao Sun, Dongsheng Li, and Bao Wang

Abstract

Federated averaging (FedAvg) is a communication-efficient algorithm for distributed training with an enormous number of clients. In FedAvg, clients keep their data locally for privacy protection; a central parameter server is used to communicate between clients. This central server distributes the parameters to each client and collects the updated parameters from clients.

Introduces FedAvg: what kind of algorithm it is + its basic working mode.

FedAvg is mostly studied in centralized fashions, requiring massive communications between the central server and clients, which leads to possible channel blocking. Moreover, attacking the central server can break the whole system’s privacy.

Points out FedAvg's problems: communication cost + vulnerability to attacks.

Indeed, decentralization can significantly reduce the communication of the busiest node (the central one) because all nodes only communicate with their neighbors.

Introduces the advantages of the decentralized approach.

To this end, in this paper, we study the decentralized FedAvg with momentum (DFedAvgM), implemented on clients that are connected by an undirected graph. In DFedAvgM, all clients perform stochastic gradient descent with momentum and communicate with their neighbors only. To further reduce the communication cost, we also consider the quantized DFedAvgM. The proposed algorithm involves the mixing matrix, momentum, client training with multiple local iterations, and quantization, which introduces extra terms into the Lyapunov analysis. Thus, the analysis of this paper is much more challenging than that of previous decentralized (momentum) SGD or FedAvg.

Highlights of this paper, and its differences from and advantages over similar work.

We prove convergence of the (quantized) DFedAvgM under trivial assumptions; the convergence rate can be improved to sublinear when the loss function satisfies the PL property. Numerically, we find that the proposed algorithm outperforms FedAvg in both convergence speed and communication cost.

Assumptions and results.

1 Introduction

Federated learning (FL) is a privacy-preserving distributed machine learning (ML) paradigm [1]. In FL, a central server connects with enormous clients (e.g., mobile phones, pads, etc.); the clients keep their data without sharing it with the server. In each communication round, clients receive the current global model from the server, and a small portion of clients are selected to update the global model by running stochastic gradient descent (SGD) [2] for multiple iterations using local data. The central server then aggregates these updated parameters to obtain the updated global model. The above learning algorithm is known as federated averaging (FedAvg) [1]. In particular, if the clients are homogeneous, FedAvg is equivalent to local SGD [3]. FedAvg involves multiple local SGD updates and one aggregation by the server in each communication round, which significantly reduces the communication cost between server and clients compared to conventional distributed training, which performs one communication per local SGD update.

Introduces the relationship among FL, FedAvg, and local SGD; local SGD saves communication cost.
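To make the round structure concrete, here is a minimal, illustrative sketch of one FedAvg communication round; the client sampling fraction, the number of local steps, and the `local_gradient` interface are my own assumptions rather than the paper's pseudocode.

```python
import random

def fedavg_round(global_model, clients, client_frac=0.1, local_steps=5, lr=0.01):
    """One communication round of a FedAvg-style scheme (illustrative sketch).

    global_model: list of floats (a flat parameter vector).
    clients:      objects exposing local_gradient(params) -> list of floats.
    """
    selected = random.sample(clients, max(1, int(client_frac * len(clients))))
    updates = []
    for client in selected:
        params = list(global_model)                 # client starts from the global model
        for _ in range(local_steps):                # multiple local SGD steps per round
            grad = client.local_gradient(params)
            params = [p - lr * g for p, g in zip(params, grad)]
        updates.append(params)
    # Server averages the returned client models to form the new global model.
    return [sum(ps) / len(ps) for ps in zip(*updates)]
```

In centralized FedAvg the server is the single aggregation point; the decentralized variants studied in this paper replace that aggregation with neighbor-to-neighbor mixing.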

1.1 Motivation

In FL applications, large companies and government organizations usually play the role of the central server. On the one hand, since the number of clients in FL is massive, the communication between the server and clients, the busiest communication in the centralized system, can be a bottleneck because all clients are connected to the central server [4]. On the other hand, the updated models collected from clients encode the private information of the local data; adversaries can attack the central server to break the privacy of the whole system, which makes privacy a serious concern. To this end, decentralized federated learning has been proposed [5], [6], where all clients are connected by an undirected graph, a.k.a. an overlay network. Decentralized FL (DFL) replaces the server-client communication in FL with client-client, or peer-to-peer, communication.

FL's pain points: 1. communication cost; 2. security. DFL is introduced to address these pain points, and its essence is summarized: replacing the server-client communication in FL with client-client (peer-to-peer) communication.

Compared with centralized federated learning, decentralized federated learning enjoys several advantages: 1) DFL significantly reduces the communication cost of the busiest node. In centralized FL, all nodes are connected to the central server, which therefore carries the heaviest traffic; in the decentralized case, all nodes only communicate with their neighbors. One of the simplest decentralized FL topologies is a ring graph connecting all clients, in which each node communicates with just two topological neighbors. 2) DFL is more robust to client failures than centralized FL. Centralized FL stops if the central server breaks down, while decentralized FL can keep working even if several clients fail. Thus, the decentralized scheme is more robust to potential client failures. 3) DFL is more resilient to potential privacy attacks than FL. Privacy is another primary concern of federated learning since the central server is also exposed to adversarial attacks. Note that the central server holds all clients' information in FedAvg; if an adversary successfully attacks the central server, all of that information may be divulged. In the decentralized case, all clients only communicate with their neighbors, so only part of the information is leaked if some clients are attacked. As confirmed numerically in the paper, DFL is more robust to potential privacy attacks, e.g., membership inference attacks.

Comparing the strengths and weaknesses of DFL and CFL (centralized FL):

1. DFL significantly reduces communication cost: DFL lowers the communication cost of centralized FL's busiest node (the central server), because in centralized FL all nodes connect to the central server, whereas in the decentralized setting every node only communicates with its neighbors. One of the simplest decentralized FL cases uses a ring graph to connect all clients, where each node connects to just two topological neighbors (see the mixing-matrix sketch after this list).

2. DFL is more robust: centralized FL stops working if the central server breaks down, while decentralized FL can keep working even if several clients fail.

3. DFL is better at defending against attacks: privacy is another major concern in federated learning, since the central server is also exposed to adversarial attacks. Note that in FedAvg the central server holds all clients' information; if an attacker compromises the central server, all of it may be leaked. In the decentralized case, clients only communicate with their neighbors, so only part of the information leaks if some clients are attacked. As numerically confirmed in the paper, DFL is more robust to potential privacy attacks such as membership inference attacks.
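As a concrete illustration of the ring-graph case in point 1, the sketch below builds a symmetric, doubly stochastic mixing matrix for a ring; the uniform 1/3 weights are a common, assumed choice, not something specified in the paper.

```python
import numpy as np

def ring_mixing_matrix(m):
    """Symmetric, doubly stochastic mixing matrix for a ring of m clients.

    Each client keeps weight 1/3 on itself and puts 1/3 on each of its two
    topological neighbors, so a gossip step W @ x averages only over neighbors.
    """
    W = np.zeros((m, m))
    for i in range(m):
        W[i, i] = 1 / 3
        W[i, (i - 1) % m] = 1 / 3
        W[i, (i + 1) % m] = 1 / 3
    return W

# Example: with 5 clients, each row of W has exactly three nonzero entries,
# so each node's traffic is bounded regardless of the total number of clients.
W = ring_mixing_matrix(5)
assert np.allclose(W.sum(axis=0), 1) and np.allclose(W.sum(axis=1), 1)
```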

In this paper, we consider two crucial issues about decentralized FL: 1) Although there is no expensive communication between server and clients in decentralized FL, the communication between local clients can still be costly when the size of the ML model is large. Therefore, it is crucial to ask: can we reduce the client-client communication cost in DFL systems? 2) Momentum is a well-established acceleration technique for SGD [7]. It is natural to ask: can we use SGD with momentum to improve the training of ML models in decentralized FL with theoretical convergence guarantees?

Identifies the two scientific questions this paper targets: 1. Can we reduce the client-client communication cost in DFL systems? (When the model is large, even inter-node communication is expensive.)

2. Can we use SGD with momentum to improve the training of ML models in decentralized FL with theoretical convergence guarantees? (Momentum is a well-established acceleration technique for SGD.)
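For context, the standard SGD-with-momentum (heavy-ball) update that question 2 refers to can be written as below; the symbols $\beta$ (momentum coefficient), $\eta$ (step size), and $\xi^{t}$ (sampled mini-batch) are generic notation, not necessarily the paper's:

$$v^{t+1} = \beta\, v^{t} + \nabla F\big(x^{t};\xi^{t}\big), \qquad x^{t+1} = x^{t} - \eta\, v^{t+1}.$$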

1.2 Related Work and Novelty

We briefly review three lines of work that are most related to this paper, i.e., federated learning, decentralized training, and decentralized federated learning.

Overview of the three lines of related work: FL, decentralized training, and DFL.

Federated Learning. Many variants of FedAvg have been developed with theoretical guarantees. In [8], the authors use the momentum method for local client training in FedAvg. The authors of [9] propose adaptive FedAvg, whose central parameter server uses an adaptive learning rate to aggregate local models. Lazy and quantized gradients are used to reduce communications [10], [11]. In [12], the authors propose a Newton-type scheme for federated learning. Federated learning has also been applied to Internet of Things (IoT) research [13]. The convergence analysis of FedAvg on heterogeneous data is discussed in [14], [15], [16]. More details and applications of federated learning can be found in [17], [18]. Recent advances and open problems in FL are covered in the survey papers [19], [20].

Federated learning: momentum methods, adaptive learning rates, lazy and quantized gradients, Newton-type schemes, IoT devices, convergence analysis on heterogeneous data, applications, surveys.

Decentralized Training. Decentralized algorithms were originally developed to calculate the mean of data stored over multiple sensors [21], [22], [23], [24]. Decentralized (sub)gradient descent (DGD), one of the simplest and most efficient decentralized algorithms, has been studied in [25], [26], [27], [28], [29]. In DGD, the convexity assumption is unnecessary [30], which makes DGD useful for nonconvex optimization. Provably convergent decentralized SGD (DSGD) is proposed in [4], [31], [32]. The paper [31] provides the complexity result of a decentralized stochastic algorithm. In [32], the authors design a decentralized stochastic algorithm with dual information and provide a theoretical convergence guarantee. The authors of [4] prove that DSGD outperforms SGD in communication efficiency. Asynchronous DSGD is analyzed in [33]. DGD with momentum is proposed in [34], [35]. Quantized DSGD has been proposed in [36].

Decentralized training: the original motivation of decentralized algorithms was to compute the mean of data stored on multiple sensors. DGD: decentralized gradient descent requires no convexity. Theoretical work on DSGD: complexity, improved algorithms with convergence proofs, a proof that DSGD outperforms SGD in communication efficiency, asynchronous DSGD, momentum, and quantization.

Decentralized Federated Learning. Decentralized FL is the learning paradigm of choice when the edge devices do not trust the central server to protect their privacy [18]. The authors of [37] propose a novel FL framework without a central server for medical applications, and the new method offers a highly dynamic peer-to-peer environment. The papers [5], [6] consider training an ML model over a connected overlay network whose nodes take a Bayesian-like approach by introducing a prior over the parameter space.

Decentralized federated learning: decentralized FL is the learning paradigm of choice when edge devices do not trust the central server to protect their privacy. The authors of [37] propose a novel FL framework without a central server for medical applications; the new method offers a highly dynamic peer-to-peer environment. The papers [5], [6] consider training an ML model over a connected overlay network whose nodes take a Bayesian-like approach by introducing a prior over the parameter space.

Compared with existing works on FedAvg [1], [8], [9], [10], [11], this paper uses a decentralized framework to enhance the robustness of FL to node failures and privacy attacks. In contrast to decentralized training [5], [6], [37], all nodes in our algorithm perform multiple local iterations rather than only one and employ momentum. To further reduce the communication costs, we use the quantization technique. These new algorithms are much more complicated than FedAvg or DSGD, and their convergence analysis is significantly more challenging than analyzing FedAvg and DSGD. We present detailed convergence results of the proposed algorithms under convex, nonconvex, and PL conditions. From a practical viewpoint, decentralized FL enjoys communication efficiency and fast convergence; we summarize the advantages of decentralized FL over FL in Table 1. Moreover, we present a sufficient condition that reveals when quantization offers a favorable tradeoff between communication efficiency and convergence.

Compared with traditional FedAvg, this paper uses a decentralized framework to enhance robustness to node failures and privacy attacks. Compared with decentralized methods, the proposed method performs multiple local iterations on every node and introduces momentum. To further reduce communication cost, quantization is also introduced. These new algorithms are much more complex than FedAvg or DSGD, and their convergence analysis is correspondingly more challenging. Detailed convergence results are given for the proposed algorithms under convex, nonconvex, and PL conditions. In addition, a sufficient condition is presented that reveals when quantization offers a favorable tradeoff between communication efficiency and convergence.

1.3 Contributions

We propose decentralized FedAvg with momentum (DFedAvgM) to improve training machine learning models in a DFL fashion. To further reduce the communication cost between clients, we also integrate model quantization with DFedAvgM, i.e., we quantize the local machine learning models before communication. Our contributions in this paper are threefold, as elaborated below.

Proposes DFedAvgM to improve DFL, and introduces quantization to further reduce communication cost. The contributions are threefold.

Algorithmically, we extend FedAvg to the decentralized setting, where all clients are connected by an undirected graph. We motivate DFedAvgM from the DSGD algorithm. In particular, we use SGD with momentum to train ML models on each client. To reduce the communication cost between clients, we further introduce a quantized version of DFedAvgM, in which each client sends and receives a quantized model.
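Below is a minimal sketch of what one communication round of such a scheme could look like, based only on the description above; the order of the gossip step and the local training, the hyper-parameters, and the `grad_fn` interface are my assumptions, not the paper's pseudocode.

```python
import numpy as np

def dfedavgm_round(x, W, grad_fn, lr=0.1, beta=0.9, local_steps=5):
    """One communication round of a DFedAvgM-style scheme (illustrative sketch).

    x:       (m, d) array, row i is client i's current model.
    W:       (m, m) symmetric doubly stochastic mixing matrix (W[i, j] > 0 only
             if clients i and j are neighbors in the undirected graph).
    grad_fn: grad_fn(i, params) returns a stochastic gradient for client i.
    """
    m, d = x.shape
    # 1) Gossip step: each client mixes the models received from its neighbors.
    z = W @ x
    # 2) Local training: each client runs several SGD-with-momentum steps.
    new_x = np.empty_like(x)
    for i in range(m):
        params, velocity = z[i].copy(), np.zeros(d)
        for _ in range(local_steps):
            g = grad_fn(i, params)
            velocity = beta * velocity + g      # heavy-ball momentum buffer
            params -= lr * velocity
        new_x[i] = params                        # model exchanged with neighbors next round
    return new_x
```

For the quantized variant described later in Section 3.2, each row of `new_x` would be passed through a quantizer before being exchanged with neighbors.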

 Theoretically, we prove the convergence of (quantized) DFedAvgM. Our theoretical results show that the convergence rate of (quantized) DFedAvgM is not inferior to that of SGD or DSGD. More specifically, we show that the convergence rates of both DFedAvgM and quantized DFedAvgM depend on the local training and the graph that connects all clients. Besides the convergence results under nonconvex assumptions, we also establish their convergence guarantee under the Polyak-Lojasiewicz (PL) condition, which has been widely studied in nonconvex optimization. Under the PL condition, we establish a faster convergence rate for (quantized) DFedAvgM. Furthermore, we present a sufficient condition to guarantee reducing communication costs.

Empirically, we perform extensive numerical experiments on training deep neural networks (DNNs) on various datasets in both IID and Non-IID settings. Our results show the effectiveness of (quantized) DFedAvgM for training ML models, saving communication costs, and protecting membership privacy of training data.

Algorithmic aspect: proposes DFedAvgM plus quantization.

Theoretical aspect: convergence proofs (convex, nonconvex, PL).

Experimental validation.

1.4 Organization of the Paper

1.5 Notation

2 Problem Formulation and Assumptions

Defines the optimization problem:

An undirected graph establishes the structure of the decentralized system.
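The formulation itself appears as an image in the original post; the standard decentralized finite-sum problem used in this line of work, which I assume is what problem (1) denotes, reads:

$$\min_{x\in\mathbb{R}^{d}}\; f(x) := \frac{1}{m}\sum_{i=1}^{m} f_i(x), \qquad f_i(x) := \mathbb{E}_{\xi\sim\mathcal{D}_i}\, F_i(x;\xi),$$

where $m$ is the number of clients and $\mathcal{D}_i$ is client $i$'s local data distribution; the undirected graph only determines which clients may exchange models, not the objective.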

 

Assumptions:

1. The function f is differentiable and L-smooth.

2. The gradient noise is bounded.

3. The gradients are bounded.

4. The PL property. (A generic rendering of all four assumptions is sketched below.)
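In the original post the assumptions are shown as images; written out in a generic form (the constants $L$, $\sigma$, $B$, $\mu$ and the exact statements are my assumptions of the standard versions, not copied from the paper), they read roughly as:

$$\|\nabla f_i(x)-\nabla f_i(y)\|\le L\|x-y\|, \qquad \mathbb{E}\,\|\nabla F_i(x;\xi)-\nabla f_i(x)\|^{2}\le \sigma^{2},$$
$$\mathbb{E}\,\|\nabla F_i(x;\xi)\|^{2}\le B^{2}, \qquad \|\nabla f(x)\|^{2} \ge 2\mu\,\big(f(x)-f^{*}\big)\;\;\text{(PL)}.$$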

3 Decentralized Federated Averaging

3.1 DFedAvgM

Brief description of the (naive) decentralized training process:

1. Client i holds an approximate copy of the parameters and computes an unbiased estimate of the gradient;

2. Client i updates its local parameters to the average over all neighboring nodes;

3. The client performs a gradient-descent step on the averaged parameters.

A figure illustrates the shortcoming: a communication step is required after every local training iteration. This makes the naive decentralized training algorithm above different from FedAvg, which performs multiple local training steps before each communication. We therefore have to modify the decentralized scheme slightly. For simplicity, we consider modifying DSGD to motivate our decentralized FedAvg algorithm.

Applying DSGD to solve problem (1) yields the following update:
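The update itself is shown as an equation image in the original post; the standard DSGD iteration for this kind of problem (in my generic notation) is:

$$x_i^{k+1} \;=\; \sum_{j=1}^{m} w_{ij}\, x_j^{k} \;-\; \eta\,\nabla F_i\big(x_i^{k};\xi_i^{k}\big),$$

where $w_{ij}$ is the $(i,j)$ entry of the mixing matrix associated with the undirected graph and $\eta$ is the step size; DFedAvgM replaces the single stochastic gradient step with multiple local SGD-with-momentum steps between communications.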

3.2 Quantization

Stochastic quantization:
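The quantizer is defined by an equation image in the original post; the sketch below implements a standard unbiased stochastic quantizer of the kind commonly used in quantized decentralized training, with the number of levels `s` and the norm-scaling scheme as my assumptions rather than the paper's exact operator.

```python
import numpy as np

def stochastic_quantize(v, s=256, rng=None):
    """Unbiased stochastic quantization of a vector v onto s uniform levels.

    Each coordinate is randomly rounded to one of the two nearest grid points,
    so only the norm, the signs, and small integer level indices need to be
    transmitted, which reduces communication per exchange.
    """
    rng = np.random.default_rng() if rng is None else rng
    norm = np.linalg.norm(v)
    if norm == 0:
        return np.zeros_like(v)
    scaled = np.abs(v) / norm * s          # position of each entry on the grid
    lower = np.floor(scaled)
    prob = scaled - lower                  # probability of rounding up
    levels = lower + (rng.random(v.shape) < prob)
    return np.sign(v) * levels * norm / s
```

Because the rounding is unbiased, E[Q(v)] = v, so in analyses of this kind the quantization error typically enters as additional variance.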

4 Convergence Analysis

Points out how the algorithm differs from conventional algorithms and explains the difficulties of the proof.

 
