[Code Reading] A Walkthrough of the Byzantine Comparison Experiment Code

This post details how the Multi-Krum algorithm handles Byzantine attacks on the CIFAR10 dataset, covering the data preprocessing, the implementation of the Krum and Multi-Krum algorithms, and the gradient-crafting method of the Fang attack. The code examples show the implementation details and discuss their use in a distributed setting.


  Today I start a detailed read-through of the code for the Byzantine attack comparison experiments; the code comes from the supplementary material of FedCut.
Update 2023-03-31:
  The code this post reads closely is linked here


1 Multi-Krum Aggregation Algorithm

1.1 Importing packages

import argparse, os, sys, csv, shutil, time, random, operator, pickle, ast, math
import numpy as np
import pandas as pd
from torch.optim import Optimizer
import torch.nn.functional as F
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.backends.cudnn as cudnn
import torch.optim as optim
import torch.utils.data as data
import torch.multiprocessing as mp
os.chdir(sys.path[0])
sys.path.append("..")
sys.path.insert(0,'./../utils/')
from utils.logger import *
from utils.eval import *
from utils.misc import *

from cifar10_normal_train import *
from cifar10_util import *
from adam import Adam
from sgd import SGD

A small hiccup came up here: when importing code files from sibling directories, VS Code could not find them, and after fixing the run configuration there was still a problem with debugging. For the fix, see my earlier post.

1.2 Processing the CIFAR10 dataset and shuffling it into shards for 50 users

# Load CIFAR10 and split it IID among 50 clients
import torchvision.transforms as transforms
import torchvision.datasets as datasets
data_loc='D:/FedAffine/data/cifar' # change this to your own path
# load the training and test sets
train_transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
    ])
cifar10_train = datasets.CIFAR10(root=data_loc, train=True, download=True, transform=train_transform)
cifar10_test = datasets.CIFAR10(root=data_loc, train=False, download=True, transform=train_transform)

This is the standard dataset loading: there are 50,000 training images and 10,000 test images.

X=[]
Y=[]
for i in range(len(cifar10_train)):
    X.append(cifar10_train[i][0].numpy())
    Y.append(cifar10_train[i][1])
for i in range(len(cifar10_test)):
    X.append(cifar10_test[i][0].numpy())
    Y.append(cifar10_test[i][1])
X=np.array(X)
Y=np.array(Y)
print('total data len: ',len(X)) # length 60000

if not os.path.isfile('./cifar10_shuffle.pkl'):
    all_indices = np.arange(len(X))
    np.random.shuffle(all_indices)
    pickle.dump(all_indices,open('./cifar10_shuffle.pkl','wb'))
else:
    all_indices=pickle.load(open('./cifar10_shuffle.pkl','rb'))
X=X[all_indices]
Y=Y[all_indices]

Here the image data and labels are collected into X (60000, 3, 32, 32) and Y (60000,). The code then loads "cifar10_shuffle.pkl", a file holding a shuffled index array (created on the first run), and permutes X and Y by that index.

# data loading
nusers=50
user_tr_len=1000
total_tr_len=user_tr_len*nusers
val_len=5000
te_len=5000
# the first 50000 samples form the training set
total_tr_data=X[:total_tr_len]
total_tr_label=Y[:total_tr_len]
# the next 5000 samples form the validation set
val_data=X[total_tr_len:(total_tr_len+val_len)]
val_label=Y[total_tr_len:(total_tr_len+val_len)]
# the last 5000 samples form the test set
te_data=X[(total_tr_len+val_len):(total_tr_len+val_len+te_len)]
te_label=Y[(total_tr_len+val_len):(total_tr_len+val_len+te_len)]
# convert everything to tensors
total_tr_data_tensor=torch.from_numpy(total_tr_data).type(torch.FloatTensor)
total_tr_label_tensor=torch.from_numpy(total_tr_label).type(torch.LongTensor)
val_data_tensor=torch.from_numpy(val_data).type(torch.FloatTensor)
val_label_tensor=torch.from_numpy(val_label).type(torch.LongTensor)
te_data_tensor=torch.from_numpy(te_data).type(torch.FloatTensor)
te_label_tensor=torch.from_numpy(te_label).type(torch.LongTensor)

print('total tr len %d | val len %d | test len %d'%(len(total_tr_data_tensor),len(val_data_tensor),len(te_data_tensor)))

After the split, this prints: total tr len 50000 | val len 5000 | test len 5000

user_tr_data_tensors=[]
user_tr_label_tensors=[]
for i in range(nusers): # iterate over the 50 users
    user_tr_data_tensor=torch.from_numpy(total_tr_data[user_tr_len*i:user_tr_len*(i+1)]).type(torch.FloatTensor)
    user_tr_label_tensor=torch.from_numpy(total_tr_label[user_tr_len*i:user_tr_len*(i+1)]).type(torch.LongTensor)

    user_tr_data_tensors.append(user_tr_data_tensor)
    user_tr_label_tensors.append(user_tr_label_tensor)
    print('user %d tr len %d'%(i,len(user_tr_data_tensor)))

This gives each of the 50 users 1,000 training samples, using up the 50,000 training samples exactly.
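
As a quick sanity check (my own addition, not in the original code), you can print the class histogram of one user's shard; since the shards come from a global shuffle, each user should hold roughly 100 images per class:

# Optional sanity check: with an IID split of CIFAR10 (10 classes),
# each user's 1000 samples should contain about 100 of each class.
counts = torch.bincount(user_tr_label_tensors[0], minlength=10)
print('user 0 class counts:', counts.tolist())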

1.3 The Krum (Multi-Krum) algorithm

  First, it helps to know how the algorithm works; here I borrow heavily from this post: Krum and Multi-Krum.
  Algorithm assumptions:

  We have $n$ clients $\{c_1, c_2, \cdots, c_n\}$ and one server, and each client $c_i$ holds data $D_i$. The data are generally assumed to be independent and identically distributed (IID).
  Among these $n$ clients, $f$ are Byzantine attackers, satisfying $2f + 2 < n$.
  In other words, when $n = 10$, $f$ is at most 3; when $n = 100$, $f$ is at most 48.

  Krum algorithm steps:

  1. The server distributes the global parameters $W$ to all clients;
  2. Each client $c_i$, in parallel, computes its local gradient $g_i$ and sends it to the server;
  3. After receiving the clients' gradients, the server computes the pairwise distances: $d_{ij} = \Vert g_i - g_j \Vert_F^2$;
  4. For each gradient $g_i$, take the $n - f - 1$ closest distances, i.e. the $n - f - 1$ smallest values in $\{d_{i,1}, d_{i,2}, \cdots, d_{i,n}\}$, say $\{d_{i,1}, d_{i,2}, \cdots, d_{i,n-f-1}\}$, and sum them as that gradient's score: $Kr(i) = \sum_{j=1}^{n-f-1} d_{ij}$;
  5. After all gradients are scored, pick the gradient $g^*$ with the smallest score;
  6. Update: $W = W - lr \times g^*$.

  Multi-Krum changes step 5 of the algorithm: it selects the $m$ gradients with the smallest scores, and the final gradient is the average of those $m$.
  Now for the code:

# Code for Multi-krum aggregation
def multi_krum(all_updates, n_attackers, multi_k=False):

    candidates = []
    candidate_indices = []
    remaining_updates = all_updates
    all_indices = np.arange(len(all_updates))

    while len(remaining_updates) > 2 * n_attackers + 2:
        torch.cuda.empty_cache()
        distances = []
        for update in remaining_updates:
            distance = []
            for update_ in remaining_updates:
                # squared Euclidean distance between the two updates
                distance.append(torch.norm((update - update_)) ** 2)
            distance = torch.Tensor(distance).float()
            # indexing with None inserts a new dimension at that position
            distances = distance[None, :] if not len(distances) else torch.cat((distances, distance[None, :]), 0)

        distances = torch.sort(distances, dim=1)[0] # sort each row in ascending order
        scores = torch.sum(distances[:, :len(remaining_updates) - 2 - n_attackers], dim=1) # compute the scores (step 4 of the algorithm above)
        indices = torch.argsort(scores)[:len(remaining_updates) - 2 - n_attackers] # indices of the updates, sorted by score

        candidate_indices.append(all_indices[indices[0].cpu().numpy()]) # record the original index of the best-scoring update
        all_indices = np.delete(all_indices, indices[0].cpu().numpy())
        candidates = remaining_updates[indices[0]][None, :] if not len(candidates) else torch.cat((candidates, remaining_updates[indices[0]][None, :]), 0)
        remaining_updates = torch.cat((remaining_updates[:indices[0]], remaining_updates[indices[0] + 1:]), 0)
        if not multi_k: # plain Krum: keep only the single best-scoring candidate
            break
    # print(len(remaining_updates))
    aggregate = torch.mean(candidates, dim=0)
    return aggregate, np.array(candidate_indices)

  The code above uses a fairly clever loop structure to implement both Krum and Multi-Krum at once: each pass of the while loop removes the best-scoring update from remaining_updates, and plain Krum simply breaks after the first pick.
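
  To see the two modes side by side, here is a toy run (my own sketch; the update vectors are made-up stand-ins for real flattened gradients):

# Toy demonstration of Krum vs. Multi-Krum (not from the original code).
# 8 benign updates cluster around 0; the last 2 are malicious outliers.
torch.manual_seed(0)
benign = torch.randn(8, 5)
malicious = 10 + torch.randn(2, 5)
updates = torch.cat((benign, malicious), 0)

agg, idx = multi_krum(updates, n_attackers=2, multi_k=False)
print('Krum selects index:', idx)          # a single benign index
agg, idx = multi_krum(updates, n_attackers=2, multi_k=True)
print('Multi-Krum selects indices:', idx)  # several benign indices, averaged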

1.4 Generating the Fang attack gradients

  Next, the code turns to the Fang attack. First comes a utility function that computes lambda:

# Code for Fang attack on Multi-krum
def compute_lambda_fang(all_updates, model_re, n_attackers):

    distances = []
    n_benign, d = all_updates.shape
    # distance from each update to every other update
    for update in all_updates:
        distance = torch.norm((all_updates - update), dim=1)
        distances = distance[None, :] if not len(distances) else torch.cat((distances, distance[None, :]), 0)
    # replace the zero self-distances with a large value (10000) so they are never selected
    distances[distances == 0] = 10000
    distances = torch.sort(distances, dim=1)[0] # sort so the smallest distances come first
    scores = torch.sum(distances[:, :n_benign - 2 - n_attackers], dim=1)
    min_score = torch.min(scores)
    term_1 = min_score / ((n_benign - n_attackers - 1) * torch.sqrt(torch.Tensor([d]))[0])
    max_wre_dist = torch.max(torch.norm((all_updates - model_re), dim=1)) / (torch.sqrt(torch.Tensor([d]))[0])

    return (term_1 + max_wre_dist)
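
  Reading the formula off the code: with $n_b$ benign updates (n_benign) of dimension $d$, $f$ attackers (n_attackers), reference update $g_{Re}$ (model_re), and $Kr(i)$ the Krum score from Section 1.3, the function returns
$$\lambda = \frac{\min_i Kr(i)}{(n_b - f - 1)\sqrt{d}} + \frac{\max_i \Vert g_i - g_{Re} \Vert}{\sqrt{d}}.$$
This value serves as the starting point for the attack's search over $\lambda$ below.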

  Then comes generating the malicious updates:

def get_malicious_updates_fang(all_updates, model_re, deviation, n_attackers):

    lamda = compute_lambda_fang(all_updates, model_re, n_attackers)
    threshold = 1e-5

    mal_updates = []
    # halve lambda until Krum picks one of the malicious updates
    while lamda > threshold:
        mal_update = (- lamda * deviation)

        mal_updates = torch.stack([mal_update] * n_attackers)
        mal_updates = torch.cat((mal_updates, all_updates), 0)

        agg_grads, krum_candidate = multi_krum(mal_updates, n_attackers, multi_k=False)

        # the first n_attackers rows are the malicious ones, so a selected
        # index below n_attackers means the attack has succeeded
        if krum_candidate < n_attackers:
            return mal_updates

        lamda *= 0.5

    # fallback: lambda decayed below the threshold without ever succeeding
    if not len(mal_updates):
        print(lamda, threshold)
        mal_update = (model_re - lamda * deviation)

        mal_updates = torch.stack([mal_update] * n_attackers)
        mal_updates = torch.cat((mal_updates, all_updates), 0)

    return mal_updates
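
  Putting the two functions together, the attack pipeline looks roughly like this (a sketch of my own with made-up tensors; taking deviation as the sign of the benign aggregate is an assumption, one common choice for this attack):

# Toy end-to-end run of the Fang attack against Krum.
torch.manual_seed(0)
n_attackers = 2
benign_updates = torch.randn(8, 5)        # stand-ins for real gradients
model_re = torch.mean(benign_updates, 0)  # reference aggregate
deviation = torch.sign(model_re)          # assumed attack direction

mal_updates = get_malicious_updates_fang(benign_updates, model_re, deviation, n_attackers)
# rows 0..n_attackers-1 of mal_updates are the crafted malicious updates
agg, idx = multi_krum(mal_updates, n_attackers, multi_k=False)
print('Krum selected index:', idx, '(an index < %d means the attack won)' % n_attackers)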

1.5 Testing the algorithm
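
  The rest of the experiment wires these pieces into a federated training loop. As a rough sketch of one aggregation round (the tiny linear model, the learning rate, and the choice of the first n_attackers users as attackers are my assumptions; the original experiment uses the networks from cifar10_normal_train):

# Sketch of one federated round under the Fang attack (assumptions noted above).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
criterion = nn.CrossEntropyLoss()
n_attackers, lr = 10, 0.1

user_grads = []
for i in range(nusers):
    model.zero_grad()
    loss = criterion(model(user_tr_data_tensors[i]), user_tr_label_tensors[i])
    loss.backward()
    # flatten this user's gradient into a single vector
    user_grads.append(torch.cat([p.grad.reshape(-1) for p in model.parameters()]))
user_grads = torch.stack(user_grads)

# craft the malicious updates from the benign users' gradients
model_re = torch.mean(user_grads, 0)
deviation = torch.sign(model_re)
mal_updates = get_malicious_updates_fang(user_grads[n_attackers:], model_re, deviation, n_attackers)

# robust aggregation, then an SGD-style update of the global model
agg_grads, _ = multi_krum(mal_updates, n_attackers, multi_k=True)
start = 0
with torch.no_grad():
    for p in model.parameters():
        p -= lr * agg_grads[start:start + p.numel()].reshape(p.shape)
        start += p.numel()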
