1. Background: Privacy-Preserving Machine Learning
1.1 Why Data Privacy Matters
- Demand for high-quality data: training high-quality machine learning models requires large amounts of data, which often contain sensitive user information, such as biometric data (images, voice, genetic information) and financial data (income, spending, credit records).
- Legal and regulatory pressure: with laws such as China's Personal Information Protection Law (《个人信息保护法》) and the GDPR in force, protecting data privacy has become essential.
1.2 Secure Multi-Party Computation (MPC)
- Basic concept: MPC allows multiple parties to jointly compute a function without revealing their private inputs to each other. This technique is the foundation of privacy-preserving machine learning (PPML).
- Application scenarios: MPC can be used for privacy-preserving training and inference, keeping the data protected throughout the entire computation.
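To make the idea concrete, here is a toy numpy sketch of 2-party additive secret sharing, the arithmetic primitive underlying most MPC protocols. All names here are illustrative; this is not any SPU API.

```python
# Toy 2-party additive secret sharing over a ring (illustrative only).
import numpy as np

RING = 2**32  # computation ring Z_{2^32}

def share(x, rng):
    """Split integer x into two random shares that sum to x mod RING."""
    r = int(rng.integers(0, RING))
    return r, (x - r) % RING

def reconstruct(s0, s1):
    return (s0 + s1) % RING

rng = np.random.default_rng(0)
a0, a1 = share(123, rng)  # Alice holds a0, Bob holds a1
b0, b1 = share(456, rng)

# Each party adds its shares locally; neither party ever sees 123 or 456,
# yet the reconstructed result is the correct sum.
c0, c1 = (a0 + b0) % RING, (a1 + b1) % RING
print(reconstruct(c0, c1))  # → 579
```

Each share alone is a uniformly random ring element and reveals nothing about the input; only the combination of both shares recovers a value.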
2. SPU Architecture Overview
2.1 SecretFlow-SPU at a Glance
- System components:
  - Frontend: supports existing machine learning frameworks such as JAX, PyTorch, and TensorFlow.
  - Compiler: generates and optimizes SPU's intermediate representation (PPHLO) so that machine learning programs can run efficiently in an MPC setting.
  - Runtime: executes PPHLO under an MPC protocol, keeping the computation private.
2.2 Design Goals
- Ease of use: a simple API lets users migrate existing machine learning programs to SPU with minimal changes.
- Extensibility: supports multiple MPC protocols, such as ABY3, Cheetah, and SPDZ2k.
- High performance: compiler optimizations and parallel execution improve computation efficiency.
3. Secure NN Training and Inference Example
3.1 Basic Workflow of Secure Training and Inference
- Data sources: the data providers (e.g., Alice and Bob) load their local data.
- Data protection: each provider encrypts (secret-shares) its data before sending it to the MPC compute parties, so the data stays protected both in transit and during computation.
- Model definition: forward and backward propagation are implemented manually in JAX to fit MPC's computation model.
- Secure execution: the SPU compiler converts the training/inference computation graph into a graph of secure operators, which the SPU device executes one by one under the MPC protocol.
3.2 示例代码
pythonCopy codedef load_data(path):
# 从本地路径加载数据
return data
def train(x1, x2, y):
x = jax.numpy.concatenate((x1, x2), axis=1)
lr = jax_utils.logistic_regression()
return lr.fit_auto_grad(x, y)
# Initialize devices
init_device(device_config)  # p1, p2, mpc

# Load data
x1 = device(p1)(load_data)("/data/x1.csv")
x2 = device(p2)(load_data)("/data/x2.csv")
y = device(p2)(load_data)("/data/y.csv")

# Train
x1_, x2_, y_ = to(x1, mpc), to(x2, mpc), to(y, mpc)
w_ = device(mpc)(train)(x1_, x2_, y_)

# Reveal the result
w = plaintext_to(w_, p1)
- This schematic example shows the complete flow: loading data, training the model, and revealing the result.
4. Applications to More Complex Models
4.1 Deep Neural Networks (DNN)
- Logistic regression (LR): simple enough to implement by hand.
- Deep neural networks (DNN):
  - Higher complexity; using the stax or flax libraries is recommended to simplify the implementation.
  - Flax MLP: builds a multi-layer perceptron model.
  - Stax CNN: builds a convolutional neural network model.
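As a hedged sketch of what a stax-based CNN like the one mentioned above might look like (the layer sizes here are my own illustrative choices, not from the course material):

```python
# Minimal CNN sketch with jax.example_libraries.stax; sizes are illustrative.
import jax
import jax.numpy as jnp
from jax.example_libraries import stax

# stax.serial composes layers and returns (init_fn, apply_fn).
init_fn, apply_fn = stax.serial(
    stax.Conv(8, (3, 3), padding='SAME'),  # 8 output channels, 3x3 kernel
    stax.Relu,
    stax.Flatten,
    stax.Dense(10),  # 10-class logits
    stax.LogSoftmax,
)

rng = jax.random.PRNGKey(0)
out_shape, params = init_fn(rng, (-1, 28, 28, 1))  # NHWC input shape
logits = apply_fn(params, jnp.ones((2, 28, 28, 1)))
print(logits.shape)  # (2, 10)
```

Because the whole model is a pure `apply_fn(params, x)` function, it can be traced and compiled the same way as the hand-written examples in this section.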
4.2 Migrating Huggingface Models
- Model reuse: models built with Huggingface can be migrated to SPU for secure inference.
- Example code:
def run_on_spu():
    inputs_ids = tokenizer.encode('I enjoy walking with my cute dog', return_tensors='jax')
    input_ids = ppd.device("P1")(lambda x: x)(inputs_ids)
    params = ppd.device("P2")(lambda x: x)(pretrained_model.params)
    outputs_ids = ppd.device("SPU")(text_generation)(input_ids, params)
    outputs_ids = ppd.get(outputs_ids)
    return outputs_ids
- This example shows how to migrate a Huggingface GPT-2 model to SPU for secure inference.
5. Common Problems and Solutions
5.1 Supported Operators
- Operator support list: see the SPU documentation for the list of supported operators and their status.
5.2 Error Bounds of Non-linear Operators
- Error sources: system-level settings (ring size, fixed-point fraction bits, truncation protocol, etc.) and the approximation error from fitting non-linear operators.
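The fixed-point part of this error can be sketched with plain numpy. This is only an illustration of the encoding idea, not SPU's actual encoder: reals are scaled by 2^fxp and stored as ring integers, so each encode rounds to the nearest 1/2^fxp.

```python
# Fixed-point rounding error illustration (numpy only, not SPU's encoder).
import numpy as np

def encode(x, fxp_bits):
    """Scale a real number by 2^fxp_bits and round to an integer."""
    return np.round(x * 2**fxp_bits).astype(np.int64)

def decode(v, fxp_bits):
    """Map the ring integer back to a real number."""
    return v / 2**fxp_bits

x = 0.1
for fxp in (8, 18):
    err = abs(decode(encode(x, fxp), fxp) - x)
    print(f'fxp={fxp} bits, rounding error={err:.2e}')
```

More fraction bits shrink the rounding error, but leave fewer bits for the integer part in a fixed-size ring, which is exactly the overflow trade-off discussed in 5.3.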
5.3 Runtime Errors in Secure Execution
- Common causes:
  - The algorithm is not JIT-compilable.
  - The program uses operators that SPU does not support.
  - Input data or parameters are too large or too small, causing overflow or underflow.
5.4 Performance Optimization
- Algorithm level: reduce calls to expensive operators, avoid redundant computation, and replace for-loops with higher-order tensor operations.
- Runtime level: enable more parallelism, such as DAG parallel execution.
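The algorithm-level advice can be illustrated with a small numpy example: a per-row loop and a single batched matmul compute the same result, but the tensor form maps to one large operator instead of many small ones. In SPU the gap is far larger than on a CPU, because every secure operator also carries communication rounds.

```python
# "Replace for-loops with tensor ops": loop vs. one batched matmul.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 32))
W = rng.standard_normal((32, 16))

# Loop version: 64 separate vector-matrix products.
loop_out = np.stack([X[i] @ W for i in range(X.shape[0])])

# Vectorized version: a single matmul over the whole batch.
vec_out = X @ W

print(np.allclose(loop_out, vec_out))  # True
```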
6. Hands-on Practice
nn_with_spu
import sys
!{sys.executable} -m pip install flax==0.6.0

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

def breast_cancer(party_id=None, train: bool = True) -> (np.ndarray, np.ndarray):
    x, y = load_breast_cancer(return_X_y=True)
    x = (x - np.min(x)) / (np.max(x) - np.min(x))
    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=0.2, random_state=42
    )
    if train:
        if party_id:
            if party_id == 1:
                # Party 1 holds only the first 15 features, no labels
                return x_train[:, :15], None
            else:
                # Party 2 holds the remaining features and the labels
                return x_train[:, 15:], y_train
        else:
            return x_train, y_train
    else:
        return x_test, y_test
from typing import Sequence
import flax.linen as nn

FEATURES = [30, 15, 8, 1]

class MLP(nn.Module):
    features: Sequence[int]

    @nn.compact
    def __call__(self, x):
        for feat in self.features[:-1]:
            x = nn.relu(nn.Dense(feat)(x))
        x = nn.Dense(self.features[-1])(x)
        return x
import jax.numpy as jnp

def predict(params, x):
    # The MLP definition is duplicated here on purpose, so that predict is
    # self-contained when it is traced and shipped to the SPU device.
    from typing import Sequence
    import flax.linen as nn

    FEATURES = [30, 15, 8, 1]

    class MLP(nn.Module):
        features: Sequence[int]

        @nn.compact
        def __call__(self, x):
            for feat in self.features[:-1]:
                x = nn.relu(nn.Dense(feat)(x))
            x = nn.Dense(self.features[-1])(x)
            return x

    return MLP(FEATURES).apply(params, x)
def loss_func(params, x, y):
    pred = predict(params, x)

    def mse(y, pred):
        def squared_error(y, y_pred):
            return jnp.multiply(y - y_pred, y - y_pred) / 2.0
        return jnp.mean(squared_error(y, pred))

    return mse(y, pred)
def train_auto_grad(x1, x2, y, params, n_batch=10, n_epochs=10, step_size=0.01):
    x = jnp.concatenate((x1, x2), axis=1)
    xs = jnp.array_split(x, len(x) / n_batch, axis=0)
    ys = jnp.array_split(y, len(y) / n_batch, axis=0)

    def body_fun(_, loop_carry):
        params = loop_carry
        for x, y in zip(xs, ys):
            _, grads = jax.value_and_grad(loss_func)(params, x, y)
            # Plain SGD step on every leaf of the parameter pytree
            params = jax.tree_util.tree_map(
                lambda p, g: p - step_size * g, params, grads
            )
        return params

    params = jax.lax.fori_loop(0, n_epochs, body_fun, params)
    return params
def model_init(n_batch=10):
    model = MLP(FEATURES)
    return model.init(jax.random.PRNGKey(1), jnp.ones((n_batch, FEATURES[0])))

from sklearn.metrics import roc_auc_score

def validate_model(params, X_test, y_test):
    y_pred = predict(params, X_test)
    return roc_auc_score(y_test, y_pred)
import jax

# Load the data
x1, _ = breast_cancer(party_id=1, train=True)
x2, y = breast_cancer(party_id=2, train=True)

# Hyperparameters
n_batch = 10
n_epochs = 10
step_size = 0.01

# Train the model
init_params = model_init(n_batch)
params = train_auto_grad(x1, x2, y, init_params, n_batch, n_epochs, step_size)

# Test the model
X_test, y_test = breast_cancer(train=False)
auc = validate_model(params, X_test, y_test)
print(f'auc={auc}')
# Train with SPU
import secretflow as sf

# Check the version of your SecretFlow
print('The version of SecretFlow: {}'.format(sf.__version__))

# In case you have a running secretflow runtime already.
sf.shutdown()
sf.init(['alice', 'bob'], address='local')
alice, bob = sf.PYU('alice'), sf.PYU('bob')
spu = sf.SPU(sf.utils.testing.cluster_def(['alice', 'bob']))
x1, _ = alice(breast_cancer)(party_id=1, train=True)
x2, y = bob(breast_cancer)(party_id=2, train=True)
init_params = model_init(n_batch)

device = spu
x1_, x2_, y_ = x1.to(device), x2.to(device), y.to(device)
init_params_ = sf.to(alice, init_params).to(device)

params_spu = spu(train_auto_grad, static_argnames=['n_batch', 'n_epochs', 'step_size'])(
    x1_, x2_, y_, init_params_, n_batch=n_batch, n_epochs=n_epochs, step_size=step_size
)
# Use reveal to convert the model parameters to plaintext
params = sf.reveal(params_spu)
print(params)
# Evaluate the model trained on ciphertext
X_test, y_test = breast_cancer(train=False)
auc = validate_model(params, X_test, y_test)
print(f'auc={auc}')

Plaintext evaluation result:
Ciphertext evaluation result:
gpt2_with_spu
import sys
!{sys.executable} -m pip install transformers[flax] huggingface_hub

import os
os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'

from transformers import AutoTokenizer, FlaxGPT2LMHeadModel, GPT2Config
tokenizer = AutoTokenizer.from_pretrained("gpt2")
pretrained_model = FlaxGPT2LMHeadModel.from_pretrained("gpt2")
def text_generation(input_ids, params):
    config = GPT2Config()
    model = FlaxGPT2LMHeadModel(config=config)
    # Greedy decoding: append the argmax token 10 times
    for _ in range(10):
        outputs = model(input_ids=input_ids, params=params)
        next_token_logits = outputs[0][0, -1, :]
        next_token = jnp.argmax(next_token_logits)
        input_ids = jnp.concatenate([input_ids, jnp.array([[next_token]])], axis=1)
    return input_ids
# Generate in plaintext:
import jax.numpy as jnp

inputs_ids = tokenizer.encode('I enjoy walking with my cute dog', return_tensors='jax')
outputs_ids = text_generation(inputs_ids, pretrained_model.params)
print('-' * 65 + '\nRun on CPU:\n' + '-' * 65)
print(tokenizer.decode(outputs_ids[0], skip_special_tokens=True))
print('-' * 65)
# Generate on ciphertext
import secretflow as sf

# In case you have a running secretflow runtime already.
sf.shutdown()
sf.init(['alice', 'bob', 'carol'], address='local')
alice, bob = sf.PYU('alice'), sf.PYU('bob')

conf = sf.utils.testing.cluster_def(['alice', 'bob', 'carol'])
conf['runtime_config']['fxp_exp_mode'] = 1
conf['runtime_config']['experimental_disable_mmul_split'] = True
spu = sf.SPU(conf)
def get_model_params():
    pretrained_model = FlaxGPT2LMHeadModel.from_pretrained("gpt2")
    return pretrained_model.params

def get_token_ids():
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    return tokenizer.encode('I enjoy walking with my cute dog', return_tensors='jax')

model_params = alice(get_model_params)()
input_token_ids = bob(get_token_ids)()

device = spu
model_params_, input_token_ids_ = model_params.to(device), input_token_ids.to(device)
output_token_ids = spu(text_generation)(input_token_ids_, model_params_)
# Check whether the SPU output matches the plaintext output
outputs_ids = sf.reveal(output_token_ids)
print('-' * 65 + '\nRun on SPU:\n' + '-' * 65)
print(tokenizer.decode(outputs_ids[0], skip_special_tokens=True))
print('-' * 65)
Generated plaintext:
The plaintext run finished in 17 s, while the ciphertext run was still going after an hour; I will add the result once it comes out~
Second update: I ran secretflow in a docker container under WSL2-Ubuntu on Windows, and the ciphertext run fails because of OOM. After consulting 隐语课程学习笔记10-基于SPU的机器学习建模实操, a version that creates the SPU device but does not actually execute on it runs successfully for now; the full code is as follows:
import sys
!{sys.executable} -m pip install transformers[flax]

import os
os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'

from transformers import AutoTokenizer, FlaxGPT2LMHeadModel, GPT2Config
tokenizer = AutoTokenizer.from_pretrained("gpt2")
pretrained_model = FlaxGPT2LMHeadModel.from_pretrained("gpt2")
import jax.numpy as jnp

layers_selected = 12  # number of top-k layers to keep

def text_generation(input_ids, params):
    # Custom config: keep only the selected top-k layers and inherit the
    # remaining settings from the pretrained model
    config = GPT2Config(
        n_layer=layers_selected,
        vocab_size=pretrained_model.config.vocab_size,
        n_positions=pretrained_model.config.n_positions,
        n_ctx=pretrained_model.config.n_ctx,
        n_embd=pretrained_model.config.n_embd,
        n_head=pretrained_model.config.n_head,
    )
    model = FlaxGPT2LMHeadModel(config=config)
    for _ in range(10):
        outputs = model(input_ids=input_ids, params=params)
        next_token_logits = outputs[0][0, -1, :]
        next_token = jnp.argmax(next_token_logits)
        input_ids = jnp.concatenate([input_ids, jnp.array([[next_token]])], axis=1)
    return input_ids
# Get the pretrained model's parameters
params = pretrained_model.params

# Keep only the parameters of the selected top-k layers
new_params = {
    'transformer': {
        'wte': params['transformer']['wte'],
        'wpe': params['transformer']['wpe'],
        'h': {str(i): params['transformer']['h'][str(i)] for i in range(layers_selected)},
        'ln_f': params['transformer']['ln_f'],
    }
}
import jax.numpy as jnp

inputs_ids = tokenizer.encode('I enjoy walking with my cute dog', return_tensors='jax')
outputs_ids = text_generation(inputs_ids, new_params)
print('-' * 65 + '\nRun on CPU:\n' + '-' * 65)
print(tokenizer.decode(outputs_ids[0], skip_special_tokens=True))
print('-' * 65)
import secretflow as sf

# In case you have a running secretflow runtime already
sf.shutdown()
sf.init(['alice', 'bob'], address='local')
alice, bob = sf.PYU('alice'), sf.PYU('bob')

conf = sf.utils.testing.cluster_def(['alice', 'bob'])
conf['runtime_config']['fxp_exp_mode'] = 1
conf['runtime_config']['experimental_disable_mmul_split'] = True
spu = sf.SPU(conf)
def get_model_params():
    pretrained_model = FlaxGPT2LMHeadModel.from_pretrained("gpt2")
    params = pretrained_model.params
    # Keep only the selected layers
    new_params = {
        'transformer': {
            'wte': params['transformer']['wte'],
            'wpe': params['transformer']['wpe'],
            'h': {str(i): params['transformer']['h'][str(i)] for i in range(layers_selected)},
            'ln_f': params['transformer']['ln_f'],
        }
    }
    return new_params

def get_token_ids():
    return tokenizer.encode('I enjoy walking with my cute dog', return_tensors='jax')
model_params=alice(get_model_params)()
# input_token_ids=bob(get_token_ids)()
input_token_ids=alice(get_token_ids)()
#完成数据获取
print("step1")
#device=spu
#model_params_, input_token_ids_ = model_params.to(device), input_token_ids.to(device)
#完成生成任务之后进行标记
print("step2")
output_token_ids = alice(text_generation)(input_token_ids, model_params)
# output_token_ids = spu(text_generation)(input_token_ids_, model_params_)
outputs_ids=sf.reveal(output_token_ids)
print('-' * 65 + '\nRun on CPU:\'n' + '-' * 65)
print(tokenizer.decode(outputs_ids[0], skip_special_tokens=True))
print('-'*65)
I will run the complete run_on_spu once I get access to a server~