With the deep learning wave surging, it seems everyone wants to wade in and ride it, whether they need it or not.
This article summarizes the correspondences between PyTorch and PaddlePaddle: concepts, programming conventions, and matching APIs. It is a work in progress and will keep growing...
Intended audience: readers who already know some PyTorch and want to get up to speed with PaddlePaddle quickly.
In short: with dynamic graphs, PaddlePaddle is even easier to pick up and simpler to use than PyTorch, and it ships with mature support for model optimization and on-device deployment, so switching to PaddlePaddle looks promising.
I previously worked through the book 《PyTorch深度学习》 (Deep Learning with PyTorch), but with no GPU of my own I could not get hands-on experience with large networks. Fortunately, Baidu AI Studio currently offers free GPU resources (V100) for Baidu's own framework PaddlePaddle, so even those of us without hardware can get a taste of deep learning. I recently completed the seven-day check-in for Baidu's course "深度学习7日入门-CV疫情特辑" (7-Day Deep Learning Primer, CV Epidemic Special), experienced what PaddlePaddle has to offer, and summarize it again here.
The latest PaddlePaddle 1.7 supports a dynamic-graph mode (DyGraph) whose programming style is essentially the same as PyTorch's, so PyTorch users can switch with little friction. Readers who want to go further are encouraged to join a course in the Baidu AI Studio community.
I. A worked example: housing price prediction
1. Tensor operations
paddle.fluid.Variable <---> torch.Tensor (PyTorch's torch.autograd.Variable was merged into torch.Tensor in PyTorch 0.4)
Converting to and from numpy arrays:

| | Paddle | PyTorch |
| --- | --- | --- |
| tensor → numpy | fluid.Variable.numpy() | torch.Tensor.numpy() |
| numpy → tensor | fluid.dygraph.to_variable(x) | torch.from_numpy(x) |
2. Datasets
(1) Paddle
The data reader
In Paddle, training and test data are usually wrapped in a function object called a reader; calling it returns a Python generator that yields samples one by one.
The shuffle decorator takes a reader and returns a reader that yields the samples in randomized order (shuffled within a buffer of buf_size samples).
The batch decorator takes a reader and returns a batched reader that yields one mini-batch at a time.
Example:
train_reader = paddle.batch(
    paddle.reader.shuffle(
        paddle.dataset.uci_housing.train(), buf_size=500
    ),
    batch_size=BATCH_SIZE
)
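The reader/decorator pattern is easy to see without Paddle at all. Below is a minimal pure-Python sketch (not Paddle's actual implementation) of what the `shuffle` and `batch` decorators do to a reader:

```python
import random

def make_reader(data):
    # a reader is a zero-argument callable that returns a generator
    def reader():
        for sample in data:
            yield sample
    return reader

def shuffle(reader, buf_size):
    # buffer up to buf_size samples, then yield them in random order
    def shuffled_reader():
        buf = []
        for sample in reader():
            buf.append(sample)
            if len(buf) >= buf_size:
                random.shuffle(buf)
                yield from buf
                buf = []
        random.shuffle(buf)
        yield from buf
    return shuffled_reader

def batch(reader, batch_size):
    # group consecutive samples into mini-batches
    def batched_reader():
        b = []
        for sample in reader():
            b.append(sample)
            if len(b) == batch_size:
                yield b
                b = []
        if b:  # trailing partial batch
            yield b
    return batched_reader

train_reader = batch(shuffle(make_reader(range(10)), buf_size=4), batch_size=3)
batches = list(train_reader())
print([len(b) for b in batches])  # batch sizes: [3, 3, 3, 1]
```

Each decorator returns a new reader, which is why they compose the way the Paddle snippet above shows.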
(2) PyTorch
Dataset: subclass torch.utils.data.Dataset and define the __getitem__ and __len__ methods:
import numpy as np
from sklearn.datasets import load_boston
from sklearn.preprocessing import MinMaxScaler
from torch.utils.data import Dataset, DataLoader

train_data_ratio = 0.8  # fraction of samples used for training (example value)

def data_preprocess():
    boston = load_boston()
    ss_input = MinMaxScaler()
    ss_label = MinMaxScaler()
    input = ss_input.fit_transform(boston['data'])
    label = ss_label.fit_transform(boston['target'][:, np.newaxis])
    return input, label

class TrainDataset(Dataset):
    def __init__(self):
        super(TrainDataset, self).__init__()
        input, label = data_preprocess()
        self.input = input[0:int(train_data_ratio*len(input))]
        self.label = label[0:int(train_data_ratio*len(input))]

    def __getitem__(self, idx):
        return self.input[idx], self.label[idx]

    def __len__(self):
        return len(self.input)
The data loader, which handles shuffling, batching, and related options:
train_dataloader = DataLoader(dataset=TrainDataset(),
batch_size=32,
shuffle=True,
num_workers=4,
pin_memory=True,
drop_last=True)
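The Dataset contract boils down to just `__getitem__` plus `__len__`. The following framework-free sketch (not PyTorch's actual `DataLoader`, which also handles workers, pinned memory, and collation) shows conceptually how a loader consumes a dataset:

```python
import random

class ToyDataset:
    """Implements the same two methods torch.utils.data.Dataset requires."""
    def __init__(self, n):
        self.samples = [(float(i), 2.0 * float(i)) for i in range(n)]

    def __getitem__(self, idx):
        return self.samples[idx]

    def __len__(self):
        return len(self.samples)

def simple_loader(dataset, batch_size, shuffle=True, drop_last=True):
    # what a loader does conceptually: index, shuffle, group into batches
    indices = list(range(len(dataset)))
    if shuffle:
        random.shuffle(indices)
    for start in range(0, len(indices), batch_size):
        chunk = indices[start:start + batch_size]
        if drop_last and len(chunk) < batch_size:
            return  # mirror drop_last=True: discard the short final batch
        yield [dataset[i] for i in chunk]

ds = ToyDataset(10)
batches = list(simple_loader(ds, batch_size=4))
print(len(batches))  # 2 full batches; the trailing batch of 2 is dropped
```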
3. Network definition
(1) Paddle
Subclass fluid.dygraph.Layer and implement the __init__ and forward methods:
class Linear_net(fluid.dygraph.Layer):
    def __init__(self, input_size, hidden_size, output_size):
        super(Linear_net, self).__init__()
        self.fc1 = fluid.dygraph.Linear(input_size, hidden_size, act='relu')
        self.fc2 = fluid.dygraph.Linear(hidden_size, output_size, act='sigmoid')

    def forward(self, input):
        x = self.fc1(input)
        x = self.fc2(x)
        return x
(2) PyTorch
Subclass torch.nn.Module and likewise implement __init__ and forward:
import torch.nn as nn

class linear_network(nn.Module):
    def __init__(self, input_num, hidden_num, output_num):
        super(linear_network, self).__init__()
        self.net = nn.Sequential(
            nn.Linear(input_num, hidden_num),
            nn.ReLU(),
            nn.Linear(hidden_num, output_num),
            nn.Sigmoid()
        )

    def forward(self, input):
        return self.net(input)
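Both class definitions above compute the same function: Linear → ReLU → Linear → Sigmoid. As a framework-free sanity check, here is a numpy sketch of that forward pass (the weights are random placeholders, not trained parameters):

```python
import numpy as np

def forward(x, w1, b1, w2, b2):
    h = x @ w1 + b1                   # first Linear layer
    h = np.maximum(h, 0.0)            # ReLU
    y = h @ w2 + b2                   # second Linear layer
    return 1.0 / (1.0 + np.exp(-y))   # Sigmoid

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 13))                   # batch of 32 samples, 13 features
w1, b1 = rng.normal(size=(13, 14)), np.zeros(14)
w2, b2 = rng.normal(size=(14, 1)), np.zeros(1)
out = forward(x, w1, b1, w2, b2)
print(out.shape)  # (32, 1), every value in [0, 1]
```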
4. Training loop
Training in both frameworks follows the same steps:

for data in dataset:
    clear the accumulated gradients
    forward pass (compute the output)
    compute the loss
    backward pass (compute the gradients)
    update the learnable parameters
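These five steps are the same in any framework. A minimal pure-Python sketch (fitting y = 2x with a single weight and hand-written gradients) makes each step explicit:

```python
# fit y = w * x to data generated with w_true = 2.0, using plain SGD
data = [(float(x), 2.0 * float(x)) for x in range(1, 6)]
w = 0.0
lr = 0.01
for epoch in range(200):
    for x, y in data:
        grad = 0.0                   # 1. clear the accumulated gradient
        pred = w * x                 # 2. forward pass
        loss = (pred - y) ** 2       # 3. compute the loss
        grad = 2.0 * (pred - y) * x  # 4. backward pass (d loss / d w)
        w -= lr * grad               # 5. update the parameter
print(round(w, 3))  # converges to 2.0
```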
(1) Paddle
Define a train function:
def train_linear(reader, model, optimizer):
    avg_loss = 0
    for batch_id, data in enumerate(reader()):
        dy_x = np.array([x[0] for x in data]).astype('float32')
        dy_y = np.array([x[1] for x in data]).astype('float32')
        dy_x = fluid.dygraph.to_variable(dy_x)
        dy_y = fluid.dygraph.to_variable(dy_y)
        predict = model(dy_x)
        cost = fluid.layers.square_error_cost(input=predict, label=dy_y)
        avg_loss = fluid.layers.mean(cost)
        avg_loss.backward()
        optimizer.minimize(avg_loss)
        model.clear_gradients()
    return float(avg_loss.numpy())  # return avg_loss of the last batch
Call the train function once per epoch (note that the model and optimizer must be created once, outside the epoch loop, or training would restart from scratch every epoch):
place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
with fluid.dygraph.guard(place):
    linear_model = Linear_net(13, 14, 1)
    sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.01, parameter_list=linear_model.parameters())
    epoch_num = 100
    for epoch in range(epoch_num):
        avg_loss = train_linear(train_reader, linear_model, sgd_optimizer)
        print('Linear model training avg loss : {0}'.format(avg_loss))
(2) PyTorch
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
model = linear_network(13, 14, 1).to(device)
cost = nn.MSELoss()
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.01)
batch_loss = []
for i in range(max_epoch):
    model.train()
    for data in train_dataloader:
        input, label = data
        preds = model(input.to(device).float())
        loss = cost(preds, label.to(device).float())
        batch_loss.append(loss.item())  # .item() works on CPU and GPU alike
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
API comparison:

| | Paddle | PyTorch |
| --- | --- | --- |
| clear accumulated gradients | model.clear_gradients() | optimizer.zero_grad() |
| update parameters | optimizer.minimize(avg_loss) | optimizer.step() |
Note:
PyTorch requires moving tensors and models between CPU and GPU explicitly with .to(device), .cpu(), .cuda(), and similar calls;
Paddle selects the target device through with fluid.dygraph.guard(place): and needs no manual memory transfers.
5. Saving the model
(1) Paddle
fluid.save_dygraph(model.state_dict(), "model")
(2) PyTorch
torch.save(model.state_dict(), os.path.join(args.outputs_dir, 'epoch_{}.pth'.format(epoch)))
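In both frameworks, state_dict() is essentially a dict mapping parameter names to their values, and saving it is a serialization round trip (PyTorch uses pickle under the hood). A framework-free sketch with plain Python objects standing in for parameter tensors:

```python
import os
import pickle
import tempfile

# stand-in for model.state_dict(): parameter name -> parameter values
state_dict = {"fc1.weight": [[0.1, 0.2], [0.3, 0.4]], "fc1.bias": [0.0, 0.0]}

path = os.path.join(tempfile.mkdtemp(), "model.pth")
with open(path, "wb") as f:
    pickle.dump(state_dict, f)   # analogous to torch.save(state_dict, path)
with open(path, "rb") as f:
    restored = pickle.load(f)    # analogous to torch.load(path)
print(restored == state_dict)   # True
```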
II. Model conversion
Baidu AI Studio provides a way to convert PyTorch project models into Paddle models: