[Deep Learning Series] Benchmarking Disk Read/Write Efficiency: PyTorch vs. NumPy

Preface

When saving and loading data, I often see code that mixes numpy and pytorch. Is that mixing actually worthwhile?
This article investigates by benchmarking the disk read/write efficiency of torch against numpy.
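The measurements below use time.time() for wall-clock time and time.process_time() for CPU time, with the same boilerplate repeated around every call. As a minimal sketch (the helper name stopwatch is my own, not part of the original code), that pattern can be wrapped in a context manager:

```python
import time
from contextlib import contextmanager

@contextmanager
def stopwatch(label):
    """Print wall-clock and CPU time for the enclosed block."""
    start = time.time()
    u_start = time.process_time()
    try:
        yield
    finally:
        u_end = time.process_time()
        end = time.time()
        print(f'{label} cpu time: {u_end - u_start:.6f}')
        print(f'{label} time: {end - start:.6f}')
```

A call such as `with stopwatch('np.save'): np.save('np.npy', a)` then reproduces the two `print` lines used throughout this post.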

Experiment: disk read/write efficiency of PyTorch vs. NumPy

```python
import torch
import numpy as np
import time
```

save() method

Python list

```python
length = int(1e7)
l = [i for i in range(length)]
```

torch.save()

```python
start = time.time()
u_start = time.process_time()
torch.save(l, 'torch.pth')
u_end = time.process_time()
end = time.time()
print('cpu time: ', u_end - u_start)
print('time: ', end - start)
```

```
cpu time:  2.796875
time:  4.318857431411743
```

numpy.save()

```python
start = time.time()
u_start = time.process_time()
np.save('np.npy', l)
u_end = time.process_time()
end = time.time()
print('cpu time: ', u_end - u_start)
print('time: ', end - start)
```

```
cpu time:  0.484375
time:  0.6534786224365234
```

PyTorch tensor

```python
length = int(1e6)
batch_size = 256
t = torch.randn(batch_size, length, dtype=torch.float32).to('cuda:0')
```

torch.save()

```python
start = time.time()
u_start = time.process_time()
torch.save(t, 'torch.pth')  # tensor lives on cuda:0; its device is stored with it
u_end = time.process_time()
end = time.time()
print('cpu time: ', u_end - u_start)
print('time: ', end - start)
```

```
cpu time:  0.375
time:  2.1566760540008545
```

numpy.save()

```python
start = time.time()
u_start = time.process_time()
np.save('np.npy', t.to('cpu'))  # move to CPU first; np.save converts the tensor to an ndarray
u_end = time.process_time()
end = time.time()
print('cpu time: ', u_end - u_start)
print('time: ', end - start)
```

```
cpu time:  0.375
time:  2.4183483123779297
```

NumPy ndarray

```python
length = int(1e7)
a = np.array(range(length))
```

torch.save()

```python
start = time.time()
u_start = time.process_time()
torch.save(a, 'torch.pth')
u_end = time.process_time()
end = time.time()
print('cpu time: ', u_end - u_start)
print('time: ', end - start)
```

```
cpu time:  0.21875
time:  0.3329944610595703
```

numpy.save()

```python
start = time.time()
u_start = time.process_time()
np.save('np.npy', a)
u_end = time.process_time()
end = time.time()
print('cpu time: ', u_end - u_start)
print('time: ', end - start)
```

```
cpu time:  0.078125
time:  0.14899635314941406
```

load() method

Python list

```python
length = int(1e7)
l = [i for i in range(length)]
```

torch.load()

```python
torch.save(l, 'torch.pth')
start = time.time()
u_start = time.process_time()
l = torch.load('torch.pth')
u_end = time.process_time()
end = time.time()
print(type(l))
print('cpu time: ', u_end - u_start)
print('time: ', end - start)
```

```
<class 'list'>
cpu time:  1.1875
time:  1.6928999423980713
```

numpy.load()

```python
np.save('np.npy', l)
start = time.time()
u_start = time.process_time()
l = np.load('np.npy')
u_end = time.process_time()
end = time.time()
print(type(l))
print('cpu time: ', u_end - u_start)
print('time: ', end - start)
```

```
<class 'numpy.ndarray'>
cpu time:  0.140625
time:  0.18318390846252441
```

Note that np.save converts the list to an ndarray on save, so np.load returns a numpy.ndarray rather than the original list; torch.save, by contrast, pickles the list and restores it unchanged.

PyTorch tensor

```python
length = int(1e6)
batch_size = 256
t = torch.randn(batch_size, length, dtype=torch.float32).to('cuda:0')
```

torch.load()

```python
torch.save(t, 'torch.pth')
start = time.time()
u_start = time.process_time()
t = torch.load('torch.pth')
u_end = time.process_time()
end = time.time()
print(type(t))
print('cpu time: ', u_end - u_start)
print('time: ', end - start)
```

```
<class 'torch.Tensor'>
cpu time:  0.46875
time:  0.6859660148620605
```

numpy.load()

```python
np.save('np.npy', t.to('cpu'))
start = time.time()
u_start = time.process_time()
t = torch.from_numpy(np.load('np.npy'))
u_end = time.process_time()
end = time.time()
print(type(t))
print('cpu time: ', u_end - u_start)
print('time: ', end - start)
```

```
<class 'torch.Tensor'>
cpu time:  0.21875
time:  0.3618292808532715
```

Keep in mind that torch.load restores the tensor on its saved device (cuda:0 here), while the numpy route yields a CPU tensor, so the two timings are not strictly comparable.
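One detail worth knowing about the torch.from_numpy conversion used above: the resulting tensor shares memory with the ndarray rather than copying it, which is part of why this path is cheap. A small sketch:

```python
import numpy as np
import torch

a = np.zeros(3, dtype=np.float32)
t = torch.from_numpy(a)   # no copy: tensor and array share one buffer
a[0] = 1.0                # writing through the array...
print(t[0].item())        # ...is visible through the tensor (prints 1.0)
```

The flip side is that mutating the array after conversion silently changes the tensor, so copy explicitly (e.g. with `.clone()`) if the two must be independent.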

NumPy ndarray

```python
length = int(1e7)
a = np.array(range(length))
```

torch.load()

```python
torch.save(a, 'torch.pth')
start = time.time()
u_start = time.process_time()
a = torch.load('torch.pth')
u_end = time.process_time()
end = time.time()
print(type(a))
print('cpu time: ', u_end - u_start)
print('time: ', end - start)
```

```
<class 'numpy.ndarray'>
cpu time:  0.09375
time:  0.14090633392333984
```

numpy.load()

```python
np.save('np.npy', a)
start = time.time()
u_start = time.process_time()
a = np.load('np.npy')
u_end = time.process_time()
end = time.time()
print(type(a))
print('cpu time: ', u_end - u_start)
print('time: ', end - start)
```

```
<class 'numpy.ndarray'>
cpu time:  0.03125
time:  0.022988080978393555
```

Conclusion

In these runs, saving and loading large amounts of data through numpy is generally more efficient than through pytorch: for Python lists and ndarrays the gap is up to roughly an order of magnitude, while for the large CUDA tensor the save times are comparable and numpy's load is about twice as fast. When persisting large amounts of data, routing through numpy is worth considering as an optimization.
(Corrections are welcome if you spot any errors.)
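If the numpy route is adopted, the save and load steps can be packaged into a small helper pair. This is an illustrative sketch (save_via_numpy and load_via_numpy are my own names, not an established API):

```python
import numpy as np
import torch

def save_via_numpy(t: torch.Tensor, path: str) -> None:
    # Detach from autograd, move to CPU, convert to an ndarray,
    # then let numpy handle the disk write.
    np.save(path, t.detach().cpu().numpy())

def load_via_numpy(path: str, device: str = 'cpu') -> torch.Tensor:
    # np.load returns an ndarray; from_numpy wraps it without copying,
    # and .to(device) moves it where it is needed.
    return torch.from_numpy(np.load(path)).to(device)
```

For example, `save_via_numpy(t, 'np.npy')` followed by `load_via_numpy('np.npy', 'cuda:0')` round-trips a GPU tensor through the faster numpy path.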
