Uploading large files to S3 with Python: How to efficiently upload small files to Amazon S3 in Python

Recently, I needed to implement a Python program that uploads files residing on Amazon EC2 to S3 as quickly as possible. Each file is about 30 KB.

I have tried several approaches: multithreading, multiprocessing, and coroutines. The following are my performance test results on Amazon EC2.

3600 files × 30 KB per file ≈ 105 MB in total:

**5.5 s [4 processes + 100 coroutines each]**

10 s [200 coroutines]

14 s [10 threads]
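To put the three timings on a common scale, the arithmetic can be written out as a small script. This is only a sketch of the calculation using the figures quoted above; the `throughput` helper is a hypothetical name introduced here. It reports both MB/s and files per second, since for objects this small the request rate is probably the more telling number.

```python
# Figures taken from the test results above.
NUM_FILES = 3600
FILE_SIZE_KB = 30

# 3600 * 30 KB = 108000 KB, i.e. roughly 105 MB.
total_mb = NUM_FILES * FILE_SIZE_KB / 1024

# Wall-clock time in seconds for each configuration that was tested.
timings_s = {
    "4 processes + 100 coroutines": 5.5,
    "200 coroutines": 10.0,
    "10 threads": 14.0,
}


def throughput(total_mb, seconds, num_files=NUM_FILES):
    """Return (MB per second, files per second) for one test run."""
    return total_mb / seconds, num_files / seconds


for name, seconds in timings_s.items():
    mb_per_s, files_per_s = throughput(total_mb, seconds)
    print(f"{name}: {mb_per_s:.1f} MB/s, {files_per_s:.0f} files/s")
```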

The code is shown below.

For multithreading

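    import os
    import threading

    # put(), connect_to_s3_sevice(), DATA_DIR and NTHREAD are defined
    # elsewhere in the program and are not shown here.

    def mput(i, client, files):
        # Thread i uploads only the files whose hash falls in its bucket.
        for f in files:
            if hash(f) % NTHREAD == i:
                put(client, os.path.join(DATA_DIR, f))

    def test_multithreading():
        # A single client is shared by all threads.
        client = connect_to_s3_sevice()
        files = os.listdir(DATA_DIR)
        ths = [threading.Thread(target=mput, args=(i, client, files))
               for i in range(NTHREAD)]
        for th in ths:
            th.daemon = True
            th.start()
        # Wait for every worker thread to finish.
        for th in ths:
            th.join()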
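As a possible variation on the hash-based split, the threads can instead pull work from a shared pool, so that a thread that finishes early picks up the next file rather than idling. The sketch below swaps in the standard library's `concurrent.futures.ThreadPoolExecutor`; it is not the code used in the tests above. `upload_all` and `fake_put` are hypothetical names, and `fake_put` merely stands in for the real `put(client, path)` so the example runs without S3 access.

```python
from concurrent.futures import ThreadPoolExecutor


def upload_all(put, paths, max_workers=10):
    """Call put(path) for every path on a shared pool of worker threads.

    Returns the list of paths whose call raised an exception.
    """
    failed = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Submit every path up front; idle workers take the next task.
        futures = {executor.submit(put, path): path for path in paths}
        for future, path in futures.items():
            # exception() blocks until that task has finished.
            if future.exception() is not None:
                failed.append(path)
    return failed


# Stand-in for the real upload function (an assumption for this demo):
# it records the path, or fails for one specially named file.
uploaded = []


def fake_put(path):
    if path.endswith("bad"):
        raise IOError("simulated upload failure")
    uploaded.append(path)


failed = upload_all(fake_put, ["a", "b", "bad", "c"], max_workers=4)
print(sorted(uploaded), failed)
```

With the real function, the call would look like `upload_all(functools.partial(put, client), paths)`, assuming the client can safely be shared between threads as it is in the code above.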

For coroutine

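    import functools
    import os
    import sys

    import eventlet

    # Green threads only overlap network I/O when blocking sockets are
    # patched (for example with eventlet.monkey_patch()); that setup is
    # not shown in this snippet.

    client = connect_to_s3_sevice()
    # The pool size (number of coroutines) comes from the command line.
    pool = eventlet.GreenPool(int(sys.argv[2]))
    xput = functools.partial(put, client)
    files = os.listdir(DATA_DIR)
    for f in files:
        pool.spawn_n(xput, os.path.join(DATA_DIR, f))
    # Block until every spawned upload has finished.
    pool.waitall()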

For multiprocessing

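    import functools
    import multiprocessing
    import os

    import eventlet

    def pproc(i):
        # Each process opens its own connection and runs 100 coroutines.
        client = connect_to_s3_sevice()
        files = os.listdir(DATA_DIR)
        pool = eventlet.GreenPool(100)
        xput = functools.partial(put, client)
        for f in files:
            # Process i uploads only the files in its hash bucket.
            if hash(f) % NPROCESS == i:
                pool.spawn_n(xput, os.path.join(DATA_DIR, f))
        pool.waitall()

    def test_multiproc():
        procs = [multiprocessing.Process(target=pproc, args=(i,))
                 for i in range(NPROCESS)]
        for p in procs:
            p.daemon = True
            p.start()
        # Wait for every worker process to finish.
        for p in procs:
            p.join()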
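One caveat about the `hash(f) % NPROCESS` split: in Python 3, string hashes are salted per interpreter unless `PYTHONHASHSEED` is fixed. Forked children inherit the parent's seed, so the split stays consistent, but processes started with the `spawn` method each get their own seed, and files could then be skipped or uploaded twice. The sketch below shows two splits that do not depend on `hash()`. The function names and sample file names are hypothetical, chosen only for illustration.

```python
import zlib


def partition_by_stride(files, nworkers, i):
    """Worker i takes every nworkers-th file from the sorted listing."""
    # Sorting makes the split independent of the order os.listdir returns.
    return sorted(files)[i::nworkers]


def partition_by_crc(files, nworkers, i):
    """Worker i takes the files whose CRC32 falls in bucket i.

    CRC32 depends only on the bytes of the name, so every process
    computes the same bucket for the same file.
    """
    return [f for f in files
            if zlib.crc32(f.encode("utf-8")) % nworkers == i]


# Hypothetical file names standing in for os.listdir(DATA_DIR).
files = [f"file_{n:04d}.dat" for n in range(3600)]
NPROCESS = 4

stride_parts = [partition_by_stride(files, NPROCESS, i) for i in range(NPROCESS)]
crc_parts = [partition_by_crc(files, NPROCESS, i) for i in range(NPROCESS)]

print([len(part) for part in stride_parts])
print([len(part) for part in crc_parts])
```

The stride split gives buckets of equal size, while the CRC split lets a worker decide on each file without seeing the whole listing.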

The machine runs Ubuntu 14.04 with 2 CPUs (2.50 GHz) and 4 GB of memory.

The highest throughput reached is about 19 MB/s (105 MB / 5.5 s). Overall, that is too slow. Is there any way to speed it up? Could Stackless Python do it faster?

Solution

Sample parallel upload times to Amazon S3 using the Python boto SDK are available here:

Rather than writing the code yourself, you might also consider calling out to the AWS Command Line Interface (CLI), which can do uploads in parallel. It is also written in Python and uses boto.
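Calling out to the CLI from Python might look roughly like the sketch below. It assumes the `aws` executable is installed and credentials are already configured. The directory, bucket, and prefix are placeholders, and the helper names are hypothetical. `aws s3 sync` copies new or changed files from a local directory to a bucket, and `default.s3.max_concurrent_requests` is the CLI setting that controls how many requests it issues in parallel.

```python
import subprocess


def build_sync_command(local_dir, bucket, prefix=""):
    """Build the argument list for: aws s3 sync <local_dir> s3://<bucket>/<prefix>"""
    target = f"s3://{bucket}/{prefix}" if prefix else f"s3://{bucket}"
    return ["aws", "s3", "sync", local_dir, target]


def build_concurrency_command(max_requests):
    """Build the command that sets the CLI's parallel request limit."""
    return ["aws", "configure", "set",
            "default.s3.max_concurrent_requests", str(max_requests)]


def sync_directory(local_dir, bucket, prefix="", max_requests=None):
    """Run the CLI; raises CalledProcessError if a command fails."""
    if max_requests is not None:
        subprocess.run(build_concurrency_command(max_requests), check=True)
    subprocess.run(build_sync_command(local_dir, bucket, prefix), check=True)


# Placeholder values; only prints the command instead of running it.
print(build_sync_command("/data/upload", "example-bucket", "incoming"))
```

Passing the arguments as a list avoids shell quoting problems with unusual file or directory names.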
