python bytes和bytearray_Python bytearray与字节列表

I'm trying to find out the most efficient way to create a long byte string (or bytearray) by concatenating multiple shorter strings when the length of the whole string is known beforehand. I made this script and came up with these results:

import time

MSG = b'test message'

COUNT = 30000

def bytes_list_test():

tStart = time.clock()

l = []

for i in range(COUNT):

l.append(MSG)

bs = b''.join(l)

print('byte list time:', time.clock() - tStart)

def bytearray_test():

tStart = time.clock()

ba = bytearray()

for i in range(COUNT):

for c in MSG:

ba.append(c)

print('array time:', time.clock() - tStart)

def initialized_bytearray_test():

tStart = time.clock()

ba = bytearray([0x00]*len(MSG)*COUNT)

for i in range(COUNT):

ba[i*len(MSG):i*len(MSG)+len(MSG)] = MSG

print('initialized array time:', time.clock() - tStart)

bytes_list_test()

bytearray_test()

initialized_bytearray_test()

Results:

byte list time: 0.0076534920117410365

array time: 0.08107178658246994

initialized array time: 0.08843219671325642

A few questions:

1) Is creating a list of bytes and using the join() method the way to go as implied by the results?

2) Why is using a list of bytes so much faster than using a bytearray which seems like it's designed for this type of thing?

3) You would think the initialized array would be faster than the uninitialized array because the initialized array doesn't have to be resized (Note that it did occasionally perform better, but not by much and inconsistently so). Is it not faster because of the slicing operation?

解决方案

The first function creates a list of pointers to the same object (NOT a list of bytes), then join would do one memory allocation and COUNT calls to memcpy.

You can make the first function times faster (5x in my test) by dropping temporary list and using itertools.repeat:

def bytes_list_test_opt():

tStart = time.clock()

bs = b''.join(itertools.repeat(MSG, COUNT))

print('byte list opt time:', time.clock() - tStart)

or, in this particular case, simply use * operator of bytes objects, which does exactly that:

bs = MSG*COUNT

The second function repeatedly iterates over MSG, stores data byte-by-byte and has to repeatedly reallocate memory as the bytearray grows.

You can make the second function almost as fast as the original (unoptimized) first one by replacing iteration with single call to extend:

def bytearray_test_opt():

tStart = time.clock()

ba = bytearray()

for i in range(COUNT):

ba.extend(MSG)

print('array opt time:', time.clock() - tStart)

After this modification, the second function will be slower than the first one only because of additional reallocations (~15% in my test).

The third function uses bytearray's slice assignment, which accepts iterable and seems to be doing the same byte-by-byte iteration without recognizing that they could just memcpy bytes into the place. This looks like a defect in the standard library that can be fixed.

As you see from the previous optimization, allocations take very small amount time compared to byte-by-byte copying, so preallocating has no visible impact here. You could save some time on doing fewer calculations, but it wont help much either:

def initialized_bytearray_test_opt():

tStart = time.clock()

L = len(MSG)

ba = bytearray(L*COUNT)

ofs = 0

for i in range(COUNT):

ba[ofs : ofs+L] = MSG

ofs += L

print('initialized array opt time:', time.clock() - tStart)

The final timings from my machine:

byte list time: 0.004823000000000001

byte list opt time: 0.0008649999999999977

array time: 0.043324

array opt time: 0.005505999999999997

initialized array time: 0.05936899999999999

initialized array opt time: 0.040164000000000005

P.S. Use timeit module to perform measures like this, it provides much higher accuracy.

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值