Python: the size of strings in memory

Latest recommended article published on 2024-04-02 11:40:19

有人叫我黑花

Article tags: comparing string sizes in Python

Consider the following code:

arr = []

for (str, id, flag) in some_data:

    arr.append((str, id, flag))

Imagine the input strings are 2 characters long on average and 5 characters at most, and some_data has 1 million elements.

What will the memory requirement of such a structure be?

Could it be that a lot of memory is wasted on the strings? If so, how can I avoid that?

Solution

In this case, because the strings are quite short and there are so many of them, you stand to save a fair bit of memory by calling intern on the strings (in Python 3 the function is sys.intern). Assuming the strings contain only lowercase letters, there are just 26 * 26 = 676 possible two-letter strings, so a list of a million entries must contain a lot of repetitions. intern ensures that those repetitions don't each become a separate object; instead, equal strings all refer to the same underlying object.

It's possible that Python already interns short strings; but looking at a number of different sources, it seems this is highly implementation-dependent. So calling intern in this case is probably the way to go; YMMV.
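The idea can be sketched roughly as follows for Python 3, where the function is available as sys.intern. The names build_rows and sample are made up for illustration and are not part of the original question; the sample strings are built at runtime so that they start out as separate objects.

```python
import sys

def build_rows(some_data):
    """Collect (string, id, flag) tuples, interning each string first."""
    rows = []
    for s, ident, flag in some_data:
        # sys.intern returns the one shared copy of any equal string
        rows.append((sys.intern(s), ident, flag))
    return rows

# Hypothetical sample data: 1000 equal two-character strings created at runtime
sample = [("".join(["a", "b"]), i, i % 2 == 0) for i in range(1000)]

rows = build_rows(sample)

# Count how many distinct string objects the tuples reference
distinct = len({id(row[0]) for row in rows})
print(distinct)
```

After interning, every tuple in rows points at the same 'ab' object, so only one copy of the string needs to stay in memory once the uninterned originals are released.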

As an elaboration on why this is very likely to save memory, consider the following:

>>> import sys

>>> sys.getsizeof('')

40

>>> sys.getsizeof('a')

41

>>> sys.getsizeof('ab')

42

>>> sys.getsizeof('abc')

43

Each additional character adds only one byte to the size of the string, but every string takes up 40 bytes on its own, so for very short strings the fixed overhead dominates. The exact figures vary with the Python version and build.
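To get a feel for the effect on a given interpreter, one rough approach is to sum sys.getsizeof over the distinct string objects a list of tuples refers to. This is only a sketch: distinct_string_bytes, plain and interned are illustrative names, and the measurement ignores the tuples and the list itself.

```python
import sys

def distinct_string_bytes(rows):
    """Sum sys.getsizeof over each distinct string object referenced by rows."""
    seen = set()
    total = 0
    for s, _ident, _flag in rows:
        if id(s) not in seen:  # count each object only once
            seen.add(id(s))
            total += sys.getsizeof(s)
    return total

# Hypothetical data: equal strings built at runtime, then an interned variant
plain = [("".join(["a", "b"]), i, True) for i in range(1000)]
interned = [(sys.intern(s), i, flag) for s, i, flag in plain]

print(distinct_string_bytes(plain), distinct_string_bytes(interned))
```

The absolute numbers depend on the interpreter, but the interned list should never report more string bytes than the plain one, since equal strings collapse into a single object.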