I am using 32-bit Python 3.4 on Windows 7.
I found that an integer in a NumPy array takes 4 bytes, but in a list it takes 10 bytes.
import sys
import numpy as np

s = 10
lt = [None] * s
cnt = 0
# Fill the list one slot at a time
for i in range(0, s):
    lt[cnt] = i
    cnt += 1
lt = [x for x in lt if x is not None]
a = np.array(lt)
# Report the array size, the list size, and the size of the first list element
print("len(a) is " + str(len(a)) + " size is " + str(sys.getsizeof(a))
      + " bytes " + " a.itemsize is " + str(a.itemsize) + " total size is "
      + str(a.itemsize * len(a)) + " Bytes , len(lt) is "
      + str(len(lt)) + " size is " + str(sys.getsizeof(lt))
      + " Bytes the first element has " + str(sys.getsizeof(lt[0])) + " Bytes")
len(a) is 10 size is 40 bytes a.itemsize is 4 total size is 40 Bytes , len(lt) is 10 size is 100 Bytes the first element has 12 Bytes
Is this because each element in a list has to keep a pointer to the next element?
If I assign a string to the list:
lt[cnt] = "A";
len(a) is 10 size is 40 bytes a.itemsize is 4 total size is 40 Bytes , len(lt) is 10 size is 100 Bytes the first element has 30 Bytes
So in the array each element takes 4 bytes, and in the list the first element takes 30 bytes.
But if I try:
lt[cnt] = "AB";
len(a) is 10 size is 40 bytes a.itemsize is 8 total size is 80 Bytes , len(lt) is 10 size is 100 Bytes the first element has 33 Bytes
In the array each element takes 8 bytes, but in the list the first element takes 33 bytes.
If I try:
lt[cnt] = "csedvserb revrvrrw gvrgrwgervwe grujy oliulfv qdqdqafwg5u u56i78k8 awdwfw"; # 73 characters long
len(a) is 10 size is 40 bytes a.itemsize is 292 total size is 2920 Bytes , len(lt) is 10 size is 100 Bytes the first element has 246 Bytes
In the array each element takes 292 bytes (= 73 * 4), but in the list the first element takes only 246 bytes?
Any explanation would be appreciated.
Solution
The element size in arrays is easy: it is determined by the dtype and, as your code shows, can be found with .itemsize. 4 bytes is common, for example for np.int32 and np.float32 (np.float64 takes 8 bytes). Unicode strings are also allocated 4 bytes per character, even though encodings such as UTF-8 use a variable number of bytes per character.
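As a minimal sketch of how the dtype fixes the element size, the snippet below checks a few numeric dtypes and two string arrays. The 73-character string is a hypothetical stand-in for the one in the question, and the variable names are chosen only for illustration.

```python
import numpy as np

# Fixed-width numeric dtypes: itemsize is set by the dtype alone.
int32_size = np.dtype(np.int32).itemsize      # 4 bytes
float32_size = np.dtype(np.float32).itemsize  # 4 bytes
float64_size = np.dtype(np.float64).itemsize  # 8 bytes

# Unicode string arrays reserve 4 bytes per character,
# sized for the longest string in the array.
short_arr = np.array(["AB"] * 10)
long_text = "x" * 73  # hypothetical 73-character string
long_arr = np.array([long_text] * 10)

print(int32_size, float32_size, float64_size)
print(short_arr.dtype.kind, short_arr.itemsize)  # kind 'U', 2 * 4 bytes
print(long_arr.itemsize, long_arr.nbytes)        # 73 * 4 bytes per item, times 10 items
```

Note that nbytes counts only the data buffer, so it equals itemsize times the number of elements.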
The per-element size for lists (and tuples) is trickier. A list does not contain the elements directly; rather, it contains pointers to objects that are stored elsewhere. The size reported for your list covers those pointers plus some padding, not the objects they point to. The padding lets the list grow (with .append) efficiently. That is why all your lists have the same size, regardless of the 'first element' size.
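The point about pointers can be checked with a rough sketch like the one below. total_list_size is a hypothetical helper written for this illustration, not a standard function, and the padding behaviour it hints at is a CPython implementation detail that may differ between versions.

```python
import sys

def total_list_size(lst):
    """Rough total: the list's own pointer storage plus each distinct element."""
    seen = set()
    total = sys.getsizeof(lst)
    for item in lst:
        if id(item) not in seen:  # count an object shared by several slots only once
            seen.add(id(item))
            total += sys.getsizeof(item)
    return total

# Three lists built the same way, holding elements of very different sizes.
ints = [0] * 10
short_strs = ["A"] * 10
long_strs = ["ABCDEF" * 10] * 10

# The container size depends only on the number of pointers it holds.
print(sys.getsizeof(ints), sys.getsizeof(short_strs), sys.getsizeof(long_strs))

# Adding the elements themselves gives a larger, element-dependent total.
print(total_list_size(short_strs), total_list_size(long_strs))

# A list grown with append usually carries spare capacity (the padding).
grown = []
for i in range(10):
    grown.append(i)
print(sys.getsizeof(grown), sys.getsizeof(ints))
```

The helper counts each distinct object once, which matters here because every slot in short_strs refers to the same string object.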
My data:
In [2324]: lt=[None]*10
In [2325]: sys.getsizeof(lt)
Out[2325]: 72
In [2326]: lt=[i for i in range(10)]
In [2327]: sys.getsizeof(lt)
Out[2327]: 96
In [2328]: lt=['A' for i in range(10)]
In [2329]: sys.getsizeof(lt)
Out[2329]: 96
In [2330]: lt=['AB' for i in range(10)]
In [2331]: sys.getsizeof(lt)
Out[2331]: 96
In [2332]: lt=['ABCDEF' for i in range(10)]
In [2333]: sys.getsizeof(lt)
Out[2333]: 96
In [2334]: lt=[None for i in range(10)]
In [2335]: sys.getsizeof(lt)
Out[2335]: 96
and for the corresponding arrays:
In [2344]: lt=[None]*10; a=np.array(lt)
In [2345]: a
Out[2345]: array([None, None, None, None, None, None, None, None, None, None], dtype=object)
In [2346]: a.itemsize
Out[2346]: 4
In [2347]: lt=['AB' for i in range(10)]; a=np.array(lt)
In [2348]: a
Out[2348]:
array(['AB', 'AB', 'AB', 'AB', 'AB', 'AB', 'AB', 'AB', 'AB', 'AB'],
      dtype='<U2')
In [2349]: a.itemsize
Out[2349]: 8
When the list contains None, the array has object dtype, and its elements are all pointers (4 bytes each on a 32-bit build such as this one).
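A quick way to confirm this on any machine is to compare the itemsize of an object array with the platform's pointer size. This is a small sketch; struct.calcsize("P") is used here as one way to get the size of a C pointer.

```python
import struct
import numpy as np

# Size of a C pointer on this build: 4 on 32-bit Python, 8 on 64-bit Python.
pointer_size = struct.calcsize("P")

# A list of None cannot be mapped to a numeric or string dtype,
# so NumPy falls back to object dtype and stores pointers.
obj_arr = np.array([None] * 10)

print(obj_arr.dtype, obj_arr.itemsize, pointer_size)
```

On a 64-bit build the same array reports an itemsize of 8 instead of 4.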