Lock-free / read-only list in Python?

I've done some basic performance and memory consumption benchmarks and I was wondering if there is any way to make things even faster...

I have a giant list of 70,000 elements, where each element is a tuple containing a numpy ndarray and the file path it was loaded from.

My first version passed a sliced-up copy of the list to each of the processes in Python's multiprocessing module, but that exploded RAM usage to over 20 gigabytes.

In the second version I moved the list into global scope and accessed it by index, e.g. foo[i], in a loop in each of my processes. This seems to put it into a shared memory area with copy-on-write semantics across the processes, so memory usage does not explode (it stays at ~3 gigabytes).
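The global-list pattern described above can be sketched like this (a minimal sketch with fake byte strings standing in for the ndarrays, and assuming the fork start method so children inherit the parent's memory copy-on-write):

```python
import multiprocessing

# Module-level global: workers read it by index after fork (CoW pages),
# so the list is never pickled and sent to the children.
sim = [(b"fake_ndarray", "/path/%d" % i) for i in range(10)]

def worker(indices):
    # Index into the inherited global instead of receiving a slice copy
    return [sim[i][1] for i in indices]

ctx = multiprocessing.get_context("fork")  # fork gives copy-on-write sharing
with ctx.Pool(2) as pool:
    results = pool.map(worker, [[0, 1], [2, 3]])
print(results)  # [['/path/0', '/path/1'], ['/path/2', '/path/3']]
```

Note that this relies on fork-based process creation (the default on Linux); with the spawn start method the global would be re-imported empty in each child.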

However, according to the performance benchmarks/tracing, it seems the large majority of the application's time is now spent in "acquire"...

So I was wondering if there is any way I can somehow turn this list into something lock-free/read-only, so that I can do away with part of the acquire step and speed up access even more.

Edit 1: Here's the top few line output of the profiling of the app

 ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     65 2450.903   37.706 2450.903   37.706  {built-in method acquire}
  39320    0.481    0.000    0.481    0.000  {method 'read' of 'file' objects}
    600    0.298    0.000    0.298    0.000  {posix.waitpid}
     48    0.271    0.006    0.271    0.006  {posix.fork}

Edit 2: Here's an example of the list structure:

# Sample code for a rough idea of how the list is constructed
import os

import numpy as np
from PIL import Image

sim = []
for root, dirs, files in os.walk(rootdir):
    for filename in files:
        path = os.path.join(root, filename)
        image = Image.open(path)
        np_array = np.asarray(image)
        sim.append((np_array, path))

# Roughly, an element would look something like this:
sim = [(np.array([[1, 2, 3], [4, 5, 6]], np.int32), "/foobar/com/what.something")]

From then on, the sim list is meant to be read-only.

Solution

The multiprocessing module provides exactly what you need: a shared array with optional locking, namely the multiprocessing.Array class. Pass lock=False to the constructor to disable locking.
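For example, a minimal sketch of a shared array with locking disabled:

```python
import multiprocessing

# With lock=False, multiprocessing.Array returns the raw shared ctypes
# array directly, so reads never call acquire() on a lock.
shared = multiprocessing.Array("i", [1, 2, 3, 4], lock=False)
print(shared[0])  # 1 -- readable from forked children without locking
```

Since there is no lock, this is only safe when the data is truly read-only after the worker processes start.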

Edit (taking into account your update): Things are actually considerably more involved than I initially expected. The data of all elements in your list needs to be created in shared memory. Whether you put the list itself (i.e. the pointers to the actual data) in shared memory does not matter too much, because it should be small compared to the data of all the files. To store the file data in shared memory, use

shared_data = multiprocessing.sharedctypes.RawArray("c", data)

where data is the data you read from the file. To use this as a NumPy array in one of the processes, use

numpy.frombuffer(shared_data, dtype="c")

which will create a NumPy array view for the shared data. Similarly, to put the path name into shared memory, use

shared_path = multiprocessing.sharedctypes.RawArray("c", path)

where path is an ordinary Python string. In your processes, you can access this as a Python string by using shared_path.raw. Now append (shared_data, shared_path) to your list. The list will get copied to the other processes, but the actual data won't.
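Putting the pieces above together, a sketch of sharing one array/path pair (using a small hypothetical array in place of real image data):

```python
import multiprocessing.sharedctypes

import numpy as np

# Hypothetical data standing in for one file's pixels
arr = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)
path = b"/foobar/com/what.something"

# Copy the raw bytes into shared memory; RawArray creates no lock at all
shared_data = multiprocessing.sharedctypes.RawArray("c", arr.tobytes())
shared_path = multiprocessing.sharedctypes.RawArray("c", path)

# In a worker process: view the shared bytes as a NumPy array again
# (the dtype and shape must be known/transmitted separately)
view = np.frombuffer(shared_data, dtype=np.int32).reshape(arr.shape)
print(view)             # same values as arr, backed by shared memory
print(shared_path.raw)  # b"/foobar/com/what.something"
```

Note that np.frombuffer creates a view, not a copy, so reads in the workers go straight to the shared pages; the original dtype and shape have to be stored alongside each entry, since the raw byte buffer does not carry them.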

I initially meant to use a multiprocessing.Array for the actual list. This would be perfectly possible and would ensure that the list itself (i.e. the pointers to the data) is also in shared memory. Now I think this is not that important at all, as long as the actual data is shared.
