I have a multi-threaded application in Python whose threads read very large dicts (loaded from disk and never modified); the dicts are too large to copy into thread-local storage. The threads then process huge amounts of data, using the dicts purely as read-only lookup tables:
# single threaded
from sys import stdin, stdout

d1, d2, d3 = read_dictionaries()
for line in stdin:
    stdout.write(compute(line, d1, d2, d3) + line)
I am trying to speed this up by using threads, which would then each read its own input and write its own output, but since the dicts are huge, I want the threads to share the storage.
IIUC, every time a thread reads from the dict, it has to lock it, and that imposes a performance cost on the application. This data locking is not necessary because the dicts are read-only.
Does CPython actually lock the data individually or does it just use the GIL?
If, indeed, there is per-dict locking, is there a way to avoid it?
Solution
Multithreaded processing in Python will not help you here; it is better to use the multiprocessing module, because threading only gives a real speedup in a limited number of cases (mostly I/O-bound ones).
CPython implementation detail: In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.
(From the official documentation.)
Without any concrete code from your side, I can only recommend splitting your big dictionary into several parts, processing every part with Pool.map, and merging the results in the main process.
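A minimal sketch of that idea, with a hypothetical split_dict helper and a placeholder process_chunk standing in for your real computation:

# Minimal sketch: split one large dict into parts, process each part in a
# worker via Pool.map, and merge the results in the main process.
from multiprocessing import Pool

def split_dict(d, n_parts):
    # split d into roughly n_parts smaller dicts
    items = list(d.items())
    step = max(1, len(items) // n_parts)
    return [dict(items[i:i + step]) for i in range(0, len(items), step)]

def process_chunk(chunk):
    # placeholder computation over one part of the dictionary
    return {key: len(str(value)) for key, value in chunk.items()}

if __name__ == "__main__":
    big_dict = {i: "value-%d" % i for i in range(100000)}  # stand-in for your dict
    with Pool(processes=4) as pool:
        partial_results = pool.map(process_chunk, split_dict(big_dict, 4))
    merged = {}  # merge the partial results in the main process
    for part in partial_results:
        merged.update(part)
    print(len(merged))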
Unfortunately, it is hard to share a large amount of memory between Python processes efficiently (we are not talking about shared-memory patterns based on mmap here). But you can read different parts of your dictionary in different processes, or read the entire dictionary in the main process and hand small chunks to the child processes; a sketch of the latter follows.
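A sketch of the "read it all in the parent, hand out small chunks" idea, under the assumption that each input line only needs a small, predictable slice of the dictionary (here the stripped line itself is the key; in your code the needed keys would come from compute):

# Sketch: load the whole dict in the parent, then send each worker only the
# lines it must process plus the small slice of the dict those lines need.
from multiprocessing import Pool

def worker(task):
    lines, small_dict = task
    # placeholder: look every line up in the slice it was shipped with
    return [small_dict.get(line.strip(), "?") for line in lines]

def make_tasks(lines, big_dict, chunk_size=1000):
    for i in range(0, len(lines), chunk_size):
        chunk = lines[i:i + chunk_size]
        needed = {line.strip() for line in chunk}      # keys this chunk will use
        yield chunk, {k: big_dict[k] for k in needed if k in big_dict}

if __name__ == "__main__":
    big_dict = {"a": 1, "b": 2, "c": 3}                # stand-in for the real dict
    lines = ["a\n", "b\n", "x\n", "c\n"] * 500
    with Pool(processes=4) as pool:
        results = pool.map(worker, make_tasks(lines, big_dict))
    print(sum(len(r) for r in results))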
Also, I should warn you to be very careful with multiprocessing algorithms, because every extra megabyte is multiplied by the number of processes.
So, based on your pseudocode I can assume two possible algorithms, depending on what the compute function looks like:
# "Stateless"
for line in stdin:
res = compute_1(line) + compute_2(line) + compute_3(line)
print res, line
# "Shared" state
for line in stdin:
res = compute_1(line)
res = compute_2(line, res)
res = compute_3(line, res)
print res, line
In the first case you can create several workers, each based on the Process class and each reading just one of the dictionaries (a good way to keep the memory usage of every process down), and compute the result like a production line; see the sketch below.
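A sketch of such a production line, where load_d1/load_d2/load_d3 and the lookup standing in for compute_i are placeholders for your real loading and computation code:

# Production line for the "stateless" case: each stage is a Process that
# loads its own dictionary, adds the term it can compute, and forwards the
# partial result over a Queue to the next stage.
from multiprocessing import Process, Queue

SENTINEL = None  # marks the end of the input stream

def load_d1(): return {"a": 1}      # stand-in: read the 1st dict from disk
def load_d2(): return {"a": 10}     # stand-in: read the 2nd dict from disk
def load_d3(): return {"a": 100}    # stand-in: read the 3rd dict from disk

def stage(in_q, out_q, loader):
    d = loader()                    # each worker owns exactly one dictionary
    while True:
        item = in_q.get()
        if item is SENTINEL:
            out_q.put(SENTINEL)
            break
        line, partial = item
        out_q.put((line, partial + d.get(line.strip(), 0)))  # += compute_i(line)

if __name__ == "__main__":
    q0, q1, q2, q3 = Queue(), Queue(), Queue(), Queue()
    stages = [Process(target=stage, args=(q0, q1, load_d1)),
              Process(target=stage, args=(q1, q2, load_d2)),
              Process(target=stage, args=(q2, q3, load_d3))]
    for p in stages:
        p.start()
    for line in ["a\n", "b\n", "a\n"]:   # stand-in for reading stdin
        q0.put((line, 0))
    q0.put(SENTINEL)
    for line, res in iter(q3.get, SENTINEL):
        print(res, line, end="")
    for p in stages:
        p.join()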
In the second case you have shared state: each worker needs the result of the previous one. That is the worst case for multithreading/multiprocessing programming. But you can still write the algorithm so that the workers are connected by Queues and push each result on as soon as it is ready, without waiting for the whole cycle to finish; you just share the Queue instances between your processes, as in the sketch below.
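A sketch of that chained variant, with placeholder compute_1/compute_2 functions; each stage pushes its result into the next Queue immediately:

# "Shared" state over a pipeline: the same Queue skeleton, but every stage
# feeds the previous stage's result into its own computation.
from multiprocessing import Process, Queue

SENTINEL = None

def compute_1(line):        # placeholder for the real first step
    return len(line)

def compute_2(line, res):   # placeholder: needs the previous result
    return res * 2

def chained_stage(in_q, out_q, func, needs_previous):
    while True:
        item = in_q.get()
        if item is SENTINEL:
            out_q.put(SENTINEL)
            break
        line, res = item
        res = func(line, res) if needs_previous else func(line)
        out_q.put((line, res))   # push immediately, no waiting for the full cycle

if __name__ == "__main__":
    q0, q1, q2 = Queue(), Queue(), Queue()
    stages = [Process(target=chained_stage, args=(q0, q1, compute_1, False)),
              Process(target=chained_stage, args=(q1, q2, compute_2, True))]
    for p in stages:
        p.start()
    for line in ["one\n", "three\n"]:    # stand-in for stdin
        q0.put((line, None))
    q0.put(SENTINEL)
    for line, res in iter(q2.get, SENTINEL):
        print(res, line, end="")
    for p in stages:
        p.join()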