Python Multiprocessing

26 篇文章 1 订阅
24 篇文章 0 订阅

official documentation: 

multiprocessing — Process-based parallelism — Python 3.10.4 documentation

breakdown of the API:

Python multiprocessing - process-based parallelism in Python

Barebone Example:

#!/usr/bin/python

from multiprocessing import Process
import time

def fun():

    print('starting fun')
    time.sleep(2)
    print('finishing fun')

def main():

    p = Process(target=fun)
    p.start()
    p.join()


if __name__ == '__main__':

    print('starting main')
    main()
    print('finishing main')

Multiprocessing vs. Threading in Python: What you need to know.

What Is Threading? Why Might You Want It?

By nature, Python is a linear language, but the threading module comes in handy when you want a little more processing power. While threading in Python cannot be used for parallel CPU computation, it's perfect for I/O operations such as web scraping because the processor is sitting idle waiting for data.

Threading is game-changing because many scripts related to network/data I/O spend the majority of their time waiting for data from a remote source. Because downloads might not be linked (i.e., scraping separate websites), the processor can download from different data sources in parallel and combine the result at the end. For CPU intensive processes, there is little benefit to using the threading module.

What Is Multiprocessing? How Is It Different Than Threading?

Without multiprocessing, Python programs have trouble maxing out your system's specs because of the GIL (Global Interpreter Lock). Python wasn't designed considering that personal computers might have more than one core (shows you how old the language is), so the GIL is necessary because Python is not thread-safe and there is a globally enforced lock when accessing a Python object. Though not perfect, it's a pretty effective mechanism for memory management. What can we do?

Multiprocessing allows you to create programs that can run concurrently (bypassing the GIL) and use the entirety of your CPU core. Though it is fundamentally different from the threading library, the syntax is quite similar. The multiprocessing library gives each process its own Python interpreter and each their own GIL.

Because of this, the usual problems associated with threading (such as data corruption and deadlocks) are no longer an issue. Since the processes don't share memory, they can't modify the same memory concurrently.

Keynotes

Shared variable in python's multiprocessing - Stack Overflow

==> for a mutable object to be updated by all processes, it must be declared as a shared object by Value(), Array(), or their respective Manager. variant nad dict(), list().

When you use Value you get a ctypes object in shared memory that by default is synchronized using RLock. When you use Manager you get a SynManager object that controls a server process which allows object values to be manipulated by other processes. You can create multiple proxies using the same manager; there is no need to create a new manager in your loop:

manager = Manager()
for i in range(5):
    new_value = manager.Value('i', 0)

The Manager can be shared across computers, while Value is limited to one computer. Value will be faster (run the below code to see), so I think you should use that unless you need to support arbitrary objects or access them over a network.

import time
from multiprocessing import Process, Manager, Value


def foo(data, name=''):
    print(type(data), data.value, name)
    data.value += 1


if __name__ == "__main__":
    manager = Manager()
    x = manager.Value('i', 0)
    y = Value('i', 0)

    for i in range(5):
        Process(target=foo, args=(x, 'x')).start()
        Process(target=foo, args=(y, 'y')).start()

    print('Before waiting: ')
    print('x = {0}'.format(x.value))
    print('y = {0}'.format(y.value))

    time.sleep(5.0)
    print('After waiting: ')
    print('x = {0}'.format(x.value))
    print('y = {0}'.format(y.value))

To summarize:

  1. Use Manager to create multiple shared objects, including dicts and lists. Use Manager to share data across computers on a network.
  2. Use Value or Array when it is not necessary to share information across a network and the types in ctypes are sufficient for your needs.
  3. Value is faster than Manager.

Warning

By the way, sharing data across processes/threads should be avoided if possible. The code above will probably run as expected, but increase the time it takes to execute foo and things will get weird. Compare the above with:

def foo(data, name=''):
    print type(data), data.value, name
    for j in range(1000):
        data.value += 1

You'll need a Lock to make this work correctly.

I am not especially knowledgable about all of this, so maybe someone else will come along and offer more insight. I figured I would contribute an answer since the question was not getting attention. Hope that helps a little.

Multi-processing Communication by Queue

multithreading - How to use multiprocessing queue in Python? - Stack Overflow

Check Available Cores

from above, either share a counting variable, and lock/atomicize it properly or open up communication channels; but you can also rely on the os to handle the bookkeeping:

Python | os.sched_getaffinity() method - GeeksforGeeks

OS module in Python provides functions for interacting with the operating system. OS comes under Python’s standard utility modules. This module provides a portable way of using operating system dependent functionality.

os.sched_getaffinity() method in Python is used to get the set of CPUs on which the process with the specified process id is eligible to run.

Note: This method is only available on some UNIX platforms.

Syntax: os.sched_getaffinity(pid)

Parameter:
pid: The process id of the process whose CPU affinity mask is required. Process’s CPU affinity mask determines the set of CPUs on which it is eligible to run.
A pid of 0 represents the calling process.

Return Type: This method returns a set object which represents the CPUs, on which the process with the specified process id is eligible to run.

Tips and Caveats

Things I Wish They Told Me About Multiprocessing in Python

Tip #1

Don’t Share Resources, Pass Messages

Tip #2

Always clean up after yourself

Tip #3

Always handle TERM and INT signals

Tip #4

Don’t Wait Forever

Tip # 5

Report exceptions, and have a time based, shared logging system.

Python’s mutliprocessing module allows you to take advantage of the CPU power available on modern systems, but writing and maintaining robust multiprocessing apps requires avoiding certain patterns that can lead to unexpected difficulties, while also spending a fair amount of time and energy focusing on details that aren’t the primary focus of the application. For these reasons, deciding to use multiprocessing in an application is not something to take on lightly, but when you do, these tips will make your work go more smoothly, and will allow you to focus on your core problems.

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值