Multiprocessing vs Multithreading in Python: What you need to know.

cumi6497

Original article: https://www.freecodecamp.org/news/multiprocessing-vs-multithreading-in-python-what-you-need-to-know-ef6bdc13d018/

by Timber.io

What Is Threading? Why Might You Want It?

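By default, a Python program runs its statements one after another on a single thread. However, the threading module comes in handy when your program spends much of its time waiting and you want it to make progress on several tasks at once.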

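In CPython (the standard interpreter), threads cannot execute Python code in parallel on multiple cores because of the Global Interpreter Lock (GIL, discussed later), so threading won't speed up CPU-bound computation. It is well suited, however, to I/O operations such as web scraping, where the processor would otherwise sit idle waiting for data.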

Threading is game-changing, because many scripts related to network/data I/O spend the majority of their time waiting for data from a remote source.

Because the downloads may be independent of each other (for example, if you are scraping separate websites), the program can fetch from different data sources concurrently and combine the results at the end.

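Here is a minimal sketch of that idea, assuming the downloads are independent. It simulates five downloads with time.sleep. The names `fake_download` and `download_all` and the source labels are hypothetical; a real scraper would make network requests instead. Each thread writes into its own slot of a shared list, and the results are combined once every thread has been joined:

```python
import threading
import time

def fake_download(source, results, index):
    # Hypothetical stand-in for a network call: like waiting on a socket,
    # time.sleep releases the GIL so other threads can run meanwhile.
    time.sleep(0.2)
    results[index] = f"data from {source}"

def download_all(sources):
    # One slot per source, so threads never write to the same element.
    results = [None] * len(sources)
    threads = [
        threading.Thread(target=fake_download, args=(src, results, i))
        for i, src in enumerate(sources)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every download before combining
    return results

if __name__ == '__main__':
    sources = ["site-a", "site-b", "site-c", "site-d", "site-e"]
    start = time.perf_counter()
    combined = download_all(sources)
    elapsed = time.perf_counter() - start
    print(combined)
    print(f"took {elapsed:.2f}s")  # the waits overlap, so this should be near 0.2s, not 1.0s
```

Because the waits overlap, the total time should be close to that of a single download. A sequential loop over the same sources would take roughly five times as long.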

For CPU-intensive work, there is little benefit to using the threading module.

Threading is included in the standard library:

import threading
from queue import Queue
import time

Pass the function you want to run as target, pass its arguments as a tuple through args, and call start() to launch the thread.

def testThread(num):
    print(num)

if __name__ == '__main__':
    for i in range(5):
        # args must be a tuple, hence the trailing comma
        t = threading.Thread(target=testThread, args=(i,))
        t.start()

If you’ve never seen if __name__ == '__main__': before, it's basically a way to make sure the code that's nested inside it will only run if the script is run directly (not imported).

The Lock

You’ll often want your threads to be able to use or modify variables common between threads. To do this, you’ll have to use something known as a lock.

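A lock is not attached to a variable automatically. By convention, every thread acquires the same lock before touching the shared variable and releases it afterward. While one thread holds the lock, any other thread that tries to acquire it must wait until it is released.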

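Imagine two functions that both increment a shared variable by 1. An increment is really three steps (read the value, add 1, write it back), and without a lock two threads can interleave those steps so that one update is lost. The lock ensures that one function can read the variable, perform its calculation, and write the result back before another function can access the same variable.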

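Here is a minimal sketch of that two-function scenario, assuming a module-level counter. The names `counter`, `counter_lock`, `increment_many`, and `run` are illustrative and do not come from the original article. Every increment happens while holding the same lock, so no update is lost:

```python
import threading

counter = 0
counter_lock = threading.Lock()

def increment_many(times):
    global counter
    for _ in range(times):
        # counter += 1 is a read-add-write sequence, not an atomic step;
        # holding the lock means no other thread can run it at the same moment.
        with counter_lock:
            counter += 1

def run(num_threads=2, times=100_000):
    global counter
    counter = 0
    threads = [
        threading.Thread(target=increment_many, args=(times,))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

if __name__ == '__main__':
    print(run())  # 200000 every time, because each increment is protected
```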

You can use a print lock to ensure that only one thread prints at a time. This keeps output from different threads from interleaving into jumbled lines.

In the code below, we’ve got ten jobs that we want to get done and five worker threads that will take them from a shared queue:

print_lock = threading.Lock()

def threadTest(worker):
    # the print_lock is released automatically when the with block exits
    with print_lock:
        print(worker)

def threader():
    while True:
        # get the job from the front of the queue
        worker = q.get()
        threadTest(worker)
        q.task_done()

q = Queue()

for x in range(5):
    t = threading.Thread(target=threader)
    # this ensures the thread will die when the main thread dies
    # can set t.daemon to False if you want it to keep running
    t.daemon = True
    t.start()

for job in range(10):
    q.put(job)

# block until every job has been marked done with task_done()
q.join()

Multithreading is not always the perfect solution

I find that many guides tend to skip the negatives of using the tool they’ve just been trying to teach you. It’s important to understand that there are both pros and cons associated with using all these tools. For example:

Creating and managing threads has overhead, so threading isn't worth it for basic tasks (like the example above)
Threading increases the complexity of the program, which can make debugging more difficult

What is Multiprocessing? How is it different from threading?

Without multiprocessing, Python programs have trouble making full use of your system's CPU cores because of the GIL (Global Interpreter Lock), which lets only one thread execute Python bytecode at a time. Python was designed before multi-core personal computers were common (which shows you how old the language is).

The GIL exists because CPython's memory management (reference counting in particular) is not thread-safe, so the interpreter enforces a single global lock around access to Python objects. Though not perfect, it's a pretty effective mechanism for memory management. What can we do?

Multiprocessing allows you to create programs that run in parallel (sidestepping the GIL) and use all of your CPU cores. Though it is fundamentally different from the threading library, the syntax is quite similar. The multiprocessing library gives each process its own Python interpreter, each with its own GIL.

Because of this, the usual threading problems, such as data corruption from shared state, are far less common, although processes can still deadlock (for example, while waiting on each other through queues or locks). Since processes don't share memory by default, they can't modify the same memory concurrently.

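To see that processes don't share memory, here is a minimal sketch in which a child process increments a module-level global. The names `counter`, `increment`, and `run_child_and_read_parent` are illustrative. The child changes only its own copy, so the parent still sees 0:

```python
import multiprocessing

counter = 0  # module-level global

def increment():
    # Runs in the child process: this changes the child's copy of counter only.
    global counter
    counter += 1
    print(f"child sees counter = {counter}")

def run_child_and_read_parent():
    p = multiprocessing.Process(target=increment)
    p.start()
    p.join()
    return counter  # the parent's copy was never touched

if __name__ == '__main__':
    print(f"parent sees counter = {run_child_and_read_parent()}")
    # prints "child sees counter = 1", then "parent sees counter = 0"
```

To send data back to the parent, you would use an explicit channel such as multiprocessing.Queue or a Pool's return values.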

Let’s get started

import multiprocessing

def spawn():
    print('test!')

if __name__ == '__main__':
    for i in range(5):
        p = multiprocessing.Process(target=spawn)
        p.start()

If your processes share a resource such as a database, you may want to wait for the relevant processes to finish before starting new ones. Process.join() blocks until the process it is called on has exited:

for i in range(5):
    p = multiprocessing.Process(target=spawn)
    p.start()
    p.join()  # this line allows you to wait for processes
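
Calling join() inside the loop like this means each process must finish before the next one starts, so the processes effectively run one at a time. If you want them to run concurrently but still wait for all of them before moving on, a common pattern is to start every process first and join afterward. This is a minimal sketch; `work`, `run_all`, and the 0.2-second sleep are hypothetical stand-ins for real tasks:

```python
import multiprocessing
import time

def work(n):
    # Hypothetical stand-in for a slow task.
    time.sleep(0.2)
    print(f"finished {n}")

def run_all(count):
    processes = [multiprocessing.Process(target=work, args=(i,)) for i in range(count)]
    # Start every process first so they all run at the same time...
    for p in processes:
        p.start()
    # ...then wait for all of them before moving on.
    for p in processes:
        p.join()
    # An exit code of 0 means the process finished without an error.
    return [p.exitcode for p in processes]

if __name__ == '__main__':
    print(run_all(5))  # [0, 0, 0, 0, 0] when every process exits cleanly
```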

If you want to pass arguments to your process, you can do that with args:

import multiprocessing

def spawn(num):
    print(num)

if __name__ == '__main__':
    for i in range(25):
        ## right here
        p = multiprocessing.Process(target=spawn, args=(i,))
        p.start()

This makes a neat example: without p.join(), the numbers don't necessarily come out in the order you'd expect, because the operating system schedules the processes independently.

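If you need the results back in input order, one option is multiprocessing.Pool.map. This technique is swapped in here and is not part of the original walkthrough. It distributes the calls across worker processes but returns the results in the same order as the input. This is a minimal sketch; `square` and `squares_in_order` are illustrative names:

```python
import multiprocessing

def square(num):
    return num * num

def squares_in_order(numbers):
    # Pool.map spreads the calls over worker processes, yet returns the
    # results in input order regardless of which worker finishes first.
    with multiprocessing.Pool(processes=4) as pool:
        return pool.map(square, numbers)

if __name__ == '__main__':
    print(squares_in_order(range(10)))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```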

Drawbacks

As with threading, there are still drawbacks with multiprocessing … you’ve got to pick your poison:

Data passed between processes has to be serialized (pickled) and copied, which adds overhead
Each subprocess gets its own interpreter and memory space (copied on write when processes are forked, rebuilt from scratch when they are spawned), which can be a lot of overhead for larger programs
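
The serialization cost in the first point can be made concrete with the standard pickle module, which multiprocessing uses for objects sent through a Queue, Pipe, or Pool. This is a minimal sketch: `payload_size` is an illustrative helper, and its result only approximates what a real transfer involves:

```python
import pickle

def payload_size(obj):
    # Approximate number of bytes multiprocessing would have to serialize
    # and copy to send obj to another process.
    return len(pickle.dumps(obj))

if __name__ == '__main__':
    data = list(range(100_000))
    print(f"{payload_size(data)} bytes to send one list")
    # The receiving side gets an equal but separate copy, not shared memory.
    clone = pickle.loads(pickle.dumps(data))
    print(clone == data, clone is data)  # True False
```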

Conclusion

When should you use multithreading vs multiprocessing?

If your code has a lot of I/O or network usage, multithreading is usually your best bet because of its low overhead.
If you have a GUI, use multithreading so your UI thread doesn’t get locked up.
If your code is CPU-bound, you should use multiprocessing (if your machine has multiple cores).
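
As an illustration of these guidelines, here is a minimal sketch using the standard library's concurrent.futures module. It is a higher-level interface than the threading and multiprocessing APIs shown above and is swapped in here for brevity. The functions `io_task` and `cpu_task` are hypothetical stand-ins for real work:

```python
import concurrent.futures
import time

def io_task(n):
    # Hypothetical I/O-bound work: mostly waiting, which releases the GIL.
    time.sleep(0.1)
    return n

def cpu_task(n):
    # Hypothetical CPU-bound work: pure computation that holds the GIL.
    return sum(i * i for i in range(n))

def run_io(items):
    # Threads: cheap to create and fine while the work is waiting.
    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
        return list(executor.map(io_task, items))

def run_cpu(items):
    # Processes: each worker has its own interpreter and GIL.
    with concurrent.futures.ProcessPoolExecutor() as executor:
        return list(executor.map(cpu_task, items))

if __name__ == '__main__':
    print(run_io(range(8)))
    print(run_cpu([10, 100, 1000]))
```

Both executors share the same map() interface, so switching between threads and processes is a one-line change once you know which kind of work you have.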

Just a disclaimer: we’re a logging company here at Timber. We’d love it if you tried out our product (it’s seriously great!), but that’s all we’re going to say about it.

Originally published at timber.io.
