How taking advantage of parallelism in Python can make your software orders of magnitude faster.
I recently developed a project that I called Hydra: a multithreaded link checker written in Python. Unlike many Python site crawlers I found while researching, Hydra uses only standard libraries, with no external dependencies like BeautifulSoup. It’s intended to be run as part of a CI/CD process, so part of its success depended on being fast.
Multithreading in Python is a bit of a bitey subject (not sorry) in that the Python interpreter doesn’t actually let multiple threads execute Python code at the same time.
Python’s Global Interpreter Lock, or GIL, prevents multiple threads from executing Python bytecodes at once. Each thread that wants to execute must first wait for the GIL to be released by the currently executing thread. The GIL is pretty much the microphone in a low-budget conference panel, except where no one gets to shout.
This has the advantage of preventing race conditions. It does, however, lack the performance advantages afforded by running multiple tasks in parallel. (If you’d like a refresher on concurrency, parallelism, and multithreading, see Concurrency, parallelism, and the many threads of Santa Claus.)
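To see the GIL in action, here’s a minimal sketch (not from Hydra) that times a purely CPU-bound countdown run sequentially versus on two threads. Because only one thread can execute Python bytecode at a time, the threaded version typically finishes no faster than the sequential one:

```python
import threading
import time

def count(n):
    # A purely CPU-bound task: no I/O, so threads must take turns holding the GIL
    while n > 0:
        n -= 1

N = 10_000_000

# Sequential: one thread counts down twice
start = time.perf_counter()
count(N)
count(N)
sequential = time.perf_counter() - start

# "Parallel": two threads, but the GIL lets only one run bytecode at a time
start = time.perf_counter()
t1 = threading.Thread(target=count, args=(N,))
t2 = threading.Thread(target=count, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```

On my understanding of CPython’s behavior, the exact timings vary by machine; the point is that adding threads to CPU-bound work buys you little or nothing.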
While I prefer Go for its convenient first-class primitives that support concurrency (see Goroutines), this project’s recipients were more comfortable with Python. I took it as an opportunity to test and explore!
Simultaneously performing multiple tasks in Python isn’t impossible; it just takes a little extra work. For Hydra, the main advantage is in overcoming the input/output (I/O) bottleneck.
In order to get web pages to check, Hydra needs to go out to the Internet and fetch them. When compared to tasks that are performed by the CPU alone, going out over the network is comparatively slower. How slow?
Here are approximate timings for tasks performed on a typical PC:
| | Task | Time |
|---|---|---|
| CPU | execute typical instruction | 1/1,000,000,000 sec = 1 nanosec |
| CPU | fetch from L1 cache memory | 0.5 nanosec |
| CPU | branch misprediction | 5 nanosec |
| CPU | fetch from L2 cache memory | 7 nanosec |
| RAM | Mutex lock/unlock | 25 nanosec |
| RAM | fetch from main memory | 100 nanosec |
| Network | send 2K bytes over 1Gbps network | 20,000 nanosec |
| RAM | read 1MB sequentially from memory | 250,000 nanosec |
| Disk | fetch from new disk location (seek) | 8,000,000 nanosec (8ms) |
| Disk | read 1MB sequentially from disk | 20,000,000 nanosec (20ms) |
| Network | send packet US to Europe and back | 150,000,000 nanosec (150ms) |
Peter Norvig first published these numbers some years ago in Teach Yourself Programming in Ten Years. Since computers and their components change year over year, the exact numbers shown above aren’t the point. What these numbers help to illustrate is the difference, in orders of magnitude, between operations.
Compare the difference between fetching from main memory and sending a simple packet over the Internet. While both these operations occur in less than the blink of an eye (literally) from a human perspective, you can see that sending a simple packet over the Internet is over a million times slower than fetching from RAM. It’s a difference that, in a single-thread program, can quickly accumulate to form troublesome bottlenecks.
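To make the “over a million times slower” claim concrete, a quick check with the numbers from the table above:

```python
# Approximate latencies from the table above, in nanoseconds
ram_fetch_ns = 100
us_europe_roundtrip_ns = 150_000_000

# How many RAM fetches fit in one transatlantic round trip?
print(us_europe_roundtrip_ns // ram_fetch_ns)  # 1500000
```

That’s 1.5 million RAM fetches in the time it takes one packet to cross the Atlantic and back.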
In Hydra, the task of parsing response data and assembling results into a report is relatively fast, since it all happens on the CPU. The slowest portion of the program’s execution, by over six orders of magnitude, is network latency. Not only does Hydra need to fetch packets, but whole web pages!
One way of improving Hydra’s performance is to find a way for the page fetching tasks to execute without blocking the main thread.
Python has a couple of options for doing tasks in parallel: multiple processes, or multiple threads. These methods let you work around the GIL and speed up execution in a couple of different ways.
Multiple processes
To execute parallel tasks using multiple processes, you can use Python’s `ProcessPoolExecutor`. A concrete subclass of `Executor` from the `concurrent.futures` module, `ProcessPoolExecutor` uses a pool of processes spawned with the `multiprocessing` module to avoid the GIL.
This option uses a pool of worker subprocesses, whose number defaults to the number of processors on the machine. The `multiprocessing` module lets you parallelize function execution across processes, which can really speed up compute-bound (or CPU-bound) tasks.
Since the main bottleneck for Hydra is I/O and not the processing to be done by the CPU, I’m better served by using multiple threads.
Multiple threads
Fittingly named, Python’s `ThreadPoolExecutor` uses a pool of threads to execute asynchronous tasks. Also a subclass of `Executor`, it uses a defined maximum number of worker threads (at least five by default, according to the formula `min(32, os.cpu_count() + 4)`) and reuses idle threads before starting new ones, making it pretty efficient.
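Before looking at Hydra itself, here’s a minimal, self-contained sketch of the pattern, with the network latency simulated by `time.sleep` rather than real HTTP requests: submit the I/O-bound jobs to a `ThreadPoolExecutor` so their waits overlap instead of running one after another.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch(url):
    # Simulate network latency; a real checker would issue an HTTP request here
    time.sleep(0.2)
    return f"{url}: OK"

urls = [f"https://example.com/page{i}" for i in range(10)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as pool:
    futures = [pool.submit(fetch, url) for url in urls]
    results = [f.result() for f in as_completed(futures)]
elapsed = time.perf_counter() - start

# Ten 0.2-second waits overlap, so this finishes in roughly 0.2s rather than 2s
print(f"checked {len(results)} pages in {elapsed:.2f}s")
```

While a thread sleeps (or waits on a socket), it releases the GIL, which is exactly why threads help with I/O-bound work even though they don’t help CPU-bound work.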
Here is a snippet of Hydra with comments showing how Hydra uses `ThreadPoolExecutor` to achieve parallel multithreaded bliss:
```python
# Create the Checker class
class Checker:
    # Queue of links to be checked
    TO_PROCESS = Queue()
    # Maximum workers to run
    THREADS = 100
    # Maximum seconds to wait for HTTP response
    TIMEOUT = 60

    def __init__(self, url):
        ...
        # Create the thread pool
        self.pool = futures.ThreadPoolExecutor(max_workers=self.THREADS)

    def run(self):
        # Run until the TO_PROCESS queue is empty
        while True:
            try:
                target_url = self.TO_PROCESS.get(block=True, timeout=2)
                # If we haven't already checked this link
                if target_url["url"] not in self.visited:
                    # Mark it as visited
                    self.visited.add(target_url["url"])
                    # Submit the link to the pool
                    job = self.pool.submit(self.load_url, target_url, self.TIMEOUT)
                    job.add_done_callback(self.handle_future)
            except Empty:
                return
            except Exception as e:
                print(e)
```
You can view the full code in Hydra’s GitHub repository.
Single thread to multithread
If you’d like to see the full effect, I compared the run times for checking my website between a prototype single-thread program, and the multiheaded - I mean multithreaded - Hydra.
```
time python3 slow-link-check.py https://victoria.dev

real    17m34.084s
user    11m40.761s
sys     0m5.436s

time python3 hydra.py https://victoria.dev

real    0m15.729s
user    0m11.071s
sys     0m2.526s
```
The single-thread program, which blocks on I/O, ran in about seventeen minutes. When I first ran the multithreaded version, it finished in 1m13.358s - after some profiling and tuning, it took a little under sixteen seconds.
Again, the exact times don’t mean all that much; they’ll vary depending on factors such as the size of the site being crawled, your network speed, and your program’s balance between the overhead of thread management and the benefits of parallelism.
The more important thing, and the result I’ll take any day, is a program that runs some orders of magnitude faster.
Translated from: https://www.freecodecamp.org/news/multithreaded-python/