Multithreaded Python: Slithering Through an I/O Bottleneck

How taking advantage of parallelism in Python can make your software orders of magnitude faster.

I recently developed a project that I called Hydra: a multithreaded link checker written in Python. Unlike many Python site crawlers I found while researching, Hydra uses only standard libraries, with no external dependencies like BeautifulSoup. It’s intended to be run as part of a CI/CD process, so part of its success depended on being fast.

Multithreading in Python is a bit of a bitey subject (not sorry), in that the Python interpreter doesn’t actually let multiple threads execute Python code at the same time.

Python’s Global Interpreter Lock, or GIL, prevents multiple threads from executing Python bytecodes at once. Each thread that wants to execute must first wait for the GIL to be released by the currently executing thread. The GIL is pretty much the microphone in a low-budget conference panel, except where no one gets to shout.

This has the advantage of preventing race conditions. It does, however, lack the performance advantages afforded by running multiple tasks in parallel. (If you’d like a refresher on concurrency, parallelism, and multithreading, see Concurrency, parallelism, and the many threads of Santa Claus.)

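To see what this means in practice, here’s a minimal sketch (an illustration of the GIL’s effect, not code from Hydra) that runs a purely CPU-bound countdown twice in a row and then on two threads. Because only one thread can execute Python bytecode at a time, the threaded version takes roughly as long as the sequential one:

# Illustrative sketch: CPU-bound work doesn't get faster with threads
import time
from concurrent.futures import ThreadPoolExecutor


def count_down(n):
    # Pure CPU work: the thread needs the GIL for every bytecode it runs
    while n > 0:
        n -= 1


N = 10_000_000

start = time.perf_counter()
count_down(N)
count_down(N)
print(f"sequential:  {time.perf_counter() - start:.2f}s")

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=2) as pool:
    pool.submit(count_down, N)
    pool.submit(count_down, N)
print(f"two threads: {time.perf_counter() - start:.2f}s  (about the same, or worse)")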

While I prefer Go for its convenient first-class primitives that support concurrency (see Goroutines), this project’s recipients were more comfortable with Python. I took it as an opportunity to test and explore!

Simultaneously performing multiple tasks in Python isn’t impossible; it just takes a little extra work. For Hydra, the main advantage is in overcoming the input/output (I/O) bottleneck.

In order to get web pages to check, Hydra needs to go out to the Internet and fetch them. Compared to tasks performed by the CPU alone, going out over the network is slow. How slow?

Here are approximate timings for tasks performed on a typical PC:

Task | Time
CPU: execute typical instruction | 1/1,000,000,000 sec = 1 nanosec
CPU: fetch from L1 cache memory | 0.5 nanosec
CPU: branch misprediction | 5 nanosec
CPU: fetch from L2 cache memory | 7 nanosec
RAM: mutex lock/unlock | 25 nanosec
RAM: fetch from main memory | 100 nanosec
Network: send 2K bytes over 1Gbps network | 20,000 nanosec
RAM: read 1MB sequentially from memory | 250,000 nanosec
Disk: fetch from new disk location (seek) | 8,000,000 nanosec (8ms)
Disk: read 1MB sequentially from disk | 20,000,000 nanosec (20ms)
Network: send packet US to Europe and back | 150,000,000 nanosec (150ms)

Peter Norvig first published these numbers some years ago in Teach Yourself Programming in Ten Years. Since computers and their components change year over year, the exact numbers shown above aren’t the point. What these numbers help to illustrate is the difference, in orders of magnitude, between operations.

Compare the difference between fetching from main memory and sending a simple packet over the Internet. While both these operations occur in less than the blink of an eye (literally) from a human perspective, you can see that sending a simple packet over the Internet is over a million times slower than fetching from RAM. It’s a difference that, in a single-thread program, can quickly accumulate to form troublesome bottlenecks.

In Hydra, the task of parsing response data and assembling results into a report is relatively fast, since it all happens on the CPU. The slowest portion of the program’s execution, by over six orders of magnitude, is network latency. Not only does Hydra need to fetch packets, but whole web pages!

One way of improving Hydra’s performance is to find a way for the page-fetching tasks to execute without blocking the main thread.

Python has a couple of options for doing tasks in parallel: multiple processes or multiple threads. These methods allow you to circumvent the GIL and speed up execution in a couple of different ways.

Multiple processes

To execute parallel tasks using multiple processes, you can use Python’s ProcessPoolExecutor. A concrete subclass of Executor from the concurrent.futures module, ProcessPoolExecutor uses a pool of processes spawned with the multiprocessing module to avoid the GIL.

This option uses a pool of worker subprocesses whose number defaults to the number of processors on the machine. The multiprocessing module lets you parallelize function execution across processes, which can really speed up compute-bound (or CPU-bound) tasks.

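As a rough sketch of that pattern (a generic example with a made-up count_primes function, not anything from Hydra), ProcessPoolExecutor can fan a compute-bound function out across every processor on the machine:

# Illustrative sketch: ProcessPoolExecutor for a compute-bound task
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit):
    # Deliberately naive, CPU-heavy work
    return sum(
        n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))
        for n in range(limit)
    )


if __name__ == "__main__":
    # max_workers defaults to the number of processors on the machine
    with ProcessPoolExecutor() as pool:
        results = pool.map(count_primes, [100_000] * 4)
    print(sum(results))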

Since the main bottleneck for Hydra is I/O and not the processing to be done by the CPU, I’m better served by using multiple threads.

Multiple threads

Fittingly named, Python’s ThreadPoolExecutor uses a pool of threads to execute asynchronous tasks. Also a subclass of Executor, it uses a defined maximum number of worker threads (at least five by default, according to the formula min(32, os.cpu_count() + 4)) and reuses idle threads before starting new ones, making it pretty efficient.

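The basic pattern looks something like this generic sketch (standard library only, with a few example URLs; this isn’t Hydra’s code): submit a job per URL, then collect results as they complete. While one thread is blocked waiting on the network, the GIL is released so another thread can run:

# Illustrative sketch: ThreadPoolExecutor for I/O-bound work
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import urlopen


def fetch_status(url, timeout=10):
    # The GIL is released while this thread waits on the network
    with urlopen(url, timeout=timeout) as response:
        return response.status


urls = ["https://victoria.dev", "https://www.python.org", "https://example.com"]

with ThreadPoolExecutor(max_workers=8) as pool:
    jobs = {pool.submit(fetch_status, url): url for url in urls}
    for job in as_completed(jobs):
        print(jobs[job], job.result())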

Here is a snippet of Hydra with comments showing how Hydra uses ThreadPoolExecutor to achieve parallel multithreaded bliss:

# Standard-library imports used by this snippet
from concurrent import futures
from queue import Queue, Empty


# Create the Checker class
class Checker:
    # Queue of links to be checked
    TO_PROCESS = Queue()
    # Maximum workers to run
    THREADS = 100
    # Maximum seconds to wait for HTTP response
    TIMEOUT = 60

    def __init__(self, url):
        ...
        # Create the thread pool
        self.pool = futures.ThreadPoolExecutor(max_workers=self.THREADS)

    def run(self):
        # Run until the TO_PROCESS queue is empty
        while True:
            try:
                target_url = self.TO_PROCESS.get(block=True, timeout=2)
                # If we haven't already checked this link
                if target_url["url"] not in self.visited:
                    # Mark it as visited
                    self.visited.add(target_url["url"])
                    # Submit the link to the pool
                    job = self.pool.submit(self.load_url, target_url, self.TIMEOUT)
                    job.add_done_callback(self.handle_future)
            except Empty:
                return
            except Exception as e:
                print(e)

You can view the full code in Hydra’s GitHub repository.

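The add_done_callback call in the snippet above is what keeps run from waiting on any single fetch: each submitted job runs in the pool, and when its future completes, the callback is invoked with that future. Calling result() inside the callback either returns the job’s return value or re-raises whatever exception the job hit. Here’s that pattern in isolation, as a generic, self-contained sketch (the load and handle functions are stand-ins, not Hydra’s load_url and handle_future):

# Illustrative sketch of the submit/add_done_callback pattern
from concurrent.futures import ThreadPoolExecutor


def load(n):
    # Stand-in for a slow job like Hydra's load_url
    return n * n


def handle(future):
    try:
        # result() returns the job's return value, or re-raises its exception
        print("done:", future.result())
    except Exception as e:
        print("failed:", e)


with ThreadPoolExecutor(max_workers=4) as pool:
    for n in range(5):
        pool.submit(load, n).add_done_callback(handle)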

Single thread to multithread

If you’d like to see the full effect, I compared the run times for checking my website between a prototype single-thread program and the multiheaded - I mean multithreaded - Hydra.

time python3 slow-link-check.py https://victoria.dev

real    17m34.084s
user    11m40.761s
sys     0m5.436s


time python3 hydra.py https://victoria.dev

real    0m15.729s
user    0m11.071s
sys     0m2.526s

The single-thread program, which blocks on I/O, ran in about seventeen minutes. When I first ran the multithreaded version, it finished in 1m13.358s - after some profiling and tuning, it took a little under sixteen seconds.

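One convenient way to do that kind of profiling is the standard library’s cProfile module; a run sorted by cumulative time, for example, surfaces the slowest call paths first:

python3 -m cProfile -s cumulative hydra.py https://victoria.dev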

Again, the exact times don’t mean all that much; they’ll vary depending on factors such as the size of the site being crawled, your network speed, and your program’s balance between the overhead of thread management and the benefits of parallelism.

The more important thing, and the result I’ll take any day, is a program that runs some orders of magnitude faster.

Original article: https://www.freecodecamp.org/news/multithreaded-python/
