Parallel Programming with Pthreads in PHP: the Fundamentals

This article was peer reviewed by Christopher Pitt. Thanks to all of SitePoint’s peer reviewers for making SitePoint content the best it can be!

PHP developers seem to rarely utilise parallelism. The appeal of the simplicity of synchronous, single-threaded programming certainly is high, but sometimes the usage of a little concurrency can bring some worthwhile performance improvements.

In this article, we will look at how threading can be achieved in PHP with the pthreads extension. This requires a ZTS (Zend Thread Safety) build of PHP 7.x with pthreads v3 installed. (At the time of writing, PHP 7.1 users will need to install from the master branch of the pthreads repo – see this article’s section for details on building third-party extensions from source.)

Just as a quick clarification: pthreads v2 targets PHP 5.x and is no longer supported; pthreads v3 targets PHP 7.x and is being actively developed.

A big thank you to Joe Watkins (creator of the pthreads extension) for proofreading and helping to improve my article!

When not to use pthreads

Before we move on, I would first like to clarify when you should not (as well as cannot) use the pthreads extension.

In pthreads v2, the recommendation was that pthreads should not be used in a web server environment (i.e. in an FCGI process). As of pthreads v3, this recommendation has been enforced, so now you simply cannot use it in a web server environment. The two prominent reasons for this are:

  1. It is not safe to use multiple threads in such an environment (causing IO issues, amongst other problems).
  2. It does not scale well. For example, let’s say you have a PHP script that creates a new thread to handle some work, and that script is executed upon each request. This means that for each request, your application will create one new thread (this is a 1:1 threading model – one thread to one request). If your application is serving 1,000 requests per second, then it is creating 1,000 threads per second! Having this many threads running on a single machine will quickly inundate it, and the problem will only be exacerbated as the request rate increases.

That’s why threading is not a good solution in such an environment. If you’re looking for threading as a solution to IO-blocking tasks (such as performing HTTP requests), then let me point you in the direction of asynchronous programming, which can be achieved via frameworks such as Amp. SitePoint has released some excellent articles that cover this topic (such as writing asynchronous libraries and Modding Minecraft in PHP), in case you’re interested.
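
The requirements mentioned so far (a ZTS build of PHP 7.x, the pthreads extension, and no web server environment) can be checked from a script before any threads are created. The following is only a minimal sketch: the function name `pthreadsEnvironmentProblems` is hypothetical and not part of pthreads, and treating "not a web server" as "the CLI SAPI" is an assumption on my part.

```php
<?php
// Hypothetical helper (not part of pthreads): lists the reasons why the
// current PHP process is unlikely to be able to run the examples here.
function pthreadsEnvironmentProblems(): array
{
    $problems = [];

    // pthreads v3 targets PHP 7.x.
    if (PHP_MAJOR_VERSION < 7) {
        $problems[] = 'PHP 7.x is required for pthreads v3.';
    }

    // PHP_ZTS is 1 on thread-safe (ZTS) builds and 0 otherwise.
    if (!PHP_ZTS) {
        $problems[] = 'PHP was not built with ZTS (Zend Thread Safety).';
    }

    if (!extension_loaded('pthreads')) {
        $problems[] = 'The pthreads extension is not loaded.';
    }

    // Assumption: anything other than the CLI counts as a web server SAPI.
    if (PHP_SAPI !== 'cli') {
        $problems[] = 'The script is not running under the CLI SAPI.';
    }

    return $problems;
}

$problems = pthreadsEnvironmentProblems();

if ($problems === []) {
    echo "Environment looks suitable for pthreads.\n";
} else {
    echo implode("\n", $problems), "\n";
}
```

Running this with the `php` binary you intend to use for threaded scripts gives a quick indication of whether the examples in the rest of the article can run.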

With that out of the way, let’s jump straight into things!

Handling one-off tasks

Sometimes, you will want to handle one-off tasks in a multi-threaded way (such as performing some IO-bound task). In such instances, the Thread class may be used to create a new thread and run some unit of work in that separate thread.

For example:

$task = new class extends Thread {
    public $response; // public, because the main thread reads it after join()

    public function run()
    {
        $content = file_get_contents("http://google.com");
        preg_match("~<title>(.+)</title>~", $content, $matches);
        $this->response = $matches[1];
    }
};

$task->start() && $task->join();

var_dump($task->response); // string(6) "Google"

In the above, the run method is our unit of work that will be executed inside of the new thread. When invoking Thread::start, the new thread is spawned and the run method is invoked. We then join the spawned thread back to the main thread (via Thread::join), which will block until the separate thread has finished executing. This ensures that the task has finished executing before we attempt to output the result (stored in $task->response).

It may not be desirable to pollute a class’s responsibility with thread-related logic (including having to define a run method). We are able to segregate such classes by having them extend the Threaded class instead, where they can then be run inside other threads:

class Task extends Threaded
{
    public $response;

    public function someWork()
    {
        $content = file_get_contents('http://google.com');
        preg_match('~<title>(.+)</title>~', $content, $matches);
        $this->response = $matches[1];
    }
}

$task = new Task;

$thread = new class($task) extends Thread {
    private $task;

    public function __construct(Threaded $task)
    {
        $this->task = $task;
    }

    public function run()
    {
        $this->task->someWork();
    }
};

$thread->start() && $thread->join();

var_dump($task->response);

Any class that needs to be run inside of a separate thread must extend the Threaded class in some way. This is because it provides the necessary abilities to run inside different threads, as well as providing implicit safety and useful interfaces (for things like resource synchronization).

Let’s take a quick look at the hierarchy of classes exposed by pthreads:

Threaded (implements Traversable, Collectable)
    Thread
        Worker
    Volatile
Pool

We’ve already seen and learnt the basics about the Thread and Threaded classes, so now let’s take a look at the remaining three (Worker, Volatile, and Pool).

Recycling threads

Spinning up a new thread for every task to be parallelised is expensive. This is because a shared-nothing architecture must be employed by pthreads in order to achieve threading inside PHP. What this means is that the entire execution context of the current instance of PHP’s interpreter (including every class, interface, trait, and function) must be copied for each thread created. Since this incurs a noticeable performance impact, a thread should always be reused when possible. Threads may be reused in two ways: with Workers or with Pools.

The Worker class is used to execute a series of tasks synchronously inside of another thread. This is done by creating a new Worker instance (which creates a new thread), and then stacking the tasks onto that separate thread (via Worker::stack).

Here’s a quick example:

class Task extends Threaded
{
    private $value;

    public function __construct(int $i)
    {
        $this->value = $i;
    }

    public function run()
    {
        usleep(250000);
        echo "Task: {$this->value}\n";
    }
}

$worker = new Worker();
$worker->start();

for ($i = 0; $i < 15; ++$i) {
    $worker->stack(new Task($i));
}

while ($worker->collect());

$worker->shutdown();

Output:

[Figure: Pool output]

The above stacks 15 tasks onto the new $worker object via Worker::stack, and then processes them in the stacked order. The Worker::collect method, as seen above, is used to clean up the tasks once they have finished executing. By using it inside of a while loop, we block the main thread until all stacked tasks have finished executing and have been cleaned up before we trigger Worker::shutdown. Shutting down the worker prematurely (i.e. whilst there are still tasks to be executed) will still block the main thread until all tasks have finished executing – the tasks will simply not be garbage collected (causing memory leaks).
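
To get a rough feel for the thread start-up cost described earlier, we can time the same number of empty tasks run in two ways: one new thread per task, and one reused Worker. This is only a sketch; the `NoopTask` class and the task count of 50 are arbitrary choices for illustration, and the timings will vary with the machine and the amount of code loaded, so no particular numbers should be expected.

```php
<?php
// Hypothetical no-op task, used only to measure scheduling overhead.
class NoopTask extends Threaded
{
    public function run() {}
}

$taskCount = 50; // arbitrary number of tasks for the comparison

// Approach 1: spin up (and join) a brand new thread for every task.
$start = microtime(true);
for ($i = 0; $i < $taskCount; ++$i) {
    $thread = new class extends Thread {
        public function run() {}
    };
    $thread->start() && $thread->join();
}
$perThread = microtime(true) - $start;

// Approach 2: start a single Worker thread and stack every task onto it.
$start = microtime(true);
$worker = new Worker();
$worker->start();
for ($i = 0; $i < $taskCount; ++$i) {
    $worker->stack(new NoopTask());
}
while ($worker->collect());
$worker->shutdown();
$reused = microtime(true) - $start;

printf("One thread per task:  %.4fs\n", $perThread);
printf("Single reused Worker: %.4fs\n", $reused);
```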

The Worker class provides a few other methods pertaining to its task stack, including Worker::unstack to remove the oldest stacked item, and Worker::getStacked for the number of items on the execution stack. The worker’s stack only holds the tasks that are to be executed. Once a task in the stack has been executed, it is removed and then placed on a separate (internal) stack to be garbage collected (using Worker::collect).
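
As a small illustration of these two methods, the sketch below stacks a few slow tasks and inspects the stack before and after removing one. The `SleepTask` class is hypothetical, and the counts printed depend on timing (the worker may already have picked up a task by the time we look), so treat them as indicative only.

```php
<?php
// Hypothetical task that simulates 0.1 seconds of work.
class SleepTask extends Threaded
{
    public function run()
    {
        usleep(100000);
    }
}

$worker = new Worker();
$worker->start();

for ($i = 0; $i < 5; ++$i) {
    $worker->stack(new SleepTask());
}

// Number of tasks still waiting to be executed (timing dependent).
$before = $worker->getStacked();
echo "Waiting before unstack: {$before}\n";

// Remove the oldest task that is still waiting on the stack.
$worker->unstack();

$after = $worker->getStacked();
echo "Waiting after unstack: {$after}\n";

// Collect the executed tasks, then shut the worker down.
while ($worker->collect());
$worker->shutdown();
```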

Another way to reuse threads when executing many tasks is to use a thread pool (via the Pool class). Thread pools are powered by a group of Workers so that tasks can be executed concurrently, where the concurrency factor (the number of threads the pool runs on) is specified upon pool creation.

Let’s adapt the above example to use a pool of workers instead:

class Task extends Threaded
{
    private $value;

    public function __construct(int $i)
    {
        $this->value = $i;
    }

    public function run()
    {
        usleep(250000);
        echo "Task: {$this->value}\n";
    }
}

$pool = new Pool(4);

for ($i = 0; $i < 15; ++$i) {
    $pool->submit(new Task($i));
}

while ($pool->collect());

$pool->shutdown();

Output:

[Figure: Pool output]

There are a few notable differences between using a pool and using a worker. Firstly, pools do not need to be manually started; they begin executing tasks as soon as tasks become available. Secondly, we submit tasks to the pool, rather than stack them. Also, the Pool class does not extend Threaded, and so it cannot be passed around to other threads (unlike Worker).

As a matter of good practice, workers and pools should always have their tasks collected once finished, and be manually shut down. Threads created via the Thread class should also be joined back to the creator thread.
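
Collecting is also a convenient point to read results back from finished tasks, because Pool::collect accepts an optional callback that is invoked with each completed task. The following is a sketch under a few assumptions: the `SquareTask` class and its `result` property are made up for illustration, and the callback returns true to tell the pool that the task can be freed.

```php
<?php
// Hypothetical task that stores the square of its input.
class SquareTask extends Threaded
{
    private $input;
    public $result;

    public function __construct(int $input)
    {
        $this->input = $input;
    }

    public function run()
    {
        $this->result = $this->input ** 2;
    }
}

$pool = new Pool(4);

for ($i = 1; $i <= 8; ++$i) {
    $pool->submit(new SquareTask($i));
}

$results = [];

// The callback runs in the main thread for each finished task.
// Returning true marks the task as safe to clean up.
while ($pool->collect(function (SquareTask $task) use (&$results) {
    $results[] = $task->result;
    return true;
}));

$pool->shutdown();

// Tasks finish in no guaranteed order across workers, so sort before printing.
sort($results);
print_r($results);
```

Keeping the result handling inside the collector means a task's data is read before the task itself is released.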

pthreads and (im)mutability

The final class to cover is Volatile – a new addition to pthreads v3. Immutability has become an important concept in pthreads, since without it, performance is severely degraded. Therefore, by default, the properties of Threaded classes that are themselves Threaded objects are now immutable, and so they cannot be reassigned after initial assignment. Explicit mutability for such properties is now favoured, and can still be done by using the new Volatile class.

Let’s take a quick look at an example to demonstrate the new immutability constraints:

class Task extends Threaded // a Threaded class
{
    public function __construct()
    {
        $this->data = new Threaded();
        // $this->data is not overwritable, since it is a Threaded property of a Threaded class
    }
}

$task = new class(new Task()) extends Thread { // a Threaded class, since Thread extends Threaded
    public function __construct($tm)
    {
        $this->threadedMember = $tm;
        var_dump($this->threadedMember->data); // object(Threaded)#3 (0) {}
        $this->threadedMember = new StdClass(); // invalid, since the property is a Threaded member of a Threaded class
    }
};
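
The invalid reassignment in the example above can also be observed at runtime. As far as I am aware, pthreads v3 signals it by throwing an exception; the sketch below catches Throwable so that it does not have to assume the exact exception class. The variable names are hypothetical.

```php
<?php
$holder = new Threaded();

// The first assignment of a Threaded object to a property is allowed.
$holder->member = new Threaded();

$rejected = false;

try {
    // Reassigning a Threaded member of a Threaded object should be rejected.
    $holder->member = new Threaded();
} catch (Throwable $e) {
    $rejected = true;
    echo get_class($e), ': ', $e->getMessage(), "\n";
}

var_dump($rejected);
```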

Threaded properties of Volatile classes, on the other hand, are mutable:

class Task extends Volatile
{
    public function __construct()
    {
        $this->data = new Threaded();
        $this->data = new StdClass(); // valid, since we are in a volatile class
    }
}

$task = new class(new Task()) extends Thread {
    public function __construct($vm)
    {
        $this->volatileMember = $vm;

        var_dump($this->volatileMember->data); // object(stdClass)#4 (0) {}

        // still invalid, since Volatile extends Threaded, so the property is still a Threaded member of a Threaded class
        $this->volatileMember = new StdClass();
    }
};

We can see that the Volatile class overrides the immutability enforced by its parent Threaded class, allowing Threaded properties to be reassigned (as well as unset()).

There’s just one last fundamental topic to cover with respect to mutability and the Volatile class – arrays. Arrays in pthreads are automatically coerced to Volatile objects when assigned to the property of a Threaded class. This is because it simply isn’t safe to manipulate an array from multiple contexts in PHP.

Let’s again take a quick look at an example to better understand things:

$array = [1,2,3];

$task = new class($array) extends Thread {
    private $data;

    public function __construct(array $array)
    {
        $this->data = $array;
    }

    public function run()
    {
        $this->data[3] = 4;
        $this->data[] = 5;

        print_r($this->data);
    }
};

$task->start() && $task->join();

/* Output:
Volatile Object
(
    [0] => 1
    [1] => 2
    [2] => 3
    [3] => 4
    [4] => 5
)
*/

We can see that Volatile objects can be treated as if they were arrays, since they support array-based operations (as shown above) with the subscript operator ([]). Volatile objects are not, however, supported by the common array-based functions, such as array_pop and array_shift. Instead, the Threaded class provides such operations as built-in methods.

As a demonstration:

$data = new class extends Volatile {
    public $a = 1;
    public $b = 2;
    public $c = 3;
};

var_dump($data);
var_dump($data->pop());
var_dump($data->shift());
var_dump($data);

/* Output:
object(class@anonymous)#1 (3) {
  ["a"]=> int(1)
  ["b"]=> int(2)
  ["c"]=> int(3)
}
int(3)
int(1)
object(class@anonymous)#1 (1) {
  ["b"]=> int(2)
}
*/

Other supported operations include Threaded::chunk and Threaded::merge.
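
A brief sketch of those two methods follows. The keys and values are arbitrary, and I have left out the expected output on purpose: run it to see what the chunk contains and what remains in the object afterwards.

```php
<?php
$data = new Volatile();

// Merge an ordinary array into the object's property table.
$data->merge(['a' => 1, 'b' => 2, 'c' => 3, 'd' => 4]);
var_dump($data->count());

// Fetch a chunk of two items; the second argument preserves the keys.
$chunk = $data->chunk(2, true);
var_dump($chunk);

// Inspect how many items the object holds after chunking.
var_dump($data->count());
```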

Synchronization

The final topic we will be covering in this article is synchronization in pthreads. Synchronization is a technique for enabling controlled access to shared resources.

For example, let’s implement a naive counter:

$counter = new class extends Thread {
    public $i = 0;

    public function run()
    {
        for ($i = 0; $i < 10; ++$i) {
            ++$this->i;
        }
    }
};

$counter->start();

for ($i = 0; $i < 10; ++$i) {
    ++$counter->i;
}

$counter->join();

var_dump($counter->i); // outputs a number between 10 and 20

Without synchronization, the output isn’t deterministic. Multiple threads writing to a single variable without controlled access can cause updates to be lost.

Let’s rectify this by adding synchronization so that we receive the correct output of 20:

$counter = new class extends Thread {
    public $i = 0;

    public function run()
    {
        $this->synchronized(function () {
            for ($i = 0; $i < 10; ++$i) {
                ++$this->i;
            }
        });
    }
};

$counter->start();

$counter->synchronized(function ($counter) {
    for ($i = 0; $i < 10; ++$i) {
        ++$counter->i;
    }
}, $counter);

$counter->join();

var_dump($counter->i); // int(20)

Synchronized blocks of code can also cooperate with one another using Threaded::wait and Threaded::notify (along with Threaded::notifyOne).

Here’s a staggered increment from two synchronized loops:

$counter = new class extends Thread {
    public $cond = 1;

    public function run()
    {
        $this->synchronized(function () {
            for ($i = 0; $i < 10; ++$i) {
                var_dump($i);
                $this->notify();

                if ($this->cond === 1) {
                    $this->cond = 2;
                    $this->wait();
                }
            }
        });
    }
};

$counter->start();

$counter->synchronized(function ($counter) {
    if ($counter->cond !== 2) {
        $counter->wait(); // wait for the other to start first
    }

    for ($i = 10; $i < 20; ++$i) {
        var_dump($i);
        $counter->notify();

        if ($counter->cond === 2) {
            $counter->cond = 1;
            $counter->wait();
        }
    }
}, $counter);

$counter->join();

/* Output:
int(0)
int(10)
int(1)
int(11)
int(2)
int(12)
int(3)
int(13)
int(4)
int(14)
int(5)
int(15)
int(6)
int(16)
int(7)
int(17)
int(8)
int(18)
int(9)
int(19)
*/

You may have noticed the additional conditions that have been placed around the invocations to Threaded::wait. These conditions are crucial because they only allow a synchronized callback to resume when it has received a notification and the specified condition is true. This is important because notifications may come from places other than calls to Threaded::notify. Thus, if the calls to Threaded::wait were not enclosed within conditions, we would be open to spurious wakeup calls, which will lead to unpredictable code.
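
A common way to guard against such wakeups is to re-check the condition in a loop rather than a single if, so that the waiting side goes back to sleep whenever it wakes up without the condition holding. Here is a minimal sketch of that idea; the `ready` and `result` property names and the computed value are made up for illustration.

```php
<?php
$producer = new class extends Thread {
    public $ready = false;
    public $result;

    public function run()
    {
        $value = 6 * 7; // stand-in for some real work

        // Publish the result and flip the flag under the same lock,
        // then notify anyone waiting on this object.
        $this->synchronized(function () use ($value) {
            $this->result = $value;
            $this->ready = true;
            $this->notify();
        });
    }
};

$producer->start();

$producer->synchronized(function ($producer) {
    // Loop on the condition: a wakeup without ready === true
    // simply leads to another wait.
    while (!$producer->ready) {
        $producer->wait();
    }
}, $producer);

$producer->join();

var_dump($producer->result); // int(42)
```

Because the flag is checked before the first wait, this also covers the case where the producer finishes before the main thread enters its synchronized block.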

Conclusion

We have seen the five classes that pthreads ships with (Threaded, Thread, Worker, Volatile, and Pool), and covered when each of them is used. We have also looked at the new immutability concept in pthreads and taken a quick tour of the synchronization features it supports. With these fundamentals covered, we can now begin to look into applying pthreads to some real-world use cases! That will be the topic of our next post.

In the meantime, if you have some application ideas regarding pthreads, don’t hesitate to drop them in the comments below!

Translated from: https://www.sitepoint.com/parallel-programming-pthreads-php-fundamentals/
