Nvidia PhysX Study Notes 6: Threading

First Snowflakes

Article link: https://blog.csdn.net/qq_35865125/article/details/102890447

official site: https://gameworksdocs.nvidia.com/PhysX/4.1/documentation/physxguide/Manual/Threading.html#threading

Text highlighted in red marks points that I still need to understand.

Threading

Introduction

This chapter explains how to use PhysX in multithreaded applications. There are three main aspects to using PhysX with multiple threads:

how to make read and write calls into the PhysX API from multiple threads without causing race conditions.
how to use multiple threads to accelerate simulation processing.
how to perform asynchronous simulation, and read and write to the API while simulation is being processed.

In short, the three concerns are: race-free calls into the read and write APIs from several threads, using extra threads to speed up the simulation, and running the simulation asynchronously while still calling the API.

Data Access from Multiple Threads

For efficiency reasons, PhysX does not internally lock access to its data structures by the application, so be careful when calling the API from multiple application threads. The rules are as follows:

API interface methods marked 'const' are read calls, other API interface methods are write calls.
API read calls may be made simultaneously from multiple threads.
Objects in different scenes may be safely accessed by different threads.
Different objects outside a scene may be safely accessed from different threads. Be aware that accessing an object may indirectly cause access to another object via a persistent reference (such as joints and actors referencing one another, an actor referencing a shape, or a shape referencing a mesh.)

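In short: for efficiency, PhysX does not lock its data internally, so multithreaded callers must follow these rules:

Interface methods declared const are read calls; all other interface methods are write calls.
Several threads may make read calls at the same time.
Objects that belong to different scenes can safely be accessed by different threads.
Different objects outside any scene can safely be accessed by different threads, but keep in mind that accessing one object may indirectly touch another through a persistent reference (joints and actors referencing each other, an actor referencing a shape, or a shape referencing a mesh).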

Access patterns which do not conform to the above rules may result in data corruption, deadlocks, or crashes. Note in particular that it is not legal to perform a write operation on an object in a scene concurrently with a read operation to an object in the same scene. The checked build contains code which tracks access by application threads to objects within a scene, to try and detect problems at the point when the illegal API call is made.

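In short: breaking these rules can cause data corruption, deadlocks, or crashes. In particular, objects in the same scene must never be read and written concurrently. The checked build tracks which application threads access objects in a scene and tries to report a violation at the offending API call.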

Scene Locking

Each PxScene object provides a multiple reader, single writer lock that can be used to control access to the scene by multiple threads. This is useful for situations where the PhysX scene is shared between more than one independent subsystem. The scene lock provides a way for these systems to coordinate with each other without creating direct dependencies.

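In other words, each PxScene carries one lock that admits either many readers or a single writer. It is useful when several independent subsystems share one scene: they can coordinate through the lock without depending on each other directly.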

It is not mandatory to use the lock. If all access to the scene is from a single thread, using the lock adds unnecessary overhead. Even if you are accessing the scene from multiple threads, you may be able to synchronize the threads using a simpler or more efficient application-specific mechanism that guarantees your application meets the above conditions. However, using the scene lock has two potential benefits:

If the PxSceneFlag::eREQUIRE_RW_LOCK is set, the checked build will issue a warning for any API call made without first acquiring the lock, or if a write call is made when the lock has only been acquired for read,
The APEX SDK uses the scene lock to ensure that it shares the scene safely with your application.

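The lock is optional. If every access to the scene comes from one thread, taking the lock is pure overhead. Even with several threads you can use a simpler or more efficient application-level mechanism, provided it guarantees the rules above. The scene lock still has two advantages:

With PxSceneFlag::eREQUIRE_RW_LOCK set, the checked build warns about any API call made without holding the lock, and about write calls made while holding only the read lock.
The APEX SDK uses the scene lock so that it can share the scene safely with your application.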

There are four methods for acquiring / releasing the lock:

void PxScene::lockRead(const char* file=NULL, PxU32 line=0);
void PxScene::unlockRead();

void PxScene::lockWrite(const char* file=NULL, PxU32 line=0);
void PxScene::unlockWrite();

Additionally there is an RAII helper class to manage these locks, see PxSceneLock.h.

The helper lives in PxSceneLock.h.

RAII (Resource Acquisition Is Initialization) is a C++ idiom for managing resources and avoiding leaks: a resource is acquired in an object's constructor and released in its destructor.
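
To make the RAII idea concrete, here is a minimal standard-library sketch. It does not use the PhysX headers: LockedScene, ScopedReadLock, ScopedWriteLock and addThenCount are hypothetical names invented for this note, and only the four lock method names mirror the ones listed above. Unlike the PhysX scene lock, std::shared_mutex is not re-entrant.

```cpp
#include <cstddef>
#include <shared_mutex>
#include <vector>

// Hypothetical stand-in for a scene with a multiple-reader / single-writer lock.
// This is NOT the PhysX API; only the four lock method names are borrowed.
class LockedScene {
public:
    void lockRead()    { mLock.lock_shared(); }
    void unlockRead()  { mLock.unlock_shared(); }
    void lockWrite()   { mLock.lock(); }
    void unlockWrite() { mLock.unlock(); }

    // Non-const method = write call: the caller must hold the write lock.
    void addActor(int id) { mActors.push_back(id); }
    // Const method = read call: the caller must hold at least the read lock.
    std::size_t actorCount() const { return mActors.size(); }

private:
    std::shared_mutex mLock;   // not re-entrant, unlike the PhysX scene lock
    std::vector<int> mActors;
};

// RAII guards: lock in the constructor, unlock in the destructor, so the
// unlock also happens on early returns and exceptions.
class ScopedReadLock {
public:
    explicit ScopedReadLock(LockedScene& scene) : mScene(scene) { mScene.lockRead(); }
    ~ScopedReadLock() { mScene.unlockRead(); }
    ScopedReadLock(const ScopedReadLock&) = delete;
    ScopedReadLock& operator=(const ScopedReadLock&) = delete;
private:
    LockedScene& mScene;
};

class ScopedWriteLock {
public:
    explicit ScopedWriteLock(LockedScene& scene) : mScene(scene) { mScene.lockWrite(); }
    ~ScopedWriteLock() { mScene.unlockWrite(); }
    ScopedWriteLock(const ScopedWriteLock&) = delete;
    ScopedWriteLock& operator=(const ScopedWriteLock&) = delete;
private:
    LockedScene& mScene;
};

// Adds one actor under the write lock, then counts actors under the read lock.
std::size_t addThenCount(LockedScene& scene, int id) {
    {
        ScopedWriteLock lock(scene);
        scene.addActor(id);
    }  // write lock released here
    ScopedReadLock lock(scene);
    return scene.actorCount();
}
```

The write lock is released before the read lock is taken because this stand-in cannot be locked twice by the same thread.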

Locking Semantics

There are precise rules regarding the usage of the scene lock:

Multiple threads may read at the same time.
Only one thread may write at a time, and no thread may write while any thread is reading.
If a thread holds a write lock then it may call both read and write API methods.
Re-entrant read locks are supported, meaning a lockRead() on a thread that has already acquired a read lock is permitted. Each lockRead() must have a paired unlockRead().
Re-entrant write locks are supported, meaning a lockWrite() on a thread that has already acquired a write lock is permitted. Each lockWrite() must have a paired unlockWrite().
Calling lockRead() by a thread that has already acquired the write lock is permitted and the thread will continue to have read and write access. Each lock*() must have an associated unlock*() that occurs in reverse order, so the most recent lock is released first.
Lock upgrading is not supported - a lockWrite() by a thread that has already acquired a read lock is not permitted. Attempting this in checked builds will result in an error, in release builds it will lead to deadlock.
Writers are favored - if a thread attempts a lockWrite() while the read lock is acquired it will be blocked until all readers leave. If new readers arrive while the writer thread is blocked they will be put to sleep and the writer will have first chance to access the scene. This prevents writers being starved in the presence of multiple readers. For example, if thread B holds the read lock and thread A requests the write lock, A blocks until B unlocks; a thread C that requests the read lock in the meantime queues behind A.
If multiple writers are queued then the first writer will receive priority, subsequent writers will be granted access according to OS scheduling.

Note: PxScene::release() automatically attempts to acquire the write lock, it is not necessary to acquire it manually before calling release().

So there is no need to take the write lock yourself before calling PxScene::release().
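
The "writers are favored" rule can be sketched with a mutex and a condition variable. This is my own illustration, not the PhysX implementation: WriterPreferringLock and runReadersAndWriters are hypothetical names, and the sketch leaves out re-entrancy and the first-writer priority described above.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Minimal writer-preferring read/write lock (no re-entrancy, no upgrade).
class WriterPreferringLock {
public:
    void lockRead() {
        std::unique_lock<std::mutex> guard(mMutex);
        // New readers also wait while a writer is merely queued:
        // this is what keeps writers from starving.
        mCond.wait(guard, [this] { return !mWriterActive && mWaitingWriters == 0; });
        ++mActiveReaders;
    }
    void unlockRead() {
        std::lock_guard<std::mutex> guard(mMutex);
        if (--mActiveReaders == 0) mCond.notify_all();
    }
    void lockWrite() {
        std::unique_lock<std::mutex> guard(mMutex);
        ++mWaitingWriters;
        // A writer waits until all current readers have left.
        mCond.wait(guard, [this] { return !mWriterActive && mActiveReaders == 0; });
        --mWaitingWriters;
        mWriterActive = true;
    }
    void unlockWrite() {
        std::lock_guard<std::mutex> guard(mMutex);
        mWriterActive = false;
        mCond.notify_all();
    }

private:
    std::mutex mMutex;
    std::condition_variable mCond;
    int mActiveReaders = 0;
    int mWaitingWriters = 0;
    bool mWriterActive = false;
};

// Runs writer threads that increment a shared counter and reader threads
// that only look at it; returns the final counter value.
int runReadersAndWriters(int writers, int readers, int iterations) {
    WriterPreferringLock lock;
    int counter = 0;
    std::vector<std::thread> threads;
    for (int w = 0; w < writers; ++w)
        threads.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lockWrite();
                ++counter;
                lock.unlockWrite();
            }
        });
    for (int r = 0; r < readers; ++r)
        threads.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lockRead();
                int seen = counter;  // read-only access under the read lock
                (void)seen;
                lock.unlockRead();
            }
        });
    for (std::thread& t : threads) t.join();
    return counter;
}
```

Because every increment happens under the write lock, no update is lost regardless of how the threads interleave.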

Locking Best Practices

It is often useful to arrange your application to acquire the lock a single time to perform multiple operations. This minimizes the overhead of the lock, and in addition can prevent cases such as a sweep test in one thread seeing a rag doll that has been only partially inserted by another thread.

In short: take the lock once and do several operations under it, so that the lock is acquired as rarely as possible.

Clustering writes can also help reduce contention for the lock, as acquiring the lock for write will stall any other thread trying to perform a read access.
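
The rag doll case can be sketched as follows, again with the standard library rather than PhysX. World, kPartsPerRagdoll, insertRagdoll and the other names are hypothetical. All parts of one rag doll are inserted under a single write-lock acquisition, so a reader never observes a partially inserted rag doll.

```cpp
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <vector>

constexpr int kPartsPerRagdoll = 5;  // hypothetical number of bodies per rag doll

struct World {
    std::shared_mutex lock;
    std::vector<int> bodies;
};

// One lock acquisition covers the whole batch of writes.
void insertRagdoll(World& world, int ragdollId) {
    std::unique_lock<std::shared_mutex> guard(world.lock);
    for (int part = 0; part < kPartsPerRagdoll; ++part)
        world.bodies.push_back(ragdollId * kPartsPerRagdoll + part);
}

// Reader: under the read lock, the body count must always be a whole
// number of rag dolls.
bool sawOnlyCompleteRagdolls(World& world, int samples) {
    for (int i = 0; i < samples; ++i) {
        std::shared_lock<std::shared_mutex> guard(world.lock);
        if (world.bodies.size() % kPartsPerRagdoll != 0) return false;
    }
    return true;
}

// Inserts rag dolls on this thread while a second thread keeps sampling.
bool runBatchedInsertDemo(int ragdolls) {
    World world;
    bool consistent = true;
    std::thread reader([&] { consistent = sawOnlyCompleteRagdolls(world, 1000); });
    for (int id = 0; id < ragdolls; ++id) insertRagdoll(world, id);
    reader.join();
    return consistent &&
           world.bodies.size() == static_cast<std::size_t>(ragdolls) * kPartsPerRagdoll;
}
```

Locking once per body instead of once per rag doll would let the reader run between two push_back calls and see an incomplete rag doll.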

Asynchronous Simulation

PhysX simulation is asynchronous by default. Start simulation by calling:

scene->simulate(dt);

When this call returns, the simulation step has begun in a separate thread. While simulation is running, you can still make calls into the API. Where those calls affect simulation state, the results will be buffered and reconciled with the simulation results when the simulation step completes.

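The call returns immediately, with the step already running on another thread. API calls are still allowed during the step; changes that affect simulation state are buffered and merged with the simulation results when the step finishes. (It is probably best to keep such calls to a minimum while a step is running.)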

To wait until simulation completes, call:

scene->fetchResults(true);

The boolean parameter to fetchResults denotes whether the call should wait for simulation to complete, or return immediately with the current completion status. See the API documentation for more detail.

With true, fetchResults returns only once the step has finished; with false, it returns immediately. See the API documentation for details.
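
The simulate() / fetchResults() calling pattern can be imitated with std::async. This is only an analogy built for this note: AsyncScene and stepTwice are hypothetical, and the "simulation" merely advances a clock. It shows the non-blocking start and the blocking versus polling fetch.

```cpp
#include <chrono>
#include <future>

// Hypothetical stand-in for a scene whose step runs on another thread.
class AsyncScene {
public:
    // Starts one step asynchronously and returns immediately.
    void simulate(float dt) {
        const float start = mTime;  // snapshot taken when the step begins
        mPending = std::async(std::launch::async, [start, dt] { return start + dt; });
    }

    // block == true : wait until the step has finished.
    // block == false: return at once; false means "not finished yet".
    bool fetchResults(bool block) {
        if (!mPending.valid()) return false;  // no step in flight
        if (!block &&
            mPending.wait_for(std::chrono::seconds(0)) != std::future_status::ready)
            return false;
        mTime = mPending.get();  // publish the results of the step
        return true;
    }

    // Safe to call while a step is running: it returns the last published state.
    float time() const { return mTime; }

private:
    std::future<float> mPending;
    float mTime = 0.0f;
};

// Typical frame loop: start a step, optionally do other work, then wait.
float stepTwice() {
    AsyncScene scene;
    for (int i = 0; i < 2; ++i) {
        scene.simulate(0.5f);
        scene.fetchResults(true);
    }
    return scene.time();
}
```

Application work placed between simulate() and fetchResults(true) overlaps with the step, which is the point of the asynchronous design.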

It is important to distinguish two time slots for data access:

After the call to PxScene::fetchResults() has returned and before the next PxScene::simulate() call (see figure below, blue area "1").
After the call to PxScene::simulate() has returned and before the corresponding PxScene::fetchResults() call (see figure below, green area "2").

In the first time slot, the simulation is not running and there are no restrictions for reading or writing object properties. Changes to the position of an object, for example, are applied instantaneously and the next scene query or simulation step will take the new state into account.

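In slot 1 no simulation is running (that is, fetchResults has returned with the step complete, which presumably requires passing true), so objects can be read and written freely. A position change made here takes effect at once: the next scene query sees it, and so does the next simulation step.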

In the second time slot the simulation is running and in the process, reading and changing the state of objects. Concurrent access from the user might corrupt the state of the objects or lead to data races or inconsistent views in the simulation code. Hence the simulation code's view of the objects is protected from API writes, and any attributes the simulation updates are buffered to allow API reads. The consequences will be discussed in detail in the next section.

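In slot 2 the simulation threads are reading and modifying object state, so unprotected user access could corrupt data, cause races, or expose inconsistent views. The simulation's view is therefore shielded from API writes; details follow in the next section.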

Note that simulate() and fetchResults() are write calls on the scene, and as such it is illegal to access any object in the scene while these functions are running.

Double Buffering

While a simulation is running, PhysX supports read and write access to objects in the scene (with some exceptions, see further below). This includes adding/removing them to/from a scene.

So reads, writes, and even adding or removing objects are supported while a step is running, apart from a few exceptions.

From the user perspective, API changes are reflected immediately. For example, if the velocity of a rigid body is set and then queried, the new velocity will be returned. Similarly, if an object is created while the simulation is running, it can be accessed/modified as any other object. However, these changes are buffered so that the simulation code sees the object state as it was when PxScene::simulate() was called. For instance, changes to the filter data of an object while the simulation is running are ignored for collision pair generation of the running step, and will only affect for the next simulation step.

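From the user's side, changes appear to take effect at once: set a body's velocity, query it, and the new value comes back. An object created during a step can be accessed and modified like any other.

These changes are only buffered, however, so the running step does not see them. A filter data change made during a step does not affect that step's collision pair generation; it applies from the next step on.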

When PxScene::fetchResults() is called, any buffered changes are flushed: changes made by the simulation are reflected in API view of the objects, and API changes are made visible to the simulation code for the next step. User changes take precedence: for example, a user change to the position of an object while the simulation is running will overwrite the position which resulted from the simulation.

Once PxScene::fetchResults() has returned (presumably fetchResults(true)), the simulation's changes are visible through the API, and the user's changes made during the step are visible to the next step. If both the simulation and the user changed the same object during the step, the user's change wins.

The delayed application of updates does not affect scene queries, which always take into account the latest changes.
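
The buffering rules fit in a few lines. The class below is a deliberately single-threaded sketch of the bookkeeping for one property; BufferedBody and its method names are hypothetical and say nothing about how PhysX stores its buffers. beginStep() stands in for simulate(), and endStep() for fetchResults().

```cpp
// Single-threaded sketch of double buffering for one property of one object.
class BufferedBody {
public:
    explicit BufferedBody(float position)
        : mSimPosition(position), mApiPosition(position), mSimResult(position) {}

    // Stands in for simulate(): the step starts from the current state.
    void beginStep() {
        mStepRunning = true;
        mSimResult = mSimPosition;
    }

    // API write: visible to API reads at once, hidden from a running step.
    void setPosition(float position) {
        mApiPosition = position;
        if (mStepRunning) mUserWrote = true;  // buffered until endStep()
        else mSimPosition = position;         // no step running: applied directly
    }

    // API read: always reflects the latest API write.
    float getPosition() const { return mApiPosition; }

    // What the simulation code sees during the step.
    float simulationView() const { return mSimPosition; }

    // Result computed by the running step.
    void writeSimulationResult(float position) { mSimResult = position; }

    // Stands in for fetchResults(): flush both directions, user changes win.
    void endStep() {
        if (mUserWrote) {
            mSimPosition = mApiPosition;  // user overrides the simulation result
        } else {
            mSimPosition = mSimResult;
            mApiPosition = mSimResult;    // simulation result becomes API-visible
        }
        mUserWrote = false;
        mStepRunning = false;
    }

private:
    float mSimPosition;
    float mApiPosition;
    float mSimResult;
    bool mStepRunning = false;
    bool mUserWrote = false;
};
```

A user write during a step is returned by getPosition() at once, stays invisible to simulationView() until endStep(), and then overrides whatever the step computed.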

Events involving removed objects

Deleting objects or removing them from the scene while the simulation is in process will affect the simulation events sent out at PxScene::fetchResults(). The behavior is as follows:

In short: deleting or removing objects during a step affects the simulation events that are sent when PxScene::fetchResults() is called.

PxSimulationEventCallback::onWake(), ::onSleep() events will not get fired if an object is involved which got deleted/removed during the running simulation.
PxSimulationEventCallback::onContact(), ::onTrigger() events will get fired if an object is involved which got deleted/removed during the running simulation. The deleted/removed object will be marked as such (see PxContactPairHeaderFlag::eREMOVED_ACTOR_0, PxContactPairFlag::eREMOVED_SHAPE_0, PxTriggerPairFlag::eREMOVED_SHAPE_TRIGGER). Furthermore, if PxPairFlag::eNOTIFY_TOUCH_LOST, ::eNOTIFY_THRESHOLD_FORCE_LOST events were requested for the pair containing the deleted/removed object, then these events will be created.

Memory Considerations

The buffers to store the object changes while the simulation is running are created on demand. If memory usage concerns outweigh the advantage of reading/writing objects in parallel with simulation, do not write to objects while the simulation is running.

The change buffers are allocated on demand, that is, only when the application modifies objects during a step. If memory is tight, avoid writing to objects while the simulation is running.

Multithreaded Simulation

PhysX includes a task system for managing CPU and GPU compute resources. Tasks are created with dependencies so that they are resolved in a given order, when ready they are then submitted to a user-implemented dispatcher for execution.

PhysX has a task system for managing CPU and GPU resources. Tasks carry dependencies that fix their order; once a task is ready, it is handed to a dispatcher implemented by the user.

Middleware products typically do not want to create CPU threads for their own use. This is especially true on consoles where execution threads can have significant overhead. In the task model, the computational work is broken into jobs that are submitted to the application's thread pool as they become ready to run.

Middleware usually avoids creating CPU threads of its own, especially on game consoles, where extra threads are costly. Under the task model the work is split into jobs that go to the application's thread pool as they become ready.

The following classes comprise the CPU task management.

TaskManager

A TaskManager manages inter-task dependencies and dispatches ready tasks to their respective dispatcher. There is a dispatcher for CPU tasks and GPU tasks assigned to the TaskManager.

My reading: this is essentially the scheduler that sits in front of a thread pool.

TaskManagers are owned and created by the SDK. Each PxScene will allocate its own TaskManager instance which users can configure with dispatchers through either the PxSceneDesc or directly through the TaskManager interface.

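TaskManagers are created and owned by the SDK, not by the user. Each PxScene creates its own TaskManager instance, which you configure with dispatchers either through PxSceneDesc or through the TaskManager interface (see the example further down).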

CpuDispatcher

The CpuDispatcher is an abstract class the SDK uses for interfacing with the application's thread pool. Typically, there will be one single CpuDispatcher for the entire application, since there is rarely a need for more than one thread pool. A CpuDispatcher instance may be shared by more than one TaskManager, for example if multiple scenes are being used.

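CpuDispatcher is an abstract class through which the SDK talks to the application's thread pool. An application normally has one CpuDispatcher because it normally has one thread pool, and that dispatcher can be shared by the TaskManagers of several scenes (each PxScene has its own TaskManager, as noted above).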

PhysX includes a default CpuDispatcher implementation, but we prefer applications to implement this class themselves so PhysX and APEX can efficiently share CPU resources with the application.

PhysX ships a default CpuDispatcher, but implementing your own is recommended so that PhysX and APEX share CPU resources efficiently with the application.

Note

The TaskManager will call CpuDispatcher::submitTask() either from the context of API calls (for example PxScene::simulate()) or from other running tasks, so the function must be thread-safe.

An implementation of the CpuDispatcher interface must call the following two methods on each submitted task for it to be run correctly:

baseTask->run();    // optionally call runProfiled() to wrap with PVD profiling events
baseTask->release();
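
To see where those two calls sit in a dispatcher, here is a small thread-pool sketch written against a made-up task interface. SketchTask, SketchDispatcher, CountingTask and runAllTasks are hypothetical names, not the PxCpuDispatcher or PxBaseTask declarations. It assumes that submitTask() may be called from any thread and that zero workers means "run on the submitting thread".

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical task interface for this sketch (not physx::PxBaseTask).
struct SketchTask {
    virtual ~SketchTask() = default;
    virtual void run() = 0;      // does the work
    virtual void release() = 0;  // tells the owner that the task has finished
};

class SketchDispatcher {
public:
    explicit SketchDispatcher(unsigned workerCount) {
        for (unsigned i = 0; i < workerCount; ++i)
            mWorkers.emplace_back([this] { workerLoop(); });
    }
    ~SketchDispatcher() {
        {
            std::lock_guard<std::mutex> guard(mMutex);
            mStopping = true;
        }
        mCond.notify_all();
        for (std::thread& worker : mWorkers) worker.join();
    }
    // Thread-safe: may be called from API calls and from running tasks.
    void submitTask(SketchTask& task) {
        if (mWorkers.empty()) {  // zero workers: execute on the calling thread
            task.run();
            task.release();
            return;
        }
        {
            std::lock_guard<std::mutex> guard(mMutex);
            mPending.push_back(&task);
        }
        mCond.notify_one();
    }

private:
    void workerLoop() {
        for (;;) {
            SketchTask* task = nullptr;
            {
                std::unique_lock<std::mutex> guard(mMutex);
                mCond.wait(guard, [this] { return mStopping || !mPending.empty(); });
                if (mPending.empty()) return;  // stopping and nothing left to run
                task = mPending.back();        // newest first (LIFO)
                mPending.pop_back();
            }
            task->run();      // the two calls every dispatcher must make,
            task->release();  // in this order, for each submitted task
        }
    }
    std::mutex mMutex;
    std::condition_variable mCond;
    std::vector<SketchTask*> mPending;
    std::vector<std::thread> mWorkers;
    bool mStopping = false;
};

struct CountingTask : SketchTask {
    CountingTask(std::atomic<int>& runs, std::atomic<int>& releases)
        : mRuns(runs), mReleases(releases) {}
    void run() override { ++mRuns; }
    void release() override { ++mReleases; }
    std::atomic<int>& mRuns;
    std::atomic<int>& mReleases;
};

// Submits taskCount tasks and checks that each was run and released once.
bool runAllTasks(unsigned workers, int taskCount) {
    std::atomic<int> runs{0}, releases{0};
    std::vector<CountingTask> tasks(static_cast<std::size_t>(taskCount),
                                    CountingTask(runs, releases));
    {
        SketchDispatcher dispatcher(workers);
        for (CountingTask& task : tasks) dispatcher.submitTask(task);
    }  // the destructor drains the queue and joins the workers
    return runs == taskCount && releases == taskCount;
}
```

The destructor finishes all queued tasks before joining, which is one possible shutdown policy among several.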

The PxExtensions library has default implementations for all dispatcher types. The following code snippets are taken from SampleBase and show how the default dispatchers are created and handed to the scene. mNbThreads, which is passed to PxDefaultCpuDispatcherCreate, defines how many worker threads the CPU dispatcher will have:

PxSceneDesc sceneDesc(mPhysics->getTolerancesScale());
[...]
// create CPU dispatcher with mNbThreads worker threads
mCpuDispatcher = PxDefaultCpuDispatcherCreate(mNbThreads);
if(!mCpuDispatcher)
    fatalError("PxDefaultCpuDispatcherCreate failed!");
sceneDesc.cpuDispatcher = mCpuDispatcher;
[...]
mScene = mPhysics->createScene(sceneDesc);

Note

Best performance is usually achieved if the number of threads is less than or equal to the available hardware threads of the platform you are running on, creating more worker threads than hardware threads will often lead to worse performance. For platforms with a single execution core, the CPU dispatcher can be created with zero worker threads (PxDefaultCpuDispatcherCreate(0)). In this case all work will be executed on the thread that calls PxScene::simulate(), which can be more efficient than using multiple threads.

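Create no more worker threads than the platform has hardware threads; going above that usually hurts performance. On a single-core platform, create the dispatcher with zero worker threads so that all work runs on the thread that calls PxScene::simulate().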
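
One way to turn this advice into a number is sketched below. chooseWorkerCount and the idea of reserving some hardware threads for the rest of the application are my own assumptions, not part of the PhysX guide. The result could be passed as the thread count to PxDefaultCpuDispatcherCreate.

```cpp
#include <thread>

// Hypothetical policy helper: never exceed the hardware thread count,
// and use zero workers on a single-core machine.
unsigned chooseWorkerCount(unsigned hardwareThreads, unsigned reservedForApp) {
    if (hardwareThreads <= 1) return 0;  // single core (or unknown): no worker threads
    if (reservedForApp >= hardwareThreads) return 1;  // keep at least one worker
    return hardwareThreads - reservedForApp;
}

// Convenience wrapper for the current machine, reserving one thread for the app.
unsigned chooseWorkerCountForThisMachine() {
    // hardware_concurrency() may return 0 when the value cannot be determined;
    // that case is treated like a single core.
    return chooseWorkerCount(std::thread::hardware_concurrency(), 1);
}
```

Whether reserving a thread for the application pays off depends on what else runs during the physics update, so treat the policy as a starting point for profiling.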

Note

CudaContextManagerDesc supports appGUID now. It only works in release builds. If your application employs PhysX modules that use CUDA you need to use a GUID so that patches for new architectures can be released for your application. You can obtain a GUID for your application from NVIDIA. The application should log the failure into a file which can be sent to NVIDIA for support.

(This note only concerns CUDA.)

CpuDispatcher Implementation Guidelines

After the scene's TaskManager has found a ready-to-run task and submitted it to the appropriate dispatcher it is up to the dispatcher implementation to decide how and when the task will be run.

Once the TaskManager has handed a ready task to a dispatcher, how and when the task runs is entirely up to the dispatcher implementation.

Often in game scenarios the rigid body simulation is time critical and the goal is to reduce the latency from simulate() to the completion of fetchResults(). The lowest possible latency will be achieved when the PhysX tasks have exclusive access to CPU resources during the update. In reality, PhysX will have to share compute resources with other application tasks. Below are some guidelines to help ensure a balance between throughput and latency when mixing the PhysX update with other work.

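Rigid body simulation in games is time critical, so one goal is to shorten the span from the simulate() call to the completion of fetchResults(). That span is shortest when PhysX tasks have the CPU to themselves during the update, but in practice they share it with other application tasks. The guidelines below help balance latency against throughput: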

Avoid interleaving long running tasks with PhysX tasks, this will help reduce latency.
Avoid assigning worker threads to the same execution core as higher priority threads. If a PhysX task is context switched during execution the rest of the rigid body pipeline may be stalled, increasing latency.
PhysX occasionally submits tasks and then immediately waits for them to complete, because of this, executing tasks in LIFO (stack) order may perform better than FIFO (queue) order.
PhysX is not a perfectly parallel SDK, so interleaving small to medium granularity tasks will generally result in higher overall throughput.
If your thread pool has per-thread job-queues then queuing tasks on the thread they were submitted may result in more optimal CPU cache coherence, however this is not required.

For more details see the default CpuDispatcher implementation that comes as part of the PxExtensions package. It uses worker threads that each have their own task queue and steal tasks from the back of other workers' queues (LIFO order) to improve workload distribution.
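
The queue layout described here can be sketched as follows. This is not the PxExtensions code: WorkerQueue and nextTask are hypothetical, tasks are reduced to integer ids, and a plain mutex per queue replaces whatever synchronization the real implementation uses.

```cpp
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <vector>

// One queue per worker thread; task ids stand in for real task objects.
class WorkerQueue {
public:
    void push(int taskId) {
        std::lock_guard<std::mutex> guard(mMutex);
        mTasks.push_back(taskId);
    }
    // Owner and thieves both take from the back, that is, newest first (LIFO).
    std::optional<int> popBack() {
        std::lock_guard<std::mutex> guard(mMutex);
        if (mTasks.empty()) return std::nullopt;
        int taskId = mTasks.back();
        mTasks.pop_back();
        return taskId;
    }

private:
    std::mutex mMutex;
    std::deque<int> mTasks;
};

// Worker `self` first looks in its own queue, then tries to steal from others.
std::optional<int> nextTask(std::vector<WorkerQueue>& queues, std::size_t self) {
    if (std::optional<int> task = queues[self].popBack()) return task;
    for (std::size_t offset = 1; offset < queues.size(); ++offset) {
        std::size_t victim = (self + offset) % queues.size();
        if (std::optional<int> task = queues[victim].popBack()) return task;
    }
    return std::nullopt;  // nothing to run anywhere
}
```

A worker with an empty queue picks up work from a busy neighbour instead of idling, which is how stealing evens out the load.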

BaseTask

BaseTask is the abstract base class for all task types. All task run() functions will be executed on application threads, so they need to be careful with their stack usage, use as little stack as possible, and they should never block for any reason.

Task

The Task class is the standard task type. Tasks must be submitted to the TaskManager each simulation step for them to be executed. Tasks may be named at submission time, this allows them to be discoverable. Tasks will be given a reference count of 1 when they are submitted, and the TaskManager::startSimulation() function decrements the reference count of all tasks and dispatches all Tasks whose reference count reaches zero. Before TaskManager::startSimulation() is called, Tasks can set dependencies on each other to control the order in which they are dispatched. Once simulation has started, it is still possible to submit new tasks and add dependencies, but it is up to the programmer to avoid race hazards. You cannot add dependencies to tasks that have already been dispatched, and newly submitted Tasks must have their reference count decremented before that Task will be allowed to execute.

Notes: tasks must be submitted to the TaskManager on every simulation tick in order to run, and naming a task at submission makes it discoverable. A submitted task starts with a reference count of 1; TaskManager::startSimulation() decrements the count of every task and dispatches those that reach zero (open question: when exactly do the other decrements happen?). Dependencies between tasks can be set before startSimulation() to control dispatch order. New tasks and dependencies can still be added after the simulation has started, but avoiding races is then the programmer's job: a task that has already been dispatched cannot take new dependencies, and a newly submitted task runs only after its reference count has been decremented.

Synchronization points can also be defined using Task names. The TaskManager will assign the name a TaskID with no Task implementation. When all of the named TaskID's dependencies are met, it will decrement the reference count of all Tasks with that name.

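Task names can also define synchronization points: the TaskManager assigns the name a TaskID that has no Task implementation behind it, and once all dependencies of that named TaskID are met it decrements the reference count of every Task with that name. (Open question: can several tasks share one TaskID?)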

APEX uses the Task class almost exclusively to manage CPU resources. The ApexScene defines a number of named Tasks that the modules use to schedule their own Tasks (ex: start after LOD calculations are complete, finish before the PhysX scene is stepped).

APEX manages CPU resources almost entirely through the Task class. The ApexScene defines several named Tasks, and the modules schedule their own Tasks against them (for example: start after LOD calculations are complete, finish before the PhysX scene is stepped).

LightCpuTask

LightCpuTask is another subclass of BaseTask that is explicitly scheduled by the programmer. LightCpuTasks have a reference count of 1 when they are initialized, so their reference count must be decremented before they are dispatched. LightCpuTasks increment their continuation task reference count when they are initialized, and decrement the reference count when they are released (after completing their run() function).

So LightCpuTask is the second BaseTask subclass (Task, described above, is the first), and the programmer schedules it explicitly.

PhysX 3.x uses LightCpuTasks almost exclusively to manage CPU resources. For example, each stage of the simulation update may consist of multiple parallel tasks, when each of these tasks has finished execution it will decrement the reference count on the next task in the update chain. This will then be automatically dispatched for execution when its reference count reaches zero.

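PhysX 3.x manages CPU resources almost entirely with LightCpuTasks. Each stage of the simulation update can consist of several parallel tasks; each one decrements the reference count of the next task in the chain when it finishes, and that task is dispatched when its count reaches zero.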

Note

Even when using LightCpuTasks exclusively to manage CPU resources, the TaskManager startSimulation() and stopSimulation() calls must be made each simulation step to keep the GpuDispatcher synchronized.

So startSimulation() and stopSimulation() still have to be called on every simulation step, even when only LightCpuTasks are used, to keep the GpuDispatcher in sync.

The following code snippets show how the crabs' A.I. in SampleSubmarine is run as a CPU Task. By doing so the Crab A.I. is run as a background Task in parallel with the PhysX simulation update.

In this sample the crab A.I. runs as a background task, in parallel with the PhysX simulation update.

For a CPU task that does not need handling of multiple continuations LightCpuTask can be subclassed. A LightCpuTask subclass requires that the getName and a run method be defined:

That is, a CPU task that does not need multiple continuations can simply derive from LightCpuTask and define getName and run.

class Crab: public ClassType, public physx::PxLightCpuTask, public SampleAllocateable
{
public:
    Crab(SampleSubmarine& sample, const PxVec3& crabPos, RenderMaterial* material);
    ~Crab();
    [...]

    // Implements LightCpuTask
    virtual  const char*    getName() const { return "Crab AI Task"; }
    virtual  void           run();

    [...]
};

After PxScene::simulate() has been called, and the simulation started, the application calls removeReference() on each Crab task, this in turn causes it to be submitted to the CpuDispatcher for update. Note that it is also possible to submit tasks to the dispatcher directly (without manipulating reference counts) as follows:

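After PxScene::simulate() has started the simulation, the application calls removeReference() on each Crab task (an object of a LightCpuTask subclass), which submits it to the CpuDispatcher. Tasks can also be submitted to the dispatcher directly, bypassing the reference count: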

PxLightCpuTask& task = mCrab;   // bind the reference to the object, not to its address
mCpuDispatcher->submitTask(task);

Once queued for execution by the CpuDispatcher, one of the thread pool's worker threads will eventually call the task's run method. In this example the Crab task will perform raycasts against the scene and update its internal state machine:

Once submitted, the task is queued by the CpuDispatcher until a worker thread of the pool calls its run method. Here run performs raycasts and updates the crab's internal state machine.

void Crab::run()
{
    // run as a separate task/thread
    scanForObstacles();
    updateState();
}

It is safe to perform API read calls, such as scene queries, from multiple threads while simulate() is running. However, care must be taken not to overlap API read and write calls from multiple threads. In this case the SDK will issue an error, see Threading for more information.

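In short: read calls from several threads are safe while simulate() is running, but reads and writes from different threads must not overlap, or the SDK reports an error.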

An example for explicit reference count modification and task dependency setup:

(The dependencies are set up first, then all tasks are started. Each array element is presumably one task.)

// assume all tasks have a refcount of 1 and are submitted to the task manager
// 3 task chains a0-a2, b0-b2, c0-c2
// b0 shall start after a1
// the a and c chain have no dependencies and shall run in parallel
//
// a0-a1-a2
//      \
//       b0-b1-b2
// c0-c1-c2

// setup the 3 chains
for(PxU32 i = 0; i < 2; i++)
{
    a[i].setContinuation(&a[i+1]);
    b[i].setContinuation(&b[i+1]);
    c[i].setContinuation(&c[i+1]);
}

// b0 shall start after a1
b[0].startAfter(a[1].getTaskID());

// setup is done, now start all tasks by decrementing their refcount by 1
// tasks with refcount == 0 will be submitted to the dispatcher (a0 & c0 will start).
for(PxU32 i = 0; i < 3; i++)
{
    a[i].removeReference();
    b[i].removeReference();
    c[i].removeReference();
}
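
To check my understanding of the reference counting, I rebuilt the a and b chains with a tiny stand-in class. ChainTask and runChains are hypothetical, startAfter takes a task reference instead of a TaskID, and tasks "run" inline by appending their name to a log instead of going through a dispatcher, so the order is deterministic here. With a real thread pool only the dependency constraints would be guaranteed.

```cpp
#include <string>
#include <utility>
#include <vector>

// Single-threaded stand-in for a reference-counted task.
class ChainTask {
public:
    ChainTask(std::string name, std::vector<std::string>& log)
        : mName(std::move(name)), mLog(log) {}

    // `next` may not start before this task has finished.
    void setContinuation(ChainTask& next) {
        mWaiters.push_back(&next);
        ++next.mRefCount;
    }
    // This task may not start before `other` has finished.
    void startAfter(ChainTask& other) {
        other.mWaiters.push_back(this);
        ++mRefCount;
    }
    // Dropping the last reference "dispatches" the task (here: runs it inline).
    void removeReference() {
        if (--mRefCount == 0) {
            mLog.push_back(mName);  // stands in for run()
            // stands in for release(): unblock every task waiting on this one
            for (ChainTask* waiter : mWaiters) waiter->removeReference();
        }
    }

private:
    std::string mName;
    std::vector<std::string>& mLog;
    std::vector<ChainTask*> mWaiters;
    int mRefCount = 1;  // every task starts with one reference
};

// Builds a0-a1-a2 and b0-b1-b2 with "b0 starts after a1", then starts everything.
std::vector<std::string> runChains() {
    std::vector<std::string> log;
    std::vector<ChainTask> a, b;
    a.reserve(3);  // reserve so that pointers between tasks stay valid
    b.reserve(3);
    for (int i = 0; i < 3; ++i) {
        a.emplace_back("a" + std::to_string(i), log);
        b.emplace_back("b" + std::to_string(i), log);
    }
    for (int i = 0; i < 2; ++i) {
        a[i].setContinuation(a[i + 1]);
        b[i].setContinuation(b[i + 1]);
    }
    b[0].startAfter(a[1]);
    for (int i = 0; i < 3; ++i) {
        a[i].removeReference();
        b[i].removeReference();
    }
    return log;
}
```

In this inline version the log comes out as a0, a1, b0, b1, a2, b2: b0 cannot run until a1 has dropped its reference, even though b0's own initial reference was removed earlier.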
