Built from CUsema_st and CUsemaPool_st
Semaphore abstractions
Definition
CUSW_UNIT_SYNC provides methods to allocate and deallocate semaphores. It also provides methods to read/write semaphore values via CPU. The unit depends on the hardware abstraction layer (CUSW_UNIT_HAL unit) to provide methods that Acquire and Release semaphores on various GPU engines.
A semaphore is a memory location that is mapped and accessible to both the GPU and the CPU. A GPU engine can perform two primary types of operations on a semaphore.
Release Semaphore (value) - writes the ‘value’ at the memory location represented by the semaphore.
Acquire Semaphore (value) - engine waits till the memory location represented by semaphore reaches the specified ‘value’.
Semaphores are classified primarily into two types: GPU semaphores, which are released by the GPU, and CPU semaphores, which are released by the CPU. Acquires can happen on both CPU and GPU. Using GPU semaphores, the GPU can signal work completion through a semaphore release operation and wait on pending work through a semaphore acquire.
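The release/acquire semantics above can be sketched as a small model. This is illustrative code, not driver code; the names are hypothetical, and the wrap-safe `>=` comparison follows the ">= acquires" behavior described later for QMD semaphores.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model of the two engine operations on a one-word semaphore:
 * a release stores the value at the semaphore's memory location, and an
 * acquire completes once the location has reached (>=) the awaited value. */
static void semaphoreModelRelease(volatile uint32_t *sema, uint32_t value)
{
    *sema = value; /* write the value at the semaphore's memory location */
}

static int semaphoreModelTryAcquire(volatile uint32_t *sema, uint32_t value)
{
    /* true once the payload has reached `value`, tolerating 32-bit wrap */
    return (int32_t)(*sema - value) >= 0;
}
```

A real engine-side acquire would stall the engine until this condition holds; the CPU-side equivalent polls or blocks on the same comparison.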
The different categories of semaphores are listed below:
Channel Tracking Semaphore
This abstraction represents a GPU semaphore which tracks the completion of in-order GPU tasks. In-order tasks include memcopy operations executed by the copy engine and methods executed by the GPU host unit. Since work submission to the GPU from a channel happens in an in-order fashion, if a channel tracking semaphore release has been enqueued following a sequence of operations, its release guarantees that all preceding in-order tasks have been completed by the hardware (e.g. DMA copies, GPU host semaphore acquires, cache flushes). For more details, refer to the architecture design document for the CUSW_UNIT_CHANNEL unit.
QMD Semaphore
This abstraction represents a GPU semaphore which tracks the completion of a QMD/CUDA kernel. Kernel launches on the same channel are submitted in order, but they can complete in any order, so individual kernel launches must be tracked separately. Every kernel launch adds a QMD semaphore release method, which is executed by the SKED unit inside the compute engine upon completion of the kernel.
CPU Semaphore
The CPU semaphore is a conventional operating-system semaphore that is acquired and released through OS-specific functions. The CUDA driver uses CPU semaphores to signal completion of CPU work, for example when tracking the completion of a callback.
Semaphore pool
A CUDA context contains distinct semaphore pools, created during its initialization phase, for sub-allocating the GPU semaphores and CPU semaphores it uses. Creating the pools up front avoids repeatedly allocating and deallocating memory at runtime for individual semaphores, which would add overhead and fragment memory as deallocation and reallocation requests arrive in arbitrary patterns. The CUSW_UNIT_SYNC unit contains methods to initialize and configure these pools.
When no free semaphores are available, the CUDA driver grows the semaphore pool by allocating more memory.
During interop and interprocess synchronization, backing memory for the semaphore pool can be allocated outside of CUDA driver and mapped to CUDA.
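The sub-allocation scheme the pool uses can be sketched as follows. This is a toy model with hypothetical names, not the driver's implementation: fixed-size slots are handed out from a page via a stack of free indices, so creating or destroying one semaphore never touches malloc/free and cannot fragment memory.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_SLOTS_PER_PAGE 8

typedef struct {
    uint32_t freeCount;
    uint32_t freeIndices[MODEL_SLOTS_PER_PAGE];
} SlotPageModel;

static void slotPageInit(SlotPageModel *p)
{
    p->freeCount = MODEL_SLOTS_PER_PAGE;
    for (uint32_t i = 0; i < MODEL_SLOTS_PER_PAGE; i++)
        p->freeIndices[i] = i;
}

/* Pop a free slot index, or return -1 when the page is exhausted
 * (the real pool would instead grow by allocating another page). */
static int slotPageAlloc(SlotPageModel *p)
{
    if (p->freeCount == 0)
        return -1;
    return (int)p->freeIndices[--p->freeCount];
}

static void slotPageFree(SlotPageModel *p, uint32_t index)
{
    p->freeIndices[p->freeCount++] = index;
}
```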
The class diagram below depicts relationships between the various semaphore related data types.
Detailed functional description
Allows creation and destruction of a semaphore pool held within a context.
Provides functions to fetch the memory handles corresponding to semaphore pool allocations.
Allocates and releases the memory underlying a semaphore handle.
Accesses and modifies a semaphore's memory location from the CPU, both locally (current context) and across contexts (another device's context).
Discards a semaphore when it is complete and marks its slot in the pool as reusable.
Provides functions to carry unreleased semaphores over into a target marker, so that the modified marker can also be used to wait for the release of those semaphores.
Provides functions to read, from the CPU, the timing information associated with a GPU semaphore.
Provides functions to share the semaphore pool memory handle with other CUDA contexts.

Code
/**
* \addtogroup CUISYNC
* @{
*
* Semaphore releases are either one or four word operations and the address of the semaphore
* has to be aligned to its size (i.e. one word release requires a 4-byte alignment,
* four word release requires 16-byte aligned address).
* Semaphore acquires are always one word and require 4-byte alignment.
*/
#define CU_NUM_SEMAPHORES_PER_PAGE 4096
#define CU_SEMAPHORE_ONE_WORD_SIZE 4
#define CU_SEMAPHORE_ONE_WORD_PAGE_SIZE (CU_NUM_SEMAPHORES_PER_PAGE * CU_SEMAPHORE_ONE_WORD_SIZE)
#define CU_SEMAPHORE_QMD_SIZE 8
#define CU_SEMAPHORE_QMD_PAGE_SIZE (CU_NUM_SEMAPHORES_PER_PAGE * CU_SEMAPHORE_QMD_SIZE)
#define CU_SEMAPHORE_FOUR_WORDS_SIZE 16
#define CU_SEMAPHORE_FOUR_WORDS_PAGE_SIZE (CU_NUM_SEMAPHORES_PER_PAGE * CU_SEMAPHORE_FOUR_WORDS_SIZE)
/**
* The semaphore's data layout depends on the kind of work being tracked (e.g. kernel or UVM work).
* A separate layout exists for semaphores that also record timing information.
*
*/
typedef enum CUsemaType_en
{
//! One word semaphore type to track any work completion on GPU
CU_SEMA_TYPE_ONE_WORD,
//! Special semaphore type which tracks kernel launches on the GPU
CU_SEMA_TYPE_QMD,
//! Four Word semaphore type for synchronizing and timing work on GPU
CU_SEMA_TYPE_FOUR_WORDS,
//! Channel tracking semaphores
CU_SEMA_TYPE_CHANNEL_TRACKING,
//! UVM semaphore type
CU_SEMA_TYPE_UVM,
//! External semaphores of one word size
CU_SEMA_TYPE_EXTERNAL_ONE_WORD,
//! CPU semaphore type
CU_SEMA_TYPE_CPU,
} CUsemaType;
/**
* \brief The semaphore data type is one word size.
*/
typedef struct CUsemaDataOneWord_st
{
//! This field is the semaphore value on which the synchronization is done.
NvU32 payload;
} CUsemaDataOneWord;
ct_assert(sizeof(CUsemaDataOneWord) == CU_SEMAPHORE_ONE_WORD_SIZE);
/**
* \brief Special semaphore type used for QMDs
* QMD semaphores have a 1-1 mapping to QMDs. They are used to track completion of kernel launches (grids).
*
* \detailDescription
* For each launch the semaphore starts with qmdSemaphoreGetInitialPayload() value
* and the grid is completed when the semaphore reaches qmdSemaphoreGetFinalPayload().
* When the QMD is reused the initial and final values are incremented by a constant.
* This allows for easy reuse of the semaphores as the >= acquires stay valid forever
* (in reality just for a very long time as the payload can overflow,
* but the same problem exists for other semaphores).
*
* CNP makes it tricky to keep the difference between launches constant as children launched
* by the grid on the GPU have to be tracked as well and with minimum overhead.
*
* From the release/acquire POV the semaphore is just a one word semaphore,
* where the released/acquired value is in the "payload" field.
* The CNP scheduling code, though, treats the whole 64bit struct as a single integer and does atomic operations on it.
* The layout works out so that the "lowerWord" field is the less significant bits of the 64bit value.
* Given that the "lowerWord" starts at 0, it is guaranteed that the first 64bit atomicDec() on the 64bit value
* will decrement the "payload" value by 1 and set "lowerWord" to 0xffffffff. Subsequent 2^32 - 1 decrements would
* only modify the "lowerWord". When the atomic decrements are all eventually followed by an atomic increment
* the 1 will be returned to the "payload" and "lowerWord" will again be 0.
*
* This allows for an easy implementation of the CNP semantics where the CNP scheduling code decrements the value when
* a CTA context is created (this happens when a CTA launches the first child) and increments it
* back when all the children of that CTA are completed.
* In addition to that, the QMD semaphore release does a 32bit atomic add on the payload so that it reaches the final value.
* For a description of the specific values the QMD semaphore takes, see comments in qmd.c
*
* This ensures that the "payload" value will only reach the final value when both the grid and all the children are completed.
*
* Old implementation:
*
* In the past the qmd semaphores used to have the following semantics:
* ==0 - grid and its children finished
* >=1 - grid and/or children still running (or not started)
*
* This caused some problems as QMDs (and QMD semaphores) are a limited resource and have to be reused.
* To reuse a QMD semaphore we had to ensure that all acquires waiting on it have been already consumed,
* which required extra tracking and has never been correctly implemented for remote acquires (multi-gpu).
*/
typedef struct CUsemaDataQmd_st
{
//! This field is the semaphore value on which the synchronization is done.
NvU32 lowerWord;
//! Semaphore payload; decremented when a CTA creates a child-launch context and incremented back once all of that CTA's children complete.
NvU32 payload;
} CUsemaDataQmd;
ct_assert(sizeof(CUsemaDataQmd) == CU_SEMAPHORE_QMD_SIZE);
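The 64-bit borrow/carry behavior described above can be demonstrated with a small sketch. This is illustrative code, not driver code; it relies on the little-endian layout the comment describes, where "lowerWord" forms the less significant bits of the 64-bit value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Little-endian sketch of the CNP arithmetic: viewing {lowerWord, payload}
 * as one 64-bit integer, the first 64-bit decrement from lowerWord == 0
 * borrows 1 out of "payload" and sets lowerWord to 0xffffffff; a matching
 * 64-bit increment later returns the 1 to "payload". */
typedef struct {
    uint32_t lowerWord; /* least significant word on little-endian */
    uint32_t payload;
} QmdSemaModel;

static uint64_t qmdAsU64(QmdSemaModel s)
{
    uint64_t v;
    memcpy(&v, &s, sizeof v);
    return v;
}

static QmdSemaModel qmdFromU64(uint64_t v)
{
    QmdSemaModel s;
    memcpy(&s, &v, sizeof s);
    return s;
}
```

Starting from `{ lowerWord = 0, payload = 10 }`, one 64-bit decrement yields `{ 0xffffffff, 9 }`, and the matching increment restores `{ 0, 10 }`, which is exactly the CTA-context create/complete sequence described above.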
/**
* \brief Four word semaphore data structure
*/
typedef struct CUsemaDataFourWords_st
{
//! This field is the value on which the synchronization is done.
NvU32 payload;
//! This field is not used.
NvU32 unused;
//! The total time elapsed from the semaphore initialization.
NvU64 timer;
} CUsemaDataFourWords;
ct_assert(sizeof(CUsemaDataFourWords) == CU_SEMAPHORE_FOUR_WORDS_SIZE);
CUsemaPool_st
// Pool of semaphores held by a context
struct CUsemaPool_st
{
// owning context
CUctx *ctx;
// This protects all fields on CUsemaPool and CUsemaPage
// that may be modified after creation.
CUImutex mutex;
// is semaphorePoolMakeSpace allowed?
NvBool isFixedSize;
// type of the semaphores in the pool
CUsemaType type;
// size of the semaphore data
NvU32 semaphoreDataSize;
// offset of the payload in semaphore data
NvU32 semaphorePayloadOffsetInData;
// page size differs with the semaphore type
NvU64 pageSize;
// list of pages
CUsemaPage *pages;
// counter of free and abandoned semaphores
NvU64 freeOrAbandonedCount;
// semaphores are cached in gpu L2 (i.e. we must never write the semaphore from the host)
NvBool gpuCacheEnabled;
};
/**
* \brief Create a semaphore pool of a given semaphore type
*
* \detailDescription This function allocates memory for the semaphore pool,
* initializes the pool with \p semaType and \p gpuCacheEnabled, and returns it in \p ppool.
*
* \param[in] ctx Semaphore pool's context
* \param[in] semaType Determines the type of semaphore
* \param[in] gpuCacheEnabled Whether the semaphore pool has gpu l2 cache enabled or not
* \param[out] ppool The gpu semaphore pool created.
*
* \return CUresult the result of the function
*
* \retval
* CUDA_SUCCESS
* \retval CUDA_ERROR_OUT_OF_MEMORY
*
* \additionalNotes
* When \p gpuCacheEnabled is set to true, acquire performance improves, but the host
* may only read the semaphore, not write it.
* \additionalNotes
* semaphorePoolCreate returns CUDA_ERROR_OUT_OF_MEMORY if the malloc of the
* semaphore pool handle fails.
* \additionalNotes
* The semaphore's page size and data size depend on the semaphore type.
*
* \endfn
*/
CUDA_TEST_EXPORT CUresult semaphorePoolCreate(CUctx *ctx, CUsemaType semaType, NvBool gpuCacheEnabled, CUsemaPool **ppool);
/**
* \brief Import external memory for the semaphore pool
*
* \detailDescription #semaphorePoolImportMemobj imports the memory for the
* semaphore pool from the memobj mentioned. It creates a semaphore page
* from memory represented by \p memobj.
*
* \param[in] pool The semaphore pool on which to operate
* \param[in] memobj The memobj from which the semaphore imports
*
* \return CUresult the result of the function
*
* \retval
* CUDA_SUCCESS
* \retval CUDA_ERROR_OUT_OF_MEMORY
*
* \additionalNotes
* The pool's page size must not exceed the size of the memobj passed.
* Any remaining extra memory is unused.
* \additionalNotes
* Because the pool imports its memory from a memobj, it is not allowed to dynamically allocate more.
*
* \endfn
*/
CUDA_TEST_EXPORT CUresult semaphorePoolImportMemobj(CUsemaPool *pool, CUmemobj *memobj);
/**
* \brief Destroy the semaphore pool
*
* \detailDescription semaphorePoolDestroy destroys the semaphore pool and
* frees the memory for all the pages of the pool.
*
* \param[out] ppool Pointer to a pointer of the semaphore pool which should be destroyed
*
* \additionalNotes
* It is valid for \p ppool to point to NULL.
* \additionalNotes
* It frees the semaphore pool and sets the \p ppool to NULL.
* \endfn
*/
CUDA_TEST_EXPORT void semaphorePoolDestroy(CUsemaPool **ppool);
/**
* \brief Make the semaphore pool portable (shared with other contexts)
*
* \detailDescription
* semaphorePoolMakePortable makes the semaphore pool portable by making all the
* semaphore pages available to all the contexts. This function expects the pool to be
* valid.
*
* \param[in] pool The semaphore pool which is to be made portable
*
* \return CUresult the success or error
* \retval
* CUDA_SUCCESS
* \retval CUDA_ERROR_OUT_OF_MEMORY
* \retval CUDA_ERROR_PEER_ACCESS_ALREADY_ENABLED
*
* \additionalNotes
* \notesync
* \additionalNotes
* This function shall be called only during context initialization and is not re-entrant.
* \additionalNotes
* This function shall not do any clean up as that should be handled when
* context de-initializes.
* \additionalNotes
* After this call, the channel semaphores are shared with other contexts.
* \additionalNotes
* This whole scheme will need to be revisited to support QMDs, etc.(Old Comment).
* \additionalNotes
* Currently the function does not affect any pages that are allocated after it
* was called. If #semaphoreAlloc causes a new page to be allocated,
* that page will not be made portable.
*
* \endfn
*/
CUDA_TEST_EXPORT CUresult semaphorePoolMakePortable(CUsemaPool *pool);
/**
* \brief Get the memobj of the semaphore pool
*
* \detailDescription Returns the memobj which represents the \p{pageIndex}'th
* page.
*
* \param[in] pool The semaphore pool
* \param[in] pageIndex The index of the page for which we want the memobj
*
* \return CUmemobj * the memobj we requested
*
* \additionalNotes
* semaphorePoolGetMemobj shall return the memobj associated with the page present at \p pageIndex of the pool.
* \additionalNotes
* \p pageIndex starts at 0; enumeration ends at the first value for which the function returns NULL.
* \additionalNotes
* It is used to track semaphore memory in WDDM.
* \endfn
*/
CUDA_TEST_EXPORT CUmemobj *semaphorePoolGetMemobj(CUsemaPool *pool, NvU32 pageIndex);
CUresult
semaphorePoolCreate(CUctx *ctx, CUsemaType semaType, NvBool gpuCacheEnabled, CUsemaPool **ppool)
{
CUsemaPool *pool;
CU_TRACE_FUNCTION();
CU_ASSERT(ctx);
CU_ASSERT(ppool);
// CPU semaphores cannot be cached by GPU as they are, by definition, written
// by CPU.
CU_ASSERT(!(semaType == CU_SEMA_TYPE_CPU && gpuCacheEnabled));
// make sure host and device representations of semaphore data are the same
pool = (CUsemaPool *)malloc(sizeof(*pool));
if (NULL == pool) {
return CUDA_ERROR_OUT_OF_MEMORY;
}
memset(pool, 0, sizeof(*pool));
cuiMutexInitialize(&pool->mutex, CUI_MUTEX_ORDER_SEMAPHORE_POOL, CUI_MUTEX_DEFAULT);
pool->ctx = ctx;
pool->type = semaType;
pool->gpuCacheEnabled = gpuCacheEnabled;
switch (pool->type) {
case CU_SEMA_TYPE_UVM:
case CU_SEMA_TYPE_ONE_WORD:
pool->pageSize = CU_SEMAPHORE_ONE_WORD_PAGE_SIZE;
pool->semaphoreDataSize = CU_SEMAPHORE_ONE_WORD_SIZE;
pool->semaphorePayloadOffsetInData = offsetof(CUsemaDataOneWord, payload);
break;
case CU_SEMA_TYPE_QMD:
pool->pageSize = CU_SEMAPHORE_QMD_PAGE_SIZE;
pool->semaphoreDataSize = CU_SEMAPHORE_QMD_SIZE;
pool->semaphorePayloadOffsetInData = offsetof(CUsemaDataQmd, payload);
break;
case CU_SEMA_TYPE_CHANNEL_TRACKING:
case CU_SEMA_TYPE_FOUR_WORDS:
case CU_SEMA_TYPE_CPU: // FIXME: Should CPU semaphores be one word?
pool->pageSize = CU_SEMAPHORE_FOUR_WORDS_PAGE_SIZE;
pool->semaphoreDataSize = CU_SEMAPHORE_FOUR_WORDS_SIZE;
pool->semaphorePayloadOffsetInData = offsetof(CUsemaDataFourWords, payload);
break;
case CU_SEMA_TYPE_EXTERNAL_ONE_WORD:
CU_ERROR_PRINT(("External semaphores are not allocated by the CUDA driver so semaphorePoolCreate should not be called.\n"));
CU_ASSERT(0);
break;
}
*ppool = pool;
return CUDA_SUCCESS;
}
CUsema_st
// Data stored with a semaphore handle passed back to the driver
struct CUsema_st
{
// owning page
CUsemaPage* page;
// CPU pointer to semaphore memory
union {
CUsemaDataOneWord *oneWord;
CUsemaDataQmd *qmd;
CUsemaDataFourWords *fourWords;
} data;
// CPU pointer to the payload word within the semaphore data
volatile NvU32 *payload;
// GPU virtual address of the payload
NvU64 offset;
// offset of the payload from the page the semaphore is from
NvU32 offsetFromPage;
// index of this semaphore in the owning pool semaphore memory
NvU32 index;
CUsemaType type;
};
CUsemaPage_st
// Chunk of semaphores in the pool of semaphores
struct CUsemaPage_st
{
// owning pool
CUsemaPool *pool;
// previous and next pointers in pool's page list
CUsemaPage *prev;
CUsemaPage *next;
// backing memory
CUmemobj *memory;
// array of unused indices
NvU32 freeCount;
NvU32 freeIndices[CU_NUM_SEMAPHORES_PER_PAGE];
// array of indices of semaphores whose results have
// been abandoned
// - once the payload of one of these semaphores becomes
// its abandonedPayloads[.] value, it can be marked as free
NvU32 abandonedCount;
NvU32 abandonedIndices[CU_NUM_SEMAPHORES_PER_PAGE];
NvU32 abandonedPayloads[CU_NUM_SEMAPHORES_PER_PAGE];
};
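The abandoned-slot bookkeeping in the fields above can be sketched as follows. This is an illustrative model with hypothetical names, not the driver's implementation: an abandoned index becomes free again once the semaphore's current payload reaches its recorded abandonedPayloads value.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_SLOTS 4

typedef struct {
    uint32_t payloads[MODEL_SLOTS]; /* current payload of each slot's semaphore */
    uint32_t freeCount;
    uint32_t freeIndices[MODEL_SLOTS];
    uint32_t abandonedCount;
    uint32_t abandonedIndices[MODEL_SLOTS];
    uint32_t abandonedPayloads[MODEL_SLOTS];
} PageModel;

/* Move every completed abandoned slot onto the free list; returns how many. */
static uint32_t pageModelReclaim(PageModel *p)
{
    uint32_t reclaimed = 0;
    uint32_t i = 0;
    while (i < p->abandonedCount) {
        uint32_t idx = p->abandonedIndices[i];
        /* wrap-safe "payload has reached the recorded value" check */
        if ((int32_t)(p->payloads[idx] - p->abandonedPayloads[i]) >= 0) {
            p->freeIndices[p->freeCount++] = idx;
            /* compact: overwrite this entry with the last abandoned entry */
            p->abandonedCount--;
            p->abandonedIndices[i] = p->abandonedIndices[p->abandonedCount];
            p->abandonedPayloads[i] = p->abandonedPayloads[p->abandonedCount];
            reclaimed++;
        } else {
            i++;
        }
    }
    return reclaimed;
}
```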
/**
* \brief Allocates a semaphore.
*
* \detailDescription semaphoreAlloc allocates the handle for the semaphore and creates its backing
* from the semaphore pool. The function expects \p pool and \p psema to be valid.
*
* \param[in] pool Semaphore pool from which the semaphore should be allocated
* \param[out] psema Allocated semaphore reference
*
* \return CUresult the result of the function
*
* \retval
* CUDA_SUCCESS
* \retval CUDA_ERROR_OUT_OF_MEMORY
* \retval CUDA_ERROR_INVALID_VALUE
*
* \additionalNotes
* \notesync
* \additionalNotes
* The function returns CUDA_ERROR_OUT_OF_MEMORY if the semaphore pool is fixed-size and the pool's
* freeOrAbandonedCount is 0.
* \additionalNotes
* On any internal failure, the function should free the semaphore's
* structure and memset it to 0.
* \endfn
*/
CUDA_TEST_EXPORT CUresult semaphoreAlloc(CUsemaPool *pool, CUsema **psema);
/**
* \brief Register a semaphore that was not created directly by the CUDA Driver
*
* \detailDescription semaphoreRegisterExternal allocates a semaphore structure
* and registers the semaphore with the device address and host address passed as
* parameter.
*
* \param[out] psema The semaphore
* \param[in] devaddr The device address
* \param[in] hostaddr The host address of the semaphore
*
* \return CUresult the result of the function
*
* \retval
* CUDA_SUCCESS
* \retval CUDA_ERROR_OUT_OF_MEMORY
*
* \additionalNotes
* semaphoreRegisterExternal returns CUDA_ERROR_OUT_OF_MEMORY, if the allocation of sema
* fails.
* \additionalNotes
* semaphoreRegisterExternal sets the sema->page to NULL and sema->offsetFromPage to 0.
* \additionalNotes
* semaphoreRegisterExternal expects \p hostaddr to point to an allocation of at least
* the size of CUsemaDataOneWord.
* \additionalNotes
* The semaphore data is of type CU_SEMA_TYPE_EXTERNAL_ONE_WORD.
*
* \endfn
*/
CUDA_TEST_EXPORT CUresult semaphoreRegisterExternal(CUsema **psema, NvU64 devaddr, void* hostaddr);
/**
* \brief Free a semaphore
*
* \detailDescription semaphoreFree frees the \p sema handle and returns
* the backing semaphore to the pool immediately.
*
* It is the user's responsibility to ensure there are no pending releases to
* this semaphore. It is also the user's responsibility to ensure there are no
* pending acquires on semaphores of type #CU_SEMA_TYPE_ONE_WORD,
* #CU_SEMA_TYPE_FOUR_WORDS or #CU_SEMA_TYPE_CHANNEL_TRACKING.
*
* \param[in] sema Semaphore handle to be freed
*
* \return void
*
* \endfn
*/
CUDA_TEST_EXPORT void semaphoreFree(CUsema *sema);
CUresult
semaphoreAlloc(CUsemaPool *pool, CUsema **psema)
{
CUresult status = CUDA_SUCCESS;
CUsema *sema = NULL;
CU_ASSERT(pool);
CU_ASSERT(psema);
// allocate the driver's handle
sema = (CUsema *)malloc(sizeof(*sema));
if (NULL == sema) {
CU_ERROR_PRINT(("malloc failure in semaphoreAlloc\n"));
status = CUDA_ERROR_OUT_OF_MEMORY;
goto Exit;
}
sema->type = pool->type;
status = semaphorePoolGetBacking(pool, sema);
if (CUDA_SUCCESS != status) {
goto FreeAndExit;
}
*psema = sema;
return CUDA_SUCCESS;
FreeAndExit:
memset(sema, 0, sizeof(*sema));
free(sema);
Exit:
return status;
}
CPU sema
/**
* \brief Release the CPU semaphore
*
* \detailDescription cpuSemaphoreRelease sets the semaphore's payload to the new
* value and broadcasts to all the threads waiting on the condition variable.
*
* \param[in] ctx Context to which the semaphore belongs.
* \param[in] sema Host pointer to the semaphore whose value is released.
* \param[in] payloadToRelease New payload which is written to the semaphore.
*
* \endfn
*/
CUDA_TEST_EXPORT void cpuSemaphoreRelease(CUctx *ctx, volatile NvU32 *sema, NvU32 payloadToRelease);
/**
* \brief Create marker entry for CPU semaphore
*
* \detailDescription This is a helper function that
* creates a marker entry tracking the passed CPU semaphore with the given
* completion value.
*
* \param[in] sema The cpu semaphore to be awaited.
* \param[in] completionValue Value denoting completion.
* \param[out] markerEntry Marker entry to be filled.
* \endfn
*/
CUDA_TEST_EXPORT void cpuSemaphoreCreateMarkerEntry(CUsema *sema, NvU32 completionValue, CUctxMarkerCpuSema *markerEntry);
/**
*
* \brief Acquire the CPU semaphore's completion payload on the specified channel.
*
* \detailDescription
* cpuSemaphoreAcquire() acquires the semaphore represented by \p cpuMarkerEntry into the
* pushbuffer represented by nvCurrent. This function assumes channel, nvCurrent and
* \p cpuMarkerEntry are all valid.
*
* \param[in] channel The channel on which we are going to submit the push buffer
* \param[out] nvCurrent Pushbuffer pointer to which the hardware methods are written
* \param[in] cpuMarkerEntry Marker entry which is acquired
*
* \return CUnvCurrent* the updated pushbuffer pointer after the methods have been
* written
*
* \additionalNotes
* cpuSemaphoreAcquire shall write the necessary H/W methods for acquiring the CPU semaphore into the pushbuffer pointed by nvCurrent.
*
* \additionalNotes
* cpuSemaphoreAcquire should increment the nvCurrent pointer by whatever number of bytes it writes to it.
* \additionalNotes
* The caller should first call channelBlockSubmitUntilCpuSemaAcquire.
* \endfn
*/
CUDA_TEST_EXPORT CUnvCurrent *cpuSemaphoreAcquire(CUnvchannel *channel, CUnvCurrent *nvCurrent, CUctxMarkerCpuSema *cpuMarkerEntry);
/**
* \brief Returns whether the CPU semaphore has been released.
*
* \detailDescription Returns whether the CPU semaphore represented by \p markerEntry
* has been released. This function assumes \p markerEntry is valid.
*
* \param[in] markerEntry Marker entry containing semaphore and value to be checked
*
* \return NvBool whether the semaphore has been released or not
*
* \retval
* NV_TRUE,
* NV_FALSE
*
* \endfn
*/
CUDA_TEST_EXPORT NvBool cpuSemaphoreHasBeenReleased(CUctxMarkerCpuSema *markerEntry);
/**
* \brief Wait till the cpu semaphore has been released
*
* \detailDescription cpuSemaphoreWait blocks until the CPU semaphore is released.
*
* \param[in] markerEntry Marker entry containing semaphore and value to be waited upon
*
* \return the result of the function.
*
* \retval CUDA_SUCCESS
* \retval CUDA_ERROR_OPERATING_SYSTEM
* \retval CUDA_ERROR_NOT_SUPPORTED
*
* \endfn
*/
CUDA_TEST_EXPORT CUresult cpuSemaphoreWait(CUctxMarkerCpuSema *markerEntry);
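A minimal sketch of the CPU semaphore API above, assuming a mutex/condition-variable design as the release/broadcast description suggests. The names are hypothetical and this is not the driver's implementation: release stores the payload and broadcasts, so every waiter wakes up and re-checks its own completion value.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    uint32_t payload;
} CpuSemaModel;

static void cpuSemaModelInit(CpuSemaModel *s)
{
    pthread_mutex_init(&s->mutex, NULL);
    pthread_cond_init(&s->cond, NULL);
    s->payload = 0;
}

static void cpuSemaModelRelease(CpuSemaModel *s, uint32_t payloadToRelease)
{
    pthread_mutex_lock(&s->mutex);
    s->payload = payloadToRelease;
    pthread_cond_broadcast(&s->cond); /* wake all waiting threads */
    pthread_mutex_unlock(&s->mutex);
}

static int cpuSemaModelHasBeenReleased(CpuSemaModel *s, uint32_t completionValue)
{
    pthread_mutex_lock(&s->mutex);
    int released = (int32_t)(s->payload - completionValue) >= 0; /* wrap-safe */
    pthread_mutex_unlock(&s->mutex);
    return released;
}

static void cpuSemaModelWait(CpuSemaModel *s, uint32_t completionValue)
{
    pthread_mutex_lock(&s->mutex);
    while ((int32_t)(s->payload - completionValue) < 0)
        pthread_cond_wait(&s->cond, &s->mutex);
    pthread_mutex_unlock(&s->mutex);
}
```

Broadcasting rather than signaling matters here because multiple waiters may be blocked on different completion values of the same semaphore; each one must re-evaluate its own predicate after a release.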
Core operations
/**
* \brief Abandon a semaphore
*
* \detailDescription semaphoreAbandon() frees the \p sema handle and
* marks the memory backing the semaphore available for reallocation
* when \p payloadWhenFree is released to the semaphore.
*
* It is the user's responsibility to ensure there are no pending releases to
* this semaphore with a value above \p payloadWhenFree. It is also the user's
* responsibility to ensure that all acquires on semaphores
* of type #CU_SEMA_TYPE_ONE_WORD, #CU_SEMA_TYPE_FOUR_WORDS or
* #CU_SEMA_TYPE_CHANNEL_TRACKING happen strictly before
* \p payloadWhenFree is released.
*
* \param[in] sema Semaphore handle to be freed
* \param[in] payloadWhenFree Final payload which will be released to semaphore
*
* \additionalNotes
* semaphoreAbandon takes the mutex on sema->page->pool before doing
* updates on the sema->page.
*
* \endfn
*/
CUDA_TEST_EXPORT void semaphoreAbandon(CUsema *sema, NvU32 payloadWhenFree);
/**
* \brief Refresh the storage for a semaphore
*
* \detailDescription semaphoreRefreshStorage abandons semaphore \p sema
* and gets a new one represented by the same handle.
*
* \param[in] sema Semaphore handle to be freed
* \param[in] payloadWhenFree Final payload which will be released to semaphore before refresh
*
* \return void
*
* \additionalNotes
* semaphoreRefreshStorage returns early if the semaphore's payload matches the payloadWhenFree.
*
* \endfn
*/
CUDA_TEST_EXPORT void semaphoreRefreshStorage(CUsema *sema, NvU32 payloadWhenFree);
/**
* \brief Returns the time of last release of semaphore
*
* \detailDescription
* semaphoreGetTime returns the time of last release of semaphore
* expressed as number of nanoseconds elapsed since some unspecified
* point, which is fixed per context.
*
* \param[in] sema Semaphore queried about time
*
* \return NvU64 Number of nanoseconds elapsed before semaphore release.
*
* \additionalNotes
* The semaphore must be #CU_SEMA_TYPE_FOUR_WORDS with disabled caching.
*
* \endfn
*/
NvU64 semaphoreGetTime(CUsema *sema);
/**
* \brief Set the payload of the semaphore
*
* \detailDescription
* semaphoreSetPayload() sets the payload field of the semaphore.
*
* \param[in] sema Handle of semaphore which will be updated
* \param[in] payload Value with which semaphore is currently released
*
* \additionalNotes
* The semaphore should be from a pool which has the gpu cache enabled.
*
* \endfn
*/
CUDA_TEST_EXPORT void semaphoreSetPayload(CUsema* sema, NvU32 payload);
/**
* \brief Get the payload from the semaphore
*
* \detailDescription semaphoreGetPayload returns the latest released
* value.
*
* \param[in] sema Handle of semaphore from which value is retrieved
* \return NvU32 the semaphore payload
*
* \additionalNotes
* semaphoreGetPayload shall read the payload in a volatile way to prevent the compiler
* from reordering or caching the read.
*
* \endfn
*/
CUDA_TEST_EXPORT NvU32 semaphoreGetPayload(CUsema* sema);
/**
* \brief Get the payload address from the semaphore
*
* \detailDescription semaphoreGetPayloadAddr returns the address of the payload
* of the semaphore structure. This function assumes that sema is valid.
*
* \param[in] sema The semaphore reference on which to operate
* \return volatile NvU32 * address of the semaphore payload
*
* \endfn
*/
CUDA_TEST_EXPORT volatile NvU32 *semaphoreGetPayloadAddr(CUsema* sema);
/**
* \brief Returns the memobj of the semaphore's page
*
* \detailDescription semaphoreGetMemobj returns the memobj handle of the semaphore in the
* semaphore pool memory.
*
* \param[in] sema Handle of semaphore for which memobj is returned
*
* \return CUmemobj the memobj of the semaphore's page
*
* \additionalNotes
* semaphoreGetMemobj returns the memory of the page which contains the semaphore.
*
* \endfn
*/
CUDA_TEST_EXPORT CUmemobj *semaphoreGetMemobj(CUsema *sema);
/**
* \brief Get the context of the semaphore
*
* \detailDescription semaphoreGetCtx returns the context that owns the semaphore.
*
* \param[in] sema Handle of semaphore for which context is returned
*
* \endfn
*/
CUDA_TEST_EXPORT CUctx *semaphoreGetCtx(CUsema *sema);
/**
*
* \brief Get the GPU VA of the semaphore
*
* \detailDescription Returns the "GPU offset" (GPU VA) of the semaphore.
*
* \param[in] sema Handle of semaphore for which GPU VA is returned
*
* \additionalNotes
* Returned address of the semaphore is valid only for its context.
* For other contexts #semaphoreGetOffsetRemote needs to be used.
*
* \endfn
*/
CUDA_TEST_EXPORT NvU64 semaphoreGetOffset(CUsema *sema);
/**
*
* \brief Get the relative address of semaphore within the page.
*
* \detailDescription Returns the offset of the semaphore data in the
* semaphore pool page.
*
* \param[in] sema Handle of semaphore for which relative offset is returned
*
* \return NvU64 the offset of the semaphore's data within its page
*
* \endfn
*/
CUDA_TEST_EXPORT NvU64 semaphoreGetDataOffset(CUsema *sema);
/**
*
* \brief Index of the semaphore in the page.
*
* \detailDescription Returns the index of the semaphore in the page.
*
* \param[in] sema The semaphore reference on which to operate
*
* \return NvU32 the index of the semaphore in the semaphore pool
*
* \endfn
*/
CUDA_TEST_EXPORT NvU32 semaphoreGetIndex(CUsema *sema);
/**
* \brief Return the one-word semaphore data.
*
* \detailDescription Returns the CUsemaDataOneWord pointer. The
* semaphore must be #CU_SEMA_TYPE_ONE_WORD or #CU_SEMA_TYPE_UVM.
*
* \param[in] sema The semaphore handle for which data is returned
*
* \return CUsemaDataOneWord* the reference to CUsemaDataOneWord of the semaphore
*
* \endfn
*/
CUDA_TEST_EXPORT volatile CUsemaDataOneWord *semaphoreGetSemaDataOneWord(CUsema *sema);
/**
* \brief Returns the qmd semaphore data.
*
* \detailDescription Returns the CUsemaDataQmd pointer. The
* semaphore must be #CU_SEMA_TYPE_QMD.
*
* \param[in] sema The semaphore handle for which data is returned
*
* \return CUsemaDataQmd* pointer to the semaphore's QMD data
*
* \endfn
*/
CUDA_TEST_EXPORT volatile CUsemaDataQmd *semaphoreGetSemaDataQmd(CUsema *sema);
/**
* \brief Returns the four word semaphore data type
*
* \detailDescription Returns the CUsemaDataFourWords pointer. The
* semaphore must be #CU_SEMA_TYPE_FOUR_WORDS or #CU_SEMA_TYPE_CPU.
*
* \param[in] sema The semaphore handle for which data is returned
*
* \return CUsemaDataFourWords* pointer to the four-word semaphore data
*
* \additionalNotes
* The function expects that the semaphore's pool gpu cache is disabled.
*
* \endfn
*/
CUDA_TEST_EXPORT volatile CUsemaDataFourWords *semaphoreGetSemaDataFourWords(CUsema *sema);
/**
* \brief Get the GPU VA of a semaphore in other context.
*
* \detailDescription Returns the offset of the semaphore in context
* \p ctx.
*
* \param[in] sema The semaphore handle for which VA is returned
* \param[in] ctx The context for which the address is returned
*
* \return NvU64 the semaphore address in \p ctx.
*
* \endfn
*/
CUDA_TEST_EXPORT NvU64 semaphoreGetOffsetRemote(CUsema *sema, CUctx *ctx);
/**
* \brief Get the CUmemobj that holds the semaphore in another context.
*
* \detailDescription semaphoreGetMemobjRemote returns the shared instance
* of memobj for \p sema in context \p ctx.
*
* \param[in] sema The semaphore handle for which memobj is returned
* \param[in] ctx The context handle of the semaphore
*
* \return CUmemobj the memobj handle which points to shared instance of
* the semaphore
*
* \additionalNotes
* semaphoreGetMemobjRemote expects that the pool has been made portable (see
* #semaphorePoolMakePortable).
*
* \endfn
*/
CUDA_TEST_EXPORT CUmemobj *semaphoreGetMemobjRemote(CUsema *sema, CUctx *ctx);