CUSW_UNIT_CHANNEL implements a SW channel abstraction corresponding to the HW channels used for work submission. The CPU writes GPU methods (GPU commands) into the channel and the GPU reads them from it. The CPU and GPU can be viewed as a producer-consumer pair communicating through the channel.
One channel acts as a conduit of commands to one engine on the GPU (compute, memcpy, etc.). Methods submitted to a channel must target the engine that the channel is associated with; the channel's type reflects this mapping. Multiple channels of the same type are grouped into a channel pool, as described later.
Each channel has a ring buffer known as a pushbuffer. The CPU writes GPU methods and data into the pushbuffer. The pointer at which the CPU writes is called the CPU put pointer (the tail of the pushbuffer). The pointer from which the GPU reads is called the GPU get pointer (the head of the pushbuffer).
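The put/get bookkeeping above can be sketched as a classic single-producer, single-consumer ring buffer. This is an illustrative model only; all names (Pushbuffer, pb_write, etc.) are hypothetical, not the driver's actual types:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PB_SIZE 1024u  /* entries; power of two so wrap-around is a mask */

typedef struct {
    uint32_t methods[PB_SIZE];
    uint32_t put;  /* CPU tail: next slot the CPU writes   */
    uint32_t get;  /* GPU head: next slot the GPU consumes */
} Pushbuffer;

/* Space left before put would catch up with get (one slot is kept free
 * to distinguish a full buffer from an empty one). */
static uint32_t pb_free_space(const Pushbuffer *pb)
{
    return (pb->get + PB_SIZE - pb->put - 1u) & (PB_SIZE - 1u);
}

/* CPU side: write one method word at the put pointer. */
static bool pb_write(Pushbuffer *pb, uint32_t method)
{
    if (pb_free_space(pb) == 0u)
        return false;               /* buffer full: caller must wait for the GPU */
    pb->methods[pb->put] = method;
    pb->put = (pb->put + 1u) & (PB_SIZE - 1u);
    return true;
}

/* GPU side (modeled on the CPU for illustration): consume one method word
 * at the get pointer. */
static bool pb_read(Pushbuffer *pb, uint32_t *method_out)
{
    if (pb->get == pb->put)
        return false;               /* buffer empty: nothing to execute */
    *method_out = pb->methods[pb->get];
    pb->get = (pb->get + 1u) & (PB_SIZE - 1u);
    return true;
}
```

Methods are consumed in the order they were written, which matches the in-order execution guarantee described below.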
GPU methods written to a channel are executed by the GPU in the order in which they were written. Work completion, however, may occur out of order for some types of channels. In addition, methods submitted on different channels may start executing in parallel.
GPU methods and data written to the channel's pushbuffer may be buffered by the driver and sent to the GPU in batches. Sending such buffered contents to the GPU is termed a channel flush. Clients of CUSW_UNIT_CHANNEL must explicitly ask the channel to flush its contents to ensure that the methods and data written to the channel will eventually be seen by the GPU. For instance, when busy-waiting on the CPU for a GPU action to take place (e.g., waiting for the completion of a kernel launch), the channel has to be flushed before starting the wait. Otherwise the GPU may never execute the action and the CPU thread waiting for it will deadlock.
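The flush requirement can be modeled with a simple watermark: the GPU can only ever observe work up to the last flush. A minimal sketch with hypothetical names (ChannelFlushState, channel_flush):

```c
#include <assert.h>
#include <stdint.h>

/* Driver-side view of a channel: methods written by the CPU are only
 * visible to the GPU up to the flushed watermark. */
typedef struct {
    uint64_t written;   /* method words written by the CPU          */
    uint64_t flushed;   /* method words actually handed to the GPU  */
} ChannelFlushState;

static void channel_write(ChannelFlushState *ch, uint64_t nwords)
{
    ch->written += nwords;          /* buffered; the GPU cannot see this yet */
}

static void channel_flush(ChannelFlushState *ch)
{
    ch->flushed = ch->written;      /* make everything visible to the GPU */
}

/* Upper bound on the work the GPU can possibly execute. */
static uint64_t channel_gpu_visible(const ChannelFlushState *ch)
{
    return ch->flushed;
}
```

Busy-waiting for work beyond channel_gpu_visible() to complete can never succeed, which is exactly the deadlock described above.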
The channel interface provides functions for the following operations:
GPU work submission to a channel.
Get a CPU pointer to the channel’s pushbuffer.
CPU can then write methods into the pushbuffer.
Notify the channel that the CPU is done writing to the pushbuffer.
Marker query and wait.
Query a marker representing the last work submitted to the channel.
Acquire a marker in the channel.
This inserts an asynchronous wait in the channel for the given marker to complete. This is the basic mechanism in the driver to establish dependencies between GPU work. It is used for maintaining stream order and also for maintaining user-submitted dependencies across streams.
Add memory objects to the channel’s memory tracking list.
This is required for some OS device models (notably WDDM) to ensure that the memory objects are resident in the GPU memory when the GPU needs them.
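The marker query and acquire operations above can be sketched as follows. A marker is modeled as a (channel, tracking-semaphore value) pair; all names are hypothetical stand-ins for the driver's actual types:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A marker identifies a point in a channel's work stream: the tracking-
 * semaphore value released after the last submitted work. Acquiring a
 * marker in another channel conceptually pushes a semaphore-acquire
 * method there, so that channel's later work waits for this value to
 * appear in the owning channel's semaphore memory. */
typedef struct {
    int      channelId;
    uint64_t semValue;
} Marker;

typedef struct {
    int      id;
    uint64_t semLastReleased;  /* value released after last submitted work */
    uint64_t semCompleted;     /* highest value the GPU has completed      */
} Channel;

/* Query a marker representing the last work submitted to the channel. */
static Marker channel_query_marker(const Channel *ch)
{
    Marker m = { ch->id, ch->semLastReleased };
    return m;
}

/* The condition a semaphore-acquire waits on: has the work identified
 * by the marker completed on its owning channel? */
static bool marker_is_complete(const Marker *m, const Channel *owner)
{
    return owner->semCompleted >= m->semValue;
}
```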
/**
* \defgroup CHANNEL The Channel Unit
* @{
*
*/
/**
* \brief Type of channel (determines which engines are available)
*
 * Note that the channelIsAsyncMemcpy() helper assumes that !compute
* implies one of the async memcpy types.
*/
typedef enum CUchannelType_enum {
//! Channel type Compute
CU_CHANNEL_TYPE_COMPUTE = 0,
//! Channel Async memcpy 0
CU_CHANNEL_TYPE_ASYNC_MEMCPY_0 = 1,
//! Channel Async memcpy 1
CU_CHANNEL_TYPE_ASYNC_MEMCPY_1 = 2,
//! Channel Async memcpy 2
CU_CHANNEL_TYPE_ASYNC_MEMCPY_2 = 3,
//! Channel Async memcpy 3
CU_CHANNEL_TYPE_ASYNC_MEMCPY_3 = 4,
//! Channel Async memcpy 4
CU_CHANNEL_TYPE_ASYNC_MEMCPY_4 = 5,
//! Channel Async memcpy 5
CU_CHANNEL_TYPE_ASYNC_MEMCPY_5 = 6,
//! Channel Async memcpy 6
CU_CHANNEL_TYPE_ASYNC_MEMCPY_6 = 7,
//! Channel Async memcpy 7
CU_CHANNEL_TYPE_ASYNC_MEMCPY_7 = 8,
//! Channel Async memcpy 8
CU_CHANNEL_TYPE_ASYNC_MEMCPY_8 = 9,
//! Channel Async memcpy 9
CU_CHANNEL_TYPE_ASYNC_MEMCPY_9 = 10,
} CUchannelType;
/**
* \brief Type of an engine
 * This enum identifies the engine type to which the associated channel sends commands.
*/
typedef enum
{
//! Compute Engine (kernel launches are pushed on this engine)
CU_ENGINE_COMPUTE = 0,
//! TWOD Engine (This is defunct and only left in for tools compatibility)
CU_ENGINE_TWOD = 1,
//! mem2mem engine
CU_ENGINE_MEM2MEM = 2,
//! Async memcpy Engine (memcpy from host to device and device to host on this engine)
CU_ENGINE_ASYNC_MEMCPY = 3,
//! Type of engine count
CU_ENGINE_MAX = 4,
} CUengineType;
/**
* \brief Channel to Use.
*
* Identifies the type of channel to select
*/
typedef enum CUchannelUse_enum
{
//! Compute Channel
CU_CHANNEL_COMPUTE = 0,
//! Async memcpy from Host to Device
CU_CHANNEL_ASYNC_MEMCPY_HTOD = 1,
//! Async memcpy from Device to Host
CU_CHANNEL_ASYNC_MEMCPY_DTOH = 2,
//! Async memcpy pipelined block linear
CU_CHANNEL_ASYNC_MEMCPY_PIPELINED_BLOCKLINEAR = 3,
//! Async Memcpy Non Pipelined Block Linear
CU_CHANNEL_ASYNC_MEMCPY_NONPIPELINED_BLOCKLINEAR = 4,
//! Async memcpy Peer to Peer
CU_CHANNEL_ASYNC_P2P = 5,
//! Host Ops Channel for Mobile Platforms
CU_CHANNEL_HOST_OPS = 6,
//! Async channel for MPS
CU_CHANNEL_ASYNC_MPS_RESERVED = 7,
//! Count of the Channel Use Enum
CU_CHANNEL_USE_COUNT = 8
} CUchannelUse;
//! Maximum number of compute channels supported
#define CU_CHANNEL_MAX_COMPUTE 32
//! Maximum number of channels per async engine supported by the driver
#define CU_CHANNEL_MAX_ASYNC_MEMCPY_PER_ENGINE 64
//! Maximum number of async engines supported by the driver
#define CU_ASYNC_MEMCPY_MAX_ENGINES (CU_MAX_COPY_ENGINES - 1)
/**
* \brief Identifies the type of channel push operation
*/
typedef enum CUchannelPush_enum
{
//! Default push operation
CU_CHANNEL_PUSH_NONE,
//! push at the end
CU_CHANNEL_PUSH_END,
//! push and then flush to the gpu
CU_CHANNEL_PUSH_FLUSH,
//! push and sync on this operation
CU_CHANNEL_PUSH_SYNC
} CUchannelPush;
typedef enum CUIP2PDirection_enum {
CUI_P2P_DIRECTION_NONE = 0,
CUI_P2P_DIRECTION_PULL = 1,
CUI_P2P_DIRECTION_PUSH = 2
} CUIP2Pdirection;
Static design
This section describes static aspects of the CUSW_UNIT_CHANNEL unit’s architecture.
Overview of software element architecture


Channel Manager
Channel Manager maintains all channel state for a CUDA context. It maintains one or more pools of channels for each engine on the GPU.
Channel manager interface is primarily used to get a channel to send commands to a particular engine on the GPU. The channel manager returns an appropriate channel handle based on the following inputs:
An enum specifying the intended use for the channel. This can be one of the following:
Compute (launching kernels)
Asynchronous memcpy (host-to-device or device-to-host)
GPU host commands
Peer-to-peer memory transfer
Memory copy for MPS [TODO: Explain MPS somewhere and add reference here]
An optional boolean indicating a preference for the least-recently-used (LRU) channel. If true, the LRU channel is returned from the channel pool selected based on the channel use.
It is also responsible for selecting a channel to use for a stream push. To find the optimum channel for a stream push, the channel manager follows this heuristic:
Grab the least recently used (LRU) channel from the channel pool.
Use this channel for the first push in a stream.
For subsequent pushes, get the last channel used by this stream (stream query).
If this stream was the last stream to use it (channel query), then use this channel.
Otherwise, use the LRU channel.
This heuristic ensures that as long as the number of streams is no greater than the number of channels, each stream keeps using its own channel, preventing false dependencies between streams. When there are more streams than channels, it distributes work evenly across all channels, minimizing false dependencies.
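The selection heuristic above can be sketched as follows, with hypothetical Stream and Chan types standing in for the driver's stream and channel structures:

```c
#include <assert.h>
#include <stddef.h>

typedef struct Stream Stream;
typedef struct Chan   Chan;

struct Chan   { Stream *lastStream;  };  /* channel query: last stream to push on it */
struct Stream { Chan   *lastChannel; };  /* stream query: last channel this stream used */

/* lru is the least-recently-used channel from the pool. */
static Chan *select_channel(Stream *s, Chan *lru)
{
    Chan *c = s->lastChannel;
    /* First push in the stream: no previous channel, take the LRU one. */
    if (c == NULL)
        return lru;
    /* Reuse the previous channel only if this stream was also the last
     * stream to push on it; otherwise another stream has interleaved
     * work there, and reusing it would create a false dependency. */
    if (c->lastStream == s)
        return c;
    return lru;
}
```

With at most as many streams as channels, each stream converges on its own channel; with more streams than channels, the LRU fallback spreads pushes evenly.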
The channel manager also creates and maintains a handle to a deferred procedure call (DPC) manager. This can be used to schedule CPU work in the future after some GPU work submitted on the channel is complete. This functionality is implemented in CUSW_UNIT_DPC.
Other channel manager functionality includes the following:
Channel manager creation and destruction.
Error state management:
  Set the async error.
  Get the async error.
  Clear the error state.
  Attempt a blocking wait (described in the implementation details below).
Flush all channels managed by the channel manager.
Channel manager queries:
  Get the DPC manager handle.
  Get the pending work marker.
  Query if the channel manager has async memcpy channels.
Compute preemption control:
  Set the compute preemption mode.
  Trigger preemption of the group of compute channels.
Remove the active QMD list.
Wake up the DPC ISR routine.
Channel Pool
The GPU consumes work from a channel in the order in which the work is submitted. This can introduce unnecessary ordering between two chunks of work which otherwise do not depend on each other. In some cases this dependency can be removed by submitting chunks of work in two or more channels. Groups of channels thus used together are managed as part of a channel pool.
Some hardware state for GPU engine associated with a channel is maintained as part of the channel context in hardware. This state is switched when switching between the channels. Some GPUs support Time-Slice Groups of channels (TSG). Channels in a TSG share the hardware channel context and state. This allows faster scheduling switches between channels in a TSG as compared to switches between completely independent channels. When TSG support is available one channel pool is mapped to one TSG.
(The only function exported from the channel pool seems to be channelPoolGetLruIdleOrLruChannel, which is used in marker.c to select a preferably idle LRU channel. This seems to be an aberration. We should probably change this to use the channel manager function to get such a channel.)
Channel Tracking Semaphore
Each channel has a tracking semaphore: a running atomic counter that is used to release (signal) and acquire (wait for) positions in a channel. Methods that release the semaphore cause the GPU to write the semaphore value to semaphore memory, while methods that acquire the semaphore cause the GPU to wait until the given value appears in semaphore memory.
Semaphores thus provide a way to synchronize execution between channels or to wait for work completion.
Tracking semaphores go through the following stages:
The semaphore value is submitted to the channel but not yet submitted to the GPU.
The semaphore value is submitted to the GPU but not yet consumed by the GPU.
The semaphore value was consumed and completed by the GPU.
The channel tracking semaphore interface implements the following functionality:
Initialize a tracking semaphore.
Atomically increment the semaphore value to get the next value to submit.
Query tracking semaphore values:
  Last submitted in the channel.
  Last submitted to the GPU.
  Last completed by the GPU.
Query completion status for a semaphore.
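The three stages and the watermark queries above can be sketched with three monotonically increasing counters; the names here are illustrative (the driver's actual structure is CUtrackingSemaData_st, shown later):

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t lastIssuedByCpu;  /* submitted to the channel       */
    uint64_t lastIssuedToGpu;  /* flushed (sent) to the GPU      */
    uint64_t finishedByGpu;    /* completed by the GPU           */
} TrackingSem;

typedef enum {
    SEM_VALUE_PENDING_FLUSH,   /* in the channel, not yet sent to the GPU */
    SEM_VALUE_PENDING_GPU,     /* sent to the GPU, not yet completed      */
    SEM_VALUE_COMPLETE         /* completed by the GPU                    */
} SemStage;

/* Next value to release; atomically incremented in the real driver. */
static uint64_t sem_next_value(TrackingSem *s)
{
    return ++s->lastIssuedByCpu;
}

/* Classify a value against the three watermarks. */
static SemStage sem_stage(const TrackingSem *s, uint64_t value)
{
    if (value <= s->finishedByGpu)
        return SEM_VALUE_COMPLETE;
    if (value <= s->lastIssuedToGpu)
        return SEM_VALUE_PENDING_GPU;
    return SEM_VALUE_PENDING_FLUSH;
}
```

The invariant finishedByGpu <= lastIssuedToGpu <= lastIssuedByCpu holds throughout; the channel is idle exactly when all three are equal.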
Unit Internals
GPU work submission
On some operating systems the driver submits work to the GPU from user mode. On other operating systems the driver needs to call a kernel mode API (e.g. ioctl) to submit the pushbuffer.
Calling a kernel-mode API for work submission has additional overhead compared to submitting work directly from the user-mode driver. It is done to adhere to underlying platform or OS requirements:
Some operating systems, like Android, do more aggressive power management. Such systems might need to perform power management operations at the time work is submitted to GPU engines. Calling a kernel mode API allows the kernel mode driver to do such operations.
On some operating systems (like QNX, HOS), the GPU driver runs as a separate process and all communication with it happens via IPC. Due to process protection, the CUDA driver cannot write to GPU registers from user mode for work submission. The CUDA driver therefore sends a signal to the OS driver process, which performs the work submission on the CUDA driver's behalf.
(This should refer to the cases above, where different platform drivers are invoked.)
Blocking Sync
In the CUDA driver, the process of waiting on the CPU for GPU work completion by means of an interrupt notification is known as blocking sync. When blocking sync is used, the CPU thread waiting for GPU work completion does not poll and busy-wait, thus avoiding consumption of CPU cycles.
The GPU semaphore release (signal) can optionally generate an interrupt. The channel manager enables these interrupts when it needs to wait on CPU for pending GPU work.
Synchronization Primitives
Some engines on the Tegra SOC use syncpoints instead of tracking semaphores. A GPU can synchronize with such engines using syncpoints. [TODO: This probably belongs to the sync unit.]
Open Items
The engine context associated with the channel. The context has state associated with it, which might be altered based on commands sent to the engine.
We might need to talk about sub-contexts on Volta as well and how TSG, channels and sub-contexts are associated. Not sure if this affects the driver arch/design choices.
A channel pool has an associated GPU VA space. All channels in a TSG (== pool) share the same GPU VA space. Different channels (pools) can potentially use a different GPU VA space, but CUDA driver does not set it up in such a way since VA is associated with the CUDA context. This probably belongs to the overall arch or the memory manager unit arch where we would talk about how VA is managed.
GPU scheduling is not explained anywhere. The CUDA programming model does not allow setting the channel time-slice. It is up to the underlying platform to define this (i.e., keep it fixed or configurable). The GPU HW manages context switches transparently to SW. Not sure if this needs to be described as part of the channel unit internals.

Code
CUchannelManager_st
/**
*
* \brief channel manager. manages all context-wide channel state (e.g., for
* synchronization and maintaining pushing state)
*/
struct CUchannelManager_st
{
//! Channel Manager's Context
CUctx *ctx;
//! A lock taken when examining or updating any channel's blocking-sync state
CUImutex blockingSyncMutex;
//! A lock taken when
//! - updating any channel's tracking semaphore value
//! - a QMD's channel launch index or presence in the active list
CUImutex markerMutex;
//! the channels owned by this manager
CUnvchannel *channels[CU_CHANNEL_COUNT_MAX];
//! Flag to check whether BlockingWait uses NvRmSync library.
NvBool usesNvRmSyncForBlockingWait;
//! Flag to check whether blocking sync uses syncpoint completion event.
NvBool blockingSyncUseSyncpointAwaken;
//! the cross-ctx semaphores owned by this manager
//! used to find the least-recently-used cross-ctx sema
NvU64 crossCtxSemaUseCounter;
//! array of cross context semaphores
CUcrossCtxSema *crossCtxSemas[CU_CROSS_CTX_SEMA_COUNT];
//! pools of channels for selection
//! compute channel pool
CUchannelPool *computePool;
//! array of pool of async copy engine
CUchannelPool *asyncCEPool[CU_MAX_COPY_ENGINES];
//! this will have a single compute channel which will be used for acquiring
//! semaphores of all other channels when doing blocking sync on mobile
CUchannelPool *hostOpsPool;
//! This pool will have a single CE channel for MPS client to
//! use. One primary use of this pool is for the client to
//! overwrite function memory to cleanly terminate the context when
//! an assertion is hit.
CUchannelPool *asyncMpsClientReserve;
//! async CE channel pools for each use.
//
// There can only be one CE per direction for device <-> host
// transfers, but there can be multiple CEs for P2P transfers
// (Pascal+) for, say, 3- or 4-way SLI with NVLINK.
//
//! Async CE pool for Device to host transfer
CUchannelPool *asyncDtoHPool;
//! Async CE pool for Host to Device transfer
CUchannelPool *asyncHtoDPool;
//! Channel Pool of Async CE Nvlink Pool
CUchannelPool *asyncCENVlinkPool[CU_MAX_COPY_ENGINES];
//! Async CE Nvlink pool size
NvU32 asyncCENVlinkPoolSize;
//! pools for devices with special needs for CE Copy
CUchannelPool *asyncCENonpipelinedBlocklinearPool;
CUchannelPool *asyncCEPipelinedBlocklinearPool;
//! the channel count.
NvU32 channelCount;
//! the queued dependencies graph
/**< see //sw/gpgpu/doc/wddm-queue/wddm-queue.pdf for details*/
CUchannelGraph *queuedDepGraph;
//! the manager for DPCs waiting on markers from this set of channels
CUIdpcManager *dpcManager;
//! isr event to push along DPCs
cuosEvent isrEvent;
//! service routine used to push along DPCs
CUintHandlerServiceRoutine *isr;
//! state of the current push that we are doing
struct
{
NvU32 active; //< set to 1 when we are between beginPush and endPush
CUnvchannel *channel; //< the channel that the current push is on
} currentPush;
//! the root of the list that holds Qmds which have not been acquired
CUqmd *activeQmdList;
//! the timeout for blocking sync (would be a constant, but we need to
//! tweak it for testing that interrupts don't vanish a la bug 783927)
NvU32 blockingSyncTimeoutMsec;
//! asyncError is read/written atomically with channelManagerGetAsyncError()
//! and channelManagerSetAsyncError().
CUresult asyncError;
//! a sticky error set whenever a void function fails, plus a scratch buffer
//! where methods that will never be executed may be written
CUnvCurrent asyncErrorBuffer[CU_PUSHBUF_MAX_PUSH_SIZE_DEFAULT/sizeof(CUnvCurrent)];
};
CUchannelPool_st
/**
* \brief Channel Pool Device Mode Abstraction Layer
*
* These are the different interfaces which will operate on the channel pool.
* These are function pointers which get assigned to dmal functions depending on the DMAL
* type.
*
*/
struct CUchannelPoolDMAL_st
{
//! Get the flush unit count in the channel pool
NvU32 (*GetFlushUnitCount)(CUchannelPool *channelPool);
//! the channel schedule
CUresult (*ChannelSchedule)(CUchannelPool *channelPool);
//! enable channel in the channel pool
CUresult (*ChannelEnable)(CUchannelPool *channelPool, NvBool enable);
//! Set preemption mode to the channel pool
CUresult (*SetPreemptionMode)(CUchannelPool *channelPool, CUcomputePreemptionMode mode);
//! preempt the channel
CUresult (*Preempt)(CUchannelPool *channelPool, NvBool wait);
//! Set the error notifier for the channel pool
CUresult (*SetErrorNotifier)(CUchannelPool *channelPool);
//! Set TSG timeslice to the channel pool
CUresult (*SetTSGTimeslice)(CUchannelPool *channelPool, NvU32 timeslice);
// GPU events methods
//! gpu Event Create
CUresult (*GpuEventCreate)(CUgpuEvent *gpuEvent);
//! gpu Event Clear
CUresult (*GpuEventClear)(CUgpuEvent *gpuEvent);
//! gpu Event Destroy
CUresult (*GpuEventDestroy)(CUgpuEvent *gpuEvent);
//! change the L2 sector promotion policy for this channel pool
//! this applies to compute channels only and is also a hint, so the setting can be discarded
CUresult (*SetL2SectorPromotion)(CUchannelPool *channelPool, CUl2SectorPromotion l2SectorPromotion);
};
/**
* \brief Channel Pool
* \detailDescription channel Pool
*
*/
struct CUchannelPool_st
{
//! the manager owning this pool
CUchannelManager *channelManager;
//! the array of channels in the pool
CUnvchannel *channelArray;
//! Channel that is bound to the same PBDMA across all contexts for a device
CUnvchannel *singleIssueChannel;
//! the number of channels in the pool that have been initialized
NvU32 channelCount;
//! the size of channelArray & the maximum number of channels in the pool
NvU32 maxChannelCount;
//! the type of channels this channel pool contains
CUchannelType channelType;
//! the type of pushes this channel pool handles
CUchannelUse channelUse;
//! channel pool DMAL methods
CUchannelPoolDMAL dmal;
//! the array of channel flush units in the pool
CUchannelFlushUnit *channelFlushUnitArray;
//! the size of channelFlushUnitArray & the maximum number of channel flush units in the pool
NvU32 maxChannelFlushUnitCount;
//! the number of initialized channel flush units in the pool that are responsible for at least one channel
NvU32 channelFlushUnitCount;
//! The gpuEvent that shared across channels of the same engine object
CUgpuEvent *gpuEvent;
//! Counter for how many times the gpu event was signaled, used by unit-tests
NvU64 gpuEventSignalCount;
//! Interrupt handler service routine
CUintHandlerServiceRoutine *isr;
//! If blockingSync binding with this gpuEvent should be enabled
NvBool blockingSyncEnabled;
//! If blockingSync binding with this gpuEvent uses monitored fence
NvBool blockingSyncUseMonitoredFence;
//! If blockingSync binding with this gpuEvent uses interrupt
NvBool blockingSyncUseInterrupt;
//! If blockingSync binding with this gpuEvent uses non-stalling interrupt
NvBool blockingSyncUseNonStallingInterrupt;
//! If blockingSync binding with this gpuEvent uses syncpoint awaken
NvBool blockingSyncUseSyncpointAwaken;
// The last peer this channel pool targeted
// We assume a channel pool to be homogeneous and always target one CE
// Used to avoid pushing P2P copies targeting different peers to the same CE
//! The last passive device this channel pool targeted.
CUdev* lastPassiveDevice;
//! The last active device this channel pool targeted.
CUdev* lastActiveDevice;
};
CUnvchannel_st
/*
* \brief channel common structure
*
* \detailDescription This structure captures the state of the channel, type
* and engine to use
*/
struct CUnvchannel_st
{
//! the manager owning this channel
CUchannelManager *channelManager;
//! mutex taken whenever operating on this channel
CUImutex pushMutex;
//! the index of this channel in the channel manager array
NvU32 index;
//! the channel type (determines which engines are available)
CUchannelType type;
//! flush work down to the GPU automatically at the end of every call to streamEndPush?
/**< true by default, set to false by WDDM */
NvBool alwaysFlush;
//! if unset, locking this channel should take channelManager->channel[0]'s lock
// instead of its own.
NvBool usePerChannelLock;
//! This channel is forced to issue work via PBDMA0 on a dual PBDMA gpu.
NvU32 pbdmaIndex;
NvBool nextPushMustAcquireTrackingSem;
//! GPFIFO and pushbuffer structure
CUgpfifo *gpfifo;
//! handles to the engine objects on this channel. some driver models (WDDM) don't
// use these handles, so they remain 0 on those DMs
NvU32 engines[CU_ENGINE_MAX];
//! These are the engine object IDs that will be used for SET_OBJECT pushbuffer methods
//! on DMs that use them. If the DM doesn't need to push that method, these will be 0.
NvU32 engineIDs[CU_ENGINE_MAX];
//! HW channel ID
NvU32 chID;
//! this is set on DMs which don't do any memory tracking. when set, we
//! skip the iteration through the memory tracking lists usually performed in streamEndPush
NvBool skipMemTracking;
//! the last tracking semaphore value that we are guaranteed to have previously
//! acquired on the other channels. we track this to avoid pushing additional
//! acquire/release methods
NvU64 trackSemValAcquiredOnOtherChannel[CU_CHANNEL_COUNT_MAX];
//! the last tracking semaphore value when we updated ECC
//! - only updated while holding CUctx::errorCheckMutex
NvU64 trackSemValLastEccCheck;
//! stream tracking data for this channel
CUchannelStreamData streamData;
//! data related to blocking-sync on this channel
CUchannelBlockingSyncData blockingSync;
//! data related to this channel's tracking semaphore
CUtrackingSemaData trackingSemaphoreData;
//! Semaphore used for channel cross-engine synchronization
CUsema *crossEngineSyncSemaphore;
//! For channel cross-engine synchronization, the last semaphore value for
//! which the CPU has issued by pushing to the compute engine
NvU64 crossEngineSyncLastValueIssued;
//! a list of tasks that need to have their submit time set when this channel is flushed
CUtask *queuedTaskList;
//! channel pool holding this channel
CUchannelPool *pool;
//! channel flush unit linked with this channel
CUchannelFlushUnit *channelFlushUnit;
//! next channel in the channel flush unit
CUnvchannel *flushUnitNext;
//! previous channel in the channel flush unit
CUnvchannel *flushUnitPrev;
// driver model specific structures and functions
struct
{
CUnvchannelAMOD *amod;
CUnvchannelRM *rm;
CUnvchannelMRM *mrm;
CUnvchannelWDDM *wddm;
CUnvchannelWDDMHWSCHED *wddm_hwsched;
CUnvchannelMPS *mps;
} dm;
CUchannelDMAL dmal;
CUmemTrackList currentMemlist;
struct
{
CUqmd *qmd;
} currentPush;
//! The maximum amount of pushbuffer queueing allowed on this channel
NvU32 maxQueueLength;
//! Set if channelEndPushInternal was most recently called
//! with "do not release tracking semaphore" specified
NvBool pendingReleaseTrackingSema : 1;
//! Channels have to be registered with UVM when UVM owns CUDA's GPU VA space.
//! When channels are allocated, RM allocates a set of buffers associated with
//! that channel's context that need to be mapped into CUDA's GPU VA space.
//! If RM owns the GPU VA space, the buffers get mapped at channel alloc time itself.
//! If UVM owns the GPU VA space we have to explicitly request UVM to map it.
NvBool registeredWithUvm;
//! Virtual address reserved for mapping channel's context buffers
NvU64 ctxBufVa;
//! Total size of channel's context buffers
NvU64 ctxBufSize;
//! If true, DMAL should ask for a return sync fence from RM while submitting the PB.
//! Currently used by MRM.
NvBool nextPushShouldAskForRMSyncFence;
//! Next push must specify this NvRmSync as the wait fence.
void *waitNvRmSync;
//! Syncpoint associated with the channel. Should use only if hasValidSyncpoint is true.
//! syncpointInfo packages the syncpoint ID and value in the lower and upper halves of the
// 64-bit integer respectively so that it can be read and written atomically.
NvU64 syncpointInfo;
NvU64 syncpointGpuVa;
NvBool hasValidSyncpoint;
// Set to true if we have allocated the sync point and managing the threshold ourselves.
NvBool hasUserManagedSyncpoint;
// Force each submission to imply work completion. I.e the tracking semaphore value associated
// with this submission can't be released until all work associated with this submission
// has been completed
NvBool forceWorkCompletion;
//! Set when a channel has not submitted any work since its creation
NvU32 inactive;
};
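As the comments on syncpointInfo above note, the syncpoint ID and value are packed into the lower and upper halves of one 64-bit word so that both can be read or written with a single atomic access. A sketch of that packing, with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Pack the syncpoint ID into the lower 32 bits and the value into the
 * upper 32 bits, mirroring CUnvchannel_st::syncpointInfo. */
static uint64_t syncpoint_pack(uint32_t id, uint32_t value)
{
    return ((uint64_t)value << 32) | id;
}

static uint32_t syncpoint_id(uint64_t info)    { return (uint32_t)info; }
static uint32_t syncpoint_value(uint64_t info) { return (uint32_t)(info >> 32); }
```

A single 64-bit atomic load then yields a consistent (ID, value) pair without any lock.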
Helper structures
CUcrossCtxSema_st
/**
* \brief Cross Context Semaphore
*
* \detailDescription This semaphore is used for synchronization across
* contexts.
*/
struct CUcrossCtxSema_st
{
//! the owning manager
CUchannelManager *channelManager;
//! the index of this in the channel manager array
NvU32 index;
//! last-use counter to create a LRU queue
NvU64 lastUseCounter;
//! data related to this tracking semaphore
CUtrackingSemaData trackingSemaphoreData;
};
CUchannelInitParams_st
/*
* \brief channel creation parameters
*
* This structure describes the different attributes of the channel and the channel
* internal structure is created based on these values.
*/
struct CUchannelInitParams_st
{
//! owning channel manager
CUchannelManager *channelManager;
//! owning channel pool
CUchannelPool *channelPool;
//! type of channel
CUchannelType channelType;
//! flush unit that this channel linked with
CUchannelFlushUnit *channelFlushUnit;
//! Force this channel to be bound to PBDMA0 on a dual PBDMA gpu, ignore otherwise
NvU32 pbdmaIndex;
};
CUchannelStreamData_st
/*
* channel stream data
* TODO: move this to be tracked with the stream manager
*/
struct CUchannelStreamData_st
{
//! The value of beginPushCount at the last time work was submitted on this channel
NvU64 beginPushCountOfLastWork;
//! The value of beginPushCount at the last time that the NULL
// stream was acquired on this channel
NvU64 beginPushCountOfLastNullStreamAcquire;
//! The value of beginPushCount at the last time that the barrier
// stream was acquired on this channel
NvU64 beginPushCountOfLastBarrierStreamAcquire;
//! the last stream to submit work on this channel
CUIstream *streamOfLastWork;
};
CUtrackingSemaData_st
/**
*
* \brief channel tracking semaphore structure
*
* \detailDescription
* - this semaphore tracks the consumption of methods by the front end.
 * it is extended to a 64-bit value to simplify < and > comparisons.
* - this semaphore can be used to track the completion of launches by
* tracking the consumption of WaitForIdle method
* following Launch methods (in a CUctxMarker)
*/
struct CUtrackingSemaData_st
{
//! the channel manager this was created against
CUchannelManager *channelManager;
//! the last semaphore value which the CPU has issued by pushing
//! Updated atomically under the push lock so that it can be read
//! atomically w/o holding the push lock.
//! This really should be accessed only under the push mutex, but it is
//! accessed speculatively all over the place, e.g. in
//! streamManagerSelectChannel().
NvU64 valueLastIssuedByCpu;
//! the last semaphore value which has been sent down to the GPU
//! if queuing is disabled, then this will always equal
//! valueLastIssuedByCpu. it is the responsibility of the channel
//! DM to update this value
//! Updated atomically under the push lock so that it can be read
//! atomically w/o holding the push lock when checking whether a particular
//! value has been flushed.
NvU64 valueLastIssuedToGpu;
//! the last semaphore value which the GPU has completed executing
//! - this is to be updated only through channelUpdateTrackingSemaphore
//! - the channel is idle only when valueLastIssuedByCpu is equal to
//! valueFinishedByGpu
//! Updated and read atomically (see channelUpdateTrackingSemaphore)
//!
//! The value tracked here is 64bit but hw semaphores are only 32bit.
//! The bottom 32bits of this value track the last value that was seen
//! completed by the hardware and channelUpdateTrackingSemaphore() detects
//! when the hw semaphore overflows. This relies on
//! channelUpdateTrackingSemaphore() being called more often than the
//! overflow of the hw semaphore happens.
NvU64 valueFinishedByGpu;
//! the semaphore which we push the semaphore release methods into. we
//! push the lower 32-bits of valueLastIssuedByCpu
CUsema* semaphore;
//! Pointer to the mutex protecting the semaphore, used to assert that the
//! right locks are held.
CUImutex *protectedByMutex;
//! Indicates if we need to issue a non-stalling interrupt after the release of this semaphore
NvBool mustInterrupt;
};
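The comments on valueFinishedByGpu above describe extending the 32-bit hardware semaphore into a 64-bit counter by detecting wrap-around. A sketch of that extension logic, assuming (as the comment states) that it runs more often than the 32-bit value can wrap; the helper name is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Extend a fresh 32-bit hardware semaphore reading into the running
 * 64-bit counter. current64 is the last extended value we computed. */
static uint64_t sem_extend_to_64(uint64_t current64, uint32_t hw32)
{
    uint32_t low  = (uint32_t)current64;
    uint64_t high = current64 >> 32;
    /* If the new hardware value is numerically below the last one we
     * saw, the 32-bit counter has wrapped; bump the upper half. */
    if (hw32 < low)
        high++;
    return (high << 32) | hw32;
}
```

This only stays correct if the update is called at least once per 2^32 semaphore releases, which is the requirement the structure's comment spells out.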
CUchannelBlockingSyncData_st
/*
* \brief channel blocking-sync structure
*
 * This structure is accessed if blocking sync is enabled on the channel.
*
* \additionalNotes
* Accessed from CU_SW_SYNC_UNIT(marker)
*
*
*/
struct CUchannelBlockingSyncData_st
{
//! Is blocking-sync enabled on this channel? If not, the rest
//! of the variables are never referenced.
NvBool enabled;
//! Does blocking-sync use interrupts (as compared with WDDM's KMD events)?
NvBool useInterrupt;
//! Does blocking-sync use non-stalling interrupts (implies useInterrupt)
NvBool useNonStallingInterrupt;
//! Does blocking-sync use monitored fence (the default for WDDM HW Scheduling)
NvBool useMonitoredFence;
//! Does blocking-sync use syncpoint awaken.
NvBool useSyncpointAwaken;
//! Should we push an awaken after the next tracking semaphore release?
NvBool unflushedAwaken;
//! A monotonically-increasing index of the awakens which we have done
NvU64 awakenLastIssuedByCpu;
//! The last of those awakens which the GPU has finished
NvU64 awakenLastFinishedByGpu;
//! Cached value of channelBlockingSyncAreAwakensRunningLow() that can be
//! read w/o holding the channelManager->blockingSyncMutex.
//! Should be always written with atomic ops under the blockingSyncMutex
//! lock.
NvU32 awakensRunningLow;
//! The channel tracking semaphore value released just before the last
//! CU_CHANNEL_AWAKENS_MAX awakens were pushed
//! - consider an awaken with index awakenIndex.
//! if it is the case that
//! awakenLastFinishedByGpu <= awakenIndex < awakenLastIssuedByCpu
//! then awakenTrackSemVal[awakenIndex % CU_CHANNEL_AWAKENS_MAX] gives
//! the tracking semaphore value immediately *before* the awaken was pushed
//! - to keep consistency, we allow at most
//! CU_CHANNEL_AWAKENS_MAX - CU_CHANNEL_AWAKENS_BUFFER awakens
//! to be in-flight at once. should we try to push too many,
//! we will block on the oldest
NvU64 awakenTrackSemVal[CU_CHANNEL_AWAKENS_MAX];
//! Condition variable notified when blocking sync is updated
cuosCV progressCV;
};
CUchannelDMAL_st
These are hardware operations.
/*
* \brief channel driver model abstraction layer
*
 * \detailDescription These are the interfaces of the functions that should be defined in all the DMAL layers.
*
* \additionalNotes
 * Each of these functions should be implemented in all the DMAL layers.
*
*
*/
struct CUchannelDMAL_st
{
//! Initializes the Channel
CUresult (*Init)(CUnvchannel *channel);
//! Deinitializes the Channel
void (*Deinit)(CUnvchannel *channel);
//! get the put pointer
void (*GetPutPointer)(CUnvchannel *channel, CUnvCurrent **pnvCurrent, NvU32 spaceRequested, FLAG_SET(CUIpushFlags) flags);
//! set the put pointer
void (*SetPutPointer)(CUnvchannel *channel, CUnvCurrent *nvCurrent, NvBool *needsFlush);
//! check for any errors on the channel
CUresult (*CheckErrors)(CUnvchannel *channel, NvBool peekOnly);
//! reset the channel
CUresult (*Reset)(CUnvchannel *channel);
//! call the DM-specific method to flush any pending GPFIFO entries to the GPU
CUresult (*GpfifoAdvanceGpuPut)(const CUgpfifoFlushItem *flushItem);
//! do a blocking wait using the last submitted syncpoint in the channel
CUresult (*BlockingWaitForSyncpointCompletion)(const CUnvchannel *channel);
//! Bind a notifier to a given channel
CUresult (*BindNotifier)(CUnvchannel *channel, CUnotifier *notifier);
//! Get RM handles for this client/channel
void (*GetRmHandles)(CUnvchannel *channel, NvU32 *rmClient, NvU32 *rmChannel);
//! Get RM handle for the context share (subcontext) used by this channel, or
//! 0 if the channel does not use a context share.
NvU32 (*GetRmContextShareHandle)(CUnvchannel *channel);
//! Get RM handle for the channel group this channel resides in
CUresult (*GetRmChannelGroupHandle)(CUnvchannel *channel, NvU32 *rmChannelGroup);
//! Get platform specific parameters that need to be passed to UVM when registering
//! this channel so that faults can be serviced on it.
NvBool (*GetUvmPlatformParams)(CUnvchannel *channel, UvmChannelPlatformParams *platformParams_out);
//! Get the descriptor necessary to allocate the debug object on this channel.
//! CUDA_SUCCESS will be returned on configurations where the debug object is not available.
//! It returns CUDA_ERROR_UNKNOWN if an unexpected error occurs.
CUresult (*GetDebugObjectDesc)(CUnvchannel *channel, CUdebugObjectDesc *desc);
//! Allocates the debug object.
//! CUDA_SUCCESS will be returned on configurations where the debug object is not available.
//! It returns CUDA_ERROR_OUT_OF_MEMORY if no memory was available for the allocation.
//! It returns CUDA_ERROR_UNKNOWN if any other unexpected error occurs.
CUresult (*AllocateDebugObject)(CUnvchannel *channel);
//! Destroys the debug object.
//! CUDA_SUCCESS will be returned on configurations where the debug object is not available.
//! It returns CUDA_ERROR_UNKNOWN if any unexpected error occurs.
//! Behavior is not defined if the debug object has not been allocated.
//! Therefore, callers must verify that the debug object exists.
CUresult (*DestroyDebugObject)(CUnvchannel *channel);
//! Sets the debug object exception mask.
//! CUDA_SUCCESS will be returned on configurations where the debug object is not available
//! or setting the debug object exception mask is not available.
//! It returns CUDA_ERROR_UNKNOWN if any unexpected error occurs.
//! Behavior is not defined if the debug object has not been allocated.
//! Therefore, callers must verify that the debug object exists.
CUresult (*SetDebugExceptionMask)(CUnvchannel *channel, CUdebugExceptionMask exceptionMask);
//! RM function call to retrieve ESR information to parse rich error information
CUresult (*GetKernelLaunchError)(CUnvchannel *channel, CUresult *richErrorCode);
//! RM function call to retrieve MMU fault information
CUresult (*GetKernelLaunchMmuFaultInfo)(CUnvchannel *channel, CUmmuFaultInfoSm *faultInfo, NvU32 *numSm);
//! RM function call to clear all ESRs for the channel
CUresult (*ClearAllSmErrors)(CUnvchannel *channel);
//! Get the last submitted sync to wait on
CUresult (*GetLastSubmittedSync)(const CUnvchannel *channel, void** lastSubmittedSync);
//! Encode a flush of the pending remote write in the specified channel
CUresult (*EncodeFlushRemoteWrites)(CUnvchannel *channel, CUnvCurrent **nvCurrent);
//! Wait until the progress fences of the specified channels reach the requested value
CUresult (*WaitForProgressFences)(NvU32 channelCount, CUnvchannel **channels, NvU64 *fencesValue);
//! Signal the external semaphore on the specified channel
CUresult (*SignalExternalSemaphore)(CUnvchannel *channel, const CUextSemaphore *extSem, NvU64 fenceValue);
//! Wait on the external semaphore on the specified channel
CUresult (*WaitExternalSemaphore)(CUnvchannel *channel, const CUextSemaphore *extSem, NvU64 fenceValue, NvU32 timeoutMs);
};
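Each driver model supplies its own implementations of the entry points above and exposes them as a statically initialized function table. The sketch below uses hypothetical stand-in types (ChannelDmalSketch, rmChannelInit, rmChannelDeinit); it shows only the wiring pattern, not the real DMAL layers:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins; the real CUnvchannel / CUresult live elsewhere. */
typedef int CUresult;
#define CUDA_SUCCESS 0
typedef struct CUnvchannel_st { int initialized; } CUnvchannel;

/* A hypothetical vtable holding just two of the DMAL entry points. */
typedef struct {
    CUresult (*Init)(CUnvchannel *channel);
    void     (*Deinit)(CUnvchannel *channel);
} ChannelDmalSketch;

/* One DM backend (an "rm"-style layer) supplies concrete functions... */
static CUresult rmChannelInit(CUnvchannel *channel)
{
    channel->initialized = 1;
    return CUDA_SUCCESS;
}
static void rmChannelDeinit(CUnvchannel *channel)
{
    channel->initialized = 0;
}

/* ...and exports them as a statically initialized function table that
 * callers invoke without knowing which driver model is underneath. */
static const ChannelDmalSketch rmChannelDmal = {
    .Init   = rmChannelInit,
    .Deinit = rmChannelDeinit,
};
```

Callers then go through the table (e.g. `rmChannelDmal.Init(&channel)`), so the same channel code runs unchanged on every driver model.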
CUchannelFlushUnit_st
/**
* \brief Channel Flush Unit
*
*/
struct CUchannelFlushUnit_st
{
//! the head of channel list in this flush unit
CUnvchannel *channelHead;
//! channel pool holding this flush unit
CUchannelPool *channelPool;
//! HW TSG Group ID
NvU32 tsgGroupID;
//! The number of channels in this flush unit.
NvU32 channelCount;
//! node for this flush unit in the queued dependencies graph
CUchannelGraphNode *queuedDepNode;
//! type of the channels that this flush unit is linked with
CUchannelType channelType;
//! Was this registered with the (WDDM) KMT Dag?
NvBool kmtDagRegistered;
//! The number of outstanding DAG nodes (of type SEMAPHORE) that are referencing channels from this flush unit
//! - we have to wait until this count is zero before we can free this flush unit
NvU32 kmtDagUseCount;
//! Queue of deferred Render/Signal/Wait commands
//! - items may only be enqueued while holding the channel manager's push lock and the global WDDM DAG lock
//! - items may only be dequeued while holding the global WDDM DAG lock
CUkmtDagNodeFlushUnitWDDM *kmtDagNodesHead;
CUkmtDagNodeFlushUnitWDDM *kmtDagNodesTail;
struct
{
CUchannelFlushUnitWDDM *wddm; //!< WDDM-specific channel flush unit structure.
CUchannelFlushUnitWDDMHWSCHED *wddmhwsched; //!< WDDM HW-scheduling-specific channel flush unit structure.
CUchannelFlushUnitRM *rm; //!< RM-specific channel flush unit structure.
CUchannelFlushUnitMRM *mrm; //!< MRM-specific channel flush unit structure.
CUchannelFlushUnitMPS *mps; //!< MPS-specific channel flush unit structure.
} dm;
CUchannelFlushUnitDMAL dmal; //!< Table of function pointers defined by each DMAL layer.
};
/**
* \brief ChannelFlushUnit Device Mode Abstraction Layer
*
* These are the list of dmal functions which will work on the
* CUchannelFlushUnit.
*
*/
struct CUchannelFlushUnitDMAL_st
{
//! channel Flush Unit Init function
CUresult (*Init)(CUchannelFlushUnit *channelFlushUnit);
//! channel Flush Unit Deinit function
void (*Deinit)(CUchannelFlushUnit *channelFlushUnit);
//! Track memory (it will be "paged in" for the next flush)
CUresult (*TrackMemObj)(CUchannelFlushUnit *channelFlushUnit, CUmemobj *memobj, NvBool isWriteable);
//! Track memory that is required for all flushes
CUresult (*TrackMemInternal)(CUchannelFlushUnit *channelFlushUnit);
//! Reset memory tracking
void (*ResetMemoryTracking)(CUchannelFlushUnit *channelFlushUnit);
//! Enforce that flushUnitToWait will not be flushed to hardware until
//! it is guaranteed that channelToSignal's trackSemVal will arrive
CUresult (*WaitForTrackSemValSubmit)(CUchannelFlushUnit *fuToWait, CUnvchannel *chToSignal, NvU64 trackSemVal);
//! Enforce that waiter will not be flushed to hardware until the
//! specified value can be acquired from waitee.
void (*WaitForCuSemaValAcquire)(CUchannelFlushUnit *waiter, volatile NvU32 *payload, NvU32 acquiredValue);
//! TSG master channel needs to wait for slave channels in WDDM
void (*MasterWaitForSlave)(CUchannelFlushUnit *channelFlushUnit);
//! Return RM handle for TSG
CUresult (*ChannelGroupRMHandle)(CUchannelFlushUnit *channelFlushUnit, NvU32 *handle);
//! Duplicate TSG handle (primarily for MPS-Volta client contexts to duplicate server TSG handle)
CUresult (*ChannelGroupDuplicateRMHandle)(CUchannelFlushUnit *channelFlushUnit, NvU32 targetClient, NvU32 targetParent, NvU32 targetObject);
//! check for any errors on the channelFlushUnit
CUresult (*CheckErrors)(CUchannelFlushUnit *channelFlushUnit, NvBool peekOnly);
//! return the sub-context id for this flush unit.
NvU32 (*GetContextShareId)(CUchannelFlushUnit *channelFlushUnit);
};
Channel Functions
Creation and destruction
/**
*
* \brief Create Channels in the channel manager
*
* \detailDescription This function creates the channel pools of
* type CU_CHANNEL_TYPE_COMPUTE, CU_CHANNEL_TYPE_ASYNC_MEMCPY,
* CU_CHANNEL_TYPE_HOST_OPS_POOL in the channel manager. The function expects that
* the channelManager is valid.
*
* \param[in] channelManager - the channel manager
*
 * \additionalNotes
 * The pool size is set to 1 if the context is a subcontext and an MPS server.
 * \additionalNotes
 * The function creates a hostOpsPool with 1 channel on mobile platforms.
 * \additionalNotes
 * The pool size is 1 for async channels on WDDM platforms.
 * \additionalNotes
 * Either no async engine is available, or, if any async engine is available,
 * then asyncHtoDPool and asyncDtoHPool should be filled.
*
* \implDetails The function sets the poolSize to hal specific pool size.
* The function then creates the compute channel pool. It then creates a copy
* pool for each of the async engines. The function creates separate channel pool
* for block linear copy if the device needs it.
*
* \endfn
*/
// manages the pools of channels in the channel manager
CUresult channelManagerCreateChannels(CUchannelManager *channelManager);
/**
* \brief Destroys the channels in the channel manager
*
* \detailDescription The function destroys all the different channel pools
* in the channel manager. The function expects that the channelManager is valid.
*
* \param[in] channelManager - the channel manager
*
* \additionalNotes
* The function sets all the pools (computePool, hostOpsPool, asyncCEPool[],asyncMpsClientReserve to NULL.
*
* \implDetails The function channelManagerDestroyChannelPool destroys all the channel
* pools which are not null in the channel manager.
*
* \endfn
*/
void channelManagerDestroyChannels(CUchannelManager *channelManager);
/*
* channel manager interface
*/
/**
* \brief channelManagerCreate function creates a channel manager for the context.
*
 * \detailDescription This function allocates memory for a new channelManager and
 * initializes it. It also initializes the mutexes, i.e., pushMutex, blockingSyncMutex, and
 * markerMutex. The function expects that the channelManager is valid.
*
* \param[out] channelManager - the channel manager created.
* \param[in] ctx - the channel manager's context
*
* \return CUresult - The result of the function
*
* \retval
* CUDA_SUCCESS
* CUDA_ERROR_OUT_OF_MEMORY
* CUDA_ERROR_OPERATING_SYSTEM
*
* \additionalNotes
* The function returns CUDA_ERROR_OPERATING_SYSTEM if the event channelManager->isrEvent create fails.
*
 * \implDetails This function first allocates the channelManager and initializes it. It then
 * creates the channel graph, dpcManager, and cross-context semaphore. It then registers the
 * handler channelManager->ctx->intHandler with the event channelManager->isrEvent. It then
 * does a synchronize on the channel manager's marker with wait-spin.
*
* \endfn
*
*/
CUDA_TEST_EXPORT CUresult channelManagerCreate(CUchannelManager **channelManager, CUctx *ctx);
/**
*
* \brief Destroy the channel Manager.
*
* \detailDescription This function deinitializes all the internal structures of the
* channel manager and deallocates it. It then frees the memory of the channel manager.
* The function expects that the channelManager is valid.
*
* \param[in] channelManager - channel Manager to be destroyed.
*
* \return void
*
* \additionalNotes
 * Destroying the channel manager's event (isrEvent) should not fail.
*
 * \implDetails The channelManagerDestroy function deinitializes all the mutexes in the
 * channelManager structure and frees the memory.
*
* \endfn
*/
CUDA_TEST_EXPORT void channelManagerDestroy(CUchannelManager *channelManager);
Functionality
/**
*
* \brief Sets the preemption mode for the compute channels.
*
* \detailDescription The function sets the preemption mode for the compute channels. The
* function expects that the channelManager is valid.
*
* \param[in] channelManager - channel Manager
* \param[in] mode - the compute preemption mode to set.
*
* \return CUresult - The result of the function.
*
* \retval
* CUDA_SUCCESS,
* CUDA_ERROR_UNKNOWN,
*
*
* \additionalNotes
 * The function does not do anything for pre-Pascal chips.
*
 * \implDetails The function gets the compute pool from the channel manager. It then checks
 * whether the device is Pascal+ or not. It then calls the dmal function SetPreemptionMode on
 * the compute pool with the mode.
*
* \endfn
*/
CUresult channelManagerSetComputePreemptionMode(CUchannelManager *channelManager, CUcomputePreemptionMode mode);
/**
*
* \brief This function preempts the Compute channel group.
*
 * \detailDescription The function preempts the compute channel group and does the TSG
 * preemption. The function expects that the channelManager is valid.
*
* \param[in] channelManager - channel manager
*
 * \return CUresult - The result of the function.
*
* \retval
* CUDA_SUCCESS,
* CUDA_ERROR_UNKNOWN
*
 * \additionalNotes
 * This function only does preemption for Kepler+ chips.
 * \additionalNotes
 * This function does TSG preemption only for the compute pool.
*
 * \implDetails The function first checks whether the device is Kepler+. It then calls
 * the dmal function for preempt.
*
*/
CUresult channelManagerPreemptComputeChannelGroup(CUchannelManager *channelManager);
Part 2:
/**
*
* \brief Flushes all the work on the channels to the gpu.
*
* \detailDescription The function flushes all the channels in the channel manager. The
* function expects that the channelManager is valid.
*
* \param channelManager - The channel Manager on which to operate.
*
* \return CUresult - The result of the function.
*
* \retval
* CUDA_SUCCESS
*
* \additionalNotes
* The function only returns CUDA_SUCCESS.
*
* \implDetails The function flushes all the channels in the channel manager.
*
* \endfn
*/
CUresult channelManagerFlushAllChannels(CUchannelManager *channelManager);
/**
*
 * \brief Get the pending marker for the last work (streamed or non-streamed) pushed.
*
 * \detailDescription This function gets the last work recorded in the channelManager
 * and returns a pointer to the marker tracking it. The function expects that
 * channelManager and marker are valid.
*
* \param marker - marker corresponding to the last work pushed.
* \param channelManager - the channel manager
*
* \return void
*
* \additionalNotes
* marker->channelManager and channelManager should be same.
* \additionalNotes
* ctxMarkerAppendEntry() should always succeed.
 * \additionalNotes
 * Thread safe: this function takes the pushMutex and markerMutex locks.
 * \additionalNotes
 * WAR for bug 1835200: if the barrierStream is active, the function does not need to add the
 * individual semaphores into the marker and can instead just acquire the barrier stream marker
 * directly.
*
 * \implDetails
 * channelManagerGetPendingWorkMarker first takes the pushMutex and markerMutex locks. It
 * then loops through all the channels in the channel manager; for each channel, if
 * there is any unfinished work, it creates a marker entry for the channel and appends
 * it to the markerLastWork. It then loops through all the cross-context
 * semaphores and appends all the unfinished crossCtxSema into the
 * markerLastWork. It then appends all pending QMDs into the marker, and unlocks the
 * markerMutex and pushMutex. If this context has a barrierStream, the function adds the
 * barrierStreamMarker to the markerLastWork.
*
* \endfn
*/
CUDA_TEST_EXPORT void channelManagerGetPendingWorkMarker(CUctxMarker *marker, CUchannelManager *channelManager);
/**
* \brief Check the error notifier on each of the channels in the manager.
*
* \detailDescription channelManagerCheckErrors function checks for any errors in all the
* channels in the channel pool. If any channel has encountered an error, this function
* sets the error value in the context and returns it. The function expects that
* the channelManager is valid.
*
* \param[in] channelManager - the channel manager.
 * \param[in] peekOnly - If true, only check whether any of the channels has errors.
*
* \return CUresult - the result of the function.
* \retval
* CUDA_SUCCESS,
 * CUDA_ERROR_UNKNOWN (on NvRmGpu call failures),
 * CUDA_ERROR_LAUNCH_TIMEOUT (set if the error notifier value field is not 0),
 * CUDA_ERROR_LAUNCH_FAILED (set if the notifier value fields are not 0)
*
* \additionalNotes
 * On MRM devices, all channels share the same error notifier, so the errors are checked only on
 * the channelFlushUnitArray[0]'s channel in the computePool.
 * \additionalNotes
 * For MRM devices, the function expects that the channel->computePool->channelFlushUnitArray is
 * valid.
*
 * \implDetails The function first checks for any async error in the channel manager. It
 * then checks for errors in the different pools, i.e., computePool, hostOpsPool, asyncHtoDPool,
 * and asyncDtoHPool.
*
*/
CUDA_TEST_EXPORT CUresult channelManagerCheckErrors(CUchannelManager *channelManager, NvBool peekOnly);
/**
* \brief Get the channel Manager's DPC manager
*
* \detailDescription The function returns the channel Manager's dpc manager. It expects
* that the channelManager is valid.
*
* \param channelManager - the channel Manager
*
* \return CUIdpcManager - the channel manager's dpc manager.
*
* \implDetails The function returns channelManager->dpcManager.
*
* \endfn
*/
CUIdpcManager *channelManagerGetDpcManager(CUchannelManager *channelManager);
/**
*
* \brief Returns true if the channel Manager has async memcpy channels
*
 * \detailDescription The function returns true if the channelManager owns async memcpy
 * channel pools (both asyncHtoDPool and asyncDtoHPool). The function expects that the
* channelManager is valid.
*
* \param channelManager - the channelManager
*
* \return NvBool - Whether the channel manager has async memcpy channels.
*
* \retval false, true
*
 * \implDetails The function checks whether the channelManager->asyncHtoDPool is not NULL
 * and has a channel head. It also checks whether
 * the channelManager->asyncDtoHPool is not NULL and has a channel head.
*
*/
NvBool channelManagerHasAsyncChannels(CUchannelManager *channelManager);
/**
*
* \brief Returns true if the channel Manager has async memcpy channels and if these channels
* are using WDDM Packet Scheduling
*
 * \detailDescription The function returns true if the channelManager owns async memcpy
 * channel pools (both asyncHtoDPool and asyncDtoHPool) and if one of these pools is using WDDM
 * Packet Scheduling (packet scheduling is the old scheduling model that predates HW Scheduling).
* The function expects that the channelManager is valid.
*
* \param channelManager - the channelManager
*
* \return NvBool - Whether the channel manager has async memcpy channels and at least one of them
* is using WDDM Packet Scheduling
*
* \retval false, true
*
 * \implDetails The function first checks whether one of the pools is not NULL and then performs a
 * dmal check. This function only looks at HtoD and DtoH since async CEs created
 * for P2P or P2P-related WARs always need to be handled as a special case.
*
*/
NvBool channelManagerHasWddmPacketSchedulingAsyncChannels(CUchannelManager *channelManager);
/**
* \brief Returns the LRU channel from the channelManager
*
 * \detailDescription The function returns a channel depending on the channelUse parameter
 * and the useLru flag. If the useLru flag is set, it returns the LRU channel. The function
 * expects channelManager to be valid.
*
* \param channelManager - the channel manager
* \param channelUse - the channel usage
* \param useLru - Whether the lru channel should be picked from the channel pool.
*
* \additionalNotes
* The function returns NULL, if the channel pool is not present in the channel manager.
*
*
 * \implDetails The function sets the poolCount to 1. It gets the channel pool depending
 * on the channelUse parameter. If \p useLru is set to true, it calls
 * channelPoolArrayGetLruIdleOrLruChannel to get a channel from the channel pool. Otherwise
 * it returns poolArray[0]->channelArray[0].
*
* \endfn
*/
CUDA_TEST_EXPORT CUnvchannel *channelManagerGetChannelWithFlags(CUchannelManager *channelManager, CUchannelUse channelUse, NvBool useLru);
/**
* \brief Returns the channel from the channel manager
*
 * \detailDescription The function returns the channel in the channel manager. The
 * function expects that the channelManager and the channelUse are valid.
*
* \param channelManager - the channel manager
* \param channelUse - the channel usage
*
 * \return CUnvchannel - the selected channel.
 *
 * \implDetails The function calls channelManagerGetChannelWithFlags with the useLru
 * parameter set to NV_FALSE.
*
* \endfn
*/
CUDA_TEST_EXPORT CUnvchannel *channelManagerGetChannel(CUchannelManager *channelManager, CUchannelUse channelUse);
/**
* \brief Get the selected Channel for a push
*
 * \detailDescription The function selects the most recently used channel in the stream.
 * If the channel is used by another stream, it proceeds to full channel selection. A
 * pool is selected depending on the channelUse. If there is only one channel, or it
 * is the null stream or the barrier stream, it picks the channel head. It is possible
 * that the pool picked does not match the channel use; in those cases the function reuses the
 * same channel to avoid a channel switch on the engine. The last option is to pick
 * the least recently used channel in the pool. The function expects the
 * channelManager and desc to be valid.
*
* \param[in] channelManager The channel manager
* \param[in] desc StreamPush descriptor
* \param[in] stream the stream on which to push work.
*
* \return CUnvchannel The channel on which the work shall be pushed.
*
* \additionalNotes
 * The function expects the pool to be valid.
* \additionalNotes
* If the pool contains a single channel or if the stream is null stream or barrier stream,
* then select the first channel in the pool (pool->channelArray[0]).
*
* \endfn
*/
CUDA_TEST_EXPORT CUnvchannel *channelManagerSelectChannelForPushWithDesc(CUchannelManager *channelManager, CUchannelPushDesc *desc, CUIstream *stream);
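The head-vs-LRU fallback described above can be sketched as follows. PoolSketch and selectChannel are hypothetical simplifications that omit the MRU-in-stream fast path and the channel-use mismatch handling:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the selection inputs. */
typedef struct Channel { int id; } Channel;
typedef struct {
    Channel **channels;
    size_t    count;
    size_t    lruIndex; /* index of the least-recently-used channel */
} PoolSketch;

/* Mirrors the selection order described above: a single-channel pool,
 * the null stream, and the barrier stream all pin the pool head
 * (channelArray[0]); otherwise fall back to the pool's LRU channel. */
static Channel *selectChannel(const PoolSketch *pool,
                              int isNullOrBarrierStream)
{
    if (pool->count == 1 || isNullOrBarrierStream)
        return pool->channels[0];
    return pool->channels[pool->lruIndex];
}
```

Pinning the null and barrier streams to the pool head keeps their work strictly ordered on one channel, while other streams spread across the pool via LRU.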
/**
* \brief Get the selected Channel for a push, depending on the Channel Use
*
* \detailDescription The function selects the channel from the stream and channel manager
* depending on the channelUse.
*
* \param[in] channelManager The channel manager
* \param[in] channelUse The work for which the channel shall be used.
* \param[in] stream the stream on which to push work.
*
* \return CUnvchannel The channel on which the work shall be pushed.
*
 * \additionalNotes
 * Refer to channelManagerSelectChannelForPushWithDesc.
* \endfn
*/
CUnvchannel *channelManagerSelectChannelForPush(CUchannelManager *channelManager, CUchannelUse channelUse, CUIstream *stream);
/**
* \brief Set the Blocking Sync timeout.
* \private
*/
void channelManagerSetBlockingSyncTimeout(CUchannelManager *channelManager, NvU32 timeoutMsec);
Part 3:
/**
* \brief Blocks flushing to device until a semaphore value can be acquired
*
 * \detailDescription The function blocks until the semaphore acquire of the given
 * value is done.
*
* \param[in] channel - the channel.
* \param[in] payload - host address of semaphore.
* \param[in] value - the value to be acquired.
*
* \additionalNotes
* \noteReentrant{channelManager->pushMutex, channel}
*
* \implDetails The function takes the pushMutexLock and then calls the function
* channelBlockSubmitUntilSemaAcquire_UnderLock
* \endfn
*/
void channelBlockSubmitUntilSemaAcquire(CUnvchannel *channel, volatile NvU32 *payload, NvU32 value);
/**
* \brief Blocks flushing to device until a semaphore value can be acquired.
*
 * \detailDescription The function blocks until the semaphore acquire of the given value is done.
*
* \param[in] channel - the channel
* \param[in] payload - host address of semaphore.
* \param[in] value - the value to be acquired.
*
* \additionalNotes
* The function asserts if the mutex is not held by the same thread
*
 * \implDetails The function asserts if the pushMutex lock is not held by the same thread. It
 * then calls the dmal function WaitForCuSemaValAcquire.
*
* \endfn
*/
void channelBlockSubmitUntilSemaAcquire_UnderChannelLock(CUnvchannel *channel, volatile NvU32 *payload, NvU32 value);
/**
* \brief Flushes all pending pushes to the device
*
* \detailDescription The function flushes all the pending pushes to the device. The function
* also clears the dependency that any other channels had on this channel. This function
* calls the channelFlush_UnderLock.
*
* \param[in] channel - the channel to be flushed
*
 * \return void
*
* \implDetails The function takes the channelManager->pushMutex lock and calls the
* channelFlush_UnderLock function.
*
* \endfn
*/
CUDA_TEST_EXPORT void channelFlush(CUnvchannel *channel);
/**
* \brief Tracks the QMD on the channel
* \detailDescription The function tracks the qmd as part of the current push in the
* channel manager.
*
* \param[in] channel - the channel on which to operate
* \param[in] qmd - the QMD to be tracked.
*
* \additionalNotes
* The function expects that the channelManager->currentPush.active is valid.
* \additionalNotes
* The function expects that channelManager->currentPush.channel == channel
* \additionalNotes
 * The function expects that channelManager->currentPush.qmd is empty.
*
* \implDetails The function sets the qmd to channelManager->currentPush.qmd and calls the
* function channelManagerAddActiveQmdList.
*
* \endfn
*/
CUDA_TEST_EXPORT void channelTrackQMD(CUnvchannel *channel, CUqmd *qmd);
/**
* \brief Tracks the task on the channel
*
* \detailDescription The function tracks the task by adding the task to the channel's queued
* task list. The function expects that the channel is valid.
*
* \param[in] channel - the channel on which to operate
* \param[in] task - the task which should be tracked.
*
* \additionalNotes
* The function returns early if the task is NULL.
* \additionalNotes
* The function expects that the task->channelQueuedTaskListNext is NULL.
*
* \implDetails The function first calls the cuiTaskRetains which retains the task. It then
* adds task in the channel list.
*
* \endfn
*/
void channelTrackTask(CUnvchannel *channel, CUtask *task);
/**
* \brief Tracks the mem on the channel
*
* \detailDescription This function tracks the memList on the channel.
*
* \param[in] channel - the channel on which to track
* \param[in] memList - the memList to track
*
* \additionalNotes
* The function expects that the channelManager->currentPush.active is valid.
* \additionalNotes
* The function expects that the channelManager->currentPush.channel == channel.
*
* \implDetails The function sets the channelManager->ctx->streamManager->currentPush.memList
* to memList.
*
* \endfn
*/
void channelTrackMem(CUnvchannel *channel, CUmemTrackList memList);
/**
* \brief the channel tracks the memobj
*
* \detailDescription The function tracks the memobj on the channel.
*
* \param channel
* \param memobj - the memobj to be tracked
* \param access - read-only or read-write
*
* \additionalNotes
* The function expects that the channelManager->currentPush.active is valid.
* \additionalNotes
* The function expects that the channelManager->pushMutex lock to be held.
*
 * \implDetails The function calls the dmal function channelFlushUnit->dmal.TrackMemObj on
 * the channel->channelFlushUnit. If the TrackMemObj dmal function
 * fails, the function records the failure status using
 * channelManagerSetAsyncError.
*
* \endfn
*/
void channelTrackMemobj(CUnvchannel *channel, CUmemobj *memobj, CUItrackedMemobjAccess access);
/**
* \brief Push the methods to do an awaken (interrupt) on a channel
*
* \detailDescription It pushes a blocking sync awaken into the channel. It returns
* early if the blockingSync is not enabled.
*
* \param[in] channel - the channel on which to operate
* \param[in] nvCurrent - the current pointer in the push buffer.
*
* \additionalNotes
* Return early, if the channel->blockingSync is not enabled.
*
 * \implDetails If the device supports the syncpoint completion event, the function just
 * sets the unflushedAwaken flag to true and returns. If the device doesn't support
 * the syncpoint completion event, the function pushes a noop to make sure that we
 * do flush the channel at the end of this push. It then locks the blockingSyncMutex,
 * updates the awakenTrackSemVal with trackingSemaGetNextReleaseValue, increments
 * awakenLastIssuedByCpu by 1, and sets unflushedAwaken to true. It then updates
 * awakensRunningLow with channelBlockingSyncAreAwakensRunningLow(channel).
*
* \endfn
*/
CUDA_TEST_EXPORT void channelPushBlockingSyncAwaken(CUnvchannel *channel, CUnvCurrent **nvCurrent);
Part 4:
/**
*
* \brief The function pushes an acquire of the marker onto the channel
*
* \detailDescription The items in the marker are converted into semaphore acquires and included in the push
*
* \param[in] channel - the channel where the push is happening
* \param[inout] pnvCurrent - the position in the push
* \param[in] markerToAcquire - the marker to acquire
* \param[in] flags - push flags
*
* \additionalNotes
* The function takes channel to avoid pushing channel tracking semaphores for its own channel
*
* \additionalNotes
 * The function assumes that the marker has already been simplified
*
* \implDetails This function allows launches to be pipelined with const bank programming
* by allowing a stream's marker to be acquired in the middle of a push
* \endfn
*/
void channelPushAcquireMarker(CUnvchannel *channel, CUnvCurrent **pnvCurrent, CUctxMarker *markerToAcquire, NvU32 flags);
/**
* \brief The function returns whether this push can be continued with another default-sized push.
*
 * \detailDescription Based on how much has already been written in the current push, this function checks
 * that there is enough space in the pushbuffer for a further default-sized push. The push is not ended, and no additional GPFIFO
 * entries are used.
*
* \param[in] channel - the channel of the current push
* \param[in] nvCurrent - the position of the current push
*
* \return NvBool - true if another default-sized push can fit, false otherwise
*
* \additionalNotes
 * This function returns false if wrap-around is required to satisfy the pushbuffer space requirement.
*/
NvBool channelCanContinue_UnderLock(CUnvchannel* channel, CUnvCurrent *nvCurrent);
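The no-wrap space check can be sketched against the head/tail model from the overview (CPU put = tail, GPU get = head). PushbufSketch and canContinue below are hypothetical stand-ins, not the real CUnvchannel fields:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pushbuffer geometry; the real fields live in CUnvchannel. */
typedef struct {
    uint32_t size;   /* total pushbuffer bytes              */
    uint32_t cpuPut; /* next byte the CPU will write (tail) */
    uint32_t gpuGet; /* next byte the GPU will read (head)  */
} PushbufSketch;

/* Can `needed` more bytes be written contiguously (no wrap-around)
 * without overtaking the GPU get pointer? Mirrors the rule above:
 * return false when a wrap would be required. */
static int canContinue(const PushbufSketch *pb, uint32_t needed)
{
    /* Contiguous room from the put pointer to the end of the buffer. */
    if (pb->size - pb->cpuPut < needed)
        return 0; /* would have to wrap */
    if (pb->cpuPut >= pb->gpuGet)
        return 1; /* free region extends to the end of the buffer */
    /* put is behind get: must also stay strictly below the GPU's head. */
    return pb->gpuGet - pb->cpuPut > needed;
}
```

The strict inequality in the last case keeps put from catching up to get, which would make a full buffer indistinguishable from an empty one.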
/**
* \brief The function synchronizes the compute engine with the gpu host unit
*
 * \detailDescription The function pushes a compute release of the next valid semaphore value and an acquire
 * of that same value onto the gpu host unit. No attempt is made to track whether the synchronization is really necessary.
 * The push is not ended, and no additional GPFIFO entries are used.
*
* \param[in] channel - the channel of the current push
 * \param[in] pnvCurrent - the position of the current push
 * \param[in] omitLeadingComputeMembar - if true, omits the membar issued by compute before the semaphore release
*
* \endfn
*/
void channelSynchronizeComputeWithGpuHost(CUnvchannel *channel, CUnvCurrent **pnvCurrent, NvBool omitLeadingComputeMembar);
/**
* \brief Pushes an atomic "wait-and-set" HOST operation on the given channel.
*
 * \detailDescription Pushes a set of HOST methods on the channel that forces
 * the channel to acquire on the compare value at the given address and
 * atomically update the address with the new value. The guarantee given is
 * that no other channel in the same TSG can simultaneously acquire the
 * same value and race on setting it. The operation is equivalent to:
 * while (atomicCAS(address, compare, value) != compare) channel_yield();
 * This is useful for implementing HOST-managed locks.
*
* \param[in] channel Channel we push this operation onto
 * \param[in,out] pnvCurrent - current pushbuffer pointer to push work into
* \param[in] address Address to acquire compare value on and write value to
* \param[in] compare Value to compare against
* \param[in] value Value to write
*/
void channelAtomicWaitAndSet32(CUnvchannel *channel, CUnvCurrent **pnvCurrent, NvU64 address, NvU32 compare, NvU32 value);
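The CAS loop described above can be modeled on the CPU with C11 atomics. This is purely illustrative (waitAndSet32 is a hypothetical name; the real operation is a sequence of HOST methods executed by the GPU), but it shows why the primitive suffices for a HOST-managed lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* CPU-side model of the HOST wait-and-set loop:
 *   while (atomicCAS(address, compare, value) != compare) channel_yield();
 * The word is only updated when it holds 'compare', and the check-and-set
 * happens atomically, so two waiters can never both win the same value. */
static void waitAndSet32(_Atomic uint32_t *address, uint32_t compare, uint32_t value)
{
    uint32_t expected = compare;
    while (!atomic_compare_exchange_strong(address, &expected, value)) {
        /* on failure, 'expected' was overwritten with the observed value;
         * restore it and retry (a real channel would yield here) */
        expected = compare;
    }
}
```

Acquiring with (compare=0, value=1) takes the lock; storing 0 back releases it.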
/**
*
 * \brief The function returns true if the channel manages its syncpoint
 * in userspace.
*
* \detailDescription The function returns true if the channel is capable
* of managing the syncpoint associated with it. Managing a syncpoint
* involves a channel being able to directly issue Acquire and release methods
* over the syncpoint without having to rely on nvgpu.
*
* \param[in] channel - The channel we want to check for support
*
 * \return NvBool - true if the channel supports user space managed syncpoints.
*
* \retval true, false
*
* \implDetails The function returns true if channelManager->hasUserManagedSyncpoint
* is set to true.
*
* \endfn
*/
NvBool channelHasUserManagedSyncpoint(CUnvchannel *channel);
/**
* \brief Wake Up the ChannelManager Isr (after DPC creation in case it is ready)
*
 * \detailDescription The function wakes up or signals the channelManager's ISR event.
*
* \param channelManager - the channel manager
*
* \additionalNotes
 * If the OS is Darwin, the event is cleared first, since too many calls to event signal can block.
*
* \implDetails The function calls the function cuosEventSignal on the channelManager's event
* channelManager->isrEvent.
* \endfn
*/
CUDA_TEST_EXPORT void channelManagerWakeUpIsr(CUchannelManager *channelManager);
/**
* \brief To do a blocking wait on the marker
*
* \detailDescription This function does a blocking wait on the marker which is passed.
*
* \param marker - the marker on which to operate.
*
* \return CUresult - the result of the function.
*
* \retval
* CUDA_SUCCESS,
* CUDA_ERROR_NOT_READY
*
* \additionalNotes
 * If NvRmSync is not used, it is required either that the marker is already complete or that the DPC
 * is woken after the non-tracking semaphore entries complete.
* \additionalNotes
 * Returns CUDA_SUCCESS if the marker has completed and the ECC check is done as well.
* \additionalNotes
 * Returns CUDA_ERROR_NOT_READY if the marker is not yet complete and a busy-wait is required.
* \endfn
*/
CUDA_TEST_EXPORT CUresult channelManagerAttemptBlockingWait(CUctxMarker *marker);
/**
* \brief Push a syncpoint completion event for a marker.
*
* \param marker - the marker on which to operate.
*
* \return CUresult - the result of the function.
*
* \additionalNotes
* Returns CUDA_SUCCESS and skips pushing the event if the marker is complete and ECC
* check passed.
*
* \retval
* CUDA_SUCCESS,
* CUDA_ERROR_NOT_SUPPORTED
*
* \implDetails The function calls channelAcquireMarker to acquire all the marker
 * entries and push a syncpoint release. It then triggers the syncpoint completion
 * event with the device dmal function deviceTriggerSyncpointCompletionEventOnFd.
*
* \endfn
*/
CUresult channelManagerPushSyncpointCompletionEvent(CUctxMarker *marker);
/*
* channel pool private methods
*/
/**
* \private
 * \brief [Internal] Create a pool of channels
*
 * \detailDescription - The function creates a channelPool in the channel manager. The type
 * of the channel pool depends on the channelType. The function expects the
 * channelManager and the channelPool to be valid.
*
* \param[in] channelManager - the channel pool's channel manager
* \param[out] channelPool - the channel pool created
 * \param[in] poolSize - the number of channels in the channel pool.
* \param[in] channelType - the channel type
*
* \return CUresult - the result of the function.
*
* \retval
* CUDA_SUCCESS,
* CUDA_ERROR_OUT_OF_MEMORY
*
* \additionalNotes
* The function allocates memory for the pool.
* \additionalNotes
* Calls channelPoolDestroy on allocation failure.
*
* \implDetails The function allocates memory for the pool. It sets the default values to
* the channel pool structure with size, channelManager and channelType. It initializes the
* channel pool with dmal function. It also sets the maxChannelFlushUnitCount.
*
* \endfn
*/
CUresult channelPoolCreate(
CUchannelManager *channelManager,
CUchannelPool **channelPool,
NvU32 poolSize,
CUchannelType channelType);
/**
* \private
* \brief [Internal] creates the channels in the channel pool
*
 * \detailDescription This function creates the channels with the default
* parameters and channelType and associates the channels with the channelFlushUnit.
* It also registers all the channels to uvm and schedules the channel in the channel
* pool. The function expects that the channelPool is valid.
*
* \param[in] channelPool - channel Pool
* \param[in] channelType - the type of channel
*
* \additionalNotes
* This function expects that the channelPool->channelFlushUnitArray is valid.
* \additionalNotes
 * If the channelInit function fails and the channel type is async and the device type is WDDM,
 * then the channels are not destroyed.
* \additionalNotes
 * The channels are initialized except when the device is MPS.
*
 * \implDetails The function initializes the default values for channelParams, channelType,
 * channelPool and the channelManager. It loops through the channelCount, creating and initializing
 * all the channels and writing them into channelPool->channelArray. It then calls channelSchedule
 * on the channelPool and creates the gpuEvent via channelPoolCreateGpuEvent.
*
* \endfn
*/
CUresult channelPoolCreateChannels(CUchannelPool *channelPool, CUchannelType channelType);
/**
* \private
* \brief [Internal] Destroy the channel Pool
*
* \detailDescription - This function destroys the channelPool. It expects that the
* channelPool is valid.
* \param[in] channelPool - channel Pool
*
 * \implDetails - The function memsets the channelPool to 0 and then frees the memory of
* the channelPool.
*
* \endfn
*/
void channelPoolDestroy(CUchannelPool *channelPool);
/**
* \private
* \brief[Internal] Destroy the channels in the channel Pool
*
* \detailDescription This function deregisters all the channels from uvm
* and destroys all the channels. The function expects that the channelPool is valid.
*
* \param[in] channelPool - the channel pool on which to operate.
*
* \additionalNotes
* The channel should be removed from the linked list.
*
 * \implDetails The function loops through all the channels in the pool, deregisters each
 * channel from UVM, removes each channel from the channelPool, and
 * destroys each channel.
*
* \endfn
*/
void channelPoolDestroyChannels(CUchannelPool *channelPool);
/**
* \private
* \brief Create a Gpu Event for the channel pool.
*
* \detailDescription This function creates a gpu event for the channelPool. It also
* registers the channelPool's interrupt service routine.
*
* \param[in] channelPool - the channel pool
*
* \return CUresult - the result of the function.
*
* \retval
* CUDA_SUCCESS,
* CUDA_ERROR_OUT_OF_MEMORY,(gpuEventCreate)
* CUDA_ERROR_UNKNOWN (gpuEventCreate)
*
* \additionalNotes
* This function creates the gpuEvent only if the DMAL supports it.
* \additionalNotes
 * If any of the internal functions fails, it calls channelPoolDestroyGpuEvent to destroy
* the gpuEvent.
*
 * \implDetails The function loops through all the channels in the channel pool and sets
 * channel->blockingSync.enabled, channel->blockingSync.useInterrupt and channel->blockingSync.useNonStallingInterrupt.
* \endfn
*
*/
CUresult channelPoolCreateGpuEvent(CUchannelPool *channelPool);
/**
* \private
* \brief Destroys the Gpu Event for the channel Pool.
*
* \detailDescription The function deregisters the interrupt handler routine and
* destroys the gpu event.
*
* \param[in] channelPool - the channel pool
*
* \additionalNotes
* The function sets the channelPool->isr and the channelPool->gpuEvent to NULL.
*
* \implDetails - The function checks whether the channelPool->isr is not NULL. If it
* is not NULL, it then deregisters the inthandler channelPool->isr. It then checks if the
* channelPool->gpuEvent is not NULL, then it destroys the channelPool by calling the
* function gpuEventDestroy.
*
* \endfn
*/
void channelPoolDestroyGpuEvent(CUchannelPool *channelPool);
/**
* \private
* \brief Create Channel flush units
*
 * \detailDescription This function creates maxChannelFlushUnitCount channel flush units
 * in the channel pool. The function allocates memory for each channel flush unit and
 * initializes it. The function expects that the channelPool is valid.
*
* \param[in] channelPool - the channel pool.
* \param[in] channelType - the channel type of each of the channel Flush Unit.
*
* \additionalNotes
* The function expects that the channelManager and ctx are valid.
* \additionalNotes
 * The function acquires and releases a lock before initializing the channelFlushUnit if the
 * ctx is an MPS client and it has a subCtx.
* \additionalNotes
* The tsgid is set only if the channel type is compute and device supports tsg.
*
* \implDetails The function creates an array of channel flush units with
* maxChannelFlushUnitCount elements, allocates memory and initializes the channelFlushUnit
* with the default values. It then calls the channelFlushUnitInitDmal which initializes the
* channelFlushUnit->dmal structure with the DMAL function pointers. It then calls the dmal
* function (channelFlushUnit->dmal.Init) to initialize the channelFlushUnit. It allocates a
* node for this channel flush unit in the queued dependency graph.
*
* \endfn
*/
CUresult channelPoolCreateChannelFlushUnits(CUchannelPool *channelPool, CUchannelType channelType);
/**
*
* \private
* \brief Destroys all the channelFlushUnits in the channel Pool.
*
* \detailDescription The function de-initializes each of the channelFlushUnits and removes the
* node from the channelGraph. It frees the memory for the channelFlushUnit. It expects
* that the channelPool is valid.
*
* \param[in] channelPool - the channel pool
*
* \additionalNotes
* The function frees the memory for the channelFlushUnit.
*
* \additionalNotes
* The function sets the channelFlushUnit->queuedDepNode to NULL.
*
 * \implDetails The function loops through all elements of channelPool->channelFlushUnitArray
 * and de-initializes each channelFlushUnit. It destroys the channelFlushUnit's node
 * if it is present in the queued dependency graph. It removes the channelFlushUnit from the linked list
 * and frees its memory.
*
*
* \endfn
*/
void channelPoolDestroyChannelFlushUnits(CUchannelPool *channelPool);
/*
* channel private methods
*/
/**
*
* \brief Channel init function [Used only by tools]
*
 * \detailDescription This function initializes a preallocated internal channel structure
 * based on the channel parameters \p params. It also adds the channel into the channelFlushUnit
 * list and initializes the semaphore for the channel.
*
* \param[out] channel - channel which will be created.
* \param[in] params - Channel should be created with these parameters
*
* \return CUresult - the result of the function
*
* \retval
* CUDA_SUCCESS
* CUDA_ERROR_OUT_OF_MEMORY
*
 * \implDetails The function initializes the channel structure from \p params, adds it to
 * the channelFlushUnit list and initializes its semaphore.
* \endfn
*/
CUresult channelInit(CUnvchannel *channel, const CUchannelInitParams *params);
/**
* \brief Destroy the channel [Used only by tools]
*
* \detailDescription The function de-initializes the channel and destroys
* the gpfifo. It frees the memory for the channel. The function
* expects that the channel is valid.
*
* \param[in] channel - the channel to be destroyed.
*
* \additionalNotes
* Frees the memory for the channel.
*
* \implDetails channelDestroy de-initializes the channel->gpfifo and the
* trackingSemaphoreData. It memsets the channel to 0 and frees
* the channel.
*
* \endfn
*/
void channelDestroy(CUnvchannel *channel);
/**
* \private
* \brief Initialize the engine state on the channel. [Used by tools]
*
* \detailDescription The function initializes the engine state
* depending on the channel->type. The function expects the channel
* to be valid.
*
* \param[in] channel - the channel
*
* \return CUresult - the result of the function.
*
* \retval
* CUDA_SUCCESS,
*
* \additionalNotes
* The channel->type shall not be invalid.
*
* \implDetails - The function checks for the channel->type and accordingly calls the
* channelInitCompute or channelInitAsyncMemcpy to initialize the compute or memcpy engine.
*
* \endfn
*/
CUresult channelInitEngineState(CUnvchannel *channel);
/**
*
* \brief Wait on a marker on the channel
*
* \detailDescription The function waits on the markerToAcquire in the channel.
*
*
* \param[in] channel - the channel
* \param[in] markerToAcquire - the marker to wait upon.
* \param[in] flags - the flag
*
* \additionalNotes
 * Returns early if marker->numEntries is 0 and the flag is not CUI_PUSH_ADD_GPU_L2_FLUSH.
* \additionalNotes
* The function is thread safe and takes a lock on pushMutex.
* \additionalNotes
 * The function returns early if the marker has only one entry of type CU_CTX_MARKER_ENTRY_TYPE_CHANNEL_TRACKING_SEMAPHORE whose channel is the same as the channel passed.
*
* \return NvBool - true if the marker needs to be acquired during the push (because CUI_PUSH_SKIP_STREAM_ACQUIRE was passed). False otherwise.
*
 * \implDetails The function returns early if there is no pending work, unless the flag is
 * set to l2 flush. It also returns early if there is an acquire on the same channel.
 * It then takes a lock on the pushMutex and calls channelAcquireMarker_UnderLock. It then sets nextPushMustAcquireTrackingSem to false.
*
* \endfn
*/
CUDA_TEST_EXPORT NvBool channelAcquireMarker(CUnvchannel *channel, CUctxMarker *markerToAcquire, NvU32 flags);
/**
 * \brief Removes a cyclic dependency between two channels by flushing one of the
 * channels. (No pushMutex lock should be held.)
 * \detailDescription The function flushes the channel if the channel to acquire could cause
 * a cycle in the queued dependency graph. It then gets the channel from the entry, updates
 * the QDG and performs a synchronization.
*
* \param[in] channel - the channel
* \param[in] entry - the marker entry which should be acquired.
*
*
 * \implDetails The function gets the channelToAcquire and trackSemValToAcquire from the
 * entry. If the channelToAcquire is valid, it checks whether acquiring could cause a cycle and,
 * if so, calls channelFlush_UnderLock. It then again gets the channel and semaphore value from
 * the entry and calls channelAcquireMarker_UpdateQdgAndDoDmSync.
* \endfn
*/
CUDA_TEST_EXPORT void channelPreAcquireMarkerEntry(CUnvchannel *channel, CUctxMarkerEntry *entry);
/**
* \brief Wait on the marker Entry.
*
 * \detailDescription The function does a semaphore acquire on the marker entry depending
* on the type of the marker entry.
*
* \param[in] channel - the channel on which to operate.
* \param[out] nvCurrent - the pointer in the gpfifo.
* \param[in] entry - the marker entry
* \param[in] flags - the end push flag and the only valid flag is CUI_PUSH_ACQUIRE_OWN_TRACKING_SEM.
*
* \return void
*
* \additionalNotes
* The function expects that the channel->pushMutex is already held.
* \additionalNotes
* Call the channelPreAcquireMarkerEntry function to remove any dependency between the channels.
*
* \implDetails The function checks the type of entry->type and depending on that it calls
* the semaphoreAcquire on that marker entry. For CU_CTX_MARKER_ENTRY_TYPE_CHANNEL_TRACKING_SEMAPHORE,
* CU_CTX_MARKER_ENTRY_TYPE_CROSS_CTX_SEMAPHORE it calls the hal function semaphoreAcquire. For
* CU_CTX_MARKER_ENTRY_TYPE_PENDING_QMD, it calls the channelAcquirePendingQmd_UnderLock. Similarly for the other entry types.
*
*/
CUDA_TEST_EXPORT void channelAcquireMarkerEntry_UnderLock(CUnvchannel *channel, CUnvCurrent **nvCurrent, CUctxMarkerEntry *entry, NvU32 flags);
/**
* \brief Update Channel tracking Semaphore
*
* \detailDescription The function updates the trackingSemaphoreData of the channel.
*
*
* \param[in] channel - the channel on which to update tracking semaphore.
*
* \implDetails The function calls the trackingSemaDataUpdate function to update the channel->trackingSemaphoreData.
* \endfn
*/
void channelUpdateTrackingSemaphore(CUnvchannel *channel);
/**
* \brief Retrieve the marker of the current push
*
 * \detailDescription The function creates a marker with one entry which
 * holds either a qmd or the channel and returns that marker. The
* function expects that markerToRelease is valid.
*
* \param[in] channel - the channel to operate
 * \param[in] pnvCurrent - A pointer to the current pushbuffer.
* \param[in] markerToRelease - the marker to release.
*
* \additionalNotes
* The function expects that the channelManager->currentPush.active is valid.
* \additionalNotes
* The function expects that the channelManager->currentPush.channel is same as the channel.
* \additionalNotes
* The function expects that markerToRelase's channelManager is same as the channel's
* channelManager
* \additionalNotes
 * The function fills the marker with nothing and returns if the channel manager has an async error.
* \additionalNotes
 * For a channel that disallows the use of markers with QMD semaphores, a pointer to the current pushbuffer must be passed for rewriting (it can be NULL otherwise).
*
*
* \implDetails The function returns a marker with 1 entry. If the
* currentPush is a qmd it updates the marker with
* CU_CTX_MARKER_ENTRY_TYPE_PENDING_QMD else it updates a marker entry
* with CU_CTX_MARKER_ENTRY_TYPE_CHANNEL_TRACKING_SEMAPHORE.
*
* \endfn
*/
CUDA_TEST_EXPORT void channelGetCurrentPushMarker(CUnvchannel *channel, CUnvCurrent **pnvCurrent, CUctxMarker* markerToRelease);
/**
* \brief Push an Acquire for a pending qmd.
* \detailDescription channelAcquirePendingQmd_UnderLock function pushes a semaphore acquire for a pending
* qmd.
*
 * \param[in] channel - the channel on which to operate.
 * \param[in] pnvCurrent - the pointer in the gpfifo.
 * \param[in] pendingQmd - the pending QMD entry; it carries the qmd, the launch id and the channel the pending QMD was using.
*
* \additionalNotes
 * The function returns early if the recorded launch id is not the same as qmd->launchId.id.
*
* \implDetails The function calls the channelManagerRemoveActiveQmdList_UnderLock
* function with qmd which removes the qmd from the active list. It then calls
* the hal semaphoreAcquire on the pnvCurrent, qmd->launch.channel and the semaphore
* of qmd->semaphore.
*
* \endfn
*/
void channelAcquirePendingQmd_UnderLock(CUnvchannel *channel, CUnvCurrent **pnvCurrent, CUctxMarkerPendingQMD *pendingQmd);
/**
* \brief Tracks a list of memobjs on the channel.
*
 * \detailDescription The function tracks all the memobjs in the memList on the same
 * channel.
*
* \param[in] channel - the channel on which to operate.
* \param[in] memList - the memory list which should be tracked.
*
* \return void
*
*
* \additionalNotes
* Returns early, if there is a pending async error.
* \additionalNotes
 * The function tracks all memobjs if the memList is NULL and the api is not OpenCL.
* \additionalNotes
* \noteReentrant{memmgrLock, memmgr}
*
 * \implDetails If the memList is NULL, the function loops through all
 * the memblocks in the memmgr and, for each memblock mapped to the device, tracks the
 * memblock's memobj. If the memList is valid, it loops through the memList and
 * tracks all of its memobjs.
*
* \endfn
*/
void channelTrackMemList(CUnvchannel* channel, CUmemTrackList memList);
/**
* \brief Update channel tracking semaphore and blocking sync state.
*
* \private
* \endfn
*/
void channelUpdateBlockingSync(CUnvchannel *channel);
/**
* \private
*/
void channelPrint(CUnvchannel *channel); //internal
/**
 * \brief Ensure there is enough pushbuffer space for a push of \p requestedSize.
 *
 * \detailDescription Makes sure there is enough space in the pushbuffer. If there is not,
 * waits until enough progress has been made on previously-pushed methods.
 */
CUresult channelMustAdvance_Underlock(CUnvchannel* channel, NvBool canFlushChannel, NvU32 requestedSize, FLAG_SET(CUIpushFlags) flags);
/**
* QMD tracking functions
* \private
*/
void channelManagerAddActiveQmdList(CUchannelManager *channelManager, CUqmd *qmd);//Internal
/**
 * \brief Remove a qmd from the channel manager's active qmd list.
*
* \detailDescription The function removes \p qmd from the active qmd list. The function expects
* that the channelManager and qmd are valid.
*
* \param[in] channelManager - the channel manager
* \param[in] qmd - the qmd launched.
*
* \additionalNotes
* The function expects that the channelManager->markerMutex lock is held.
*
* \implDetails The function removes the qmd from the channelManager->activeQmdList if the qmd->isInActiveList is true. It then sets the qmd->isInActiveList to false.
*
* \endfn
*/
void channelManagerRemoveActiveQmdList_UnderLock(CUchannelManager *channelManager, CUqmd *qmd);
// Increment the tracking semaphore value and return the new value. The
// data->protectedByMutex needs to be held.
/**
* \brief Increment release value of tracking Semaphore
* \detailDescription The function increments the last value issued by cpu.
*
* \param[in] data - tracking Semaphore Data
*
 * \return NvU64 - the incremented (new last) release value.
*
* \retval [0, 65536]
*
* \additionalNotes
* The function expects that data->protectedByMutex is held by the same thread.
*
* \implDetails The function calls the atomic function cuosInterlockedIncrement64 on the
* data->valueLastIssuedByCpu.
* \endfn
*
*/
NvU64 trackingSemaIncrementReleaseValue(CUtrackingSemaData *data);
/**
* \brief Get the last release value
*
* \detailDescription The function returns the last value issued
* by cpu.
*
* \param[in] data - tracking Semaphore Data
*
* \return NvU64 - the last value released by cpu.
*
* \additionalNotes
* The function expects that data->protectedByMutex is held by the same thread.
*
* \implDetails The function atomically reads the data->valueLastIssuedByCpu.
*
* \endfn
*/
CUDA_TEST_EXPORT NvU64 trackingSemaGetLastReleaseValue(CUtrackingSemaData *data);
/**
* \brief Get the Next Value to be Released
*
 * \detailDescription The function returns the next value to be released by the cpu.
*
* \param[in] data - tracking Semaphore Data
*
* \return NvU64 - the next value to be released by cpu.
*
* \additionalNotes
* The function expects that data->protectedByMutex is held by the same thread.
*
 * \implDetails The function returns trackingSemaGetLastReleaseValue(data) + 1.
*
* \endfn
*/
CUDA_TEST_EXPORT NvU64 trackingSemaGetNextReleaseValue(CUtrackingSemaData *data);
/**
* \brief Is value the last released value.
*
 * \detailDescription The function returns whether \p value is the last value
 * released by the cpu.
*
* \param[in] data - tracking Semaphore Data
* \param[in] value - the value to be compared to last value released.
*
 * \return NvBool - true if \p value is the last released value, false otherwise.
*
* \implDetails The function checks whether the value is the last released value
* by Cpu.
* \endfn
*/
CUDA_TEST_EXPORT NvBool trackingSemaIsLastReleasedValue(CUtrackingSemaData *data, NvU64 value);
/**
 * \brief Record that the last release value has been flushed.
 * \param[in] data - tracking Semaphore Data
 *
 * \additionalNotes
 * The function expects that data->protectedByMutex is held by the same thread.
 *
 * \implDetails The function calls cuiMutexAssertLockHeld. It then atomically
 * advances the last flushed value to trackingSemaGetLastReleaseValue(data) using a
 * compare-and-exchange.
*
* \endfn
*/
void trackingSemaFlushLastReleaseValue(CUtrackingSemaData *data);
/**
* \brief Get the last flushed value to gpu
*
 * \detailDescription The function returns the last value flushed to the gpu.
 *
 * \param[in] data - tracking Semaphore Data
 *
 * \return NvU64 - the last value flushed to the gpu.
 *
 * \implDetails The function atomically reads the last value flushed to the gpu.
*
* \endfn
*/
CUDA_TEST_EXPORT NvU64 trackingSemaGetLastFlushedValue(CUtrackingSemaData *data);
/**
 * \brief Check whether a value has been flushed.
 *
 * \detailDescription The function checks whether \p value has been flushed to the gpu.
*
* \param[in] data - tracking Semaphore Data
* \param[in] value - the value to be compared to flushed value.
*
* \return NvBool - Whether the value has been flushed.
*
* \implDetails The function checks whether the trackingSemaGetLastFlushedValue(data)
* >= value.
*
* \endfn
*/
CUDA_TEST_EXPORT NvBool trackingSemaHasFlushedValue(CUtrackingSemaData *data, NvU64 value);
/**
* \brief Updates the valueFinishedByGpu.
*
 * \detailDescription The function updates trackSemData->valueFinishedByGpu and returns
 * the last value completed by the gpu.
 * \param[in] trackSemData - tracking Semaphore Data
 * \param[in] oldValue - the previously observed completed value, used for comparison.
 *
 * \return NvU64 - the last value completed by the gpu.
* \endfn
*/
CUDA_TEST_EXPORT NvU64 trackingSemaDataUpdateAndGetLastCompleted(CUtrackingSemaData *trackSemData, NvU64 oldValue);
/**
* \brief Get the last Completed Value by Gpu.
*
* \detailDescription The function returns the last value completed by gpu.
*
* \param[in] data - tracking Semaphore Data
*
* \return NvU64 - the last completed value by gpu.
*
* \additionalNotes
* Atomic Read on the data.
*
* \implDetails The function returns the data->valueFinishedByGpu atomically.
*
* \endfn
*/
CUDA_TEST_EXPORT NvU64 trackingSemaGetLastCompletedValue(CUtrackingSemaData *data);
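The tracking-semaphore functions above all manipulate monotonically increasing counters: the last value issued by the CPU, the last value flushed to the GPU, and the last value the GPU has completed. A minimal CPU-only model (the TrackingSemaModel layout and function names are hypothetical; the real CUtrackingSemaData adds the mutex and GPU-visible memory) shows how they fit together:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

typedef struct {
    _Atomic uint64_t valueLastIssuedByCpu;  /* last release value handed out */
    _Atomic uint64_t valueLastFlushed;      /* last value flushed to the GPU */
    _Atomic uint64_t valueFinishedByGpu;    /* last value the GPU completed */
} TrackingSemaModel;

/* Atomically increment the release value and return the new value. */
static uint64_t semaIncrementReleaseValue(TrackingSemaModel *d)
{
    return atomic_fetch_add(&d->valueLastIssuedByCpu, 1) + 1;
}

static uint64_t semaGetLastReleaseValue(TrackingSemaModel *d)
{
    return atomic_load(&d->valueLastIssuedByCpu);
}

static uint64_t semaGetNextReleaseValue(TrackingSemaModel *d)
{
    return semaGetLastReleaseValue(d) + 1;
}

/* A value has been flushed once the flushed counter has reached it. */
static int semaHasFlushedValue(TrackingSemaModel *d, uint64_t value)
{
    return atomic_load(&d->valueLastFlushed) >= value;
}

/* Monotonic update: only move valueFinishedByGpu forward, since a stale
 * reading must never roll the completed counter back. */
static uint64_t semaUpdateAndGetLastCompleted(TrackingSemaModel *d, uint64_t observed)
{
    uint64_t cur = atomic_load(&d->valueFinishedByGpu);
    while (observed > cur &&
           !atomic_compare_exchange_weak(&d->valueFinishedByGpu, &cur, observed)) {
        /* 'cur' now holds the latest value; retry if still behind */
    }
    return cur > observed ? cur : observed;
}
```

The monotonic CAS loop in the last helper is the essential detail: several threads may observe GPU progress concurrently, and each only advances the counter, never rewinds it.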
/**
* \brief Get the channel's tracking syncpoint's ID and value.
*
*
* \param[in] channel - Pointer to the channel for which we need the sync point data.
* \param[out] id - Returned syncpoint ID.
* \param[out] value - Returned syncpoint's current tracking value.
*
* \return void
*
* \additionalNotes
* Function expects all input and output arguments to be valid.
*
* \implDetails Atomically reads the syncpointInfo field of the channel and divides it into syncpoint ID and value.
*
* \endfn
*/
CUDA_TEST_EXPORT void channelGetTrackingSyncpointData(const CUnvchannel *channel, NvU32 *id, NvU32 *value);
/**
* \brief Set the channel's tracking syncpoint data.
*
*
* \param[in] channel - Pointer to the channel for which the function will set the sync point data.
* \param[in] id - syncpoint ID to set.
* \param[in] value - syncpoint's tracking value to set.
*
* \return void
*
* \additionalNotes
* Function expects all input and output arguments to be valid.
*
* \implDetails Atomically writes the syncpointInfo field of the channel by combining the input ID and value.
*
* \endfn
*/
void channelSetTrackingSyncpointData(CUnvchannel *channel, NvU32 id, NvU32 value);
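As the \implDetails of this getter/setter pair note, the syncpoint ID and value live in a single atomically accessed syncpointInfo word. A sketch of that packing, assuming a 32/32-bit split (an assumption; the real field layout is not specified in this header):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Packing both fields into one 64-bit word lets the pair be read and written
 * with a single atomic access, so a reader can never observe an ID from one
 * update paired with a value from another. */
static uint64_t syncpointPack(uint32_t id, uint32_t value)
{
    return ((uint64_t)id << 32) | value;
}

static void syncpointUnpack(uint64_t info, uint32_t *id, uint32_t *value)
{
    *id = (uint32_t)(info >> 32);
    *value = (uint32_t)(info & 0xFFFFFFFFu);
}

static void syncpointSet(_Atomic uint64_t *info, uint32_t id, uint32_t value)
{
    atomic_store(info, syncpointPack(id, value));
}

static void syncpointGet(_Atomic uint64_t *info, uint32_t *id, uint32_t *value)
{
    syncpointUnpack(atomic_load(info), id, value);
}
```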
/**
* \brief Push syncpoint completion event for a channel.
*
* \param[in] channel - Pointer to the channel for which the function will push syncpoint event.
*
* \return CUresult
*
* \implDetails Extracts the syncpoint information from the channel and calls the device dmal function
* deviceTriggerSyncpointCompletionEventOnFd to trigger syncpoint completion event.
*/
CUresult channelPushSyncpointCompletionEvent(CUnvchannel *channel);
/**
* \brief Makes an inactive channel from the given channel pool active, if there is one.
*
* \param[in] channelPool - Pointer to the pool from which we intend to make one channel active.
*
* \return void
*
 * \implDetails Goes over all channels in the given pool and makes the first inactive one active,
 * using an atomic write. Nothing happens if there is no inactive channel in the pool.
*/
void channelPoolMakeNextChannelActive(CUchannelPool *channelPool);
/**
* \brief Getter for channel->inactive.
*
* \param[in] channel - Pointer to the channel for which we want to check the inactive flag.
*
* \return NvU32
*
 * \implDetails Atomically reads the inactive flag and returns its value.
*/
NvU32 channelIsInactive(CUnvchannel *channel);
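channelPoolMakeNextChannelActive and channelIsInactive can be modeled with an atomic per-channel flag. The struct layout below is hypothetical; atomic_exchange stands in for the atomic write mentioned in the \implDetails, and additionally guards against two threads activating the same channel twice:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

typedef struct {
    _Atomic uint32_t inactive;  /* 1 = inactive, 0 = active */
} ChannelModel;

typedef struct {
    ChannelModel *channels;
    uint32_t count;
} ChannelPoolModel;

static uint32_t channelModelIsInactive(ChannelModel *ch)
{
    return atomic_load(&ch->inactive);
}

/* Activate the first inactive channel, if any; a no-op otherwise. The
 * atomic exchange returns the old flag, so only the thread that actually
 * flipped 1 -> 0 claims that channel. */
static void poolMakeNextChannelActive(ChannelPoolModel *pool)
{
    for (uint32_t i = 0; i < pool->count; i++) {
        if (atomic_exchange(&pool->channels[i].inactive, 0)) {
            return;  /* this channel was inactive and is now active */
        }
    }
}
```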
/**
* \brief Enable/disable host pre-fetching of the current gpfifo entry.
*
* \param[in] channel - the channel.
* \param[in] enableHostPrefetching - a boolean that specifies whether to enable host pre-fetching.
*/
void channelSetHostPrefetching(CUnvchannel *channel, NvBool enableHostPrefetching);
/**
* \brief Returns the number of compute channels that we're going to have
*
* \param[in] ctx - pointer to the context whose channels we should count.
*
* \return NvU32 - the number of compute channels we should create
*
* \implDetails Computes and returns the number of compute channels that we're going to have.
*/
NvU32
channelManagerCalculateComputeChannelCount(CUctx *ctx);
/**
* \brief Returns the number of async channels that we're going to have
*
* \param[in] ctx - pointer to the context whose channels we should count.
*
* \return NvU32 - the number of async channels we should create
*
* \implDetails Computes and returns the number of async channels that we're going to have.
*/
NvU32
channelManagerCalculateAsyncChannelCount(CUctx *ctx);
// channelManagerGetChannelCount: Returns the total number of channels created
static NV_INLINE NvU32 channelManagerGetChannelCount(CUchannelManager *channelManager) {
CU_ASSERT(channelManager != NULL);
return channelManager->channelCount;
}
/**
* \brief [private] Gets the async error in the channel manager
* \private
 * \detailDescription The function returns the channelManager->asyncError.
*
* \param channelManager - the channel manager on which to operate
*
* \return CUresult - the async error value of the channelManager
*
* \retval
* CUDA_SUCCESS
*
* \additionalNotes
 * The size of channelManager->asyncError should be the same as that of an unsigned int.
* \additionalNotes
* The function does an atomic read from the channelManager->asyncError.
*
* \implDetails
* The function returns channelManager->asyncError
*
* This should be part of the channel_private.h
* \endfn
*
*/
CUDA_TEST_EXPORT CUresult channelManagerGetAsyncError(CUchannelManager *channelManager);
/**
*
* \brief Sets the async error to the channelManager.
*
 * \detailDescription The function sets the channelManager->asyncError with \p asyncError.
*
* \param channelManager - the channel manager
* \param asyncError - the async error to be set.
*
* \return void
*
* \additionalNotes
* the asyncError cannot be CUDA_SUCCESS.
* \additionalNotes
* It does an atomic operation on updating the asyncError.
* \additionalNotes
 * Outside of this unit, this function is called only from the WDDM layer.
* \additionalNotes
 * The size of channelManager->asyncError should be the same as that of an unsigned int.
*
* \implDetails The function atomically sets the value of asyncError to
* channelManager->asyncError.
*
* \endfn
*/
CUDA_TEST_EXPORT void channelManagerSetAsyncError(CUchannelManager *channelManager, CUresult asyncError);
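The async-error getter/setter pair amounts to an atomically accessed error word sized like an unsigned int, as the notes state. A sketch of that pattern (the type and error codes here are illustrative, not real CUresult values):

```c
#include <assert.h>
#include <stdatomic.h>

/* Illustrative stand-ins for CUresult codes. */
enum { MODEL_SUCCESS = 0, MODEL_ERROR_ILLEGAL_ADDRESS = 700 };

typedef struct {
    _Atomic unsigned int asyncError;  /* sized like an unsigned int, per the note */
} ChannelManagerModel;

/* Atomic read: any thread may poll for a pending async error. */
static unsigned int managerGetAsyncError(ChannelManagerModel *mgr)
{
    return atomic_load(&mgr->asyncError);
}

/* Atomic write: the note forbids setting the success code, since success
 * is the "no pending error" state. */
static void managerSetAsyncError(ChannelManagerModel *mgr, unsigned int err)
{
    assert(err != MODEL_SUCCESS);
    atomic_store(&mgr->asyncError, err);
}
```

The atomicity matters because the setter may run on an interrupt-handling thread while API threads read the error concurrently.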