Radeon GPU Profiler Tutorial (Overview Section)

颜早早

Last modified: 2024-05-14 09:17:10

Tags: graphic GPU

First published: 2024-05-14 09:11:11

Original link: https://radeon-gpuprofiler.readthedocs.io/en/latest/#overview-windows

This post is a study record of the Radeon GPU Profiler (RGP). It focuses on the Overview and Event windows, which are the two core parts of RGP.

Main topics covered: Frame summary (DX12 and Vulkan), Barriers, Context rolls, Most expensive events, Render/depth targets, Pipelines (Pipeline summary, Pipelines, Events), and Device configuration.

Frame summary (DX12 and Vulkan)

This window describes the structure of a profile from a number of different perspectives.

The System activity section displays a system-level view of sync operations and when command buffers were submitted to the GPU. Speaking in general terms, all profiles contain two types of data: command buffer timing data and SQTT timing data. This pane displays the former, and the rest of RGP displays the latter.

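System activity is a system-level view covering sync operations and the times at which command buffers were submitted to the GPU. Broadly, every profile contains two kinds of data: command buffer timing data and SQTT timing data. Frame summary shows the former, and the rest of RGP shows the latter.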

Along the top, we find a series of controls:

GPU and CPU based frames: Controls how to display frame boundaries, which are also bracketed by black markers. The difference in time between both modes can help to visualize latency between workload submission and execution. The driver provides each command buffer with a frame number, a CPU submit timestamp, a GPU start timestamp, and a GPU end timestamp.

GPU-based frames: Interprets frame boundaries to begin when a present finished on the GPU.

CPU-based frames: Interprets frame boundaries to begin when a present was submitted on the CPU.

Workload views: Provide four different ways to view the same data:

Command buffers: Shows a list of all command buffers in a submission. Disabling this will condense all command buffers into a single submission block which also specifies the number of contained command buffers.

Sync objects: Toggles whether to display signals and waits.

Sequential: An alternate view which shows data linearly as opposed to stacked. The dark right-most portion of command buffers and submits indicates execution time on the GPU.

GPU only: A flat view of the data which represents solely GPU work. This helps visualize parallelism among all GPU queues.

CPU submission markers: Draw vertical lines to help visualize when the CPU issued certain types of workloads to the GPU.

Zoom controls: Consistent with the rest of the tool, these allow users to drill down into points of interest. More information can be found under the Zoom Controls section.

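Along the top is a series of controls:

GPU and CPU based frames: switches how frame boundaries are displayed; the boundaries are bracketed by black markers. The time difference between the two modes shows the latency between workload submission and execution. The driver gives each command buffer a frame number, a CPU submit timestamp, a GPU start timestamp, and a GPU end timestamp.
- GPU-based frames: a frame boundary begins when a present finishes on the GPU.
- CPU-based frames: a frame boundary begins when a present is submitted on the CPU.
Workload views: four ways to view the same data:
- Command buffers: lists all command buffers in a submission. Disabling this option merges them into a single submission block that states how many command buffers it contains.
- Sync objects: toggles whether signals and waits are displayed.
- Sequential: shows the data linearly rather than stacked. The darker, right-most portion of each command buffer and submit is its execution time on the GPU.
- GPU only: a flat view that shows only GPU work, which helps visualize parallelism among all GPU queues.
CPU submission markers: draws vertical lines to show when the CPU issued certain types of workloads to the GPU.
Zoom controls: consistent with the rest of the tool, these let users drill down into points of interest. See the Zoom Controls section for more information.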

In the middle, we find the actual view. Each queue (Graphics, Compute, Copy) gets its own section. The alternating grey and white backgrounds indicate frame boundaries. The blue region indicates which command buffers were profiled with SQTT data, for more detailed event analysis in other sections of the tool. Note that command buffers are visualized using two shades of the same color. The lighter shade represents time spent prior to reaching the GPU, and the darker shade represents actual execution.

Please note that the view is interactive, making it possible for users to select and highlight command buffers, sync objects, and submission points.

Users can correlate between command buffer timing data and SQTT data by right-clicking on a command buffer within the “Detailed GPU events” region. This will bring up a context menu which contains three menu items for finding the first event within the selected command buffer. Selecting one of the menu items will navigate to the appropriate pane and set focus on the specified event.

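The middle holds the data view itself. Each queue (Graphics, Compute, Copy) has its own section. Alternating grey and white backgrounds separate the frames. The blue region marks the command buffers that were profiled with SQTT data; other parts of the tool provide more detailed event analysis for those command buffers. Note that each command buffer is drawn in two shades of the same color: the lighter shade is the time spent before reaching the GPU, and the darker shade is the actual execution time.

Note that the view is interactive: users can select and highlight a command buffer, sync object, or submission point.

Right-clicking a command buffer inside the “Detailed GPU events” region lets you correlate command buffer timing data with SQTT data. The context menu contains three items for finding the first event in the selected command buffer. Clicking one navigates to the corresponding pane and sets focus on that event.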

Along the bottom, we find information about user selections:

Submit time: Specifies when work was issued by the CPU

Submit duration: Specifies the full duration of the submit

Enqueue duration: Specifies how long the work was queued before beginning on the GPU

GPU duration: Specifies how long the GPU took to execute it.

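The bottom shows information about the user's selection:

Submit time: the point in time at which the CPU issued the work.
Submit duration: the full duration of the submit.
Enqueue duration: how long the work waited in the queue before it began on the GPU.
GPU duration: how long the GPU took to execute the work.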
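As a rough illustration of how these values relate to the per-command-buffer timestamps the driver provides (frame number, CPU submit, GPU start, GPU end), here is a minimal sketch. The `CommandBufferTiming` record, its field names, and the nanosecond unit are assumptions made for this example. Treating the submit duration as the span from CPU submission to GPU completion is also an assumption; the documentation does not say how RGP computes it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommandBufferTiming:
    """Hypothetical record; field names are illustrative, not an RGP format."""
    frame: int
    cpu_submit_ns: int
    gpu_start_ns: int
    gpu_end_ns: int

    @property
    def enqueue_duration_ns(self) -> int:
        # Time spent queued before the GPU began executing the work.
        return self.gpu_start_ns - self.cpu_submit_ns

    @property
    def gpu_duration_ns(self) -> int:
        # Time the GPU spent executing the work.
        return self.gpu_end_ns - self.gpu_start_ns

    @property
    def submit_duration_ns(self) -> int:
        # Assumed here to span from CPU submission to GPU completion.
        return self.gpu_end_ns - self.cpu_submit_ns


cb = CommandBufferTiming(frame=12, cpu_submit_ns=1_000,
                         gpu_start_ns=4_500, gpu_end_ns=9_000)
print(cb.enqueue_duration_ns, cb.gpu_duration_ns, cb.submit_duration_ns)
```

In this model the lighter shade of a command buffer in the view corresponds to the enqueue duration and the darker shade to the GPU duration.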

Below the queue timings view we find the following summary:

_images/rgp_frame_summary_2.png

This shows an interpretation of queue timings data to determine which processor is the bottleneck. By default, if the GPU is idle more than 5% of the time then the profile is considered to be CPU-bound. This percentage may be adjusted in RGP settings.

Please note that the values displayed for Frame duration and Frame rate are sourced from SQTT data. In other words, they are based on duration and shader clock frequency used in other RGP panes such as Wavefront occupancy.

The Profiling overhead shows the amount of profiling data that was written to video memory by the hardware while gathering the RGP profile. The profiling overhead is also expressed in terms of memory bandwidth used to write the data. The profiling overhead is comprised of both SQTT data and the cache counter data collected while profiling.

The Queue submissions and Command buffers pie charts show the number of queue submissions and command buffers in the frame broken down by the Direct and Compute queues. Compute submissions are colored in yellow and graphics submissions are colored in light blue.

The Sync Primitives section counts how many unique signal and wait objects were detected throughout the profile. Please note that only signals and waits from queue operations are included in the profile data. For instance, any Vulkan signals originating from vkAcquireNextImageKHR will not appear since that is not a queue operation.

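This summary interprets the queue timing data to determine which processor is the bottleneck. By default, if the GPU is idle for more than 5% of the time, the profile is considered CPU-bound. The percentage can be adjusted in the RGP settings.

Note that the Frame duration and Frame rate values come from SQTT data. In other words, they are based on the duration and shader clock frequency used in other RGP panes, such as Wavefront occupancy.

Profiling overhead shows how much profiling data the hardware wrote to video memory while the RGP profile was being gathered. It is also expressed as the memory bandwidth used to write that data, and it includes both SQTT data and cache counter data.

The Queue submissions and Command buffers pie charts show the number of queue submissions and command buffers in the frame, broken down by the Direct and Compute queues. Compute submissions are shown in yellow and graphics submissions in light blue. The Sync Primitives section counts the unique signal and wait objects detected throughout the profile. Note that the profile data only includes signals and waits from queue operations. For example, Vulkan signals originating from vkAcquireNextImageKHR do not appear, because that is not a queue operation.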
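The CPU-bound rule described above can be sketched as follows. This is only an illustration of the 5% idle threshold, not RGP's implementation: the interval representation, the function names, and the "GPU-bound" label for the other case are assumptions.

```python
def gpu_idle_fraction(busy_intervals, frame_start, frame_end):
    """Fraction of [frame_start, frame_end) during which no interval is active."""
    clipped = sorted(
        (max(s, frame_start), min(e, frame_end))
        for s, e in busy_intervals
        if e > frame_start and s < frame_end
    )
    busy = 0
    cursor = frame_start
    for start, end in clipped:
        start = max(start, cursor)  # skip time that was already counted
        if end > start:
            busy += end - start
            cursor = end
    return 1.0 - busy / (frame_end - frame_start)


def classify(busy_intervals, frame_start, frame_end, idle_threshold=0.05):
    """Apply the idle-time rule; 0.05 mirrors the default of 5%."""
    idle = gpu_idle_fraction(busy_intervals, frame_start, frame_end)
    return "CPU-bound" if idle > idle_threshold else "GPU-bound"


# GPU busy for 90 of 100 time units (the two intervals overlap by 10).
print(classify([(0, 40), (30, 90)], 0, 100))
# GPU busy for 96 of 100 time units.
print(classify([(0, 96)], 0, 100))
```

Overlapping intervals from different queues are merged before measuring, so parallel work on two queues is not counted twice.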

The Event statistics pie chart and table show the event counts colored by type. In the above example there are 281 Dispatch and 1,633 DrawIndexedInstanced events. The Instanced primitives histogram shows the number of events that drew N (1 to 16+) instances. In the example above we see that most events drew just a single instance, whereas a lesser number of events drew 2-9 and 16 instances.

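The Event statistics pie chart and table show event counts, colored by type. In the example above there are 281 Dispatch events and 1,633 DrawIndexedInstanced events. The Instanced primitives histogram shows the number of events that drew N instances (1 to 16 or more). In the example above, most events drew a single instance, and a smaller number drew 2-9 and 16 instances.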

Geometry breakdown gives a summary of the vertices, shaded primitives, shaded pixels, and instanced primitives. In the above example we can see that the GS is being used to expand the number of shaded primitives. Also, looking at the Rendered Primitives histogram we can see that one draw uses between 0 and 1K primitives, and the other draw call uses 11k or more primitives. This makes sense given that the profile is from the D3D12nBodyGravity SDK sample.

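Geometry breakdown summarizes the vertices, shaded primitives, shaded pixels, and instanced primitives. In the example above, the GS is being used to expand the number of shaded primitives. The Rendered Primitives histogram also shows that one draw call uses between 0 and 1K primitives, while the other uses 11k or more. This is reasonable, because the profile comes from the D3D12nBodyGravity SDK sample.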

Barriers

The developer is now responsible for the use of barriers in their application to control when resources are ready for use in specific parts of the frame. Poor usage of barriers can lead to poor performance but the effects on the frame are not easily visible to the developer - until now. The Barriers UI gives the developer a list of barriers in use on the graphics queue, including the additional barriers inserted by the driver.

Note that in older profiles or if the barrier origin isn’t known, all barriers and layout transitions will be shown as ‘N/A’. Using an up-to-date display driver will ensure that this information is available.

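Developers are now responsible for using barriers in their applications to control when resources are ready for use in specific parts of the frame. Poor barrier usage can hurt performance, but until now its effect on the frame has not been easy for developers to see. The Barriers UI lists the barriers in use on the graphics queue, including the additional barriers inserted by the driver.

Note that in older profiles, or when the origin of a barrier is unknown, all barriers and layout transitions are shown as 'N/A'. Using an up-to-date display driver ensures that this information is available.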

The summary at the top left of the UI quickly lets the developer know if there is an issue with barrier usage in the frame. When calculating the percentage, only portions of a barrier’s duration which are not overlapped by one or more events from any queue are taken into consideration. For instance, if a barrier has a duration of 100 ns, but 80 ns of that barrier’s duration are overlapped by other events (on the same queue or on a different queue), then only 20 ns of that particular barrier contributes to the percentage calculation. In the case shown above, the barrier usage is taking up 0% of the frame.

This summary also displays the average number of barriers per draw or dispatch and the average number of events per barrier issue.

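The summary at the top left of the UI indicates whether there is a problem with barrier usage in the frame. The percentage only counts the portions of a barrier's duration that are not overlapped by one or more events from any queue. For example, if a barrier lasts 100 ns but 80 ns of it are overlapped by other events (on the same queue or a different one), only 20 ns of that barrier counts toward the percentage. In the case above, barrier usage takes up 0% of the frame.

The summary also shows the average number of barriers per draw or dispatch and the average number of events per barrier issued.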
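The overlap rule can be made concrete with a small interval calculation. The sketch below is an assumption about how such a percentage could be computed from (start, end) intervals in nanoseconds; the function names are hypothetical and it is not taken from RGP.

```python
def uncovered_ns(barrier, events):
    """Length of `barrier` (start, end) not overlapped by any event interval."""
    b_start, b_end = barrier
    covered = 0
    cursor = b_start
    for e_start, e_end in sorted(events):
        start = max(e_start, cursor)   # ignore time already counted as covered
        end = min(e_end, b_end)        # clip the event to the barrier
        if end > start:
            covered += end - start
            cursor = end
    return (b_end - b_start) - covered


def barrier_percentage(barriers, events, frame_ns):
    """Share of the frame taken by barrier time that no event overlaps."""
    exposed = sum(uncovered_ns(b, events) for b in barriers)
    return 100.0 * exposed / frame_ns


# A 100 ns barrier of which 80 ns are overlapped by two events.
barrier = (1000, 1100)
events = [(900, 1050), (1040, 1080)]
print(uncovered_ns(barrier, events))                      # 20
print(barrier_percentage([barrier], events, frame_ns=10_000))
```

The example reproduces the 100 ns / 80 ns / 20 ns case from the text: only the 20 ns that nothing else overlaps count against the frame.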

The table shows the following information:

Event Numbers - ID of the barrier - selecting an event in this UI will select it on the other Events windows

Duration - Lifetime of the barrier

Drain time - This is the amount of time the barrier spends waiting for the pipeline to drain, or work to finish. Once the pipeline is empty, new wavefronts can be dispatched

Stalls - The type of stalls associated with the barrier - where in the graphics pipe we need the work to drain from

Layout transitions - A blue check box indicates if the barrier is associated with a layout transition. There are six columns indicating the type of layout transition. These are described in the Layout transition section below.

Invalidated - A list of invalidated caches

Flushed - A list of flushed caches

Barrier type - Whether the barrier originated from the application or from the driver (or ‘N/A’ if unknown)

Reason for barrier - In the case of driver-inserted barriers, a brief description of why this barrier was inserted

The rows in the table can be sorted by clicking on a column header.

NOTE: Selecting a barrier in this list will select the same event in the other Event windows.

The user can also right-click on any of the rows and navigate to the Wavefront occupancy, Event timing, Instruction timing or Pipeline state panes and view the event represented by the selected row in these panes, as well as in the side panels. The user can also see the parent command buffer in the Frame summary pane or navigate to the Render/depth targets view and view the event in the timeline.

Below is a screenshot of what the right-click context menu looks like:

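The table contains the following information:

Event Numbers - ID of the barrier; an event selected in this UI is also selected in the other Event windows
Duration - Lifetime of the barrier
Drain time - Time the barrier spends waiting for the pipeline to drain, or for work to finish. Once the pipeline is empty, new wavefronts can be dispatched
Stalls - The type of stall associated with the barrier, that is, where in the graphics pipe the work needs to drain from
Layout transitions - A blue check box indicates whether the barrier is associated with a layout transition. Six columns indicate the type of layout transition; they are described in the Layout transition section below.
Invalidated - A list of invalidated caches
Flushed - A list of flushed caches
Barrier type - Whether the barrier originated from the application or from the driver (or 'N/A' if unknown)
Reason for barrier - For driver-inserted barriers, a brief description of why the barrier was inserted

The rows in the table can be sorted by clicking a column header.

Note: selecting a barrier in this list selects the same event in the other Event windows.

The user can also right-click any row and navigate to the Wavefront occupancy, Event timing, Instruction timing, or Pipeline state panes to view the event represented by the selected row there, as well as in the side panels. The user can also view the parent command buffer in the Frame summary pane, or navigate to the Render/depth targets view and see the event in the timeline.

Below is a screenshot of the right-click context menu: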

Layout Transitions

The following Layout Transition columns are shown in the Barriers table:

Depth/Stencil Decompress: This barrier is emitted when a depth/stencil surface is decompressed. Depth/stencil surfaces are often stored compressed to reduce bandwidth to and from the color and depth hardware units.

HiZ Range Resummarize: This barrier is emitted when a depth/stencil buffer, which has corresponding hierarchical Z-buffer data, is modified. This barrier ensures that the modified data is reflected into the hiZ-buffer, allowing for correct culling and depth testing.

DCC Decompress: This barrier is emitted when Delta Color Compression compressed color data needs to be decompressed.

FMask Decompress: This barrier is emitted when FMask data is decompressed. FMask is used to compress MSAA surfaces. These surfaces must be decompressed before they can be read by texture hardware units.

Fast Clear Eliminate: This barrier is emitted when the driver performs a fast clear. For fast clears, a barrier is needed to read the clear color before filling the render target. Clearing to specific values (typically 0.0 or 1.0) may allow the GPU to skip the eliminate operation.

Init Mask RAM: This barrier is emitted when the driver uses a shader to initialize memory used for compression.

See Getting the Most Out of Delta Color Compression - AMD GPUOpen for more information on what may cause a DCC Decompress or what “clear” values can be used to skip Fast Clear Eliminates.

Context rolls

NOTE: This UI is only available for DirectX and Vulkan profiles.

Context rolling is a hardware feature specific to the RDNA and GCN graphics architecture and needs to be taken into consideration when optimizing draws for AMD GPUs. Each draw requires a set of hardware context registers that describe the rendering state for that specific draw. When a new draw that requires a different render state enters the pipeline, an additional set of context registers is required. The process of assigning a set of context registers is called context rolling. A set of context registers follows the draw through the graphics pipeline until it is completed. On completion of the draw, that associated set of registers is free to be used by the next incoming draw.

On RDNA and GCN hardware there are 8 logical banks of context registers, of which only seven are available for draws. The worst-case scenario is that 8 subsequent draws each require a unique set of context registers. In this scenario the last draw has to wait for the first draw to finish before it can use the context registers. This causes a stall that can be measured and visualized by RGP. On RDNA2 hardware, while there are still 8 banks of context registers, one entire bank, typically bank 2, is reserved by the hardware and will typically appear completely empty in the Context rolls pane.

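Note: this UI is only available for DirectX and Vulkan profiles.

Context rolling is a hardware feature specific to the RDNA and GCN graphics architectures, and it has to be considered when optimizing draws for AMD GPUs. Each draw needs a set of hardware context registers that describe the rendering state for that draw. When a new draw that needs a different render state enters the pipeline, another set of context registers is required. Assigning a set of context registers is called context rolling. A set of context registers follows its draw through the graphics pipeline until the draw completes, and only then are the registers released for the next incoming draw.

On RDNA and GCN hardware there are 8 logical banks of context registers, of which only 7 are available for draws. The worst case is 8 consecutive draws that each need their own set of context registers. The last draw then has to wait for the first draw to finish before it can use the context registers, which causes a stall that RGP can measure and visualize. RDNA2 hardware still has 8 banks, but one entire bank, typically bank 2, is reserved by the hardware and usually appears completely empty in the Context rolls pane.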
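The worst case above can be illustrated with a toy model. This is a deliberately simplified sketch, not a description of the actual hardware scheduler: it assumes every draw needs its own bank, that draws are listed in issue order, and that a bank is held from the moment the draw starts until it finishes. The function name and the time units are hypothetical.

```python
import heapq


def simulate_context_banks(draws, usable_banks=7):
    """draws: list of (issue_time, duration); every draw needs its own bank.

    Returns how long each draw stalls while waiting for a free bank.
    """
    in_flight = []  # min-heap of finish times, one entry per occupied bank
    stalls = []
    for issue, duration in draws:
        # Release banks whose draws have already finished.
        while in_flight and in_flight[0] <= issue:
            heapq.heappop(in_flight)
        start = issue
        if len(in_flight) >= usable_banks:
            # All banks busy: wait for the earliest draw to complete.
            start = heapq.heappop(in_flight)
        stalls.append(start - issue)
        heapq.heappush(in_flight, start + duration)
    return stalls


# Eight back-to-back draws, each with unique state and a duration of 100.
draws = [(t, 100) for t in range(8)]
print(simulate_context_banks(draws))  # [0, 0, 0, 0, 0, 0, 0, 93]
```

With 7 usable banks, the first seven draws start immediately and the eighth waits until the first draw finishes at time 100, which is the stall the text describes.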

In the example above, a DirectX 12 application, we can see that there are 223 context rolls in the frame and none of them are redundant. The Radeon GPU Profiler compares the context register values across state changes to calculate if the context roll was redundant. Redundant context rolls can be caused by the application and the driver. Ineffective draw batching can be a cause on the application’s end.

In addition, the meter shows the number of context rolls as a percentage of the number of draw calls, giving a visual indication of how efficient the frame is with regards to changing state. A lower percentage indicates that, on average, more draw calls are sharing state across the frame. This meter also shows a breakdown of Active vs. Redundant context rolls.

The chart to the right shows the number of events in each context.

The table underneath shows the state from the API’s perspective, and which parts of the state were involved in context rolls. The first column indicates how many context rolls it was involved in. The second column indicates how many of these changes were redundant with respect to the state (the state was written with the exact same value or another piece of state was changed). The next column indicates the number of context rolls that were completely redundant (the whole context was redundant, not just the state). The final column shows the number of context rolls of this state where this was the only thing that changed in the event.

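In the DirectX 12 example above, the frame contains 223 context rolls and none of them are redundant. The Radeon GPU Profiler compares context register values across state changes to work out whether a context roll was redundant. Redundant context rolls can be caused by the application or by the driver. On the application side, ineffective draw batching is one possible cause.

In addition, the meter shows the number of context rolls as a percentage of the number of draw calls, which indicates how efficient the frame is at changing state. A lower percentage means that, on average, more draw calls share state across the frame. The meter also breaks the context rolls down into Active and Redundant.

The chart on the right shows the number of events in each context.

The table underneath shows the state from the API's perspective and which parts of the state were involved in context rolls. The first column is the number of context rolls the state was involved in. The second column is how many of those changes were redundant with respect to the state (the state was written with the same value, or another piece of state was changed). The next column is the number of context rolls that were completely redundant (the whole context, not just this state). The last column is the number of context rolls in which this state was the only thing that changed in the event.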
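The redundancy check and the meter's percentage can be sketched like this. The model is an assumption made for illustration: each draw is a pair of (context register values, whether context state was written before the draw), and a roll is treated as redundant when the written values match the previous draw's values. The register names are made up.

```python
def summarize_context_rolls(draws):
    """draws: list of (state_dict, rolled) pairs in submission order."""
    rolls = redundant = 0
    previous = None
    for state, rolled in draws:
        if rolled:
            rolls += 1
            # Redundant: the roll rewrote exactly the values already in use.
            if previous is not None and state == previous:
                redundant += 1
        previous = state
    return {
        "rolls": rolls,
        "redundant": redundant,
        "active": rolls - redundant,
        "rolls_per_draw_pct": 100.0 * rolls / len(draws),
    }


draws = [
    ({"blend": 0, "depth": 1}, True),   # first draw sets up state
    ({"blend": 0, "depth": 1}, False),  # shares state, no roll
    ({"blend": 1, "depth": 1}, True),   # real state change
    ({"blend": 1, "depth": 1}, True),   # same values rewritten: redundant
]
print(summarize_context_rolls(draws))
```

Here three of four draws roll context (75%), and one of those three rolls is redundant, leaving two active rolls.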

Selecting an API-state shows all the draw calls in the second table, called the Events table, that rolled context due to this state changing, with or without other states changing too.

The Filter API-states… field in the top-right corner of the state table filters the state tree in real-time as you type. Only the state containing the filter text string will be shown.

NOTE: Selecting an event in this list will select the same event in the other Event windows.

The user can also right-click on any of the rows and navigate to Wavefront occupancy, Event timing or Pipeline state panes and view the event represented by the selected row in these panes, as well as in the side panels. Below is a screenshot of what the right-click context menu looks like.

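After an API state is selected, the second table (the Events table) shows all the draw calls that rolled context because that state changed, whether or not other states changed as well.

The “Filter API-states…” field in the top-right corner of the state table filters the state tree in real time as you type. Only states containing the filter text are shown.

Note: selecting an event in this list selects the same event in the other Event windows.

The user can also right-click any row and navigate to the Wavefront occupancy, Event timing, or Pipeline state panes to view the event represented by the selected row there, as well as in the side panels. Below is a screenshot of the right-click context menu.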

_images/rgp_context_rolls_3.png

NOTE: When selecting events on the event panes and using the right-click context menu to jump between panes, the option to “View in context rolls” will only be available if the selected event is currently present in the events table on the context rolls pane.

In the events panes selecting the “context rolls” option from the “Color By” drop down box in the Wavefront occupancy event timeline or the Event timing pane shows all events that have had their context rolled from the previous event.

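Note: when selecting events in the event panes and using the right-click context menu to jump between panes, the “View in context rolls” option is only available if the selected event is currently present in the events table of the Context rolls pane.

In the event panes, choosing “context rolls” from the “Color By” drop-down box in the Wavefront occupancy event timeline or the Event timing pane shows all events whose context was rolled from the previous event.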

Most expensive events

The Most Expensive events UI allows the developer to quickly locate the most expensive events by duration. At the top of the window is a histogram of the event durations. The least expensive events are to the left of the graph and the most expensive to the right. A blue summary bar with an arrow points to the bucket that is the most costly by time. The events in this bucket are most in need of optimization. The double slider below the chart can be used to select different regions of the histogram. The summary and table below will update as the double slider’s position is changed. In the example below we can see that the most expensive 5% of events take 51% of the frame time.

Below the histogram is a summary of the frame. In this case, the top 15% of events take 99% of the frame time, with 52% of the selected region consisting of graphics events and 48% async compute events.

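The table below the summary shows a list of the events in the selected region with the most expensive at the top of the list.

The Most Expensive events UI lets the developer quickly find the events with the longest durations. At the top of the window is a histogram of event durations, with the least expensive events on the left and the most expensive on the right. A blue summary bar with an arrow points to the bucket that costs the most time; the events in that bucket are the ones most in need of optimization. The double slider below the chart selects different regions of the histogram, and the summary and table update as the slider moves. In the example below, the most expensive 5% of events take 51% of the frame time.

Below the histogram is a summary of the frame. In this case, the top 15% of events take 99% of the frame time, and the selected region is 52% graphics events and 48% async compute events.

The table below the summary lists the events in the selected region, with the most expensive at the top.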
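A statement such as "the most expensive 5% of events take 51% of the frame time" can be reproduced from a list of event durations. The sketch below is an approximation under a stated assumption: it uses the sum of event durations as the frame time, which ignores any overlap between events. The function name and the sample numbers are made up.

```python
import math


def top_share(durations, top_fraction):
    """Share of total time taken by the most expensive `top_fraction` of events."""
    ordered = sorted(durations, reverse=True)
    count = max(1, math.ceil(len(ordered) * top_fraction))
    return sum(ordered[:count]) / sum(ordered)


# Ten events: two expensive ones and eight cheap ones (arbitrary time units).
durations = [600, 200] + [25] * 8
print(top_share(durations, 0.10))  # 0.6: the top 10% of events take 60% of the time
print(top_share(durations, 0.20))  # 0.8
```

Moving the double slider in the UI corresponds to changing which slice of the sorted durations is summed.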

NOTE: Selecting an event in this list will select the same event in the other Event windows.

The user can also right-click on any of the rows and navigate to Wavefront occupancy, Event timing or Pipeline state panes and view the event represented by the selected row in these panes, as well as in the side panels. Below is a screenshot of what the right-click context menu looks like.

The API Shader Stage Control shown in the last column of the table indicates which API shader stages are active in the pipeline used by the given event.

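Note: selecting an event in this list selects the same event in the other Event windows.

The user can also right-click any row and navigate to the Wavefront occupancy, Event timing, or Pipeline state panes to view the event represented by the selected row there, as well as in the side panels. Below is a screenshot of the right-click context menu.

The API Shader Stage Control in the last column of the table indicates which API shader stages are active in the pipeline used by the given event.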

Render/depth targets

NOTE: This UI is only available for DirectX and Vulkan profiles.

This UI provides an overview of all buffers that have been used as render targets in draw calls throughout the frame.

The screen is split into two sections, a timeline view and a tree view listing:

The graphical timeline view illustrates the usage of render targets over the duration of the frame. Other events like dispatches, copies, clears and barriers are shown at the bottom of this view.

Zoom controls can be used to focus in on a section of the timeline. More information on zoom controls can be found under the Zoom Controls section. Each solid block in this view represents a series of events that overlap and draw to the same render target within the same pass. A single click on one of these highlights the corresponding entry in the tree view.

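Note: this UI is only available for DirectX and Vulkan profiles.

This UI gives an overview of all buffers that were used as render targets during the frame.

The screen is split into two sections, a timeline view and a tree view listing:

The graphical timeline view shows how render targets are used over the duration of the frame. Other events, such as dispatches, copies, clears, and barriers, are shown at the bottom.

Zoom controls can be used to focus on a section of the timeline; see the Zoom Controls section for details. Each solid block in the view represents a series of events that overlap and draw to the same render target within the same pass. Clicking a block highlights the corresponding entry in the tree view.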

This section lists all of the render targets and their properties found in the frame. Based on the active grouping mode it either shows a top-level listing of render targets or passes. The grouping can be configured in two ways:

Group by target: The top level consists of all render targets found in the frame, plus per-frame stats. Child entries show per-pass stats for each render target.

Group by pass: The top level consists of all passes found in the frame. Child entries show per-pass stats for each render target.

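This section lists all render targets found in the frame, along with their properties. The active grouping mode determines whether the top level lists render targets or passes:

Group by target: the top level lists the render targets, plus per-frame stats. The second level shows per-pass stats for each render target.
Group by pass: the top level lists the passes. The second level shows per-render-target stats for each pass.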

Here are the currently available columns:

Legend: The color of the render target in the timeline.

Name: The name of the render target. Currently this is sequential and based on the first occurrence of each render target in the frame.

Format: The format of each render target.

Width: Width of the render target.

Height: Height of the render target.

Draw calls: Number of draw calls that output to this render target.

Compression: Indicates whether compression is enabled for this render target or not.

Sample count: MSAA sample count of the render target.

Out of order draw calls: Number of out of order draw calls issued to this render target. This column is not shown for profiles taken on RDNA GPUs.

Duration: The total duration of all the events that rendered to the render target. For example, if 3 events write to a depth buffer the duration will be the sum of these 3 event durations.

The rows in the table can be sorted by clicking on a column header.

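The currently available columns are:

Legend: the color of the render target in the timeline.
Name: the name of the render target. Currently this is sequential, based on the first occurrence of each render target in the frame.
Format: the format of each render target.
Width: the width of the render target.
Height: the height of the render target.
Draw calls: the number of draw calls that output to this render target.
Compression: whether compression is enabled for this render target.
Sample count: the MSAA sample count of the render target.
Out of order draw calls: the number of out of order draw calls issued to this render target. This column is not shown for profiles taken on RDNA GPUs.
Duration: the total duration of all events that rendered to the render target. For example, if 3 events write to a depth buffer, the duration is the sum of those 3 event durations.

The rows in the table can be sorted by clicking a column header.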
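The "Group by target" mode and the Draw calls and Duration columns can be sketched as a simple aggregation. The record layout (target name, pass index, duration in nanoseconds) and the target names are assumptions made for this example; the profile format itself is not described here.

```python
from collections import defaultdict


def group_by_target(events):
    """events: iterable of (target, pass_id, duration_ns) draw-call records."""
    stats = defaultdict(
        lambda: {"draw_calls": 0, "duration_ns": 0, "passes": defaultdict(int)}
    )
    for target, pass_id, duration in events:
        entry = stats[target]
        entry["draw_calls"] += 1            # Draw calls column
        entry["duration_ns"] += duration    # Duration column: plain sum
        entry["passes"][pass_id] += duration  # child entries: per-pass stats
    return stats


events = [
    ("RT0", 0, 120), ("DS0", 0, 80),
    ("RT0", 1, 300), ("DS0", 1, 40), ("DS0", 1, 60),
]
stats = group_by_target(events)
print(stats["DS0"]["draw_calls"], stats["DS0"]["duration_ns"])  # 3 180
print(dict(stats["RT0"]["passes"]))  # {0: 120, 1: 300}
```

As in the depth buffer example from the text, three events writing to `DS0` give a duration equal to the sum of the three event durations. Grouping by pass would use `pass_id` as the outer key instead.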

NOTE:

Selecting any item in either the timeline view or the tree view will select the corresponding item in the other view.

Selecting any item in either the timeline view or the tree view will select the earliest event represented by that item in other sections of the tool.

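Note:

Selecting any item in either the timeline view or the tree view selects the corresponding item in the other view.
Selecting any item in either the timeline view or the tree view selects the earliest event represented by that item in other sections of the tool.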

Pipelines

This overview pane provides details of the pipeline usage in the profile.

The pane is divided into three sections:

Pipeline summary - Displays a list of each pipeline API configuration found in the profile.

Pipelines - Displays a table with an entry for each pipeline found in the profile and child entries for each shader stage active in the pipeline.

Events - Displays all events that use the selected pipeline in the Pipelines table.

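The pane is divided into three sections:

Pipeline summary - A list of each pipeline API configuration found in the profile.

Pipelines - A table of the pipelines in the profile, with child entries for each shader stage active in the pipeline.

Events - All events that use the pipeline selected in the Pipelines table.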

Pipeline summary

The pipeline summary section displays all unique pipeline configurations colored by API shader stage.

Pipelines that have the same set of active API shader stages are treated as one unique configuration.

Next to each configuration is a count of how many pipelines in the profile matched the configuration.

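The pipeline summary section shows all unique pipeline configurations, colored by API shader stage. Here, "unique" means that pipelines with the same set of active API shader stages are counted as one configuration. Next to each configuration is the number of pipelines in the profile that match it.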
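Counting pipelines per configuration amounts to grouping them by their set of active stages. The sketch below assumes each pipeline is given as a collection of stage names; the stage abbreviations are illustrative.

```python
from collections import Counter


def pipeline_summary(pipelines):
    """pipelines: iterable of stage collections, e.g. ["VS", "PS"].

    Returns a Counter keyed by the set of active stages, so the order in
    which stages are listed does not matter.
    """
    return Counter(frozenset(stages) for stages in pipelines)


summary = pipeline_summary([
    ["VS", "PS"],
    ["PS", "VS"],        # same configuration as the first pipeline
    ["CS"],
    ["VS", "GS", "PS"],
])
for stages, count in summary.items():
    print(sorted(stages), count)
```

The first two pipelines fall into the same configuration, so the summary holds three configurations for four pipelines.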

Pipelines

The Pipelines section contains a table with an entry for each pipeline found in the profile.

Each entry in the table displays the following information:

Bucket ID - ID to match pipeline to event state bucket used for grouping in other panes.

Hash - 128-bit pipeline hash and API shader hash.

Duration - The pipeline duration is the sum of the durations of all events which use this pipeline (overlapped areas only counted once). The shader stage duration displayed for child items in the table is the sum of the stage-specific shader durations for all events which use this pipeline (overlapped areas are only counted once).

Event count - Number of events which use the pipeline and percentage out of total number of events in profile.

Avg event duration - Average duration of events using this pipeline in the profile.

Occupancy - Occupancy range and per-shader-stage occupancy for each pipeline.

VGPRs - VGPR range and per-shader-stage VGPR usage for each pipeline.

SGPRs - SGPR range and per-shader-stage SGPR usage for each pipeline.

Scratch mem - Yes/No to indicate if the pipeline uses scratch memory.

Wave mode - wave32/wave64 to indicate the mode of the shader. This column only appears for devices that support wave32 vs. wave64.

Stages - The API Shader Stage Control indicating which stages are active for given pipeline.
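The Duration column counts overlapped time only once, which is the length of the union of the event intervals rather than their plain sum. A minimal sketch, assuming events are given as (start, end) pairs in the same time unit (the function name is hypothetical):

```python
def union_duration(intervals):
    """Total time covered by (start, end) intervals, counting overlaps once."""
    total = 0
    cursor = None  # end of the time range counted so far
    for start, end in sorted(intervals):
        if cursor is None or start > cursor:
            total += end - start      # disjoint from what was counted
            cursor = end
        elif end > cursor:
            total += end - cursor     # only the part past the overlap
            cursor = end
    return total


# Three events using one pipeline; the first two overlap by 50.
events = [(0, 100), (50, 150), (300, 350)]
pipeline_duration = union_duration(events)
avg_event_duration = sum(end - start for start, end in events) / len(events)
print(pipeline_duration)             # 200
print(round(avg_event_duration, 2))  # 83.33
```

The pipeline duration is 200 even though the individual event durations add up to 250, while the average event duration still uses each event's own length.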

The Filter pipelines… field can be used to filter items in the list by the API PSO hash. The Pipelines table can be sorted by clicking on a column header.

Right-clicking a pipeline in the pipeline summary section displays a context menu giving the option to “Analyze pipeline in Radeon GPU Analyzer.” Selecting the option saves the pipeline in a binary format and opens the binary file in the Radeon GPU Analyzer. See the section Radeon GPU Analyzer and Radeon GPU Profiler interop for more information.

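The “Filter pipelines…” field filters the items in the list by API PSO hash. The Pipelines table can be sorted by clicking a column header.

Right-clicking a pipeline in the pipeline summary section opens a context menu with the option “Analyze pipeline in Radeon GPU Analyzer.” Selecting it saves the pipeline in a binary format and opens that binary file in the Radeon GPU Analyzer. See the section Radeon GPU Analyzer and Radeon GPU Profiler interop for more information.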

Below the table, the Bucket ID, API PSO hash and Driver internal pipeline hash for the currently-selected pipeline are displayed. There is also a quick link to view the selected pipeline in the Pipeline state view. This will navigate to the Pipeline state view for the first event associated with the pipeline.

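Below the table are the Bucket ID, API PSO hash, and Driver internal pipeline hash of the currently selected pipeline. There is also a quick link for viewing the selected pipeline in the Pipeline state view, which navigates to the Pipeline state view of the first event associated with the pipeline.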

Events

The Events table displays all events which use the currently-selected pipeline in the Pipelines table.

Each entry in the table displays the following information:

Event ID - ID for event

Event - Event text displaying the API or Driver call for event

Duration - Time event spent during frame in profile

The Events table can be sorted by clicking on a column header.

As with all event lists in RGP, the user can right-click to quickly navigate to the event in other panes.

Device configuration

This UI reports the GPU configuration of the system that was used to generate the profile. The Radeon Developer Panel can retrieve profiles from remote systems so the GPU details can be different from the system that you are using to view the data. The clock frequencies refer to the clock frequency running when the capture was taken. The number in parentheses represents the peak clock frequency the graphics hardware can run at.

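This UI reports the GPU configuration of the system that generated the profile. The Radeon Developer Panel can retrieve profiles from remote systems, so the GPU details may differ from those of the system you are using to view the data. The clock frequencies are the ones that were running when the capture was taken. The number in parentheses is the peak clock frequency the graphics hardware can run at.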