UE4 CPU Profiling

1649 篇文章 12 订阅
1623 篇文章 23 订阅


If you are CPU bound in the render thread, it is likely because of too many draw calls. This is a common problem and artists often have to combine draw calls to reduce the cost for that (e.g. combine multiple walls into one mesh). The actual cost is in many areas:

  • The render thread needs to process each object (culling, material setup, lighting setup, collision, update cost, etc.). More complex materials result in a higher setup cost.

  • The render thread needs to prepare the GPU commands to set up the state for each draw call (constant buffers, textures, instance properties, shaders) and to do the actual API call. Base pass draw calls are usually more costly than depth only draw calls.

  • DirectX validates some data and passes the information to the graphics card driver.

  • The driver (e.g. NVIDIA, AMD, Intel, ...) validates further and creates a command buffer for the hardware. Sometimes this part is split in another thread.

Mesh Draw Calls displayed when using the stats commands show the draw calls caused by 3D meshes - this is the number artists can reduce by:

  • Reducing object count (static/dynamic meshes, mesh particles)

  • Reducing view distance (e.g. on the Scene Capture Actor or per object)

  • Adjusting the view (more zoomed out view, moving objects to not be in the same view)

  • Avoiding SceneCaptureActor (needs to re-render the scene, set fps to be low or update only when needed)

  • Avoiding split screen (Always more CPU bound than single view, needs custom scalability settings or content to be more aggressive)

  • Reducing Elements per draw calls (combine materials accepting more complex pixel shaders or simply have less materials, combine textures to fewer larger textures - only if that reduces material count, use LOD models with fewer elements)

  • Disabling features on the mesh like custom depth or shadow casting

  • Changing light sources to not shadow cast or having a tighter bounding volume (view cone, attenuation radius)

In some cases, hardware instancing might be an option (same 3d model, same shader, only few parameters change, hardware needs to support it). Hardware instancing reduces the driver overhead per draw call a lot but it limits the flexibility. We use it for mesh particles and for InstancedFoliage.

SceneRendering.png
CONSOLE: stat SceneRendering

Experience shows that on a high end PC, you can have thousands of draw calls per frame (DirectX11, OpenGL). Newer APIs (AMD Mantle, DirectX12) try to address the driver overhead and can do larger numbers. On mobile, the number is more in the hundreds (OpenGL ES2, OpenGL ES3) but even there the driver overhead can be greatly reduced (Apple Metal).

If you are CPU bound in the game thread, you need to look into what game code is causing this issue (e.g. Blueprints, raycasts, physics, artificial intelligence, memory allocation).

Game.png
CONSOLE: stat Game

The build in CPU profiler can help you to find the functions causing the issue:

DumpFrame.png
CONSOLE: stat DumpFrame -ms=0.1

Here we used a threshold of 0.1 milliseconds to customize the output. After running the command, you can find the result in the log and in the console. The hierarchy shows the the time in milliseconds and the call count. If needed, you can add QUICK_SCOPE_CYCLE_COUNTER to the code to refine the hierarchy further like the following example:

virtual void DrawDynamicElements(FPrimitiveDrawInterface* PDI,const FSceneView* View) override
{
    QUICK_SCOPE_CYCLE_COUNTER( STAT_BoxSceneProxy_DrawDynamicElements );

    const FMatrix& LocalToWorld = GetLocalToWorld();
    const FColor DrawColor = GetSelectionColor(BoxColor, IsSelected(), IsHovered(), false);
    DrawOrientedWireBox(PDI, LocalToWorld.GetOrigin(), ... );
}

If you are not CPU bound, then you are GPU bound.

If you are CPU bound in the render thread, it is likely because of too many draw calls. This is a common problem and artists often have to combine draw calls to reduce the cost for that (e.g. combine multiple walls into one mesh). The actual cost is in many areas:

  • The render thread needs to process each object (culling, material setup, lighting setup, collision, update cost, etc.). More complex materials result in a higher setup cost.

  • The render thread needs to prepare the GPU commands to set up the state for each draw call (constant buffers, textures, instance properties, shaders) and to do the actual API call. Base pass draw calls are usually more costly than depth only draw calls.

  • DirectX validates some data and passes the information to the graphics card driver.

  • The driver (e.g. NVIDIA, AMD, Intel, ...) validates further and creates a command buffer for the hardware. Sometimes this part is split in another thread.

Mesh Draw Calls displayed when using the stats commands show the draw calls caused by 3D meshes - this is the number artists can reduce by:

  • Reducing object count (static/dynamic meshes, mesh particles)

  • Reducing view distance (e.g. on the Scene Capture Actor or per object)

  • Adjusting the view (more zoomed out view, moving objects to not be in the same view)

  • Avoiding SceneCaptureActor (needs to re-render the scene, set fps to be low or update only when needed)

  • Avoiding split screen (Always more CPU bound than single view, needs custom scalability settings or content to be more aggressive)

  • Reducing Elements per draw calls (combine materials accepting more complex pixel shaders or simply have less materials, combine textures to fewer larger textures - only if that reduces material count, use LOD models with fewer elements)

  • Disabling features on the mesh like custom depth or shadow casting

  • Changing light sources to not shadow cast or having a tighter bounding volume (view cone, attenuation radius)

In some cases, hardware instancing might be an option (same 3d model, same shader, only few parameters change, hardware needs to support it). Hardware instancing reduces the driver overhead per draw call a lot but it limits the flexibility. We use it for mesh particles and for InstancedFoliage.

SceneRendering.png
CONSOLE: stat SceneRendering

Experience shows that on a high end PC, you can have thousands of draw calls per frame (DirectX11, OpenGL). Newer APIs (AMD Mantle, DirectX12) try to address the driver overhead and can do larger numbers. On mobile, the number is more in the hundreds (OpenGL ES2, OpenGL ES3) but even there the driver overhead can be greatly reduced (Apple Metal).

If you are CPU bound in the game thread, you need to look into what game code is causing this issue (e.g. Blueprints, raycasts, physics, artificial intelligence, memory allocation).

Game.png
CONSOLE: stat Game

The build in CPU profiler can help you to find the functions causing the issue:

DumpFrame.png
CONSOLE: stat DumpFrame -ms=0.1

Here we used a threshold of 0.1 milliseconds to customize the output. After running the command, you can find the result in the log and in the console. The hierarchy shows the the time in milliseconds and the call count. If needed, you can add QUICK_SCOPE_CYCLE_COUNTER to the code to refine the hierarchy further like the following example:

virtual void DrawDynamicElements(FPrimitiveDrawInterface* PDI,const FSceneView* View) override
{
    QUICK_SCOPE_CYCLE_COUNTER( STAT_BoxSceneProxy_DrawDynamicElements );

    const FMatrix& LocalToWorld = GetLocalToWorld();
    const FColor DrawColor = GetSelectionColor(BoxColor, IsSelected(), IsHovered(), false);
    DrawOrientedWireBox(PDI, LocalToWorld.GetOrigin(), ... );
}

If you are not CPU bound, then you are GPU bound.

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值