Good performance is critical to the success of many games. Below are some simple guidelines for maximizing the speed of your game's graphical rendering.
Where are the graphics costs
The graphical parts of your game primarily cost performance on two systems of the computer: the GPU and the CPU. The first rule of any optimization is to find where the performance problem is.
Typical bottlenecks and ways to check for them:
- GPU is often limited by fillrate or memory bandwidth.
  - Does running the game at lower display resolution make it faster? If so, you're most likely limited by fillrate on the GPU.
- CPU is often limited by the number of things that need to be rendered, also known as "draw calls".
  - Check "draw calls" in the Rendering Statistics window; if it's more than several thousand (for PCs) or several hundred (for mobile), then you might want to optimize the object count.
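The two checks above can be written down as a small decision helper. This is only a sketch in Python, not anything Unity provides: the function name `classify_bottleneck`, the 20% speed-up margin, and the concrete budgets of 3000 and 300 draw calls are assumptions standing in for "several thousand" and "several hundred".

```python
def classify_bottleneck(ms_full_res, ms_half_res, draw_calls, platform="pc"):
    """Return rough hints about where the frame time is going.

    ms_full_res / ms_half_res: measured frame times (milliseconds) at the
    normal and at a lowered display resolution.
    All thresholds below are illustrative assumptions, not engine constants.
    """
    hints = []

    # Check 1: a clearly faster frame at lower resolution suggests the GPU
    # is limited by fillrate. "Clearly faster" is assumed to mean 20%.
    if ms_half_res < ms_full_res * 0.8:
        hints.append("gpu-fillrate")

    # Check 2: compare the draw call count with the rule of thumb.
    # 3000 (PC) and 300 (mobile) are assumed stand-ins for
    # "several thousand" and "several hundred".
    budget = 3000 if platform == "pc" else 300
    if draw_calls > budget:
        hints.append("cpu-draw-calls")

    return hints


print(classify_bottleneck(20.0, 10.0, draw_calls=500))             # ['gpu-fillrate']
print(classify_bottleneck(16.0, 15.5, draw_calls=800, platform="mobile"))  # ['cpu-draw-calls']
```

An empty result means neither rule of thumb fired, which is the cue to look for the less typical bottlenecks.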
Of course, these are only rules of thumb; the bottleneck could just as well be somewhere else. Less typical bottlenecks:
- Rendering is not a problem, neither on the GPU nor the CPU! For example, your scripts or physics might be the actual problem. Use the Profiler to figure this out.
- GPU has too many vertices to process. How many vertices are "ok" depends on the GPU and the complexity of vertex shaders. Typical figures are "not more than 100 thousand" on mobile, and "not more than several million" on PC.
- CPU has too many vertices to process, for things that do vertex processing on the CPU. This could be skinned meshes, cloth simulation, particles etc.
CPU optimization - draw call count
In order to render any object on the screen, the CPU has some work to do - things like figuring out which lights affect that object, setting up the shader & shader parameters, sending drawing commands to the graphics driver, which then prepares the commands to be sent off to the graphics card. All this "per object" CPU cost is not very cheap, so if you have lots of visible objects, it can add up.
So for example, if you have a thousand triangles, it will be much, much cheaper if they are all in one mesh, instead of having a thousand individual meshes one triangle each. The cost of both scenarios on the GPU will be very similar, but the work done by the CPU to render a thousand objects (instead of one) will be significant.
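The thousand-triangle comparison can be made concrete with a toy cost model. The cost units below are hypothetical and chosen only to illustrate the argument; real per-object and per-triangle costs depend on the hardware, the driver and the shaders.

```python
# Hypothetical cost units, chosen only to illustrate the argument.
CPU_COST_PER_OBJECT = 100   # assumed fixed CPU work per rendered object
GPU_COST_PER_TRIANGLE = 1   # assumed GPU work per triangle


def frame_cost(object_count, triangles_per_object):
    """Return (cpu_cost, gpu_cost) for drawing a set of identical objects."""
    cpu_cost = object_count * CPU_COST_PER_OBJECT
    gpu_cost = object_count * triangles_per_object * GPU_COST_PER_TRIANGLE
    return cpu_cost, gpu_cost


one_mesh = frame_cost(object_count=1, triangles_per_object=1000)
many_meshes = frame_cost(object_count=1000, triangles_per_object=1)

print(one_mesh)     # (100, 1000)
print(many_meshes)  # (100000, 1000)
```

In this model the GPU cost is identical in both cases, while the CPU cost grows with the object count.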
In order to make the CPU do less work, it's good to reduce the visible object count:
- Combine close objects together, either manually or using Unity's draw call batching.
- Use fewer materials in your objects, by putting separate textures into a larger texture atlas and so on.
- Use fewer things that cause objects to be rendered multiple times (reflections, shadows, per-pixel lights etc., see below).
Combine objects together so that each mesh has at least several hundred triangles and uses only one material.
However, when using many pixel lights in the
GPU: Optimizing Model Geometry
When optimizing the geometry of a model, there are two basic rules:
- Don't use any more triangles than necessary
- Try to keep the number of UV mapping seams and hard edges (doubled-up vertices) as low as possible
Note that the actual number of vertices that graphics hardware has to process is usually not the same as the number reported by a 3D application. Modeling applications usually display the geometric vertex count, i.e. the number of distinct corner points that make up a model. For a graphics card, however, some geometric vertices will need to be split into two or more logical vertices for rendering purposes. A vertex must be split if it has multiple normals, UV coordinates or vertex colors. Consequently, the vertex count in Unity is invariably higher than the count given by the 3D application.
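A hard-edged cube is a small worked example of this splitting. The sketch below is plain Python with a hypothetical helper name; it does not use any Unity or modeling API. It counts distinct positions (the geometric vertices) and distinct position/normal pairs (the logical vertices a graphics card would need), and ignores UVs and vertex colors for simplicity.

```python
from itertools import product


def cube_face_corners():
    """Yield (position, normal) for the 4 corners of each of a cube's 6 faces."""
    for axis in range(3):                 # x, y, z
        for sign in (-1, 1):              # the two faces along that axis
            normal = tuple(sign if i == axis else 0 for i in range(3))
            others = [i for i in range(3) if i != axis]
            for a, b in product((-1, 1), repeat=2):
                position = [0, 0, 0]
                position[axis] = sign
                position[others[0]] = a
                position[others[1]] = b
                yield tuple(position), normal


corners = list(cube_face_corners())

# Geometric vertices: distinct corner points, as a modeling tool would count.
geometric_vertices = {position for position, _ in corners}

# Logical vertices: a corner shared by three faces carries three different
# normals, so it has to be stored three times.
logical_vertices = set(corners)

print(len(geometric_vertices))  # 8
print(len(logical_vertices))    # 24
```

Smoothing the edges, so that every corner has a single normal, would bring the logical count back down to 8 in this simplified model.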
While the amount of geometry in the models is mostly relevant for the GPU, some features in Unity also process models on the CPU, for example mesh skinning.
Lighting Performance
Lighting which is not computed at all is always the fastest! Use lightmapping to bake static lighting instead of computing it each frame:
- It is going to run a lot faster (2-3 times for 2 per-pixel lights)
- And it will look a lot better since you can bake global illumination and the lightmapper can smooth the results
In a lot of cases, simple tricks in shaders and content can replace adding more lights all over the place. For example, instead of adding a light that shines straight into the camera to get a "rim lighting" effect, consider adding a dedicated "rim lighting" computation into your shaders directly.
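As an illustration of what such a dedicated rim computation might look like, here is the arithmetic written out in Python rather than shader code. It uses one common formulation, where the rim term grows as the surface normal turns away from the view direction; the function names and the `rim_power` parameter are assumptions for this sketch and are not taken from any built-in Unity shader.

```python
import math


def normalize(v):
    """Scale a 3-component vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def rim_factor(normal, view_dir, rim_power=3.0):
    """Rim term in [0, 1]: 0 where the surface faces the viewer, 1 at grazing angles.

    rim_power is an assumed artistic control; higher values give a thinner rim.
    """
    n = normalize(normal)
    v = normalize(view_dir)
    facing = sum(a * b for a, b in zip(n, v))      # dot(N, V)
    facing = max(0.0, min(1.0, facing))            # clamp to [0, 1]
    return (1.0 - facing) ** rim_power


print(rim_factor((0, 0, 1), (0, 0, 1)))  # 0.0 (surface faces the camera)
print(rim_factor((1, 0, 0), (0, 0, 1)))  # 1.0 (surface seen edge-on)
```

In a shader the result would typically be multiplied by a rim color and added to the surface color, which costs a few instructions per pixel instead of an extra light.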
Lights in forward rendering
Per-pixel dynamic lighting will add significant rendering overhead to every affected pixel and can lead to objects being rendered in multiple passes. On less powerful devices, like mobile or low-end PC GPUs, avoid having more than one
If you use pixel lighting then each mesh has to be rendered as many times as there are pixel lights illuminating it. If you combine two meshes that are very far apart, it will increase the effective size of the combined object. All pixel lights that illuminate any part of this combined object will be taken into account during rendering, so the number of rendering passes that need to be made could be increased. Generally, the number of passes that must be made to render the combined object is the sum of the number of passes for each of the separate objects, and so nothing is gained by combining. For this reason, you should not combine meshes that are far enough apart to be affected by different sets of pixel lights.
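The pass counting can be checked with a small model. This sketch assumes exactly one rendering pass per mesh per pixel light and nothing else (no base pass, no shadows); the mesh sizes and light names are made up for the example.

```python
def render_cost(meshes):
    """meshes: list of (triangle_count, set_of_pixel_light_names).

    Returns (passes, triangles_processed), assuming one pass per mesh
    per pixel light that illuminates it.
    """
    passes = sum(len(lights) for _, lights in meshes)
    triangles = sum(tris * len(lights) for tris, lights in meshes)
    return passes, triangles


def combine(meshes):
    """Merge all meshes into one; every light touching any part now counts."""
    total_triangles = sum(tris for tris, _ in meshes)
    all_lights = set().union(*(lights for _, lights in meshes))
    return [(total_triangles, all_lights)]


# Two distant meshes lit by different sets of pixel lights (hypothetical data).
separate = [
    (500, {"lamp_a", "lamp_b"}),
    (500, {"lamp_c"}),
]

print(render_cost(separate))           # (3, 1500)
print(render_cost(combine(separate)))  # (3, 3000)
```

In this model the combined mesh needs the same three passes as the separate meshes, so no draw calls are saved, and every pass now processes all 1000 triangles.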
During rendering, Unity finds all lights surrounding a mesh and calculates which of those lights affect it most. The
As an example, consider a driving game where the player's car is driving in the dark with headlights switched on. The headlights are likely to be the most visually significant light sources in the game, so their Render Mode would probably be set to
Optimizing per-pixel lighting saves work on both the CPU and the GPU: the CPU has fewer draw calls to do, and the GPU has fewer vertices to process and pixels to rasterize for all these additional object renders.
GPU: Texture Compression and Mipmaps
Using
Use Texture Mip Maps
As a rule of thumb, always have
The only exception to this rule is when a texel (texture pixel) is known to map 1:1 to the rendered screen pixel, as with UI elements or in a 2D game.
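For a sense of what a mip chain costs in memory, the sketch below sums the sizes of all mip levels of an uncompressed texture. It relies on the general property that each level halves both dimensions until 1x1 is reached; the helper names and the default of 4 bytes per pixel are assumptions for this example and say nothing about how Unity stores textures internally.

```python
def mip_chain_sizes(width, height):
    """Return the (width, height) of every mip level, from full size to 1x1."""
    sizes = [(width, height)]
    while width > 1 or height > 1:
        width = max(1, width // 2)
        height = max(1, height // 2)
        sizes.append((width, height))
    return sizes


def texture_bytes(width, height, bytes_per_pixel=4, mipmaps=True):
    """Uncompressed size in bytes, with or without the full mip chain."""
    levels = mip_chain_sizes(width, height) if mipmaps else [(width, height)]
    return sum(w * h * bytes_per_pixel for w, h in levels)


without_mips = texture_bytes(1024, 1024, mipmaps=False)
with_mips = texture_bytes(1024, 1024, mipmaps=True)

print(without_mips)              # 4194304
print(with_mips)                 # 5592404
print(with_mips / without_mips)  # roughly 1.33
```

The full chain adds about one third to the size of the texture.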
LOD and Per-Layer Cull Distances
In some games, it may be appropriate to cull small objects more aggressively than large ones, in order to reduce both the CPU and GPU load. For example, small rocks and debris could be made invisible at long distances while large buildings would still be visible.
This can be either achieved by
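The idea of per-layer cull distances can be sketched independently of any engine. In the Python below, the layer names, distances and the `visible_objects` helper are all hypothetical; the point is only that each layer may carry its own maximum view distance, with a default far distance for layers that do not.

```python
def visible_objects(objects, layer_cull_distance, default_far=1000.0):
    """Return the names of objects that survive distance culling.

    objects: list of (name, layer, distance_from_camera).
    layer_cull_distance: maps a layer name to its maximum view distance;
    layers not listed fall back to default_far.
    """
    visible = []
    for name, layer, distance in objects:
        limit = layer_cull_distance.get(layer, default_far)
        if distance <= limit:
            visible.append(name)
    return visible


scene = [
    ("pebble", "small", 150.0),
    ("debris", "small", 40.0),
    ("tower", "large", 150.0),
]

# Small objects disappear beyond 50 units; everything else uses the default.
print(visible_objects(scene, {"small": 50.0}))  # ['debris', 'tower']
```

The distant pebble is dropped while the tower at the same distance is still drawn, matching the rocks-versus-buildings example.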
Realtime Shadows
Realtime shadows are nice, but they can cost quite a lot of performance, both in terms of extra draw calls for the CPU, and extra processing on the GPU. For further details, see the
GPU: Tips for writing high-performance shaders
The performance of a high-end PC GPU and a low-end mobile GPU can literally differ by a factor of hundreds. The same is true even on a single platform. On a PC, a fast GPU is dozens of times faster than a slow integrated GPU; and on mobile platforms you can see just as large a difference between GPUs.
So keep in mind that GPU performance on mobile platforms and low-end PCs will be much lower than on your development machines. Typically, shaders will need to be hand optimized to reduce calculations and texture reads in order to get good performance. For example, some built-in Unity shaders have their "mobile" equivalents that are much faster (but have some limitations or approximations - that's what makes them faster).
Below are some guidelines that are most important for mobile and low-end PC graphics cards:
Complex mathematical operations
Transcendental mathematical functions (such as
It is not advisable to attempt to write your own
Keep in mind that the alpha test (discard) operation will make your fragments slower.
Floating point operations
You should always specify the precision of floating point variables when writing custom shaders. It is
If the shader is written in Cg/HLSL then precision is specified as follows:
- float - full 32-bit floating point format, suitable for vertex transformations but has the slowest performance.
- half - reduced 16-bit floating point format, suitable for texture UV coordinates and roughly twice as fast as highp.
- fixed - 10-bit fixed point format, suitable for colors, lighting calculation and other high-performance operations and roughly four times faster than highp.
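To see why the 16-bit format suits UV coordinates but not vertex positions, the sketch below rounds values to 16-bit floats using Python's standard `struct` module. It assumes the shader's `half` behaves like an IEEE 754 half-precision float, which may not hold exactly on every GPU; the helper name `to_half` and the sample values are made up for the example.

```python
import struct


def to_half(value):
    """Round a Python float to the nearest IEEE 754 16-bit float and back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


# A UV coordinate: the rounding error is far smaller than one texel
# of a 1024-pixel-wide texture (1/1024, about 0.00098).
uv = 0.1
print(to_half(uv))             # 0.0999755859375
print(abs(to_half(uv) - uv))   # about 0.000024

# A world-space position: between 1024 and 2048 a 16-bit float can only
# represent whole numbers, so the fractional part is lost.
position = 1234.567
print(to_half(position))       # 1235.0
```

The position loses almost half a unit, which is why full 32-bit precision is kept for vertex transformations.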
If the shader is written in GLSL ES then the floating point precision is specified as
For further details about shader performance, please read the
Simple Checklist to make Your Game Faster
- Keep vertex count below 200K..3M per frame when targeting PCs, depending on the target GPU.
- If you're using built-in shaders, pick ones from the Mobile or Unlit category. They work on non-mobile platforms as well, but are simplified and approximated versions of the more complex shaders.
- Keep the number of different materials per scene low - share as many materials between different objects as possible.
- Set the Static property on non-moving objects to allow internal optimizations like static batching.
- Do not use Pixel Lights when it is not necessary - choose to have only a single (preferably directional) pixel light affecting your geometry.
- Do not use dynamic lights when it is not necessary - choose to bake lighting instead.
- Use compressed texture formats when possible, otherwise prefer 16bit textures over 32bit.
- Do not use fog when it is not necessary.
- Learn the benefits of Occlusion Culling and use it to reduce the amount of visible geometry and draw calls in complex static scenes with lots of occlusion. Plan your levels to benefit from occlusion culling.
- Use skyboxes to "fake" distant geometry.
- Use pixel shaders or texture combiners to mix several textures instead of a multi-pass approach.
- If writing custom shaders, always use the smallest possible floating point format:
  - fixed / lowp - for colors, lighting information and normals,
  - half / mediump - for texture UV coordinates,
  - float / highp - avoid in pixel shaders, fine to use in vertex shader for position calculations.
- Minimize use of complex mathematical operations such as pow, sin, cos etc. in pixel shaders.
- Choose to use fewer textures per fragment.