5. Texture Filtering - Trilinear
— Trilinear: like bilinear, but also blends between adjacent mipmap levels
— Don’t use trilinear without mipmaps, since there are no levels to blend
— It removes the noticeable jump between mipmap levels by adding a smooth transition
— Trilinear filtering is still
expensive on mobile
— Use it with caution
6. Texture Filtering - Anisotropic
— Anisotropic: makes textures look better when viewed at oblique angles, which suits ground-level textures
— Higher anisotropic levels cost more
7. Texture Filtering
— Use bilinear for balance between performance and visual quality
— Trilinear costs more memory bandwidth than bilinear and needs to be used selectively
— Bilinear + 2x Anisotropic will usually look and perform better than Trilinear + 1x Anisotropic, so it can be a better choice than Trilinear
— Keep the anisotropic level low
— Use a level higher than 2 very selectively, for critical game assets only
– Higher anisotropic levels cost a lot more bandwidth and affect device battery life
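As a rough way to compare the filtering modes above, the sketch below counts worst-case texel fetches per texture sample. It assumes the textbook model (4 texels for bilinear, 8 for trilinear, multiplied by the anisotropic level). Real GPUs use adaptive sample counts and caching, so treat the numbers as upper bounds and not as measurements. The function name `max_texel_fetches` is illustrative only.

```python
def max_texel_fetches(mode, aniso_level=1):
    """Worst-case texel fetches for one texture sample (simplified model)."""
    # 2x2 texels from one mip level (bilinear) or from two levels (trilinear).
    taps = {"bilinear": 4, "trilinear": 8}
    if mode not in taps:
        raise ValueError("unknown filter mode: " + mode)
    if aniso_level < 1:
        raise ValueError("aniso_level must be >= 1")
    # Assumption: an Nx anisotropic level takes at most N filtered samples.
    return taps[mode] * aniso_level


for mode, level in [("bilinear", 1), ("bilinear", 2),
                    ("trilinear", 1), ("trilinear", 2)]:
    print(mode, "+", str(level) + "x aniso: up to",
          max_texel_fetches(mode, level), "texel fetches")
```

In this model Bilinear + 2x and Trilinear + 1x share the same worst case, but anisotropic filtering typically only takes the extra samples at oblique viewing angles, which fits the recommendation above.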
8. Always Use Mipmaps If the Camera Is Not Still
— Mipmapping improves GPU performance
— Fewer texture cache misses
— Mipmapping also reduces texture aliasing and improves final image quality
— Don’t use it on 2D objects
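The memory side of mipmapping can be estimated with a few lines of arithmetic. The sketch below walks a mip chain down to 1x1 and reports how much extra storage it adds relative to the base level. The helper names are illustrative, and block-compression padding is ignored.

```python
def mip_chain_sizes(width, height):
    """Return (width, height) for every mip level down to 1x1."""
    sizes = [(width, height)]
    while width > 1 or height > 1:
        width, height = max(1, width // 2), max(1, height // 2)
        sizes.append((width, height))
    return sizes


def mip_overhead(width, height):
    """Extra texels stored by the mip chain, as a fraction of the base level."""
    base = width * height
    total = sum(w * h for w, h in mip_chain_sizes(width, height))
    return (total - base) / base


print("levels:", len(mip_chain_sizes(1024, 1024)))
print("extra memory: {:.1%}".format(mip_overhead(1024, 1024)))
```

The overhead approaches one third of the base texture, which is the cost to weigh against the cache and aliasing benefits, and a reason to skip mipmaps where smaller levels would never be sampled.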
9. Texture Color Space
— Use linear color space rendering if using dynamic lighting
— Check sRGB in the texture Inspector window for color textures
— Textures that are not processed as color (such as metallic, roughness, normal maps, etc.) should NOT be marked as sRGB
— Current hardware supports sRGB formats and does the gamma correction automatically, at no extra cost
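To see why data textures must not be flagged as sRGB, it helps to look at the decode the hardware applies on sampling. The sketch below implements the standard sRGB-to-linear transfer function for a single channel value in [0, 1]; the roughness value used is a made-up example.

```python
def srgb_to_linear(c):
    """Decode one sRGB-encoded channel value in [0, 1] to linear light."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


# Hypothetical roughness value authored as plain data, not as a color.
roughness_stored = 0.5
roughness_if_marked_srgb = srgb_to_linear(roughness_stored)
print("stored:", roughness_stored)
print("seen by the shader if sRGB is ticked by mistake:",
      round(roughness_if_marked_srgb, 3))
```

A mid-range data value ends up far darker after the decode, so the material would look noticeably different from what the artist authored.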
10. Texture Compression
— ASTC can give better quality than ETC at the same memory size, or the same quality at a smaller memory size
— ASTC may take longer to encode than ETC, so use it for the final packaging of the game
— ASTC gives more control over quality by letting you set the block size; 5x5 or 6x6 is a good default
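The block-size trade-off can be put into numbers. Every ASTC block occupies 128 bits regardless of its footprint, so larger blocks mean fewer bits per texel. The sketch below computes the bit rate and the size of a single mip level; for comparison, ETC2 RGB uses a fixed 4 bits per texel. The function names are illustrative.

```python
ASTC_BLOCK_BITS = 128  # every ASTC block is 128 bits, whatever its footprint


def astc_bits_per_texel(block_w, block_h):
    return ASTC_BLOCK_BITS / (block_w * block_h)


def astc_size_bytes(width, height, block_w, block_h):
    """Size of one mip level; partial blocks at the edges are rounded up."""
    blocks_x = -(-width // block_w)   # ceiling division
    blocks_y = -(-height // block_h)
    return blocks_x * blocks_y * ASTC_BLOCK_BITS // 8


for bw, bh in [(4, 4), (5, 5), (6, 6), (8, 8)]:
    print("{}x{}: {:.2f} bits/texel, {} bytes for 1024x1024".format(
        bw, bh, astc_bits_per_texel(bw, bh),
        astc_size_bytes(1024, 1024, bw, bh)))
```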
11. Texture Channel Packing
— Use texture channels to pack multiple textures into one
— Commonly used to pack roughness (or smoothness) and metallic into one texture
— Can be applied to any texture mask
— Make good use of the alpha channel
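Channel packing can be sketched without any image library. The example below takes three grayscale masks as flat lists of 0-255 values and interleaves them into RGBA texels. The channel layout (R = metallic, G = roughness, B = occlusion, A = optional extra mask) is an assumption for illustration; the layout must match whatever the shader actually reads.

```python
def pack_channels(metallic, roughness, occlusion, alpha=None):
    """Pack per-texel grayscale values (0-255) into a list of RGBA tuples."""
    n = len(metallic)
    if len(roughness) != n or len(occlusion) != n:
        raise ValueError("all masks must have the same number of texels")
    if alpha is None:
        alpha = [255] * n  # unused alpha defaults to opaque
    elif len(alpha) != n:
        raise ValueError("alpha mask must match the other masks")
    return list(zip(metallic, roughness, occlusion, alpha))


# Two texels of three hypothetical masks: one texture instead of three.
packed = pack_channels([0, 255], [128, 64], [255, 200])
print(packed)
```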
13. Avoid Rendering Small Triangles
— The bandwidth and processing cost
of a vertex is typically orders of
magnitude higher than the cost of
processing a fragment
— Make sure that you get many pixels
worth of fragment work for each
primitive
— Use dynamic mesh level-of-detail,
using simpler meshes when objects
are further away from the camera
— Make sure each model creates at least 10-20 fragments per primitive
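The fragments-per-primitive guideline above can be checked approximately from screen-space vertex positions, because a triangle's area in pixels is close to the number of fragments it produces. The sketch below flags triangles that fall under a threshold; the function names and the default threshold of 10 are illustrative.

```python
def triangle_area_px(p0, p1, p2):
    """Screen-space area of a triangle whose vertices are in pixel units."""
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    return abs((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)) / 2.0


def small_triangles(triangles, min_fragments=10.0):
    """Return the triangles likely to produce fewer than min_fragments."""
    return [t for t in triangles if triangle_area_px(*t) < min_fragments]


tris = [
    ((0, 0), (2, 0), (0, 2)),    # about 2 fragments: a candidate for a lower LOD
    ((0, 0), (10, 0), (0, 4)),   # about 20 fragments: fine
]
print(small_triangles(tris))
```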
14. Avoid Rendering Long Thin Triangles
— More expensive for the GPU to process than regular triangles
— GPUs process pixels in quad blocks (2x2 pixels)
— The edges of long thin triangles only partly cover many quads, so rasterizing them wastes GPU power
— Adjacent long thin triangles double the waste
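The quad argument above can be made concrete with a toy rasterizer. The sketch below samples pixel centers, counts how many pixels a triangle covers, and counts how many 2x2 quads it touches; the ratio covered / (4 x quads) is the share of shaded lanes doing useful work. This is a simplified model: real hardware applies tie-breaking rules on shared edges and has other costs that are not modeled here.

```python
import math


def quad_efficiency(p0, p1, p2):
    """Return (covered_pixels, touched_quads, efficiency) for one triangle."""
    def edge(a, b, p):
        # Signed area term: which side of edge a->b the point p lies on.
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

    area = edge(p0, p1, p2)
    if area == 0:
        return 0, 0, 0.0
    xs, ys = (p0[0], p1[0], p2[0]), (p0[1], p1[1], p2[1])
    covered, quads = 0, set()
    for y in range(math.floor(min(ys)), math.ceil(max(ys))):
        for x in range(math.floor(min(xs)), math.ceil(max(xs))):
            c = (x + 0.5, y + 0.5)  # sample at the pixel center
            w = (edge(p1, p2, c), edge(p2, p0, c), edge(p0, p1, c))
            inside = all(v >= 0 for v in w) if area > 0 else all(v <= 0 for v in w)
            if inside:
                covered += 1
                quads.add((x // 2, y // 2))
    if not quads:
        return 0, 0, 0.0
    return covered, len(quads), covered / (4 * len(quads))


compact = quad_efficiency((0, 0), (16, 0), (0, 16))   # area 128 px
thin = quad_efficiency((0, 0), (128, 0), (0, 2))      # same area, long and thin
print("compact:", compact)
print("thin:   ", thin)
```

Both triangles have the same area, yet the thin one touches more quads and leaves a larger share of each quad unused.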
15. Avoid Duplicating Vertices
— Reuse as many vertices as possible
— Transformed vertex data can be cached to save
computation power
— Avoid duplicating vertices unless it’s necessary
[Figure: the same three triangles built two ways]
GOOD: 5 shared vertices (V0-V4)
T1 : (V0, V1, V2)
T2 : (V1, V3, V2)
T3 : (V2, V3, V4)
BAD: 9 duplicated vertices (V0-V8)
T1 : (V0, V1, V2)
T2 : (V3, V4, V5)
T3 : (V6, V7, V8)
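Vertex sharing is normally expressed through an index buffer. The sketch below takes a triangle list with one vertex per corner, removes exact duplicates, and returns the unique vertices plus the indices that rebuild the triangles. The coordinates are made up for illustration, and vertices that differ in any attribute (normal, UV) would legitimately stay separate.

```python
def build_index_buffer(triangle_vertices):
    """Deduplicate vertices and return (unique_vertices, indices)."""
    unique, lookup, indices = [], {}, []
    for v in triangle_vertices:
        if v not in lookup:
            lookup[v] = len(unique)
            unique.append(v)
        indices.append(lookup[v])
    return unique, indices


# Three triangles written out as 9 separate vertices.
a, b, c, d, e = (0.0, 0.0), (1.0, 0.0), (0.5, 1.0), (1.5, 1.0), (1.0, 2.0)
soup = [a, b, c,  b, d, c,  c, d, e]

unique, indices = build_index_buffer(soup)
print(len(soup), "vertices reduced to", len(unique))
print("indices:", indices)
```

With shared vertices the GPU can reuse already transformed results from its post-transform cache, so fewer vertices are shaded.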
16. Instancing
— Render many objects using the same mesh
— Each instance can have its own properties
— Reduces the number of draw calls and memory bandwidth
— Check the “Enable GPU Instancing” option in the material
— Then use UNITY_ACCESS_INSTANCED_PROP() in the shader to access the instance properties
18. Shader Floating-point Precision
— Use mediump and highp keywords
— Full FP32 of vertex attributes is unnecessary for many
uses of attribute data
— Keep the data at the minimum precision needed to
produce an acceptable final output
— Use FP32 for computing vertex positions only
— Use the lowest possible precision for other attributes
— Don’t always use FP32 for everything
— Don’t upload FP32 data into a buffer and then read it as a
mediump attribute
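To get a feel for why positions need FP32 while many other attributes do not, the sketch below round-trips a few values through IEEE 754 half precision using Python's `struct` module. It assumes mediump maps to FP16, which is common on mobile GPUs but is not guaranteed by the shading language.

```python
import struct


def to_fp16(x):
    """Round-trip a value through IEEE 754 half precision (FP16)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


for value in (0.1, 1.0, 1024.5, 2049.0):
    stored = to_fp16(value)
    print(value, "->", stored, "(error", abs(stored - value), ")")
```

Small normalized values such as colors or normals survive with tiny errors, while a coordinate a couple of thousand units from the origin loses whole units of precision.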
19. Take Advantage of Early-Z
— Many fragments are occluded by other fragments
— Running the fragment shader on an occluded fragment wastes GPU power
— Render opaque objects from front to back
— Occluded fragments will be rejected before shading
— Fragments that write depth/stencil from the shader take the Late-Z path, which rejects occluded fragments only after the fragment shader has run
— Fragments using discard or alpha-to-coverage are forced to do Late-Z and may stall the pipeline
[Figure: pipeline order: Early Frag Op -> Fragment Shader -> Late Frag Op]
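The effect of submission order on Early-Z can be shown with a one-pixel simulation. The sketch below counts how many times the fragment shader would run for a stack of opaque fragments under a "less than" depth test. It is a simplified model that ignores any additional hidden-surface removal the hardware may perform.

```python
def shaded_fragments(depths):
    """Count fragment shader runs for one pixel with an early depth test.

    depths lists opaque fragments in submission order; smaller means closer.
    """
    nearest = float("inf")
    shaded = 0
    for z in depths:
        if z < nearest:
            # Passes the early test: the fragment is shaded and depth updated.
            nearest = z
            shaded += 1
        # Otherwise it is rejected before the fragment shader runs.
    return shaded


layers = [0.2, 0.4, 0.6, 0.8]
print("front to back:", shaded_fragments(layers))
print("back to front:", shaded_fragments(list(reversed(layers))))
```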
20. Avoid Heavy Overdraw
— Overdraw means one pixel has been rendered more
than once
— Alpha blending overdraw is expensive on mobile
— Use Unity built-in display feature to check the amount
of overdraw
— Use Arm Mobile Studio to check the in-game overdraw
— Brighter area means more overdraw
— Render in front-to-back order to reduce overdraw
— Optimize the arrangement of layers, sorting layers, render queues and camera settings to avoid overdraw
21. Reduce the Amount of Alpha Blended/Tested Fragments
— Separate transparent meshes from opaque meshes
— Use a polygon mesh instead of a quad for transparent textures
— Both approaches reduce the number of transparent fragments and improve performance
22. Dynamic Branching
— Dynamic branching in shaders is not as expensive as most developers think, but…
— If the branch bodies are too small, both sides are executed and one result is picked
— The shader compiler makes this optimization automatically
— Use dynamic branching when it can skip enough computation
24. Reduce Render State Switch
— A render state switch is a very expensive operation
— Render as many primitives as possible before each render state (SetPass) switch
— Don’t just check the number of draw calls or batches
— The number of render state switches is also an important indicator
— Check Tris/SetPass (e.g. 95.2K/34)
— Batch as many draw calls as possible
– Static batching
– GPU Instancing
– Dynamic batching
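The Tris/SetPass ratio and the benefit of grouping draws by state can both be worked out by hand. The sketch below computes the ratio for the figures quoted above and counts state switches for a draw list before and after sorting by material. The material names are placeholders, and real engines sort on more than just the material.

```python
def tris_per_setpass(triangles, setpass_calls):
    if setpass_calls <= 0:
        raise ValueError("setpass_calls must be positive")
    return triangles / setpass_calls


def count_setpass(draw_materials):
    """Count state binds: one per run of consecutive draws sharing a material."""
    binds, current = 0, None
    for material in draw_materials:
        if material != current:
            binds += 1
            current = material
    return binds


print("Tris/SetPass:", tris_per_setpass(95200, 34))

draws = ["A", "B", "A", "B", "A"]          # hypothetical submission order
print("unsorted:", count_setpass(draws))
print("sorted:  ", count_setpass(sorted(draws)))
```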
25. Reduce Frame Buffer Switch
— Bind each frame buffer only once
— Make all required draw calls before switching to the next frame buffer
— Avoid unnecessary render buffer switch
— Can reduce memory bandwidth
requirement and power consumption
(~100mW for 1GB/s)
— Use Unity frame debugger to check
— Use Arm Mobile Studio to do API level
check
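The ~100mW per 1GB/s rule of thumb above can be turned into a quick estimate for frame buffer traffic. The sketch below computes the bandwidth of full-screen reads or writes and converts it to power. The resolution, pixel format and frame rate are example values, and the conversion factor is only the approximate figure quoted in the slide.

```python
MW_PER_GB_PER_S = 100.0  # approximate rule of thumb quoted above


def framebuffer_bandwidth_gb_s(width, height, bytes_per_pixel,
                               transfers_per_frame, fps):
    """Bandwidth of full-screen frame buffer reads or writes, in GB/s."""
    bytes_per_second = (width * height * bytes_per_pixel
                        * transfers_per_frame * fps)
    return bytes_per_second / 1e9


def bandwidth_power_mw(gb_per_s):
    return gb_per_s * MW_PER_GB_PER_S


# One extra full-screen RGBA8 transfer per frame at 1080p and 60 fps.
extra = framebuffer_bandwidth_gb_s(1920, 1080, 4, 1, 60)
print("{:.3f} GB/s, roughly {:.0f} mW".format(extra, bandwidth_power_mw(extra)))
```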
26. Clear Frame Buffer Before Rendering
— Before rendering, the GPU reads the frame buffer from external memory into tile memory
— Minimize tile loads at render pass start
— A clear lets the GPU cheaply initialize the tile memory to a clear color value instead
— Ensure that you clear or invalidate all of your attachments at the start of each render pass
— Use Unity frame debugger to check
— Use Arm Mobile Studio to do API level check
[Figure: a frame that doesn’t clear before rendering, which is bad for performance]
27. Reduce Frame Buffer Write
— After rendering, the GPU writes the result from tile memory to external memory
— Minimize tile stores at render pass end
— Avoid writing back to external memory whenever possible
— Don’t bind a depth/stencil buffer if the depth/stencil value is not used
— Use RenderTexture.DiscardContents() to invalidate frame buffers if you don’t need the data in the next frame
— Use Unity frame debugger to check
— Use Arm Mobile Studio to do API level check
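The saving from avoiding tile loads and stores can be estimated per render pass. The sketch below models each attachment as a tuple of bytes per pixel, whether it is loaded from external memory at the start, and whether it is stored at the end. The attachment formats and the 1080p resolution are example assumptions, and compression of frame buffer data is ignored.

```python
def renderpass_traffic_bytes(width, height, attachments):
    """External memory traffic for one render pass.

    attachments: list of (bytes_per_pixel, load_from_memory, store_to_memory).
    """
    pixels = width * height
    return sum(pixels * bpp * (int(load) + int(store))
               for bpp, load, store in attachments)


# Color and depth/stencil, both loaded at the start and stored at the end.
naive = [(4, True, True), (4, True, True)]
# Both cleared instead of loaded; depth/stencil discarded instead of stored.
tuned = [(4, False, True), (4, False, False)]

print("naive:", renderpass_traffic_bytes(1920, 1080, naive), "bytes per pass")
print("tuned:", renderpass_traffic_bytes(1920, 1080, tuned), "bytes per pass")
```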
29. Generative Art - Made with Unity
Arm Mobile Studio – Free Tool for Mobile Optimization
• https://developer.arm.com/mobile-studio
Arm Guide for Unity Developers
• https://developer.arm.com/solutions/graphics-and-gaming/gaming-engine/unity/arm-guide-for-unity-developers
Best Practices Guide for Mobile Game Artists
• https://blogs.unity3d.com/kr/2020/04/07/artists-best-practices-for-mobile-game-development/
Arm DevRel
• developer@arm.com