Executable Bloat
How it happens and how we can fight it

Andreas Fredriksson <dep@dice.se>

Thursday, February 24, 2011
Background
• Why is ELF/EXE size a problem?
  • Executable code takes up memory
  • Consoles have precious little RAM
  • More available memory enables richer gameplay
  • It is a lot of work to fix after the fact

Software Bloat
  "Software bloat is a term used to describe the tendency of newer computer programs to have a larger installation footprint, or have many unnecessary features that are not used by end users, or just generally use more system resources than necessary, while offering little or no benefit to its users."

• Wikipedia has this definition (emphasis mine)
• We are wasting memory we could be using to make better games

C++ Bloat

• Let’s look at a few typical C++ techniques
  • Some are even enforced in coding standards
  • Often recommended in literature
  • Often accepted without question

Interface Class Hiding
• Hiding single class Foo behind interface IFoo
  • Intent is to avoid including heavy Foo header everywhere
  • Sometimes used for PRX/DLL loading

    struct IFoo {
       virtual void func() = 0;
    };

    class Foo : public IFoo {
       virtual void func();
    };

Interface Class Hiding
    struct IFoo {
       virtual void a() = 0;
       virtual void b() = 0;
    };
    class Foo : public IFoo {
       virtual void a();
       virtual void b();
    };

    IFoo vtbl        Foo vtbl
    ------------     ------------
    0                0
    typeinfo_ptr     typeinfo_ptr
    pure_vcall       Foo_a
    pure_vcall       Foo_b

• One hidden cost - virtual tables
  • One for each class
  • In the PPU ABI the cost is higher, as Foo_a will be a pointer to a 4+4 byte ABI struct

Interface Class Hiding
• Additional overhead
  • Every call site needs vcall adjustment
  • ctor/dtor needs vptr adjustment
• Total SNC overhead, 8 functions: 528 bytes
  • Likely more, due to call sites
  • These bytes have no value = bloat

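One way to keep a heavy header out of client code without an interface class is to predeclare the type and expose free functions on it, an approach the guidelines later in this deck also recommend. The sketch below is illustrative only: the names `foo_create`, `foo_destroy`, `foo_add` and `foo_value` are hypothetical, and the two "files" are shown in a single block for brevity.

```cpp
// --- foo.h (what clients include) ---------------------------------------
// Hypothetical names throughout. Foo is only predeclared here, so clients
// never see its definition and no virtual table is involved.
struct Foo;

Foo* foo_create(int initial_value);
void foo_destroy(Foo* foo);
void foo_add(Foo* foo, int amount);
int  foo_value(const Foo* foo);

// --- foo.cpp (the only file that knows the layout) ----------------------
struct Foo {
    int value;
};

Foo* foo_create(int initial_value)
{
    Foo* foo = new Foo;
    foo->value = initial_value;
    return foo;
}

void foo_destroy(Foo* foo)         { delete foo; }
void foo_add(Foo* foo, int amount) { foo->value += amount; }
int  foo_value(const Foo* foo)     { return foo->value; }
```

Calls such as `foo_add(f, 4)` are ordinary direct calls, so there is no vtable to emit and no vptr to set up in a constructor. Whether this actually saves space in a given code base should be checked against the compiler output.
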
Excessive Inlining

• Typically associated with templates
  • Templates are almost always inline
  • Smart pointers
  • String classes

Excessive Inlining
    bool eastl_string_find(eastl::string& str) {
        return str.find("foo") != eastl::string::npos;
    }
    bool c_string_find(const char* str) {
        return strstr(str, "foo") != 0;
    }

• Compare two string searches
  • c_string_find version - 56 bytes (SNC)
  • eastl_string_find - 504 bytes (SNC)
• These bytes add zero value = bloat

Excessive Inlining
• Many other common inlining culprits
  • operator+= string concatenation
  • Call by value w/ inline ctor
• Hard to control via compiler options
  • Sometimes you want inlining
  • Better to avoid pointless cases

Static Initialization
    static const eastl::string strings[] = { "foo0", "foo1" };
    static const Vec3 vecs[] = { { 0.2, 0.3, 0.5 }, { ... } };

• Extreme hidden costs for constructors
• Generates initializer function
• Have seen arrays of this kind generate over 20 kb of initializer code in the wild (SNC)
  • Array of 10 eastl::strings - 1292 bytes
  • Array of 10 vec3 - 368 bytes

Static Initialization
    static const char* const strings[] = { "foo0", "foo1", ... };
    static const float vecs[][3] = { { 0.2, 0.3, 0.5 }, ... };

• Just don’t do it - prefer POD types
  • Make sure data ends up in .rodata segment (the pointer array itself must be const too, not just the characters it points to)
  • Adjust code using array accordingly
• Alternatively make data load with level
  • No space overhead when not used

operator<<
• A special case of excessive inlining
• Creeps up in formatted I/O
  • Assert macros
• Prefer snprintf()-style APIs if you must format text at runtime
  • Usually less than half the overhead
  • Ideally avoid text formatting altogether

Sorting
• STL sort is a bloat generator
  • Specialized for each type - faster compares...
  • ...but usually whole merge/quicksort duplicated per parameter type! - often around 1-2 kb code
• We have 140 sort calls in the code base - up to 280 kb overhead...

Sorting
• Prefer qsort() for small/medium datasets
  • Adds callback overhead on PPU...
  • Rule of thumb - qsort < 32-64 elements
• Same applies to many other template algorithms
  • Use only when it really buys something

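A minimal sketch of the qsort() approach for a small array. The `Entity` type and the function names are hypothetical; the point is that only the small comparator is specific to the element type, while the sort loop itself exists once in the C library.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical record type used only for this illustration.
struct Entity {
    int   id;
    float distance;
};

// qsort() comparator: orders entities by ascending distance.
static int compare_by_distance(const void* lhs, const void* rhs)
{
    const Entity* a = static_cast<const Entity*>(lhs);
    const Entity* b = static_cast<const Entity*>(rhs);
    if (a->distance < b->distance) return -1;
    if (a->distance > b->distance) return 1;
    return 0;
}

// Sorts the array in place; no sort code is instantiated for Entity.
void sort_by_distance(Entity* items, std::size_t count)
{
    std::qsort(items, count, sizeof(Entity), compare_by_distance);
}
```

The trade-off named on the slide still applies: every comparison goes through a function pointer, so this suits small and medium arrays rather than hot loops over large data.
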
Part 2: What you can do

Accept the Domain
• Console coding is a very specific problem space
  • Think and verify before you apply general desktop C++ advice or patterns
  • Bloat is caused by humans, not compilers
• Example: "Virtual functions are essentially free"
  • True on x86 architecture (most of the time)
  • On PS3 PPU often two cache misses - ~1200 cycle penalty + ELF size bloat already covered

Day to day
• Think about the full impact of your changes
  • .text size impact
  • .data/.rodata size impact
  • Bring it up & discuss in code reviews
• Make sure your code is reasonably sized for the problem it solves!

Avoid “reuse” bloat
• YAGNI - “You ain’t gonna need it”
  • Just write simple code that can be extended if needed
  • We typically never extend systems without altering their interfaces anyway
• Game code is disposable, a means to an end
  • Make sure it works well NOW
  • Avoid “single-feature frameworks”

Avoid repetition
• Can often move repeated code to data
  • Higher information density but same end result

    // Before: one call per entry
    RegisterFunc("foo", func_foo);
    RegisterFunc("bar", func_bar);
    RegisterFunc("baz", func_baz);
    // ...

    // After: a constant table plus one loop.
    // sizeof_array is assumed to be a code base helper along these lines.
    #define sizeof_array(a) (sizeof(a) / sizeof((a)[0]))

    static const struct {
        const char *name;
        void (*func)(void);
    } opdata[] = {
        { "foo", func_foo },
        { "bar", func_bar },
        { "baz", func_baz },
        // ...
    };

    for (size_t i = 0; i < sizeof_array(opdata); ++i)
        RegisterFunc(opdata[i].name, opdata[i].func);

Compiler Output
• Look at the generated code!
  • That’s what you’re checking in, not C++
  • Don’t assume code is improved by the compiler
  • No magic going on, compilers are stupid
• Develop an intuition for what to expect
  • Verify assumptions as you code

Assembly
• Learn enough assembly to read compiler output
  • Function calls (calling convention)
  • Memory loads and stores
  • FP/Vector instructions
• It’s not very difficult - just do it
  • Also improves your debugging skills

Guidelines
• Avoid string classes, concatenation
  • Excessive inlining
• Avoid template containers for simple problems
  • Inlining + instantiation cost
  • Prefer C arrays for most jobs

Guidelines
• Avoid complex types in function signatures and interfaces
  • Requires caller to jump through hoops
  • Often bloats all call sites
  • Prefer raw POD types
    • (T* ptr, int count) is better than (std::vector<T>&)

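A small sketch of the (pointer, count) signature style. The function `sum_values` is a hypothetical example; any contiguous storage can be passed to it, and the declaring header needs no container template.

```cpp
// Hypothetical example: sum a range of floats.
// Taking (pointer, count) lets callers pass a C array, a slice of one, or
// the storage of any contiguous container, and keeps container headers out
// of this function's interface.
float sum_values(const float* values, int count)
{
    float total = 0.0f;
    for (int i = 0; i < count; ++i)
        total += values[i];
    return total;
}
```

A caller holding a C array `v` of three floats can write `sum_values(v, 3)`, or `sum_values(v + 1, 2)` to process only the tail, without building a temporary container.
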
Guidelines
• Avoid inheritance, interfaces and virtual functions
  • Hidden costs are subtle
  • Prefer function pointers for callbacks
  • Prefer free functions on predeclared types for header stripping

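The function-pointer callback style can be sketched as follows. All names here (`DamageCallback`, `DamageListener`, `notify_damage`, `accumulate_damage`) are hypothetical and chosen only for illustration.

```cpp
// Hypothetical callback signature: a plain function pointer plus an opaque
// user-data pointer, instead of an interface class with a virtual method.
typedef void (*DamageCallback)(void* user_data, int amount);

struct DamageListener {
    DamageCallback callback;
    void*          user_data;
};

// Invokes the listener if a callback is installed; otherwise does nothing.
void notify_damage(const DamageListener& listener, int amount)
{
    if (listener.callback)
        listener.callback(listener.user_data, amount);
}

// Example handler: adds the damage to an int supplied by the caller.
void accumulate_damage(void* user_data, int amount)
{
    *static_cast<int*>(user_data) += amount;
}
```

The listener is a two-word POD struct, so it can be stored in plain arrays and statically initialized, and no vtable or typeinfo is generated for it.
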
Guidelines
• Avoid operator<< streaming
  • Prefer printf() style APIs
  • Easy to make your own formatter for often-used types
• Avoid singletons
  • They just add bloat around data that's just as global anyway
  • Prefer free functions around static data

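A minimal sketch of "free functions around static data" in place of a singleton. The frame-counter module and its function names are hypothetical; in a real code base the declarations would sit in a header and the static variable plus the definitions in one .cpp file.

```cpp
// Hypothetical frame-counter module.
// File-local state: just as global as a singleton's instance, but with no
// instance() accessor, lazy-initialization check, or constructor call.
static unsigned int s_frame_count = 0;

// Resets the counter to zero.
void frame_counter_reset() { s_frame_count = 0; }

// Advances the counter by one frame.
void frame_counter_advance() { ++s_frame_count; }

// Returns the number of frames counted since the last reset.
unsigned int frame_counter_get() { return s_frame_count; }
```
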
Summary
• Make sure the code/data you’re adding is reasonably sized for the problem it solves
  • Use no more than necessary
• Pick up some assembly and look at the compiler output
• Always measure, examine & question!

Questions?
Twitter: @deplinenoise
Email: <dep@dice.se>

Weitere ähnliche Inhalte

Was ist angesagt?

Z Buffer Optimizations
Z Buffer OptimizationsZ Buffer Optimizations
Z Buffer Optimizations
pjcozzi
 
Crysis Next-Gen Effects (GDC 2008)
Crysis Next-Gen Effects (GDC 2008)Crysis Next-Gen Effects (GDC 2008)
Crysis Next-Gen Effects (GDC 2008)
Tiago Sousa
 
Moving Frostbite to Physically Based Rendering
Moving Frostbite to Physically Based RenderingMoving Frostbite to Physically Based Rendering
Moving Frostbite to Physically Based Rendering
Electronic Arts / DICE
 

Was ist angesagt? (20)

Lighting the City of Glass
Lighting the City of GlassLighting the City of Glass
Lighting the City of Glass
 
Z Buffer Optimizations
Z Buffer OptimizationsZ Buffer Optimizations
Z Buffer Optimizations
 
Five Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The RunFive Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The Run
 
Shiny PC Graphics in Battlefield 3
Shiny PC Graphics in Battlefield 3Shiny PC Graphics in Battlefield 3
Shiny PC Graphics in Battlefield 3
 
Graphics Gems from CryENGINE 3 (Siggraph 2013)
Graphics Gems from CryENGINE 3 (Siggraph 2013)Graphics Gems from CryENGINE 3 (Siggraph 2013)
Graphics Gems from CryENGINE 3 (Siggraph 2013)
 
Next generation graphics programming on xbox 360
Next generation graphics programming on xbox 360Next generation graphics programming on xbox 360
Next generation graphics programming on xbox 360
 
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
 
Bindless Deferred Decals in The Surge 2
Bindless Deferred Decals in The Surge 2Bindless Deferred Decals in The Surge 2
Bindless Deferred Decals in The Surge 2
 
Checkerboard Rendering in Dark Souls: Remastered by QLOC
Checkerboard Rendering in Dark Souls: Remastered by QLOCCheckerboard Rendering in Dark Souls: Remastered by QLOC
Checkerboard Rendering in Dark Souls: Remastered by QLOC
 
Crysis Next-Gen Effects (GDC 2008)
Crysis Next-Gen Effects (GDC 2008)Crysis Next-Gen Effects (GDC 2008)
Crysis Next-Gen Effects (GDC 2008)
 
Physically Based Sky, Atmosphere and Cloud Rendering in Frostbite
Physically Based Sky, Atmosphere and Cloud Rendering in FrostbitePhysically Based Sky, Atmosphere and Cloud Rendering in Frostbite
Physically Based Sky, Atmosphere and Cloud Rendering in Frostbite
 
Moving Frostbite to Physically Based Rendering
Moving Frostbite to Physically Based RenderingMoving Frostbite to Physically Based Rendering
Moving Frostbite to Physically Based Rendering
 
Bending the Graphics Pipeline
Bending the Graphics PipelineBending the Graphics Pipeline
Bending the Graphics Pipeline
 
Hable John Uncharted2 Hdr Lighting
Hable John Uncharted2 Hdr LightingHable John Uncharted2 Hdr Lighting
Hable John Uncharted2 Hdr Lighting
 
Screen Space Reflections in The Surge
Screen Space Reflections in The SurgeScreen Space Reflections in The Surge
Screen Space Reflections in The Surge
 
The Rendering Pipeline - Challenges & Next Steps
The Rendering Pipeline - Challenges & Next StepsThe Rendering Pipeline - Challenges & Next Steps
The Rendering Pipeline - Challenges & Next Steps
 
An introduction to Realistic Ocean Rendering through FFT - Fabio Suriano - Co...
An introduction to Realistic Ocean Rendering through FFT - Fabio Suriano - Co...An introduction to Realistic Ocean Rendering through FFT - Fabio Suriano - Co...
An introduction to Realistic Ocean Rendering through FFT - Fabio Suriano - Co...
 
Siggraph2016 - The Devil is in the Details: idTech 666
Siggraph2016 - The Devil is in the Details: idTech 666Siggraph2016 - The Devil is in the Details: idTech 666
Siggraph2016 - The Devil is in the Details: idTech 666
 
Secrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics TechnologySecrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics Technology
 
Destruction Masking in Frostbite 2 using Volume Distance Fields
Destruction Masking in Frostbite 2 using Volume Distance FieldsDestruction Masking in Frostbite 2 using Volume Distance Fields
Destruction Masking in Frostbite 2 using Volume Distance Fields
 

Andere mochten auch

The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
Johan Andersson
 
Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)
Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)
Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)
Johan Andersson
 

Andere mochten auch (20)

FrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in FrostbiteFrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in Frostbite
 
Scope Stack Allocation
Scope Stack AllocationScope Stack Allocation
Scope Stack Allocation
 
Parallel Futures of a Game Engine
Parallel Futures of a Game EngineParallel Futures of a Game Engine
Parallel Futures of a Game Engine
 
Introduction to Data Oriented Design
Introduction to Data Oriented DesignIntroduction to Data Oriented Design
Introduction to Data Oriented Design
 
DirectX 11 Rendering in Battlefield 3
DirectX 11 Rendering in Battlefield 3DirectX 11 Rendering in Battlefield 3
DirectX 11 Rendering in Battlefield 3
 
Colin Barre-Brisebois - GDC 2011 - Approximating Translucency for a Fast, Che...
Colin Barre-Brisebois - GDC 2011 - Approximating Translucency for a Fast, Che...Colin Barre-Brisebois - GDC 2011 - Approximating Translucency for a Fast, Che...
Colin Barre-Brisebois - GDC 2011 - Approximating Translucency for a Fast, Che...
 
Audio for Multiplayer & Beyond - Mixing Case Studies From Battlefield: Bad Co...
Audio for Multiplayer & Beyond - Mixing Case Studies From Battlefield: Bad Co...Audio for Multiplayer & Beyond - Mixing Case Studies From Battlefield: Bad Co...
Audio for Multiplayer & Beyond - Mixing Case Studies From Battlefield: Bad Co...
 
A Real-time Radiosity Architecture
A Real-time Radiosity ArchitectureA Real-time Radiosity Architecture
A Real-time Radiosity Architecture
 
5 Major Challenges in Interactive Rendering
5 Major Challenges in Interactive Rendering5 Major Challenges in Interactive Rendering
5 Major Challenges in Interactive Rendering
 
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
 
Shadows & Decals: D3D10 Techniques in Frostbite (GDC'09)
 	 Shadows & Decals: D3D10 Techniques in Frostbite (GDC'09) 	 Shadows & Decals: D3D10 Techniques in Frostbite (GDC'09)
Shadows & Decals: D3D10 Techniques in Frostbite (GDC'09)
 
Parallel Futures of a Game Engine (v2.0)
Parallel Futures of a Game Engine (v2.0)Parallel Futures of a Game Engine (v2.0)
Parallel Futures of a Game Engine (v2.0)
 
How High Dynamic Range Audio Makes Battlefield: Bad Company Go BOOM
How High Dynamic Range Audio Makes Battlefield: Bad Company Go BOOMHow High Dynamic Range Audio Makes Battlefield: Bad Company Go BOOM
How High Dynamic Range Audio Makes Battlefield: Bad Company Go BOOM
 
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)
 
Mantle for Developers
Mantle for DevelopersMantle for Developers
Mantle for Developers
 
Stable SSAO in Battlefield 3 with Selective Temporal Filtering
Stable SSAO in Battlefield 3 with Selective Temporal FilteringStable SSAO in Battlefield 3 with Selective Temporal Filtering
Stable SSAO in Battlefield 3 with Selective Temporal Filtering
 
5 Major Challenges in Real-time Rendering (2012)
5 Major Challenges in Real-time Rendering (2012)5 Major Challenges in Real-time Rendering (2012)
5 Major Challenges in Real-time Rendering (2012)
 
Rendering Battlefield 4 with Mantle
Rendering Battlefield 4 with MantleRendering Battlefield 4 with Mantle
Rendering Battlefield 4 with Mantle
 
Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)
Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)
Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)
 
Stochastic Screen-Space Reflections
Stochastic Screen-Space ReflectionsStochastic Screen-Space Reflections
Stochastic Screen-Space Reflections
 

Ähnlich wie Executable Bloat - How it happens and how we can fight it

Sintsov advanced exploitation in win32
Sintsov   advanced exploitation in win32Sintsov   advanced exploitation in win32
Sintsov advanced exploitation in win32
DefconRussia
 
Advanced Programming With Notes/DominoCOM Classes
Advanced Programming With Notes/DominoCOM ClassesAdvanced Programming With Notes/DominoCOM Classes
Advanced Programming With Notes/DominoCOM Classes
dominion
 

Ähnlich wie Executable Bloat - How it happens and how we can fight it (12)

Netty Notes Part 2 - Transports and Buffers
Netty Notes Part 2 - Transports and BuffersNetty Notes Part 2 - Transports and Buffers
Netty Notes Part 2 - Transports and Buffers
 
Mastering Python 3 I/O (Version 2)
Mastering Python 3 I/O (Version 2)Mastering Python 3 I/O (Version 2)
Mastering Python 3 I/O (Version 2)
 
From code to pattern, part one
From code to pattern, part oneFrom code to pattern, part one
From code to pattern, part one
 
Sintsov advanced exploitation in win32
Sintsov   advanced exploitation in win32Sintsov   advanced exploitation in win32
Sintsov advanced exploitation in win32
 
Mixing Source and Bytecode: A Case for Compilation By Normalization (OOPSLA 2...
Mixing Source and Bytecode: A Case for Compilation By Normalization (OOPSLA 2...Mixing Source and Bytecode: A Case for Compilation By Normalization (OOPSLA 2...
Mixing Source and Bytecode: A Case for Compilation By Normalization (OOPSLA 2...
 
Mastering Python 3 I/O
Mastering Python 3 I/OMastering Python 3 I/O
Mastering Python 3 I/O
 
Understanding the Python GIL
Understanding the Python GILUnderstanding the Python GIL
Understanding the Python GIL
 
Recon2016 shooting the_osx_el_capitan_kernel_like_a_sniper_chen_he
Recon2016 shooting the_osx_el_capitan_kernel_like_a_sniper_chen_heRecon2016 shooting the_osx_el_capitan_kernel_like_a_sniper_chen_he
Recon2016 shooting the_osx_el_capitan_kernel_like_a_sniper_chen_he
 
Handling I/O in Java
Handling I/O in JavaHandling I/O in Java
Handling I/O in Java
 
Industry - Program analysis and verification - Type-preserving Heap Profiler ...
Industry - Program analysis and verification - Type-preserving Heap Profiler ...Industry - Program analysis and verification - Type-preserving Heap Profiler ...
Industry - Program analysis and verification - Type-preserving Heap Profiler ...
 
Advanced Programming With Notes/DominoCOM Classes
Advanced Programming With Notes/DominoCOM ClassesAdvanced Programming With Notes/DominoCOM Classes
Advanced Programming With Notes/DominoCOM Classes
 
Virtual Bolt Workshop - April 1, 2020
Virtual Bolt Workshop - April 1, 2020Virtual Bolt Workshop - April 1, 2020
Virtual Bolt Workshop - April 1, 2020
 

Mehr von Electronic Arts / DICE

High Dynamic Range color grading and display in Frostbite
High Dynamic Range color grading and display in FrostbiteHigh Dynamic Range color grading and display in Frostbite
High Dynamic Range color grading and display in Frostbite
Electronic Arts / DICE
 

Mehr von Electronic Arts / DICE (19)

GDC2019 - SEED - Towards Deep Generative Models in Game Development
GDC2019 - SEED - Towards Deep Generative Models in Game DevelopmentGDC2019 - SEED - Towards Deep Generative Models in Game Development
GDC2019 - SEED - Towards Deep Generative Models in Game Development
 
SIGGRAPH 2010 - Style and Gameplay in the Mirror's Edge
SIGGRAPH 2010 - Style and Gameplay in the Mirror's EdgeSIGGRAPH 2010 - Style and Gameplay in the Mirror's Edge
SIGGRAPH 2010 - Style and Gameplay in the Mirror's Edge
 
SEED - Halcyon Architecture
SEED - Halcyon ArchitectureSEED - Halcyon Architecture
SEED - Halcyon Architecture
 
Syysgraph 2018 - Modern Graphics Abstractions & Real-Time Ray Tracing
Syysgraph 2018 - Modern Graphics Abstractions & Real-Time Ray TracingSyysgraph 2018 - Modern Graphics Abstractions & Real-Time Ray Tracing
Syysgraph 2018 - Modern Graphics Abstractions & Real-Time Ray Tracing
 
Khronos Munich 2018 - Halcyon and Vulkan
Khronos Munich 2018 - Halcyon and VulkanKhronos Munich 2018 - Halcyon and Vulkan
Khronos Munich 2018 - Halcyon and Vulkan
 
CEDEC 2018 - Towards Effortless Photorealism Through Real-Time Raytracing
CEDEC 2018 - Towards Effortless Photorealism Through Real-Time RaytracingCEDEC 2018 - Towards Effortless Photorealism Through Real-Time Raytracing
CEDEC 2018 - Towards Effortless Photorealism Through Real-Time Raytracing
 
CEDEC 2018 - Functional Symbiosis of Art Direction and Proceduralism
CEDEC 2018 - Functional Symbiosis of Art Direction and ProceduralismCEDEC 2018 - Functional Symbiosis of Art Direction and Proceduralism
CEDEC 2018 - Functional Symbiosis of Art Direction and Proceduralism
 
SIGGRAPH 2018 - PICA PICA and NVIDIA Turing
SIGGRAPH 2018 - PICA PICA and NVIDIA TuringSIGGRAPH 2018 - PICA PICA and NVIDIA Turing
SIGGRAPH 2018 - PICA PICA and NVIDIA Turing
 
SIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time Raytracing
SIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time RaytracingSIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time Raytracing
SIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time Raytracing
 
EPC 2018 - SEED - Exploring The Collaboration Between Proceduralism & Deep Le...
EPC 2018 - SEED - Exploring The Collaboration Between Proceduralism & Deep Le...EPC 2018 - SEED - Exploring The Collaboration Between Proceduralism & Deep Le...
EPC 2018 - SEED - Exploring The Collaboration Between Proceduralism & Deep Le...
 
DD18 - SEED - Raytracing in Hybrid Real-Time Rendering
DD18 - SEED - Raytracing in Hybrid Real-Time RenderingDD18 - SEED - Raytracing in Hybrid Real-Time Rendering
DD18 - SEED - Raytracing in Hybrid Real-Time Rendering
 
Creativity of Rules and Patterns: Designing Procedural Systems
Creativity of Rules and Patterns: Designing Procedural SystemsCreativity of Rules and Patterns: Designing Procedural Systems
Creativity of Rules and Patterns: Designing Procedural Systems
 
Shiny Pixels and Beyond: Real-Time Raytracing at SEED
Shiny Pixels and Beyond: Real-Time Raytracing at SEEDShiny Pixels and Beyond: Real-Time Raytracing at SEED
Shiny Pixels and Beyond: Real-Time Raytracing at SEED
 
Future Directions for Compute-for-Graphics
Future Directions for Compute-for-GraphicsFuture Directions for Compute-for-Graphics
Future Directions for Compute-for-Graphics
 
A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...
A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...
A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...
 
High Dynamic Range color grading and display in Frostbite
High Dynamic Range color grading and display in FrostbiteHigh Dynamic Range color grading and display in Frostbite
High Dynamic Range color grading and display in Frostbite
 
Photogrammetry and Star Wars Battlefront
Photogrammetry and Star Wars BattlefrontPhotogrammetry and Star Wars Battlefront
Photogrammetry and Star Wars Battlefront
 
Physically Based and Unified Volumetric Rendering in Frostbite
Physically Based and Unified Volumetric Rendering in FrostbitePhysically Based and Unified Volumetric Rendering in Frostbite
Physically Based and Unified Volumetric Rendering in Frostbite
 
Battlefield 4 + Frostbite + Mantle
Battlefield 4 + Frostbite + MantleBattlefield 4 + Frostbite + Mantle
Battlefield 4 + Frostbite + Mantle
 

Executable Bloat - How it happens and how we can fight it

  • 1. Executable Bloat How it happens and how we can fight it Andreas Fredriksson <dep@dice.se> Thursday, February 24, 2011
  • 2. Background • Why is ELF/EXE size a problem? • Executable code takes up memory • Consoles have precious little RAM • More available memory enables richer gameplay • It is a lot of work to fix after the fact Thursday, February 24, 2011
  • 3. Software Bloat Software bloat is a term used to describe the tendency of newer computer programs to have a larger installation footprint, or have many unnecessary features that are not used by end users, or just generally use more system resources than necessary, while offering little or no benefit to its users. • Wikipedia has this definition (emphasis mine) • We are wasting memory we could be using to make better games Thursday, February 24, 2011
  • 4. C++ Bloat • Let’s look at a few typical C++ techniques • Some are even enforced in coding standards • Often recommended in literature • Often accepted without question Thursday, February 24, 2011
  • 5. Interface Class Hiding • Hiding single class Foo struct IFoo { virtual void func() = 0; behind interface IFoo }; • Intent is to avoid class Foo : public IFoo { }; virtual void func(); including heavy Foo header everywhere • Sometimes used for PRX/DLL loading Thursday, February 24, 2011
  • 6. Interface Class Hiding struct IFoo { 0 IFoo vtbl virtual void a() = 0; typeinfo_ptr virtual void b() = 0; pure_vcall }; pure_vcall class Foo : public IFoo { virtual void a(); virtual void b(); 0 Foo vtbl }; typeinfo_ptr Foo_a Foo_b • One hidden cost - virtual tables • One for each class • In PPU ABI cost is higher as Foo_a will be a pointer to a 4+4 byte ABI struct Thursday, February 24, 2011
  • 7. Interface Class Hiding • Additional overhead • Every call site needs vcall adjustment • ctor/dtor needs vptr adjustment • Total SNC overhead, 8 functions: 528 bytes • Likely more, due to callsites • These bytes have no value = bloat Thursday, February 24, 2011
  • 8. Excessive Inlining • Typically associated with templates • Templates are almost always inline • Smart pointers • String classes
  • 9. Excessive Inlining • Code: bool eastl_string_find(eastl::string& str) { return str.find("foo") != eastl::string::npos; } bool c_string_find(const char* str) { return strstr(str, "foo") != 0; } • Compare two string searches • c_string_find version - 56 bytes (SNC) • eastl_string_find - 504 bytes (SNC) • These bytes add zero value = bloat
  • 10. Excessive Inlining • Many other common inlining culprits • operator+= string concatenation • Call by value w/ inline ctor • Hard to control via compiler options • Sometimes you want inlining • Better to avoid pointless cases
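One way to avoid a pointless inlining case is to route the work through a single out-of-line function that takes a plain pointer. This is a minimal sketch, assuming `std::string` as a stand-in for `eastl::string`; the helper name `contains_foo` is hypothetical, and the size figures quoted on the slides were not re-measured for this code.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Defined once. In a real project this would live in a .cpp file so that
// call sites only see the declaration and cannot inline the search.
bool contains_foo(const char* str) {
    return std::strstr(str, "foo") != nullptr;
}

// Callers that hold a string object pass its C string instead of
// invoking the templated find() at every call site.
bool name_contains_foo(const std::string& name) {
    return contains_foo(name.c_str());
}
```

Whether this actually shrinks the binary depends on the compiler and string class, so the generated code should still be checked.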
  • 11. Static Initialization • Code: static const eastl::string strings[] = { "foo0", "foo1" }; static const Vec3 vecs[] = { { 0.2, 0.3, 0.5 }, { ... } }; • Extreme hidden costs for constructors • Generates initializer function • Have seen arrays of this kind generate over 20 KB of initializer code in the wild (SNC) • Array of 10 eastl::strings - 1292 bytes • Array of 10 vec3 - 368 bytes
  • 12. Static Initialization • Code: static const char* strings[] = { "foo0", "foo1", ... }; static const float vecs[][3] = { { 0.2, 0.3, 0.5 }, ... }; • Just don’t do it - prefer POD types • Make sure data ends up in .rodata segment • Adjust code using array accordingly • Alternatively make data load with level • No space overhead when not used
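The POD approach can be checked at compile time. Below is a minimal sketch, assuming a C++11 compiler; `Vec3Pod`, `kStrings`, `kVecs` and `array_count` are hypothetical names, and the second vector's values are made up for illustration. Note the extra `const` after `char*`: without it the pointers themselves are writable, which usually keeps the array out of read-only data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Plain aggregate with no constructor, so no initializer code is needed.
struct Vec3Pod {
    float x, y, z;
};

// Fails the build if someone later adds a constructor or virtual function.
static_assert(std::is_trivial<Vec3Pod>::value, "Vec3Pod must stay trivial");
static_assert(std::is_standard_layout<Vec3Pod>::value,
              "Vec3Pod must stay standard-layout");

// const pointers to const chars: the whole table is constant data.
static const char* const kStrings[] = { "foo0", "foo1" };

static const Vec3Pod kVecs[] = {
    { 0.2f, 0.3f, 0.5f },
    { 1.0f, 0.0f, 0.0f },   // illustrative values only
};

// Element count of a C array, computed at compile time.
template <typename T, std::size_t N>
constexpr std::size_t array_count(const T (&)[N]) {
    return N;
}
```

Where the data finally lands still depends on the toolchain and link settings, so inspecting the section sizes of the built object remains the real test.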
  • 13. operator<< • A special case of excessive inlining • Creeps up in formatted I/O • Assert macros • Prefer snprintf()-style APIs if you must format text at runtime • Usually less than half the overhead • Ideally avoid text formatting altogether
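A snprintf()-style formatter for a commonly used type might look like the sketch below. The `Vec3` struct and the `format_vec3` name are assumptions made for illustration; only `std::snprintf` itself is a standard call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

struct Vec3 {
    float x, y, z;
};

// Writes "(x, y, z)" with two decimals into buffer, never overrunning size.
// Returns what snprintf returns: the length the full text needs,
// excluding the terminating null character.
int format_vec3(char* buffer, std::size_t size, const Vec3& v) {
    return std::snprintf(buffer, size, "(%.2f, %.2f, %.2f)", v.x, v.y, v.z);
}
```

The formatting logic exists once in the binary; each call site only passes a buffer and a pointer, instead of a chain of inlined `operator<<` calls.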
  • 14. Sorting • STL sort is a bloat generator • Specialized for each type, which gives faster compares... • ...but usually the whole merge/quicksort is duplicated per parameter type, often around 1-2 KB of code • We have 140 sort calls in the code base, so up to 140 × 2 KB = 280 KB overhead
  • 15. Sorting • Prefer qsort() for small/medium datasets • Adds callback overhead on PPU • Rule of thumb: use qsort() for fewer than 32-64 elements • Same applies to many other template algorithms • Use only when it really buys something
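For a small dataset, the qsort() route can be sketched as follows. The `Entity` struct and the function names are hypothetical; the point is that the sort routine itself is shared library code and only the small comparator is specific to the type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

struct Entity {
    int   id;
    float distance;
};

// qsort comparator: negative, zero or positive for less, equal, greater.
// Comparing instead of subtracting avoids rounding and overflow pitfalls.
int compare_by_distance(const void* lhs, const void* rhs) {
    const Entity* a = static_cast<const Entity*>(lhs);
    const Entity* b = static_cast<const Entity*>(rhs);
    return (a->distance > b->distance) - (a->distance < b->distance);
}

// Sorts in place, nearest first. No sort code is instantiated per type.
void sort_by_distance(Entity* items, std::size_t count) {
    std::qsort(items, count, sizeof(Entity), compare_by_distance);
}
```

The price is an indirect call per comparison, which is the callback overhead the slide mentions, so this trade only makes sense below the element counts given in the rule of thumb.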
  • 16. Part 2: What you can do
  • 17. Accept the Domain • Console coding is a very specific problem space • Think and verify before you apply general desktop C++ advice or patterns • Bloat is caused by humans, not compilers • Example: "Virtual functions are essentially free" • True on x86 architecture (most of the time) • On PS3 PPU often two cache misses - ~1200 cycle penalty + ELF size bloat already covered
  • 18. Day to day • Think about full impact of your changes • .text size impact • .data/.rodata size impact • Bring it up & discuss in code reviews • Make sure your code is reasonably sized for the problem it solves!
  • 19. Avoid “reuse” bloat • YAGNI - “You ain’t gonna need it” • Just write simple code that can be extended if needed • We typically never extend systems without altering their interfaces anyway • Game code is disposable, a means to an end • Make sure it works well NOW • Avoid “single-feature frameworks”
  • 20. Avoid repetition • Can often move repeated code to data • Higher information density but same end result • Before: RegisterFunc("foo", func_foo); RegisterFunc("bar", func_bar); RegisterFunc("baz", func_baz); // ... • After: static const struct { const char *name; void (*func)(void); } opdata[] = { { "foo", func_foo }, { "bar", func_bar }, { "baz", func_baz }, // ... }; for (int i=0; i < sizeof_array(opdata); ++i) RegisterFunc(opdata[i].name, opdata[i].func);
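The slide's snippet relies on a `sizeof_array` helper and a `RegisterFunc` function that the deck does not show. A self-contained sketch of the table-driven version follows, assuming a C++11 compiler; the registry storage, its capacity of 16, the `OpEntry` name and the empty `func_*` bodies are all stand-ins invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// One possible sizeof_array: element count of a C array at compile time.
template <typename T, std::size_t N>
constexpr std::size_t sizeof_array(const T (&)[N]) {
    return N;
}

typedef void (*OpFunc)(void);

struct OpEntry {
    const char* name;
    OpFunc      func;
};

// Hypothetical registry standing in for whatever RegisterFunc really does.
OpEntry     g_registry[16];
std::size_t g_registry_count = 0;

void RegisterFunc(const char* name, OpFunc func) {
    if (g_registry_count < sizeof_array(g_registry)) {
        g_registry[g_registry_count].name = name;
        g_registry[g_registry_count].func = func;
        ++g_registry_count;
    }
}

void func_foo() {}
void func_bar() {}
void func_baz() {}

// The repeated calls become rows of constant data...
static const OpEntry opdata[] = {
    { "foo", func_foo },
    { "bar", func_bar },
    { "baz", func_baz },
};

// ...and a single loop replaces one call sequence per entry.
void register_all() {
    for (std::size_t i = 0; i < sizeof_array(opdata); ++i)
        RegisterFunc(opdata[i].name, opdata[i].func);
}
```

Using `std::size_t` for the loop index also avoids the signed/unsigned comparison that an `int` index would produce with this helper.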
  • 21. Compiler Output • Look at the generated code! • That’s what you’re checking in, not C++ • Don’t assume code is improved by the compiler • No magic going on, compilers are stupid • Develop an intuition for what to expect • Verify assumptions as you code
  • 22. Assembly • Learn enough assembly to read compiler output • Function calls (calling convention) • Memory loads and stores • FP/Vector instructions • It’s not very difficult - just do it • Also improves your debugging skills
  • 23. Guidelines • Avoid string classes, concatenation • Excessive inlining • Avoid template containers for simple problems • Inlining + instantiation cost • Prefer C arrays for most jobs
  • 24. Guidelines • Avoid complex types in function signatures and interfaces • Requires caller to jump through hoops • Often bloats all call sites • Prefer raw POD types • (T* ptr, int count) is better than (std::vector<T>&)
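A pointer-plus-count signature can be sketched as below. The function name `sum_values` is hypothetical. The same function serves a C array and a `std::vector` without the callee depending on any container type.

```cpp
#include <cassert>
#include <vector>

// Takes raw POD arguments, so callers are free to store the data anywhere.
float sum_values(const float* values, int count) {
    float total = 0.0f;
    for (int i = 0; i < count; ++i)
        total += values[i];
    return total;
}
```

A caller holding a vector passes `v.data()` and `static_cast<int>(v.size())`; a caller holding a plain array passes the array and its length, with no temporary container needed.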
  • 25. Guidelines • Avoid inheritance, interfaces and virtual functions • Hidden costs are subtle • Prefer function pointers for callbacks • Prefer free functions on predeclared types for header stripping
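A function-pointer callback with a user-data pointer is one common way to follow this guideline. The sketch below is illustrative only: `DamageCallback`, `DamageEvent`, `fire_damage`, `Player` and `player_on_damage` are hypothetical names, not an API from the talk.

```cpp
#include <cassert>

// Callback signature: opaque user data first, then the event payload.
typedef void (*DamageCallback)(void* user_data, int amount);

// A plain struct replaces an interface class with one virtual method.
struct DamageEvent {
    DamageCallback callback;
    void*          user_data;
};

void fire_damage(const DamageEvent& event, int amount) {
    if (event.callback)
        event.callback(event.user_data, amount);
}

struct Player {
    int health;
};

// The receiver casts the user data back to the type it registered.
void player_on_damage(void* user_data, int amount) {
    static_cast<Player*>(user_data)->health -= amount;
}
```

No vtable, typeinfo or constructor vptr setup is generated for this; the cost is the loss of type checking on `user_data`, which the receiver must cast correctly.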
  • 26. Guidelines • Avoid operator<< streaming • Prefer printf() style APIs • Easy to make your own formatter for often-used types • Avoid singletons • They just add bloat around data that's just as global anyway • Prefer free functions around static data
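Free functions around static data, in place of a singleton, might look like the following sketch. The `frame_stats_*` names and the draw-call counter are invented for illustration, and the header and source parts are shown together in one listing.

```cpp
#include <cassert>

// ---- frame_stats.h: the whole public surface ----
void frame_stats_reset();
void frame_stats_add_draw_calls(int count);
int  frame_stats_draw_calls();

// ---- frame_stats.cpp ----
namespace {
// File-local state. A plain int is zero-initialized at load time,
// so no constructor or lazy-init check is generated.
int g_draw_calls = 0;
}

void frame_stats_reset() { g_draw_calls = 0; }

void frame_stats_add_draw_calls(int count) { g_draw_calls += count; }

int frame_stats_draw_calls() { return g_draw_calls; }
```

Compared with a `FrameStats::instance().addDrawCalls(n)` style singleton, call sites skip the accessor call and its initialization guard, while the data is exactly as global as before.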
  • 27. Summary • Make sure the code/data you’re adding is reasonably sized for the problem it solves • Use no more than necessary • Pick up some assembly and look at the compiler output • Always measure, examine & question!
  • 28. Questions? Twitter: @deplinenoise Email: <dep@dice.se>