Parallelizing Conqueror’s Blade*
Making the Most of Intel® Core™ for the Best Gaming Experience
Nan Mi
Engineer Lead @BoomingGames
Lei Su
Senior Engineer @BoomingGames
Sheng Guo
Application Engineer @Intel.com
Agenda
 Multi-core: Opportunities to scale user experience
 Conqueror’s Blade*: Case study to leverage multi-core
 Optimization background
 Building job system
 Jobifying engine sub-systems
 Scaling user experience
2
Next Generation Multi-Core Processor
3
 Physical CPU core counts are increasing quickly
 4 cores: max install base
 6 cores: mainstream shipping
 8-18 cores: high-end shipping
 Multicore utilization of games today
 Most are multithreaded, but with only 2~3 heavy threads
 Insufficient CPU utilization
Steam Hardware & Software Survey: February 2018
What to Do with the Idle Cores
4
Boost Performance
 Software occlusion culling
 Buffering load turbulence
 Balancing load among cores
Enrich Experience
 Global illumination
 Detailed animation
 Realistic clothing
 Realistic ragdoll
 Realistic destruction
 Advanced particles
 Wind & weather
 3D audio
 Additional rendering passes
 More details of distant models
 Ambient animation and background life
 Decorative contents
With Great User Experience Comes Great Parallelized Engine
5
Maximized User Experience
Parallelized Game Engine
Scale User Experience (Performance + Effects) with More Cores
Key Problems to Consider
6
Maximize User Experience
 Enable perceptible multi-core scaling w/o impacting gameplay
 The quality of effects
 The types of high-quality effects
 The coverage of high-quality effects on all units
Parallelize Game Engine
 Decompose engine functionality into fine-grained jobs
 Rendering
 Game Logic
 Simulation
 Build an efficient job scheduler
7
Case study: Conqueror’s Blade*
Outline
 Game Background
 Engine Architecture Evolution
 Building Scalable Game Engine
 Job system
 Case study: parallelizing engine subsystems
 Scaled Gaming Experience
 Tips & Tricks
 Future Work
8
Game Background
 Conqueror’s Blade* is a PC online game
 Hero: action gameplay
 Legion: tactical gameplay
 Empowered war machines
 Immersive battlefield
9
Gameplay Trailer
10
Motivation For Multicore Scalable Engine
 Game is Logic Heavy
 Huge number of individual soldiers
 Dynamic battleground
 Rich battlefield elements
 Problems of Legacy Architecture
 Difficult to scale to more cores
 CPU Bound
11
Goals & Challenges
 Goals
 Support more than 1K actors with individual AI and states
 Dynamic battlefield
 Easy to scale
 Multi-thread debugging friendly
 Challenges
 Game is in development & test
 Engine must be upgraded on the fly
 Time-limited (~2.5 months)
 Technique Choices
 Entity-Component-System model
 Job system
12
ECS Model
 Entity-Component-System*
 Data is everything
 An entity is just an ID
 A component holds only data
 A system contains all components of the same kind and the methods that operate on them
 Pros
 Parallelization friendly
 Cache friendly
 Memory management friendly
13
*[Timothy17] Overwatch Gameplay Architecture and Netcode, GDC 2017
Original vs ECS
14
 Original Model
[Diagram: each Entity owns its own Animation Component, Physics Component, Transform Component, ...]
Data organized by entity
Data heterogeneous
Memory jumping
Cache misses
 ECS Model
[Diagram: all Animation Components are stored together in the Animation System, all Physics Components in the Physics System, and all Transform Components in the Transform System.]
Data organized by system
Data homogeneous
Memory contiguous
Cache friendly
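The ECS layout described above can be sketched in a few lines of C++. This is only an illustration under assumptions: the names `MotionComponent` and `MotionSystem` are hypothetical and do not come from the talk. It shows an entity as a plain ID, a component that holds only data, and a system that stores all components of one kind contiguously.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// An entity is just an ID (here: an index into the system's array).
using Entity = std::uint32_t;

// A component holds only data (hypothetical example component).
struct MotionComponent {
    float x, y;    // position
    float vx, vy;  // velocity
};

// A system owns all components of one kind in contiguous memory,
// plus the methods that operate on them.
class MotionSystem {
public:
    Entity create(MotionComponent c) {
        components_.push_back(c);
        return static_cast<Entity>(components_.size() - 1);
    }

    // Updates a contiguous index range; disjoint ranges touch disjoint
    // memory, so they could be handed to separate jobs.
    void update(std::size_t begin, std::size_t end, float dt) {
        for (std::size_t i = begin; i < end; ++i) {
            components_[i].x += components_[i].vx * dt;
            components_[i].y += components_[i].vy * dt;
        }
    }

    std::size_t size() const { return components_.size(); }
    const MotionComponent& get(Entity e) const { return components_[e]; }

private:
    std::vector<MotionComponent> components_;
};
```

Because `update` takes an index range, splitting the array into N ranges gives N independent jobs without any locking.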
Fixed Multi-thread (Legacy)
 Fixed Multi-thread
 Render
 Simulation
 Logic
[Diagram: three fixed threads, each running its stages in sequence.]
Render: Visibility, GBuffer, Shadow, Lighting, Forward, Transparent, Postprocess, UI
Simulation: LOD, Animation, Physics, Particle, ...
Logic: Lua, AI, Motor, ..., Network
15
Thread Fork/Join (Intermediate)
 Fixed Multi-thread
 Thread Fork/Join
 Thread Pool
 Fork/Join from fixed thread
16
[Diagram: the three fixed threads remain; heavy steps fork tasks onto a pool of Work Threads and join them afterwards.]
Render: Visibility, GBuffer, Shadow, Lighting, Forward, Transparent, Postprocess, UI
Simulation: LOD, Animation (forks Animation Tasks), Physics, Particle, ...
Logic: Lua, AI (forks AI Tasks), Motor, ..., Network
Work Threads: run the forked AI Tasks and Animation Tasks
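The fork/join stage can be illustrated with a minimal sketch. It assumes `std::async` as a stand-in for the engine's thread pool (the talk does not show its pool), and `updateAnimation` is a hypothetical placeholder for the real per-actor work.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <future>
#include <vector>

// Hypothetical per-actor animation work; stands in for the real update.
inline void updateAnimation(std::vector<float>& poses,
                            std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i) poses[i] += 1.0f;
}

// Called from the fixed simulation thread: fork tasks over disjoint
// ranges, then join all of them before the step returns.
inline void animationStep(std::vector<float>& poses, std::size_t taskCount) {
    if (taskCount == 0) taskCount = 1;
    const std::size_t chunk = (poses.size() + taskCount - 1) / taskCount;
    std::vector<std::future<void>> tasks;
    for (std::size_t begin = 0; begin < poses.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, poses.size());
        // Fork: std::async is an assumption standing in for a thread pool.
        tasks.push_back(std::async(std::launch::async, updateAnimation,
                                   std::ref(poses), begin, end));
    }
    for (auto& task : tasks) task.get();  // Join
}
```

The fixed thread still blocks at the join, which is one reason this stage is only an intermediate step toward the job-based design.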
Job Based (Final)
 Fixed Multi-thread
 Thread fork/join
 Job Based
 Render Backend
 Job System
 Network…
17
[Diagram, Engine Architecture: Network and Render Backend sit alongside the Job System. Inside the Job System, a Global Job Queue feeds the Work Threads; each Work Thread has its own Job Queue holding jobs and waiting jobs.]
Job System
 Fiber-based implementation*
 What is a fiber
 A lightweight execution context (includes a user-provided stack, registers…)
 Fiber execution is cooperative: a fiber explicitly switches to another fiber
 Pros
 Easy to implement task scheduling
 Easy to handle task dependencies
 Job stack is isolated
 Avoids frequent context switches
 Cons
 C++ does not natively support fibers
 Implementation differs between operating systems
 Has some restrictions (e.g. thread_local is invalid)
 Fiber Implementation
 Boost context: cross-platform, industry proven, fast
18
*[Christian15] Parallelizing the Naughty Dog engine using fibers, GDC 2015
Job Scheduler
 Thread Independent Job Queue
 Each work thread has its own job queue
 Jobs generated on a thread are added to that thread's queue
 Separate Global Job Queue
 For jobs submitted from outside the job system (frame begin, some middleware …)
 LIFO Mode
 In most cases, job dependencies are tree-like
 Some systems occasionally add jobs and then wait for them immediately
 Job Stealing
 Balances load across worker threads
19
Global Job
20
[Diagram: a Global Job Queue next to the Work Threads, each of which has its own Job Queue.]
Outside threads add global jobs
Work thread gets global job from global queue
Job Dependency
21
[Diagram: same queue layout; jobs are marked running, waiting, or ready, and a waiting job has a dependency on newly added jobs.]
Newly added jobs run first
Job Stealing
22
[Diagram: a Work Thread whose Job Queue is empty takes jobs from another Work Thread's Job Queue.]
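The queue selection policy from the last few slides can be sketched as follows. This is a simplified illustration under assumptions, not the engine's scheduler: it uses mutex-protected deques instead of fibers or lock-free queues, and the exact order (own queue, then global queue, then stealing the oldest job of another worker) is an assumption, since the slides do not spell it out. `JobQueue` and `Scheduler` are hypothetical names.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

using Job = std::function<void()>;

// Mutex-protected deque; a simplification chosen for clarity.
class JobQueue {
public:
    void push(Job job) {
        std::lock_guard<std::mutex> lock(mutex_);
        jobs_.push_back(std::move(job));
    }
    // LIFO end: the owner runs the most recently added job first.
    bool popNewest(Job& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (jobs_.empty()) return false;
        out = std::move(jobs_.back());
        jobs_.pop_back();
        return true;
    }
    // Opposite end: used for the global queue and by thieves (assumption).
    bool popOldest(Job& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (jobs_.empty()) return false;
        out = std::move(jobs_.front());
        jobs_.pop_front();
        return true;
    }
private:
    std::mutex mutex_;
    std::deque<Job> jobs_;
};

class Scheduler {
public:
    explicit Scheduler(std::size_t workerCount) : local_(workerCount) {}

    // Jobs submitted from outside the job system go to the global queue.
    void submitGlobal(Job job) { global_.push(std::move(job)); }
    // Jobs generated on a worker go to that worker's own queue.
    void submitLocal(std::size_t worker, Job job) {
        local_[worker].push(std::move(job));
    }

    // Picks the next job for `worker`; returns false if nothing is runnable.
    bool next(std::size_t worker, Job& out) {
        if (local_[worker].popNewest(out)) return true;  // LIFO mode
        if (global_.popOldest(out)) return true;         // global job
        for (std::size_t i = 1; i < local_.size(); ++i) {  // job stealing
            if (local_[(worker + i) % local_.size()].popOldest(out)) return true;
        }
        return false;
    }

private:
    JobQueue global_;
    std::vector<JobQueue> local_;
};
```

In a real worker loop, each thread would call `next` with its own index and run the returned job. LIFO on the owner's side means child jobs spawned by a waiting parent are picked up first, which matches the tree-like dependency pattern noted on the scheduler slide.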
On-The-Fly Change Steps
 Change to ECS model
 Move from entity-level update to component-level update
 Gather components of the same kind into a system; update at system level
 Parallelize each system
 Keep the system tick order
 Split work into jobs inside the system and wait for those jobs to finish before the system ends
 Modify system dependencies
 Clarify system dependencies
 Launch independent systems at the same time
 Wait for a system's jobs only in the systems that really depend on them
23
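The last step (launch independent systems together, wait only where a result is really needed) can be sketched with futures. The system functions below are hypothetical placeholders. That particles depend on animation results follows the particle slides; treating physics as independent of animation is an assumption made for the example.

```cpp
#include <future>

struct FrameResult {
    int animation;
    int particles;
    int physics;
};

// Hypothetical system ticks standing in for real work.
inline int tickAnimation() { return 1; }
inline int tickPhysics() { return 2; }
inline int tickParticles(int animationResult) { return animationResult + 10; }

inline FrameResult runFrame() {
    // Animation and physics are assumed independent: launch both at once.
    std::future<int> animation = std::async(std::launch::async, tickAnimation);
    std::future<int> physics = std::async(std::launch::async, tickPhysics);

    // Only the particle system needs animation results, so the wait
    // happens here rather than right after the animation system.
    const int animationResult = animation.get();
    const int particles = tickParticles(animationResult);

    // Physics is joined as late as possible.
    const int physicsResult = physics.get();
    return FrameResult{animationResult, particles, physicsResult};
}
```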
System From Single-Thread To Multi-Thread
 Lock
 Always the first change step
 Behaves well when there are few conflicts
 Serves as a fallback for the lock-free version
 Batch and Swap
 Useful for polling systems
 Lock-Free
 Use the simplest lock-free data structure
24
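One plausible reading of "Batch and Swap" is a pending list that producers append to under a short lock, and that the polling system swaps out once per tick and then processes without holding the lock. The sketch below follows that reading; `BatchQueue` is a hypothetical name, not the engine's class.

```cpp
#include <mutex>
#include <utility>
#include <vector>

template <typename T>
class BatchQueue {
public:
    // Producers (any thread): append under a short lock.
    void push(T item) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(std::move(item));
    }

    // Consumer (the polling system, once per tick): swap the whole batch
    // out under the lock, then process it outside the lock.
    std::vector<T> takeBatch() {
        std::vector<T> batch;
        std::lock_guard<std::mutex> lock(mutex_);
        batch.swap(pending_);
        return batch;
    }

private:
    std::mutex mutex_;
    std::vector<T> pending_;
};
```

The lock is held only for a push or a swap, so contention stays low even when the consumer spends a long time on each batch.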
Subsystems Overview
25
Lua
Physics
Animation
Particle
Motor
Render
Physics System
 Physics system is built on the PhysX/APEX libraries
 Features
 Rigidbody
 Cloth
 Destruction
 Ragdoll
26
Jobify PhysX Know-how
 The PhysX library supports tasks
 Only need to implement the PxCpuDispatcher
 The code is easy to integrate
 Details to consider
 PhysX occasionally submits tasks and then immediately waits for them to complete, so using the LIFO mode is suggested
 PhysX has a synchronization stage
 PxScene::flushQueryUpdates
27
Callouts: trigger of the sync stage; reduce shape usage!
Animation Works
 Animation Tree Update
 Each Animation Tree updates independently
 Trigger Effect/Particle/Sound…
 Skeleton Transform Calculation
28
Simply split jobs by actor count!
Difficulties
 Coupled with many other systems
 Not yet thread-safe
 Difficult to balance job load
 Cost differs hugely between actors
29
Callout: bad job
MPSC Queue
30
[Diagram: several Animation Workers in the animation system push ops (op, op, op, ...) into an MPSC queue; the related system runs the phases Pre, Fetch OP, Post.]
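A minimal multi-producer, single-consumer queue in the spirit of the diagram can be built from one atomic pointer. This is a sketch under assumptions: the talk does not show its MPSC implementation, and `MpscQueue` here is a hypothetical lock-free stack that the single consumer drains all at once and reverses back into push order.

```cpp
#include <algorithm>
#include <atomic>
#include <utility>
#include <vector>

template <typename T>
class MpscQueue {
    struct Node {
        T value;
        Node* next;
    };

public:
    MpscQueue() = default;
    ~MpscQueue() { drain(); }  // Frees any nodes still queued.

    // Producers (any worker thread): lock-free push onto the list head.
    void push(T value) {
        Node* node = new Node{std::move(value),
                              head_.load(std::memory_order_relaxed)};
        // On failure, compare_exchange_weak reloads node->next with the
        // current head, so the loop simply retries.
        while (!head_.compare_exchange_weak(node->next, node,
                                            std::memory_order_release,
                                            std::memory_order_relaxed)) {
        }
    }

    // Single consumer: detach the whole list in one exchange, then return
    // the items in the order they were pushed.
    std::vector<T> drain() {
        Node* node = head_.exchange(nullptr, std::memory_order_acquire);
        std::vector<T> items;
        while (node != nullptr) {
            items.push_back(std::move(node->value));
            Node* next = node->next;
            delete node;
            node = next;
        }
        std::reverse(items.begin(), items.end());
        return items;
    }

private:
    std::atomic<Node*> head_{nullptr};
};
```

Since the consumer never pops single nodes, the usual ABA hazard of lock-free stacks does not arise here. Animation workers would push ops during their jobs, and the related system would call `drain` in its fetch phase.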
Load Balance
 Cover the imbalance rather than truly balance it
 Split jobs based on experience
 Launch independent systems earlier
 Wait for animation results in the system that depends on them
31
Callout: cover job
Script
 Script Usage
 Lua as the scripting language
 Lua calls engine C++ functions
 Jobifying scripts
 Lua is not natively multi-threaded
 Do heavy calculations in C++
 Gather calculations together
 Parallelize only the C++ code
 Script logic can tick at a fixed interval (e.g. 100 ms)
32
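The fixed-interval script tick can be sketched with a simple time accumulator. `FixedTicker` is a hypothetical helper, and times are kept in integer milliseconds to avoid floating-point drift; the 100 ms interval mirrors the example on the slide.

```cpp
// Accumulates frame time and reports how many fixed-interval script
// ticks are due.
class FixedTicker {
public:
    explicit FixedTicker(int intervalMs) : intervalMs_(intervalMs) {}

    // Call once per frame with the elapsed frame time in milliseconds.
    // Returns the number of script ticks to run this frame.
    int advance(int frameMs) {
        accumulatedMs_ += frameMs;
        int ticks = 0;
        while (accumulatedMs_ >= intervalMs_) {
            accumulatedMs_ -= intervalMs_;
            ++ticks;
        }
        return ticks;
    }

private:
    int intervalMs_;
    int accumulatedMs_ = 0;
};
```

With a 100 ms interval, most frames return zero ticks, so the single-threaded Lua logic stays off the per-frame critical path.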
Jobify Particle System
 Particle System Module
 Empirical Job Split Rules
 By particle classification
 By particle simulation phase
 Problems & Solutions
 Particle job conflicts
 Particle job workload balance
33
Particle System Module
 Particle Emitters
 Particle spawn and delete
 Particle Renders
 Billboard/Trail/Mesh/Beam …
 Particle Affectors
 Color over life, gravity, motion …
 Use a global particle pool to control the particle budget
34
[Diagram: a Particle System contains Emitters; each Emitter has its Affectors and a Render, ...]
Job Split Rule 1 - Particle Classification
 Entity-related
 Dependent on animation results
 Animation trails, etc.
 Non-entity-related
 Smoke, explosion, weather, etc.
35
Job Split Rule 2 - Particle Phases
 Spawn jobs
 Particle emit and delete
 Update jobs
 Particle property refresh
 Render Prepare jobs
 GPU-friendly data
 Problems:
 Conflicts in the global pool
 Simply splitting jobs by particle system count causes bad workload balance
36
[Diagram: each Particle System (Emitters with their Affectors and Renders) is processed in three job phases: Spawn, Update, Render Prepare.]
Solve Particle Job Conflict
 Conflict Case 1
 Particle Spawn
 Allocate a particle block from the pool with an atomic operation
 Allocating a block is just an AtomicAdd
 Create new particles from the block
 Particle Death
 Simply swap with the last particle in the block
 When a block is empty, free the whole block back to the pool
 Conflict Case 2
 Particle render data is written into one big vertex buffer
 Use AtomicAdd to get the write position in the linear pool
37
[Diagram: the Pool holds Blocks; the pool tracks a block pointer and an atomic block particle count, and each Block stores its Particles.]
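The two ideas on this slide, atomic-add allocation and swap-with-last removal, can be sketched as below. All names (`Particle`, `Block`, `BlockPool`) and the block capacity are hypothetical, and returning empty blocks to the pool is left out to keep the sketch short.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

struct Particle {
    float life;
};

// A fixed-size block of particles, owned by one job at a time.
struct Block {
    static constexpr std::size_t kCapacity = 64;  // assumed size
    Particle particles[kCapacity];
    std::size_t count = 0;

    bool spawn(Particle p) {
        if (count == kCapacity) return false;
        particles[count++] = p;
        return true;
    }

    // Particle death: overwrite the dead slot with the last live particle,
    // so the live particles stay densely packed.
    void kill(std::size_t index) { particles[index] = particles[--count]; }
};

// Shared pool: handing out a block is a single atomic add, so spawn jobs
// on different threads never contend on a lock.
class BlockPool {
public:
    explicit BlockPool(std::size_t blockCount) : blocks_(blockCount) {}

    // Returns nullptr once the particle budget (all blocks) is used up.
    Block* allocate() {
        const std::size_t index = next_.fetch_add(1, std::memory_order_relaxed);
        return index < blocks_.size() ? &blocks_[index] : nullptr;
    }

private:
    std::vector<Block> blocks_;
    std::atomic<std::size_t> next_{0};
};
```

The same `fetch_add` pattern would apply to Conflict Case 2: each render-prepare job adds its vertex count to a shared atomic offset and writes into the range it got back.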
Workload Balance Problem
38
[Profiler capture: good particle jobs next to one bad job that is too heavy.]
Split by Emitter
 Some particle jobs are too heavy
 Weather particle
 Massive ammo animation trails
 Split by particle emitters
39
Render Thread
 Legacy Single Thread Render
 D3D11
 Deferred shading pipeline
 Visibility & render on main thread
40
[Pipeline diagram: Visibility, Scaleform UI, GBuffer, Cascade Shadow, Deferred Shading, Forward, Transparent, PostProcess, Present]
Multi-threading Render
 Render Backend Thread
 Flush command lists on the immediate context
 Render Job Context
 Build D3D11 command lists using deferred contexts
 Split per scene
 6 render jobs
 Shadow
 GBuffer: terrain-related
 GBuffer: static-object-related
 GBuffer: dynamic-object-related
 Translucent
 Forward
41
Render Multi-Thread WorkFlow
42
[Timeline diagram: one Render Thread lane using the immediate context, and six Work Job lanes, each using its own deferred context.]
Stages shown: Scaleform UI, Eye Visibility, Shadow Visibility, Gbuffer Terrain, Gbuffer Static, Gbuffer Dynamic, Forward, Transparent, Cascade Shadow, GBuffer Command, Shadow Command, Deferred Shading, CLWait, PostProcessCL
Performance Comparison - Before
43
> 50ms
Performance Comparison - After
44
Much better
~19ms
CPU Scaling
45
[Chart: scaling (y-axis from 0 to 3.5) for 2 cores, 4 cores, 6 cores, 8 cores, and > 8 cores.]
Rendering needs to be jobified better
Extra Optimization
 Intel Masked Occlusion Culling Library *
 CPU software occlusion culling
 Easy to integrate
 Reduces draw calls
46
*Masked Occlusion Culling, https://github.com/GameTechDev/MaskedOcclusionCulling
Masked Software Occlusion Culling Result
 Performance (4 cores)
47
Level             | Rasterize & Visibility | MOC off  | MOC on | Speedup
Main City         | 2.7 ms                 | 25 fps   | 30 fps | 1.2x
Siege Battlefield | 3.1 ms                 | 23.2 fps | 29 fps | 1.25x
Enriching Visual Effects for More Cores
 Clothing
 Physics destruction
 Particles
 Ragdoll
 Animation
48
Tips & Tricks
 Optimize the code itself first, before parallelizing
 Locks are your friend in the first step
 Pending and swap
 Data-oriented design is both optimization friendly and debug friendly
 A simple structure is easier to parallelize and debug
49
Future Work
 Further data-oriented design
 More clearly identified system dependencies
 Chunk-based multi-thread rendering
 Job based lock (no more mutex, lock…)
50
51
Thanks
Legal Disclaimer & Optimization Notice
Optimization Notice
Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel
microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the
availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent
optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are
reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific
instruction sets covered by this notice.
Notice revision #20110804
52
INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY
INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL
DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES
RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR
OTHER INTELLECTUAL PROPERTY RIGHT.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests,
such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change
to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating
your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit
www.intel.com/benchmarks.
Copyright © 2018, Intel Corporation. All rights reserved. Intel, Pentium, Xeon, Xeon Phi, Core, VTune, Cilk, and the Intel logo are trademarks of
Intel Corporation in the U.S. and other countries.

GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 

Kürzlich hochgeladen (20)

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 

Parallelizing Conqueror's Blade

  • 1. Parallelizing Conqueror’s Blade* Making the Most of Intel® Core™ for the Best Gaming Experience Nan Mi Engineer Lead @BoomingGames Lei Su Senior Engineer @BoomingGames Sheng Guo Application Engineer @Intel.com
  • 2. Agenda  Multi-core: Opportunities to scale user experience  Conqueror’s Blade*: Case study to leverage multi-core  Optimization background  Building job system  Jobifying engine sub-systems  Scaling user experience 2
  • 3. Next Generation Multi-Core Processor 3  Physical CPU/cores increasing quickly  4 cores: max install base  6 cores: mainstream shipping  8-18 cores: high-end shipping  Multicore utilization of games today  Most multithreaded, but only with 2~3 heavy threads  Insufficient CPU utilization Steam Hardware & Software Survey: February 2018
  • 4. What to Do with the Idle of Cores: Boost Performance, Enrich Experience. Software occlusion culling; buffering load turbulence; balancing load among cores; global illumination; detailed animation; realistic clothing; realistic ragdoll; realistic destruction; advanced particles; wind & weather; 3D audio; additional rendering passes; more details of distant model; ambient animation and background life; decorative contents
  • 5. With Great User Experience Comes Great Parallelized Engine 5 Maximized User Experience Parallelized Game Engine Scale User Experience (Performance + Effects) with More Cores
  • 6. Key Problems to Consider Maximize User Experience Parallelize Game Engine  Enable perceptible multi-core scaling w/o impacting game play  The quality of effects  The types of high-quality effects  The coverage of high-quality effects on all units  Decompose engine functionality to fine-grained jobs  Rendering  Game Logic  Simulation  Build efficient job scheduler
  • 8. Outline  Game Background  Engine Architecture Evolution  Building Scalable Game Engine  Job system  Case study: parallelizing engine subsystems  Scaled Gaming Experience  Tips & Tricks  Future Work
  • 9. Game Background  Conqueror’s Blade* is a PC online-game  Hero : Action gameplay  Legion : Tactic gameplay  Empowered war machines  Immersive battlefield 9
  • 11. Motivation For Multicore Scalable Engine  Game is Logic Heavy  Huge number of individual soldiers  Dynamic battleground  Rich battlefield elements  Problems of Legacy Architecture  Difficult to scale to more cores  CPU Bound 11
  • 12. Goals & Challenges  Goals  Support more than 1K actors with individual AI and states  Dynamic battlefield  Easy to scale  Multi-thread debug friendly  Challenges  Game is in development & test  On-the-fly upgrade engine  Time-limited (~2.5 months)  Technique Choice  Entity-Component-System model  Job system 12
  • 13. ECS Model  Entity-Component-System*  Data is everything  Entity is just ID  Component holds only data  System contains the same kind of component and methods  Pros  Parallelization friendly  Cache friendly  Memory management friendly 13 *[Timothy17] Overwatch Gameplay Architecture and Netcode, GDC 2017
  • 14. Original vs ECS  Original Model: each entity holds its own Animation, Physics, and Transform components. Data organized by entity; data heterogeneous; memory jumping; cache miss.  ECS Model: the Animation, Physics, and Transform systems each hold all components of their kind. Data organized by system; data homogeneous; memory contiguous; cache friendly.
  • 15. Fixed Multi-thread (Legacy)  Fixed Multi-thread  Render  Simulation  Logic [Diagram: Render thread: Visibility, GBuffer, Shadow, Lighting, Forward, Transparent, Postprocess, UI. Simulation thread: LOD, Animation, Physics, Particle, ... Logic thread: Lua, AI, Motor, ... Network.]
  • 16. Thread Fork/Join (Intermediate)  Fixed Multi-thread  Thread Fork/Join  Thread Pool  Fork/Join from fixed thread [Diagram: the Render, Simulation, and Logic threads remain; AI tasks and Animation tasks are forked to work threads and joined back into the fixed thread.]
  • 17. Job Based (Final)  Fixed Multi-thread  Thread fork/join  Job Based  Render Backend  Job System  Network… [Diagram: engine architecture with Network, Render Backend, and Job System; the job system has a global job queue and work threads, each with its own job queue holding ready and waiting jobs.]
  • 18. Job System  Fiber based implementation*  What is a fiber  A lightweight execution context (includes a user-provided stack, registers…)  Fiber execution is cooperative: a fiber explicitly switches to another fiber  Pros  Easy to implement task scheduling  Easy to handle task dependencies  Job stack is isolated  Avoids frequent context switches  Cons  C++ does not natively support fibers  Implementation differs between operating systems  Has some restrictions (thread_local cannot be used)  Fiber Implementation  Boost context: Cross-platform, Industry proven, Fast *[Christian15] Parallelizing the Naughty Dog engine using fibers, GDC 2015
  • 19. Job Scheduler  Thread Independent Job Queue  Each work thread has its own job queue  A job generated from a thread is added to that thread's queue  Separate Global Job Queue  Jobs submitted from outside the job system (frame begin, some middleware …)  LIFO Mode  In most cases, job dependency is tree-like  Some systems add jobs occasionally but wait for them immediately  Job Stealing  Balances load between worker threads
  • 20. Global Job [Diagram: global job queue plus work threads, each with its own job queue.] Outside threads add global jobs. A work thread gets a global job from the global queue.
  • 21. Job Dependency [Diagram: job states are running, waiting, and ready. A running job adds the new jobs it depends on to its work thread's queue and becomes waiting; the newly added jobs run first.]
  • 22. Job Stealing [Diagram: a work thread whose job queue is empty takes jobs from another work thread's queue.]
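The scheduler slides above (per-thread queues, a separate global queue, LIFO mode, job stealing) can be sketched in C++ roughly as follows. This is a minimal illustration under stated assumptions, not the engine's implementation: `WorkerQueue`, `next_job`, the mutex-protected deque, the choice to steal from the oldest end, and the lookup order (own queue, then global queue, then a victim) are all hypothetical. The real system also runs jobs on fibers, which this sketch leaves out.

```cpp
#include <deque>
#include <functional>
#include <mutex>
#include <optional>
#include <utility>

using Job = std::function<void()>;

// One queue per worker thread. A mutex keeps the sketch short; a production
// scheduler would more likely use a lock-free deque.
class WorkerQueue {
public:
    void push(Job job) {
        std::lock_guard<std::mutex> lock(mutex_);
        jobs_.push_back(std::move(job));
    }

    // Owner side: take the newest job first (LIFO), so jobs that were just
    // spawned and are about to be waited on run before older work.
    std::optional<Job> pop() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (jobs_.empty()) return std::nullopt;
        Job job = std::move(jobs_.back());
        jobs_.pop_back();
        return job;
    }

    // Thief side (also used for the global queue here): take the oldest job.
    std::optional<Job> steal() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (jobs_.empty()) return std::nullopt;
        Job job = std::move(jobs_.front());
        jobs_.pop_front();
        return job;
    }

private:
    std::mutex mutex_;
    std::deque<Job> jobs_;
};

// Assumed lookup order for a worker: own queue, then the global queue,
// then another worker's queue.
inline std::optional<Job> next_job(WorkerQueue& own, WorkerQueue& global,
                                   WorkerQueue& victim) {
    if (auto job = own.pop()) return job;
    if (auto job = global.steal()) return job;
    return victim.steal();
}
```

A worker loop would call `next_job` repeatedly and run whatever it returns; with several workers, the victim would be chosen among the other workers' queues.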
  • 23. On-The-Fly Change Steps  Change to ECS Model  Move from entity-level update to component-level update  Gather components of the same kind into a system and update at system level  Parallelize each system  Keep system tick order  Split jobs inside each system and wait for them to finish before the system ends  Modify system dependency  Clarify system dependencies  Launch independent systems at the same time  Wait for a system's jobs only in the systems that really depend on them
  • 24. System From Single-Thread To Multi-Thread  Lock  Always the first change step  Behaves well when there are few conflicts  Backup of lock-free version  Batch and Swap  Useful for polling system  Lock-Free  Use the simplest lock-free data structure 24
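The slide names "Batch and Swap" without showing it. One common reading, sketched here in C++ as an assumption rather than the authors' code, is that producers append to a pending batch under a short lock and the polling system swaps the whole batch out once per tick, then processes it without holding the lock. `BatchQueue`, `post`, and `take_batch` are hypothetical names.

```cpp
#include <mutex>
#include <utility>
#include <vector>

template <typename T>
class BatchQueue {
public:
    // Called from any thread: the lock is held only for one push_back.
    void post(T item) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(std::move(item));
    }

    // Called by the owning system once per tick: swap the pending batch out
    // under the lock; the caller then processes it lock-free.
    std::vector<T> take_batch() {
        std::vector<T> batch;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            batch.swap(pending_);
        }
        return batch;
    }

private:
    std::mutex mutex_;
    std::vector<T> pending_;
};
```

The lock is contended only for the duration of a `push_back` or a `swap`, which fits the slide's advice to start with locks and move to lock-free structures only where needed.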
  • 26. Physics System  Physics System build on PhysX/Apex Library  Features  Rigidbody  Cloth  Destruction  Ragdoll 26
  • 27. Jobify PhysX Know-how  The PhysX library supports tasks  Only need to implement the PxCpuDispatcher  The code is easy to integrate  Details to consider  PhysX occasionally submits tasks and then immediately waits for them to complete, so we suggest using the LIFO mode  PhysX has a synchronization stage  PxScene::flushQueryUpdates triggers the sync stage  Reduce shapes usage!!!
  • 28. Animation Works  Animation Tree Update  Each Animation Tree updates independently  Trigger Effect/Particle/Sound…  Skeleton Transform Calculation 28 Simply split jobs by actor count!
  • 29. Difficulties  Related to many other systems  Not yet thread-safe  Difficult to balance job load  Cost differs hugely between actors [Diagram label: bad job]
  • 30. MPSC Queue 30 op op op op op Animation Worker Animation Worker Animation Worker Animation Worker Pre Fetch OP Post Animation system Related system
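A multi-producer, single-consumer queue of the kind named on this slide can be kept very simple. The C++ sketch below is an assumption about one possible shape, not the engine's queue: producers push with a compare-and-swap on the head pointer, and the single consumer detaches the whole list in one atomic exchange. `MpscQueue`, `push`, and `drain` are hypothetical names.

```cpp
#include <algorithm>
#include <atomic>
#include <utility>
#include <vector>

template <typename T>
class MpscQueue {
public:
    MpscQueue() = default;
    MpscQueue(const MpscQueue&) = delete;
    MpscQueue& operator=(const MpscQueue&) = delete;
    ~MpscQueue() { drain(); }  // frees any nodes still queued

    // Any producer thread: link a new node in front of the current head.
    void push(T value) {
        Node* node = new Node{std::move(value),
                              head_.load(std::memory_order_relaxed)};
        while (!head_.compare_exchange_weak(node->next, node,
                                            std::memory_order_release,
                                            std::memory_order_relaxed)) {
            // On failure, node->next now holds the current head; retry.
        }
    }

    // Single consumer: detach the whole list at once, then restore the
    // order in which the items were pushed.
    std::vector<T> drain() {
        Node* node = head_.exchange(nullptr, std::memory_order_acquire);
        std::vector<T> out;
        while (node != nullptr) {
            out.push_back(std::move(node->value));
            Node* next = node->next;
            delete node;
            node = next;
        }
        std::reverse(out.begin(), out.end());
        return out;
    }

private:
    struct Node {
        T value;
        Node* next;
    };
    std::atomic<Node*> head_{nullptr};
};
```

Because the consumer never pops single nodes with a compare-and-swap, this design avoids the ABA problem that a general lock-free stack has to deal with. In the setting of the slide, animation workers would push operations and the related system would drain them at a known point in the frame.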
  • 31. Load Balance  Cover the imbalance rather than truly balance it  Split jobs by experience  Launch independent systems earlier  Wait for animation results in another, dependent system [Diagram label: cover job]
  • 32. Script  Script Usage  Lua as the script language  Lua calls engine C++ functions  Script jobify  Lua is not natively multi-threaded  Do heavy calculation in C++  Gather calculations together  Parallelize only the C++ code  Script logic can tick at a fixed interval (like 100 ms)
  • 33. Jobify Particle System  Particle System Module  Empirical Job Split Rules  By particle classification  By particle simulation phases  Problems & Solutions  Particle job conflicts  Particle job workload balance
  • 34. Particle System Module  Particle Emitters  Particle spawn and delete  Particle Renders  Billboard/Trail/Mesh/Beam …  Particle Affectors  Color over Life, gravity, motion …  Use global particle pool to control particle budget 34 Particle System Emitter Affector Affector Render … Emitter Affector Affector Render … …
  • 35. Job Split Rule 1 - Particle Classification  Entity-Related  Depends on animation results  Animation trail, etc  Non-Entity-Related  Smoke, explosion, weather, etc
  • 36. Job Split Rule 2 – Particle Phases  Spawn jobs  Particle emit and delete  Update jobs  Particle property refresh  Render Prepare jobs  GPU friendly data  Problems:  Conflicts in global pool  Simply splitting job by particle system count causes bad workload balance 36 Particle System Emitter Affector Affector Render … Emitter Affector Affector Render … …Spawn Update Render Prepare Particle System …
  • 37. Solve Particle Job Conflict  Conflict Case 1  Particle Spawn  Allocate particle block from pool with Atomic  Allocate block is just AtomicAdd  New particle from block  Particle Dead  Simple swap with the last particle in block  When block is empty, free whole block back to pool  Conflict Case 2  Particle render transfer into one big vertex buffer  Use AtomicAdd to get write position in linear pool 37 PoolBlock Block Block Pointer Block Particle count (atomic) Particle Particle Particle Particle
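The "AtomicAdd to get write position in linear pool" idea from conflict case 2 can be illustrated with `std::atomic::fetch_add`. The C++ sketch below is only an assumption about the shape of such a pool: `LinearPool`, `reserve`, and `reset` are hypothetical names, and a real vertex buffer would hand out byte offsets into mapped GPU memory rather than plain indices.

```cpp
#include <atomic>
#include <cstddef>
#include <optional>

// Hands out disjoint ranges of a fixed-capacity linear buffer to
// concurrent jobs.
class LinearPool {
public:
    explicit LinearPool(std::size_t capacity) : capacity_(capacity) {}

    // Reserves `count` consecutive slots and returns the index of the first
    // one, or nothing when the budget is exhausted. A failed reservation
    // still advances the cursor; the pool stays full until reset().
    std::optional<std::size_t> reserve(std::size_t count) {
        std::size_t first = cursor_.fetch_add(count, std::memory_order_relaxed);
        if (first + count > capacity_) return std::nullopt;
        return first;
    }

    // Called once per frame while no job is writing.
    void reset() { cursor_.store(0, std::memory_order_relaxed); }

private:
    std::size_t capacity_;
    std::atomic<std::size_t> cursor_{0};
};
```

Each render-prepare job reserves its range once and then writes into it without further synchronization, because no two successful reservations overlap.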
  • 39. Split by Emitter  Some particle jobs are too heavy  Weather particle  Massive ammo animation trails  Split by particle emitters 39
  • 40. Render Thread  Legacy Single Thread Render  D3D11  Deferred shading pipeline  Visibility & render on main thread 40 Visibility Scaleform UI GBuffer Cascade Shadow Deferred Shading Forward Transparent PostProcess Present
  • 41. Multi-threading Render  Render Backend Thread  Flush command lists on the immediate context  Render Job Context  Build D3D11 command lists using deferred contexts  Split per scene  6 render jobs  Shadow  GBuffer  Terrain-related  Static-object-related  Dynamic-object-related  Translucent  Forward
  • 42. Render Multi-Thread Workflow [Timeline diagram. Lanes: render thread (immediate context) and six work jobs (deferred contexts). Labels: Scaleform UI, Eye Visibility, Shadow Visibility, GBuffer Terrain, GBuffer Static, GBuffer Dynamic, Forward Transparent, Cascade Shadow, GBuffer Command, Shadow Command, Deferred Shading, CL Wait, PostProcess CL.]
  • 43. Performance Comparison - Before 43 > 50ms
  • 44. Performance Comparison - After: much better, ~19ms
  • 45. CPU Scaling [Chart: relative performance, axis from 0 to 3.5, for 2 cores, 4 cores, 6 cores, 8 cores, and > 8 cores.] Render needs to be jobified further
  • 46. Extra Optimization  Intel Masked Occlusion Culling Library *  CPU Software Occlusion Culling  Easy to be integrated  Reduce draw call 46 *Masked Occlusion Culling, https://github.com/GameTechDev/MaskedOcclusionCulling
  • 47. Masked Software Occlusion Culling Result  Performance (4 cores). Main City: rasterize & visibility 2.7 ms; MOC off 25 fps; MOC on 30 fps; speedup 1.2x. Siege Battlefield: rasterize & visibility 3.1 ms; MOC off 23.2 fps; MOC on 29 fps; speedup 1.25x.
  • 48. Enriching Visual Effects for More Cores  Clothing  Physics destruction  Particles  Ragdoll  Animation 48
  • 49. Tips & Tricks  Optimize the code itself first rather than parallelize  Lock is your friend in the first step  Pending and swap  Data-oriented is both optimization friendly and debug friendly  Simple structure means easier to parallelize and debug 49
  • 50. Future Work  Further data-oriented design  More clearly identified system dependencies  Chunk-based multi-thread rendering  Job based lock (no more mutex, lock…) 50
  • 52. Legal Disclaimer & Optimization Notice Optimization Notice Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 52 INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit www.intel.com/benchmarks. Copyright © 2018, Intel Corporation. All rights reserved. 
Intel, Pentium, Xeon, Xeon Phi, Core, VTune, Cilk, and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries.

Editor's Notes

  1. Title: Parallelizing Conqueror’s Blade: Making the Most of Intel Core for the Best Gaming Experience Session Description: Giving your players the best experience possible on all levels of hardware is the ultimate goal. However, with the quickly increasing number of cores built into modern mainstream CPUs, the challenges inherent in developing game engines leave many potentially available cores sitting idle on the sidelines. In this talk, we'd like to share our experience and lessons in building the multicore-scalable game engine of Conqueror's Blade, an AAA game of ancient warfare from NetEase/BoomingGames. We'll detail how we multithreaded the game engine, especially the rendering system, which is typically the No. 1 CPU bottleneck in modern games, to squeeze out performance scalability. We'll also cover how to use the resulting performance headroom to implement perceptible visual differentiation that maximizes the gaming experience on different CPU platforms.
  2. User Experience = performance + effects (visual/audio)
  3. OK, now developers from BoomingGames will share experience and lessons from engineering practice.
  4. Hello everyone, I’m Nan Mi, engineer lead at BoomingGames. We will first introduce our game background, then show our engine architecture evolution. Then we will go into detail about how we used a job system to build a scalable game engine, with case studies of parallelizing engine subsystems. We will show the scaled gaming experience, and finally the tips and tricks we learned from the practice, plus future work.
  5. Conqueror’s Blade is a PC online game, now in beta test and coming soon. The player controls both a hero and a legion to battle in the world. Controlling the hero is action gameplay, while controlling the legion is a kind of tactical gameplay. The battlefield mixes cold and hot weapons and empowered war machines to create an immersive battlefield.
  6. Let’s watch the game trailer to get a feel for the war.
  7. OK, our game is logic heavy. It includes a huge number of individual soldiers with independent AI, animation, and states. It’s a dynamic battleground with rich battlefield elements like explosions, destruction, legion melee, and so on. Our legacy architecture is easy to understand, but it has some problems: it’s difficult to scale to more cores and it is CPU bound, so we need a more multicore-scalable engine.
  8. Our goals are to support more than 1K actors with individual AI and states and a dynamic battlefield with destruction, and the engine needs to be easy to scale and friendly to multi-thread debugging. The challenges: the game is still in development and test, the engine has to be upgraded smoothly on the fly, and time is limited (~2.5 months). So our technique choice is the Entity-Component-System model and a job system. My colleague Lei Su will introduce the implementation details of the ECS and the job system.
  9. Hello guys, I’m Lei Su, senior engineer at BoomingGames. OK, let’s talk about the entity-component-system model. It’s a data organization architecture, similar to the one in [Overwatch]. In the ECS model, data is everything. An entity is just an ID. A component holds only data. And a system contains all components of the same kind and their methods. You can think of it as changing our engine interfaces from C++ style to C style, and changing the design pattern from object-oriented to data-oriented. Why did we make these changes? We think the ECS model has at least three advantages: it is parallelization friendly, cache friendly, and memory management friendly. Let’s make an intuitive comparison between the original model and the ECS model.
  10. In the original model, an entity holds all its component data, and each component has its own interfaces. Data is organized by entity, so within an entity the data is heterogeneous. If we update all of one entity’s components first, which means going from left to right in the picture, the memory is contiguous but the methods differ. If we update the same component of all entities first, from top to bottom in the picture, the memory access jumps around. Neither method is parallel friendly, and the second one also causes cache misses. In the ECS model, we update the systems one by one. The memory is contiguous and the update method is the same, so this is both parallel friendly and cache friendly. That is why we chose the entity-component-system architecture to organize our data. Next we will talk about the evolution of our multi-thread architecture.
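The layout described in these notes can be sketched in C++ as follows. This is a minimal illustration, not the engine's actual code: `Entity`, `TransformComponent`, `TransformSystem`, and `update_range` are hypothetical names, and the component fields are invented for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// An entity is only an ID; it owns no data.
using Entity = std::uint32_t;

// A component holds only data.
struct TransformComponent {
    float x = 0.0f;
    float vx = 0.0f;
};

// A system stores every component of one kind contiguously and owns the
// methods that operate on them.
class TransformSystem {
public:
    Entity add(float x, float vx) {
        components_.push_back({x, vx});
        return static_cast<Entity>(components_.size() - 1);
    }

    // Updates the half-open range [begin, end). Disjoint ranges touch
    // disjoint memory, so each range could be submitted as its own job.
    void update_range(std::size_t begin, std::size_t end, float dt) {
        for (std::size_t i = begin; i < end; ++i) {
            components_[i].x += components_[i].vx * dt;
        }
    }

    std::size_t size() const { return components_.size(); }
    const TransformComponent& get(Entity e) const { return components_[e]; }

private:
    std::vector<TransformComponent> components_;
};
```

Because every element in the vector has the same type and the same update method, the loop walks memory linearly, and splitting it into ranges is a natural way to turn one system update into several independent jobs.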
  11. As you see in the picture, our original multi-thread mode is quite easy to understand. We have three fixed threads: one for render, one for simulation, and one for logic. The network and IO thread is always there, so we will not talk about it. As our game demanded a richer and richer experience, this architecture hit its bottleneck: it is hard to scale to more cores. So we changed to fork/join mode.
  12. We still have three heavy threads, but each thread can fork some threads to do one kind of work in parallel, and then return to the original thread to continue. This is very similar to the single-thread execution sequence. We gained some boost from this architecture, but we abandoned it quickly. Why? Before I give the reason, I would like to share a little of my understanding of system design. When we design a system, we should consider not only the efficiency of the system itself but also the efficiency of the system’s users. When we design a system for designers, we need to ask whether designers can use it to quickly build many different game experiences. When we design a system for artists, we need to consider how to really free the artists’ inspiration. Back to the multi-thread architecture: its user is the programmer, so we should consider the programmer’s efficiency. With the fork/join architecture, programmers have to think about thread fork and join, the worker thread count may influence how tasks are split, and so on. None of this is friendly. Finally, we chose the job-based architecture, which is efficient both for the system itself and for the programmer.
  13. In this architecture, the engine has a render backend thread, a network and IO thread, and the rest is the job system. The job system uses a thread pool to run the jobs. This model is well suited to multi-core architectures and scales naturally as the CPU core count increases. It is also programmer friendly: programmers no longer need to consider the worker thread count, they can split jobs with nearly zero consideration, and job dependencies are much easier and freer to express. In theory, this architecture is also more efficient than the previous one. Let's look inside the job system.
  14. We use a fiber-based job system implementation, the same approach as Naughty Dog. So what is a fiber? In my opinion, a fiber has two key features. First, it is a lightweight execution context that includes a user-provided stack, registers and so on. Second, and this is the powerful magic of fibers, fiber execution is cooperative: a fiber can explicitly switch to another fiber, and in theory the switch is fast. This makes fibers a wonderful choice for implementing a job system. Being easy to switch in and out means that task scheduling is easy to implement and task dependencies are easy to build. A user-provided stack means each fiber can have its own stack, so a job running on a fiber's stack is isolated. Manual control of fiber switching means we can easily solve the task chaining effect and so avoid context switches. The task chaining effect is this: A depends on B and C. While A is waiting, its thread can choose to run D, but D launches E and F. When B and C finish, in theory we can run A, but A has been buried in the call chain, so it has to wait for D to finish or suffer a context switch. Fibers are beautiful, but they also have some problems. They are not natively supported at the C++ language level, and even at the OS level the implementations differ. And if we use fibers, the job code that runs in a fiber must obey some restrictions, such as not using thread_local. To solve these problems, we chose Boost.Context to implement our fibers. Boost.Context is cross-platform, industry proven and fast. We also wrote the job code restrictions into our coding standards. This is the foundation of our job system. Next we will talk about the core of the job system, the job scheduler.
  15. When we designed our job scheduler, we considered the peculiarities of the game engine. We use 2 types of queues. One is the per-thread queue: each worker thread has its own job queue, and jobs generated on that thread are added to it. This reduces contention when taking jobs. We also have a separate global job queue, which threads outside the job system use to submit jobs to run in the job system. For example, at the beginning of the frame the render backend adds an initial update job to the job system. For now, we have not taken over the multi-threading systems of all the third-party middleware, so jobs from those threads are also added to the global job queue. Maybe in the future, when all the middleware multi-threading systems are under our control, we can treat the whole engine update as one big initial job and remove the global queue. Our engine generates jobs in a tree-like pattern, and some systems occasionally add jobs and then wait for them to complete immediately. Considering this, we chose a stack-like last-in, first-out scheduling mode. When we put jobs into the job system, we cannot ensure that the jobs are split fairly, so some worker threads will finish all of their jobs while others still have many jobs to run. To balance the workload between worker threads, a fast worker thread can steal jobs from another one. A description in words is abstract, so let's take a visual look at the scheduler!
  16. The global jobs are generated by the outside threads, and the worker threads take them from the global queue.
  17. In the same picture, a job is running. It generates 2 new jobs and chooses to wait for them to complete. The job now depends on the 2 newly generated jobs, so its state changes from running to waiting and it is switched out to the waiting queue. Because of the stack-like LIFO mode, the newly added jobs run first. When both of the 2 jobs have finished, the parent job becomes ready, and the scheduler switches the parent job back in to continue. This is the job scheduling in brief. Next, job stealing.
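The running, waiting and ready transitions in note 17 can be sketched with an atomic counter of unfinished children. This is only an illustration under assumptions: `ParentJob`, `wait_for` and `child_finished` are hypothetical names, and the real engine switches fibers where the comments say so.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical names: the talk does not show the engine's job types.
enum class JobState { Running, Waiting, Ready };

struct ParentJob {
    std::atomic<int> pending_children{0};
    std::atomic<JobState> state{JobState::Running};

    // Called by the parent for its `n` new child jobs, before those children
    // are made runnable, so a fast child cannot finish before the count is set.
    void wait_for(int n) {
        pending_children.store(n, std::memory_order_release);
        // A fiber-based scheduler would switch the parent out at this point.
        state.store(JobState::Waiting, std::memory_order_release);
    }

    // Called by each child when it finishes. Returns true only for the child
    // that completes the last dependency; the scheduler could then switch the
    // parent back in.
    bool child_finished() {
        if (pending_children.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            state.store(JobState::Ready, std::memory_order_release);
            return true;
        }
        return false;
    }
};
```

With two children, the first call to `child_finished` leaves the parent waiting and the second one marks it ready, which matches the sequence described for the picture.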
  18. It's quite simple. As you can see, one worker thread has finished all of its jobs, so it steals a job from the tail of another worker thread's queue. Now it has a job to do. We have introduced 2 powerful weapons (the entity-component-system model and job-based multi-threading) to optimize our engine. Let's use them step by step.
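The per-worker LIFO queue, the global queue and job stealing from notes 15 to 18 can be sketched as follows. This is a simplified, mutex-based illustration, not the engine's implementation. `WorkerQueue`, `next_job` and `JobId` are hypothetical names. Two points are assumptions: stealing takes the oldest job (the end opposite to where the owner pops, which is one reading of "the tail"), and a worker checks its own queue first, then the global queue, then a victim.

```cpp
#include <cassert>
#include <deque>
#include <mutex>

using JobId = int;  // stand-in for a real job handle

class WorkerQueue {
public:
    // Adds a job at the newest end of the queue.
    void push(JobId job) {
        std::lock_guard<std::mutex> lock(mutex_);
        jobs_.push_back(job);
    }

    // Owner side: take the newest job first (stack-like LIFO).
    bool pop(JobId& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (jobs_.empty()) return false;
        out = jobs_.back();
        jobs_.pop_back();
        return true;
    }

    // Thief side: take the oldest job from the opposite end.
    bool steal(JobId& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (jobs_.empty()) return false;
        out = jobs_.front();
        jobs_.pop_front();
        return true;
    }

private:
    std::mutex mutex_;
    std::deque<JobId> jobs_;
};

// Assumed lookup order for an idle worker: own queue, global queue, victim.
bool next_job(WorkerQueue& own, WorkerQueue& global, WorkerQueue& victim,
              JobId& out) {
    return own.pop(out) || global.steal(out) || victim.steal(out);
}
```

Taking from opposite ends keeps the owner working on the most recently generated jobs while a thief picks up older work that the owner would reach last.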
  19. First, we decided to change our engine's data organization, which is the basis of parallelization. We changed the update order from by entity to by component. This changes the system's behavior, but never mind, we fixed that first. Meanwhile, we lost a little performance, because this update order causes cache misses. That is not a big problem, because we soon get it back. Once the update order was stable, we gathered components of the same type into systems and updated in system order. Essentially this only changes where the component data is located, so it is relatively simple and introduces few bugs. At this point our change to the ECS model was finished, and we could start to parallelize each system. When parallelizing the systems:
  20. Many game engine systems have an update or tick function, so batch and swap is widely used in the engine. Complex lock-free data structures are difficult to debug.
  21. This is the performance result from when we started to optimize. You can see that only 3 heavy threads are working, and there are a lot of empty holes in the other threads.
  22. (Presenter note: title needed! By what criteria do we split the jobs?)
  23. PhysX generates tasks recursively, so LIFO mode is a good fit for its jobs. Optimization (the author has started to optimize this function): reduce shape usage, from 60 shapes per soldier to 3. Originally, each soldier state used one shape to represent it. After optimization, each soldier has at most 3 shapes, and each pose moves the shapes.
  24. For now we simply split jobs by actor count. In the future we can split jobs by animation calculation type.
  25. An MPSC queue solves the thread-safety problem. The cost difference is covered up rather than really solved.
  26. MPSC: Multi producer single consumer
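The talk does not say how its MPSC queue is implemented. As one possible illustration, here is a deliberately simple mutex-based version that follows the batch-and-swap idea from note 20 rather than a lock-free design. `MpscQueue`, `push` and `drain` are hypothetical names.

```cpp
#include <cassert>
#include <mutex>
#include <utility>
#include <vector>

// Many producer threads call push(); exactly one consumer calls drain().
template <typename T>
class MpscQueue {
public:
    // Producer side: append one item under the lock.
    void push(T value) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(std::move(value));
    }

    // Consumer side: swap the whole pending batch out under the lock, then
    // process it without holding the lock.
    std::vector<T> drain() {
        std::vector<T> batch;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            batch.swap(pending_);
        }
        return batch;
    }

private:
    std::mutex mutex_;
    std::vector<T> pending_;
};
```

The consumer holds the lock only for the swap, so producers are blocked briefly no matter how long the batch takes to process.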
  27. No deeper technology
  28. Jobifying all of the systems above is relatively simple. Next I will hand the talk back to Nan Mi, who will give us some more complex job-splitting cases.
  29. Let's move on to jobifying the particle system. First I will introduce our particle system modules. We rely on two rules of thumb to split particle jobs: one splits by particle classification, and the other by particle simulation phase. Then I will show the problems we met with job conflicts and workload balance, and our solutions.
  30. One particle system has 3 modules. Particle emitters control spawning particles and deleting dead ones. The render module controls how we render the particles: as billboards, trails, meshes or beams. Each emitter may have several affectors. Each affector controls how the particle data is modified over its lifetime, such as color over lifetime, gravity, motion and so on. We also use a global particle pool to control the particle system budget, which means that at initialization time we know the particle limit.
  31. So our first rule is to split particles by classification. Particle systems naturally split into two types. Entity-related systems depend on the animation result, such as animation trails or some character skills. Non-entity-related systems, such as smoke, explosions or bombs in the scene, are self-contained and do not rely on any actor in the scene. This split lets us submit non-entity-related jobs at the very beginning of the frame, while entity-related jobs need to wait for the animation system to finish. This helps balance the job workload. (Presenter note: first say by what criteria the jobs are split, then say how they are submitted. Express the point from the angle the audience cares about.)
  32. Inside each particle system, we simply split the whole particle simulation into 3 phases. The first is the spawn jobs: we run all emitters in parallel to spawn particles and delete dead ones. Then the update jobs, such as the color affector or size affector, refresh and update the particle properties. In the third phase we prepare the particles for rendering and build GPU-friendly data, such as vertex buffers, material info and draw-call-ready data. Each phase waits for the previous phase to finish. But this causes two problems. The first is conflicts in the global pool. Because we use one big particle pool to control the budget, the spawn jobs need to get particles from and return particles to the pool in parallel. The render prepare phase has the same problem when writing particle results in parallel into one big vertex buffer pool. The other problem is that some particle system jobs may run much longer than others, which causes bad workload balance.
  33. The particle job conflict problem is easy to deal with using a simple lock-free scheme. We allocate particles from the global pool block by block and use an atomic counter to avoid multi-threading problems. One block is 64 particles, a size chosen to suit the cache line. Each block has an atomic counter that holds the number of particles in use in the block, much like a linear allocator. Spawning one particle is just an atomic add on the particle count and an allocation in the block. If a block is full, the particle system allocates a new block from the global pool. Particle death is the reverse: swap the particle with the last one in the block and atomically decrease the counter. When a whole block is empty, or the particle system is removed, the whole block is freed back to the pool. The prepare phase conflict is dealt with in the same way. Each job uses AtomicAdd to get a write position in the shared vertex buffer, so all prepare jobs can write into the same big pool in parallel.
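A minimal C++ sketch of the block allocation and the vertex buffer reservation described in note 33 might look like this. The `Particle` layout and the names `ParticleBlock`, `spawn`, `kill`, `VertexBufferCursor` and `reserve` are hypothetical. The sketch assumes that `kill` is not called on a block while other threads spawn into or kill from the same block, and it omits the global pool and the vertex buffer capacity check.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical particle layout for illustration only.
struct Particle { float x = 0, y = 0, z = 0; float life = 0; };

constexpr int kBlockSize = 64;  // particles per block, as stated in the talk

struct ParticleBlock {
    Particle particles[kBlockSize];
    std::atomic<int> count{0};  // number of live particles in this block

    // Reserves a slot with an atomic add, like a linear allocator.
    // Returns the slot index, or -1 if the block is full (the caller would
    // then take a new block from the global pool).
    int spawn() {
        int index = count.fetch_add(1, std::memory_order_relaxed);
        if (index >= kBlockSize) {
            count.fetch_sub(1, std::memory_order_relaxed);  // undo reservation
            return -1;
        }
        return index;
    }

    // Removes a dead particle by moving the last live particle into its slot
    // and decreasing the counter. Assumes no concurrent spawn or kill on this
    // block while it runs.
    void kill(int index) {
        int last = count.fetch_sub(1, std::memory_order_relaxed) - 1;
        if (index != last) particles[index] = particles[last];
    }

    bool empty() const { return count.load(std::memory_order_relaxed) == 0; }
};

// Shared write cursor for the prepare phase: each job reserves a contiguous
// range of vertices and then writes into it without further synchronization.
struct VertexBufferCursor {
    std::atomic<std::size_t> next{0};

    // Returns the first vertex index of the reserved range.
    std::size_t reserve(std::size_t vertex_count) {
        return next.fetch_add(vertex_count, std::memory_order_relaxed);
    }
};
```

In both cases the only shared state is a single counter, so jobs contend on one atomic operation instead of on a lock around the whole pool.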
  34. Another problem with particle jobs is that some update jobs may be much heavier than others. You can see in the picture that one bad heavy job blocks the whole job system, because the prepare phase needs to wait for all particle updates to finish.
  35. These jobs may be heavy weather particles or animation trails for massive amounts of ammo, because we batch the particles of one kind of ammo into one big particle system. Such a single particle system needs to be split more deeply, by particle emitter: one job for each emitter of the particle system.
  36. The last subsystem case is our render thread. Our legacy single-threaded renderer is built on D3D11 and uses a traditional deferred shading pipeline. As you can see, we have to handle visibility and the whole render pipeline on the main thread.
  37. Our new multi-threaded renderer has two parts. One is the render backend thread you saw in the previous section. This thread does not run on our job system; it flushes the built command lists on the immediate context. The other part is the render job contexts. Each render job context builds a D3D11 command list using a deferred context. We simply split the jobs by scene, as you can see. There are at most 6 jobs: one for shadow, 3 for gbuffer, one for translucent and one for forward. This split strategy is very simple to implement, and it was our choice under time constraints.
  38. At the very beginning of the frame, the render backend thread deals with the Scaleform UI. The UI does not build a command list; instead it submits directly on the immediate context. This is because we want to keep the GPU happy and send work to it as soon as possible. At the same time, two jobs are submitted to the worker threads, one for eye visibility and one for shadow visibility. After a visibility job finishes, it emits more jobs for the gbuffer-related and cascaded shadow parts. When a worker job finishes building its command list, the list goes back to the render backend thread to be submitted. Because some command lists have order dependencies, the deferred shading work needs to wait until both the gbuffer and shadow command lists have been submitted.
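One simple way to respect the submission order in note 38 is to let jobs finish building in any order while the backend thread submits strictly in a fixed pass order. The sketch below assumes such a single total order, which is stricter than the dependencies the talk describes. `OrderedSubmitter` and `on_built` are hypothetical names, the class is meant to be used from the backend thread only, and it contains no D3D11 calls.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Tracks which command lists have been built and releases them in index
// order. Index 0 must be submitted first, then 1, and so on.
class OrderedSubmitter {
public:
    explicit OrderedSubmitter(std::size_t list_count)
        : ready_(list_count, false) {}

    // Called on the backend thread when command list `index` has been built.
    // Returns the indices that can be submitted now, in submission order.
    std::vector<std::size_t> on_built(std::size_t index) {
        ready_[index] = true;
        std::vector<std::size_t> to_submit;
        while (next_ < ready_.size() && ready_[next_]) {
            to_submit.push_back(next_);
            ++next_;
        }
        return to_submit;
    }

private:
    std::vector<bool> ready_;
    std::size_t next_ = 0;  // first list that has not been submitted yet
};
```

For example, with lists ordered as shadow (0), gbuffer (1) and deferred shading (2), a deferred shading list that is built first is held back until the other two have been released.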
  39. This is our early performance result for our stress-test scenario on a high-end Intel PC with more than 8 cores. We can see a lot of holes in the result, and the CPU usage is quite low. One frame costs more than 50 milliseconds.
  40. After the parallel evolution, the result is much better. The total frame time drops to about 19 ms. But you can still see some holes in the picture. The holes show the dependencies between different systems. For example, the physics system waits for all animation results to finish, while actually only the ragdoll results need animation. And the render thread still emits some long jobs. Our next step is to make the system dependencies clearer and to make a job wait only at the first point where the other system uses the result.
  41. Another picture shows the performance on different core counts. You can see that the system now scales well from 2 cores to 6 cores, but performance improves little beyond 6 cores. The main reason is that beyond 6 cores the whole system is bound by the render jobs. Because we only build command lists per scene in the renderer, in theory we have only 6 render jobs. We need to jobify rendering better in the future.
  42. Beyond the job system, another optimization weapon is Intel's Masked Occlusion Culling library. It is a high-performance software occlusion culling library that is easy to integrate, and it helps a lot in reducing draw calls. We replaced our original occlusion culling implementation with it.
  43. In the common case, both in our main city and on the battlefield, we get a 20% performance improvement. In some extreme situations, such as standing behind a wall, performance may double.
  44. Parallelization on multi-core gives us a performance improvement, so we can enrich the visual effects on high-end PCs, with more clothing, destruction, more particles and ragdoll effects.