1. Beyond the GFLOPS
Dominic Mallinson
Vice President, US R & D
Sony Computer Entertainment Inc.
2. "Why not go out on a limb?
That's where the fruit is."
(Will Rogers, cowboy, actor, philanthropist)
© 2007 SCE
3. The Cell Broadband Engine (Cell/B.E.) Processor
4. The Cell/B.E. Processor
Leading the industry in heterogeneous multi-core
200+ GFLOPS high performance computing
But what lies beyond the GFLOPS statistics?
Why does an application need Cell/B.E.'s power?
How can we make Cell/B.E.'s performance more accessible?
What part do you and the Cell/B.E.'s software community play?
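The "200+ GFLOPS" figure can be reproduced with simple arithmetic. This sketch assumes the commonly quoted configuration of 8 SPEs at 3.2 GHz, each issuing one 4-wide single-precision fused multiply-add per cycle (counted as 2 FLOPs per lane); it covers single-precision SPE peak only and ignores the PPE.

```python
def peak_gflops(num_spes: int, clock_ghz: float,
                simd_lanes: int = 4, flops_per_lane: int = 2) -> float:
    """Theoretical peak: units x clock (GHz) x SIMD width x FLOPs per lane.

    Assumes one SIMD fused multiply-add (2 FLOPs per lane) per cycle on
    every SPE; real code rarely sustains this rate.
    """
    return num_spes * clock_ghz * simd_lanes * flops_per_lane


print(peak_gflops(1, 3.2))  # 25.6 (one SPE)
print(peak_gflops(8, 3.2))  # 204.8 (all eight SPEs)
```

Peak figures like this are upper bounds; the rest of the talk is about what it takes to get close to them.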
5. Why does SCE need Cell/B.E. performance?
6. Games and Virtual Worlds
GBytes of data streaming through the CPU in real time
100s of animating 3D characters on screen
True HD 3D graphics with millions of vertices visible
Complex artificial intelligence techniques
Physical simulation: cloth, fluids, soft and rigid bodies
Real-time spatial audio processing and encoding
Millions of simultaneous users
Potential for both client and server to use the Cell/B.E. processor
8. Media Processing
Blu-ray movie playback
1080p video decode in AVC, VC-1 or MPEG-2
Simultaneous 480p "picture in picture" decode
7.1 multi-channel audio decode and mixing
… and a Java™ VM
Remote Play function of PLAYSTATION®3 (PS3™)
Real-time AV encoding and streaming to a PlayStation®Portable
Multi-person AV Chat
1 encode plus up to 5 decodes, acoustic echo cancellation (AEC), noise reduction
9. Folding@home™ on PS3
A distributed computing project from Stanford University
Research into protein misfolding to help understand and find treatments for diseases such as Alzheimer's and cancer
PS3 client launched in March 2007
Over 250,000 unique PS3 users in the first month
488 TFLOPS (Stanford metrics from June 14th, 2007)
26,961 active Cell/B.E. CPUs
More than doubled previous PC/GPU contributions
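As a rough cross-check, dividing the reported 488 TFLOPS by the 26,961 active Cell/B.E. CPUs gives an average contribution per console. This assumes, as the slide's layout suggests but does not state, that the whole 488 TFLOPS figure comes from those CPUs; Stanford's FLOPS accounting is its own, so treat the result only as an order-of-magnitude check against the 200+ GFLOPS peak.

```python
def per_cpu_gflops(total_tflops: float, active_cpus: int) -> float:
    """Average GFLOPS per CPU, using 1 TFLOPS = 1000 GFLOPS."""
    return total_tflops * 1000 / active_cpus


# Assumption: all 488 TFLOPS come from the 26,961 active Cell/B.E. CPUs.
avg = per_cpu_gflops(488, 26_961)
print(f"{avg:.1f} GFLOPS per active Cell/B.E. CPU")  # 18.1 GFLOPS per active Cell/B.E. CPU
```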
DEMO
11. Accessing the power of Cell/B.E.
The Cell/B.E. is designed for performance
Maximum performance requires complex software
The upper quartile of engineers already achieve it
The lower quartile currently cannot
Research and industry must bridge this gap
Many programming models are emerging
How does SCE tackle this problem?
12. SCE's SPURS Environment
A flexible, cooperative SPE management layer
SPE-centric scheduling (minimal PPU overhead)
Low or zero context-switch overhead
Application control over scheduling priorities
Supports sharing SPEs with 3rd-party middleware
Built on top of OS SPE threads
Policy manager allows multiple models
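One way to picture SPE-centric scheduling with application-controlled priorities: whenever an SPE becomes free, a small kernel on that SPE picks the highest-priority workload that still has work, rather than waiting for the PPU to dispatch it. This is a hypothetical model, not the SPURS API; `Workload`, `select_workload`, `run` and the convention that a lower number means higher priority are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical workload registered with an SPE management layer."""
    name: str
    priority: int  # assumption: lower number = higher priority
    pending: int   # units of work still waiting to run


def select_workload(workloads):
    """Called by an idle SPE: pick the most urgent workload that has work."""
    ready = [w for w in workloads if w.pending > 0]
    if not ready:
        return None
    return min(ready, key=lambda w: w.priority)


def run(workloads, num_spes):
    """Each round, every SPE pulls one unit of work itself (no PPU dispatch).

    Returns a trace: one list of workload names per round.
    """
    trace = []
    while True:
        picks = []
        for _ in range(num_spes):
            w = select_workload(workloads)
            if w is None:
                break
            w.pending -= 1
            picks.append(w.name)
        if not picks:
            return trace
        trace.append(picks)


# A game workload shares two SPEs with lower-priority middleware.
print(run([Workload("game", 0, 3), Workload("middleware", 1, 2)], num_spes=2))
```

Middleware only gets SPE time once the higher-priority game work runs dry, which is the kind of policy the application, not the OS, decides here.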
13. Duck Demo SPE Usage
Old code, no machine vision: 6 SPEs
SPE0: Surface water physics
SPE1: Splash physics
SPE2: Boat 1 physics
SPE3: Boat 2 physics
SPE4: Collision physics
SPE5: Graphics
Old code plus machine vision: 8 SPEs
SPE0-SPE5: Unchanged
Added machine vision and particle water
SPE6: Particle water physics
SPE7: Machine vision
14. Goal: Everything on 6 SPEs
Naïve use of SPURS
Just try to move work around
Water + Boat 2 is over time
Graphics + Machine vision fits, but with no room to flex
Refactor with SPURS
Refactor machine vision
Refactor particle water
Use SPURS to share SPEs
Room to "breathe"
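Fitting everything onto 6 SPEs is essentially a packing problem: each workload has a per-frame cost and each SPE has a fixed frame budget. The sketch below uses first-fit decreasing, a standard bin-packing heuristic (not necessarily what the demo team did), and reports each SPE's remaining slack, which corresponds to the "room to breathe" on the slide. The function `pack` and all workload names and costs are hypothetical.

```python
def pack(costs, num_spes, budget=1.0):
    """First-fit decreasing: put each workload on the first SPE with room.

    costs: dict of workload name -> per-frame cost (fraction of a frame).
    Returns (assignment, slack) or None if some workload does not fit.
    """
    loads = [0.0] * num_spes
    assignment = [[] for _ in range(num_spes)]
    # Place the most expensive workloads first.
    for name, cost in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
        for spe in range(num_spes):
            if loads[spe] + cost <= budget:
                loads[spe] += cost
                assignment[spe].append(name)
                break
        else:
            return None  # no SPE has enough frame time left
    slack = [budget - load for load in loads]
    return assignment, slack


# Hypothetical per-frame costs, chosen only to illustrate the idea.
costs = {"A": 0.75, "B": 0.5, "C": 0.5, "D": 0.25, "E": 0.25}
print(pack(costs, 2))  # does not fit
print(pack(costs, 3))  # fits; two SPEs are full, one has slack
```

A packing where several SPEs end with zero slack "fits but has no room to flex"; refactoring workloads into smaller pieces gives the scheduler more freedom to leave slack everywhere.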
15. SCE's SPURS Environment
The "Tasks" policy module
Similar to threads, but with cooperative scheduling
SPEs pull tasks from a shared memory pool
Good for mid- to high-complexity programs
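A minimal model of the "Tasks" idea, using Python generators as a stand-in for SPE tasks: each task runs until it voluntarily yields (cooperative rather than preemptive scheduling), and each free virtual SPE pulls the next runnable task from a shared pool. `run_tasks` and `summer` are hypothetical names; real SPURS tasks are SPE programs, not Python generators.

```python
from collections import deque


def run_tasks(tasks, num_spes):
    """Cooperatively run generator-based tasks on `num_spes` virtual SPEs.

    Each task is a generator that yields to give up its SPE and finally
    returns a result. Returns {task_name: result}.
    """
    pool = deque(tasks.items())  # shared pool of runnable tasks
    results = {}
    while pool:
        # Each free SPE pulls one task and runs it until its next yield.
        for _ in range(min(num_spes, len(pool))):
            name, task = pool.popleft()
            try:
                next(task)
                pool.append((name, task))  # task yielded: back into the pool
            except StopIteration as done:
                results[name] = done.value  # task finished
    return results


def summer(values):
    """Example task: sums values, yielding after each step."""
    total = 0
    for v in values:
        total += v
        yield
    return total


print(run_tasks({"a": summer([1, 2, 3]), "b": summer([10, 20])}, num_spes=2))
```

Because tasks only give up the SPE at points they choose, no preemptive context switch is needed, at the cost of trusting every task to yield regularly.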
The "Jobs" policy module
Stateless execution kernels (specify all input/output)
SPEs pull from a shared queue of jobs
Good for low- to mid-complexity programs
Ideal for stream processing
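The "Jobs" model can be sketched with ordinary threads standing in for SPEs: every job names its kernel, its input and its output slot explicitly, so a job carries no state between runs, and workers simply pull the next job from a shared queue. This is an illustrative model only; `Job`, `run_jobs` and `scale_kernel` are hypothetical names.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Job:
    kernel: Callable[[List[float]], List[float]]  # stateless function
    inputs: List[float]                           # all input data, explicitly
    out_slot: int                                 # where the output goes


def run_jobs(jobs, num_spes):
    """Execute jobs on `num_spes` worker threads pulling from a shared queue."""
    q = queue.Queue()
    for job in jobs:
        q.put(job)
    outputs = [None] * len(jobs)

    def spe_worker():
        while True:
            try:
                job = q.get_nowait()  # pull the next job; no central dispatcher
            except queue.Empty:
                return
            outputs[job.out_slot] = job.kernel(job.inputs)

    workers = [threading.Thread(target=spe_worker) for _ in range(num_spes)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return outputs


def scale_kernel(xs):
    """Example stateless kernel: output depends only on its input."""
    return [2.0 * x for x in xs]


jobs = [Job(scale_kernel, [float(i), i + 0.5], i) for i in range(4)]
print(run_jobs(jobs, num_spes=3))
```

Since a job's inputs and outputs are fully specified, any worker can run any job in any order, which is what makes this model a good fit for stream processing.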
16. Job Streaming
Divide a program and data into pieces (called Jobs)
Define dependencies between groups of jobs
Build Job Lists
SPEs grab Jobs and execute them in parallel
[Figure: a PPE thread, the program and data divided into Jobs, and a Job List]
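One simple way to express dependencies between groups of jobs is a barrier marker in the job list: jobs between two barriers may run in parallel on any SPE, and no job after a barrier starts until every job before it has finished. The sketch assumes that representation (the `BARRIER` sentinel and `run_job_list` are hypothetical) and uses a thread pool in place of SPEs.

```python
from concurrent.futures import ThreadPoolExecutor

BARRIER = object()  # hypothetical marker separating dependent job groups


def run_job_list(job_list, num_spes):
    """Run a list of zero-argument job callables, grouped by BARRIER markers."""
    results = []
    group = []
    with ThreadPoolExecutor(max_workers=num_spes) as spes:
        for entry in job_list + [BARRIER]:  # trailing barrier flushes last group
            if entry is BARRIER:
                # Launch the whole group in parallel, then wait for every job.
                futures = [spes.submit(job) for job in group]
                results.extend(f.result() for f in futures)
                group = []
            else:
                group.append(entry)
    return results


partial = [0] * 4


def make_square_job(i):
    def job():
        partial[i] = i * i
        return partial[i]
    return job


def total_job():
    return sum(partial)  # safe: runs only after the barrier


job_list = [make_square_job(i) for i in range(4)] + [BARRIER, total_job]
print(run_job_list(job_list, num_spes=3))  # [0, 1, 4, 9, 14]
```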
17. Job Streaming Pipeline
[Figure: each job passes through four stages between RAM and the SPU: "prefetch", "input", "execute" and "output". Labels: JD (job descriptor) address; code, parameters; input data; output data; parameters, I/O addresses, I/O sizes, etc.]
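The four stages can be modelled in a few lines. This sketch assumes a job descriptor holding a kernel reference plus input and output addresses and sizes (mirroring the "parameters, I/O addresses, I/O sizes" label), with both main memory and the SPE local store represented as byte arrays and DMA represented as a plain copy. `JobDescriptor`, its fields and `run_job` are hypothetical; a real SPE issues asynchronous DMA commands instead.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class JobDescriptor:
    kernel: Callable[[bytearray], bytearray]  # stands in for the job's code
    in_addr: int    # input address in main memory
    in_size: int
    out_addr: int   # output address in main memory
    out_size: int


def run_job(descriptors, jd_index, ram):
    """Run one job through the prefetch, input, execute and output stages."""
    # "prefetch": fetch the job descriptor (here, an index into a table).
    jd = descriptors[jd_index]
    # "input": copy ("DMA") the input data from main memory to local store.
    local_store = bytearray(ram[jd.in_addr:jd.in_addr + jd.in_size])
    # "execute": the kernel works only on local-store data.
    result = jd.kernel(local_store)
    if len(result) != jd.out_size:
        raise ValueError("kernel output size does not match descriptor")
    # "output": copy ("DMA") the result back to main memory.
    ram[jd.out_addr:jd.out_addr + jd.out_size] = result


def invert(data):
    """Example kernel: bitwise-invert every byte."""
    return bytearray(255 - b for b in data)


ram = bytearray(16)
ram[0:4] = b"\x00\x01\x02\xff"
run_job([JobDescriptor(invert, 0, 4, 8, 4)], 0, ram)
print(ram[8:12])  # bytearray(b'\xff\xfe\xfd\x00')
```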
18. Multi-Buffering
Job stages are interleaved so that DMA memory transfers will be in progress during job execution.
[Figure: timeline of the prefetch, input, exec and output stages for five jobs, with time running left to right; each color represents a different job.]
Potentially, there is no stalling for memory transfers!
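The interleaving follows a simple rule if each of the four stages takes one time slot and stages of different jobs may overlap: job j is in stage s during slot j + s. The sketch below builds that schedule; n jobs finish in n + 3 slots instead of 4n, with all four stages busy in the steady state. Equal stage lengths are an idealization; roughly speaking, a stall reappears whenever a transfer stage takes longer than execution. `pipeline_schedule` is a hypothetical helper.

```python
STAGES = ("prefetch", "input", "exec", "output")


def pipeline_schedule(num_jobs):
    """Return one dict per time slot, mapping stage name -> job index."""
    num_slots = num_jobs + len(STAGES) - 1
    schedule = []
    for t in range(num_slots):
        slot = {}
        for s, stage in enumerate(STAGES):
            job = t - s  # job j enters stage s at slot j + s
            if 0 <= job < num_jobs:
                slot[stage] = job
        schedule.append(slot)
    return schedule


for t, slot in enumerate(pipeline_schedule(5)):
    print(t, slot)
```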
19. SCE's SPURS Environment
SPURS solves part of the problem
Allows effective sharing of the SPE resources
Simplifies the programming and synchronization
But it still doesn't bridge the gap
We need higher-level models which provide…
Automatic DMA for large code and data on SPE
Parallel programming abstractions
Scalable synchronization methods
Full debug and performance analysis
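One common way to provide automatic DMA for data larger than the local store is a software-managed cache: the program reads through a lookup function, and only on a miss is a whole line transferred from main memory. The direct-mapped cache below is a hypothetical sketch (`SoftwareCache`, the 128-byte line size and the 4-line capacity are illustrative assumptions) that counts the simulated DMA transfers.

```python
class SoftwareCache:
    """Direct-mapped software cache over a bytearray standing in for RAM."""

    def __init__(self, ram, line_size=128, num_lines=4):
        self.ram = ram
        self.line_size = line_size
        self.num_lines = num_lines
        self.tags = [None] * num_lines  # which memory line each slot holds
        self.lines = [bytearray(line_size) for _ in range(num_lines)]
        self.dma_transfers = 0

    def read(self, addr):
        """Return the byte at `addr`, fetching its line on a miss."""
        if not 0 <= addr < len(self.ram):
            raise IndexError("address out of range")
        line_no, offset = divmod(addr, self.line_size)
        slot = line_no % self.num_lines
        if self.tags[slot] != line_no:  # miss: "DMA" the whole line in
            start = line_no * self.line_size
            chunk = self.ram[start:start + self.line_size]
            self.lines[slot][:len(chunk)] = chunk
            self.tags[slot] = line_no
            self.dma_transfers += 1
        return self.lines[slot][offset]


ram = bytearray(range(256)) * 4  # 1 KiB of "main memory"
cache = SoftwareCache(ram)
total = sum(cache.read(a) for a in range(len(ram)))
print(total, cache.dma_transfers)  # 130560 8
```

Sequential access costs one transfer per 128-byte line rather than one per byte; a production scheme would also prefetch and write back dirty lines.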
20. The Cell/B.E. Software Community
21. The Importance of the CoC
The Center of Competence is a focal point
To bring together researchers and industry
To help develop optimized "standard" libraries for Cell/B.E.
Research new programming languages/models
Research new compiler techniques
General multi-core / parallel programming research
Dealing with distributed memory hierarchies
Research scalability of synchronization methods
Develop tools that can help visualize parallel software
22. Industry Support
Terra Soft Solutions: Yellow Dog Linux for PS3
Mercury Systems
RapidMind
Cmpware, Inc.
Reservoir Labs
Gedae
Allinea
24. Concluding Thoughts
The Cell/B.E. has amazing performance
It's available now in consumer and HPC markets
We need more software targeting Cell/B.E.
We need Cell/B.E.'s power to be more accessible
We need more research into Cell/B.E. and multi-core
25. We need YOU to help us go…
Beyond the GFLOPS