SlideShare ist ein Scribd-Unternehmen logo
1 von 47
Heterogeneous System
Architecture Overview
M.Naveen Kumar
13241D5710
M.Tech_vlsi
Opencl platform model
Execution model
Memory model
HSA Foundation
• Founded in June 2012
• Developing a new platform for
heterogeneous systems
• www.hsafoundation.com
• Specifications under development in
working groups
• Our first specification, HSA
Programmers Reference Manual is
already published and available on
our web site
• Additional specifications for System
Architecture, Runtime Software and
Tools are in process
33
HSA Foundation Membership -August 2013
34
Founders
Promoters
Supporters
Contributors
Academic
Associates
HSA — An Open Platform
• Open Architecture, membership open to all
– HSA Programmers Reference Manual
– HSA System Architecture
– HSA Runtime
• Delivered via royalty free standards
– Royalty Free IP, Specifications and APIs
• ISA agnostic for both CPU and GPU
• Membership from all areas of computing
– Hardware companies
– Operating Systems
– Tools and Middleware
© Copyright 2012 HSA Foundation. All
Rights Reserved.
35
Inflections in Processor Design
© Copyright 2012 HSA Foundation. All
Rights Reserved.
36
?
Single-thread
Performance
Time
we are
here
Enabled by:
 Moore’s Law
 Voltage
Scaling
Constrained by:
Power
Complexity
Single-Core Era
ModernApplication
Performance
Time (Data-parallel exploitation)
we are
here
Heterogeneous
Systems Era
Enabled by:
 Abundant data
parallelism
 Power efficient
GPUs
Temporarily
Constrained by:
Programming
models
Comm.overhead
Throughput
Performance
Time (# of processors)
we are
here
Enabled by:
 Moore’s Law
 SMP architecture
Constrained by:
Power
Parallel SW
Scalability
Multi-Core Era
Assembly  C/C++  Java … pthreads  OpenMP / TBB …
Shader  CUDA OpenCL
 C++ and Java
CPU
CPU
APU
APU
With
HSA
Memory
CPUCPU CPU
CPU Memory
CPUCPUCPUCPU
GPU Memory
GPU
Memory
CPUCPUCPUCPU GPU
EXAMPLE WorkloadS
HAAR FACE DETECTION
Cornerstone technology
for ComputerVision
Looking for Faces in all
the Right Places
Quick HD Calculations
Search square = 21 x 21
Pixels = 1920 x 1080 = 2,073,600
Search squares = 1900 x 1060 = ~2 Million
© Copyright 2012 HSA Foundation. All
Rights Reserved.
40
Looking for different size faces — by
scaling the video frame
© Copyright 2012 HSA Foundation. All
Rights Reserved.
41
More HD Calculations
70% scaling in H and V
Total Pixels = 4.07 Million
Search squares = 3.8 Million
HAAR Cascade stages
© Copyright 2012 HSA Foundation. All
Rights Reserved.
42
Feature l
Feature m
Feature p
Feature r
Feature q
Feature k
Stage N
Stage N+1
Face still
possible?Yes
No
REJECT
FRAME
22 cascade stages, early out between
each
© Copyright 2012 HSA Foundation. All
Rights Reserved.
43
STAGE 22STAGE 21STAGE 2STAGE 1
NO FACE
FACE
CONFIRMED
Final HD Calculations
Search squares = 3.8 million
Average features per square = 124
Calculations per feature = 100
Calculations per frame = 47 GCalcs
Calculation Rate
30 frames/sec = 1.4TCalcs/second
60 frames/sec = 2.8TCalcs/second
… and this only gets front-facing faces
Unbalancing due to Early exits
• When running on the GPU, we run each search rectangle on a separate
work item
• Early out algorithms, like HAAR, exhibit divergence between work items
– Some work items exit early
– Their neighbors continue
– SIMD packing suffers as a result
© Copyright 2012 HSA Foundation. All
Rights Reserved.
44
Live
Dead
© Copyright 2012 HSA Foundation. All
Rights Reserved.
46
The HSA Future
• Architected heterogeneous processing on the SOC
• Programming of accelerators becomes much easier
• Accelerated software that runs across multiple hardware vendors
• Scalability from smart phones to super computers on a common architecture
• GPU acceleration of parallel processing is the initial target, with DSPs
and other accelerators coming to the HSA system architecture model
• Heterogeneous software ecosystem evolves at a much faster pace
• Lower power, more capable devices in your hand, on the wall, in the cloud
THANK YOU

Weitere ähnliche Inhalte

Ähnlich wie Mnk hsa ppt

Bitfusion Nimbix Dev Summit Heterogeneous Architectures
Bitfusion Nimbix Dev Summit Heterogeneous Architectures Bitfusion Nimbix Dev Summit Heterogeneous Architectures
Bitfusion Nimbix Dev Summit Heterogeneous Architectures Subbu Rama
 
Labview1_ Computer Applications in Control_ACRRL
Labview1_ Computer Applications in Control_ACRRLLabview1_ Computer Applications in Control_ACRRL
Labview1_ Computer Applications in Control_ACRRLMohammad Sabouri
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareJustin Basilico
 
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...MLconf
 
Debugging Numerical Simulations on Accelerated Architectures - TotalView fo...
 Debugging Numerical Simulations on Accelerated Architectures  - TotalView fo... Debugging Numerical Simulations on Accelerated Architectures  - TotalView fo...
Debugging Numerical Simulations on Accelerated Architectures - TotalView fo...Rogue Wave Software
 
Matlab Based High Level Synthesis Engine for Area And Power Efficient Arithme...
Matlab Based High Level Synthesis Engine for Area And Power Efficient Arithme...Matlab Based High Level Synthesis Engine for Area And Power Efficient Arithme...
Matlab Based High Level Synthesis Engine for Area And Power Efficient Arithme...ijceronline
 
Parallel Computing - Lec 6
Parallel Computing - Lec 6Parallel Computing - Lec 6
Parallel Computing - Lec 6Shah Zaib
 
Mulvery Detail - English
Mulvery Detail - EnglishMulvery Detail - English
Mulvery Detail - EnglishDaichi Teruya
 
Community works for muli core embedded image processing
Community works for muli core embedded image processingCommunity works for muli core embedded image processing
Community works for muli core embedded image processingJeongpyo Kong
 
Guider: An Integrated Runtime Performance Analyzer on AGL
Guider: An Integrated Runtime Performance Analyzer on AGLGuider: An Integrated Runtime Performance Analyzer on AGL
Guider: An Integrated Runtime Performance Analyzer on AGLPeace Lee
 
System Architecture Exploration Training Class
System Architecture Exploration Training ClassSystem Architecture Exploration Training Class
System Architecture Exploration Training ClassDeepak Shankar
 
Energy efficient AI workload partitioning on multi-core systems
Energy efficient AI workload partitioning on multi-core systemsEnergy efficient AI workload partitioning on multi-core systems
Energy efficient AI workload partitioning on multi-core systemsDeepak Shankar
 
Lessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixLessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixJustin Basilico
 
The Principle Of Ultrasound Imaging System
The Principle Of Ultrasound Imaging SystemThe Principle Of Ultrasound Imaging System
The Principle Of Ultrasound Imaging SystemMelissa Luster
 
RCW@DEI - Real Needs And Limits
RCW@DEI - Real Needs And LimitsRCW@DEI - Real Needs And Limits
RCW@DEI - Real Needs And LimitsMarco Santambrogio
 

Ähnlich wie Mnk hsa ppt (20)

Bitfusion Nimbix Dev Summit Heterogeneous Architectures
Bitfusion Nimbix Dev Summit Heterogeneous Architectures Bitfusion Nimbix Dev Summit Heterogeneous Architectures
Bitfusion Nimbix Dev Summit Heterogeneous Architectures
 
Labview1_ Computer Applications in Control_ACRRL
Labview1_ Computer Applications in Control_ACRRLLabview1_ Computer Applications in Control_ACRRL
Labview1_ Computer Applications in Control_ACRRL
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning Software
 
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
 
Digi sudoku
Digi sudokuDigi sudoku
Digi sudoku
 
Debugging Numerical Simulations on Accelerated Architectures - TotalView fo...
 Debugging Numerical Simulations on Accelerated Architectures  - TotalView fo... Debugging Numerical Simulations on Accelerated Architectures  - TotalView fo...
Debugging Numerical Simulations on Accelerated Architectures - TotalView fo...
 
Matlab Based High Level Synthesis Engine for Area And Power Efficient Arithme...
Matlab Based High Level Synthesis Engine for Area And Power Efficient Arithme...Matlab Based High Level Synthesis Engine for Area And Power Efficient Arithme...
Matlab Based High Level Synthesis Engine for Area And Power Efficient Arithme...
 
Typesafe spark- Zalando meetup
Typesafe spark- Zalando meetupTypesafe spark- Zalando meetup
Typesafe spark- Zalando meetup
 
Parallel Computing - Lec 6
Parallel Computing - Lec 6Parallel Computing - Lec 6
Parallel Computing - Lec 6
 
Mulvery Detail - English
Mulvery Detail - EnglishMulvery Detail - English
Mulvery Detail - English
 
Report_Altair
Report_AltairReport_Altair
Report_Altair
 
Community works for muli core embedded image processing
Community works for muli core embedded image processingCommunity works for muli core embedded image processing
Community works for muli core embedded image processing
 
Guider: An Integrated Runtime Performance Analyzer on AGL
Guider: An Integrated Runtime Performance Analyzer on AGLGuider: An Integrated Runtime Performance Analyzer on AGL
Guider: An Integrated Runtime Performance Analyzer on AGL
 
System Architecture Exploration Training Class
System Architecture Exploration Training ClassSystem Architecture Exploration Training Class
System Architecture Exploration Training Class
 
JavaFX 101
JavaFX 101JavaFX 101
JavaFX 101
 
Energy efficient AI workload partitioning on multi-core systems
Energy efficient AI workload partitioning on multi-core systemsEnergy efficient AI workload partitioning on multi-core systems
Energy efficient AI workload partitioning on multi-core systems
 
Latest resume
Latest resumeLatest resume
Latest resume
 
Lessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixLessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at Netflix
 
The Principle Of Ultrasound Imaging System
The Principle Of Ultrasound Imaging SystemThe Principle Of Ultrasound Imaging System
The Principle Of Ultrasound Imaging System
 
RCW@DEI - Real Needs And Limits
RCW@DEI - Real Needs And LimitsRCW@DEI - Real Needs And Limits
RCW@DEI - Real Needs And Limits
 

Kürzlich hochgeladen

Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 

Kürzlich hochgeladen (20)

Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 

Mnk hsa ppt

  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.
  • 23.
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.
  • 33. HSA Foundation • Founded in June 2012 • Developing a new platform for heterogeneous systems • www.hsafoundation.com • Specifications under development in working groups • Our first specification, HSA Programmers Reference Manual is already published and available on our web site • Additional specifications for System Architecture, Runtime Software and Tools are in process 33
  • 34. HSA Foundation Membership -August 2013 34 Founders Promoters Supporters Contributors Academic Associates
  • 35. HSA — An Open Platform • Open Architecture, membership open to all – HSA Programmers Reference Manual – HSA System Architecture – HSA Runtime • Delivered via royalty free standards – Royalty Free IP, Specifications and APIs • ISA agnostic for both CPU and GPU • Membership from all areas of computing – Hardware companies – Operating Systems – Tools and Middleware © Copyright 2012 HSA Foundation. All Rights Reserved. 35
  • 36. Inflections in Processor Design © Copyright 2012 HSA Foundation. All Rights Reserved. 36 ? Single-thread Performance Time we are here Enabled by:  Moore’s Law  Voltage Scaling Constrained by: Power Complexity Single-Core Era ModernApplication Performance Time (Data-parallel exploitation) we are here Heterogeneous Systems Era Enabled by:  Abundant data parallelism  Power efficient GPUs Temporarily Constrained by: Programming models Comm.overhead Throughput Performance Time (# of processors) we are here Enabled by:  Moore’s Law  SMP architecture Constrained by: Power Parallel SW Scalability Multi-Core Era Assembly  C/C++  Java … pthreads  OpenMP / TBB … Shader  CUDA OpenCL  C++ and Java
  • 39. HAAR FACE DETECTION Cornerstone technology for ComputerVision
  • 40. Looking for Faces in all the Right Places Quick HD Calculations Search square = 21 x 21 Pixels = 1920 x 1080 = 2,073,600 Search squares = 1900 x 1060 = ~2 Million © Copyright 2012 HSA Foundation. All Rights Reserved. 40
  • 41. Looking for different size faces — by scaling the video frame © Copyright 2012 HSA Foundation. All Rights Reserved. 41 More HD Calculations 70% scaling in H and V Total Pixels = 4.07 Million Search squares = 3.8 Million
  • 42. HAAR Cascade stages © Copyright 2012 HSA Foundation. All Rights Reserved. 42 Feature l Feature m Feature p Feature r Feature q Feature k Stage N Stage N+1 Face still possible?Yes No REJECT FRAME
  • 43. 22 cascade stages, early out between each © Copyright 2012 HSA Foundation. All Rights Reserved. 43 STAGE 22STAGE 21STAGE 2STAGE 1 NO FACE FACE CONFIRMED Final HD Calculations Search squares = 3.8 million Average features per square = 124 Calculations per feature = 100 Calculations per frame = 47 GCalcs Calculation Rate 30 frames/sec = 1.4TCalcs/second 60 frames/sec = 2.8TCalcs/second … and this only gets front-facing faces
  • 44. Unbalancing due to Early exits • When running on the GPU, we run each search rectangle on a separate work item • Early out algorithms, like HAAR, exhibit divergence between work items – Some work items exit early – Their neighbors continue – SIMD packing suffers as a result © Copyright 2012 HSA Foundation. All Rights Reserved. 44 Live Dead
  • 45.
  • 46. © Copyright 2012 HSA Foundation. All Rights Reserved. 46 The HSA Future • Architected heterogeneous processing on the SOC • Programming of accelerators becomes much easier • Accelerated software that runs across multiple hardware vendors • Scalability from smart phones to super computers on a common architecture • GPU acceleration of parallel processing is the initial target, with DSPs and other accelerators coming to the HSA system architecture model • Heterogeneous software ecosystem evolves at a much faster pace • Lower power, more capable devices in your hand, on the wall, in the cloud

Hinweis der Redaktion

  1. We will be open on this. We will reach out to partners and collaborate to bring this to market in the right form
  2. Now lets analyze how it works, in the HAAR algorithm.
  3. Now, just what do we do in each of those search boxes?
  4. Each successive cascade stage searches for more features. - Initial cascade stage is cheap, each successive cascade stage is more expensive Exit from a cascade stage means a face is not present in the rectangle Surviving all the cascades confirms a face is present This is a nested data parallel opportunity: Significant chunks of parallel code in control flow
  5. The GPU runs at its peak efficiency when its SIMD engines are fully loaded with live work items. Lets see how this looks relative to the CPU at each stage.