Sensor Integration and Improved
User Experiences at Lower Power
Phil Rogers
President, HSA Foundation
Corporate Fellow, AMD
SENSOR INTEGRATION CHALLENGES

      Sensors on all new devices, going up in
       resolution and data rates

      Extract meaning from video, audio and
       location

      Mix of scalar and parallel workloads

      Process locally or in the cloud

      Power must keep going down


© Copyright 2012 HSA Foundation. All Rights Reserved.   2
ABUNDANT PARALLEL WORKLOADS

      Natural UI & Gestures
              Touch, gesture, and voice
      Biometric Recognition
              Secure, fast, accurate: face, voice, fingerprints
      Augmented Reality
              Superimpose graphics, audio, and other digital information as a virtual overlay
      Content Everywhere
              Content from any source to any display, seamlessly
      Beyond HD Experiences
              Streaming media, new codecs, 3D, transcode, audio
      AV Content Management
              Searching, indexing and tagging of video & audio; multimedia data mining
A NEW ERA OF PROCESSOR
PERFORMANCE
      Single-Core Era
              Enabled by: Moore’s Law, Voltage Scaling
              Constrained by: Power, Complexity
              Assembly, C/C++, Java …
              [Chart: Single-thread Performance vs. Time, with a "we are here" marker]
      Multi-Core Era
              Enabled by: Moore’s Law, SMP architecture
              Constrained by: Power, Parallel SW, Scalability
              pthreads, OpenMP, TBB
              [Chart: Throughput Performance vs. Time (# of processors), with a "we are here" marker]
      Heterogeneous Systems Era
              Enabled by: Abundant data parallelism, Power efficient GPUs
              Temporarily Constrained by: Programming models, Comm. overhead
              Shader, CUDA, OpenCL, C++ and Java
              [Chart: Modern Application Performance vs. Time (Data-parallel exploitation), with a "we are here" marker]
EVOLUTION OF HETEROGENEOUS
COMPUTING
      [Chart axis: Architecture Maturity & Programmer Accessibility, from Poor to Excellent]
      Proprietary Drivers Era (2002 - 2008)
              Graphics & Proprietary Driver-based APIs
              “Adventurous” programmers
              Exploit early programmable “shader cores” in the GPU
              Make your program look like “graphics” to the GPU
              CUDA™, Brook+, etc
      Standards Drivers Era (2009 - 2011)
              OpenCL™, DirectCompute
              Driver-based APIs
              Expert programmers
              C and C++ subsets
              Compute centric APIs, data types
              Multiple address spaces with explicit data movement
              Specialized work queue based structures
              Kernel mode dispatch
      Architected Era (2012 - 2020)
              Heterogeneous System Architecture
              GPU Peer Processor
              Mainstream programmers
              Full C++
              GPU as a co-processor
              Unified coherent address space
              Task parallel runtimes
              Nested Data Parallel programs
              User mode dispatch
              Pre-emption and context switching
HSA FEATURE ROADMAP
      Physical Integration
              Integrate CPU & GPU in silicon
              Unified Memory Controller
              Common Manufacturing Technology
      Optimized Platforms
              GPU Compute C++ support
              User mode scheduling
              Bi-Directional Power Mgmt between CPU and GPU
      Architectural Integration
              Unified Address Space for CPU and GPU
              GPU uses pageable system memory via CPU pointers
              Fully coherent memory between CPU & GPU
      System Integration
              GPU compute context switch
              GPU graphics pre-emption
              Quality of Service
HETEROGENEOUS SYSTEM
ARCHITECTURE – AN OPEN PLATFORM
      Open Architecture, published specifications
              HSAIL virtual ISA
              HSA memory model
              HSA system architecture
      ISA agnostic for both CPU and GPU
      HSA Foundation formed in June 2012
      Inviting partners to join us, in all areas
              Hardware companies
              Operating Systems
              Tools and Middleware
              Applications

HSA INTERMEDIATE LAYER - HSAIL
        HSAIL is a virtual ISA for parallel programs
               Finalized to ISA by a JIT compiler or “Finalizer”
               ISA independent by design

        Explicitly parallel
               Designed for data parallel programming

        Support for exceptions, virtual functions,
         and other high level language features
        Syscall methods
               GPU code can call directly to system services,
                IO, printf, etc

        Debugging support
HSA MEMORY MODEL
      Designed to be compatible with C++11, Java and
       .NET Memory Models


      Relaxed consistency memory model for parallel
       compute performance


      Loads and stores can be re-ordered by the
       finalizer


      Visibility controlled by:
              Load.Acquire
              Store.Release
              Barriers

THE HSA FOUNDATION
      Founders
      Promoters
      Supporters
      Contributors
      Associates
HSA SOFTWARE STACKS
APPLICATIONS, SYSTEM SOFTWARE AND
PROGRAMMING MODELS
      Driver Stack
              Apps
              Domain Libraries
              OpenCL™ 1.x, DX Runtimes, User Mode Drivers
              Graphics Kernel Mode Driver
      HSA Software Stack
              Apps
              HSA Domain Libraries
              HSA JIT, Task Queuing Libraries, HSA Runtime
              HSA Kernel Mode Driver
      Both stacks run on: Hardware - APUs, CPUs, GPUs
      Legend: User mode component; Kernel mode component; Components contributed by third parties
HSA SOLUTION STACK
      Application SW
              Application
              Domain Specific Libs (Bolt, OpenCV™, … many others)
              OpenCL™ Runtime, DirectX Runtime, Other Runtime
      HSA Software / Drivers
              HSA Runtime
              HSAIL
              HSA Finalizer
              Ctl, Knl Driver
              Legacy Drivers
              GPU ISA
      Differentiated HW
              CPU(s), GPU(s), Other Accelerators
OPENCL™ AND HSA
     HSA is an optimized platform architecture for
      OpenCL™
            Not an alternative to OpenCL™

     OpenCL™ on HSA will benefit from
            Avoidance of wasteful copies
            Low latency dispatch
            Improved memory model
            Pointers shared between CPU and GPU

     HSA also exposes a lower level programming interface for
      those who want the ultimate in control and performance
            Can be exposed through OpenCL extensions


MAKE GPUS EASIER TO PROGRAM:
PRIMARY PROGRAMMING MODELS
•     Microsoft C++AMP
        •    Address large population of Windows developers
        •    Integrated in Visual Studio and WinRT
        •    Microsoft Community Promise License for open platform use


•     Java acceleration
        •    Aparapi on OpenCL today
        •    Project Sumatra to add HSA support in an OpenJDK for Java 8
        •    Driving to have Sumatra absorbed into Java 9



AMD’S OPEN SOURCE COMMITMENT TO HSA
     We will open source our Linux execution and compilation stack
            Jump start the ecosystem
            Allow a single shared implementation where appropriate
            Enable university research in all areas

                  Component Name          AMD Specific   Rationale
                  HSA Bolt Library        No             Enable understanding and debug
                  HSAIL Code Generator    No             Enable research
                  LLVM Contributions      No             Industry and academic collaboration
                  HSA Assembler           No             Enable understanding and debug
                  HSA Runtime             No             Standardize on a single runtime
                  HSA Finalizer           Yes            Enable research and debug
                  HSA Kernel Driver       Yes            For inclusion in Linux distros
ACCELERATED WORKLOADS
CLIENT AND SERVER EXAMPLES
HAAR Face Detection
CORNERSTONE TECHNOLOGY
FOR COMPUTER VISION
LOOKING FOR FACES ONE PLACE AT A TIME



Quick HD Calculations
Search square = 21 x 21
Pixels = 1920 x 1080 = 2,073,600
Search squares = 1900 x 1060 = ~2 Million
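
The arithmetic above can be reproduced with a short sketch. It assumes the 21 x 21 search square is stepped one pixel at a time in both directions, which is what the 1900 x 1060 figure implies; the function name is illustrative and does not come from the slides.

```python
def count_search_squares(width, height, window=21):
    """Count the positions a window x window square can occupy in a frame."""
    cols = max(width - window + 1, 0)   # 1920 - 21 + 1 = 1900
    rows = max(height - window + 1, 0)  # 1080 - 21 + 1 = 1060
    return cols * rows

hd_pixels = 1920 * 1080
hd_squares = count_search_squares(1920, 1080)
print(hd_pixels)   # 2073600
print(hd_squares)  # 2014000
```

Nearly every pixel anchors its own search square, which is why the square count is so close to the pixel count.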
LOOKING FOR DIFFERENT SIZE FACES –
BY SCALING THE VIDEO FRAME




       More HD Calculations
       70% scaling in H and V
       Total Pixels = 4.07 Million
       Search squares = 3.8 Million
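
A rough sketch of where these totals come from, assuming the frame is repeatedly scaled to 70% in each dimension until the 21 x 21 square no longer fits. The slides do not state the number of levels or the rounding rule, so both are assumptions here, and the results land near the slide's figures rather than exactly on them.

```python
def pyramid_levels(width, height, scale=0.7, window=21):
    """Yield integer (w, h) sizes for each scaled frame that still fits the window."""
    w, h = float(width), float(height)
    while w >= window and h >= window:
        yield int(w), int(h)
        w *= scale
        h *= scale

levels = list(pyramid_levels(1920, 1080))
total_pixels = sum(w * h for w, h in levels)
total_squares = sum((w - 20) * (h - 20) for w, h in levels)

# Closed-form limit of the pixel total: each level has 0.7 * 0.7 = 49% of the
# pixels of the previous one, so the geometric series sums to N / (1 - 0.49).
pixel_limit = 1920 * 1080 / (1 - 0.7 ** 2)

print(f"{len(levels)} levels")
print(f"{total_pixels / 1e6:.2f} M pixels (series limit {pixel_limit / 1e6:.2f} M)")
print(f"{total_squares / 1e6:.2f} M search squares")
```

The geometric-series limit explains why scaling roughly doubles the work over the single full-size frame instead of multiplying it by the number of levels.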




HAAR CASCADE STAGES
      Stage N: Feature k, Feature l, Feature m
      Face still possible?
              Yes: continue to Stage N+1 (Feature p, Feature r, Feature q)
              No: REJECT FRAME
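
The early-out control flow of a cascade can be sketched as follows. This is a minimal illustration and not the detector used for the measurements in this deck: the feature functions, thresholds and the list-of-intensities "window" are hypothetical stand-ins for real Haar features.

```python
def run_cascade(window, stages):
    """Return (accepted, stages_evaluated) for one search window.

    `stages` is a list of (features, threshold) pairs. Each feature is a
    callable that scores the window; the stage passes when the summed
    score reaches the threshold.
    """
    for depth, (features, threshold) in enumerate(stages, start=1):
        score = sum(feature(window) for feature in features)
        if score < threshold:
            return False, depth  # early out: a face is no longer possible
    return True, len(stages)

def mean_intensity(window):
    return sum(window) / len(window)

def contrast(window):
    return max(window) - min(window)

toy_stages = [
    ([mean_intensity], 50),             # stage 1: one cheap feature
    ([mean_intensity, contrast], 100),  # stage 2: more features, stricter test
]

print(run_cascade([10, 12, 11, 9], toy_stages))   # (False, 1)
print(run_cascade([90, 30, 80, 40], toy_stages))  # (True, 2)
```

Most windows exit after the first stage or two, so the amount of work per window varies widely. That variation is what makes the later stages a poor fit for wide data-parallel hardware.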
22 CASCADE STAGES,
EARLY OUT BETWEEN EACH
      STAGE 1 -> STAGE 2 -> ... -> STAGE 21 -> STAGE 22 -> FACE CONFIRMED
      Any stage can exit early to NO FACE

      Final HD Calculations
              Search squares = 3.8 million
              Average features per square = 124
              Calculations per feature = 100
              Calculations per frame = 47 GCalcs
      Calculation Rate
              30 frames/sec = 1.4 TCalcs/second
              60 frames/sec = 2.8 TCalcs/second
              …and this only gets front-facing faces
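
The calculation-rate figures follow directly from multiplying the per-frame quantities on the slide. The sketch below only restates that arithmetic; the variable names are illustrative.

```python
squares_per_frame = 3.8e6    # search squares across all scaled frames
features_per_square = 124    # average features evaluated per square
calcs_per_feature = 100      # calculations per feature

calcs_per_frame = squares_per_frame * features_per_square * calcs_per_feature
print(f"{calcs_per_frame / 1e9:.0f} GCalcs per frame")

for fps in (30, 60):
    rate = calcs_per_frame * fps
    print(f"{fps} frames/sec = {rate / 1e12:.1f} TCalcs/second")
```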
CASCADE DEPTH ANALYSIS
      [Figure: Cascade Depth chart; depth axis 0 to 25; legend bins 0-5, 5-10, 10-15, 15-20, 20-25]
PROCESSING TIME/STAGE
      “Trinity” A10-4600M (6 CU @ 497 MHz, 4 cores @ 2700 MHz)
      [Figure: processing time in ms (0 to 100) per cascade stage (1 to 8, then 9-22), GPU vs. CPU]
      AMD A10 4600M APU with Radeon™ HD Graphics; CPU: 4 cores @ 2.3 GHz (turbo 3.2 GHz); GPU: AMD Radeon HD 7660G,
      6 compute units, 685 MHz; 4GB RAM; Windows 7 (64-bit); OpenCL™ 1.1 (873.1)
PERFORMANCE CPU-VS-GPU
      “Trinity” A10-4600M (6 CU @ 497 MHz, 4 cores @ 2700 MHz)
      [Figure: Images/Sec (0 to 12) vs. Number of Cascade Stages on GPU (0 to 8, then 22), for CPU, HSA and GPU]
      AMD A10 4600M APU with Radeon™ HD Graphics; CPU: 4 cores @ 2.3 GHz (turbo 3.2 GHz); GPU: AMD Radeon HD 7660G,
      6 compute units, 685 MHz; 4GB RAM; Windows 7 (64-bit); OpenCL™ 1.1 (873.1)
HAAR SOLUTION – RUN DIFFERENT
CASCADES ON GPU AND CPU
      By seamlessly sharing data between CPU and GPU, HSA allows the right processor to handle its appropriate workload
              +2.5x   INCREASED PERFORMANCE
              -2.5x   DECREASED ENERGY PER FRAME
ACCELERATING B+TREE
SEARCHES
CLOUD SERVER WORKLOAD
B+TREE SEARCHES
     B+Trees are used by enterprise DB applications
            SQL: SQLite, MySQL, Oracle, and others
            No-SQL: CouchDB, Tokyo Cabinet, and others
                   Audio search, video copy detection
            DB Size: Can be many times larger than GPU memory!
     B+Trees are a fundamental data structure
            Used to reduce memory & disk access to locate a key
            Can support index- and range-based queries
            Can be updated efficiently
     [Figure: A simple B+Tree linking the keys 1-7. The linked list (red) allows rapid in-order traversal.]
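
A minimal sketch of the structure described above, using the keys 1-7 from the figure caption. It assumes a two-level tree with hand-built nodes and read-only queries; the class and function names are illustrative, and insertion, node splitting and any GPU mapping are left out.

```python
from bisect import bisect_left, bisect_right

class Leaf:
    def __init__(self, keys):
        self.keys = keys   # sorted keys stored in this leaf
        self.next = None   # link to the next leaf for in-order traversal

class Inner:
    def __init__(self, separators, children):
        self.separators = separators  # children[i + 1] holds keys >= separators[i]
        self.children = children

def find_leaf(node, key):
    """Descend from the root to the leaf that would hold `key`."""
    while isinstance(node, Inner):
        node = node.children[bisect_right(node.separators, key)]
    return node

def contains(root, key):
    leaf = find_leaf(root, key)
    i = bisect_left(leaf.keys, key)
    return i < len(leaf.keys) and leaf.keys[i] == key

def range_query(root, lo, hi):
    """Collect keys in [lo, hi] by walking the leaf links."""
    found = []
    leaf = find_leaf(root, lo)
    while leaf is not None:
        for key in leaf.keys:
            if key > hi:
                return found
            if key >= lo:
                found.append(key)
        leaf = leaf.next
    return found

leaves = [Leaf([1, 2]), Leaf([3, 4]), Leaf([5, 6, 7])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Inner([3, 5], leaves)

print(contains(root, 4), contains(root, 8))  # True False
print(range_query(root, 2, 6))               # [2, 3, 4, 5, 6]
```

Each lookup is independent of the others, so a batch of queries can be searched in parallel, while updates that restructure nodes stay sequential.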
PARALLEL B+TREE SEARCHES ON HSA
      With HSA, DB can be larger than GPU memory, and can be shared with CPU.
      HSA lets us move compute to data
              Parallel search can move to GPU
              Sequential updates can remain on CPU

      Platform                     Size < 3 GB    Size > 3 GB
      dGPU (memory size = 3GB)     ✓              ✗
      HSA                          ✓              ✓

      HSA increases performance versus multi-threaded CPU, even for tree structures that reside in pinned host memory.
      [Figure: Millions of Queries Per Second (MQPS, 0 to 80) by “Order” of B+Tree Node (4, 8, 16, 32, 64, 128), CPU vs. APU]

      M. Daga, and M. Nutter, “Exploiting Coarse-Grained Parallelism in B+Tree Searches on an APU”, Accepted at ”Second Workshop on Irregular Applications: Algorithms and Architectures, (IA3)” November 2012.
      AMD A10 4600M APU with Radeon™ HD Graphics; CPU: 4 cores @ 2.3 GHz (turbo 3.2 GHz); GPU: AMD Radeon HD 7660G, 6 compute units, 685 MHz; 4GB RAM
ACCELERATING SUFFIX ARRAY
CONSTRUCTION
CLOUD SERVER WORKLOAD
SUFFIX ARRAYS
     Suffix Arrays are used to accelerate in-memory cloud workloads
            Full text index search
            Lossless data compression
            Bio-informatics

     Suffix Arrays are a fundamental data structure
            Designed for efficient searching of a large text
                   Quickly locate every occurrence of a substring S in a text T

     Suffix Array Construction contains parallel & sequential steps
            Sorting, ideal for GPU
            Nested recursion, ideal for CPU
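
To make the data structure concrete, here is a small sketch that builds a suffix array and uses it to locate every occurrence of a substring S in a text T. It uses a naive sort of all suffixes and is not the construction algorithm discussed in this deck; function names are illustrative.

```python
def build_suffix_array(text):
    """Naive construction: sort suffix start positions by the suffix they begin."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def find_occurrences(text, suffix_array, pattern):
    """Binary-search the suffix array for suffixes that start with `pattern`."""
    m = len(pattern)

    def prefix(rank):
        start = suffix_array[rank]
        return text[start:start + m]

    lo, hi = 0, len(suffix_array)
    while lo < hi:                      # first suffix with prefix >= pattern
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    hi = len(suffix_array)
    while lo < hi:                      # first suffix with prefix > pattern
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(suffix_array[first:lo])

text = "banana"
sa = build_suffix_array(text)
print(sa)                                 # [5, 3, 1, 0, 4, 2]
print(find_occurrences(text, sa, "ana"))  # [1, 3]
```

The sort dominates construction time, which matches the slide's point that sorting is the step worth moving to the GPU.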


ACCELERATED SUFFIX ARRAY
CONSTRUCTION ON HSA
      By efficiently sharing data between CPU and GPU, HSA lets us move compute to data without penalty of intermediate copies.

      Skew Algorithm for Compute SA
              Radix Sort::GPU
              Lexical Rank::CPU
              Compute SA::CPU
              Radix Sort::GPU
              Merge Sort::GPU

      By offloading data parallel computations to GPU, HSA increases performance and reduces energy for Suffix Array Construction versus single threaded execution.
              +5.8x   INCREASED PERFORMANCE
              -5x     DECREASED ENERGY

      M. Deo, “Parallel Suffix Array Construction and Least Common Prefix for the GPU”, Submitted to ”Principles and Practice of Parallel Programming, (PPoPP’13)” February 2013.
      AMD A10 4600M APU with Radeon™ HD Graphics; CPU: 4 cores @ 2.3 GHz (turbo 3.2 GHz); GPU: AMD Radeon HD 7660G, 6 compute units, 685 MHz; 4GB RAM
VASCULAR IMAGE
ENHANCEMENT
EMBEDDED MEDICAL WORKLOAD
VASCULATURE IMAGE ENHANCEMENT
     Automatic Image Enhancement
            Reduces expertise required to analyze the image
            Reduces time to diagnosis
     Vasculature Image Enhancement:
            GPU::Gaussian Convolution
            GPU::Hessian Matrix Compute
                   Intermediate copied to CPU
            GPU::Eigen Decomposition
                   Intermediate copied to CPU
            GPU::Vascular Network Compute
                   Intermediate copied to CPU
            CPU::Analysis
     [Figure: Input Image and Output Image]
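
The slides do not give the formulas behind the Hessian and eigen decomposition steps. The sketch below shows one common way those two steps can be computed at a single pixel, assuming a 2D image, central finite differences, and the closed-form eigenvalues of a symmetric 2 x 2 matrix. The Gaussian smoothing step is skipped, and all names are illustrative.

```python
import math

def hessian_2d(img, y, x):
    """Second derivatives (Ixx, Ixy, Iyy) at an interior pixel via central differences."""
    ixx = img[y][x + 1] - 2 * img[y][x] + img[y][x - 1]
    iyy = img[y + 1][x] - 2 * img[y][x] + img[y - 1][x]
    ixy = (img[y + 1][x + 1] - img[y + 1][x - 1]
           - img[y - 1][x + 1] + img[y - 1][x - 1]) / 4.0
    return ixx, ixy, iyy

def eigenvalues_sym2(a, b, c):
    """Eigenvalues of [[a, b], [b, c]], ordered by increasing absolute value."""
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return sorted((mean - radius, mean + radius), key=abs)

# Toy image: a bright horizontal line on a dark background.
toy_image = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [9, 9, 9, 9, 9],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
small, large = eigenvalues_sym2(*hessian_2d(toy_image, 2, 2))
print(small, large)  # 0.0 -18.0
```

On a line-like structure one eigenvalue stays near zero (along the line) while the other is large in magnitude (across it), and that contrast is the kind of signal a vessel enhancement filter looks for. The same computation runs independently at every pixel, so it maps naturally onto the GPU.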
IMPROVED PERFORMANCE AND ENERGY
      HSA increases performance and reduces energy versus single threaded execution by offloading data parallel compute to GPU
              +14x   INCREASED PERFORMANCE
              -14x   DECREASED ENERGY
      AMD A10 4600M APU with Radeon™ HD Graphics; CPU: 4 cores @ 2.3 GHz (turbo 3.2 GHz); GPU: AMD Radeon HD 7660G,
      6 compute units, 685 MHz; 4GB RAM; Windows 7 (64-bit); OpenCL™ 1.1 (873.1)
EASE OF PROGRAMMING
CODE COMPLEXITY VS. PERFORMANCE
LINES-OF-CODE AND PERFORMANCE FOR DIFFERENT PROGRAMMING MODELS
(Exemplary ISV “Hessian” Kernel)
    [Chart: stacked bars of lines of code (LOC, left axis, 0 to 350) per programming model, with performance overlaid (right axis, 0 to 35.00)]
    Legend: Copy-back, Algorithm, Launch, Copy, Compile, Init, Performance
    LOC components shown per model:
        Serial CPU:      Launch, Algorithm
        TBB:             Launch, Algorithm
        Intrinsics+TBB:  Launch, Algorithm
        OpenCL™-C:       Init, Compile, Copy, Launch, Algorithm, Copy-back
        OpenCL™-C++:     Compile, Copy, Launch, Algorithm, Copy-back
        C++ AMP:         Launch, Algorithm, Copy-back
        HSA Bolt:        Launch, Algorithm

                AMD A10-5800K APU with Radeon™ HD Graphics – CPU: 4 cores, 3800MHz (4200MHz Turbo); GPU: AMD Radeon HD 7660D, 6 compute units, 800MHz; 4GB RAM.
                Software – Windows 7 Professional SP1 (64-bit OS); AMD OpenCL™ 1.2 AMD-APP (937.2); Microsoft Visual Studio 11 Beta



THE HSA FUTURE




                                     Highly productive programmers
                                 + Scalable performance
                                 + Power efficiency
                                 = AMAZING USER EXPERIENCES


THE HSA FOUNDATION
                Membership tiers (member logos not preserved in this transcript):
                Founders
                Promoters
                Supporters
                Contributors
                Associates
THANK YOU!




                                                  www.hsafoundation.com





Weitere ähnliche Inhalte

Was ist angesagt?

Concur Best Practices In Travel And Expense Management
Concur Best Practices In Travel And Expense ManagementConcur Best Practices In Travel And Expense Management
Concur Best Practices In Travel And Expense Managementjaysdon02
 
Cots moves to multicore: AMD
Cots moves to multicore: AMDCots moves to multicore: AMD
Cots moves to multicore: AMDKonrad Witte
 
Information Exchanges – Scaling strategies
Information Exchanges – Scaling strategiesInformation Exchanges – Scaling strategies
Information Exchanges – Scaling strategiesValtech India
 
Performance and scalability of Informix ultimate warehouse edtion on Intel Xe...
Performance and scalability of Informix ultimate warehouse edtion on Intel Xe...Performance and scalability of Informix ultimate warehouse edtion on Intel Xe...
Performance and scalability of Informix ultimate warehouse edtion on Intel Xe...Keshav Murthy
 
The smartphone opportunity-r3
The smartphone opportunity-r3The smartphone opportunity-r3
The smartphone opportunity-r3Brian Richards
 
Destiny pen user manual
Destiny pen user manualDestiny pen user manual
Destiny pen user manualTony Toole
 
Saiful hidayar santri indigo telkom republika pondok pesantren keresek garut ...
Saiful hidayar santri indigo telkom republika pondok pesantren keresek garut ...Saiful hidayar santri indigo telkom republika pondok pesantren keresek garut ...
Saiful hidayar santri indigo telkom republika pondok pesantren keresek garut ...Saiful Hidayat
 
There Is No Cloud - Open Spectrum Inc - Sean Patrick Tario
There Is No Cloud - Open Spectrum Inc - Sean Patrick TarioThere Is No Cloud - Open Spectrum Inc - Sean Patrick Tario
There Is No Cloud - Open Spectrum Inc - Sean Patrick TarioOpen Spectrum Inc
 
IT FUTURE 2011 - Fujitsu ror orchestration
IT FUTURE 2011 - Fujitsu ror orchestrationIT FUTURE 2011 - Fujitsu ror orchestration
IT FUTURE 2011 - Fujitsu ror orchestrationFujitsu France
 
Google apps brochure
Google apps brochureGoogle apps brochure
Google apps brochureFrank Jung
 
COSC 426 Lect 2. - AR Technology
COSC 426 Lect 2. - AR Technology COSC 426 Lect 2. - AR Technology
COSC 426 Lect 2. - AR Technology Mark Billinghurst
 
Http Jaoo.Com.Au Sydney 2008 File Path= Jaoo Aus2008 Slides Dave Thomas Lif...
Http   Jaoo.Com.Au Sydney 2008 File Path= Jaoo Aus2008 Slides Dave Thomas Lif...Http   Jaoo.Com.Au Sydney 2008 File Path= Jaoo Aus2008 Slides Dave Thomas Lif...
Http Jaoo.Com.Au Sydney 2008 File Path= Jaoo Aus2008 Slides Dave Thomas Lif...qedanne
 
Introduction to Enterprise Cloud Economics
Introduction to Enterprise Cloud EconomicsIntroduction to Enterprise Cloud Economics
Introduction to Enterprise Cloud EconomicsEverest Group
 
05 2012 power_roadshow_software_on_power
05 2012 power_roadshow_software_on_power05 2012 power_roadshow_software_on_power
05 2012 power_roadshow_software_on_powerGennaro (Rino) Persico
 
EBSgnrlpresoAug2000
EBSgnrlpresoAug2000EBSgnrlpresoAug2000
EBSgnrlpresoAug2000gamaus
 
Practical experiences and best practices for SSD and IBM i
Practical experiences and best practices for SSD and IBM iPractical experiences and best practices for SSD and IBM i
Practical experiences and best practices for SSD and IBM iCOMMON Europe
 
Designing the User Experience
Designing the User ExperienceDesigning the User Experience
Designing the User ExperienceJason Wehmhoener
 

Was ist angesagt? (20)

Concur Best Practices In Travel And Expense Management
Concur Best Practices In Travel And Expense ManagementConcur Best Practices In Travel And Expense Management
Concur Best Practices In Travel And Expense Management
 
Dsp
DspDsp
Dsp
 
Cots moves to multicore: AMD
Cots moves to multicore: AMDCots moves to multicore: AMD
Cots moves to multicore: AMD
 
Information Exchanges – Scaling strategies
Information Exchanges – Scaling strategiesInformation Exchanges – Scaling strategies
Information Exchanges – Scaling strategies
 
Performance and scalability of Informix ultimate warehouse edtion on Intel Xe...
Performance and scalability of Informix ultimate warehouse edtion on Intel Xe...Performance and scalability of Informix ultimate warehouse edtion on Intel Xe...
Performance and scalability of Informix ultimate warehouse edtion on Intel Xe...
 
Chris millercloud
Chris millercloudChris millercloud
Chris millercloud
 
The smartphone opportunity-r3
The smartphone opportunity-r3The smartphone opportunity-r3
The smartphone opportunity-r3
 
Destiny pen user manual
Destiny pen user manualDestiny pen user manual
Destiny pen user manual
 
Qf deck
Qf deckQf deck
Qf deck
 
Saiful hidayar santri indigo telkom republika pondok pesantren keresek garut ...
Saiful hidayar santri indigo telkom republika pondok pesantren keresek garut ...Saiful hidayar santri indigo telkom republika pondok pesantren keresek garut ...
Saiful hidayar santri indigo telkom republika pondok pesantren keresek garut ...
 
There Is No Cloud - Open Spectrum Inc - Sean Patrick Tario
There Is No Cloud - Open Spectrum Inc - Sean Patrick TarioThere Is No Cloud - Open Spectrum Inc - Sean Patrick Tario
There Is No Cloud - Open Spectrum Inc - Sean Patrick Tario
 
IT FUTURE 2011 - Fujitsu ror orchestration
IT FUTURE 2011 - Fujitsu ror orchestrationIT FUTURE 2011 - Fujitsu ror orchestration
IT FUTURE 2011 - Fujitsu ror orchestration
 
Google apps brochure
Google apps brochureGoogle apps brochure
Google apps brochure
 
COSC 426 Lect 2. - AR Technology
COSC 426 Lect 2. - AR Technology COSC 426 Lect 2. - AR Technology
COSC 426 Lect 2. - AR Technology
 
Http Jaoo.Com.Au Sydney 2008 File Path= Jaoo Aus2008 Slides Dave Thomas Lif...
Http   Jaoo.Com.Au Sydney 2008 File Path= Jaoo Aus2008 Slides Dave Thomas Lif...Http   Jaoo.Com.Au Sydney 2008 File Path= Jaoo Aus2008 Slides Dave Thomas Lif...
Http Jaoo.Com.Au Sydney 2008 File Path= Jaoo Aus2008 Slides Dave Thomas Lif...
 
Introduction to Enterprise Cloud Economics
Introduction to Enterprise Cloud EconomicsIntroduction to Enterprise Cloud Economics
Introduction to Enterprise Cloud Economics
 
05 2012 power_roadshow_software_on_power
05 2012 power_roadshow_software_on_power05 2012 power_roadshow_software_on_power
05 2012 power_roadshow_software_on_power
 
EBSgnrlpresoAug2000
EBSgnrlpresoAug2000EBSgnrlpresoAug2000
EBSgnrlpresoAug2000
 
Practical experiences and best practices for SSD and IBM i
Practical experiences and best practices for SSD and IBM iPractical experiences and best practices for SSD and IBM i
Practical experiences and best practices for SSD and IBM i
 
Designing the User Experience
Designing the User ExperienceDesigning the User Experience
Designing the User Experience
 

Andere mochten auch

Gpu Compute
Gpu ComputeGpu Compute
Gpu Computejworth
 
GPU Computing: A brief overview
GPU Computing: A brief overviewGPU Computing: A brief overview
GPU Computing: A brief overviewRajiv Kumar
 
Guide to heterogeneous system architecture (hsa)
Guide to heterogeneous system architecture (hsa)Guide to heterogeneous system architecture (hsa)
Guide to heterogeneous system architecture (hsa)dibyendu.das
 
Phil Rogers IFA Keynote 2012
Phil Rogers IFA Keynote 2012Phil Rogers IFA Keynote 2012
Phil Rogers IFA Keynote 2012HSA Foundation
 
HSA Introduction Hot Chips 2013
HSA Introduction  Hot Chips 2013HSA Introduction  Hot Chips 2013
HSA Introduction Hot Chips 2013HSA Foundation
 
HSA Queuing Hot Chips 2013
HSA Queuing Hot Chips 2013 HSA Queuing Hot Chips 2013
HSA Queuing Hot Chips 2013 HSA Foundation
 
Concurrency and Distributed Systems Using JRuby
Concurrency and Distributed Systems Using JRubyConcurrency and Distributed Systems Using JRuby
Concurrency and Distributed Systems Using JRubyTheo Hultberg
 
HSA Memory Model Hot Chips 2013
HSA Memory Model Hot Chips 2013HSA Memory Model Hot Chips 2013
HSA Memory Model Hot Chips 2013HSA Foundation
 
HSA HSAIL Introduction Hot Chips 2013
HSA HSAIL Introduction  Hot Chips 2013 HSA HSAIL Introduction  Hot Chips 2013
HSA HSAIL Introduction Hot Chips 2013 HSA Foundation
 
KeynoteTHE HETEROGENEOUS SYSTEM ARCHITECTURE ITS (NOT) ALL ABOUT THE GPU
KeynoteTHE HETEROGENEOUS SYSTEM ARCHITECTURE ITS (NOT) ALL ABOUT THE GPUKeynoteTHE HETEROGENEOUS SYSTEM ARCHITECTURE ITS (NOT) ALL ABOUT THE GPU
KeynoteTHE HETEROGENEOUS SYSTEM ARCHITECTURE ITS (NOT) ALL ABOUT THE GPUHSA Foundation
 
HSA From A Software Perspective
HSA From A Software Perspective HSA From A Software Perspective
HSA From A Software Perspective HSA Foundation
 
AMD Heterogeneous Uniform Memory Access
AMD Heterogeneous Uniform Memory AccessAMD Heterogeneous Uniform Memory Access
AMD Heterogeneous Uniform Memory AccessAMD
 
Modeling & design multi-core NUMA simulator
Modeling & design multi-core NUMA simulatorModeling & design multi-core NUMA simulator
Modeling & design multi-core NUMA simulatorAbed Maatalla
 
Deeper Look Into HSAIL And It's Runtime
Deeper Look Into HSAIL And It's Runtime Deeper Look Into HSAIL And It's Runtime
Deeper Look Into HSAIL And It's Runtime HSA Foundation
 
Bolt C++ Standard Template Libary for HSA by Ben Sanders, AMD
Bolt C++ Standard Template Libary for HSA  by Ben Sanders, AMDBolt C++ Standard Template Libary for HSA  by Ben Sanders, AMD
Bolt C++ Standard Template Libary for HSA by Ben Sanders, AMDHSA Foundation
 
CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...
CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...
CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...AMD Developer Central
 
HC-4017, HSA Compilers Technology, by Debyendu Das
HC-4017, HSA Compilers Technology, by Debyendu DasHC-4017, HSA Compilers Technology, by Debyendu Das
HC-4017, HSA Compilers Technology, by Debyendu DasAMD Developer Central
 
Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Techn...
Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Techn...Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Techn...
Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Techn...AMD Developer Central
 

Andere mochten auch (20)

Gpu Compute
Gpu ComputeGpu Compute
Gpu Compute
 
GPU Computing: A brief overview
GPU Computing: A brief overviewGPU Computing: A brief overview
GPU Computing: A brief overview
 
Guide to heterogeneous system architecture (hsa)
Guide to heterogeneous system architecture (hsa)Guide to heterogeneous system architecture (hsa)
Guide to heterogeneous system architecture (hsa)
 
Phil Rogers IFA Keynote 2012
Phil Rogers IFA Keynote 2012Phil Rogers IFA Keynote 2012
Phil Rogers IFA Keynote 2012
 
HSA Introduction Hot Chips 2013
HSA Introduction  Hot Chips 2013HSA Introduction  Hot Chips 2013
HSA Introduction Hot Chips 2013
 
HSA Queuing Hot Chips 2013
HSA Queuing Hot Chips 2013 HSA Queuing Hot Chips 2013
HSA Queuing Hot Chips 2013
 
Concurrency and Distributed Systems Using JRuby
Concurrency and Distributed Systems Using JRubyConcurrency and Distributed Systems Using JRuby
Concurrency and Distributed Systems Using JRuby
 
HSA Memory Model Hot Chips 2013
HSA Memory Model Hot Chips 2013HSA Memory Model Hot Chips 2013
HSA Memory Model Hot Chips 2013
 
HSA HSAIL Introduction Hot Chips 2013
HSA HSAIL Introduction  Hot Chips 2013 HSA HSAIL Introduction  Hot Chips 2013
HSA HSAIL Introduction Hot Chips 2013
 
KeynoteTHE HETEROGENEOUS SYSTEM ARCHITECTURE ITS (NOT) ALL ABOUT THE GPU
KeynoteTHE HETEROGENEOUS SYSTEM ARCHITECTURE ITS (NOT) ALL ABOUT THE GPUKeynoteTHE HETEROGENEOUS SYSTEM ARCHITECTURE ITS (NOT) ALL ABOUT THE GPU
KeynoteTHE HETEROGENEOUS SYSTEM ARCHITECTURE ITS (NOT) ALL ABOUT THE GPU
 
HSA From A Software Perspective
HSA From A Software Perspective HSA From A Software Perspective
HSA From A Software Perspective
 
AMD Heterogeneous Uniform Memory Access
AMD Heterogeneous Uniform Memory AccessAMD Heterogeneous Uniform Memory Access
AMD Heterogeneous Uniform Memory Access
 
HSA Introduction
HSA IntroductionHSA Introduction
HSA Introduction
 
NUMA overview
NUMA overviewNUMA overview
NUMA overview
 
Modeling & design multi-core NUMA simulator
Modeling & design multi-core NUMA simulatorModeling & design multi-core NUMA simulator
Modeling & design multi-core NUMA simulator
 
Deeper Look Into HSAIL And It's Runtime
Deeper Look Into HSAIL And It's Runtime Deeper Look Into HSAIL And It's Runtime
Deeper Look Into HSAIL And It's Runtime
 
Bolt C++ Standard Template Libary for HSA by Ben Sanders, AMD
Bolt C++ Standard Template Libary for HSA  by Ben Sanders, AMDBolt C++ Standard Template Libary for HSA  by Ben Sanders, AMD
Bolt C++ Standard Template Libary for HSA by Ben Sanders, AMD
 
CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...
CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...
CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...
 
HC-4017, HSA Compilers Technology, by Debyendu Das
HC-4017, HSA Compilers Technology, by Debyendu DasHC-4017, HSA Compilers Technology, by Debyendu Das
HC-4017, HSA Compilers Technology, by Debyendu Das
 
Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Techn...
Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Techn...Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Techn...
Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Techn...
 

Ähnlich wie ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at Even Lower Power – HSA

AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIB...
AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIB...AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIB...
AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIB...HSA Foundation
 
AFDS 2011 Phil Rogers Keynote: “The Programmer’s Guide to the APU Galaxy.”
 AFDS 2011 Phil Rogers Keynote: “The Programmer’s Guide to the APU Galaxy.” AFDS 2011 Phil Rogers Keynote: “The Programmer’s Guide to the APU Galaxy.”
AFDS 2011 Phil Rogers Keynote: “The Programmer’s Guide to the APU Galaxy.”HSA Foundation
 
FewebPlus @ microsoft 19 april 2010 cloud continuum
FewebPlus @ microsoft 19 april 2010 cloud continuumFewebPlus @ microsoft 19 april 2010 cloud continuum
FewebPlus @ microsoft 19 april 2010 cloud continuumTom Crombez
 
Oasis cloud-law-ics-unofficial
Oasis cloud-law-ics-unofficialOasis cloud-law-ics-unofficial
Oasis cloud-law-ics-unofficialJamie Clark
 
Tech Ed09 India Ver M New
Tech Ed09 India Ver M NewTech Ed09 India Ver M New
Tech Ed09 India Ver M Newrsnarayanan
 
Cisco Unified Computing Systems Update
Cisco Unified Computing Systems UpdateCisco Unified Computing Systems Update
Cisco Unified Computing Systems UpdateCisco Canada
 
Lap around windows azure
Lap around windows azureLap around windows azure
Lap around windows azureManish Corriea
 
Cloud architecture and deployment: The Kognitio checklist, Nigel Sanctuary, K...
Cloud architecture and deployment: The Kognitio checklist, Nigel Sanctuary, K...Cloud architecture and deployment: The Kognitio checklist, Nigel Sanctuary, K...
Cloud architecture and deployment: The Kognitio checklist, Nigel Sanctuary, K...CloudOps Summit
 
Govind ioug120505
Govind ioug120505Govind ioug120505
Govind ioug120505gshare
 
Drupal in the Cloud with Windows Azure
Drupal in the Cloud with Windows AzureDrupal in the Cloud with Windows Azure
Drupal in the Cloud with Windows AzureFord AntiTrust
 
Cloud computing bringing the dark side of enterprise apps into the light by...
Cloud computing   bringing the dark side of enterprise apps into the light by...Cloud computing   bringing the dark side of enterprise apps into the light by...
Cloud computing bringing the dark side of enterprise apps into the light by...Khazret Sapenov
 
The Data Distribution Service
The Data Distribution ServiceThe Data Distribution Service
The Data Distribution ServiceAngelo Corsaro
 
Fremtidens platform til koncernsystemer (IBM System z)
Fremtidens platform til koncernsystemer (IBM System z)Fremtidens platform til koncernsystemer (IBM System z)
Fremtidens platform til koncernsystemer (IBM System z)IBM Danmark
 
Cloud Computing through FCAPS Managed Services in a Virtualized Data Center
Cloud Computing through FCAPS Managed Services in a Virtualized Data CenterCloud Computing through FCAPS Managed Services in a Virtualized Data Center
Cloud Computing through FCAPS Managed Services in a Virtualized Data Centervsarathy
 
[.Net Juniors Academy] Introdução ao Cloud Computing e Windows Azure Platform
[.Net Juniors Academy] Introdução ao Cloud Computing e Windows Azure Platform[.Net Juniors Academy] Introdução ao Cloud Computing e Windows Azure Platform
[.Net Juniors Academy] Introdução ao Cloud Computing e Windows Azure PlatformVitor Tomaz
 
Compatible one cloud expowest nov 2012
Compatible one cloud expowest nov 2012Compatible one cloud expowest nov 2012
Compatible one cloud expowest nov 2012CompatibleOne
 
[AzurePT] Desenvolvimento para o Windows Azure: Diferença para o developer
[AzurePT] Desenvolvimento para o Windows Azure: Diferença para o developer[AzurePT] Desenvolvimento para o Windows Azure: Diferença para o developer
[AzurePT] Desenvolvimento para o Windows Azure: Diferença para o developerVitor Tomaz
 
BOI 2011 - Be what's next
BOI 2011 - Be what's nextBOI 2011 - Be what's next
BOI 2011 - Be what's nextTudor Damian
 
Cloud Computing in a Nutshell
Cloud Computing in a NutshellCloud Computing in a Nutshell
Cloud Computing in a NutshellVictor Haydin
 

Ähnlich wie ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at Even Lower Power – HSA (20)

AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIB...
AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIB...AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIB...
AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIB...
 
AFDS 2011 Phil Rogers Keynote: “The Programmer’s Guide to the APU Galaxy.”
 AFDS 2011 Phil Rogers Keynote: “The Programmer’s Guide to the APU Galaxy.” AFDS 2011 Phil Rogers Keynote: “The Programmer’s Guide to the APU Galaxy.”
AFDS 2011 Phil Rogers Keynote: “The Programmer’s Guide to the APU Galaxy.”
 
FewebPlus @ microsoft 19 april 2010 cloud continuum
FewebPlus @ microsoft 19 april 2010 cloud continuumFewebPlus @ microsoft 19 april 2010 cloud continuum
FewebPlus @ microsoft 19 april 2010 cloud continuum
 
Oasis cloud-law-ics-unofficial
Oasis cloud-law-ics-unofficialOasis cloud-law-ics-unofficial
Oasis cloud-law-ics-unofficial
 
Tech Ed09 India Ver M New
Tech Ed09 India Ver M NewTech Ed09 India Ver M New
Tech Ed09 India Ver M New
 
Cisco Unified Computing Systems Update
Cisco Unified Computing Systems UpdateCisco Unified Computing Systems Update
Cisco Unified Computing Systems Update
 
Lap around windows azure
Lap around windows azureLap around windows azure
Lap around windows azure
 
Cloud architecture and deployment: The Kognitio checklist, Nigel Sanctuary, K...
Cloud architecture and deployment: The Kognitio checklist, Nigel Sanctuary, K...Cloud architecture and deployment: The Kognitio checklist, Nigel Sanctuary, K...
Cloud architecture and deployment: The Kognitio checklist, Nigel Sanctuary, K...
 
Govind ioug120505
Govind ioug120505Govind ioug120505
Govind ioug120505
 
Drupal in the Cloud with Windows Azure
Drupal in the Cloud with Windows AzureDrupal in the Cloud with Windows Azure
Drupal in the Cloud with Windows Azure
 
Cloud computing bringing the dark side of enterprise apps into the light by...
Cloud computing   bringing the dark side of enterprise apps into the light by...Cloud computing   bringing the dark side of enterprise apps into the light by...
Cloud computing bringing the dark side of enterprise apps into the light by...
 
The Data Distribution Service
The Data Distribution ServiceThe Data Distribution Service
The Data Distribution Service
 
Fremtidens platform til koncernsystemer (IBM System z)
Fremtidens platform til koncernsystemer (IBM System z)Fremtidens platform til koncernsystemer (IBM System z)
Fremtidens platform til koncernsystemer (IBM System z)
 
Cloud Computing through FCAPS Managed Services in a Virtualized Data Center
Cloud Computing through FCAPS Managed Services in a Virtualized Data CenterCloud Computing through FCAPS Managed Services in a Virtualized Data Center
Cloud Computing through FCAPS Managed Services in a Virtualized Data Center
 
[.Net Juniors Academy] Introdução ao Cloud Computing e Windows Azure Platform
[.Net Juniors Academy] Introdução ao Cloud Computing e Windows Azure Platform[.Net Juniors Academy] Introdução ao Cloud Computing e Windows Azure Platform
[.Net Juniors Academy] Introdução ao Cloud Computing e Windows Azure Platform
 
Compatible one cloud expowest nov 2012
Compatible one cloud expowest nov 2012Compatible one cloud expowest nov 2012
Compatible one cloud expowest nov 2012
 
[AzurePT] Desenvolvimento para o Windows Azure: Diferença para o developer
[AzurePT] Desenvolvimento para o Windows Azure: Diferença para o developer[AzurePT] Desenvolvimento para o Windows Azure: Diferença para o developer
[AzurePT] Desenvolvimento para o Windows Azure: Diferença para o developer
 
BOI 2011 - Be what's next
BOI 2011 - Be what's nextBOI 2011 - Be what's next
BOI 2011 - Be what's next
 
Cloud Computing in a Nutshell
Cloud Computing in a NutshellCloud Computing in a Nutshell
Cloud Computing in a Nutshell
 
426 lecture2: AR Technology
426 lecture2: AR Technology426 lecture2: AR Technology
426 lecture2: AR Technology
 

Mehr von HSA Foundation

Hsa Runtime version 1.00 Provisional
Hsa Runtime version  1.00  ProvisionalHsa Runtime version  1.00  Provisional
Hsa Runtime version 1.00 ProvisionalHSA Foundation
 
Hsa programmers reference manual (version 1.0 provisional)
Hsa programmers reference manual (version 1.0 provisional)Hsa programmers reference manual (version 1.0 provisional)
Hsa programmers reference manual (version 1.0 provisional)HSA Foundation
 
ISCA final presentation - Runtime
ISCA final presentation - RuntimeISCA final presentation - Runtime
ISCA final presentation - RuntimeHSA Foundation
 
ISCA final presentation - Queuing Model
ISCA final presentation - Queuing ModelISCA final presentation - Queuing Model
ISCA final presentation - Queuing ModelHSA Foundation
 
ISCA final presentation - Memory Model
ISCA final presentation - Memory ModelISCA final presentation - Memory Model
ISCA final presentation - Memory ModelHSA Foundation
 
ISCA Final Presentaiton - Compilations
ISCA Final Presentaiton -  CompilationsISCA Final Presentaiton -  Compilations
ISCA Final Presentaiton - CompilationsHSA Foundation
 
ISCA Final Presentation - Applications
ISCA Final Presentation - ApplicationsISCA Final Presentation - Applications
ISCA Final Presentation - ApplicationsHSA Foundation
 
ISCA Final Presentation - HSAIL
ISCA Final Presentation - HSAILISCA Final Presentation - HSAIL
ISCA Final Presentation - HSAILHSA Foundation
 
ISCA Final Presentation - Intro
ISCA Final Presentation - IntroISCA Final Presentation - Intro
ISCA Final Presentation - IntroHSA Foundation
 
ISCA 2014 | Heterogeneous System Architecture (HSA): Architecture and Algorit...
ISCA 2014 | Heterogeneous System Architecture (HSA): Architecture and Algorit...ISCA 2014 | Heterogeneous System Architecture (HSA): Architecture and Algorit...
ISCA 2014 | Heterogeneous System Architecture (HSA): Architecture and Algorit...HSA Foundation
 
Hsa Platform System Architecture Specification Provisional verl 1.0 ratifed
Hsa Platform System Architecture Specification Provisional  verl 1.0 ratifed Hsa Platform System Architecture Specification Provisional  verl 1.0 ratifed
Hsa Platform System Architecture Specification Provisional verl 1.0 ratifed HSA Foundation
 
Apu13 cp lu-keynote-final-slideshare
Apu13 cp lu-keynote-final-slideshareApu13 cp lu-keynote-final-slideshare
Apu13 cp lu-keynote-final-slideshareHSA Foundation
 
HSAemu a Full System Emulator for HSA
HSAemu a Full System Emulator for HSA HSAemu a Full System Emulator for HSA
HSAemu a Full System Emulator for HSA HSA Foundation
 
HSA Foundation BoF -Siggraph 2013 Flyer
HSA Foundation BoF -Siggraph 2013 Flyer HSA Foundation BoF -Siggraph 2013 Flyer
HSA Foundation BoF -Siggraph 2013 Flyer HSA Foundation
 
HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model, C...
HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model, C...HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model, C...
HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model, C...HSA Foundation
 
Hsa2012 logo guidelines.
Hsa2012 logo guidelines.Hsa2012 logo guidelines.
Hsa2012 logo guidelines.HSA Foundation
 
What Fabric Engine Can Do With HSA
What Fabric Engine Can Do With HSAWhat Fabric Engine Can Do With HSA
What Fabric Engine Can Do With HSAHSA Foundation
 
Fabric Engine: Why HSA is Invaluable
Fabric Engine: Why HSA is  InvaluableFabric Engine: Why HSA is  Invaluable
Fabric Engine: Why HSA is InvaluableHSA Foundation
 

ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at Even Lower Power – HSA

  • 1. Sensor Integration and Improved User Experiences at Lower Power. Phil Rogers: President, HSA Foundation; Corporate Fellow, AMD.
  • 2. SENSOR INTEGRATION CHALLENGES: Sensors on all new devices, going up in resolution and data rates; extract meaning from video, audio and location; mix of scalar and parallel workloads; process locally or in the cloud; power must keep going down.
  • 3. ABUNDANT PARALLEL WORKLOADS: Biometric Recognition (secure, fast, accurate: face, voice, fingerprints); Natural UI & Gestures (touch, gesture, and voice); Augmented Reality (superimpose graphics, audio, and other digital information as a virtual overlay); Content Everywhere (content from any source to any display seamlessly); AV Content Management (searching, indexing and tagging of video & audio; multimedia data mining); Beyond HD Experiences (streaming media, new codecs, 3D, transcode, audio).
  • 4. A NEW ERA OF PROCESSOR PERFORMANCE
      - Single-Core Era. Enabled by: Moore’s Law, voltage scaling. Constrained by: power, complexity. Languages: Assembly → C/C++ → Java … [Chart: single-thread performance over time, marked "we are here"]
      - Multi-Core Era. Enabled by: Moore’s Law, SMP architecture. Constrained by: power, parallel SW, scalability. Languages: pthreads → OpenMP → TBB … [Chart: throughput performance over time (# of processors), marked "we are here"]
      - Heterogeneous Systems Era. Enabled by: abundant data parallelism, power efficient GPUs. Temporarily constrained by: programming models, comm. overhead. Languages: Shader → CUDA → OpenCL → C++ and Java. [Chart: modern application performance over time (data-parallel exploitation), marked "we are here" and "?"]
  • 5. EVOLUTION OF HETEROGENEOUS COMPUTING [Chart: architecture maturity & programmer accessibility, from Poor to Excellent, over three eras]
      - Proprietary Drivers Era (2002 - 2008): graphics & proprietary driver-based APIs; “adventurous” programmers; exploit early programmable “shader cores” in the GPU; make your program look like “graphics” to the GPU; CUDA™, Brook+, etc.
      - Standards Drivers Era (2009 - 2011): OpenCL™, DirectCompute driver-based APIs; expert programmers; C and C++ subsets; compute centric APIs, data types; multiple address spaces with explicit data movement; specialized work queue based structures; kernel mode dispatch.
      - Architected Era (2012 - 2020): Heterogeneous System Architecture, GPU peer processor; mainstream programmers; full C++; GPU as a co-processor; unified coherent address space; task parallel runtimes; nested data parallel programs; user mode dispatch; pre-emption and context switching.
  • 6. HSA FEATURE ROADMAP
      - Physical Integration: integrate CPU & GPU in silicon; unified memory controller; common manufacturing technology.
      - Optimized Platforms: GPU compute C++ support; user mode scheduling; bi-directional power mgmt between CPU and GPU.
      - Architectural Integration: unified address space for CPU and GPU; GPU uses pageable system memory via CPU pointers; fully coherent memory between CPU & GPU.
      - System Integration: GPU compute context switch; GPU graphics pre-emption; quality of service.
  • 7. HETEROGENEOUS SYSTEM ARCHITECTURE – AN OPEN PLATFORM: Open architecture, published specifications (HSAIL virtual ISA; HSA memory model; HSA system architecture). ISA agnostic for both CPU and GPU. HSA Foundation formed in June 2012. Inviting partners to join us, in all areas (hardware companies; operating systems; tools and middleware; applications).
  • 8. HSA INTERMEDIATE LAYER - HSAIL: HSAIL is a virtual ISA for parallel programs (finalized to ISA by a JIT compiler or “Finalizer”; ISA independent by design). Explicitly parallel (designed for data parallel programming). Support for exceptions, virtual functions, and other high level language features. Syscall methods (GPU code can call directly to system services, IO, printf, etc). Debugging support.
  • 9. HSA MEMORY MODEL: Designed to be compatible with the C++11, Java and .NET memory models. Relaxed consistency memory model for parallel compute performance. Loads and stores can be re-ordered by the finalizer. Visibility controlled by: Load.Acquire; Store.Release; Barriers.
  • 10. THE HSA FOUNDATION: Founders; Promoters; Supporters; Contributors; Associates.
  • 11. HSA SOFTWARE STACKS: APPLICATIONS, SYSTEM SOFTWARE AND PROGRAMMING MODELS
  • 12. DRIVER STACK AND HSA SOFTWARE STACK [Diagram, top to bottom]
      - Driver Stack: Apps; Domain Libraries; OpenCL™ 1.x, DX Runtimes, User Mode Drivers; Graphics Kernel Mode Driver.
      - HSA Software Stack: Apps; HSA Domain Libraries; HSA Runtime, Task Queuing Libraries, HSA JIT; HSA Kernel Mode Driver.
      - Both stacks sit on: Hardware - APUs, CPUs, GPUs.
      - Legend: user mode component; kernel mode component; components contributed by third parties.
  • 13. HSA SOLUTION STACK [Diagram labels: Application SW; Application; Domain Specific Libs (Bolt, OpenCV™, … many others); OpenCL™ Runtime; DirectX Runtime; Other Runtime; HSA Runtime; Legacy Drivers; HSA Software; HSAIL; HSA Finalizer; GPU ISA; Drivers; Ctl; Knl Driver; Differentiated HW; CPU(s); GPU(s); Other Accelerators]
  • 14. OPENCL™ AND HSA: HSA is an optimized platform architecture for OpenCL™, not an alternative to OpenCL™. OpenCL™ on HSA will benefit from: avoidance of wasteful copies; low latency dispatch; improved memory model; pointers shared between CPU and GPU. HSA also exposes a lower level programming interface, for those who want the ultimate in control and performance. It can be exposed through OpenCL extensions.
  • 15. MAKE GPUS EASIER TO PROGRAM: PRIMARY PROGRAMMING MODELS
      - Microsoft C++ AMP: addresses the large population of Windows developers; integrated in Visual Studio and WinRT; Microsoft Community Promise License for open platform use.
      - Java acceleration: Aparapi on OpenCL today; Project Sumatra to add HSA support in an OpenJDK for Java 8; driving to have Sumatra absorbed into Java 9.
  • 16. AMD’S OPEN SOURCE COMMITMENT TO HSA: We will open source our Linux execution and compilation stack, to jump start the ecosystem, allow a single shared implementation where appropriate, and enable university research in all areas.
      Component Name       | AMD Specific | Rationale
      HSA Bolt Library     | No           | Enable understanding and debug
      HSAIL Code Generator | No           | Enable research
      LLVM Contributions   | No           | Industry and academic collaboration
      HSA Assembler        | No           | Enable understanding and debug
      HSA Runtime          | No           | Standardize on a single runtime
      HSA Finalizer        | Yes          | Enable research and debug
      HSA Kernel Driver    | Yes          | For inclusion in Linux distros
  • 18. HAAR FACE DETECTION: CORNERSTONE TECHNOLOGY FOR COMPUTER VISION
  • 19. LOOKING FOR FACES ONE PLACE AT A TIME. Quick HD Calculations: Search square = 21 x 21; Pixels = 1920 x 1080 = 2,073,600; Search squares = 1900 x 1060 = ~2 Million.
  • 20. LOOKING FOR DIFFERENT SIZE FACES – BY SCALING THE VIDEO FRAME. More HD Calculations: 70% scaling in H and V; Total pixels = 4.07 Million; Search squares = 3.8 Million.
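The counts on slides 19 and 20 can be approximated with a short sketch. It assumes the frame is rescaled by 70% in each dimension until it no longer fits a 21 x 21 search square, and it counts positions per level as (width - 20) x (height - 20), as slide 19 does. The function name, the stopping rule and the use of fractional sizes are assumptions made for illustration; the slides do not state how the frame sizes were rounded.

```python
def pyramid_counts(width=1920, height=1080, window=21, scale=0.7):
    """Sum pixels and search-square positions over a scaled image pyramid.

    Assumption: scaling continues until the frame is smaller than the
    search square. Sizes are kept as floats, so results are approximate.
    """
    total_pixels = 0.0
    total_squares = 0.0
    levels = 0
    w, h = float(width), float(height)
    while w >= window and h >= window:
        total_pixels += w * h
        # Same approximation as the slide: (w - 20) * (h - 20) positions.
        total_squares += (w - (window - 1)) * (h - (window - 1))
        w *= scale
        h *= scale
        levels += 1
    return levels, total_pixels, total_squares


if __name__ == "__main__":
    levels, pixels, squares = pyramid_counts()
    print(f"levels={levels} pixels={pixels / 1e6:.2f}M squares={squares / 1e6:.2f}M")
```

Under these assumptions the pixel total comes out at about 4.07 million, matching the slide, and the search-square total lands just under 3.9 million, close to the slide's 3.8 million figure.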
  • 21. HAAR CASCADE STAGES [Flowchart: Stage N evaluates Feature k, Feature l and Feature m, then asks "Face still possible?" Yes: continue to Stage N+1 (Feature p, Feature q, Feature r). No: REJECT FRAME.]
  • 22. 22 CASCADE STAGES, EARLY OUT BETWEEN EACH [Diagram: STAGE 1 → STAGE 2 → … → STAGE 21 → STAGE 22 → FACE CONFIRMED; any stage can exit early to NO FACE.] Final HD Calculations: Search squares = 3.8 million; Average features per square = 124; Calculations per feature = 100; Calculations per frame = 47 GCalcs. Calculation Rate: 30 frames/sec = 1.4 TCalcs/second; 60 frames/sec = 2.8 TCalcs/second. …and this only gets front-facing faces.
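A minimal sketch of the early-out structure from slides 21 and 22, followed by the slide's rate arithmetic. The stage representation (a list of feature callables plus a threshold) is a hypothetical stand-in chosen for illustration, not the detector used in the talk; only the numbers passed to `calc_rate` come from the slide.

```python
def run_cascade(stages, window):
    """Evaluate one search square against a cascade with early out.

    `stages` is a list of (features, threshold) pairs, where each feature
    is a callable returning a score for the window (hypothetical format).
    Returns (is_face, number_of_features_evaluated).
    """
    evaluated = 0
    for features, threshold in stages:
        score = 0.0
        for feature in features:
            score += feature(window)
            evaluated += 1
        if score < threshold:
            # A face is no longer possible: skip all remaining stages.
            return False, evaluated
    return True, evaluated


def calc_rate(squares=3.8e6, features_per_square=124,
              calcs_per_feature=100, fps=30):
    """Reproduce the slide's arithmetic: calculations per frame and per second."""
    per_frame = squares * features_per_square * calcs_per_feature
    return per_frame, per_frame * fps


if __name__ == "__main__":
    per_frame, per_second = calc_rate()
    print(f"{per_frame / 1e9:.0f} GCalcs/frame, {per_second / 1e12:.1f} TCalcs/s")
```

Because most search squares are rejected in the first few stages, the amount of work per square varies widely, which is what makes the later stages a poor fit for wide data-parallel hardware.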
  • 23. CASCADE DEPTH ANALYSIS [Chart: cascade depth, axis 0 to 25; legend bins 0-5, 5-10, 10-15, 15-20, 20-25]
  • 24. PROCESSING TIME/STAGE, “Trinity” A10-4600M (6CU@497MHz, 4 cores@2700MHz) [Chart: time (ms), 0 to 100, per cascade stage 1 to 8 and 9-22; series: GPU, CPU] Test system: AMD A10 4600M APU with Radeon™ HD Graphics; CPU: 4 cores @ 2.3 GHz (turbo 3.2 GHz); GPU: AMD Radeon HD 7660G, 6 compute units, 685MHz; 4GB RAM; Windows 7 (64-bit); OpenCL™ 1.1 (873.1).
  • 25. PERFORMANCE CPU-VS-GPU, “Trinity” A10-4600M (6CU@497MHz, 4 cores@2700MHz) [Chart: images/sec, 0 to 12, against number of cascade stages on GPU (0 to 8, and 22); series: CPU, HSA, GPU] Test system: AMD A10 4600M APU with Radeon™ HD Graphics; CPU: 4 cores @ 2.3 GHz (turbo 3.2 GHz); GPU: AMD Radeon HD 7660G, 6 compute units, 685MHz; 4GB RAM; Windows 7 (64-bit); OpenCL™ 1.1 (873.1).
  • 26. HAAR SOLUTION – RUN DIFFERENT CASCADES ON GPU AND CPU: By seamlessly sharing data between CPU and GPU, HSA allows the right processor to handle its appropriate workload. Result: +2.5x increased performance; -2.5x decreased energy per frame.
  • 28. B+TREE SEARCHES: B+Trees are used by enterprise DB applications (SQL: SQLite, MySQL, Oracle, and others; NoSQL: CouchDB, Tokyo Cabinet, and others; audio search, video copy detection). DB size can be many times larger than GPU memory! B+Trees are a fundamental data structure: used to reduce memory & disk access to locate a key; can support index- and range-based queries; can be updated efficiently. [Figure: a simple B+Tree linking the keys 1-7. The linked list (red) allows rapid in-order traversal.]
  • 29. PARALLEL B+TREE SEARCHES ON HSA
      - With HSA, the DB can be larger than GPU memory, and can be shared with the CPU.
      - HSA increases performance versus multi-threaded CPU, even for tree structures that reside in pinned host memory.
      - HSA lets us move compute to data: parallel search can move to GPU; sequential updates can remain on CPU.
      Platform                  | Size < 3 GB | Size > 3 GB
      dGPU (memory size = 3 GB) | ✓           | ✗
      HSA                       | ✓           | ✓
      [Chart: Millions of Queries Per Second (MQPS) by B+Tree “order”; y-axis MQPS, 0 to 80; x-axis “Order” of B+Tree Node: 4, 8, 16, 32, 64, 128; series: CPU, APU]
      M. Daga and M. Nutter, “Exploiting Coarse-Grained Parallelism in B+Tree Searches on an APU”, accepted at the Second Workshop on Irregular Applications: Algorithms and Architectures (IA3), November 2012.
      Test system: AMD A10 4600M APU with Radeon™ HD Graphics; CPU: 4 cores @ 2.3 GHz (turbo 3.2 GHz); GPU: AMD Radeon HD 7660G, 6 compute units, 685MHz; 4GB RAM.
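To make the data structure on slides 28 and 29 concrete, here is a small sketch of B+Tree lookup and range query over the keys 1-7. The class names, the hand-built example tree and the stored values are all hypothetical; a real database index would also handle insertion, node splitting and on-disk layout, none of which is shown.

```python
from bisect import bisect_right


class Leaf:
    """Leaf node: sorted keys, their values, and a link to the next leaf."""
    def __init__(self, keys, values):
        self.keys, self.values, self.next = keys, values, None


class Inner:
    """Inner node: separator keys and len(keys) + 1 children."""
    def __init__(self, keys, children):
        self.keys, self.children = keys, children


def find_leaf(node, key):
    # Descend from the root; separators route the key to one child per level.
    while isinstance(node, Inner):
        node = node.children[bisect_right(node.keys, key)]
    return node


def search(root, key):
    leaf = find_leaf(root, key)
    for k, v in zip(leaf.keys, leaf.values):
        if k == key:
            return v
    return None


def range_query(root, lo, hi):
    # Find the first leaf, then follow the leaf links for in-order traversal.
    leaf, out = find_leaf(root, lo), []
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k > hi:
                return out
            if k >= lo:
                out.append((k, v))
        leaf = leaf.next
    return out


def build_example():
    """Hand-built tree over keys 1-7 (illustrative layout and values)."""
    a = Leaf([1, 2], ["d1", "d2"])
    b = Leaf([3, 4], ["d3", "d4"])
    c = Leaf([5, 6, 7], ["d5", "d6", "d7"])
    a.next, b.next = b, c
    return Inner([3, 5], [a, b, c])


def batch_search(root, keys):
    # Each query is independent of the others, so a batch like this is the
    # kind of work that could be spread across many parallel lanes.
    return [search(root, k) for k in keys]
```

Lookups only read the tree, which is why a batch of queries parallelizes naturally, while updates that restructure nodes are easier to keep sequential.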
  • 31. SUFFIX ARRAYS: Suffix Arrays are used to accelerate in-memory cloud workloads (full text index search; lossless data compression; bio-informatics). Suffix Arrays are a fundamental data structure: designed for efficient searching of a large text; quickly locate every occurrence of a substring S in a text T. Suffix Array Construction contains parallel & sequential steps: sorting, ideal for GPU; nested recursion, ideal for CPU.
  • 32. ACCELERATED SUFFIX ARRAY CONSTRUCTION ON HSA
      - By efficiently sharing data between CPU and GPU, HSA lets us move compute to data without the penalty of intermediate copies.
      - By offloading data parallel computations to GPU, HSA increases performance and reduces energy for Suffix Array Construction versus single threaded execution.
      - Skew Algorithm for Compute SA: Radix Sort::GPU; Lexical Rank::CPU; Compute SA::CPU; Radix Sort::GPU; Merge Sort::GPU.
      - Result: +5.8x increased performance; -5x decreased energy.
      M. Deo, “Parallel Suffix Array Construction and Least Common Prefix for the GPU”, submitted to Principles and Practice of Parallel Programming (PPoPP’13), February 2013.
      Test system: AMD A10 4600M APU with Radeon™ HD Graphics; CPU: 4 cores @ 2.3 GHz (turbo 3.2 GHz); GPU: AMD Radeon HD 7660G, 6 compute units, 685MHz; 4GB RAM.
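The sketch below shows what a suffix array is and how it locates every occurrence of a substring S in a text T, as slide 31 describes. It builds the array by directly sorting suffixes, which is a deliberately naive substitute for the skew algorithm named on slide 32, and the function names are illustrative only.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of `text`, in sorted order.

    Naive construction by comparing whole suffixes; fine for small inputs,
    but not the skew algorithm referred to on the slide.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return all start positions of `pattern` in `text` via binary search."""
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


if __name__ == "__main__":
    t = "banana"
    sa = build_suffix_array(t)
    print(sa, find_occurrences(t, sa, "ana"))
```

All suffixes that start with the pattern sit next to each other in the sorted order, so two binary searches are enough to find the whole block of matches.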
  • 34. VASCULATURE IMAGE ENHANCEMENT: Automatic image enhancement reduces the expertise required to analyze the image and reduces time to diagnosis. Pipeline from input image to output image: GPU::Gaussian Convolution; GPU::Hessian Matrix Compute (intermediate copied to CPU); GPU::Eigen Decomposition (intermediate copied to CPU); GPU::Vascular Network Compute (intermediate copied to CPU); CPU::Analysis.
  • 35. IMPROVED PERFORMANCE AND ENERGY: HSA increases performance and reduces energy versus single threaded execution by offloading data parallel compute to GPU. Result: +14x increased performance; -14x decreased energy. Test system: AMD A10 4600M APU with Radeon™ HD Graphics; CPU: 4 cores @ 2.3 GHz (turbo 3.2 GHz); GPU: AMD Radeon HD 7660G, 6 compute units, 685MHz; 4GB RAM; Windows 7 (64-bit); OpenCL™ 1.1 (873.1).
  • 36. EASE OF PROGRAMMING: CODE COMPLEXITY VS. PERFORMANCE
  • 37. LINES-OF-CODE AND PERFORMANCE FOR DIFFERENT PROGRAMMING MODELS (Exemplary ISV “Hessian” Kernel) [Chart: stacked lines of code (LOC, 0 to 350) and performance (0 to 35) for Serial CPU, TBB, Intrinsics+TBB, OpenCL™-C, OpenCL™-C++, C++ AMP and HSA Bolt; LOC categories: Init, Compile, Copy, Launch, Algorithm, Copy-back] Test system: AMD A10-5800K APU with Radeon™ HD Graphics; CPU: 4 cores, 3800MHz (4200MHz Turbo); GPU: AMD Radeon HD 7660D, 6 compute units, 800MHz; 4GB RAM. Software: Windows 7 Professional SP1 (64-bit OS); AMD OpenCL™ 1.2 AMD-APP (937.2); Microsoft Visual Studio 11 Beta.
  • 38. THE HSA FUTURE: Highly productive programmers + Scalable performance + Power efficiency = AMAZING USER EXPERIENCES
  • 39. THE HSA FOUNDATION: Founders; Promoters; Supporters; Contributors; Associates.
  • 40. THANK YOU! www.hsafoundation.com