Exploiting Multicores to Optimize Business Process Execution
Daniele Bonetta, Achille Peternier, Cesare Pautasso, Walter Binder
Faculty of Informatics, University of Lugano - USI, Switzerland
http://sosoa.inf.usi.ch
daniele.bonetta@usi.ch
Tuesday, December 14, 2010
BP Execution Engine
Focus: Business Process Runtime Execution Environment
[Diagram: a client calls a composite Web service hosted by the BP execution engine, which in turn invokes several Web services.]




How to scale?
[Diagram, built up over several slides: a growing crowd of clients calls the composite Web service hosted by the BP execution engine, which invokes several Web services.]
More clients == More BP Instances
More clients <= More BP Instances
Multicores
[Figure: IBM Power7 processor with its cores labeled.]
Outline
1. Multicore Issues
2. JOpera Business Process Execution Engine
   1. Thread Level Parallelism
   2. CPU/Core Level Parallelism
3. Experimental Results
4. Conclusion

Multicore Issues
• Number of cores
• Type of cores (e.g. SMT)
• On-chip caching layout (e.g. L2, L3, ...)
• On-board memory layout (e.g. NUMA, ccNUMA, ...)

What each issue affects:
• Number and type of cores: thread migrations, context switches
• Cache and memory layout: data locality, contention

BP Execution Engine
Java Business Process Execution Engine




3 Layers Approach
Abstraction layers, top to bottom:
1. Concurrent Business Process Instances
2. OS Threads
3. Hardware Cores



Engine Architecture
Components: Request Handler, Kernel, Invoker
Queues: Request Queue, Execution Queue
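To make the architecture more concrete, here is a minimal sketch of two thread pools connected by two queues. The slide only names the components; the data flow used here (request handler to request queue to kernel threads to execution queue to invoker threads) is an assumption, and `run_engine`, the `STOP` sentinel, and the `done:` result strings are hypothetical names for illustration, not JOpera code.

```python
import queue
import threading

STOP = object()  # hypothetical sentinel that tells a worker thread to exit


def run_engine(requests, kernel_threads=2, invoker_threads=2):
    """Push requests through a kernel pool and an invoker pool (assumed flow)."""
    request_queue = queue.Queue()
    execution_queue = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def kernel():
        # Kernel threads turn each incoming request into a task to invoke.
        while True:
            req = request_queue.get()
            if req is STOP:
                break
            execution_queue.put(("invoke", req))

    def invoker():
        # Invoker threads perform the (here simulated) service invocation.
        while True:
            task = execution_queue.get()
            if task is STOP:
                break
            _, req = task
            with results_lock:
                results.append(f"done:{req}")

    kernels = [threading.Thread(target=kernel) for _ in range(kernel_threads)]
    invokers = [threading.Thread(target=invoker) for _ in range(invoker_threads)]
    for t in kernels + invokers:
        t.start()

    # Request handler role: enqueue one request per process instance.
    for req in requests:
        request_queue.put(req)

    # Stop the kernels first, so no task is produced after the invokers exit.
    for _ in kernels:
        request_queue.put(STOP)
    for t in kernels:
        t.join()
    for _ in invokers:
        execution_queue.put(STOP)
    for t in invokers:
        t.join()
    return sorted(results)
```

The pool sizes are parameters because the later slides treat the number of threads per pool as the first tuning knob.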

BP Execution
[Animation over several slides: a business process instance is handed between the Request Handler, Kernel, and Invoker through the Request Queue and the Execution Queue.]



Deployment on Multicores
[Diagram: the Request Handler, Kernel, and Invoker run as parallel threads on top of two groups of six cores.]
How?



OverHPC Library
Software stack, top to bottom:
  JOpera Engine (Java)
  OverHPC (JNI, C, Java)
  libpfm
  Linux Kernel
  Multicore Hardware

OverHPC serves two purposes:
1. Control and change per-thread scheduling
2. Measure low-level thread performance data




OverHPC Library API
1) Control and change per-thread scheduling
Thread-Core Dynamic Affinity Binding:
  getThreadPID()
  getThreadAffinity()
  setThreadAffinity()
  getAffinityInfo()
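The slides list the OverHPC affinity calls by name only, without signatures. As a rough analogue, the sketch below uses the Linux scheduling calls exposed by Python's standard library (`threading.get_native_id`, `os.sched_getaffinity`, `os.sched_setaffinity`) to read and change the affinity of one thread. It is not the OverHPC API; `pin_current_thread` and `demo` are hypothetical helper names, and the code assumes Linux.

```python
import os
import threading


def pin_current_thread(cores):
    """Restrict the calling thread to the given core ids (Linux only).

    Returns (native thread id, affinity before, affinity after).
    """
    tid = threading.get_native_id()       # kernel-level thread id
    before = os.sched_getaffinity(tid)    # cores the thread may run on now
    os.sched_setaffinity(tid, cores)      # change the affinity mask
    return tid, before, os.sched_getaffinity(tid)


def demo():
    """Pin a worker thread to the lowest-numbered core it is allowed to use."""
    if not hasattr(os, "sched_setaffinity"):
        return None  # platform without the Linux affinity calls
    outcome = {}

    def worker():
        target = {min(os.sched_getaffinity(0))}
        outcome["target"] = target
        tid, before, after = pin_current_thread(target)
        outcome.update(tid=tid, before=before, after=after)

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return outcome
```

Pinning happens inside the worker, so only that thread's mask changes; the main thread keeps its original affinity.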




OverHPC Library API
2) Measure low-level thread performance
Hardware Performance Counters:
  getEventsFromCache()
  getEventsFromThread()
  bindEventsToCore()
  bindEventsToThread()




Evaluation
Four benchmark process structures:
• DAG (<flow>): tasks A, B, C, D
• Parallel (<flow>): tasks A, B, C, D
• Sequential (<sequence>): tasks A, B, C, D
• Loop (<while>): Test and Inc steps




Hardware Setup
2x CPUs, each with 6 cores, 3 cache levels, 1 last level cache
Per CPU: cores C1 to C6, each with its own L1 and L2 cache, all sharing one L3 cache




Experimental Setup
  Concurrent Business Process Instances   Up to 30’000
  OS Threads                              k
  Hardware Cores                          12


Thread-level Parallelism
How many threads?
Is it enough to just increase the number of threads in the pools as the number of instances grows?



Thread-level Parallelism
Just increasing the number of threads...
[Figure: throughput (instances/sec, axis 200 to 1800) versus number of threads per pool (0 to 140) for the ForEach, Sequential, Parallel, and Loop workloads.]
Experimental Setup
  Concurrent Business Process Instances   Up to 30’000
  OS Threads                              24
  Hardware Cores                          12
2 thread pools: Kernel, Invoker


CPU Affinity Binding
• Policy 1: Default. Unconstrained scheduling of threads by the OS
• Policy 2: per CPU. Constrain each thread pool within a CPU
• Policy 3: per Core. Policy 2 + constrain each thread to a specific core
• Policy 4: Interleaved. Mix thread pools across CPUs
[Diagram for each policy: two CPUs, each with cores C1 to C6, per-core L1 and L2 caches, and one shared L3 cache.]
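The four policies can be written down as a mapping from each pool thread to the set of cores it may run on. The sketch below is one possible reading, not the authors' implementation: it assumes cores 0 to 5 belong to the first CPU and 6 to 11 to the second, 12 threads per pool (24 threads over 2 pools), and that "interleaved" alternates CPUs thread by thread. `affinity_plan` and the policy strings are hypothetical names.

```python
def affinity_plan(policy, cpus=2, cores_per_cpu=6, threads_per_pool=12,
                  pools=("kernel", "invoker")):
    """Return {(pool, thread_index): set of allowed core ids} for a policy."""
    all_cores = set(range(cpus * cores_per_cpu))
    plan = {}
    for p, pool in enumerate(pools):
        home_cpu = p % cpus  # assumed: pool p lives on CPU p
        home_cores = list(range(home_cpu * cores_per_cpu,
                                (home_cpu + 1) * cores_per_cpu))
        for i in range(threads_per_pool):
            if policy == "default":
                # Policy 1: the OS may place the thread anywhere.
                allowed = set(all_cores)
            elif policy == "per_cpu":
                # Policy 2: the whole pool stays on its home CPU.
                allowed = set(home_cores)
            elif policy == "per_core":
                # Policy 3: home CPU, and one fixed core per thread.
                allowed = {home_cores[i % cores_per_cpu]}
            elif policy == "interleaved":
                # Policy 4 (assumed reading): threads of a pool alternate CPUs.
                cpu = (p + i) % cpus
                allowed = set(range(cpu * cores_per_cpu,
                                    (cpu + 1) * cores_per_cpu))
            else:
                raise ValueError(f"unknown policy: {policy}")
            plan[(pool, i)] = allowed
    return plan
```

Each resulting set could then be applied to a thread with an affinity call such as `setThreadAffinity()`; real core numbering depends on the machine and should be read from the OS rather than assumed.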






Performance Layers
  Concurrent Business Process Instances   5’000 - 30’000   Throughput, Walltime, ...
  OS Threads                              24               Thread migrations, context switches, ...
  Hardware Cores                          12               Hardware Performance Counters: cache misses, ...


Experimental Results
Relative Speedup with 30k instances
[Figure: relative speedup (axis 0.7 to 1.3) of the Default, Per CPU, Per core, and Interleaved policies for the DAG, Parallel, Sequential, and Loop workloads and their geometric mean; 30000 instances.]
2 x AMD Barcelona 6-core processors with 2 LLC
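The charts report a relative value per workload plus a "Geomean" column. A minimal sketch of that arithmetic follows, assuming relative speedup means a policy's throughput divided by the Default policy's throughput for the same workload (the slides do not spell out the definition). The function names are hypothetical and any numbers passed in are the caller's, not measurements from the talk.

```python
import math


def relative_speedup(throughput, baseline="Default"):
    """throughput: {policy: {workload: instances_per_sec}}.

    Returns the same structure with every value divided by the baseline
    policy's value for that workload.
    """
    base = throughput[baseline]
    return {policy: {w: v / base[w] for w, v in per_workload.items()}
            for policy, per_workload in throughput.items()}


def geomean(values):
    """Geometric mean of positive numbers, computed through logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))
```

The geometric mean is the usual way to summarize ratios: a 2x gain on one workload and a 2x loss on another average out to exactly 1.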
HPC-Based Validation
• Ineffective SW prefetches: a prefetch request for a memory address that is already in the cache
• L3 cache evictions: the data that needs to be stored in the cache is larger than the available free space




Experimental Results
Ineffective SW prefetches
[Figure: relative ineffective SW prefetches (axis 0.7 to 1.2) per policy (Default, Per CPU, Per core, Interleaved) for the DAG, Parallel, Sequential, and Loop workloads and their geometric mean; 30000 instances.]

Experimental Results
L3 Cache evictions
[Figure: relative L3 evictions (axis 0.4 to 2), same policies and workloads; 30000 instances.]
Experimental Results
Relative Speedup with 10k instances
[Figure: relative speedup (axis 0.7 to 1.2) per policy (Default, Per CPU, Per core, Interleaved) for the DAG, Parallel, Sequential, and Loop workloads and their geometric mean; 10000 instances.]

Experimental Results
Ineffective SW prefetches
[Figure: relative ineffective SW prefetches (axis 0.7 to 1.2), same policies and workloads; 10000 instances.]

Experimental Results
L3 Cache evictions
[Figure: relative L3 evictions (axis 0.4 to 1.8), same policies and workloads; 10000 instances.]
Experimental Results
Relative Speedup with 5k instances
[Figure: relative speedup (axis 0.7 to 1.3) per policy (Default, Per CPU, Per core, Interleaved) for the DAG, Parallel, Sequential, and Loop workloads and their geometric mean; 5000 instances.]

Experimental Results
Ineffective SW prefetches
[Figure: relative ineffective SW prefetches (axis 0.7 to 1.2), same policies and workloads; 5000 instances.]

Experimental Results
L3 Cache evictions
[Figure: relative L3 evictions (axis 0.4 to 2), same policies and workloads; 5000 instances.]
Experimental Results
Correlation Coefficients (Hardware events - JOpera throughput)

  Workload Size            Ineffective     L3 Cache
  (Number of Instances)    SW Prefetches   Evictions
  5000                     0.9842          0.9456
  10000                    0.9125          0.9883
  30000                    0.9661          0.9946
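The slides do not say which correlation coefficient was used. Assuming it is the Pearson coefficient between a hardware event count and the measured throughput across runs, the computation can be sketched as below; `pearson` is a hypothetical helper and the inputs are whatever paired samples the caller provides.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and standard deviations, all without the 1/n factor,
    # which cancels out in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (dev_x * dev_y)
```

Values close to 1, like those in the table, indicate a nearly linear relation between the two series under this assumption.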




Conclusion
• Multicore machines offer powerful hardware parallelism, but what matters is not just the number of processing elements (PEs)
• Performance depends on how a limited number of threads is mapped to the hardware
• Multicore-aware thread scheduling significantly impacts performance (up to 10% speedup)


Thank you!
OverHPC Library: http://sosoa.inf.usi.ch
JOpera business process execution engine: http://www.jopera.org
Twitter: @jopera_org
Contact: daniele.bonetta@usi.ch


Tuesday, December 14, 2010

Weitere ähnliche Inhalte

Was ist angesagt?

Rawsthorne Dan - from usability studies to stories
Rawsthorne Dan -  from usability studies to storiesRawsthorne Dan -  from usability studies to stories
Rawsthorne Dan - from usability studies to storiesMagneta AI
 
ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...
ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...
ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...HSA Foundation
 
CRM_ AIS Case Study
CRM_ AIS Case StudyCRM_ AIS Case Study
CRM_ AIS Case StudyLita Pena
 
Multinational Corporations - Banking Landscape
Multinational Corporations - Banking LandscapeMultinational Corporations - Banking Landscape
Multinational Corporations - Banking LandscapeGherda Stephens
 

Was ist angesagt? (7)

Rawsthorne Dan - from usability studies to stories
Rawsthorne Dan -  from usability studies to storiesRawsthorne Dan -  from usability studies to stories
Rawsthorne Dan - from usability studies to stories
 
ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...
ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...
ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...
 
CRM_ AIS Case Study
CRM_ AIS Case StudyCRM_ AIS Case Study
CRM_ AIS Case Study
 
Wpf Tech Overview2009
Wpf Tech Overview2009Wpf Tech Overview2009
Wpf Tech Overview2009
 
Multinational Corporations - Banking Landscape
Multinational Corporations - Banking LandscapeMultinational Corporations - Banking Landscape
Multinational Corporations - Banking Landscape
 
Gu3112991305
Gu3112991305Gu3112991305
Gu3112991305
 
Lifetime Case Study
Lifetime Case StudyLifetime Case Study
Lifetime Case Study
 

Exploiting Multicores to Optimize Business Process Execution

  • 1. Exploiting Multicores to Optimize Business Process Execution Daniele Bonetta, Achille Peternier, Cesare Pautasso, Walter Binder Faculty of Informatics University of Lugano - USI Switzerland http://sosoa.inf.usi.ch daniele.bonetta@usi.ch Tuesday, December 14, 2010
  • 2. BP Execution Engine. Focus: Business Process Runtime Execution Environment. [Diagram: a client invokes the composite Web service (BP execution engine), which calls several Web services]
  • 3. How to scale? [Diagram: many clients invoke the composite Web service (BP execution engine), which calls several Web services]
  • 4. How to scale? [Diagram: the same setup with many more clients]
  • 5. How to scale? More clients == More BP Instances [Diagram: many clients invoke the composite service (composition engine), which calls several Web services]
  • 6. How to scale? More clients <= More BP Instances [Diagram: many clients invoke the composite service (composition engine), which calls several Web services]
  • 7. Multicores [Diagram: many cores on one chip; example: IBM Power7]
  • 8. Outline
      1. Multicore Issues
      2. JOpera Business Process Execution Engine
          1. Thread Level Parallelism
          2. CPU/Core Level Parallelism
      3. Experimental Results
      4. Conclusion
  • 9. Multicore Issues: number of cores; type of cores (e.g. SMT); on-chip caching layout (e.g. L2, L3...); on-board memory layout (e.g. NUMA, NUMA-CC, ...)
  • 10. Multicore Issues: the number and type of cores affect thread migrations and context switches. Also listed: cache layout, memory layout.
  • 11. Multicore Issues: the number and type of cores affect thread migrations and context switches; the cache and memory layout affect data locality and contention.
  • 12. BP Execution Engine: Java Business Process Execution Engine
  • 13. 3 Layers Approach: Concurrent Business Process Instances; OS Threads; Hardware Cores
  • 14. Abstraction Layers: Concurrent Business Process Instances; OS Threads; Hardware Cores
  • 15. Engine Architecture: Request Handler, Kernel, Invoker; Request Queue, Execution Queue
  • 16. Abstraction Layers: Concurrent Business Process Instances; OS Threads; Hardware Cores
  • 17. Engine Architecture: Request Handler, Kernel, Invoker; Request Queue, Execution Queue
  • 18. BP Execution: Request Handler, Kernel, Invoker; Request Queue, Execution Queue
  • 19. BP Execution: Request Handler, Kernel, Invoker; Request Queue, Execution Queue
  • 20. BP Execution: Request Handler, Kernel, Invoker; Request Queue, Execution Queue
  • 21. BP Execution: Request Handler, Kernel, Invoker; Request Queue, Execution Queue
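The engine slides name three components and two queues but do not spell out how they are wired. The following is a minimal sketch, assuming the request handler fills the request queue, kernel threads drain it and post work on the execution queue, and invoker threads drain the execution queue. The function `run_engine`, the thread counts, and the `done:` result format are hypothetical and not taken from JOpera.

```python
import queue
import threading


def run_engine(requests, kernel_threads=2, invoker_threads=2):
    """Pass requests through request queue -> kernel -> execution queue -> invoker."""
    request_q = queue.Queue()    # filled by the request handler (assumed wiring)
    execution_q = queue.Queue()  # filled by kernel threads (assumed wiring)
    results = []
    lock = threading.Lock()

    def kernel():
        while True:
            item = request_q.get()
            if item is None:      # sentinel: no more requests for this thread
                break
            execution_q.put(item)  # hand the task over for invocation

    def invoker():
        while True:
            item = execution_q.get()
            if item is None:      # sentinel: no more tasks for this thread
                break
            with lock:
                results.append(f"done:{item}")  # stand-in for a service call

    kernels = [threading.Thread(target=kernel) for _ in range(kernel_threads)]
    invokers = [threading.Thread(target=invoker) for _ in range(invoker_threads)]
    for t in kernels + invokers:
        t.start()

    for r in requests:            # this loop plays the request handler role
        request_q.put(r)
    for _ in kernels:             # one sentinel per kernel thread
        request_q.put(None)
    for t in kernels:
        t.join()
    for _ in invokers:            # one sentinel per invoker thread
        execution_q.put(None)
    for t in invokers:
        t.join()
    return sorted(results)
```

Keeping the two pools separate is what later allows each pool to be bound to its own set of cores.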
  • 22. Abstraction Layers: Concurrent Business Process Instances; OS Threads; Hardware Cores
  • 23. Deployment on Multicores: Kernel, Invoker, and Request Handler run as parallel threads on the hardware cores
  • 24. Deployment on Multicores: Kernel, Invoker, and Request Handler run as parallel threads on the hardware cores. How?
  • 25. OverHPC Library. Software stack, top to bottom: JOpera Engine (Java); OverHPC (JNI, C, Java); libpfm; Linux Kernel; Multicore Hardware
  • 26. OverHPC Library. Software stack, top to bottom: JOpera Engine (Java); OverHPC (JNI, C, Java); libpfm; Linux Kernel; Multicore Hardware. OverHPC is used to: 1. control and change per-thread scheduling; 2. measure low-level thread performance data
  • 27. OverHPC Library API. 1) Control and change per-thread scheduling (Thread-Core Dynamic Affinity Binding): getThreadPID(), getThreadAffinity(), setThreadAffinity(), getAffinityInfo()
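The slide lists the OverHPC affinity calls without their signatures, so the sketch below does not use them. It only illustrates the underlying Linux mechanism through Python's standard `os.sched_getaffinity` and `os.sched_setaffinity`, which are available on Linux only. The helper name `pin_to_one_core` is hypothetical.

```python
import os


def pin_to_one_core():
    """Pin the caller to one allowed core, read the mask back, then restore it.

    Returns (original_mask, pinned_mask), or None where affinity control
    is unavailable or not permitted.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None                      # not a Linux-like platform
    original = os.sched_getaffinity(0)   # 0 refers to the caller
    target = {min(original)}             # pick the lowest allowed core
    try:
        os.sched_setaffinity(0, target)
        pinned = os.sched_getaffinity(0)
        os.sched_setaffinity(0, original)  # restore the previous mask
    except OSError:
        return None
    return original, pinned
```

A binding like this is dynamic: the mask can be changed again at any time while the thread runs, which is the property the slide title refers to.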
  • 28. OverHPC Library API. 2) Measure low-level thread performance (Hardware Performance Counters): getEventsFromCache(), getEventsFromThread(), bindEventsToCore(), bindEventsToThread()
  • 29. Evaluation. Four process structures over tasks A, B, C, D: DAG (<flow>), Parallel (<flow>), Sequential (<sequence>), Loop (<while>, with Inc and Test steps)
  • 30. Hardware Setup: 6 cores, 3 cache levels, 1 last level cache. [Diagram, 2x: cores C1 to C6, each with its own L1 and L2, sharing one L3 cache]
  • 31. Experimental Setup. Concurrent Business Process Instances: up to 30’000; OS Threads: k; Hardware Cores: 12
  • 32. Thread-level Parallelism. How many threads? Is it enough to increase the number of concurrent threads in the pools as the number of instances grows?
  • 33. Thread-level Parallelism. Just increasing the number of threads... [Plot: throughput (instances/sec, 0 to 1800) versus number of threads per pool (20 to 140); series: ForEach, Sequential, Parallel, Loop]
  • 34. Experimental Setup Concurrent Business Process Instances Up to 30’000 OS Threads 24 Hardware Cores 12 Tuesday, December 14, 2010
  • 35. Experimental Setup 6 cores, 3 cache levels, 1 last level cache L3 Cache L2 L2 L2 L2 L2 L2 2x L1 L1 L1 L1 L1 L1 C1 C2 C3 C4 C5 C6 2 Thread pools: Kernel Invoker Tuesday, December 14, 2010
  • 36. CPU Affinity Binding, Policy 1: Default. Unconstrained scheduling of threads by the OS [Diagram: 2 CPUs, each with cores C1-C6, per-core L1 and L2 caches, and one shared L3 cache]
  • 37. CPU Affinity Binding, Policy 2: per CPU. Constrain each thread pool within a CPU [Diagram: same 2-CPU topology]
  • 38. CPU Affinity Binding, Policy 3: per Core. Policy 2 + constrain each thread to a specific core [Diagram: same 2-CPU topology]
  • 39. CPU Affinity Binding, Policy 4: Interleaved. Mix thread pools across CPUs [Diagram: same 2-CPU topology]
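The four binding policies can be read as functions from (thread pool, thread index) to a set of allowed cores. The sketch below is one possible reading, not the engine's implementation: the core numbering (0-5 on the first CPU, 6-11 on the second), the assignment of the kernel pool to the first CPU, and the alternating scheme used for "interleaved" are all assumptions made for illustration.

```python
# Assumed topology: 2 CPUs x 6 cores; the core IDs are hypothetical.
CPUS = [[0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10, 11]]
POOLS = ["kernel", "invoker"]  # the two thread pools named on slide 35


def affinity(policy, pool, thread_index):
    """Return the set of core IDs a thread may run on under a policy."""
    p = POOLS.index(pool)
    if policy == "default":
        # Policy 1: no constraint, the OS may use every core.
        return {core for cpu in CPUS for core in cpu}
    if policy == "per_cpu":
        # Policy 2: the whole pool stays within one CPU.
        return set(CPUS[p])
    if policy == "per_core":
        # Policy 3: policy 2, plus each thread pinned to a single core.
        cores = CPUS[p]
        return {cores[thread_index % len(cores)]}
    if policy == "interleaved":
        # Policy 4 (assumed reading): threads of a pool alternate CPUs,
        # so both pools are spread across both CPUs.
        return set(CPUS[(p + thread_index) % len(CPUS)])
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    for policy in ("default", "per_cpu", "per_core", "interleaved"):
        print(policy, sorted(affinity(policy, "invoker", 1)))
```

The sketch only computes the masks. On Linux, a mask of this kind could be applied with `os.sched_setaffinity`; how the engine itself binds threads is not shown on the slides.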
  • 40. Experimental Setup: Concurrent Business Process Instances: up to 30’000; OS Threads: 24; Hardware Cores: 12
  • 41. Performance Layers: Concurrent Business Process Instances (5’000 - 30’000): Throughput, Walltime, ...; OS Threads (24): Thread Migrations, Context switches, ...; Hardware Cores (12): Hardware Performance Counters (Cache misses, ...)
  • 42. Experimental Results: Relative Speedup with 30k instances [Bar chart, 30000 instances: relative speedup (y-axis 0.7 to 1.3) for DAG, Parallel, Sequential, Loop, and Geomean; series: Default, Per CPU, Per core, Interleaved; 2 x AMD Barcelona 6-core processors with 2 LLC]
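The charts report speedup relative to a baseline and summarize the four workloads with a geometric mean. A minimal sketch of that arithmetic follows, assuming the Default policy is the baseline (the slides do not state this explicitly); the throughput numbers are invented for illustration and are not the measured results.

```python
from statistics import geometric_mean


def relative_speedup(throughput, baseline="default"):
    """Normalize per-policy throughput to the baseline policy."""
    base = throughput[baseline]
    return {policy: value / base for policy, value in throughput.items()}


def geomean_speedup(per_workload, policy, baseline="default"):
    """Geometric mean of one policy's relative speedup across workloads."""
    speedups = [
        relative_speedup(throughput, baseline)[policy]
        for throughput in per_workload.values()
    ]
    return geometric_mean(speedups)


if __name__ == "__main__":
    # Invented throughput values (instances/sec), NOT the slide data.
    measured = {
        "DAG": {"default": 1000.0, "per_cpu": 1080.0},
        "Parallel": {"default": 900.0, "per_cpu": 950.0},
        "Sequential": {"default": 1200.0, "per_cpu": 1150.0},
        "Loop": {"default": 700.0, "per_cpu": 760.0},
    }
    print(round(geomean_speedup(measured, "per_cpu"), 3))
```

The geometric mean is the usual choice for averaging ratios, because a speedup of 2 on one workload and 0.5 on another then averages to 1 rather than 1.25.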
  • 43. HPC-Based Validation: Ineffective SW prefetches: a prefetch request for a memory address that is already in the cache. L3 cache evictions: the data to be stored in the cache is larger than the available free space.
  • 44. Experimental Results: Ineffective SW prefetches [Bar chart, 30000 instances: relative ineffective SW prefetches (y-axis 0.7 to 1.2) for DAG, Parallel, Sequential, Loop, and Geomean; series: Default, Per CPU, Per core, Interleaved; 2 x AMD Barcelona 6-core processors with 2 LLC]
  • 45. Experimental Results: L3 Cache evictions [Bar chart, 30000 instances: relative L3 evictions (y-axis 0.4 to 2) for DAG, Parallel, Sequential, Loop, and Geomean; series: Default, Per CPU, Per core, Interleaved; 2 x AMD Barcelona 6-core processors with 2 LLC]
  • 46. Experimental Results: Relative Speedup with 10k instances [Bar chart, 10000 instances: relative speedup (y-axis 0.7 to 1.2) for DAG, Parallel, Sequential, Loop, and Geomean; series: Default, Per CPU, Per core, Interleaved; 2 x AMD Barcelona 6-core processors with 2 LLC]
  • 47. Experimental Results: Ineffective SW prefetches [Bar chart, 10000 instances: relative ineffective SW prefetches (y-axis 0.7 to 1.2) for DAG, Parallel, Sequential, Loop, and Geomean; series: Default, Per CPU, Per core, Interleaved; 2 x AMD Barcelona 6-core processors with 2 LLC]
  • 48. Experimental Results: L3 Cache evictions [Bar chart, 10000 instances: relative L3 evictions (y-axis 0.4 to 1.8) for DAG, Parallel, Sequential, Loop, and Geomean; series: Default, Per CPU, Per core, Interleaved; 2 x AMD Barcelona 6-core processors with 2 LLC]
  • 49. Experimental Results: Relative Speedup with 5k instances [Bar chart, 5000 instances: relative speedup (y-axis 0.7 to 1.3) for DAG, Parallel, Sequential, Loop, and Geomean; series: Default, Per CPU, Per core, Interleaved; 2 x AMD Barcelona 6-core processors with 2 LLC]
  • 50. Experimental Results: Ineffective SW prefetches [Bar chart, 5000 instances: relative ineffective SW prefetches (y-axis 0.7 to 1.2) for DAG, Parallel, Sequential, Loop, and Geomean; series: Default, Per CPU, Per core, Interleaved; 2 x AMD Barcelona 6-core processors with 2 LLC]
  • 51. Experimental Results: L3 Cache evictions [Bar chart, 5000 instances: relative L3 evictions (y-axis 0.4 to 2) for DAG, Parallel, Sequential, Loop, and Geomean; series: Default, Per CPU, Per core, Interleaved; 2 x AMD Barcelona 6-core processors with 2 LLC]
  • 52. Experimental Results: Correlation Coefficients (Hardware events - JOpera throughput), by workload size (number of instances): 5000: Ineffective SW Pref 0.9842, L3 Cache Evictions 0.9456; 10000: Ineffective SW Pref 0.9125, L3 Cache Evictions 0.9883; 30000: Ineffective SW Pref 0.9661, L3 Cache Evictions 0.9946
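The slide does not say which correlation coefficient was used. Assuming it is the Pearson coefficient, the computation between a series of hardware event counts and a series of throughput values could be sketched as follows; the sample series are invented and are not the measured data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of co-deviations and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented per-policy values, NOT the slide data.
    event_metric = [1.00, 0.92, 0.85, 1.10]
    throughput_metric = [1.00, 0.95, 0.88, 1.07]
    print(round(pearson(event_metric, throughput_metric), 4))
```

A coefficient close to 1, as in the reported values, indicates that the two series move together almost linearly; it does not by itself establish which one drives the other.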
  • 53. Conclusion • Multicore machines offer powerful hardware parallelism, but what matters is not just the number of processing elements (PEs) • Performance depends on how a limited number of threads is mapped to the hardware • Multicore-aware thread scheduling significantly impacts performance (up to 10% speedup)
  • 54. Thank you! OverHPC Library: http://sosoa.inf.usi.ch; JOpera business process execution engine: http://www.jopera.org; Twitter: @jopera_org; contact: daniele.bonetta@usi.ch