SlideShare ist ein Scribd-Unternehmen logo
1 von 20
Downloaden Sie, um offline zu lesen
© 2012 IBM Corporation
Towards Approximate Computing:
Programming with Relaxed Synchronization
Lakshminarayanan Renganarayana
Vijayalakshmi Srinivasan
Ravi Nair (presenting)
Dan Prener
IBM T.J. Watson Research Center
October 21, 2012
© 2012 IBM
CorporationRN 10/21/12 2
Emerging Use of Computing
  Emerging important applications
  Search
  Inexact answers to inexact queries
  Proactive delivery of information
  Advertisements, offers, news, alerts, advise
  Media
  Approximate rendition of audio, video, or images
  Stream Processing
  Fast decisions based on possibly incomplete data
  These applications benefit in cost-performance through
the use of prediction techniques
  They allow trading off accuracy of results for speed of
delivering results
© 2012 IBM
CorporationRN 10/21/12 3
Approximate Computing
  Current techniques
  Exact (hence energy-inefficient) computation done on exact
(hence expensive) hardware to produce results that need not be
exact
  With increasing unreliability and variability of devices in new
generations of technology, the cost of such computation will
increase dramatically
  What is needed: Approximate Computing
  Computational models that allow computational results to fall in
some acceptable range, and
  Hardware that produces results that fall in some acceptable
range
© 2012 IBM
CorporationRN 10/21/12 4
Hardware
Exact Less Exact
Accurate
Less Accurate, less
up-to-date, possibly
corrupted
Reliable
Variable
Computation
Data
Computing model
today
Dimensions of Approximate Computing
© 2012 IBM
CorporationRN 10/21/12 5
Hardware
Exact Less Exact
Accurate
Less Accurate, less
up-to-date, possibly
corrupted
Reliable
Variable
Computation
Data
Computing model
today
Future model
Human Brain
Path to
Efficiency
Improving Efficiency through Approximation
© 2012 IBM
CorporationRN 10/21/12 6
Hardware
Exact Less Exact
Accurate
Less Accurate, less
up-to-date, possibly
corrupted
Reliable
Variable
Computation
Data
Computing model
today Relaxed
Synchronization
Human Brain
Path to
Efficiency
Relaxed Synchronization
© 2012 IBM
CorporationRN 10/21/12 7
Relaxing Synchronization: Challenges and Opportunities
  Principal uses of synchronization
  Ensuring that all threads see consistent values of shared
variables (e.g. atomic updates with locks)
  Best candidate for relaxing synchronization
  Ensuring that threads reach various points in their
execution in a predictable manner (e.g. barrier)
  Can relax (depending on context)
  Parallel update of data structures (e.g. linked lists)
  Difficult to relax (can lead to fatal crash of the program)
  Key questions about relaxed synchronization
  Will the program produce results that are acceptable?
  If acceptable, would the results be produced in
considerably less time?
© 2012 IBM
CorporationRN 10/21/12 8
  Partition data into k clusters
Randomly
select
K “initial”
means
Create k clusters by
associating every
data point with the
nearest centroid
Recompute
centroids of k
clusters
Iterate until centroids
do not change
8
  Widely used
  Market segmentation, data mining, computer vision, astronomy
Kmeans Clustering
© 2012 IBM
CorporationRN 10/21/12 9
Kmeans Synchronization Overhead
  Kmeans : 8 clusters with large input set
  OpenMP based parallel code
  Parallel execution on an IBM Power7 machine
0%
20%
40%
60%
80%
100%
2 3 4 5 6 7 8
# of threads
PercentClusteringTime
Synchronization
Compute
Synchronization time as high as 90% for 8 threads
© 2012 IBM
CorporationRN 10/21/12 10
Relaxing Synchronization in Kmeans (NU-MineBench)
  Parallel evaluation of all points moving between clusters
  Synchronization is used to update list of points moving
  Stopping criterion:
} while (delta > threshold && loop++ < 500);
  Number of points moving (delta) < 0.1% of total points (threshold)
  Subject to a maximum of 500 iterations
  This is already an approximation
  What would happen if synchronization were relaxed?
Performance for 8 Clusters
0
10
20
30
40
50
60
2 3 4 5 6 7 8
Number of Threads
ExecutionTime(secs)
0
1
2
3
4
5
6
7
8
9
10
Speedup
Original
Relaxed
Speedup
© 2012 IBM
CorporationRN 10/21/12 11
Performance Improvement
Performance Speedup with Relaxed Synchronization
0
2
4
6
8
10
12
14
4 5 6 7 8 9 10 11 12 13
Number of Clusters
Speedup
2 threads
3 threads
4 threads
5 threads
6 threads
7 threads
8 threads
Significant speedup observed in many combinations.
No combination showed a degradation in performance.
© 2012 IBM
CorporationRN 10/21/12 12
Quality of Solution
  Computed total sum of
distances of all points from
the centroids of clusters
they belong to
  Good indicator of quality of
solution
  Each bar in graph
represents a different
relaxed run
  Non-determinism is a side-
effect of relaxation
Quality of solution is not degraded due to relaxed synchronization
Degradation	
  in	
  Quality	
  of	
  Solution	
  -­‐	
  8	
  clusters,	
  8	
  thds
0.00%
0.05%
0.10%
0.15%
0.20%
0.25%
0.30%
0.35%
1 2 3 4 5 6 7 8 9 10
Run Number
Differencebetween
RelaxedcostandOriginalCost
© 2012 IBM
CorporationRN 10/21/12 13
Graph500
  Breadth-first search (BFS) of a very large scale-free graph
  Representative of certain real-life analytics problems
  E.g. Mail a flyer to all people who are connected directly
and transitively to a certain person or sets of persons
  Experiments performed on version of award-winning IBM
solution
Sample Scale-Free Graphs
© 2012 IBM
CorporationRN 10/21/12 14
Initial Experiment: Relaxing Synchronization
  The parallel examination of nodes on a wavefront
requires an atomic update of a data structure
representing list of visited nodes
  Atomic update is a synchronizing operation
  Question: What would happen if we updated the
data structure with a simple OR, rather than an
atomic-OR operation?
© 2012 IBM
CorporationRN 10/21/12 15
Sample Result
  Each column represents a different relaxed run
  Relaxed results are non-deterministic
  Very few vertices missing
  Results
  Very few vertices with wrong level numbers
  Better than 3x the performance
  In most cases, it was possible to detect all missing vertices by simply
validating the obtained tree
CPUS 8
THREADS 16
SCALE 20
Time taken with accurate sync 0.248 0.248 0.248 0.248 0.248
Time taken with relaxed sync 0.077 0.078 0.078 0.077 0.078
Performance gain with relaxed sync 3.20 3.20 3.20 3.20 3.19
Vertices visited in accurate list 646056 646056 646056 646056 646056
Vertices visited in relaxed list 645985 645991 645990 646002 645993
Vertices common in both 645985 645991 645990 646002 645993
Missing vertices 29 35 34 46 37
Other vertices with wrong level numbers 71 49 66 34 59
© 2012 IBM
CorporationRN 10/21/12 16
Relax-and-Check: A Framework to Exploit Relaxed
Synchronization
  Assume there is a criterion that allows us to estimate the quality of a
solution without adding high cost
  Relax-and-Check
  Modifies an existing program
  Run initially without synchronizations
  Revert to fully synchronized version if quality does not fall within acceptable limits
  Such criteria are often already explicit in programs
  E.g. convergence criteria
  Other “syndromes” can often be identified
© 2012 IBM
CorporationRN 10/21/12 17
Sample Results
  Three benchmarks from
STAMP suite
  Acceptability criteria of results
identical to original
  Significant reduction of run
time due to relaxation of
synchronization requirement
SSCA2
Kmeans (STAMP) Labyrinth
© 2012 IBM
CorporationRN 10/21/12 18
Relax-and-Check Strategy for BFS: A Slight Twist
  Very high correlation
between validation errors
and time to solution
  Possibly because of
redundant visit of vertices
  Also, these occurred
principally when hardware
parameters were not suitable
for problem size
  Use validation errors as
measure of quality
  Validate the solution
produced by approximate
technique
  If number of errors exceeds a
threshold, say 1000, revert to
synchronized solution for
future instances with similar
problem size
Performance vs. validation errors
0
0.5
1
1.5
2
2.5
3
3.5
0 500 1000 1500 2000 2500 3000 3500 4000
Number of validation errors
Performancerelativeto
accuratesolution
scale = 18
scale = 20
scale = 22
scale = 24
Relax-and-Check provides a technique to
exploit the advantages of relaxed
synchronization with a quality safety-net
© 2012 IBM
CorporationRN 10/21/12 19
Applications Suitable for Relax-and-Check
Domain / Applications Examples
Scientific Finite Difference Methods, Successive over-
relaxation, N-body methods
Data Mining, Text Mining, Risk Analysis, Machine
Learning
K-means, Hierarchical clustering, Multinomial
regression, Feature reduction
Numerical optimization, Forecasting, Financial
Modeling, and Mathematical Programming
Non-convex optimization solvers, constrained
optimization, multi-dimensional cubing
EDA VLSI place and route
Graph processing and Social Network Analysis Mesh refinement, Metis style graph partitioning,
Graph traversals
Program analyses in compilers and software
engineering
Iterative data flow analysis
A large majority of applications in these domains choose a good solution from
many correct solutions, and use acceptance criteria such as convergence tests,
fixed-point tests, or boolean acceptability tests
© 2012 IBM
CorporationRN 10/21/12 20
Hardware
Exact Less Exact
Accurate
Less Accurate, less
up-to-date, possibly
corrupted
Reliable
Variable
Computation
Data
Computing model
today
Future model
Human Brain
Path to
Efficiency
Move Towards Attributes of the Human Brain

Weitere ähnliche Inhalte

Ähnlich wie Programming with Relaxed Synchronization

Iwsm2014 performance measurement for cloud computing applications using iso...
Iwsm2014   performance measurement for cloud computing applications using iso...Iwsm2014   performance measurement for cloud computing applications using iso...
Iwsm2014 performance measurement for cloud computing applications using iso...
Nesma
 

Ähnlich wie Programming with Relaxed Synchronization (20)

Mixed Integer Programming: Analyzing 12 Years of Progress
Mixed Integer Programming: Analyzing 12 Years of ProgressMixed Integer Programming: Analyzing 12 Years of Progress
Mixed Integer Programming: Analyzing 12 Years of Progress
 
Using Grid Technologies in the Cloud for High Scalability
Using Grid Technologies in the Cloud for High ScalabilityUsing Grid Technologies in the Cloud for High Scalability
Using Grid Technologies in the Cloud for High Scalability
 
Improving Hardware Efficiency for DNN Applications
Improving Hardware Efficiency for DNN ApplicationsImproving Hardware Efficiency for DNN Applications
Improving Hardware Efficiency for DNN Applications
 
Dependable Systems - Structure-Based Dependabiilty Modeling (6/16)
Dependable Systems - Structure-Based Dependabiilty Modeling (6/16)Dependable Systems - Structure-Based Dependabiilty Modeling (6/16)
Dependable Systems - Structure-Based Dependabiilty Modeling (6/16)
 
Architectural Optimizations for High Performance and Energy Efficient Smith-W...
Architectural Optimizations for High Performance and Energy Efficient Smith-W...Architectural Optimizations for High Performance and Energy Efficient Smith-W...
Architectural Optimizations for High Performance and Energy Efficient Smith-W...
 
13 2017.03.30 freeman 7th pvpmc iec 61853 presentation
13 2017.03.30 freeman 7th pvpmc iec 61853 presentation13 2017.03.30 freeman 7th pvpmc iec 61853 presentation
13 2017.03.30 freeman 7th pvpmc iec 61853 presentation
 
Recent and Planned Improvements to the System Advisor Model
Recent and Planned Improvements to the System Advisor ModelRecent and Planned Improvements to the System Advisor Model
Recent and Planned Improvements to the System Advisor Model
 
Iwsm2014 performance measurement for cloud computing applications using iso...
Iwsm2014   performance measurement for cloud computing applications using iso...Iwsm2014   performance measurement for cloud computing applications using iso...
Iwsm2014 performance measurement for cloud computing applications using iso...
 
Smallsat 2021
Smallsat 2021Smallsat 2021
Smallsat 2021
 
Bosch ConnectedWorld 2017: Striving for Zero DPPM
Bosch ConnectedWorld 2017: Striving for Zero DPPMBosch ConnectedWorld 2017: Striving for Zero DPPM
Bosch ConnectedWorld 2017: Striving for Zero DPPM
 
Reproducible Emulation of Analog Behavioral Models
Reproducible Emulation of Analog Behavioral ModelsReproducible Emulation of Analog Behavioral Models
Reproducible Emulation of Analog Behavioral Models
 
(BDT202) HPC Now Means 'High Personal Computing' | AWS re:Invent 2014
(BDT202) HPC Now Means 'High Personal Computing' | AWS re:Invent 2014(BDT202) HPC Now Means 'High Personal Computing' | AWS re:Invent 2014
(BDT202) HPC Now Means 'High Personal Computing' | AWS re:Invent 2014
 
Ch1
Ch1Ch1
Ch1
 
Ch1
Ch1Ch1
Ch1
 
Advanced Econometrics L13-14.pptx
Advanced Econometrics L13-14.pptxAdvanced Econometrics L13-14.pptx
Advanced Econometrics L13-14.pptx
 
Heuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient searchHeuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient search
 
Umit hw6
Umit hw6Umit hw6
Umit hw6
 
IEEE CLOUD \'11
IEEE CLOUD \'11IEEE CLOUD \'11
IEEE CLOUD \'11
 
Michael Gschwind, Blue Gene/Q: Design for Sustained Multi-Petaflop Computing
Michael Gschwind, Blue Gene/Q: Design for Sustained Multi-Petaflop ComputingMichael Gschwind, Blue Gene/Q: Design for Sustained Multi-Petaflop Computing
Michael Gschwind, Blue Gene/Q: Design for Sustained Multi-Petaflop Computing
 
computer architecture.
computer architecture.computer architecture.
computer architecture.
 

Kürzlich hochgeladen

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Kürzlich hochgeladen (20)

A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 

Programming with Relaxed Synchronization

  • 1. © 2012 IBM Corporation Towards Approximate Computing: Programming with Relaxed Synchronization Lakshminarayanan Renganarayana Vijayalakshmi Srinivasan Ravi Nair (presenting) Dan Prener IBM T.J. Watson Research Center October 21, 2012
  • 2. © 2012 IBM CorporationRN 10/21/12 2 Emerging Use of Computing   Emerging important applications   Search   Inexact answers to inexact queries   Proactive delivery of information   Advertisements, offers, news, alerts, advise   Media   Approximate rendition of audio, video, or images   Stream Processing   Fast decisions based on possibly incomplete data   These applications benefit in cost-performance through the use of prediction techniques   They allow trading off accuracy of results for speed of delivering results
  • 3. © 2012 IBM CorporationRN 10/21/12 3 Approximate Computing   Current techniques   Exact (hence energy-inefficient) computation done on exact (hence expensive) hardware to produce results that need not be exact   With increasing unreliability and variability of devices in new generations of technology, the cost of such computation will increase dramatically   What is needed: Approximate Computing   Computational models that allow computational results to fall in some acceptable range, and   Hardware that produces results that fall in some acceptable range
  • 4. © 2012 IBM CorporationRN 10/21/12 4 Hardware Exact Less Exact Accurate Less Accurate, less up-to-date, possibly corrupted Reliable Variable Computation Data Computing model today Dimensions of Approximate Computing
  • 5. © 2012 IBM CorporationRN 10/21/12 5 Hardware Exact Less Exact Accurate Less Accurate, less up-to-date, possibly corrupted Reliable Variable Computation Data Computing model today Future model Human Brain Path to Efficiency Improving Efficiency through Approximation
  • 6. © 2012 IBM CorporationRN 10/21/12 6 Hardware Exact Less Exact Accurate Less Accurate, less up-to-date, possibly corrupted Reliable Variable Computation Data Computing model today Relaxed Synchronization Human Brain Path to Efficiency Relaxed Synchronization
  • 7. © 2012 IBM CorporationRN 10/21/12 7 Relaxing Synchronization: Challenges and Opportunities   Principal uses of synchronization   Ensuring that all threads see consistent values of shared variables (e.g. atomic updates with locks)   Best candidate for relaxing synchronization   Ensuring that threads reach various points in their execution in a predictable manner (e.g. barrier)   Can relax (depending on context)   Parallel update of data structures (e.g. linked lists)   Difficult to relax (can lead to fatal crash of the program)   Key questions about relaxed synchronization   Will the program produce results that are acceptable?   If acceptable, would the results be produced in considerably less time?
  • 8. © 2012 IBM CorporationRN 10/21/12 8   Partition data into k clusters Randomly select K “initial” means Create k clusters by associating every data point with the nearest centroid Recompute centroids of k clusters Iterate until centroids do not change 8   Widely used   Market segmentation, data mining, computer vision, astronomy Kmeans Clustering
  • 9. © 2012 IBM CorporationRN 10/21/12 9 Kmeans Synchronization Overhead   Kmeans : 8 clusters with large input set   OpenMP based parallel code   Parallel execution on an IBM Power7 machine 0% 20% 40% 60% 80% 100% 2 3 4 5 6 7 8 # of threads PercentClusteringTime Synchronization Compute Synchronization time as high as 90% for 8 threads
  • 10. © 2012 IBM CorporationRN 10/21/12 10 Relaxing Synchronization in Kmeans (NU-MineBench)   Parallel evaluation of all points moving between clusters   Synchronization is used to update list of points moving   Stopping criterion: } while (delta > threshold && loop++ < 500);   Number of points moving (delta) < 0.1% of total points (threshold)   Subject to a maximum of 500 iterations   This is already an approximation   What would happen if synchronization were relaxed? Performance for 8 Clusters 0 10 20 30 40 50 60 2 3 4 5 6 7 8 Number of Threads ExecutionTime(secs) 0 1 2 3 4 5 6 7 8 9 10 Speedup Original Relaxed Speedup
  • 11. © 2012 IBM CorporationRN 10/21/12 11 Performance Improvement Performance Speedup with Relaxed Synchronization 0 2 4 6 8 10 12 14 4 5 6 7 8 9 10 11 12 13 Number of Clusters Speedup 2 threads 3 threads 4 threads 5 threads 6 threads 7 threads 8 threads Significant speedup observed in many combinations. No combination showed a degradation in performance.
  • 12. © 2012 IBM CorporationRN 10/21/12 12 Quality of Solution   Computed total sum of distances of all points from the centroids of clusters they belong to   Good indicator of quality of solution   Each bar in graph represents a different relaxed run   Non-determinism is a side- effect of relaxation Quality of solution is not degraded due to relaxed synchronization Degradation  in  Quality  of  Solution  -­‐  8  clusters,  8  thds 0.00% 0.05% 0.10% 0.15% 0.20% 0.25% 0.30% 0.35% 1 2 3 4 5 6 7 8 9 10 Run Number Differencebetween RelaxedcostandOriginalCost
  • 13. © 2012 IBM CorporationRN 10/21/12 13 Graph500   Breadth-first search (BFS) of a very large scale-free graph   Representative of certain real-life analytics problems   E.g. Mail a flyer to all people who are connected directly and transitively to a certain person or sets of persons   Experiments performed on version of award-winning IBM solution Sample Scale-Free Graphs
  • 14. © 2012 IBM CorporationRN 10/21/12 14 Initial Experiment: Relaxing Synchronization   The parallel examination of nodes on a wavefront requires an atomic update of a data structure representing list of visited nodes   Atomic update is a synchronizing operation   Question: What would happen if we updated the data structure with a simple OR, rather than an atomic-OR operation?
  • 15. © 2012 IBM CorporationRN 10/21/12 15 Sample Result   Each column represents a different relaxed run   Relaxed results are non-deterministic   Very few vertices missing   Results   Very few vertices with wrong level numbers   Better than 3x the performance   In most cases, it was possible to detect all missing vertices by simply validating the obtained tree CPUS 8 THREADS 16 SCALE 20 Time taken with accurate sync 0.248 0.248 0.248 0.248 0.248 Time taken with relaxed sync 0.077 0.078 0.078 0.077 0.078 Performance gain with relaxed sync 3.20 3.20 3.20 3.20 3.19 Vertices visited in accurate list 646056 646056 646056 646056 646056 Vertices visited in relaxed list 645985 645991 645990 646002 645993 Vertices common in both 645985 645991 645990 646002 645993 Missing vertices 29 35 34 46 37 Other vertices with wrong level numbers 71 49 66 34 59
  • 16. © 2012 IBM CorporationRN 10/21/12 16 Relax-and-Check: A Framework to Exploit Relaxed Synchronization   Assume there is a criterion that allows us to estimate the quality of a solution without adding high cost   Relax-and-Check   Modifies an existing program   Run initially without synchronizations   Revert to fully synchronized version if quality does not fall within acceptable limits   Such criteria are often already explicit in programs   E.g. convergence criteria   Other “syndromes” can often be identified
  • 17. © 2012 IBM CorporationRN 10/21/12 17 Sample Results   Three benchmarks from STAMP suite   Acceptability criteria of results identical to original   Significant reduction of run time due to relaxation of synchronization requirement SSCA2 Kmeans (STAMP) Labyrinth
  • 18. © 2012 IBM CorporationRN 10/21/12 18 Relax-and-Check Strategy for BFS: A Slight Twist   Very high correlation between validation errors and time to solution   Possibly because of redundant visit of vertices   Also, these occurred principally when hardware parameters were not suitable for problem size   Use validation errors as measure of quality   Validate the solution produced by approximate technique   If number of errors exceeds a threshold, say 1000, revert to synchronized solution for future instances with similar problem size Performance vs. validation errors 0 0.5 1 1.5 2 2.5 3 3.5 0 500 1000 1500 2000 2500 3000 3500 4000 Number of validation errors Performancerelativeto accuratesolution scale = 18 scale = 20 scale = 22 scale = 24 Relax-and-Check provides a technique to exploit the advantages of relaxed synchronization with a quality safety-net
  • 19. © 2012 IBM CorporationRN 10/21/12 19 Applications Suitable for Relax-and-Check Domain / Applications Examples Scientific Finite Difference Methods, Successive over- relaxation, N-body methods Data Mining, Text Mining, Risk Analysis, Machine Learning K-means, Hierarchical clustering, Multinomial regression, Feature reduction Numerical optimization, Forecasting, Financial Modeling, and Mathematical Programming Non-convex optimization solvers, constrained optimization, multi-dimensional cubing EDA VLSI place and route Graph processing and Social Network Analysis Mesh refinement, Metis style graph partitioning, Graph traversals Program analyses in compilers and software engineering Iterative data flow analysis A large majority of applications in these domains choose a good solution from many correct solutions, and use acceptance criteria such as convergence tests, fixed-point tests, or boolean acceptability tests
  • 20. © 2012 IBM CorporationRN 10/21/12 20 Hardware Exact Less Exact Accurate Less Accurate, less up-to-date, possibly corrupted Reliable Variable Computation Data Computing model today Future model Human Brain Path to Efficiency Move Towards Attributes of the Human Brain