SlideShare a Scribd company logo
1 of 12
A FRAMEWORK FOR PRACTICAL
PARALLEL FAST MATRIX MULTIPLICATION
code and paper:
github.com/arbenson/fast-matmul
1
Austin Benson
arbenson@stanford.edu
Stanford University
Joint work with
Grey Ballard, Sandia
SIAM CSE 2015
Salt Lake City, UT 9000 10000 11000 12000 13000
14
16
18
20
22
Dimension (N)
EffectiveGFLOPS/core
Performance (24 cores) on N x N x N
MKL
STRASSEN
S<4,3,3>
<4,2,2>
<3,2,3>
<3,3,2>
<5,2,2>
<2,5,2>
Fast matrix multiplication:
bridging theory and practice
• There are a number of Strassen-like algorithms for matrix
multiplication that have only been “discovered” recently.
[Smirnov13], [Benson&Ballard14]
• How well do they work in practice?
2
32 2.81
[Strassen79]
2.37
[Le Gall14]
xxx xx xx xx
Strassen’s algorithm
3
4
[Smirnov13]
[Strassen69]
All implemented with code generation
Sequential performance =
5
Effective GFLOPS for M x K x N multiplies
= 1e-9 * 2 * MKN / time in seconds
Classical
peak
0 2000 4000 6000 8000
16
18
20
22
24
26
28
Dimension (N)
EffectiveGFLOPS
Sequential performance on N x N x N
MKL
STRASSEN
<4,2,2>
<3,2,3>
<3,3,2>
<5,2,2>
S<4,3,3>
2000 4000 6000 8000 10000 12000
20
22
24
26
28
dimension (N)
EffectiveGFLOPS
Sequential performance on N x 1600 x N
MKL
<4,2,4>
<4,3,3>
<3,2,3>
<4,2,3>
STRASSEN
Sequential performance =
• Almost all algorithms beat MKL
• <4, 2, 4> and <3, 2, 3> tend to perform the best
6
DFS Parallelization
C
M1 M7
+
M2 …
M1 M7
+
M2 …
All threads
Use parallel MKL
+ Easy to implement
- Need large base
cases for high
performance
7
BFS Parallelization
C
M1 M7
+
M2 …
M1 M7
+
M2 …
omp taskwait
omp taskwait
1 thread
+ High performance for smaller base cases
- Sometimes harder to load balance: 24 threads, 49 subproblems
- More memory
1 thread 1 thread
8
HYBRID parallelization
C
M1 M7
+
M2 …
M1 M7
+
M2 …
omp taskwait
omp taskwait
1 thread 1 thread all threads
+ Better load balancing
- Explicit synchronization or else we can over-subscribe threads
9
Parallel performance =
10
9000 10000 11000 12000 13000
18
20
22
24
26
28
Dimension (N)
EffectiveGFLOPS/core
Performance (6 cores) on N x N x N
MKL
STRASSEN
S<4,3,3>
<4,2,2>
<3,2,3>
<3,3,2>
<5,2,2>
<2,5,2>
9000 10000 11000 12000 13000
14
16
18
20
22
Dimension (N)
EffectiveGFLOPS/core
Performance (24 cores) on N x N x N
MKL
STRASSEN
S<4,3,3>
<4,2,2>
<3,2,3>
<3,3,2>
<5,2,2>
<2,5,2>
• 6 cores: similar performance to sequential
• 24 cores: can sometimes edge out MKL
10000 15000 20000 10000 15000
18
19
20
21
22
23
24
dimension (N)
EffectiveGFLOPS/core
Performance (6 cores) on N x 2800 x N
MKL
<4,2,4>
<4,3,3>
<3,2,3>
<4,2,3>
STRASSEN
5000 10000 15000 20000
12
14
16
18
20
dimension (N)
EffectiveGFLOPS/core
Performance (24 cores) on N x 2800 x N
MKL
<4,2,4>
<4,3,3>
<3,2,3>
<4,2,3>
STRASSEN
Parallel performance =
• 6 cores: similar performance to sequential
• 24 cores: MKL best for large problems
11
A FRAMEWORK FOR PRACTICAL
PARALLEL FAST MATRIX MULTIPLICATION
code and paper:
github.com/arbenson/fast-matmul
12
Austin Benson
arbenson@stanford.edu
Stanford University
Joint work with
Grey Ballard, Sandia
SIAM CSE 2015
Salt Lake City, UT 9000 10000 11000 12000 13000
14
16
18
20
22
Dimension (N)
EffectiveGFLOPS/core
Performance (24 cores) on N x N x N
MKL
STRASSEN
S<4,3,3>
<4,2,2>
<3,2,3>
<3,3,2>
<5,2,2>
<2,5,2>

More Related Content

What's hot

Karatsuba algorithm for fast mltiplication
Karatsuba algorithm for fast mltiplicationKaratsuba algorithm for fast mltiplication
Karatsuba algorithm for fast mltiplicationAtul Singh
 
My presentation minimum spanning tree
My presentation minimum spanning treeMy presentation minimum spanning tree
My presentation minimum spanning treeAlona Salva
 
Section 3.8 solving equations and formulas (alg)
Section 3.8 solving equations and formulas (alg)Section 3.8 solving equations and formulas (alg)
Section 3.8 solving equations and formulas (alg)Algebra / Mathematics
 
Assignment 2 DUM 3272 MATHEMATICS 3
Assignment 2 DUM 3272 MATHEMATICS 3Assignment 2 DUM 3272 MATHEMATICS 3
Assignment 2 DUM 3272 MATHEMATICS 3MARA
 
Computing DFT using Matrix method
Computing DFT using Matrix methodComputing DFT using Matrix method
Computing DFT using Matrix methodSarang Joshi
 
The reversible residual network
The reversible residual networkThe reversible residual network
The reversible residual networkThyrixYang1
 
Semi orthogonal low-rank matrix factorization for deep neural networks
Semi orthogonal low-rank matrix factorization for deep neural networksSemi orthogonal low-rank matrix factorization for deep neural networks
Semi orthogonal low-rank matrix factorization for deep neural networks品媛 陳
 
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...MLconf
 
A NOVEL IMAGE SCRAMBLING BASED ON SUDOKU PUZZLE
A NOVEL IMAGE SCRAMBLING BASED ON SUDOKU PUZZLEA NOVEL IMAGE SCRAMBLING BASED ON SUDOKU PUZZLE
A NOVEL IMAGE SCRAMBLING BASED ON SUDOKU PUZZLEsatya kishore
 
Neural Network Tutorial | Introduction to Neural Network | Deep Learning Tuto...
Neural Network Tutorial | Introduction to Neural Network | Deep Learning Tuto...Neural Network Tutorial | Introduction to Neural Network | Deep Learning Tuto...
Neural Network Tutorial | Introduction to Neural Network | Deep Learning Tuto...Edureka!
 
2-Rainbow Domination of Hexagonal Mesh Networks
2-Rainbow Domination of Hexagonal Mesh Networks2-Rainbow Domination of Hexagonal Mesh Networks
2-Rainbow Domination of Hexagonal Mesh Networksijcoa
 
Study of-ndvi-land-surface-temperature-using-landsat-tm-data
Study of-ndvi-land-surface-temperature-using-landsat-tm-dataStudy of-ndvi-land-surface-temperature-using-landsat-tm-data
Study of-ndvi-land-surface-temperature-using-landsat-tm-dataJosé Pasapera Gonzales
 
Wind/ Solar Power Forecasting
Wind/ Solar Power Forecasting  Wind/ Solar Power Forecasting
Wind/ Solar Power Forecasting Das A. K.
 
Lab: Foundation of Concurrent and Distributed Systems
Lab: Foundation of Concurrent and Distributed SystemsLab: Foundation of Concurrent and Distributed Systems
Lab: Foundation of Concurrent and Distributed SystemsRuochun Tzeng
 

What's hot (20)

Karatsuba algorithm for fast mltiplication
Karatsuba algorithm for fast mltiplicationKaratsuba algorithm for fast mltiplication
Karatsuba algorithm for fast mltiplication
 
Spanning trees
Spanning treesSpanning trees
Spanning trees
 
My presentation minimum spanning tree
My presentation minimum spanning treeMy presentation minimum spanning tree
My presentation minimum spanning tree
 
Section 3.8 solving equations and formulas (alg)
Section 3.8 solving equations and formulas (alg)Section 3.8 solving equations and formulas (alg)
Section 3.8 solving equations and formulas (alg)
 
Assignment 2 DUM 3272 MATHEMATICS 3
Assignment 2 DUM 3272 MATHEMATICS 3Assignment 2 DUM 3272 MATHEMATICS 3
Assignment 2 DUM 3272 MATHEMATICS 3
 
Computing DFT using Matrix method
Computing DFT using Matrix methodComputing DFT using Matrix method
Computing DFT using Matrix method
 
The reversible residual network
The reversible residual networkThe reversible residual network
The reversible residual network
 
Semi orthogonal low-rank matrix factorization for deep neural networks
Semi orthogonal low-rank matrix factorization for deep neural networksSemi orthogonal low-rank matrix factorization for deep neural networks
Semi orthogonal low-rank matrix factorization for deep neural networks
 
17 prims-kruskals (1)
17 prims-kruskals (1)17 prims-kruskals (1)
17 prims-kruskals (1)
 
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
 
A NOVEL IMAGE SCRAMBLING BASED ON SUDOKU PUZZLE
A NOVEL IMAGE SCRAMBLING BASED ON SUDOKU PUZZLEA NOVEL IMAGE SCRAMBLING BASED ON SUDOKU PUZZLE
A NOVEL IMAGE SCRAMBLING BASED ON SUDOKU PUZZLE
 
Neural Network Tutorial | Introduction to Neural Network | Deep Learning Tuto...
Neural Network Tutorial | Introduction to Neural Network | Deep Learning Tuto...Neural Network Tutorial | Introduction to Neural Network | Deep Learning Tuto...
Neural Network Tutorial | Introduction to Neural Network | Deep Learning Tuto...
 
Regression
RegressionRegression
Regression
 
post119s1-file2
post119s1-file2post119s1-file2
post119s1-file2
 
2-Rainbow Domination of Hexagonal Mesh Networks
2-Rainbow Domination of Hexagonal Mesh Networks2-Rainbow Domination of Hexagonal Mesh Networks
2-Rainbow Domination of Hexagonal Mesh Networks
 
karnaugh maps
karnaugh mapskarnaugh maps
karnaugh maps
 
Study of-ndvi-land-surface-temperature-using-landsat-tm-data
Study of-ndvi-land-surface-temperature-using-landsat-tm-dataStudy of-ndvi-land-surface-temperature-using-landsat-tm-data
Study of-ndvi-land-surface-temperature-using-landsat-tm-data
 
Wind/ Solar Power Forecasting
Wind/ Solar Power Forecasting  Wind/ Solar Power Forecasting
Wind/ Solar Power Forecasting
 
Lab: Foundation of Concurrent and Distributed Systems
Lab: Foundation of Concurrent and Distributed SystemsLab: Foundation of Concurrent and Distributed Systems
Lab: Foundation of Concurrent and Distributed Systems
 
Dsp lecture vol 4 digital filters
Dsp lecture vol 4 digital filtersDsp lecture vol 4 digital filters
Dsp lecture vol 4 digital filters
 

Viewers also liked

Tall-and-skinny Matrix Computations in MapReduce (ICME MR 2013)
Tall-and-skinny Matrix Computations in MapReduce (ICME MR 2013)Tall-and-skinny Matrix Computations in MapReduce (ICME MR 2013)
Tall-and-skinny Matrix Computations in MapReduce (ICME MR 2013)Austin Benson
 
NYC-Meetup- Introduction to Hadoop Echosystem
NYC-Meetup- Introduction to Hadoop EchosystemNYC-Meetup- Introduction to Hadoop Echosystem
NYC-Meetup- Introduction to Hadoop EchosystemAL500745425
 
Tall-and-skinny Matrix Computations in MapReduce (ICME colloquium)
Tall-and-skinny Matrix Computations in MapReduce (ICME colloquium)Tall-and-skinny Matrix Computations in MapReduce (ICME colloquium)
Tall-and-skinny Matrix Computations in MapReduce (ICME colloquium)Austin Benson
 
Learning multifractal structure in large networks (KDD 2014)
Learning multifractal structure in large networks (KDD 2014)Learning multifractal structure in large networks (KDD 2014)
Learning multifractal structure in large networks (KDD 2014)Austin Benson
 
QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architec...
QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architec...QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architec...
QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architec...Austin Benson
 
Suzlon takes a wise decision to go for CDR
Suzlon takes a wise decision to go for CDR Suzlon takes a wise decision to go for CDR
Suzlon takes a wise decision to go for CDR Himanshu Sharma
 
Direct QR factorizations for tall-and-skinny matrices in MapReduce architectu...
Direct QR factorizations for tall-and-skinny matrices in MapReduce architectu...Direct QR factorizations for tall-and-skinny matrices in MapReduce architectu...
Direct QR factorizations for tall-and-skinny matrices in MapReduce architectu...Austin Benson
 
Data Structures and Performance for Scientific Computing with Hadoop and Dumb...
Data Structures and Performance for Scientific Computing with Hadoop and Dumb...Data Structures and Performance for Scientific Computing with Hadoop and Dumb...
Data Structures and Performance for Scientific Computing with Hadoop and Dumb...Austin Benson
 
Silent error resilience in numerical time-stepping schemes
Silent error resilience in numerical time-stepping schemesSilent error resilience in numerical time-stepping schemes
Silent error resilience in numerical time-stepping schemesAustin Benson
 
Silent error detection in numerical time stepping schemes (SIAM PP 2014)
Silent error detection in numerical time stepping schemes (SIAM PP 2014)Silent error detection in numerical time stepping schemes (SIAM PP 2014)
Silent error detection in numerical time stepping schemes (SIAM PP 2014)Austin Benson
 
Tensor Spectral Clustering
Tensor Spectral ClusteringTensor Spectral Clustering
Tensor Spectral ClusteringAustin Benson
 

Viewers also liked (15)

Tall-and-skinny Matrix Computations in MapReduce (ICME MR 2013)
Tall-and-skinny Matrix Computations in MapReduce (ICME MR 2013)Tall-and-skinny Matrix Computations in MapReduce (ICME MR 2013)
Tall-and-skinny Matrix Computations in MapReduce (ICME MR 2013)
 
NYC-Meetup- Introduction to Hadoop Echosystem
NYC-Meetup- Introduction to Hadoop EchosystemNYC-Meetup- Introduction to Hadoop Echosystem
NYC-Meetup- Introduction to Hadoop Echosystem
 
Natura si echilibrul sau
Natura si echilibrul sauNatura si echilibrul sau
Natura si echilibrul sau
 
Tall-and-skinny Matrix Computations in MapReduce (ICME colloquium)
Tall-and-skinny Matrix Computations in MapReduce (ICME colloquium)Tall-and-skinny Matrix Computations in MapReduce (ICME colloquium)
Tall-and-skinny Matrix Computations in MapReduce (ICME colloquium)
 
Ucapan aluan
Ucapan aluanUcapan aluan
Ucapan aluan
 
Learning multifractal structure in large networks (KDD 2014)
Learning multifractal structure in large networks (KDD 2014)Learning multifractal structure in large networks (KDD 2014)
Learning multifractal structure in large networks (KDD 2014)
 
QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architec...
QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architec...QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architec...
QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architec...
 
Ucapan aluan
Ucapan aluanUcapan aluan
Ucapan aluan
 
Suzlon takes a wise decision to go for CDR
Suzlon takes a wise decision to go for CDR Suzlon takes a wise decision to go for CDR
Suzlon takes a wise decision to go for CDR
 
Direct QR factorizations for tall-and-skinny matrices in MapReduce architectu...
Direct QR factorizations for tall-and-skinny matrices in MapReduce architectu...Direct QR factorizations for tall-and-skinny matrices in MapReduce architectu...
Direct QR factorizations for tall-and-skinny matrices in MapReduce architectu...
 
Data Structures and Performance for Scientific Computing with Hadoop and Dumb...
Data Structures and Performance for Scientific Computing with Hadoop and Dumb...Data Structures and Performance for Scientific Computing with Hadoop and Dumb...
Data Structures and Performance for Scientific Computing with Hadoop and Dumb...
 
Silent error resilience in numerical time-stepping schemes
Silent error resilience in numerical time-stepping schemesSilent error resilience in numerical time-stepping schemes
Silent error resilience in numerical time-stepping schemes
 
Silent error detection in numerical time stepping schemes (SIAM PP 2014)
Silent error detection in numerical time stepping schemes (SIAM PP 2014)Silent error detection in numerical time stepping schemes (SIAM PP 2014)
Silent error detection in numerical time stepping schemes (SIAM PP 2014)
 
426 anaerobicdigesterdesign
426 anaerobicdigesterdesign426 anaerobicdigesterdesign
426 anaerobicdigesterdesign
 
Tensor Spectral Clustering
Tensor Spectral ClusteringTensor Spectral Clustering
Tensor Spectral Clustering
 

Similar to fast-matmul-cse15

A framework for practical fast matrix multiplication (BLIS retreat)
A framework for practical fast matrix multiplication� (BLIS retreat)A framework for practical fast matrix multiplication� (BLIS retreat)
A framework for practical fast matrix multiplication (BLIS retreat)Austin Benson
 
Druinsky_SIAMCSE15
Druinsky_SIAMCSE15Druinsky_SIAMCSE15
Druinsky_SIAMCSE15Karen Pao
 
Restricting the Flow: Information Bottlenecks for Attribution
Restricting the Flow: Information Bottlenecks for AttributionRestricting the Flow: Information Bottlenecks for Attribution
Restricting the Flow: Information Bottlenecks for Attributiontaeseon ryu
 
Apache SystemML Optimizer and Runtime techniques by Matthias Boehm
Apache SystemML Optimizer and Runtime techniques by Matthias BoehmApache SystemML Optimizer and Runtime techniques by Matthias Boehm
Apache SystemML Optimizer and Runtime techniques by Matthias BoehmArvind Surve
 
Apache SystemML Optimizer and Runtime techniques by Matthias Boehm
Apache SystemML Optimizer and Runtime techniques by Matthias BoehmApache SystemML Optimizer and Runtime techniques by Matthias Boehm
Apache SystemML Optimizer and Runtime techniques by Matthias BoehmArvind Surve
 
Heuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient searchHeuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient searchGreg Makowski
 
Presentation_Parallel GRASP algorithm for job shop scheduling
Presentation_Parallel GRASP algorithm for job shop schedulingPresentation_Parallel GRASP algorithm for job shop scheduling
Presentation_Parallel GRASP algorithm for job shop schedulingAntonio Maria Fiscarelli
 
IJCAI13 Paper review: Large-scale spectral clustering on graphs
IJCAI13 Paper review: Large-scale spectral clustering on graphsIJCAI13 Paper review: Large-scale spectral clustering on graphs
IJCAI13 Paper review: Large-scale spectral clustering on graphsAkisato Kimura
 
AI optimizing HPC simulations (presentation from 6th EULAG Workshop)
AI optimizing HPC simulations (presentation from  6th EULAG Workshop)AI optimizing HPC simulations (presentation from  6th EULAG Workshop)
AI optimizing HPC simulations (presentation from 6th EULAG Workshop)byteLAKE
 
Svm map reduce_slides
Svm map reduce_slidesSvm map reduce_slides
Svm map reduce_slidesSara Asher
 
Project seminar ppt_steelcasting
Project seminar ppt_steelcastingProject seminar ppt_steelcasting
Project seminar ppt_steelcastingRudra Narayan Paul
 
GTC Taiwan 2017 GPU 平台上導入深度學習於半導體產業之 EDA 應用
GTC Taiwan 2017 GPU 平台上導入深度學習於半導體產業之 EDA 應用GTC Taiwan 2017 GPU 平台上導入深度學習於半導體產業之 EDA 應用
GTC Taiwan 2017 GPU 平台上導入深度學習於半導體產業之 EDA 應用NVIDIA Taiwan
 
DATA STRUCTURE AND ALGORITHM LMS MST KRUSKAL'S ALGORITHM
DATA STRUCTURE AND ALGORITHM LMS MST KRUSKAL'S ALGORITHMDATA STRUCTURE AND ALGORITHM LMS MST KRUSKAL'S ALGORITHM
DATA STRUCTURE AND ALGORITHM LMS MST KRUSKAL'S ALGORITHMABIRAMIS87
 
Recent Progress in SCCS on GPU Simulation of Biomedical and Hydrodynamic Prob...
Recent Progress in SCCS on GPU Simulation of Biomedical and Hydrodynamic Prob...Recent Progress in SCCS on GPU Simulation of Biomedical and Hydrodynamic Prob...
Recent Progress in SCCS on GPU Simulation of Biomedical and Hydrodynamic Prob...NVIDIA Taiwan
 
Transfer learning with LTANN-MEM & NSA for solving multi-objective symbolic r...
Transfer learning with LTANN-MEM & NSA for solving multi-objective symbolic r...Transfer learning with LTANN-MEM & NSA for solving multi-objective symbolic r...
Transfer learning with LTANN-MEM & NSA for solving multi-objective symbolic r...Amr Kamel Deklel
 
RAPIDS: ускоряем Pandas и scikit-learn на GPU Павел Клеменков, NVidia
RAPIDS: ускоряем Pandas и scikit-learn на GPU  Павел Клеменков, NVidiaRAPIDS: ускоряем Pandas и scikit-learn на GPU  Павел Клеменков, NVidia
RAPIDS: ускоряем Pandas и scikit-learn на GPU Павел Клеменков, NVidiaMail.ru Group
 
Hardware Acceleration for Machine Learning
Hardware Acceleration for Machine LearningHardware Acceleration for Machine Learning
Hardware Acceleration for Machine LearningCastLabKAIST
 
Efficient Computation of Regret-ratio Minimizing Set: A Compact Maxima Repres...
Efficient Computation ofRegret-ratio Minimizing Set:A Compact Maxima Repres...Efficient Computation ofRegret-ratio Minimizing Set:A Compact Maxima Repres...
Efficient Computation of Regret-ratio Minimizing Set: A Compact Maxima Repres...Abolfazl Asudeh
 

Similar to fast-matmul-cse15 (20)

A framework for practical fast matrix multiplication (BLIS retreat)
A framework for practical fast matrix multiplication� (BLIS retreat)A framework for practical fast matrix multiplication� (BLIS retreat)
A framework for practical fast matrix multiplication (BLIS retreat)
 
Druinsky_SIAMCSE15
Druinsky_SIAMCSE15Druinsky_SIAMCSE15
Druinsky_SIAMCSE15
 
Restricting the Flow: Information Bottlenecks for Attribution
Restricting the Flow: Information Bottlenecks for AttributionRestricting the Flow: Information Bottlenecks for Attribution
Restricting the Flow: Information Bottlenecks for Attribution
 
Apache SystemML Optimizer and Runtime techniques by Matthias Boehm
Apache SystemML Optimizer and Runtime techniques by Matthias BoehmApache SystemML Optimizer and Runtime techniques by Matthias Boehm
Apache SystemML Optimizer and Runtime techniques by Matthias Boehm
 
Apache SystemML Optimizer and Runtime techniques by Matthias Boehm
Apache SystemML Optimizer and Runtime techniques by Matthias BoehmApache SystemML Optimizer and Runtime techniques by Matthias Boehm
Apache SystemML Optimizer and Runtime techniques by Matthias Boehm
 
Heuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient searchHeuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient search
 
Presentation_Parallel GRASP algorithm for job shop scheduling
Presentation_Parallel GRASP algorithm for job shop schedulingPresentation_Parallel GRASP algorithm for job shop scheduling
Presentation_Parallel GRASP algorithm for job shop scheduling
 
IJCAI13 Paper review: Large-scale spectral clustering on graphs
IJCAI13 Paper review: Large-scale spectral clustering on graphsIJCAI13 Paper review: Large-scale spectral clustering on graphs
IJCAI13 Paper review: Large-scale spectral clustering on graphs
 
Recurrent Instance Segmentation (UPC Reading Group)
Recurrent Instance Segmentation (UPC Reading Group)Recurrent Instance Segmentation (UPC Reading Group)
Recurrent Instance Segmentation (UPC Reading Group)
 
AI optimizing HPC simulations (presentation from 6th EULAG Workshop)
AI optimizing HPC simulations (presentation from  6th EULAG Workshop)AI optimizing HPC simulations (presentation from  6th EULAG Workshop)
AI optimizing HPC simulations (presentation from 6th EULAG Workshop)
 
Svm map reduce_slides
Svm map reduce_slidesSvm map reduce_slides
Svm map reduce_slides
 
Project seminar ppt_steelcasting
Project seminar ppt_steelcastingProject seminar ppt_steelcasting
Project seminar ppt_steelcasting
 
GTC Taiwan 2017 GPU 平台上導入深度學習於半導體產業之 EDA 應用
GTC Taiwan 2017 GPU 平台上導入深度學習於半導體產業之 EDA 應用GTC Taiwan 2017 GPU 平台上導入深度學習於半導體產業之 EDA 應用
GTC Taiwan 2017 GPU 平台上導入深度學習於半導體產業之 EDA 應用
 
DATA STRUCTURE AND ALGORITHM LMS MST KRUSKAL'S ALGORITHM
DATA STRUCTURE AND ALGORITHM LMS MST KRUSKAL'S ALGORITHMDATA STRUCTURE AND ALGORITHM LMS MST KRUSKAL'S ALGORITHM
DATA STRUCTURE AND ALGORITHM LMS MST KRUSKAL'S ALGORITHM
 
Recent Progress in SCCS on GPU Simulation of Biomedical and Hydrodynamic Prob...
Recent Progress in SCCS on GPU Simulation of Biomedical and Hydrodynamic Prob...Recent Progress in SCCS on GPU Simulation of Biomedical and Hydrodynamic Prob...
Recent Progress in SCCS on GPU Simulation of Biomedical and Hydrodynamic Prob...
 
Transfer learning with LTANN-MEM & NSA for solving multi-objective symbolic r...
Transfer learning with LTANN-MEM & NSA for solving multi-objective symbolic r...Transfer learning with LTANN-MEM & NSA for solving multi-objective symbolic r...
Transfer learning with LTANN-MEM & NSA for solving multi-objective symbolic r...
 
RAPIDS: ускоряем Pandas и scikit-learn на GPU Павел Клеменков, NVidia
RAPIDS: ускоряем Pandas и scikit-learn на GPU  Павел Клеменков, NVidiaRAPIDS: ускоряем Pandas и scikit-learn на GPU  Павел Клеменков, NVidia
RAPIDS: ускоряем Pandas и scikit-learn на GPU Павел Клеменков, NVidia
 
Data miningpresentation
Data miningpresentationData miningpresentation
Data miningpresentation
 
Hardware Acceleration for Machine Learning
Hardware Acceleration for Machine LearningHardware Acceleration for Machine Learning
Hardware Acceleration for Machine Learning
 
Efficient Computation of Regret-ratio Minimizing Set: A Compact Maxima Repres...
Efficient Computation ofRegret-ratio Minimizing Set:A Compact Maxima Repres...Efficient Computation ofRegret-ratio Minimizing Set:A Compact Maxima Repres...
Efficient Computation of Regret-ratio Minimizing Set: A Compact Maxima Repres...
 

More from Austin Benson

Hypergraph Cuts with General Splitting Functions (JMM)
Hypergraph Cuts with General Splitting Functions (JMM)Hypergraph Cuts with General Splitting Functions (JMM)
Hypergraph Cuts with General Splitting Functions (JMM)Austin Benson
 
Spectral embeddings and evolving networks
Spectral embeddings and evolving networksSpectral embeddings and evolving networks
Spectral embeddings and evolving networksAustin Benson
 
Computational Frameworks for Higher-order Network Data Analysis
Computational Frameworks for Higher-order Network Data AnalysisComputational Frameworks for Higher-order Network Data Analysis
Computational Frameworks for Higher-order Network Data AnalysisAustin Benson
 
Higher-order link prediction and other hypergraph modeling
Higher-order link prediction and other hypergraph modelingHigher-order link prediction and other hypergraph modeling
Higher-order link prediction and other hypergraph modelingAustin Benson
 
Hypergraph Cuts with General Splitting Functions
Hypergraph Cuts with General Splitting FunctionsHypergraph Cuts with General Splitting Functions
Hypergraph Cuts with General Splitting FunctionsAustin Benson
 
Hypergraph Cuts with General Splitting Functions
Hypergraph Cuts with General Splitting FunctionsHypergraph Cuts with General Splitting Functions
Hypergraph Cuts with General Splitting FunctionsAustin Benson
 
Higher-order link prediction
Higher-order link predictionHigher-order link prediction
Higher-order link predictionAustin Benson
 
Simplicial closure & higher-order link prediction
Simplicial closure & higher-order link predictionSimplicial closure & higher-order link prediction
Simplicial closure & higher-order link predictionAustin Benson
 
Three hypergraph eigenvector centralities
Three hypergraph eigenvector centralitiesThree hypergraph eigenvector centralities
Three hypergraph eigenvector centralitiesAustin Benson
 
Semi-supervised learning of edge flows
Semi-supervised learning of edge flowsSemi-supervised learning of edge flows
Semi-supervised learning of edge flowsAustin Benson
 
Choosing to grow a graph
Choosing to grow a graphChoosing to grow a graph
Choosing to grow a graphAustin Benson
 
Link prediction in networks with core-fringe structure
Link prediction in networks with core-fringe structureLink prediction in networks with core-fringe structure
Link prediction in networks with core-fringe structureAustin Benson
 
Higher-order Link Prediction GraphEx
Higher-order Link Prediction GraphExHigher-order Link Prediction GraphEx
Higher-order Link Prediction GraphExAustin Benson
 
Higher-order Link Prediction Syracuse
Higher-order Link Prediction SyracuseHigher-order Link Prediction Syracuse
Higher-order Link Prediction SyracuseAustin Benson
 
Random spatial network models for core-periphery structure
Random spatial network models for core-periphery structureRandom spatial network models for core-periphery structure
Random spatial network models for core-periphery structureAustin Benson
 
Random spatial network models for core-periphery structure.
Random spatial network models for core-periphery structure.Random spatial network models for core-periphery structure.
Random spatial network models for core-periphery structure.Austin Benson
 
Simplicial closure & higher-order link prediction
Simplicial closure & higher-order link predictionSimplicial closure & higher-order link prediction
Simplicial closure & higher-order link predictionAustin Benson
 
Simplicial closure and simplicial diffusions
Simplicial closure and simplicial diffusionsSimplicial closure and simplicial diffusions
Simplicial closure and simplicial diffusionsAustin Benson
 
Sampling methods for counting temporal motifs
Sampling methods for counting temporal motifsSampling methods for counting temporal motifs
Sampling methods for counting temporal motifsAustin Benson
 
Set prediction three ways
Set prediction three waysSet prediction three ways
Set prediction three waysAustin Benson
 

More from Austin Benson (20)

Hypergraph Cuts with General Splitting Functions (JMM)
Hypergraph Cuts with General Splitting Functions (JMM)Hypergraph Cuts with General Splitting Functions (JMM)
Hypergraph Cuts with General Splitting Functions (JMM)
 
Spectral embeddings and evolving networks
Spectral embeddings and evolving networksSpectral embeddings and evolving networks
Spectral embeddings and evolving networks
 
Computational Frameworks for Higher-order Network Data Analysis
Computational Frameworks for Higher-order Network Data AnalysisComputational Frameworks for Higher-order Network Data Analysis
Computational Frameworks for Higher-order Network Data Analysis
 
Higher-order link prediction and other hypergraph modeling
Higher-order link prediction and other hypergraph modelingHigher-order link prediction and other hypergraph modeling
Higher-order link prediction and other hypergraph modeling
 
Hypergraph Cuts with General Splitting Functions
Hypergraph Cuts with General Splitting FunctionsHypergraph Cuts with General Splitting Functions
Hypergraph Cuts with General Splitting Functions
 
Hypergraph Cuts with General Splitting Functions
Hypergraph Cuts with General Splitting FunctionsHypergraph Cuts with General Splitting Functions
Hypergraph Cuts with General Splitting Functions
 
Higher-order link prediction
Higher-order link predictionHigher-order link prediction
Higher-order link prediction
 
Simplicial closure & higher-order link prediction
Simplicial closure & higher-order link predictionSimplicial closure & higher-order link prediction
Simplicial closure & higher-order link prediction
 
Three hypergraph eigenvector centralities
Three hypergraph eigenvector centralitiesThree hypergraph eigenvector centralities
Three hypergraph eigenvector centralities
 
Semi-supervised learning of edge flows
Semi-supervised learning of edge flowsSemi-supervised learning of edge flows
Semi-supervised learning of edge flows
 
Choosing to grow a graph
Choosing to grow a graphChoosing to grow a graph
Choosing to grow a graph
 
Link prediction in networks with core-fringe structure
Link prediction in networks with core-fringe structureLink prediction in networks with core-fringe structure
Link prediction in networks with core-fringe structure
 
Higher-order Link Prediction GraphEx
Higher-order Link Prediction GraphExHigher-order Link Prediction GraphEx
Higher-order Link Prediction GraphEx
 
Higher-order Link Prediction Syracuse
Higher-order Link Prediction SyracuseHigher-order Link Prediction Syracuse
Higher-order Link Prediction Syracuse
 
Random spatial network models for core-periphery structure
Random spatial network models for core-periphery structureRandom spatial network models for core-periphery structure
Random spatial network models for core-periphery structure
 
Random spatial network models for core-periphery structure.
Random spatial network models for core-periphery structure.Random spatial network models for core-periphery structure.
Random spatial network models for core-periphery structure.
 
Simplicial closure & higher-order link prediction
Simplicial closure & higher-order link predictionSimplicial closure & higher-order link prediction
Simplicial closure & higher-order link prediction
 
Simplicial closure and simplicial diffusions
Simplicial closure and simplicial diffusionsSimplicial closure and simplicial diffusions
Simplicial closure and simplicial diffusions
 
Sampling methods for counting temporal motifs
Sampling methods for counting temporal motifsSampling methods for counting temporal motifs
Sampling methods for counting temporal motifs
 
Set prediction three ways
Set prediction three waysSet prediction three ways
Set prediction three ways
 

Recently uploaded

result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college projectTonystark477637
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdfankushspencer015
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur EscortsRussian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...ranjana rawat
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxfenichawla
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 

Recently uploaded (20)

result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur EscortsRussian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 

fast-matmul-cse15

  • 1. A FRAMEWORK FOR PRACTICAL PARALLEL FAST MATRIX MULTIPLICATION code and paper: github.com/arbenson/fast-matmul 1 Austin Benson arbenson@stanford.edu Stanford University Joint work with Grey Ballard, Sandia SIAM CSE 2015 Salt Lake City, UT 9000 10000 11000 12000 13000 14 16 18 20 22 Dimension (N) EffectiveGFLOPS/core Performance (24 cores) on N x N x N MKL STRASSEN S<4,3,3> <4,2,2> <3,2,3> <3,3,2> <5,2,2> <2,5,2>
  • 2. Fast matrix multiplication: bridging theory and practice • There are a number of Strassen-like algorithms for matrix multiplication that have only been “discovered” recently. [Smirnov13], [Benson&Ballard14] • How well do they work in practice? 2 32 2.81 [Strassen79] 2.37 [Le Gall14] xxx xx xx xx
  • 5. Sequential performance = 5 Effective GFLOPS for M x K x N multiplies = 1e-9 * 2 * MKN / time in seconds Classical peak 0 2000 4000 6000 8000 16 18 20 22 24 26 28 Dimension (N) EffectiveGFLOPS Sequential performance on N x N x N MKL STRASSEN <4,2,2> <3,2,3> <3,3,2> <5,2,2> S<4,3,3>
  • 6. 2000 4000 6000 8000 10000 12000 20 22 24 26 28 dimension (N) EffectiveGFLOPS Sequential performance on N x 1600 x N MKL <4,2,4> <4,3,3> <3,2,3> <4,2,3> STRASSEN Sequential performance = • Almost all algorithms beat MKL • <4, 2, 4> and <3, 2, 3> tend to perform the best 6
  • 7. DFS Parallelization C M1 M7 + M2 … M1 M7 + M2 … All threads Use parallel MKL + Easy to implement - Need large base cases for high performance 7
  • 8. BFS Parallelization C M1 M7 + M2 … M1 M7 + M2 … omp taskwait omp taskwait 1 thread + High performance for smaller base cases - Sometimes harder to load balance: 24 threads, 49 subproblems - More memory 1 thread 1 thread 8
  • 9. HYBRID parallelization C M1 M7 + M2 … M1 M7 + M2 … omp taskwait omp taskwait 1 thread 1 thread all threads + Better load balancing - Explicit synchronization or else we can over-subscribe threads 9
  • 10. Parallel performance = 10 9000 10000 11000 12000 13000 18 20 22 24 26 28 Dimension (N) EffectiveGFLOPS/core Performance (6 cores) on N x N x N MKL STRASSEN S<4,3,3> <4,2,2> <3,2,3> <3,3,2> <5,2,2> <2,5,2> 9000 10000 11000 12000 13000 14 16 18 20 22 Dimension (N) EffectiveGFLOPS/core Performance (24 cores) on N x N x N MKL STRASSEN S<4,3,3> <4,2,2> <3,2,3> <3,3,2> <5,2,2> <2,5,2> • 6 cores: similar performance to sequential • 24 cores: can sometimes edge out MKL
  • 11. 10000 15000 20000 10000 15000 18 19 20 21 22 23 24 dimension (N) EffectiveGFLOPS/core Performance (6 cores) on N x 2800 x N MKL <4,2,4> <4,3,3> <3,2,3> <4,2,3> STRASSEN 5000 10000 15000 20000 12 14 16 18 20 dimension (N) EffectiveGFLOPS/core Performance (24 cores) on N x 2800 x N MKL <4,2,4> <4,3,3> <3,2,3> <4,2,3> STRASSEN Parallel performance = • 6 cores: similar performance to sequential • 24 cores: MKL best for large problems 11
  • 12. A FRAMEWORK FOR PRACTICAL PARALLEL FAST MATRIX MULTIPLICATION code and paper: github.com/arbenson/fast-matmul 12 Austin Benson arbenson@stanford.edu Stanford University Joint work with Grey Ballard, Sandia SIAM CSE 2015 Salt Lake City, UT 9000 10000 11000 12000 13000 14 16 18 20 22 Dimension (N) EffectiveGFLOPS/core Performance (24 cores) on N x N x N MKL STRASSEN S<4,3,3> <4,2,2> <3,2,3> <3,3,2> <5,2,2> <2,5,2>

Editor's Notes

  1. \omega
  2. \begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix} = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \cdot \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix} S_1 &= A_{11} + A_{22} \\ S_2 &= A_{21} + A_{22} \\ S_3 &= A_{11} \\ S_4 &= A_{22} \\ S_5 &= A_{11} + A_{12} \\ S_6 &= A_{21} - A_{11} \\ S_7 &= A_{12} - A_{22} \\ T_1 &= B_{11} + B_{22} \\ T_2 &= B_{11} \\ T_3 &= B_{12} - B_{22} \\ T_4 &= B_{21} - B_{11} \\ T_5 &= B_{22} \\ T_6 &= B_{11} + B_{12} \\ T_7 &= B_{21} + B_{22} \\
  3. \begin{table} \begin{tabular}{l c c c c c} Algorithm & Multiples & Multiplies & Speedup per & \\ base case & (fast) & (classical) & recursive step & Exponent\\ $\langle 2,2,3\rangle $ & 11 & 12 & 9\% & 2.89 \\ $\langle 2,2,5\rangle $ & 18 & 20 & 11\% & 2.89\\ $\langle 2,2,2\rangle $ & 7 & 8 & 14\% & 2.81 \\ $\langle 2,2,4\rangle $ & 14 & 16 & 14\% & 2.85\\ $\langle 3,3,3\rangle $ & 23 & 27 & 17\% & 2.85 \\ $\langle 2,3,3\rangle $ & 15 & 18 & 20\% & 2.81 \\ $\langle 2,3,4\rangle $ & 20 & 24 & 20\% & 2.83\\ $\langle 2,4,4\rangle $ & 26 & 32 & 23\% & 2.82 \\ $\langle 3,3,4\rangle $ & 29 & 36 & 24\% & 2.82 \\ $\langle 3,4,4\rangle $ & 38 & 48 & 26\% & 2.82 \\ $\langle 3,3,6\rangle $ & 40 & 54 & 35\% & 2.77 \\ \end{tabular} \end{table}
  4. \begin{eqnarray*} S_7 &=& A_{12} - A_{22} \\ T_7 &=& B_{21} + B_{22} \\ M_7 &=& S_7 \cdot T_7 \end{eqnarray*}
  5. \begin{eqnarray*} S_7 &=& A_{12} - A_{22} \\ T_7 &=& B_{21} + B_{22} \\ M_7 &=& S_7 \cdot T_7 \end{eqnarray*}