SlideShare ist ein Scribd-Unternehmen logo
1 von 28
Downloaden Sie, um offline zu lesen
DASH: A C++ PGAS Library for
Distributed Data Structures and
Parallel Algorithms
www.dash-project.org
Tobias Fuchs
fuchst@nm.ifi.lmu.de
Ludwig-Maximilians-Universität München, MNM-Team
| 2DASH
DASH - Overview
 DASH is a C++ template library that offers
– distributed data structures and parallel algorithms
– a complete PGAS (part. global address space) programming
system without a custom (pre-)compiler
 PGAS Terminology – ShMem Analogy
Unit: The individual participants in a DASH
program, usually full OS processes.
Private
Shared
Unit 0 Unit 1 Unit N-1
int b;
int c;
dash::Array a(1000);
int a;
…
dash::Shared s;
10..190..9 ..999
Shared data:
managed by DASH
in a virtual global
address space
Private data:
managed by regular
C/C++ mechanisms
| 3DASH
DASH - Partitioned Global Address Space
 Data affinity
– data has well-defined owner but can be accessed by any unit
– data locality important for performance
– support for the owner computes execution model
 DASH:
– unified access to
local and remote
data in global
memory space
| 4DASH
DASH - Partitioned Global Address Space
 Data affinity
– data has well-defined owner but can be accessed by any unit
– data locality important for performance
– support for the owner computes execution model
 DASH:
– unified access to
local and remote
data in global
memory space
– and explicit views
on local memory
space
| 5DASH
DASH Project Structure
Phase I (2013-2015) Phase II (2016-2018)
LMU Munich
Project lead,
C++ template library
Project lead,
C++ template library,
data dock
TU Dresden
Libraries and
interfaces, tools
Smart data
structures, resilience
HLRS Stuttgart DART runtime DART runtime
KIT Karlsruhe Application studies
IHR Stuttgart
Smart deployment,
Application studies
| 6DASH
DART: The DASH Runtime
 The DART Interface
– plain C (99) interface
– SPMD execution model
– global memory abstraction
– one-sided RDMA operations and
synchronization communication
– topology discovery, locality hierarchy
 Several implementations
– DART-SHMEM shared-memory based implementation
– DART-CUDA supports GPUs, based on DART-SHMEM
– DART-GASPI Initial implementation using GASPI
– DART-MPI MPI-3 RDMA “workhorse” implementation
| 7DASH
Distributed Data Structures
 DASH offers distributed data structures
– Flexible data distribution schemes
– Example: dash::Array<T>
dash::Array<int> arr(100);
if (dash::myid() == 0) {
for (auto i=0; i < arr.size(); i++)
arr[i] = i;
}
arr.barrier();
if (dash::myid() == 1) {
for (auto e: arr)
cout << (int)e << " ";
cout << endl;
}
DASH global array of 100 integers,
distributed over all units,
default distribution is BLOCKED
unit 0 writes to the array using the
global index i. Operator [] is
overloaded for the dash::Array.
$ mpirun -n 4 ./array
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53
54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70
71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87
88 89 90 91 92 93 94 95 96 97 98 99
unit 1 executes a range-based for loop
over the DASH array
| 8DASH
Accessing Local Data
 Access to local ranges in container:
local-view proxy .local
dash::Array<int> arr(100);
for (int i=0; i < arr.lsize(); i++)
arr.local[i] = dash::myid();
arr.barrier();
if (dash::myid() == 0) {
for (auto e: arr)
cout << (int)e << " ";
cout << endl;
}
$ mpirun -n 4 ./array
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
3 3 3 3
.local is a proxy object that
represents the part of the data
that is local to a unit.
 no communication here!
.lsize() is short hand for
.local.size() and returns the
number of local elements
| 9DASH
Efficient Local Access
 IGUPs Benchmark: independent parallel updates
| 10DASH
Using STL Algorithms
int main(int argc, char* argv[])
{
dash::init(&argc, &argv);
dash::Array<int> a(1000);
if (dash::myid() == 0) {
// global iterators and std. algorithms
std::sort(a.begin(), a.end());
}
// local access using local iterators
std::fill(a.lbegin(), a.lend(), 23+dash::myid());
dash::finalize();
}
Collective constructor, all
units involved
Standard library algorithms
work with DASH global
iterator ranges …
… as well as DASH local
iterator ranges
 STL algorithms can be used with DASH containers
… on both local and global ranges
| 11DASH
DASH Distributed Data Structures Overview
Container Description Data distribution
Array<T> 1D Array static, configurable
NArray<T, N> N-dim. Array static, configurable
Shared<T> Shared scalar fixed (at 0)
Directory(*)<T> Variable-size,
locally indexed
array
manual,
load-balanced
List<T> Variable-size
linked list
dynamic,
load-balanced
Map<T> Variable-size
associative map
dynamic, balanced
by hash function
(*) Under construction
| 12DASH
Data Distribution Patterns
 Data distribution patterns are configurable
 Assume 4 units
 Custom data distributions:
– just implement the DASH Pattern concept
dash::Array<int> arr1(20); // default: BLOCKED
dash::Array<int> arr2(20, dash::BLOCKED)
dash::Array<int> arr3(20, dash::CYCLIC)
dash::Array<int> arr4(20, dash::BLOCKCYCLIC(3))
// use your own data distribution:
dash::Array<int, MyPattern> arr5(20, MyPattern(…))
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
0 8 12 164 51 13 179 102 6 18143 7 11 15 19
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
BLOCKED
CYCLIC
BLOCKCYCLIC(3)
arr1, arr2
arr3
arr4
| 13DASH
Multidimensional Data Distribution (1)
 dash::Pattern<N> specifies N-dim data distribution
– Blocked, cyclic, and block-cyclic in multiple dimensions
Pattern<2>(20, 15)
(BLOCKED,
NONE)
(NONE,
BLOCKCYCLIC(2))
(BLOCKED,
BLOCKCYCLIC(3))
Extent in first and
second dimension
Distribution in first and
second dimension
| 14DASH
Multidimensional Data Distribution (2)
 Example: tiled and tile-shifted data distribution
(TILE(4), TILE(3))
ShiftTilePattern<2>(32, 24)TilePattern<2, COL_MAJOR>(20, 15)
(TILE(5), TILE(5))
| 15DASH
The N-Dimensional Array
 Distributed Multidimensional Array Abstraction
dash::NArray (dash::Matrix)
– Dimension is a template parameter
– Element access using coordinates or linear index
– Support for custom index types
– Support for row-major and column-major storage
dash::NArray<int, 2> mat(40, 30); // 1200 elements
int a = mat(i,j); // Fortran style access
int b = mat[i][j]; // chained subscripts
auto loc = mat.local;
int c = mat.local[i][j];
int d = *(mat.local.begin()); // local iterator
| 16DASH
Multidimensional Views
 Lightweight Multidimensional Views
// 8x8 2D array
dash::NArray<int, 2> mat(8,8);
// linear access using iterators
dash::distance(mat.begin(), mat.end()) == 64
// create 2x5 region view
auto reg = matrix.cols(2,5).rows(3,2);
// region can be used just like 2D array
cout << reg[1][2] << endl; // ‘7’
dash::distance(reg.begin(), reg.end()) == 10
| 17DASH
DASH Algorithms (1)
 Growing number of DASH equivalents for STL algorithms:
 Examples of STL algorithms ported to DASH
(which also work for multidimensional ranges!)
– dash::copy range[i] <- range2[i]
– dash::fill range[i] <- val
– dash::generate range[i] <- func()
– dash::for_each func(range[i])
– dash::transform range[i] = func(range2[i])
– dash::accumulate sum(range[i]) (0<=i<=n-1)
– dash::min_element min(range[i]) (0<=i<=n-1)
dash::GlobIter<T> dash::fill(GlobIter<T> begin,
GlobIter<T> end,
T val);
| 18DASH
DASH Algorithms (2)
 Example: Find the min. element in a distributed array
dash::Array<int> arr(100, dash::BLOCKED);
// ...
auto min = dash::min_element(arr.begin(),
arr.end());
if (dash::myid() == 0) {
cout << “Global minimum: “ << (int)*min
<< endl;
}
Collective call,
returns global pointer
to min. element
 reduce results of
std::min_element
of all local ranges
 Features
– Still works when using CYCLIC or any other distribution
– Still works when using a range other than [begin, end)
| 19DASH
Performance of dash::min_element() (int)
| 20DASH
Asynchronous Communication
 Realized via two mechanisms:
– Asynchronous execution policy, e.g. dash::copy_async()
– .async proxy object on DASH containers
for (i = dash::myid(); i < NUPDATE; i += dash::size()) {
ran = (ran << 1) ^ (((int64_t) ran < 0) ? POLY : 0);
// using .async proxy object
// .async proxy object for async. communication
Table.async[ran & (TableSize-1)] ^= ran;
}
Table.flush();
// async. copy of global range to local memory
std::vector<int> lcopy(block.size());
auto fut = dash::copy_async(block.begin(), block.end(),
lcopy.begin());
...
fut.wait();
| 21DASH
Hierarchical Machines
 Machines are getting increasingly hierarchical
– Both within nodes and between nodes
– Data locality is the most crucial factor for performance and
energy efficiency
Hierarchical locality not well supported by current
approaches. PGAS languages usually only offer a two-level
differentiation (local vs. remote).
| 22DASH
dash::Team& t0 = dash::Team::All();
dash::Array<int> arr1(100);
// same as
dash::Array<int> arr2(100, t0);
// allocate array over t1
auto t1 = t0.split(2)
dash::Array<int> arr3(100, t1);
Teams
 Teams are everywhere in DASH (but not always visible)
– Team: ordered set of units
– Default team: dash::Team::All() - all units that exist at startup
– team.split(N) – split an existing team into N sub-teams
DART_TEAM_ALL
{0,…,7}
Node 0 {0,…,3} Node 1 {4,…,7}
ND 0 {0,1} ND 1 {2,3} ND 0 {4,5} ND 1 {6,7}
ID=2
ID=0
ID=1
ID=2 ID=3 ID=3 ID=4
| 25DASH
Porting LULESH
 LULESH: A shock-hydrodynamics mini-application
class Domain // local data
{
private:
std::vector<Real_t> m_x;
// many more...
public:
Domain() { // C’tor
m_x.resize(numNodes);
//...
}
Real_t& x(Index_t idx) {
return m_x[idx];
}
};
class Domain // global data
{
private:
dash::NArray<Real_t, 3> m_x;
// many more...
public:
Domain() { // C’tor
dash::Pattern<3> nodePat(
nx()*px(), ny()*py(), nz()*pz(),
BLOCKED, BLOCKED, BLOCKED);
m_x.allocate(nodePat);
}
Real_t& x(Index_t idx) {
return m_x.lbegin()[idx];
}
};
MPI Version DASH Version
With DASH, you specify data distribution
once and explicitly instead of implicitly
and repeatedly in your algorithm!
| 26DASH
Porting LULESH (goals)
 Easy to remove limitations of MPI domain decomposition
– MPI version: cubic number of MPI processes only (P x P x P),
cubic per-processor grid required
– DASH port: allows any P x Q x R configuration
 Avoid replication, manual index calculation, bookkeeping
manual index calculation
manual bookkeeping
implicit assumptions
… replicated about 80x in MPI code
| 27DASH
Performance of the DASH Version (using put)
 Performance and scalability (weak scaling) of LULESH,
implemented in MPI and DASH
Karl Fürlinger, Tobias Fuchs, and Roger Kowalewski. DASH: A C++ PGAS Library for Distributed Data
Structures and Parallel Algorithms. In Proceedings of the 18th IEEE International Conference on High
Performance Computing and Communications (HPCC 2016). Sydney, Australia, December 2016.
| 28DASH
DASH: On-going and Future Work
DASH Runtime (DART)
DASH C++ Template Library
DASH Application
ToolsandInterfaces
Hardware: Network, Processor,
Memory, Storage
One-sided Communication
Substrate
MPI GASnet GASPIARMCI
DART API
Task-based
execution
LULESH port
DASH algorithms
DASH Halo Matrix
DASH for graph
problems
Topology
discovery,
multilevel
locality
Memory spaces
(NVRAM and HBW)
Dynamic
data structures
| 29DASH
Summary
 DASH is
– a complete data-oriented PGAS
programming system: entire applications
can be written in DASH
– a library that provides distributed data
structures: DASH can be integrated into
existing MPI applications
 More information
– Upcoming talk on multi-dimensional array:
Session 9-B in 30min!
– http://www.dash-project.org/
– http://github.com/dash-project/
| 30DASH
Acknowledgements
DASH on GitHub:
https://github.com/dash-project/dash/
 Funding
 The DASH Team
T. Fuchs (LMU), R. Kowalewski (LMU), D. Hünich (TUD), A. Knüpfer
(TUD), J. Gracia (HLRS), C. Glass (HLRS), H. Zhou (HLRS), K. Idrees
(HLRS), J. Schuchart (HLRS), F. Mößbauer (LMU), K. Fürlinger (LMU)

Weitere ähnliche Inhalte

Was ist angesagt?

Opensource gis development - part 4
Opensource gis development - part 4Opensource gis development - part 4
Opensource gis development - part 4Andrea Antonello
 
Variables in matlab
Variables in matlabVariables in matlab
Variables in matlabTUOS-Sam
 
Social network-analysis-in-python
Social network-analysis-in-pythonSocial network-analysis-in-python
Social network-analysis-in-pythonJoe OntheRocks
 
Lecture 14 data structures and algorithms
Lecture 14 data structures and algorithmsLecture 14 data structures and algorithms
Lecture 14 data structures and algorithmsAakash deep Singhal
 
How Matlab Helps
How Matlab HelpsHow Matlab Helps
How Matlab Helpssiufu
 
Graph Regularised Hashing
Graph Regularised HashingGraph Regularised Hashing
Graph Regularised HashingSean Moran
 
"contour.py" module
"contour.py" module"contour.py" module
"contour.py" moduleArulalan T
 
Linear models
Linear modelsLinear models
Linear modelsFAO
 
Md2k 0219 shang
Md2k 0219 shangMd2k 0219 shang
Md2k 0219 shangBBKuhn
 
Graph Analyses with Python and NetworkX
Graph Analyses with Python and NetworkXGraph Analyses with Python and NetworkX
Graph Analyses with Python and NetworkXBenjamin Bengfort
 
Project - Deep Locality Sensitive Hashing
Project - Deep Locality Sensitive HashingProject - Deep Locality Sensitive Hashing
Project - Deep Locality Sensitive HashingGabriele Angeletti
 
Edit/correct India Map In Cdat Documentation - With Edited World Map Data
Edit/correct India Map In Cdat  Documentation -  With Edited World Map Data Edit/correct India Map In Cdat  Documentation -  With Edited World Map Data
Edit/correct India Map In Cdat Documentation - With Edited World Map Data Arulalan T
 
Introduction to Mahout
Introduction to MahoutIntroduction to Mahout
Introduction to MahoutTed Dunning
 
Algo labpresentation a_group
Algo labpresentation a_groupAlgo labpresentation a_group
Algo labpresentation a_groupUmme habiba
 

Was ist angesagt? (20)

Lecture set 5
Lecture set 5Lecture set 5
Lecture set 5
 
Locality sensitive hashing
Locality sensitive hashingLocality sensitive hashing
Locality sensitive hashing
 
Opensource gis development - part 4
Opensource gis development - part 4Opensource gis development - part 4
Opensource gis development - part 4
 
Variables in matlab
Variables in matlabVariables in matlab
Variables in matlab
 
Social network-analysis-in-python
Social network-analysis-in-pythonSocial network-analysis-in-python
Social network-analysis-in-python
 
Lecture 14 data structures and algorithms
Lecture 14 data structures and algorithmsLecture 14 data structures and algorithms
Lecture 14 data structures and algorithms
 
Computer Science Assignment Help
Computer Science Assignment Help Computer Science Assignment Help
Computer Science Assignment Help
 
How Matlab Helps
How Matlab HelpsHow Matlab Helps
How Matlab Helps
 
Graph Regularised Hashing
Graph Regularised HashingGraph Regularised Hashing
Graph Regularised Hashing
 
"contour.py" module
"contour.py" module"contour.py" module
"contour.py" module
 
Linear models
Linear modelsLinear models
Linear models
 
Md2k 0219 shang
Md2k 0219 shangMd2k 0219 shang
Md2k 0219 shang
 
Graph Analyses with Python and NetworkX
Graph Analyses with Python and NetworkXGraph Analyses with Python and NetworkX
Graph Analyses with Python and NetworkX
 
Vector in R
Vector in RVector in R
Vector in R
 
Project - Deep Locality Sensitive Hashing
Project - Deep Locality Sensitive HashingProject - Deep Locality Sensitive Hashing
Project - Deep Locality Sensitive Hashing
 
Edit/correct India Map In Cdat Documentation - With Edited World Map Data
Edit/correct India Map In Cdat  Documentation -  With Edited World Map Data Edit/correct India Map In Cdat  Documentation -  With Edited World Map Data
Edit/correct India Map In Cdat Documentation - With Edited World Map Data
 
Parallel Algorithms
Parallel AlgorithmsParallel Algorithms
Parallel Algorithms
 
Introduction to Mahout
Introduction to MahoutIntroduction to Mahout
Introduction to Mahout
 
Algo labpresentation a_group
Algo labpresentation a_groupAlgo labpresentation a_group
Algo labpresentation a_group
 
Parallel algorithm in linear algebra
Parallel algorithm in linear algebraParallel algorithm in linear algebra
Parallel algorithm in linear algebra
 

Ähnlich wie C++ PGAS Library for Distributed Data Structures and Parallel Algorithms

Apache Spark: What? Why? When?
Apache Spark: What? Why? When?Apache Spark: What? Why? When?
Apache Spark: What? Why? When?Massimo Schenone
 
Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentationateeq ateeq
 
Mastering Hadoop Map Reduce - Custom Types and Other Optimizations
Mastering Hadoop Map Reduce - Custom Types and Other OptimizationsMastering Hadoop Map Reduce - Custom Types and Other Optimizations
Mastering Hadoop Map Reduce - Custom Types and Other Optimizationsscottcrespo
 
NVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA Japan
 
Transformations and actions a visual guide training
Transformations and actions a visual guide trainingTransformations and actions a visual guide training
Transformations and actions a visual guide trainingSpark Summit
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache SparkDatio Big Data
 
Options and trade offs for parallelism and concurrency in Modern C++
Options and trade offs for parallelism and concurrency in Modern C++Options and trade offs for parallelism and concurrency in Modern C++
Options and trade offs for parallelism and concurrency in Modern C++Satalia
 
Sandy Ryza – Software Engineer, Cloudera at MLconf ATL
Sandy Ryza – Software Engineer, Cloudera at MLconf ATLSandy Ryza – Software Engineer, Cloudera at MLconf ATL
Sandy Ryza – Software Engineer, Cloudera at MLconf ATLMLconf
 
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop ClustersHDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop ClustersXiao Qin
 
Spark 计算模型
Spark 计算模型Spark 计算模型
Spark 计算模型wang xing
 
Automatic Task-based Code Generation for High Performance DSEL
Automatic Task-based Code Generation for High Performance DSELAutomatic Task-based Code Generation for High Performance DSEL
Automatic Task-based Code Generation for High Performance DSELJoel Falcou
 
Time Series Meetup: Virtual Edition | July 2020
Time Series Meetup: Virtual Edition | July 2020Time Series Meetup: Virtual Edition | July 2020
Time Series Meetup: Virtual Edition | July 2020InfluxData
 
No more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in productionNo more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in productionChetan Khatri
 
MapReduceAlgorithms.ppt
MapReduceAlgorithms.pptMapReduceAlgorithms.ppt
MapReduceAlgorithms.pptCheeWeiTan10
 
Eclipse Con Europe 2014 How to use DAWN Science Project
Eclipse Con Europe 2014 How to use DAWN Science ProjectEclipse Con Europe 2014 How to use DAWN Science Project
Eclipse Con Europe 2014 How to use DAWN Science ProjectMatthew Gerring
 
Map reduce in Hadoop BIG DATA ANALYTICS
Map reduce in Hadoop BIG DATA ANALYTICSMap reduce in Hadoop BIG DATA ANALYTICS
Map reduce in Hadoop BIG DATA ANALYTICSArchana Gopinath
 
Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul JindalOverview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul JindalArvind Surve
 

Ähnlich wie C++ PGAS Library for Distributed Data Structures and Parallel Algorithms (20)

Apache Spark: What? Why? When?
Apache Spark: What? Why? When?Apache Spark: What? Why? When?
Apache Spark: What? Why? When?
 
Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentation
 
R Language Introduction
R Language IntroductionR Language Introduction
R Language Introduction
 
Mastering Hadoop Map Reduce - Custom Types and Other Optimizations
Mastering Hadoop Map Reduce - Custom Types and Other OptimizationsMastering Hadoop Map Reduce - Custom Types and Other Optimizations
Mastering Hadoop Map Reduce - Custom Types and Other Optimizations
 
NVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読み
 
Transformations and actions a visual guide training
Transformations and actions a visual guide trainingTransformations and actions a visual guide training
Transformations and actions a visual guide training
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Options and trade offs for parallelism and concurrency in Modern C++
Options and trade offs for parallelism and concurrency in Modern C++Options and trade offs for parallelism and concurrency in Modern C++
Options and trade offs for parallelism and concurrency in Modern C++
 
Sandy Ryza – Software Engineer, Cloudera at MLconf ATL
Sandy Ryza – Software Engineer, Cloudera at MLconf ATLSandy Ryza – Software Engineer, Cloudera at MLconf ATL
Sandy Ryza – Software Engineer, Cloudera at MLconf ATL
 
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop ClustersHDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
 
Spark 计算模型
Spark 计算模型Spark 计算模型
Spark 计算模型
 
Automatic Task-based Code Generation for High Performance DSEL
Automatic Task-based Code Generation for High Performance DSELAutomatic Task-based Code Generation for High Performance DSEL
Automatic Task-based Code Generation for High Performance DSEL
 
Time Series Meetup: Virtual Edition | July 2020
Time Series Meetup: Virtual Edition | July 2020Time Series Meetup: Virtual Edition | July 2020
Time Series Meetup: Virtual Edition | July 2020
 
No more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in productionNo more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in production
 
MapReduceAlgorithms.ppt
MapReduceAlgorithms.pptMapReduceAlgorithms.ppt
MapReduceAlgorithms.ppt
 
An Intoduction to R
An Intoduction to RAn Intoduction to R
An Intoduction to R
 
Eclipse Con Europe 2014 How to use DAWN Science Project
Eclipse Con Europe 2014 How to use DAWN Science ProjectEclipse Con Europe 2014 How to use DAWN Science Project
Eclipse Con Europe 2014 How to use DAWN Science Project
 
Pune Clojure Course Outline
Pune Clojure Course OutlinePune Clojure Course Outline
Pune Clojure Course Outline
 
Map reduce in Hadoop BIG DATA ANALYTICS
Map reduce in Hadoop BIG DATA ANALYTICSMap reduce in Hadoop BIG DATA ANALYTICS
Map reduce in Hadoop BIG DATA ANALYTICS
 
Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul JindalOverview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal
 

Kürzlich hochgeladen

Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
Active Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfActive Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfCionsystems
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️anilsa9823
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendArshad QA
 
Clustering techniques data mining book ....
Clustering techniques data mining book ....Clustering techniques data mining book ....
Clustering techniques data mining book ....ShaimaaMohamedGalal
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 

Kürzlich hochgeladen (20)

Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Active Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfActive Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdf
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and Backend
 
Clustering techniques data mining book ....
Clustering techniques data mining book ....Clustering techniques data mining book ....
Clustering techniques data mining book ....
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 

C++ PGAS Library for Distributed Data Structures and Parallel Algorithms

  • 1. DASH: A C++ PGAS Library for Distributed Data Structures and Parallel Algorithms www.dash-project.org Tobias Fuchs fuchst@nm.ifi.lmu.de Ludwig-Maximilians-Universität München, MNM-Team
  • 2. | 2DASH DASH - Overview  DASH is a C++ template library that offers – distributed data structures and parallel algorithms – a complete PGAS (part. global address space) programming system without a custom (pre-)compiler  PGAS Terminology – ShMem Analogy Unit: The individual participants in a DASH program, usually full OS processes. Private Shared Unit 0 Unit 1 Unit N-1 int b; int c; dash::Array a(1000); int a; … dash::Shared s; 10..190..9 ..999 Shared data: managed by DASH in a virtual global address space Private data: managed by regular C/C++ mechanisms
  • 3. | 3DASH DASH - Partitioned Global Address Space  Data affinity – data has well-defined owner but can be accessed by any unit – data locality important for performance – support for the owner computes execution model  DASH: – unified access to local and remote data in global memory space
  • 4. | 4DASH DASH - Partitioned Global Address Space  Data affinity – data has well-defined owner but can be accessed by any unit – data locality important for performance – support for the owner computes execution model  DASH: – unified access to local and remote data in global memory space – and explicit views on local memory space
  • 5. | 5DASH DASH Project Structure Phase I (2013-2015) Phase II (2016-2018) LMU Munich Project lead, C++ template library Project lead, C++ template library, data dock TU Dresden Libraries and interfaces, tools Smart data structures, resilience HLRS Stuttgart DART runtime DART runtime KIT Karlsruhe Application studies IHR Stuttgart Smart deployment, Application studies
  • 6. | 6DASH DART: The DASH Runtime  The DART Interface – plain C (99) interface – SPMD execution model – global memory abstraction – one-sided RDMA operations and synchronization communication – topology discovery, locality hierarchy  Several implementations – DART-SHMEM shared-memory based implementation – DART-CUDA supports GPUs, based on DART-SHMEM – DART-GASPI Initial implementation using GASPI – DART-MPI MPI-3 RDMA “workhorse” implementation
  • 7. | 7DASH Distributed Data Structures  DASH offers distributed data structures – Flexible data distribution schemes – Example: dash::Array<T> dash::Array<int> arr(100); if (dash::myid() == 0) { for (auto i=0; i < arr.size(); i++) arr[i] = i; } arr.barrier(); if (dash::myid() == 1) { for (auto e: arr) cout << (int)e << " "; cout << endl; } DASH global array of 100 integers, distributed over all units, default distribution is BLOCKED unit 0 writes to the array using the global index i. Operator [] is overloaded for the dash::Array. $ mpirun -n 4 ./array 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 unit 1 executes a range-based for loop over the DASH array
  • 8. | 8DASH Accessing Local Data  Access to local ranges in container: local-view proxy .local dash::Array<int> arr(100); for (int i=0; i < arr.lsize(); i++) arr.local[i] = dash::myid(); arr.barrier(); if (dash::myid() == 0) { for (auto e: arr) cout << (int)e << " "; cout << endl; } $ mpirun -n 4 ./array 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 .local is a proxy object that represents the part of the data that is local to a unit.  no communication here! .lsize() is short hand for .local.size() and returns the number of local elements
  • 9. | 9DASH Efficient Local Access  IGUPs Benchmark: independent parallel updates
  • 10. | 10DASH Using STL Algorithms int main(int argc, char* argv[]) { dash::init(&argc, &argv); dash::Array<int> a(1000); if (dash::myid() == 0) { // global iterators and std. algorithms std::sort(a.begin(), a.end()); } // local access using local iterators std::fill(a.lbegin(), a.lend(), 23+dash::myid()); dash::finalize(); } Collective constructor, all units involved Standard library algorithms work with DASH global iterator ranges … … as well as DASH local iterator ranges  STL algorithms can be used with DASH containers … on both local and global ranges
  • 11. | 11DASH DASH Distributed Data Structures Overview Container Description Data distribution Array<T> 1D Array static, configurable NArray<T, N> N-dim. Array static, configurable Shared<T> Shared scalar fixed (at 0) Directory(*)<T> Variable-size, locally indexed array manual, load-balanced List<T> Variable-size linked list dynamic, load-balanced Map<T> Variable-size associative map dynamic, balanced by hash function (*) Under construction
  • 12. | 12DASH Data Distribution Patterns  Data distribution patterns are configurable  Assume 4 units  Custom data distributions: – just implement the DASH Pattern concept dash::Array<int> arr1(20); // default: BLOCKED dash::Array<int> arr2(20, dash::BLOCKED) dash::Array<int> arr3(20, dash::CYCLIC) dash::Array<int> arr4(20, dash::BLOCKCYCLIC(3)) // use your own data distribution: dash::Array<int, MyPattern> arr5(20, MyPattern(…)) 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 0 8 12 164 51 13 179 102 6 18143 7 11 15 19 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 BLOCKED CYCLIC BLOCKCYCLIC(3) arr1, arr2 arr3 arr4
  • 13. | 13DASH Multidimensional Data Distribution (1)  dash::Pattern<N> specifies N-dim data distribution – Blocked, cyclic, and block-cyclic in multiple dimensions Pattern<2>(20, 15) (BLOCKED, NONE) (NONE, BLOCKCYCLIC(2)) (BLOCKED, BLOCKCYCLIC(3)) Extent in first and second dimension Distribution in first and second dimension
  • 14. | 14DASH Multidimensional Data Distribution (2)  Example: tiled and tile-shifted data distribution (TILE(4), TILE(3)) ShiftTilePattern<2>(32, 24)TilePattern<2, COL_MAJOR>(20, 15) (TILE(5), TILE(5))
  • 15. | 15DASH The N-Dimensional Array  Distributed Multidimensional Array Abstraction dash::NArray (dash::Matrix) – Dimension is a template parameter – Element access using coordinates or linear index – Support for custom index types – Support for row-major and column-major storage dash::NArray<int, 2> mat(40, 30); // 1200 elements int a = mat(i,j); // Fortran style access int b = mat[i][j]; // chained subscripts auto loc = mat.local; int c = mat.local[i][j]; int d = *(mat.local.begin()); // local iterator
  • 16. | 16DASH Multidimensional Views  Lightweight Multidimensional Views // 8x8 2D array dash::NArray<int, 2> mat(8,8); // linear access using iterators dash::distance(mat.begin(), mat.end()) == 64 // create 2x5 region view auto reg = matrix.cols(2,5).rows(3,2); // region can be used just like 2D array cout << reg[1][2] << endl; // ‘7’ dash::distance(reg.begin(), reg.end()) == 10
  • 17. | 17DASH DASH Algorithms (1)  Growing number of DASH equivalents for STL algorithms:  Examples of STL algorithms ported to DASH (which also work for multidimensional ranges!) – dash::copy range[i] <- range2[i] – dash::fill range[i] <- val – dash::generate range[i] <- func() – dash::for_each func(range[i]) – dash::transform range[i] = func(range2[i]) – dash::accumulate sum(range[i]) (0<=i<=n-1) – dash::min_element min(range[i]) (0<=i<=n-1) dash::GlobIter<T> dash::fill(GlobIter<T> begin, GlobIter<T> end, T val);
  • 18. | 18DASH DASH Algorithms (2)  Example: Find the min. element in a distributed array dash::Array<int> arr(100, dash::BLOCKED); // ... auto min = dash::min_element(arr.begin(), arr.end()); if (dash::myid() == 0) { cout << “Global minimum: “ << (int)*min << endl; } Collective call, returns global pointer to min. element  reduce results of std::min_element of all local ranges  Features – Still works when using CYCLIC or any other distribution – Still works when using a range other than [begin, end)
  • 19. | 19DASH Performance of dash::min_element() (int)
  • 20. | 20DASH Asynchronous Communication  Realized via two mechanisms: – Asynchronous execution policy, e.g. dash::copy_async() – .async proxy object on DASH containers for (i = dash::myid(); i < NUPDATE; i += dash::size()) { ran = (ran << 1) ^ (((int64_t) ran < 0) ? POLY : 0); // using .async proxy object // .async proxy object for async. communication Table.async[ran & (TableSize-1)] ^= ran; } Table.flush(); // async. copy of global range to local memory std::vector<int> lcopy(block.size()); auto fut = dash::copy_async(block.begin(), block.end(), lcopy.begin()); ... fut.wait();
  • 21. | 21DASH Hierarchical Machines  Machines are getting increasingly hierarchical – Both within nodes and between nodes – Data locality is the most crucial factor for performance and energy efficiency Hierarchical locality not well supported by current approaches. PGAS languages usually only offer a two-level differentiation (local vs. remote).
  • 22. | 22DASH dash::Team& t0 = dash::Team::All(); dash::Array<int> arr1(100); // same as dash::Array<int> arr2(100, t0); // allocate array over t1 auto t1 = t0.split(2) dash::Array<int> arr3(100, t1); Teams  Teams are everywhere in DASH (but not always visible) – Team: ordered set of units – Default team: dash::Team::All() - all units that exist at startup – team.split(N) – split an existing team into N sub-teams DART_TEAM_ALL {0,…,7} Node 0 {0,…,3} Node 1 {4,…,7} ND 0 {0,1} ND 1 {2,3} ND 0 {4,5} ND 1 {6,7} ID=2 ID=0 ID=1 ID=2 ID=3 ID=3 ID=4
  • 23. | 25DASH Porting LULESH  LULESH: A shock-hydrodynamics mini-application class Domain // local data { private: std::vector<Real_t> m_x; // many more... public: Domain() { // C’tor m_x.resize(numNodes); //... } Real_t& x(Index_t idx) { return m_x[idx]; } }; class Domain // global data { private: dash::NArray<Real_t, 3> m_x; // many more... public: Domain() { // C’tor dash::Pattern<3> nodePat( nx()*px(), ny()*py(), nz()*pz(), BLOCKED, BLOCKED, BLOCKED); m_x.allocate(nodePat); } Real_t& x(Index_t idx) { return m_x.lbegin()[idx]; } }; MPI Version DASH Version With DASH, you specify data distribution once and explicitly instead of implicitly and repeatedly in your algorithm!
  • 24. | 26DASH Porting LULESH (goals)  Easy to remove limitations of MPI domain decomposition – MPI version: cubic number of MPI processes only (P x P x P), cubic per-processor grid required – DASH port: allows any P x Q x R configuration  Avoid replication, manual index calculation, bookkeeping manual index calculation manual bookkeeping implicit assumptions … replicated about 80x in MPI code
  • 25. | 27DASH Performance of the DASH Version (using put)  Performance and scalability (weak scaling) of LULESH, implemented in MPI and DASH Karl Fürlinger, Tobias Fuchs, and Roger Kowalewski. DASH: A C++ PGAS Library for Distributed Data Structures and Parallel Algorithms. In Proceedings of the 18th IEEE International Conference on High Performance Computing and Communications (HPCC 2016). Sydney, Australia, December 2016.
  • 26. | 28DASH DASH: On-going and Future Work DASH Runtime (DART) DASH C++ Template Library DASH Application ToolsandInterfaces Hardware: Network, Processor, Memory, Storage One-sided Communication Substrate MPI GASnet GASPIARMCI DART API Task-based execution LULESH port DASH algorithms DASH Halo Matrix DASH for graph problems Topology discovery, multilevel locality Memory spaces (NVRAM and HBW) Dynamic data structures
  • 27. | 29DASH Summary  DASH is – a complete data-oriented PGAS programming system: entire applications can be written in DASH – a library that provides distributed data structures: DASH can be integrated into existing MPI applications  More information – Upcoming talk on multi-dimensional array: Session 9-B in 30min! – http://www.dash-project.org/ – http://github.com/dash-project/
  • 28. | 30DASH Acknowledgements DASH on GitHub: https://github.com/dash-project/dash/  Funding  The DASH Team T. Fuchs (LMU), R. Kowalewski (LMU), D. Hünich (TUD), A. Knüpfer (TUD), J. Gracia (HLRS), C. Glass (HLRS), H. Zhou (HLRS), K. Idrees (HLRS), J. Schuchart (HLRS), F. Mößbauer (LMU), K. Fürlinger (LMU)