Henri Bal describes how high-performance computing is driven by the demands of large-scale data problems. He also describes his links to other computer science disciplines within the DSRC.
3. Developments
• Multiple types of data explosions:
– Big data: huge processing/transportation demands
– Complex heterogeneous data
[Figure labels: 10-100× global internet traffic per year, exascale processing; complex data]
5. VU HPDC GROUP
• Bridge the gap between demanding applications and complex infrastructure
• Distributed programming systems for
– Clusters, grids, clouds
– Heterogeneous systems ("Jungles")
– Accelerators (GPUs)
– Clouds & mobile devices
• Applications: multimedia, semantic web, model checking, games, astronomy, astrophysics, climate modeling, …
6. Highlights VU-HPDC group
[Figure labels, in slide order:]
• Solved Awari (2002): 889 billion game states
• Multimedia data
• AAAI-VC 2007
• Multimedia data
• Semantic web
• 3rd Prize: ISWC 2008
• Astronomy data
• DACH 2008 - BS
• DACH 2008 - FT
• Semantic web
• 1st Prize: SCALE 2008
• 1st Prize: SCALE 2010
• EYR 2011 Sustainability award
7. Links to data science cycle
[Diagram: data science cycle]
• Cycle stages: Store and process; Analyze and model; Understand and decide
• Disciplines: Visual Analytics; Perception; Cognition; Decision Theory; Distributed reasoning; Distributed Processing; Reasoning; Knowledge representation; Large Scale Databases; Software Eng.; System / Network Eng.; Multimedia Retrieval; Modeling and simulation; Information Retrieval; Machine Learning
8. Reasoning – Semantic Web
• Make the Web smarter by injecting meaning so that machines can "understand" it.
o Initial idea by Tim Berners-Lee in 2001
• Has now attracted the interest of big IT companies
11. Distributed Reasoning
• WebPIE: web-scale distributed reasoner doing full materialization
• QueryPIE: distributed reasoning with backward-chaining + pre-materialization of schema-triples
• DynamiTE: maintains materialization after updates (additions & removals)
Challenge: real-time incremental reasoning on web scale, combining new (streaming) data & existing historic data
With: Jacopo Urbani, Alessandro Margara, Frank van Harmelen
COMMIT/
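To make "full materialization" and "maintaining materialization after additions" concrete, here is a minimal single-machine sketch in Python. It is not the WebPIE or DynamiTE implementation: the function names (`derive`, `add_triples`, `materialize`) are hypothetical, only two RDFS rules are applied (transitivity of `rdfs:subClassOf` and type inheritance), and removals are not handled.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

def derive(delta, triples):
    """Apply two RDFS rules, requiring at least one premise from `delta`."""
    new = set()
    for (s, p, o) in delta:
        for (s2, p2, o2) in triples:
            # Rule 1: subClassOf is transitive (delta triple on either side).
            if p == SUBCLASS and p2 == SUBCLASS:
                if o == s2:
                    new.add((s, SUBCLASS, o2))
                if o2 == s:
                    new.add((s2, SUBCLASS, o))
            # Rule 2: an instance of a class is an instance of its superclasses.
            if p == TYPE and p2 == SUBCLASS and o == s2:
                new.add((s, TYPE, o2))
            if p == SUBCLASS and p2 == TYPE and o2 == s:
                new.add((s2, TYPE, o))
    return new

def add_triples(closure, additions):
    """Extend a materialized closure in place with `additions` and all they entail."""
    delta = set(additions) - closure
    while delta:  # iterate until no new triples appear (fixpoint)
        closure |= delta
        delta = derive(delta, closure) - closure
    return closure

def materialize(triples):
    """Full materialization: compute the closure starting from nothing."""
    return add_triples(set(), triples)

base = {
    ("ex:Cat", SUBCLASS, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS, "ex:Animal"),
    ("ex:tom", TYPE, "ex:Cat"),
}
closure = materialize(base)

# Incremental addition: only the new triple is used as the starting delta.
add_triples(closure, {("ex:felix", TYPE, "ex:Cat")})
```

Starting each round from the newly derived triples only is what keeps an addition cheap compared with recomputing the whole closure. Handling removals needs extra bookkeeping that this sketch leaves out.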
12. Distributed Computing
• Jungle computing with Ibis
– Distributed, heterogeneous, hierarchical systems
• Programming accelerators
With: NLeSC (Frank Seinstra, Rob van Nieuwpoort et al.)
14. Accelerators (GPUs)
[Figure: GPU block diagram; labels: Host Interface, GigaThread Engine, GPC, SM, Polymorph Engine, Raster Engine, Memory Controller, L2 Cache]
• Methodology for efficient GPU programming
– Stepwise refinement, different levels of hardware abstraction
– Compiler feedback at each level
• Use cases
– Multimedia content analysis
– Climate modeling
– LOFAR (pulsar pipelines)
Challenge: getting a grip on performance
15. Glasswing: MapReduce on Accelerators
• Use accelerators (OpenCL) as mainstream feature
• Massive out-of-core data sets
• Scale vertically & horizontally
• Maintain MapReduce abstraction
With: Ismail El Helw, Rutger Hofman, UvA-SNE
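As a reminder of the MapReduce abstraction that the slide says is maintained, here is a minimal in-memory sketch in Python. It is not Glasswing's API and uses no accelerator or OpenCL code; `map_reduce`, `word_count_map`, and `word_count_reduce` are hypothetical names chosen for illustration.

```python
from collections import defaultdict

def map_reduce(records, map_fn, reduce_fn):
    """Run map, group values by key (the shuffle step), then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}

def word_count_map(line):
    # Emit a (word, 1) pair for every word in the input line.
    for word in line.lower().split():
        yield word, 1

def word_count_reduce(word, counts):
    # Sum all counts emitted for the same word.
    return sum(counts)

counts = map_reduce(["a b a", "b c"], word_count_map, word_count_reduce)
```

The user only writes the map and reduce functions. Where and how they run (CPU, GPU, one node or many) is the runtime's concern, which is what allows a system to scale vertically and horizontally without changing the programming model.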
17. Evaluation (DAS-4, EC2)
• Compute-bound applications benefit dramatically from GPUs (up to 107×)
• Better scalability than Hadoop
• Runs on a variety of accelerators & clouds
Challenge: real-world (compute-intensive) applications
18. Conclusions
• Strong links with Big data & Complex data
[Diagram: data science cycle, repeated from slide 7]