DS
RC

Data Science
Research Center

High Performance Distributed Computing
Henri Bal
Vrije Universiteit Amsterdam

Outline

1. Development of the field
2. Highlights VU-HPDC group
3. Links to data science cycle
4. Conclusions

Developments

• Multiple types of data explosions:
– Big data: huge processing/transportation demands
– Complex heterogeneous data

[Figure: Big data, 10–100× global internet traffic per year and exascale processing; Complex data]

Developments

• Infrastructure explosion
– High complexity: heterogeneous systems with a diversity of processors, systems, and networks

VU HPDC GROUP

• Bridge the gap between demanding applications and complex infrastructure
• Distributed programming systems for
– Clusters, grids, clouds
– Heterogeneous systems ("Jungles")
– Accelerators (GPUs)
– Clouds & mobile devices

• Applications: multimedia, semantic web, model checking, games, astronomy, astrophysics, climate modeling, …

Highlights VU-HPDC group

[Figure: timeline of highlights, with areas labeled Multimedia data, Semantic web, and Astronomy data]
• 2002: Solved Awari (889 billion game states)
• AAAI-VC 2007
• 3rd Prize: ISWC 2008
• DACH 2008 - BS, DACH 2008 - FT
• 1st Prize: SCALE 2008
• 1st Prize: SCALE 2010
• EYR 2011 Sustainability award

Links to data science cycle
[Figure: data science cycle with three phases, Store and process, Analyze and model, and Understand and decide. Disciplines shown: Large Scale Databases, Distributed Processing, Software Eng., System / Network Eng., Multimedia Retrieval, Modeling and simulation, Information Retrieval, Machine Learning, Reasoning, Knowledge representation, Visual Analytics, Perception, Cognition, Decision Theory. Also labeled: Distributed reasoning.]

Reasoning – Semantic Web

• Make the Web smarter by injecting meaning so that machines can “understand” it.
– Initial idea by Tim Berners-Lee in 2001
• Has now attracted the interest of big IT companies

Google Example

Distributed Reasoning

• WebPIE: web-scale distributed reasoner that performs full materialization
• QueryPIE: distributed reasoning with backward chaining plus pre-materialization of schema triples
• DynamiTE: maintains the materialization after updates (additions & removals)
→ Challenge: real-time incremental reasoning at web scale, combining new (streaming) data with existing historic data
With: Jacopo Urbani, Alessandro Margara, Frank van Harmelen

COMMIT/
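Full materialization, as WebPIE performs it, means computing every triple the rules can derive before any query arrives; incremental maintenance, as in DynamiTE, means updating that closure when the data changes. The sketch below illustrates both ideas on a single machine, assuming only two RDFS rules (rdfs9 and rdfs11) and plain string tuples as triples. The names `apply_rules`, `add_triples`, and `materialize` are hypothetical; this is not WebPIE or DynamiTE code, it is not distributed, and it handles additions only (removals, which DynamiTE also supports, need extra bookkeeping not shown here). Each round joins only the newly derived triples against the closure, a standard semi-naive evaluation strategy.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]   # (subject, predicate, object)
SUB = "rdfs:subClassOf"
TYPE = "rdf:type"

# Assumed rule subset (two of the RDFS entailment rules):
#   rdfs11: (A SUB B), (B SUB C)  => (A SUB C)
#   rdfs9 : (x TYPE A), (A SUB B) => (x TYPE B)


def apply_rules(delta: Set[Triple], closure: Set[Triple]) -> Set[Triple]:
    """Derive new triples that use at least one premise from `delta`.

    `closure` must already contain `delta`, so pairs of delta triples are covered.
    """
    subclass = [t for t in closure if t[1] == SUB]
    types = [t for t in closure if t[1] == TYPE]
    derived: Set[Triple] = set()
    for s, p, o in delta:
        if p == SUB:
            derived |= {(s, SUB, c) for (b, _, c) in subclass if b == o}   # rdfs11, delta on the left
            derived |= {(a, SUB, o) for (a, _, b) in subclass if b == s}   # rdfs11, delta on the right
            derived |= {(x, TYPE, o) for (x, _, a) in types if a == s}     # rdfs9, delta as schema triple
        elif p == TYPE:
            derived |= {(s, TYPE, b) for (a, _, b) in subclass if a == o}  # rdfs9, delta as instance triple
    return derived - closure


def add_triples(closure: Set[Triple], new: Iterable[Triple]) -> Set[Triple]:
    """Extend a materialized closure in place with new triples (additions only)."""
    delta = set(new) - closure
    closure |= delta
    while delta:  # semi-naive: each round only joins the newest triples
        delta = apply_rules(delta, closure)
        closure |= delta
    return closure


def materialize(triples: Iterable[Triple]) -> Set[Triple]:
    """Full materialization: the fixpoint of the rules over the input."""
    return add_triples(set(), triples)


schema = [("Cat", SUB, "Mammal"), ("Mammal", SUB, "Animal")]
kb = materialize(schema + [("tom", TYPE, "Cat")])
print(("tom", TYPE, "Animal") in kb)       # True
add_triples(kb, [("Animal", SUB, "LivingThing")])
print(("tom", TYPE, "LivingThing") in kb)  # True
```

Starting from an empty closure, `add_triples` is exactly full materialization; calling it on an existing closure only derives the consequences of the new triples instead of recomputing everything from scratch (although this toy version still rescans the closure in each round).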

Distributed Computing
• Jungle computing with Ibis
– Distributed, heterogeneous, hierarchical systems

• Programming accelerators

With: NLeSC (Frank Seinstra, Rob van Nieuwpoort et al.)

Ibis

• Computational Astrophysics (Leiden)
[Figure: AMUSE, with gravitational dynamics, stellar evolution, radiative transport, and hydrodynamics]
• Climate Modeling (Utrecht)
• Multimedia Content Analysis (UvA)

Accelerators (GPUs)
[Figure: GPU architecture block diagram (host interface, GigaThread engine, GPCs containing SMs, polymorph engines, raster engines, L2 cache, memory controllers)]

• Methodology for efficient GPU programming
– Stepwise refinement, different levels of hardware abstraction
– Compiler feedback at each level
→ Challenge: getting a grip on performance

• Use cases
– Multimedia content analysis
– Climate modeling
– LOFAR (pulsar pipelines)

Glasswing: MapReduce on Accelerators

• Use accelerators (OpenCL) as a mainstream feature
• Massive out-of-core data sets
• Scale vertically & horizontally
• Maintain the MapReduce abstraction

With: Ismail El Helw, Rutger Hofman, UvA-SNE
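The MapReduce abstraction that Glasswing maintains can be illustrated with a single-process word count. This is a generic sketch of the programming model only: `map_reduce`, `wc_map`, and `wc_reduce` are hypothetical names, not Glasswing's API, and in Glasswing the map and reduce functions would run on accelerators via OpenCL rather than as Python functions.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Tuple, TypeVar

R = TypeVar("R")                  # input record type
K = TypeVar("K", bound=Hashable)  # intermediate key type
V = TypeVar("V")                  # intermediate value type
O = TypeVar("O")                  # reduced output type


def map_reduce(records: Iterable[R],
               mapper: Callable[[R], Iterable[Tuple[K, V]]],
               reducer: Callable[[K, List[V]], O]) -> Dict[K, O]:
    """Single-process illustration of the map -> shuffle -> reduce model."""
    groups: Dict[K, List[V]] = defaultdict(list)
    for record in records:                 # map phase
        for key, value in mapper(record):
            groups[key].append(value)      # shuffle: group values by key
    return {key: reducer(key, values)      # reduce phase, one call per key
            for key, values in groups.items()}


def wc_map(line: str) -> Iterable[Tuple[str, int]]:
    for word in line.split():
        yield word.lower(), 1


def wc_reduce(word: str, ones: List[int]) -> int:
    return sum(ones)


counts = map_reduce(["the cat", "The dog"], wc_map, wc_reduce)
print(counts)  # {'the': 2, 'cat': 1, 'dog': 1}
```

The user writes only `wc_map` and `wc_reduce`; partitioning, grouping, and (in a system like Glasswing) moving data between disk, hosts, and accelerators are the framework's job, which is what keeping the abstraction buys.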

Glasswing Pipeline

• Overlaps computation, communication & disk access
• Supports multiple buffering levels
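Overlapping disk access, computation, and communication can be sketched with one thread per stage, connected by bounded queues; a queue bound of 2 behaves roughly like double buffering, and larger bounds add buffering levels. This is an illustrative assumption about how such a pipeline can be structured, not Glasswing's implementation, and the stage functions in the demo (`read_chunk`, `compute_chunk`, `send_chunk`) are hypothetical stand-ins for disk reads, an accelerator kernel, and network sends.

```python
import queue
import threading
from typing import Any, Callable, Iterable, List

_DONE = object()  # sentinel that marks the end of the stream


def _stage(fn: Callable[[Any], Any], inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Apply `fn` to every item from `inbox` and forward results to `outbox`."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)   # propagate end-of-stream downstream
            return
        outbox.put(fn(item))    # blocks while the downstream buffer is full


def run_pipeline(chunk_ids: Iterable[int],
                 read: Callable[[int], bytes],
                 compute: Callable[[bytes], bytes],
                 send: Callable[[bytes], int],
                 buffers: int = 2) -> List[int]:
    """Run read -> compute -> send with the three stages overlapping in time.

    Adjacent stages share a queue holding at most `buffers` chunks, so a fast
    stage can run ahead of a slow one by a bounded amount before it blocks.
    """
    ids: queue.Queue = queue.Queue()
    raw: queue.Queue = queue.Queue(maxsize=buffers)
    processed: queue.Queue = queue.Queue(maxsize=buffers)
    done: queue.Queue = queue.Queue()
    threads = [
        threading.Thread(target=_stage, args=(read, ids, raw)),           # disk access
        threading.Thread(target=_stage, args=(compute, raw, processed)),  # computation
        threading.Thread(target=_stage, args=(send, processed, done)),    # communication
    ]
    for t in threads:
        t.start()
    for cid in chunk_ids:
        ids.put(cid)
    ids.put(_DONE)
    results: List[int] = []
    while (item := done.get()) is not _DONE:
        results.append(item)
    for t in threads:
        t.join()
    return results


# Hypothetical stand-ins for disk I/O, an accelerator kernel, and a network send.
def read_chunk(i: int) -> bytes:
    return bytes([i % 256]) * 4


def compute_chunk(data: bytes) -> bytes:
    return bytes(b ^ 0xFF for b in data)


def send_chunk(data: bytes) -> int:
    return sum(data)  # pretend the "send" returns a checksum


print(run_pipeline(range(3), read_chunk, compute_chunk, send_chunk))  # [1020, 1016, 1012]
```

Because each stage has a single thread and the queues are FIFO, results come out in input order. In CPython, threads overlap well when stages block on I/O, but CPU-bound stages written in pure Python do not execute in parallel with each other, which is one reason to offload the compute stage to an accelerator.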

Evaluation (DAS-4, EC2)

• Compute-bound applications benefit dramatically from GPUs (up to 107×)
• Better scalability than Hadoop
• Runs on a variety of accelerators & clouds

→ Challenge: real-world (compute-intensive) applications

Conclusions

• Strong links with Big data & Complex data
[Figure: data science cycle with three phases, Store and process, Analyze and model, and Understand and decide. Disciplines shown: Large Scale Databases, Distributed Processing, Software Eng., System / Network Eng., Multimedia Retrieval, Modeling and simulation, Information Retrieval, Machine Learning, Reasoning, Knowledge representation, Visual Analytics, Perception, Cognition, Decision Theory.]
