SlideShare ist ein Scribd-Unternehmen logo
1 von 37
Downloaden Sie, um offline zu lesen
SDVIS AND IN-SITU VISUALIZATION
ON TACC’S STAMPEDE-KNL
Paul A. Navrátil, Ph.D.
Manager – Scalable Visualization Technologies
pnav@tacc.utexas.edu
1
2High-Fidelity Visualization Natively on Xeon and Xeon Phi
OUTLINE
„ Stampede Architecture
„ Stampede – Sandy Bridge
„ Stampede - KNL
„ Stampede 2 – KNL + Skylake
„ Software-Defined Visualization Stack
„ VNC
„ OpenSWR
„ OSPRay
„ Path to In-Situ
„ ParaView Catalyst
„ VisIt Libsim
„ Direct renderer integration
3
4
STAMPEDE ARCHITECTURE
STAMPEDE SANDY BRIDGE
„ Mellanox FDR Interconnect
„ 6400 compute nodes, each with:
„ 2x Intel Xeon E5-2680
“Sandy Bridge”
„ 1x Intel Xeon Phi SE10P
„ 32 GB RAM / 8 GB Phi RAM
„ 16 Large Memory nodes, each with:
„ 4x Intel Xeon E5-4650 “Sandy Bridge”
„ 2x NVIDIA Quadro 2000 “Fermi”
„ 1 TB RAM
„ 128 GPU nodes, each with:
„ 2x Intel Xeon E5-2680
„ 1x Intel Xeon Phi SE10P
„ 1x NVIDIA Tesla K20 “Kepler”
„ 32 GB RAM
5
STAMPEDE KNL
„ Deployed in 2016 as planned
upgrade to Stampede
„ First KNL-based system in Top500
„ Intel OmniPath interconnect
„ 508 nodes, each with:
„ 1x Intel Xeon Phi 7250
„ 96 GB RAM + 16 GB MCDRAM
„ Notes:
„ Shared $WORK and $SCRATCH
Separate $HOME directories
„ Separate Login Node
login-knl1.stampede.tacc.utexas.edu
„ Login is Intel Xeon E5-2695
“Haswell”
„ Compile on compute node
or use “–xMIC-AVX512” on login
„ “normal” and ”development”
queues are Cache-Quadrant
„ Other MCDRAM configs available
by queue name
6
7
STAMPEDE 2 (COMING 2017)
„ ~18 PF Dell Intel Xeon + Intel Xeon Phi system
„ Combine KNL + Skylake + OmniPath + 3D XPoint
„ Phase 1: Spring 2017
„ Stampede KNL + 4200 new KNL nodes + new filesystem
„ 60% of Stampede Sandy Bridge to remain operational during this phase
„ Phase 2: Fall 2017
„ 1736 Intel Skylake nodes
„ Phase 3: Spring 2018
„ Add 3D XPoint memory to subset of nodes
8
KEY ARCHITECTURAL TAKE-AWAY
„ Current and near-future cyberinfrastructure will use processors with many cores
„ Each core contains wide vector units: use them for max utilization (e.g., AVX512)
„ Fortunately the Software-Defined Visualization stack is optimized for such processors!
„ Use your preferred rendering method independent of the underlying hardware
„ Performant rasterization
„ Performant ray tracing
„ Visualization and analysis on the simulation machine
9
SOFTWARE-DEFINED VISUALIZATION – WHY?
10
FILE SIZE
100
GBPS
10 GBPS 1 GBPS 300 MBPS 54 MBPS
1 GB < 1 sec 1 sec 10 sec 35 sec 2.5 min
1 TB ~100 sec ~17 min ~3 hours ~10 hours ~43 hours
1 PB ~1 day ~12 days ~121 days >1 year ~5 years
Increasingly Difficult to Move Data from Simulation Machine
11
SOFTWARE-DEFINED VISUALIZATION
SOFTWARE-DEFINED VISUALIZATION – WHY?
12
SOFTWARE-DEFINED VISUALIZATION – WHY?
13
SOFTWARE-DEFINED VISUALIZATION – WHY?
14
SOFTWARE-DEFINED VISUALIZATION – WHY?
15
SOFTWARE-DEFINED VISUALIZATION STACK
„ OpenSWR Software Rasterizer
„ openswr.org
„ Performant rasterization for Xeon and Xeon Phi
„ Thread-parallel vector processing (previous parallel Mesa3D only has threaded fragments)
„ Support for wide vector instruction sets, particularly AVX2 (and soon AVX512)
„ Integrated into Mesa3D 12.0 as optional driver (mesa3d.org)
„ Best Uses
„ Lines
„ Graphs
„ User Interfaces
16
SOFTWARE-DEFINED VISUALIZATION STACK
„ OSPRay Ray Tracer
„ ospray.org
„ Performant ray tracing for Xeon and Xeon Phi incorporating Embree kernels
„ Thread- and wide-vector parallel using Intel ISPC (including AVX512 support)
„ Parallel rendering support via distributed framebuffer
„ Best Uses
„ Photorealistic rendering
„ Realistic lighting
„ Realistic material effects
„ Large geometry
„ Implicit geometry (e.g., molecular ”ball and stick” models)
17
SOFTWARE-DEFINED VISUALIZATION STACK
„ GraviT Scheduling Framework
„ tacc.github.io/GraviT/
„ Large-scale, data-distributed ray tracing (uses OSPRay for rendering engine target)
„ Parallel rendering support via distributed ray scheduling
„ Best Uses
„ Large distrubted data
„ Data outside of renderer control
„ Incoherent ray-intensive sampling (e.g., global illumination approximations)
18
OSPRAY TEST SUITE – SAMPLE IMAGES
19
Test 0 Test 1 Test 4
Test 2 Test 3 Test 5
Test 6Test 7 Test 8
OSPRAY TEST SUITE – MCDRAM PERFORMANCE RESULTS
20
PARAVIEW TEST SUITE – MANYSPHERES
21
22
Likely VNC
limited
23
Likely VNC
limited
24
Definitely
VNC limited!
FIU CORE SAMPLE – SAMPLE IMAGE
25
26
Likely VNC
limited
27
Likely VNC
limited
28
Likely hitting
VNC desktop
limits
Definitely
VNC limited!
29
PATH TO IN-SITU VISUALIZATION
WHY IN-SITU VISUALIZATION?
„ Processors (like KNL) enabling larger, more detailed simulations
„ File system technologies not scaling at same rate (if at all….)
„ Touching disk is expensive:
„ During simulation: time checkpointing is (often) not time computing
„ During analysis: loading the data is (often) the overwhelming majority of runtime
„ In-situ capabilities overcome this data bottleneck
„ Render directly from resident simulation data
„ Tightly coupled vis opens doors for online analysis, computational steering, etc
30
CURRENT IN-SITU OPTIONS
„ Simulation developer
„ Implement visualization API (ParaView Catalyst, VisIt libsim, VTK)
„ Implement data framework (ADIOS, etc)
„ Implement direct rendering calls (OSPRay API, etc)
„ Simulation user
„ Hope the developers do one of the above J
„ Do one of the above yourself L
„ Hope technology keeps post-hoc analysis viable (3D XPoint NVRAM might help)
31
IN-SITU VISUALIZATION APIS
„ ParaView Catalyst (and Cinema)
(www.paraview.org/in-situ/)
„ VisIt Libsim
(www.visitusers.org/index.php?title=Libsim_Batch)
„ Direct VTK integration
(www.vtk.org)
„ Visualization ops already implemented
„ Need coordination b/t teams to ensure
simulation and vis performance
32
Image courtesy of Kitware Inc.
IN-SITU-COMPATIBLE DATA FRAMEWORKS
„ ADIOS – https://www.olcf.ornl.gov/center-projects/adios/
„ Damaris – https://hal.inria.fr/hal-00859603/en
„ DIY – http://www.mcs.anl.gov/~tpeterka/software.html
„ GLEAN – http://www.mcs.anl.gov/project/glean-situ-visualization-and-analysis
„ SCIRun – http://www.sci.utah.edu/cibc-software/scirun.html
„ (Possibly) more invasive implementation effort
„ (Possibly) broader benefits beyond visualization (framework now controls data)
„ Requires engagement from simulation team to ensure performance and accuracy
33
IN-SITU DIRECT RENDERING
„ Render directly within simulation using API (e.g., OSPRay, OpenGL, etc)
„ Must implement visualization operations within simulation code
„ Lightest weight, lowest overhead
„ Requires visualization algorithm knowledge for efficient implementation
„ Locks in particular rendering and visualization modes
34
IN-SITU FUTURE?
35
Useful perspectives at ISAV – http://conferences.computer.org/isav/2016/
TACC/KITWARE IPCC – UNIMPEDED IN SITU
VISUALIZATION ON INTEL XEON AND INTEL XEON PHI
„ Optimize ParaView Catalyst for current and near-future Intel architectures
„ KNL, Skylake, Omnipath, 3D XPoint NVRAM
„ Use Stampede-KNL as testbed to target TACC Stampede 2, NERSC Cori, LANL Trinity
„ Focus on data and rendering paths for OpenSWR and OSPRay
„ Parallelize VTK data processing filters
„ Catalyst integration with simulation
„ Targeted algorithm improvements
„ Increase processor and memory utilization
36
THANK YOU!
pnav@tacc.utexas.edu
37

Weitere ähnliche Inhalte

Was ist angesagt?

Compare Performance-power of Arm Cortex vs RISC-V for AI applications_oct_2021
Compare Performance-power of Arm Cortex vs RISC-V for AI applications_oct_2021Compare Performance-power of Arm Cortex vs RISC-V for AI applications_oct_2021
Compare Performance-power of Arm Cortex vs RISC-V for AI applications_oct_2021
Deepak Shankar
 

Was ist angesagt? (20)

BKK16-305B ILP32 Performance on AArch64
BKK16-305B ILP32 Performance on AArch64BKK16-305B ILP32 Performance on AArch64
BKK16-305B ILP32 Performance on AArch64
 
00 opencapi acceleration framework yonglu_ver2
00 opencapi acceleration framework yonglu_ver200 opencapi acceleration framework yonglu_ver2
00 opencapi acceleration framework yonglu_ver2
 
BXI: Bull eXascale Interconnect
BXI: Bull eXascale InterconnectBXI: Bull eXascale Interconnect
BXI: Bull eXascale Interconnect
 
Introduction to architecture exploration
Introduction to architecture explorationIntroduction to architecture exploration
Introduction to architecture exploration
 
Compare Performance-power of Arm Cortex vs RISC-V for AI applications_oct_2021
Compare Performance-power of Arm Cortex vs RISC-V for AI applications_oct_2021Compare Performance-power of Arm Cortex vs RISC-V for AI applications_oct_2021
Compare Performance-power of Arm Cortex vs RISC-V for AI applications_oct_2021
 
YOW2021 Computing Performance
YOW2021 Computing PerformanceYOW2021 Computing Performance
YOW2021 Computing Performance
 
Hardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and MLHardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and ML
 
TAU E4S ON OpenPOWER /POWER9 platform
TAU E4S ON OpenPOWER /POWER9 platformTAU E4S ON OpenPOWER /POWER9 platform
TAU E4S ON OpenPOWER /POWER9 platform
 
NNSA Explorations: ARM for Supercomputing
NNSA Explorations: ARM for SupercomputingNNSA Explorations: ARM for Supercomputing
NNSA Explorations: ARM for Supercomputing
 
How Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for PerformanceHow Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for Performance
 
Xilinx vs Intel (Altera) FPGA performance comparison
Xilinx vs Intel (Altera) FPGA performance comparison Xilinx vs Intel (Altera) FPGA performance comparison
Xilinx vs Intel (Altera) FPGA performance comparison
 
OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...
OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...
OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...
 
Software Network Data Plane - Satisfying the need for speed - FD.io - VPP and...
Software Network Data Plane - Satisfying the need for speed - FD.io - VPP and...Software Network Data Plane - Satisfying the need for speed - FD.io - VPP and...
Software Network Data Plane - Satisfying the need for speed - FD.io - VPP and...
 
Lenovo HPC Strategy Update
Lenovo HPC Strategy UpdateLenovo HPC Strategy Update
Lenovo HPC Strategy Update
 
Kernel Recipes 2017: Using Linux perf at Netflix
Kernel Recipes 2017: Using Linux perf at NetflixKernel Recipes 2017: Using Linux perf at Netflix
Kernel Recipes 2017: Using Linux perf at Netflix
 
SCFE 2020 OpenCAPI presentation as part of OpenPWOER Tutorial
SCFE 2020 OpenCAPI presentation as part of OpenPWOER TutorialSCFE 2020 OpenCAPI presentation as part of OpenPWOER Tutorial
SCFE 2020 OpenCAPI presentation as part of OpenPWOER Tutorial
 
IBM HPC Transformation with AI
IBM HPC Transformation with AI IBM HPC Transformation with AI
IBM HPC Transformation with AI
 
OpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC SystemsOpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC Systems
 
The Microarchitecure Of FPGA Based Soft Processor
The Microarchitecure Of FPGA Based Soft ProcessorThe Microarchitecure Of FPGA Based Soft Processor
The Microarchitecure Of FPGA Based Soft Processor
 
An Overview of the IHK/McKernel Multi-kernel Operating System
An Overview of the IHK/McKernel Multi-kernel Operating SystemAn Overview of the IHK/McKernel Multi-kernel Operating System
An Overview of the IHK/McKernel Multi-kernel Operating System
 

Ähnlich wie SDVIs and In-Situ Visualization on TACC's Stampede

Multi Processor Architecture for image processing
Multi Processor Architecture for image processingMulti Processor Architecture for image processing
Multi Processor Architecture for image processing
ideas2ignite
 
AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...
Ryousei Takano
 

Ähnlich wie SDVIs and In-Situ Visualization on TACC's Stampede (20)

Introduction to Software Defined Visualization (SDVis)
Introduction to Software Defined Visualization (SDVis)Introduction to Software Defined Visualization (SDVis)
Introduction to Software Defined Visualization (SDVis)
 
Exploring the Performance Impact of Virtualization on an HPC Cloud
Exploring the Performance Impact of Virtualization on an HPC CloudExploring the Performance Impact of Virtualization on an HPC Cloud
Exploring the Performance Impact of Virtualization on an HPC Cloud
 
Seminar Accelerating Business Using Microservices Architecture in Digital Age...
Seminar Accelerating Business Using Microservices Architecture in Digital Age...Seminar Accelerating Business Using Microservices Architecture in Digital Age...
Seminar Accelerating Business Using Microservices Architecture in Digital Age...
 
RISC V in Spacer
RISC V in SpacerRISC V in Spacer
RISC V in Spacer
 
Imaging automotive 2015 addfor v002
Imaging automotive 2015   addfor v002Imaging automotive 2015   addfor v002
Imaging automotive 2015 addfor v002
 
Imaging automotive 2015 addfor v002
Imaging automotive 2015   addfor v002Imaging automotive 2015   addfor v002
Imaging automotive 2015 addfor v002
 
PCCC21:日本電気株式会社「一台何役?SX-Aurora TSUBASA最新情報」
PCCC21:日本電気株式会社「一台何役?SX-Aurora TSUBASA最新情報」PCCC21:日本電気株式会社「一台何役?SX-Aurora TSUBASA最新情報」
PCCC21:日本電気株式会社「一台何役?SX-Aurora TSUBASA最新情報」
 
Harnessing the virtual realm for successful real world artificial intelligence
Harnessing the virtual realm for successful real world artificial intelligenceHarnessing the virtual realm for successful real world artificial intelligence
Harnessing the virtual realm for successful real world artificial intelligence
 
Summit 16: Deploying Virtualized Mobile Infrastructures on Openstack
Summit 16: Deploying Virtualized Mobile Infrastructures on OpenstackSummit 16: Deploying Virtualized Mobile Infrastructures on Openstack
Summit 16: Deploying Virtualized Mobile Infrastructures on Openstack
 
Hardware in Space
Hardware in SpaceHardware in Space
Hardware in Space
 
OpenStack at Scale Inside NetApp
OpenStack at Scale Inside NetAppOpenStack at Scale Inside NetApp
OpenStack at Scale Inside NetApp
 
Dataset creation for Deep Learning-based Geometric Computer Vision problems
Dataset creation for Deep Learning-based Geometric Computer Vision problemsDataset creation for Deep Learning-based Geometric Computer Vision problems
Dataset creation for Deep Learning-based Geometric Computer Vision problems
 
Possibilities of generative models
Possibilities of generative modelsPossibilities of generative models
Possibilities of generative models
 
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
 
Dov Nimratz, Roman Chobik "Embedded artificial intelligence"
Dov Nimratz, Roman Chobik "Embedded artificial intelligence"Dov Nimratz, Roman Chobik "Embedded artificial intelligence"
Dov Nimratz, Roman Chobik "Embedded artificial intelligence"
 
High Performance Cyberinfrastructure Enables Data-Driven Science in the Globa...
High Performance Cyberinfrastructure Enables Data-Driven Science in the Globa...High Performance Cyberinfrastructure Enables Data-Driven Science in the Globa...
High Performance Cyberinfrastructure Enables Data-Driven Science in the Globa...
 
AI Bridging Cloud Infrastructure (ABCI) and its communication performance
AI Bridging Cloud Infrastructure (ABCI) and its communication performanceAI Bridging Cloud Infrastructure (ABCI) and its communication performance
AI Bridging Cloud Infrastructure (ABCI) and its communication performance
 
Multi Processor Architecture for image processing
Multi Processor Architecture for image processingMulti Processor Architecture for image processing
Multi Processor Architecture for image processing
 
Rethinking computation: A processor architecture for machine intelligence
Rethinking computation: A processor architecture for machine intelligenceRethinking computation: A processor architecture for machine intelligence
Rethinking computation: A processor architecture for machine intelligence
 
AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...
 

Mehr von Intel® Software

Mehr von Intel® Software (20)

AI for All: Biology is eating the world & AI is eating Biology
AI for All: Biology is eating the world & AI is eating Biology AI for All: Biology is eating the world & AI is eating Biology
AI for All: Biology is eating the world & AI is eating Biology
 
Python Data Science and Machine Learning at Scale with Intel and Anaconda
Python Data Science and Machine Learning at Scale with Intel and AnacondaPython Data Science and Machine Learning at Scale with Intel and Anaconda
Python Data Science and Machine Learning at Scale with Intel and Anaconda
 
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSciStreamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
 
AI for good: Scaling AI in science, healthcare, and more.
AI for good: Scaling AI in science, healthcare, and more.AI for good: Scaling AI in science, healthcare, and more.
AI for good: Scaling AI in science, healthcare, and more.
 
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
 
Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...
Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...
Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...
 
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
 
AWS & Intel Webinar Series - Accelerating AI Research
AWS & Intel Webinar Series - Accelerating AI ResearchAWS & Intel Webinar Series - Accelerating AI Research
AWS & Intel Webinar Series - Accelerating AI Research
 
Intel Developer Program
Intel Developer ProgramIntel Developer Program
Intel Developer Program
 
Intel AIDC Houston Summit - Overview Slides
Intel AIDC Houston Summit - Overview SlidesIntel AIDC Houston Summit - Overview Slides
Intel AIDC Houston Summit - Overview Slides
 
AIDC NY: BODO AI Presentation - 09.19.2019
AIDC NY: BODO AI Presentation - 09.19.2019AIDC NY: BODO AI Presentation - 09.19.2019
AIDC NY: BODO AI Presentation - 09.19.2019
 
AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019
AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019
AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019
 
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...
 
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
 
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
 
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...
 
AIDC India - AI on IA
AIDC India  - AI on IAAIDC India  - AI on IA
AIDC India - AI on IA
 
AIDC India - Intel Movidius / Open Vino Slides
AIDC India - Intel Movidius / Open Vino SlidesAIDC India - Intel Movidius / Open Vino Slides
AIDC India - Intel Movidius / Open Vino Slides
 
AIDC India - AI Vision Slides
AIDC India - AI Vision SlidesAIDC India - AI Vision Slides
AIDC India - AI Vision Slides
 
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...
 

Kürzlich hochgeladen

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Kürzlich hochgeladen (20)

Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 

SDVIs and In-Situ Visualization on TACC's Stampede

  • 1. SDVIS AND IN-SITU VISUALIZATION ON TACC’S STAMPEDE-KNL Paul A. Navrátil, Ph.D. Manager – Scalable Visualization Technologies pnav@tacc.utexas.edu 1
  • 3. OUTLINE „ Stampede Architecture „ Stampede – Sandy Bridge „ Stampede - KNL „ Stampede 2 – KNL + Skylake „ Software-Defined Visualization Stack „ VNC „ OpenSWR „ OSPRay „ Path to In-Situ „ ParaView Catalyst „ VisIt Libsim „ Direct renderer integration 3
  • 5. STAMPEDE SANDY BRIDGE „ Mellanox FDR Interconnect „ 6400 compute nodes, each with: „ 2x Intel Xeon E5-2680 “Sandy Bridge” „ 1x Intel Xeon Phi SE10P „ 32 GB RAM / 8 GB Phi RAM „ 16 Large Memory nodes, each with: „ 4x Intel Xeon E5-4650 “Sandy Bridge” „ 2x NVIDIA Quadro 2000 “Fermi” „ 1 TB RAM „ 128 GPU nodes, each with: „ 2x Intel Xeon E5-2680 „ 1x Intel Xeon Phi SE10P „ 1x NVIDIA Tesla K20 “Kepler” „ 32 GB RAM 5
  • 6. STAMPEDE KNL „ Deployed in 2016 as planned upgrade to Stampede „ First KNL-based system in Top500 „ Intel OmniPath interconnect „ 508 nodes, each with: „ 1x Intel Xeon Phi 7250 „ 96 GB RAM + 16 GB MCDRAM „ Notes: „ Shared $WORK and $SCRATCH Separate $HOME directories „ Separate Login Node login-knl1.stampede.tacc.utexas.edu „ Login is Intel Xeon E5-2695 “Haswell” „ Compile on compute node or use “–xMIC-AVX512” on login „ “normal” and ”development” queues are Cache-Quadrant „ Other MCDRAM configs available by queue name 6
  • 7. 7
  • 8. STAMPEDE 2 (COMING 2017) „ ~18 PF Dell Intel Xeon + Intel Xeon Phi system „ Combine KNL + Skylake + OmniPath + 3D XPoint „ Phase 1: Spring 2017 „ Stampede KNL + 4200 new KNL nodes + new filesystem „ 60% of Stampede Sandy Bridge to remain operational during this phase „ Phase 2: Fall 2017 „ 1736 Intel Skylake nodes „ Phase 3: Spring 2018 „ Add 3D XPoint memory to subset of nodes 8
  • 9. KEY ARCHITECTURAL TAKE-AWAY „ Current and near-future cyberinfrastructure will use processors with many cores „ Each core contains wide vector units: use them for max utilization (e.g., AVX512) „ Fortunately the Software-Defined Visualization stack is optimized for such processors! „ Use your preferred rendering method independent of the underlying hardware „ Performant rasterization „ Performant ray tracing „ Visualization and analysis on the simulation machine 9
  • 10. SOFTWARE-DEFINED VISUALIZATION – WHY? 10 FILE SIZE 100 GBPS 10 GBPS 1 GBPS 300 MBPS 54 MBPS 1 GB < 1 sec 1 sec 10 sec 35 sec 2.5 min 1 TB ~100 sec ~17 min ~3 hours ~10 hours ~43 hours 1 PB ~1 day ~12 days ~121 days >1 year ~5 years Increasingly Difficult to Move Data from Simulation Machine
  • 16. SOFTWARE-DEFINED VISUALIZATION STACK „ OpenSWR Software Rasterizer „ openswr.org „ Performant rasterization for Xeon and Xeon Phi „ Thread-parallel vector processing (previous parallel Mesa3D only has threaded fragments) „ Support for wide vector instruction sets, particularly AVX2 (and soon AVX512) „ Integrated into Mesa3D 12.0 as optional driver (mesa3d.org) „ Best Uses „ Lines „ Graphs „ User Interfaces 16
  • 17. SOFTWARE-DEFINED VISUALIZATION STACK „ OSPRay Ray Tracer „ ospray.org „ Performant ray tracing for Xeon and Xeon Phi incorporating Embree kernels „ Thread- and wide-vector parallel using Intel ISPC (including AVX512 support) „ Parallel rendering support via distributed framebuffer „ Best Uses „ Photorealistic rendering „ Realistic lighting „ Realistic material effects „ Large geometry „ Implicit geometry (e.g., molecular ”ball and stick” models) 17
  • 18. SOFTWARE-DEFINED VISUALIZATION STACK „ GraviT Scheduling Framework „ tacc.github.io/GraviT/ „ Large-scale, data-distributed ray tracing (uses OSPRay for rendering engine target) „ Parallel rendering support via distributed ray scheduling „ Best Uses „ Large distrubted data „ Data outside of renderer control „ Incoherent ray-intensive sampling (e.g., global illumination approximations) 18
  • 19. OSPRAY TEST SUITE – SAMPLE IMAGES 19 Test 0 Test 1 Test 4 Test 2 Test 3 Test 5 Test 6Test 7 Test 8
  • 20. OSPRAY TEST SUITE – MCDRAM PERFORMANCE RESULTS 20
  • 21. PARAVIEW TEST SUITE – MANYSPHERES 21
  • 25. FIU CORE SAMPLE – SAMPLE IMAGE 25
  • 29. 29 PATH TO IN-SITU VISUALIZATION
  • 30. WHY IN-SITU VISUALIZATION? „ Processors (like KNL) enabling larger, more detailed simulations „ File system technologies not scaling at same rate (if at all….) „ Touching disk is expensive: „ During simulation: time checkpointing is (often) not time computing „ During analysis: loading the data is (often) the overwhelming majority of runtime „ In-situ capabilities overcome this data bottleneck „ Render directly from resident simulation data „ Tightly coupled vis opens doors for online analysis, computational steering, etc 30
  • 31. CURRENT IN-SITU OPTIONS „ Simulation developer „ Implement visualization API (ParaView Catalyst, VisIt libsim, VTK) „ Implement data framework (ADIOS, etc) „ Implement direct rendering calls (OSPRay API, etc) „ Simulation user „ Hope the developers do one of the above J „ Do one of the above yourself L „ Hope technology keeps post-hoc analysis viable (3D XPoint NVRAM might help) 31
  • 32. IN-SITU VISUALIZATION APIS „ ParaView Catalyst (and Cinema) (www.paraview.org/in-situ/) „ VisIt Libsim (www.visitusers.org/index.php?title=Libsim_Batch) „ Direct VTK integration (www.vtk.org) „ Visualization ops already implemented „ Need coordination b/t teams to ensure simulation and vis performance 32 Image courtesy of Kitware Inc.
  • 33. IN-SITU-COMPATIBLE DATA FRAMEWORKS „ ADIOS – https://www.olcf.ornl.gov/center-projects/adios/ „ Damaris – https://hal.inria.fr/hal-00859603/en „ DIY – http://www.mcs.anl.gov/~tpeterka/software.html „ GLEAN – http://www.mcs.anl.gov/project/glean-situ-visualization-and-analysis „ SCIRun – http://www.sci.utah.edu/cibc-software/scirun.html „ (Possibly) more invasive implementation effort „ (Possibly) broader benefits beyond visualization (framework now controls data) „ Requires engagement from simulation team to ensure performance and accuracy 33
  • 34. IN-SITU DIRECT RENDERING „ Render directly within simulation using API (e.g., OSPRay, OpenGL, etc) „ Must implement visualization operations within simulation code „ Lightest weight, lowest overhead „ Requires visualization algorithm knowledge for efficient implementation „ Locks in particular rendering and visualization modes 34
  • 35. IN-SITU FUTURE? 35 Useful perspectives at ISAV – http://conferences.computer.org/isav/2016/
  • 36. TACC/KITWARE IPCC – UNIMPEDED IN SITU VISUALIZATION ON INTEL XEON AND INTEL XEON PHI „ Optimize ParaView Catalyst for current and near-future Intel architectures „ KNL, Skylake, Omnipath, 3D XPoint NVRAM „ Use Stampede-KNL as testbed to target TACC Stampede 2, NERSC Cori, LANL Trinity „ Focus on data and rendering paths for OpenSWR and OSPRay „ Parallelize VTK data processing filters „ Catalyst integration with simulation „ Targeted algorithm improvements „ Increase processor and memory utilization 36