SlideShare ist ein Scribd-Unternehmen logo
1 von 5
Johannes M Dieterich: High Performance Computing and
GPU acceleration: WhereDo We Stand?
Personal Background
● Computational Chemist
● high-performance computing developer and user
FreeBSD experience
● user since 6.1
● development platform since 7.1
● ports committer since Jan 2017
HPC
● large problems
HPC usage by chemistry subfield
● materials, ...
Cluster science
● experiment and simulation
Typical HPC Installations
● 3552 nodes, 85k CPU cores, 2.7 PFLOPS, 222TB Ram, 2.7PByte storage
HPC Ecosystem
● HW, OS, Tools, Numerical libs, Parallelization, Languages
Processor Architecture
● x86 dominates the top500
● a little bit of Power, Sparc64
OS
● Linux dominates
● a little bit of Unix (non x86 archs)
(Free)BSD and HPC
● Strengths
○ poudriere and pkg
■ custom, optimized packages for a whole cluster
○ libs, compilers setup ease
○ 31k+ ports
○ network and kernel performance
○ some automated regression testing
○ related: as virtualization host for contained HPC applications
Why Should We All Care?
● tech and visibility
○ GPU accel
○ numerical libs
○ performance
● Community capital
○ HPC developers
○ HPC users
○ new ideas
HPC Programming Languages
● Fortran dominates
● Some C/C++
● Tiny bit of Python
● Large part unknown (probably still Fortran)
● data from ARCHER
Compilers and Build Environments
● LLVM works well
● lang/gcc pitfalls:
○ -Wl,-rpath=/usr/local/lib/gcc5/
○ C++ standard library
○ OpenMP implementation mixing
● USES= fortran/compiler:openmp goes to GCC
● Intel icpc, icc available
HPC make environments
● normal makes (gmake, cmake, …)
● homebrew (bash, perl, python, ….)
● Makefiles hard code OS assumptions
devel/flang: LLVM-based
● 64bit-only Fortran compiler
● fortran 77, 90, 95, 2003 well supported
Status
● port works for amd64
● patches not upstream
● based on LLVM 5.0
“jrm” use flang by default on x86 for math/R
“jrm” added USES= fortran:flang
Parallelization: inter-node (MPI)
● MPI
○ message-based (sync/async)
○ high latency
○ explicit support required
○ (queue support required)
○ mpicc/mpif90 + magic
● Support in FreeBSD
○ OpenMP, mpich
○ fortran compiler limitations
Parallelization: intra-node (OpenMP)
#pragma omp parallel for default (none) shared(grid,sum)
● pragma-based
● low latency
● offloading capable
● $CC -fopenmp
Support in FreeBSD
● no support in base since LLVM switch
○ USES= compiler:openmp depends on lang/gcc
● D6362 use devel/llvm50 on x86
Software, Libs, and Capabilities
● Capabilities
○ queuing systems sysutils/slurm-wlm
○ networking
■ InfiniBand, Intel OmniPath
○ Libs
■ math/fftw3
■ science/libint, libxc (exchange-correlation for quantum chemistry)
■ math/tblis WIP tensor contraction framework
■ … many other non comp-chemistry libs ---
BLAS and LAPACK
Linear Algebra
● standard interface for Fortran/C
● Linux: vendor libs like Intel MKL
operations
● BLAS1; O(N)
○ e.g. dot product
● BLAS2: O(N^2)
○ e.g. matrix-vector product *gemv
● BLAS3: O(N^3)
○ e.g. matrix-matrix product
● LAPACK: complex ops
○ e.g. Choleskey-decomposition *syrk
BLAS/LAPACK on FreeBSD: Mk/Uses/blaslapack.mk
● netlib (reference impl, not fast enough)
● atlas
● openblas
● gotoblas (legacy)
● blis/libflame
dgemm performance on FreeBSD (Matrix-Matrix product)
● OpenBLAS g++/gfortran, ...
● (performance chart along M/N/K)
Porting HPC Apps to FreeBSD: Examples
● TigerCI
○ open-source
○ C++/Fortran + OpenMP
○ cmake
○ 10s os users
○ 74k SLOC
● ADF suite
○ commercial
○ Fortran, some C/C++, tcl + MPI
○ homebrew Python build system
● Challenges
○ time / funding
○ long-term viability
○ Linux assumptions
■ e.g. #!/bin/bash
○ BSDisms
■ e.g. -Wl,-rpath=
○ port dependencies
● Benefits
○ portability
○ compliance
○ exposure of bugs
Developing on FreeBSD
● Debug: gdb, lldb, valgrind
● Profiling: DTrace, hwpmc
HPC: accelerators
● none is still most common
● NVIDIA: CUDA
● Small bit of Xeon Phi
● Tiny bit of AMD OpenCL
GPU Accel
● CUDA
○ Fortran/C/C++, others
○ effectively proprietary
○ LLVM’s CUDA backend missing runtime libraries
● OpenCL
○ C/C++, others
○ general runtime loader ocl/loader
OpenCL on FreeBSD: CPU
● POCL
OpenCL: Intel
● beignet
● status: double precision support only experimental
○ double is critical for some algorithms
OpenCL: AMD
● clover
● status: doesn’t really work
Develop OpenCL On FreeBSD
● Tools
● devel/oclgrind: virtual OpenCL device simulator
● devel/cltune
● benchmarks/clpeak
● Libs:...
Language Bindings: java/aparapi
● JNI wrapper for OpenCL
Radeon Open Compute (ROCm) and FreeBSDDesktop
● Fundamentals
○ amdgpu: KMS driver
○ amdkfd: compute kernel driver
○ ROC Thunk: usermode library to amdkfd
○ runtime libraries
○ llvm fork for targetting ROCm
● Libs
○ OpenCL
○ HIP (CUDA emulator)
○ rocBLAS
○ rocFFT
○ MIOpen
○ hiptensorflow
Wishlist: near-term
● math/blis
● OpenMP
○ D6362 into tree
● OpenCL
○ fix lang/pocl and lang/clover?
● more people, ports
Wishlist: mid-term
● blaslapack:flame
● OpenMP
○ in base
● accel
○ amdkfd
○ ROCm ecosystem
● devel/flang
○ USES= fortran:flang for more ports
Wishlist: long-term
● blaslapack:flame
● OpenMP
○ non x86 archs
● accel
○ OpenCL via ROCm
● devel/flang
○ non x86 archs
● devel/linuxisms port?
Acknowledgements
(long list of contributors)
Q&A
● Audience remark: providing ports/linuxisms is not a problem at all. As long as no improper dependency
exists.
● Will this support multiple GPUs in a single system? A: yes, multiple DRM devices.
● Experience with TI co-processor? A: no first hand experience. lang/pocl seems to have some support
● Why are there so few accelerators in top500?
○ A: a lot of legacy code. Publication bias. Models may not scale on certain GPU hardware due to
memory bandwidth and size.
○ May change as language and frameworks evolve. Takes too much manual labor and thus
limited by developer time for now.

Weitere ähnliche Inhalte

Was ist angesagt?

Bsdtw17: george neville neil: realities of dtrace on free-bsd
Bsdtw17: george neville neil: realities of dtrace on free-bsdBsdtw17: george neville neil: realities of dtrace on free-bsd
Bsdtw17: george neville neil: realities of dtrace on free-bsdScott Tsai
 
BUD17-310: Introducing LLDB for linux on Arm and AArch64
BUD17-310: Introducing LLDB for linux on Arm and AArch64 BUD17-310: Introducing LLDB for linux on Arm and AArch64
BUD17-310: Introducing LLDB for linux on Arm and AArch64 Linaro
 
BKK16-304 The State of GDB on AArch64
BKK16-304 The State of GDB on AArch64BKK16-304 The State of GDB on AArch64
BKK16-304 The State of GDB on AArch64Linaro
 
Building a QT based solution on a i.MX7 processor running Linux and FreeRTOS
Building a QT based solution on a i.MX7 processor running Linux and FreeRTOSBuilding a QT based solution on a i.MX7 processor running Linux and FreeRTOS
Building a QT based solution on a i.MX7 processor running Linux and FreeRTOSFernando Luiz Cola
 
MOVED: The challenge of SVE in QEMU - SFO17-103
MOVED: The challenge of SVE in QEMU - SFO17-103MOVED: The challenge of SVE in QEMU - SFO17-103
MOVED: The challenge of SVE in QEMU - SFO17-103Linaro
 
LAS16-507: LXC support in LAVA
LAS16-507: LXC support in LAVALAS16-507: LXC support in LAVA
LAS16-507: LXC support in LAVALinaro
 
Clang Analyzer Tool Review
Clang Analyzer Tool ReviewClang Analyzer Tool Review
Clang Analyzer Tool ReviewDoug Schuster
 
Heterogeneous multiprocessing on androd and i.mx7
Heterogeneous multiprocessing on androd and i.mx7Heterogeneous multiprocessing on androd and i.mx7
Heterogeneous multiprocessing on androd and i.mx7Kynetics
 
Asymmetric Multiprocessing - Kynetics ELC 2018 portland
Asymmetric Multiprocessing - Kynetics ELC 2018 portlandAsymmetric Multiprocessing - Kynetics ELC 2018 portland
Asymmetric Multiprocessing - Kynetics ELC 2018 portlandNicola La Gloria
 
BKK16-106 ODP Project Update
BKK16-106 ODP Project UpdateBKK16-106 ODP Project Update
BKK16-106 ODP Project UpdateLinaro
 
HKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMUHKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMULinaro
 
OpenZFS code repository
OpenZFS code repositoryOpenZFS code repository
OpenZFS code repositoryMatthew Ahrens
 
CS 626 - March : Capsicum: Practical Capabilities for UNIX
CS 626 - March : Capsicum: Practical Capabilities for UNIXCS 626 - March : Capsicum: Practical Capabilities for UNIX
CS 626 - March : Capsicum: Practical Capabilities for UNIXruchith
 
차세대컴파일러, VM의미래: 애플 오픈소스 LLVM
차세대컴파일러, VM의미래: 애플 오픈소스 LLVM차세대컴파일러, VM의미래: 애플 오픈소스 LLVM
차세대컴파일러, VM의미래: 애플 오픈소스 LLVMJung Kim
 
BKK16-407 AOSP Toolchain Evolution and experimental languages on AOSP
BKK16-407 AOSP Toolchain Evolution and experimental languages on AOSPBKK16-407 AOSP Toolchain Evolution and experimental languages on AOSP
BKK16-407 AOSP Toolchain Evolution and experimental languages on AOSPLinaro
 
BKK16-100K1 George Grey, Linaro CEO Opening Keynote
BKK16-100K1 George Grey, Linaro CEO Opening KeynoteBKK16-100K1 George Grey, Linaro CEO Opening Keynote
BKK16-100K1 George Grey, Linaro CEO Opening KeynoteLinaro
 
LLJVM: LLVM bitcode to JVM bytecode
LLJVM: LLVM bitcode to JVM bytecodeLLJVM: LLVM bitcode to JVM bytecode
LLJVM: LLVM bitcode to JVM bytecodeTakeshi Yamamuro
 

Was ist angesagt? (20)

Clang: More than just a C/C++ Compiler
Clang: More than just a C/C++ CompilerClang: More than just a C/C++ Compiler
Clang: More than just a C/C++ Compiler
 
Where is LLVM Being Used Today?
Where is LLVM Being Used Today? Where is LLVM Being Used Today?
Where is LLVM Being Used Today?
 
Bsdtw17: george neville neil: realities of dtrace on free-bsd
Bsdtw17: george neville neil: realities of dtrace on free-bsdBsdtw17: george neville neil: realities of dtrace on free-bsd
Bsdtw17: george neville neil: realities of dtrace on free-bsd
 
BUD17-310: Introducing LLDB for linux on Arm and AArch64
BUD17-310: Introducing LLDB for linux on Arm and AArch64 BUD17-310: Introducing LLDB for linux on Arm and AArch64
BUD17-310: Introducing LLDB for linux on Arm and AArch64
 
BKK16-304 The State of GDB on AArch64
BKK16-304 The State of GDB on AArch64BKK16-304 The State of GDB on AArch64
BKK16-304 The State of GDB on AArch64
 
Building a QT based solution on a i.MX7 processor running Linux and FreeRTOS
Building a QT based solution on a i.MX7 processor running Linux and FreeRTOSBuilding a QT based solution on a i.MX7 processor running Linux and FreeRTOS
Building a QT based solution on a i.MX7 processor running Linux and FreeRTOS
 
MOVED: The challenge of SVE in QEMU - SFO17-103
MOVED: The challenge of SVE in QEMU - SFO17-103MOVED: The challenge of SVE in QEMU - SFO17-103
MOVED: The challenge of SVE in QEMU - SFO17-103
 
LAS16-507: LXC support in LAVA
LAS16-507: LXC support in LAVALAS16-507: LXC support in LAVA
LAS16-507: LXC support in LAVA
 
Clang Analyzer Tool Review
Clang Analyzer Tool ReviewClang Analyzer Tool Review
Clang Analyzer Tool Review
 
Heterogeneous multiprocessing on androd and i.mx7
Heterogeneous multiprocessing on androd and i.mx7Heterogeneous multiprocessing on androd and i.mx7
Heterogeneous multiprocessing on androd and i.mx7
 
Asymmetric Multiprocessing - Kynetics ELC 2018 portland
Asymmetric Multiprocessing - Kynetics ELC 2018 portlandAsymmetric Multiprocessing - Kynetics ELC 2018 portland
Asymmetric Multiprocessing - Kynetics ELC 2018 portland
 
BKK16-106 ODP Project Update
BKK16-106 ODP Project UpdateBKK16-106 ODP Project Update
BKK16-106 ODP Project Update
 
A Close Look at ARM Code Size
A Close Look at ARM Code SizeA Close Look at ARM Code Size
A Close Look at ARM Code Size
 
HKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMUHKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMU
 
OpenZFS code repository
OpenZFS code repositoryOpenZFS code repository
OpenZFS code repository
 
CS 626 - March : Capsicum: Practical Capabilities for UNIX
CS 626 - March : Capsicum: Practical Capabilities for UNIXCS 626 - March : Capsicum: Practical Capabilities for UNIX
CS 626 - March : Capsicum: Practical Capabilities for UNIX
 
차세대컴파일러, VM의미래: 애플 오픈소스 LLVM
차세대컴파일러, VM의미래: 애플 오픈소스 LLVM차세대컴파일러, VM의미래: 애플 오픈소스 LLVM
차세대컴파일러, VM의미래: 애플 오픈소스 LLVM
 
BKK16-407 AOSP Toolchain Evolution and experimental languages on AOSP
BKK16-407 AOSP Toolchain Evolution and experimental languages on AOSPBKK16-407 AOSP Toolchain Evolution and experimental languages on AOSP
BKK16-407 AOSP Toolchain Evolution and experimental languages on AOSP
 
BKK16-100K1 George Grey, Linaro CEO Opening Keynote
BKK16-100K1 George Grey, Linaro CEO Opening KeynoteBKK16-100K1 George Grey, Linaro CEO Opening Keynote
BKK16-100K1 George Grey, Linaro CEO Opening Keynote
 
LLJVM: LLVM bitcode to JVM bytecode
LLJVM: LLVM bitcode to JVM bytecodeLLJVM: LLVM bitcode to JVM bytecode
LLJVM: LLVM bitcode to JVM bytecode
 

Ähnlich wie Bsdtw17: johannes m dieterich: high performance computing and gpu acceleration: where do we stand?

Deep Learning on ARM Platforms - SFO17-509
Deep Learning on ARM Platforms - SFO17-509Deep Learning on ARM Platforms - SFO17-509
Deep Learning on ARM Platforms - SFO17-509Linaro
 
Utilizing AMD GPUs: Tuning, programming models, and roadmap
Utilizing AMD GPUs: Tuning, programming models, and roadmapUtilizing AMD GPUs: Tuning, programming models, and roadmap
Utilizing AMD GPUs: Tuning, programming models, and roadmapGeorge Markomanolis
 
01 linux-quick-start
01 linux-quick-start01 linux-quick-start
01 linux-quick-startNguyen Vinh
 
Rlite software-architecture (1)
Rlite software-architecture (1)Rlite software-architecture (1)
Rlite software-architecture (1)ARCFIRE ICT
 
LAS16-210: Hardware Assisted Tracing on ARM with CoreSight and OpenCSD
LAS16-210: Hardware Assisted Tracing on ARM with CoreSight and OpenCSDLAS16-210: Hardware Assisted Tracing on ARM with CoreSight and OpenCSD
LAS16-210: Hardware Assisted Tracing on ARM with CoreSight and OpenCSDLinaro
 
"Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo...
"Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo..."Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo...
"Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo...Yandex
 
HKG18-411 - Introduction to OpenAMP which is an open source solution for hete...
HKG18-411 - Introduction to OpenAMP which is an open source solution for hete...HKG18-411 - Introduction to OpenAMP which is an open source solution for hete...
HKG18-411 - Introduction to OpenAMP which is an open source solution for hete...Linaro
 
LMG Lightning Talks - SFO17-205
LMG Lightning Talks - SFO17-205LMG Lightning Talks - SFO17-205
LMG Lightning Talks - SFO17-205Linaro
 
LAS16-402: ARM Trusted Firmware – from Enterprise to Embedded
LAS16-402: ARM Trusted Firmware – from Enterprise to EmbeddedLAS16-402: ARM Trusted Firmware – from Enterprise to Embedded
LAS16-402: ARM Trusted Firmware – from Enterprise to EmbeddedLinaro
 
Bsdtw17: ruslan bukin: free bsd/risc-v and device drivers
Bsdtw17: ruslan bukin: free bsd/risc-v and device driversBsdtw17: ruslan bukin: free bsd/risc-v and device drivers
Bsdtw17: ruslan bukin: free bsd/risc-v and device driversScott Tsai
 
What's New in OpenLDAP
What's New in OpenLDAPWhat's New in OpenLDAP
What's New in OpenLDAPLDAPCon
 
SoC Idling for unconf COSCUP 2016
SoC Idling for unconf COSCUP 2016SoC Idling for unconf COSCUP 2016
SoC Idling for unconf COSCUP 2016Koan-Sin Tan
 
Linux Stammtisch Munich: Ceph - Overview, Experiences and Outlook
Linux Stammtisch Munich: Ceph - Overview, Experiences and OutlookLinux Stammtisch Munich: Ceph - Overview, Experiences and Outlook
Linux Stammtisch Munich: Ceph - Overview, Experiences and OutlookDanny Al-Gaaf
 
Kubernetes from scratch at veepee sysadmins days 2019
Kubernetes from scratch at veepee   sysadmins days 2019Kubernetes from scratch at veepee   sysadmins days 2019
Kubernetes from scratch at veepee sysadmins days 2019🔧 Loïc BLOT
 
Lightweight Virtualization with Linux Containers and Docker | YaC 2013
Lightweight Virtualization with Linux Containers and Docker | YaC 2013Lightweight Virtualization with Linux Containers and Docker | YaC 2013
Lightweight Virtualization with Linux Containers and Docker | YaC 2013dotCloud
 
Lightweight Virtualization with Linux Containers and Docker I YaC 2013
Lightweight Virtualization with Linux Containers and Docker I YaC 2013Lightweight Virtualization with Linux Containers and Docker I YaC 2013
Lightweight Virtualization with Linux Containers and Docker I YaC 2013Docker, Inc.
 
OpenTelemetry For Operators
OpenTelemetry For OperatorsOpenTelemetry For Operators
OpenTelemetry For OperatorsKevin Brockhoff
 
Embedded platform choices
Embedded platform choicesEmbedded platform choices
Embedded platform choicesTavish Naruka
 
Customize and Secure the Runtime and Dependencies of Your Procedural Language...
Customize and Secure the Runtime and Dependencies of Your Procedural Language...Customize and Secure the Runtime and Dependencies of Your Procedural Language...
Customize and Secure the Runtime and Dependencies of Your Procedural Language...VMware Tanzu
 

Ähnlich wie Bsdtw17: johannes m dieterich: high performance computing and gpu acceleration: where do we stand? (20)

Deep Learning on ARM Platforms - SFO17-509
Deep Learning on ARM Platforms - SFO17-509Deep Learning on ARM Platforms - SFO17-509
Deep Learning on ARM Platforms - SFO17-509
 
Utilizing AMD GPUs: Tuning, programming models, and roadmap
Utilizing AMD GPUs: Tuning, programming models, and roadmapUtilizing AMD GPUs: Tuning, programming models, and roadmap
Utilizing AMD GPUs: Tuning, programming models, and roadmap
 
01 linux-quick-start
01 linux-quick-start01 linux-quick-start
01 linux-quick-start
 
Getting started with AMD GPUs
Getting started with AMD GPUsGetting started with AMD GPUs
Getting started with AMD GPUs
 
Rlite software-architecture (1)
Rlite software-architecture (1)Rlite software-architecture (1)
Rlite software-architecture (1)
 
LAS16-210: Hardware Assisted Tracing on ARM with CoreSight and OpenCSD
LAS16-210: Hardware Assisted Tracing on ARM with CoreSight and OpenCSDLAS16-210: Hardware Assisted Tracing on ARM with CoreSight and OpenCSD
LAS16-210: Hardware Assisted Tracing on ARM with CoreSight and OpenCSD
 
"Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo...
"Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo..."Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo...
"Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo...
 
HKG18-411 - Introduction to OpenAMP which is an open source solution for hete...
HKG18-411 - Introduction to OpenAMP which is an open source solution for hete...HKG18-411 - Introduction to OpenAMP which is an open source solution for hete...
HKG18-411 - Introduction to OpenAMP which is an open source solution for hete...
 
LMG Lightning Talks - SFO17-205
LMG Lightning Talks - SFO17-205LMG Lightning Talks - SFO17-205
LMG Lightning Talks - SFO17-205
 
LAS16-402: ARM Trusted Firmware – from Enterprise to Embedded
LAS16-402: ARM Trusted Firmware – from Enterprise to EmbeddedLAS16-402: ARM Trusted Firmware – from Enterprise to Embedded
LAS16-402: ARM Trusted Firmware – from Enterprise to Embedded
 
Bsdtw17: ruslan bukin: free bsd/risc-v and device drivers
Bsdtw17: ruslan bukin: free bsd/risc-v and device driversBsdtw17: ruslan bukin: free bsd/risc-v and device drivers
Bsdtw17: ruslan bukin: free bsd/risc-v and device drivers
 
What's New in OpenLDAP
What's New in OpenLDAPWhat's New in OpenLDAP
What's New in OpenLDAP
 
SoC Idling for unconf COSCUP 2016
SoC Idling for unconf COSCUP 2016SoC Idling for unconf COSCUP 2016
SoC Idling for unconf COSCUP 2016
 
Linux Stammtisch Munich: Ceph - Overview, Experiences and Outlook
Linux Stammtisch Munich: Ceph - Overview, Experiences and OutlookLinux Stammtisch Munich: Ceph - Overview, Experiences and Outlook
Linux Stammtisch Munich: Ceph - Overview, Experiences and Outlook
 
Kubernetes from scratch at veepee sysadmins days 2019
Kubernetes from scratch at veepee   sysadmins days 2019Kubernetes from scratch at veepee   sysadmins days 2019
Kubernetes from scratch at veepee sysadmins days 2019
 
Lightweight Virtualization with Linux Containers and Docker | YaC 2013
Lightweight Virtualization with Linux Containers and Docker | YaC 2013Lightweight Virtualization with Linux Containers and Docker | YaC 2013
Lightweight Virtualization with Linux Containers and Docker | YaC 2013
 
Lightweight Virtualization with Linux Containers and Docker I YaC 2013
Lightweight Virtualization with Linux Containers and Docker I YaC 2013Lightweight Virtualization with Linux Containers and Docker I YaC 2013
Lightweight Virtualization with Linux Containers and Docker I YaC 2013
 
OpenTelemetry For Operators
OpenTelemetry For OperatorsOpenTelemetry For Operators
OpenTelemetry For Operators
 
Embedded platform choices
Embedded platform choicesEmbedded platform choices
Embedded platform choices
 
Customize and Secure the Runtime and Dependencies of Your Procedural Language...
Customize and Secure the Runtime and Dependencies of Your Procedural Language...Customize and Secure the Runtime and Dependencies of Your Procedural Language...
Customize and Secure the Runtime and Dependencies of Your Procedural Language...
 

Kürzlich hochgeladen

Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️anilsa9823
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceanilsa9823
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AIABDERRAOUF MEHENNI
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 

Kürzlich hochgeladen (20)

Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 

Bsdtw17: johannes m dieterich: high performance computing and gpu acceleration: where do we stand?

  • 1. Johannes M Dieterich: High Performance Computing and GPU acceleration: WhereDo We Stand? Personal Background ● Computational Chemist ● high-performance computing developer and user FreeBSD experience ● user since 6.1 ● development platform since 7.1 ● ports committer since Jan 2017 HPC ● large problems HPC usage by chemistry subfield ● materials, ... Cluster science ● experiment and simulation Typical HPC Installations ● 3552 nodes, 85k CPU cores, 2.7 PFLOPS, 222TB Ram, 2.7PByte storage HPC Ecosystem ● HW, OS, Tools, Numerical libs, Parallelization, Languages Processor Architecture ● x86 dominates the top500 ● a little bit of Power, Sparc64 OS ● Linux dominates ● a little bit of Unix (non x86 archs) (Free)BSD and HPC ● Strengths ○ poudriere and pkg ■ custom, optimized packages for a whole cluster ○ libs, compilers setup ease ○ 31k+ ports ○ network and kernel performance ○ some automated regression testing ○ related: as virtualization host for contained HPC applications Why Should We All Care? ● tech and visibility ○ GPU accel ○ numerical libs ○ performance ● Community capital ○ HPC developers ○ HPC users ○ new ideas HPC Programming Languages ● Fortran dominates ● Some C/C++ ● Tiny bit of Python ● Large part unknown (probably still Fortran) ● data from ARCHER
  • 2. Compilers and Build Environments ● LLVM works well ● lang/gcc pitfalls: ○ -Wl,-rpath=/usr/local/lib/gcc5/ ○ C++ standard library ○ OpenMP implementation mixing ● USES= fortran/compiler:openmp goes to GCC ● Intel icpc, icc available HPC make environments ● normal makes (gmake, cmake, …) ● homebrew (bash, perl, python, ….) ● Makefiles hard code OS assumptions devel/flang: LLVM-based ● 64bit-only Fortran compiler ● fortran 77, 90, 95, 2003 well supported Status ● port works for amd64 ● patches not upstream ● based on LLVM 5.0 “jrm” use flang by default on x86 for math/R “jrm” added USES= fortran:flang Parallelization: inter-node (MPI) ● MPI ○ message-based (sync/async) ○ high latency ○ explicit support required ○ (queue support required) ○ mpicc/mpif90 + magic ● Support in FreeBSD ○ OpenMP, mpich ○ fortran compiler limitations Parallelization: intra-node (OpenMP) #pragma omp parallel for default (none) shared(grid,sum) ● pragma-based ● low latency ● offloading capable ● $CC -fopenmp Support in FreeBSD ● no support in base since LLVM switch ○ USES= compiler:openmp depends on lang/gcc ● D6362 use devel/llvm50 on x86 Software, Libs, and Capabilities ● Capabilities ○ queuing systems sysutils/slurm-wlm ○ networking ■ InfiniBand, Intel OmniPath ○ Libs ■ math/fftw3 ■ science/libint, libxc (exchange-correlation for quantum chemistry) ■ math/tblis WIP tensor contraction framework ■ … many other non comp-chemistry libs --- BLAS and LAPACK Linear Algebra ● standard interface for Fortran/C
  • 3. ● Linux: vendor libs like Intel MKL operations ● BLAS1; O(N) ○ e.g. dot product ● BLAS2: O(N^2) ○ e.g. matrix-vector product *gemv ● BLAS3: O(N^3) ○ e.g. matrix-matrix product ● LAPACK: complex ops ○ e.g. Choleskey-decomposition *syrk BLAS/LAPACK on FreeBSD: Mk/Uses/blaslapack.mk ● netlib (reference impl, not fast enough) ● atlas ● openblas ● gotoblas (legacy) ● blis/libflame dgemm performance on FreeBSD (Matrix-Matrix product) ● OpenBLAS g++/gfortran, ... ● (performance chart along M/N/K) Porting HPC Apps to FreeBSD: Examples ● TigerCI ○ open-source ○ C++/Fortran + OpenMP ○ cmake ○ 10s os users ○ 74k SLOC ● ADF suite ○ commercial ○ Fortran, some C/C++, tcl + MPI ○ homebrew Python build system ● Challenges ○ time / funding ○ long-term viability ○ Linux assumptions ■ e.g. #!/bin/bash ○ BSDisms ■ e.g. -Wl,-rpath= ○ port dependencies ● Benefits ○ portability ○ compliance ○ exposure of bugs Developing on FreeBSD ● Debug: gdb, lldb, valgrind ● Profiling: DTrace, hwpmc HPC: accelerators ● none is still most common ● NVIDIA: CUDA ● Small bit of Xeon Phi ● Tiny bit of AMD OpenCL GPU Accel ● CUDA ○ Fortran/C/C++, others ○ effectively proprietary
  • 4. ○ LLVM’s CUDA backend missing runtime libraries ● OpenCL ○ C/C++, others ○ general runtime loader ocl/loader OpenCL on FreeBSD: CPU ● POCL OpenCL: Intel ● beignet ● status: double precision support only experimental ○ double is critical for some algorithms OpenCL: AMD ● clover ● status: doesn’t really work Develop OpenCL On FreeBSD ● Tools ● devel/oclgrind: virtual OpenCL device simulator ● devel/cltune ● benchmarks/clpeak ● Libs:... Language Bindings: java/aparapi ● JNI wrapper for OpenCL Radeon Open Compute (ROCm) and FreeBSDDesktop ● Fundamentals ○ amdgpu: KMS driver ○ amdkfd: compute kernel driver ○ ROC Thunk: usermode library to amdkfd ○ runtime libraries ○ llvm fork for targetting ROCm ● Libs ○ OpenCL ○ HIP (CUDA emulator) ○ rocBLAS ○ rocFFT ○ MIOpen ○ hiptensorflow Wishlist: near-term ● math/blis ● OpenMP ○ D6362 into tree ● OpenCL ○ fix lang/pocl and lang/clover? ● more people, ports Wishlist: mid-term ● blaslapack:flame ● OpenMP ○ in base ● accel ○ amdkfd ○ ROCm ecosystem ● devel/flang ○ USES= fortran:flang for more ports Wishlist: long-term ● blaslapack:flame ● OpenMP
  • 5. ○ non x86 archs ● accel ○ OpenCL via ROCm ● devel/flang ○ non x86 archs ● devel/linuxisms port? Acknowledgements (long list of contributors) Q&A ● Audience remark: providing ports/linuxisms is not a problem at all. As long as no improper dependency exists. ● Will this support multiple GPUs in a single system? A: yes, multiple DRM devices. ● Experience with TI co-processor? A: no first hand experience. lang/pocl seems to have some support ● Why are there so few accelerators in top500? ○ A: a lot of legacy code. Publication bias. Models may not scale on certain GPU hardware due to memory bandwidth and size. ○ May change as language and frameworks evolve. Takes too much manual labor and thus limited by developer time for now.