SlideShare ist ein Scribd-Unternehmen logo
1 von 25
Downloaden Sie, um offline zu lesen
Understanding and optimizing
parallelism in NumPy-based programs
Ralf Gommers
21 April 2022
First make it work, then make it fast
>>> %timeit main()
50.1 ms ± 1.08 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> # ... perform some optimizations
>>> %timeit main()
9.58 ms ± 22.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> # break out your profiler (e.g., py-spy), optimize some more
>>> %timeit main()
2.83 ms ± 30.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
`htop` output
Approaches for performant numerical
code (single-threaded)
Vectorization Use compiled code
Python compilers Python interpreters
Pythran
CPython
Plus Cinder, Pyston, and more -- very experimental,
and limited gains for numerical code
multiprocessing & multithreading
A key issue: oversubscription
Package A sees N CPU cores, and decides to use them all:
A key issue: oversubscription
Package B, which uses package A, or the end user decides to use
multiprocessing, 1 process per core:
The more CPU cores a machine has, the worse the effect is!
Parallel APIs & behavior: NumPy
NumPy is single-threaded, no code in NumPy is written for parallel execution.
However, most numpy.linalg functions (those using BLAS or LAPACK) execute in
parallel. They use all available cores on a machine.
NumPy does release the GIL wherever it can.
numpy.random has specific APIs to allow users to:
(a) Obtain independent streams for random number generation across
processes (local or distributed)
(b) Perform multithreaded random number generation
Parallel APIs & behavior: SciPy
SciPy is single-threaded by default (same as NumPy)
Calls to functionality using BLAS or LAPACK is again multithreaded:
● primarily in scipy.linalg and scipy.sparse.linalg,
● also higher-level functionality using linear algebra under the hood:
kernel density estimation, multivariate distributions etc. in scipy.stats,
vector quantization in scipy.cluster, interpolators in scipy.interpolate,
optimizers in scipy.optimize, and more
Some APIs have a workers=1 keyword, which allows the user to control the
number of processes or threads. Or pass in a custom Pool.
scipy.fft provides a context manager:
Parallel APIs & behavior: SciPy
An example using workers=:
Parallel APIs & behavior: scikit-learn
Scikit-learn is mostly single-threaded by default.
However, more and more functionality uses OpenMP for automatic
parallelization. This defaults to the number of virtual (not physical) CPU cores.
Many scikit-learn APIs offer a n_jobs= keyword to let user enable multiple
threads or processes via joblib.
Scikit-learn implements fairly complex control of NumPy/SciPy’s BLAS and
LAPACK libraries to prevent oversubscription in the presence of
multiprocessing on top of multi-threading. This is done via the threadpoolctl
package.
Controlling parallelism - packages
Dependencies (Conda)
Controlling parallelism - packages
Dependencies (PyPI)
Controlling parallelism - packages
Conda PyPI
Tuning the default behavior
Default behavior is inconsistent: too aggressive for linear algebra, and too
conservative for workers (SciPy) and n_jobs (scikit-learn)!
OpenBLAS, MKL and OpenMP don’t have a nice API, only environment variables:
For scikit-learn you can explicitly choose a backend (but defaults are usually fine):
Tuning the default behavior
NumPy, SciPy and scikit-learn all recommend using threadpoolctl in case you
want more granular control over threading behavior of BLAS, LAPACK and
OpenMP libraries (or cannot set environment variables):
A pitfall on multi-tenant machines
Multi-tenant machines: N “vCPU” (virtual CPU) cores for you, M in total.
CircleCI gives you 2 cores for a CI job, on a 64 core machine (and
os.cpu_count() reports 64). Set OPENBLAS_NUM_THREADS=2 to avoid problems!
GitHub Actions, Azure DevOps and other services are better behaved.
The impact can be severe:
Parallel random number generation
Parallel random number generation
First what not to do – simply drawing random numbers in different
subprocesses will give you the same numbers in each process:
Parallel random number generation
Use SeedSequence to obtain independent streams easily:
Parallel random number generation
Second option: use the .jumped() method of BitGenerator instances to obtain
independent streams easily:
Parallel random number generation
Where is NumPy going - technical
Interoperability
Array API standard support
Extensibility
Easier custom dtypes
Performance
SIMD acceleration on:
x86, arm64, PPC, …?
C++
Just dipping our toes in the
water here - so far it was just
Python and C
Platform support
PPC, AIX, s390x,
cross-compiling to embedded
ARM systems, ...
Type annotations
Main namespace annotations
just completed
Note what is not on this list: auto-parallelization
Resources to learn more
Scikit-learn:
https://scikit-learn.org/stable/computing/parallelism.html
https://joblib.readthedocs.io/en/latest/parallel.html
SciPy:
http://scipy.github.io/devdocs/dev/toolchain.html#openmp-support
http://scipy.github.io/devdocs/search.html?q=workers
NumPy:
https://numpy.org/doc/stable/reference/random/parallel.html
Relevant paper: Composable Multi-Threading and Multi-Processing for Numeric Libraries
Find me at: ralf.gommers@gmail.com, rgommers, ralfgommers
Thank you!

Weitere ähnliche Inhalte

Was ist angesagt?

[온라인교육시리즈] Jupyter를 이용한 분석 환경 구축하기 - 허창현 클라우드 솔루션 아키텍트
[온라인교육시리즈] Jupyter를 이용한 분석 환경 구축하기 - 허창현 클라우드 솔루션 아키텍트[온라인교육시리즈] Jupyter를 이용한 분석 환경 구축하기 - 허창현 클라우드 솔루션 아키텍트
[온라인교육시리즈] Jupyter를 이용한 분석 환경 구축하기 - 허창현 클라우드 솔루션 아키텍트NAVER CLOUD PLATFORMㅣ네이버 클라우드 플랫폼
 
Understanding eBPF in a Hurry!
Understanding eBPF in a Hurry!Understanding eBPF in a Hurry!
Understanding eBPF in a Hurry!Ray Jenkins
 
ebpf and IO Visor: The What, how, and what next!
ebpf and IO Visor: The What, how, and what next!ebpf and IO Visor: The What, how, and what next!
ebpf and IO Visor: The What, how, and what next!Affan Syed
 
Linux kernel debugging
Linux kernel debuggingLinux kernel debugging
Linux kernel debugginglibfetion
 
NetBSD and Linux for Embedded Systems
NetBSD and Linux for Embedded SystemsNetBSD and Linux for Embedded Systems
NetBSD and Linux for Embedded SystemsMahendra M
 
Introduction to Embedded Linux
Introduction to Embedded LinuxIntroduction to Embedded Linux
Introduction to Embedded LinuxHossain Reja
 
Introduction to the Python conda package manager
Introduction to the Python conda package managerIntroduction to the Python conda package manager
Introduction to the Python conda package managerDamien Garaud
 
DoS and DDoS mitigations with eBPF, XDP and DPDK
DoS and DDoS mitigations with eBPF, XDP and DPDKDoS and DDoS mitigations with eBPF, XDP and DPDK
DoS and DDoS mitigations with eBPF, XDP and DPDKMarian Marinov
 
Kernel_Crash_Dump_Analysis
Kernel_Crash_Dump_AnalysisKernel_Crash_Dump_Analysis
Kernel_Crash_Dump_AnalysisBuland Singh
 
2.2 cycles de vie
2.2 cycles de vie2.2 cycles de vie
2.2 cycles de vieHarun Mouad
 
Introduction au Génie Logiciel
Introduction au Génie LogicielIntroduction au Génie Logiciel
Introduction au Génie Logicielguest0032c8
 
Analyse d'un kernel (crash, core) dump
Analyse d'un kernel (crash, core) dumpAnalyse d'un kernel (crash, core) dump
Analyse d'un kernel (crash, core) dumpGaëtan Trellu
 
Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021
Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021
Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021Valeriy Kravchuk
 
Kernel Recipes 2019 - XDP closer integration with network stack
Kernel Recipes 2019 -  XDP closer integration with network stackKernel Recipes 2019 -  XDP closer integration with network stack
Kernel Recipes 2019 - XDP closer integration with network stackAnne Nicolas
 
Velocity 2017 Performance analysis superpowers with Linux eBPF
Velocity 2017 Performance analysis superpowers with Linux eBPFVelocity 2017 Performance analysis superpowers with Linux eBPF
Velocity 2017 Performance analysis superpowers with Linux eBPFBrendan Gregg
 
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven RostedtKernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven RostedtAnne Nicolas
 

Was ist angesagt? (20)

[온라인교육시리즈] Jupyter를 이용한 분석 환경 구축하기 - 허창현 클라우드 솔루션 아키텍트
[온라인교육시리즈] Jupyter를 이용한 분석 환경 구축하기 - 허창현 클라우드 솔루션 아키텍트[온라인교육시리즈] Jupyter를 이용한 분석 환경 구축하기 - 허창현 클라우드 솔루션 아키텍트
[온라인교육시리즈] Jupyter를 이용한 분석 환경 구축하기 - 허창현 클라우드 솔루션 아키텍트
 
Understanding eBPF in a Hurry!
Understanding eBPF in a Hurry!Understanding eBPF in a Hurry!
Understanding eBPF in a Hurry!
 
Dpdk performance
Dpdk performanceDpdk performance
Dpdk performance
 
Kernel crashdump
Kernel crashdumpKernel crashdump
Kernel crashdump
 
ebpf and IO Visor: The What, how, and what next!
ebpf and IO Visor: The What, how, and what next!ebpf and IO Visor: The What, how, and what next!
ebpf and IO Visor: The What, how, and what next!
 
Linux kernel debugging
Linux kernel debuggingLinux kernel debugging
Linux kernel debugging
 
NetBSD and Linux for Embedded Systems
NetBSD and Linux for Embedded SystemsNetBSD and Linux for Embedded Systems
NetBSD and Linux for Embedded Systems
 
Introduction to Embedded Linux
Introduction to Embedded LinuxIntroduction to Embedded Linux
Introduction to Embedded Linux
 
Introduction to the Python conda package manager
Introduction to the Python conda package managerIntroduction to the Python conda package manager
Introduction to the Python conda package manager
 
DoS and DDoS mitigations with eBPF, XDP and DPDK
DoS and DDoS mitigations with eBPF, XDP and DPDKDoS and DDoS mitigations with eBPF, XDP and DPDK
DoS and DDoS mitigations with eBPF, XDP and DPDK
 
Kernel_Crash_Dump_Analysis
Kernel_Crash_Dump_AnalysisKernel_Crash_Dump_Analysis
Kernel_Crash_Dump_Analysis
 
Linux Kernel Overview
Linux Kernel OverviewLinux Kernel Overview
Linux Kernel Overview
 
Andes RISC-V processor solutions
Andes RISC-V processor solutionsAndes RISC-V processor solutions
Andes RISC-V processor solutions
 
2.2 cycles de vie
2.2 cycles de vie2.2 cycles de vie
2.2 cycles de vie
 
Introduction au Génie Logiciel
Introduction au Génie LogicielIntroduction au Génie Logiciel
Introduction au Génie Logiciel
 
Analyse d'un kernel (crash, core) dump
Analyse d'un kernel (crash, core) dumpAnalyse d'un kernel (crash, core) dump
Analyse d'un kernel (crash, core) dump
 
Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021
Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021
Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021
 
Kernel Recipes 2019 - XDP closer integration with network stack
Kernel Recipes 2019 -  XDP closer integration with network stackKernel Recipes 2019 -  XDP closer integration with network stack
Kernel Recipes 2019 - XDP closer integration with network stack
 
Velocity 2017 Performance analysis superpowers with Linux eBPF
Velocity 2017 Performance analysis superpowers with Linux eBPFVelocity 2017 Performance analysis superpowers with Linux eBPF
Velocity 2017 Performance analysis superpowers with Linux eBPF
 
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven RostedtKernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
 

Ähnlich wie Parallelism in a NumPy-based program

Understand and Harness the Capabilities of Intel® Xeon Phi™ Processors
Understand and Harness the Capabilities of Intel® Xeon Phi™ ProcessorsUnderstand and Harness the Capabilities of Intel® Xeon Phi™ Processors
Understand and Harness the Capabilities of Intel® Xeon Phi™ ProcessorsIntel® Software
 
May2010 hex-core-opt
May2010 hex-core-optMay2010 hex-core-opt
May2010 hex-core-optJeff Larkin
 
Deep learning with kafka
Deep learning with kafkaDeep learning with kafka
Deep learning with kafkaNitin Kumar
 
Cray XT Porting, Scaling, and Optimization Best Practices
Cray XT Porting, Scaling, and Optimization Best PracticesCray XT Porting, Scaling, and Optimization Best Practices
Cray XT Porting, Scaling, and Optimization Best PracticesJeff Larkin
 
Euro python2011 High Performance Python
Euro python2011 High Performance PythonEuro python2011 High Performance Python
Euro python2011 High Performance PythonIan Ozsvald
 
2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetupGanesan Narayanasamy
 
Machine-Learning-based Performance Heuristics for Runtime CPU/GPU Selection
Machine-Learning-based Performance Heuristics for Runtime CPU/GPU SelectionMachine-Learning-based Performance Heuristics for Runtime CPU/GPU Selection
Machine-Learning-based Performance Heuristics for Runtime CPU/GPU SelectionAkihiro Hayashi
 
Exploration of Supervised Machine Learning Techniques for Runtime Selection o...
Exploration of Supervised Machine Learning Techniques for Runtime Selection o...Exploration of Supervised Machine Learning Techniques for Runtime Selection o...
Exploration of Supervised Machine Learning Techniques for Runtime Selection o...Akihiro Hayashi
 
Use Data-Oriented Design to write efficient code
Use Data-Oriented Design to write efficient codeUse Data-Oriented Design to write efficient code
Use Data-Oriented Design to write efficient codeAlessio Coltellacci
 
Scale up and Scale Out Anaconda and PyData
Scale up and Scale Out Anaconda and PyDataScale up and Scale Out Anaconda and PyData
Scale up and Scale Out Anaconda and PyDataTravis Oliphant
 
Distributed computing and hyper-parameter tuning with Ray
Distributed computing and hyper-parameter tuning with RayDistributed computing and hyper-parameter tuning with Ray
Distributed computing and hyper-parameter tuning with RayJan Margeta
 
Unmanaged Parallelization via P/Invoke
Unmanaged Parallelization via P/InvokeUnmanaged Parallelization via P/Invoke
Unmanaged Parallelization via P/InvokeDmitri Nesteruk
 
Distributed Multi-GPU Computing with Dask, CuPy and RAPIDS
Distributed Multi-GPU Computing with Dask, CuPy and RAPIDSDistributed Multi-GPU Computing with Dask, CuPy and RAPIDS
Distributed Multi-GPU Computing with Dask, CuPy and RAPIDSPeterAndreasEntschev
 
Joblib: Lightweight pipelining for parallel jobs (v2)
Joblib:  Lightweight pipelining for parallel jobs (v2)Joblib:  Lightweight pipelining for parallel jobs (v2)
Joblib: Lightweight pipelining for parallel jobs (v2)Marcel Caraciolo
 

Ähnlich wie Parallelism in a NumPy-based program (20)

Understand and Harness the Capabilities of Intel® Xeon Phi™ Processors
Understand and Harness the Capabilities of Intel® Xeon Phi™ ProcessorsUnderstand and Harness the Capabilities of Intel® Xeon Phi™ Processors
Understand and Harness the Capabilities of Intel® Xeon Phi™ Processors
 
May2010 hex-core-opt
May2010 hex-core-optMay2010 hex-core-opt
May2010 hex-core-opt
 
Deep learning with kafka
Deep learning with kafkaDeep learning with kafka
Deep learning with kafka
 
Cray XT Porting, Scaling, and Optimization Best Practices
Cray XT Porting, Scaling, and Optimization Best PracticesCray XT Porting, Scaling, and Optimization Best Practices
Cray XT Porting, Scaling, and Optimization Best Practices
 
Euro python2011 High Performance Python
Euro python2011 High Performance PythonEuro python2011 High Performance Python
Euro python2011 High Performance Python
 
Elasticwulf Pycon Talk
Elasticwulf Pycon TalkElasticwulf Pycon Talk
Elasticwulf Pycon Talk
 
2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup
 
Machine-Learning-based Performance Heuristics for Runtime CPU/GPU Selection
Machine-Learning-based Performance Heuristics for Runtime CPU/GPU SelectionMachine-Learning-based Performance Heuristics for Runtime CPU/GPU Selection
Machine-Learning-based Performance Heuristics for Runtime CPU/GPU Selection
 
Exploration of Supervised Machine Learning Techniques for Runtime Selection o...
Exploration of Supervised Machine Learning Techniques for Runtime Selection o...Exploration of Supervised Machine Learning Techniques for Runtime Selection o...
Exploration of Supervised Machine Learning Techniques for Runtime Selection o...
 
Balancing Power & Performance Webinar
Balancing Power & Performance WebinarBalancing Power & Performance Webinar
Balancing Power & Performance Webinar
 
Use Data-Oriented Design to write efficient code
Use Data-Oriented Design to write efficient codeUse Data-Oriented Design to write efficient code
Use Data-Oriented Design to write efficient code
 
Effective Benchmarks
Effective BenchmarksEffective Benchmarks
Effective Benchmarks
 
Scale up and Scale Out Anaconda and PyData
Scale up and Scale Out Anaconda and PyDataScale up and Scale Out Anaconda and PyData
Scale up and Scale Out Anaconda and PyData
 
Multicore
MulticoreMulticore
Multicore
 
Distributed computing and hyper-parameter tuning with Ray
Distributed computing and hyper-parameter tuning with RayDistributed computing and hyper-parameter tuning with Ray
Distributed computing and hyper-parameter tuning with Ray
 
Exploring Gpgpu Workloads
Exploring Gpgpu WorkloadsExploring Gpgpu Workloads
Exploring Gpgpu Workloads
 
Smallsat 2021
Smallsat 2021Smallsat 2021
Smallsat 2021
 
Unmanaged Parallelization via P/Invoke
Unmanaged Parallelization via P/InvokeUnmanaged Parallelization via P/Invoke
Unmanaged Parallelization via P/Invoke
 
Distributed Multi-GPU Computing with Dask, CuPy and RAPIDS
Distributed Multi-GPU Computing with Dask, CuPy and RAPIDSDistributed Multi-GPU Computing with Dask, CuPy and RAPIDS
Distributed Multi-GPU Computing with Dask, CuPy and RAPIDS
 
Joblib: Lightweight pipelining for parallel jobs (v2)
Joblib:  Lightweight pipelining for parallel jobs (v2)Joblib:  Lightweight pipelining for parallel jobs (v2)
Joblib: Lightweight pipelining for parallel jobs (v2)
 

Mehr von Ralf Gommers

Reliable from-source builds (Qshare 28 Nov 2023).pdf
Reliable from-source builds (Qshare 28 Nov 2023).pdfReliable from-source builds (Qshare 28 Nov 2023).pdf
Reliable from-source builds (Qshare 28 Nov 2023).pdfRalf Gommers
 
The road ahead for scientific computing with Python
The road ahead for scientific computing with PythonThe road ahead for scientific computing with Python
The road ahead for scientific computing with PythonRalf Gommers
 
Python array API standardization - current state and benefits
Python array API standardization - current state and benefitsPython array API standardization - current state and benefits
Python array API standardization - current state and benefitsRalf Gommers
 
Building SciPy kernels with Pythran
Building SciPy kernels with PythranBuilding SciPy kernels with Pythran
Building SciPy kernels with PythranRalf Gommers
 
Standardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for PythonStandardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for PythonRalf Gommers
 
Strengthening NumPy's foundations - growing beyond code
Strengthening NumPy's foundations - growing beyond codeStrengthening NumPy's foundations - growing beyond code
Strengthening NumPy's foundations - growing beyond codeRalf Gommers
 
PyData NYC whatsnew NumPy-SciPy 2019
PyData NYC whatsnew NumPy-SciPy 2019PyData NYC whatsnew NumPy-SciPy 2019
PyData NYC whatsnew NumPy-SciPy 2019Ralf Gommers
 
Inside NumPy: preparing for the next decade
Inside NumPy: preparing for the next decadeInside NumPy: preparing for the next decade
Inside NumPy: preparing for the next decadeRalf Gommers
 
The evolution of array computing in Python
The evolution of array computing in PythonThe evolution of array computing in Python
The evolution of array computing in PythonRalf Gommers
 
__array_function__ conceptual design & related concepts
__array_function__ conceptual design & related concepts__array_function__ conceptual design & related concepts
__array_function__ conceptual design & related conceptsRalf Gommers
 
NumPy Roadmap presentation at NumFOCUS Forum
NumPy Roadmap presentation at NumFOCUS ForumNumPy Roadmap presentation at NumFOCUS Forum
NumPy Roadmap presentation at NumFOCUS ForumRalf Gommers
 
NumFOCUS_Summit2018_Roadmaps_session
NumFOCUS_Summit2018_Roadmaps_sessionNumFOCUS_Summit2018_Roadmaps_session
NumFOCUS_Summit2018_Roadmaps_sessionRalf Gommers
 
SciPy 1.0 and Beyond - a Story of Community and Code
SciPy 1.0 and Beyond - a Story of Community and CodeSciPy 1.0 and Beyond - a Story of Community and Code
SciPy 1.0 and Beyond - a Story of Community and CodeRalf Gommers
 

Mehr von Ralf Gommers (13)

Reliable from-source builds (Qshare 28 Nov 2023).pdf
Reliable from-source builds (Qshare 28 Nov 2023).pdfReliable from-source builds (Qshare 28 Nov 2023).pdf
Reliable from-source builds (Qshare 28 Nov 2023).pdf
 
The road ahead for scientific computing with Python
The road ahead for scientific computing with PythonThe road ahead for scientific computing with Python
The road ahead for scientific computing with Python
 
Python array API standardization - current state and benefits
Python array API standardization - current state and benefitsPython array API standardization - current state and benefits
Python array API standardization - current state and benefits
 
Building SciPy kernels with Pythran
Building SciPy kernels with PythranBuilding SciPy kernels with Pythran
Building SciPy kernels with Pythran
 
Standardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for PythonStandardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for Python
 
Strengthening NumPy's foundations - growing beyond code
Strengthening NumPy's foundations - growing beyond codeStrengthening NumPy's foundations - growing beyond code
Strengthening NumPy's foundations - growing beyond code
 
PyData NYC whatsnew NumPy-SciPy 2019
PyData NYC whatsnew NumPy-SciPy 2019PyData NYC whatsnew NumPy-SciPy 2019
PyData NYC whatsnew NumPy-SciPy 2019
 
Inside NumPy: preparing for the next decade
Inside NumPy: preparing for the next decadeInside NumPy: preparing for the next decade
Inside NumPy: preparing for the next decade
 
The evolution of array computing in Python
The evolution of array computing in PythonThe evolution of array computing in Python
The evolution of array computing in Python
 
__array_function__ conceptual design & related concepts
__array_function__ conceptual design & related concepts__array_function__ conceptual design & related concepts
__array_function__ conceptual design & related concepts
 
NumPy Roadmap presentation at NumFOCUS Forum
NumPy Roadmap presentation at NumFOCUS ForumNumPy Roadmap presentation at NumFOCUS Forum
NumPy Roadmap presentation at NumFOCUS Forum
 
NumFOCUS_Summit2018_Roadmaps_session
NumFOCUS_Summit2018_Roadmaps_sessionNumFOCUS_Summit2018_Roadmaps_session
NumFOCUS_Summit2018_Roadmaps_session
 
SciPy 1.0 and Beyond - a Story of Community and Code
SciPy 1.0 and Beyond - a Story of Community and CodeSciPy 1.0 and Beyond - a Story of Community and Code
SciPy 1.0 and Beyond - a Story of Community and Code
 

Kürzlich hochgeladen

Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionOnePlan Solutions
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...Jittipong Loespradit
 
Harnessing ChatGPT - Elevating Productivity in Today's Agile Environment
Harnessing ChatGPT  - Elevating Productivity in Today's Agile EnvironmentHarnessing ChatGPT  - Elevating Productivity in Today's Agile Environment
Harnessing ChatGPT - Elevating Productivity in Today's Agile EnvironmentVictorSzoltysek
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Hararemasabamasaba
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...masabamasaba
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Bert Jan Schrijver
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareJim McKeeth
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...masabamasaba
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnAmarnathKambale
 
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park masabamasaba
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in sowetomasabamasaba
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is insideshinachiaurasa2
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension AidPhilip Schwarz
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech studentsHimanshiGarg82
 

Kürzlich hochgeladen (20)

Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
Harnessing ChatGPT - Elevating Productivity in Today's Agile Environment
Harnessing ChatGPT  - Elevating Productivity in Today's Agile EnvironmentHarnessing ChatGPT  - Elevating Productivity in Today's Agile Environment
Harnessing ChatGPT - Elevating Productivity in Today's Agile Environment
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go Platformless
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 

Parallelism in a NumPy-based program

  • 1. Understanding and optimizing parallelism in NumPy-based programs Ralf Gommers 21 April 2022
  • 2. First make it work, then make it fast >>> %timeit main() 50.1 ms ± 1.08 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) >>> # ... perform some optimizations >>> %timeit main() 9.58 ms ± 22.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) >>> # break out your profiler (e.g., py-spy), optimize some more >>> %timeit main() 2.83 ms ± 30.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
  • 4. Approaches for performant numerical code (single-threaded) Vectorization Use compiled code Python compilers Python interpreters Pythran CPython Plus Cinder, Pyston, and more -- very experimental, and limited gains for numerical code
  • 6. A key issue: oversubscription Package A sees N CPU cores, and decides to use them all:
  • 7. A key issue: oversubscription Package B, which uses package A, or the end user decides to use multiprocessing, 1 process per core: The more CPU cores a machine has, the worse the effect is!
  • 8. Parallel APIs & behavior: NumPy NumPy is single-threaded, no code in NumPy is written for parallel execution. However, most numpy.linalg functions (those using BLAS or LAPACK) execute in parallel. They use all available cores on a machine. NumPy does release the GIL wherever it can. numpy.random has specific APIs to allow users to: (a) Obtain independent streams for random number generation across processes (local or distributed) (b) Perform multithreaded random number generation
  • 9. Parallel APIs & behavior: SciPy SciPy is single-threaded by default (same as NumPy) Calls to functionality using BLAS or LAPACK is again multithreaded: ● primarily in scipy.linalg and scipy.sparse.linalg, ● also higher-level functionality using linear algebra under the hood: kernel density estimation, multivariate distributions etc. in scipy.stats, vector quantization in scipy.cluster, interpolators in scipy.interpolate, optimizers in scipy.optimize, and more Some APIs have a workers=1 keyword, which allows the user to control the number of processes or threads. Or pass in a custom Pool. scipy.fft provides a context manager:
  • 10. Parallel APIs & behavior: SciPy An example using workers=:
  • 11. Parallel APIs & behavior: scikit-learn Scikit-learn is mostly single-threaded by default. However, more and more functionality uses OpenMP for automatic parallelization. This defaults to the number of virtual (not physical) CPU cores. Many scikit-learn APIs offer a n_jobs= keyword to let user enable multiple threads or processes via joblib. Scikit-learn implements fairly complex control of NumPy/SciPy’s BLAS and LAPACK libraries to prevent oversubscription in the presence of multiprocessing on top of multi-threading. This is done via the threadpoolctl package.
  • 12. Controlling parallelism - packages Dependencies (Conda)
  • 13. Controlling parallelism - packages Dependencies (PyPI)
  • 14. Controlling parallelism - packages Conda PyPI
  • 15. Tuning the default behavior Default behavior is inconsistent: too aggressive for linear algebra, and too conservative for workers (SciPy) and n_jobs (scikit-learn)! OpenBLAS, MKL and OpenMP don’t have a nice API, only environment variables: For scikit-learn you can explicitly choose a backend (but defaults are usually fine):
  • 16. Tuning the default behavior NumPy, SciPy and scikit-learn all recommend using threadpoolctl in case you want more granular control over threading behavior of BLAS, LAPACK and OpenMP libraries (or cannot set environment variables):
  • 17. A pitfall on multi-tenant machines Multi-tenant machines: N “vCPU” (virtual CPU) cores for you, M in total. CircleCI gives you 2 cores for a CI job, on a 64 core machine (and os.cpu_count() reports 64). Set OPENBLAS_NUM_THREADS=2 to avoid problems! GitHub Actions, Azure DevOps and other services are better behaved. The impact can be severe:
  • 19. Parallel random number generation First what not to do – simply drawing random numbers in different subprocesses will give you the same numbers in each process:
  • 20. Parallel random number generation Use SeedSequence to obtain independent streams easily:
  • 21. Parallel random number generation Second option: use the .jumped() method of BitGenerator instances to obtain independent streams easily:
  • 23. Where is NumPy going - technical Interoperability Array API standard support Extensibility Easier custom dtypes Performance SIMD acceleration on: x86, arm64, PPC, …? C++ Just dipping our toes in the water here - so far it was just Python and C Platform support PPC, AIX, s390x, cross-compiling to embedded ARM systems, ... Type annotations Main namespace annotations just completed Note what is not on this list: auto-parallelization
  • 24. Resources to learn more Scikit-learn: https://scikit-learn.org/stable/computing/parallelism.html https://joblib.readthedocs.io/en/latest/parallel.html SciPy: http://scipy.github.io/devdocs/dev/toolchain.html#openmp-support http://scipy.github.io/devdocs/search.html?q=workers NumPy: https://numpy.org/doc/stable/reference/random/parallel.html Relevant paper: Composable Multi-Threading and Multi-Processing for Numeric Libraries
  • 25. Find me at: ralf.gommers@gmail.com, rgommers, ralfgommers Thank you!