© 2017 Continuum Analytics - Confidential & Proprietary
Array Computing and the Evolution of
SciPy, NumPy, and PyData
Travis E. Oliphant, PhD
February 13, 2020
travis@quansight.com
@teoliphant
Distinguished Lecture
Columbia University
travis@openteams.com
Published: February 3, 2020
Project
Started:
1998
Patience and
Persistence and
Grit
SciPy, NumPy, and PyData Time-Line
(figure: timeline with milestones marked at 1991, 1998, 2001, 2003, 2005, 2006, 2008, 2009, 2010, 2012, 2014, 2015, 2016, and 2018)
My Passions
Started my career in computational science
Satellites Measure Backscatter
Computer Algorithms Produce
Estimate of Earth Features
• Wind Speed
• Ice Cover
• Vegetation
• (and more)
More Science led to Python
Raja Muthupillai
Armando Manduca
Richard Ehman
1997
Jim Greenleaf
First Project (1998 — )
Started as Multipack in 1998 and became
SciPy in 2001 with the help of other
colleagues
115 releases, 815 contributors
Used by: 156,525
SciPy
“Distribution of Python Numerical Tools masquerading as one Library”
Name         Description
cluster      KMeans and Vector Quantization
fftpack      Discrete Fourier Transform
integrate    Numerical Integration
interpolate  Interpolation routines
io           Data Input and Output
linalg       Fast Linear algebra
misc         Utilities
ndimage      N-dimensional Image processing
odr          Orthogonal Distance Regression
optimize     Constrained and Unconstrained Optimization
signal       Signal Processing Tools
sparse       Sparse Matrices and Algebra
spatial      Spatial Data Structures and Algorithms
special      Special functions (e.g. Bessel)
stats        Statistical Functions and Distributions
Professor at BYU
Scanning Impedance Imaging
My Open Source
addiction continued…
Gave up my chance at a tenured academic position
in 2005-2006 to bring together the diverging
array community in Python and unite Numeric
and Numarray.
166 releases, 866 contributors
Used by: 314,759
NumPy: an Array Extension of Python
• Data: the array object
– slicing and shaping
– data-type map to bytes (dtype)
• Fast Math (ufuncs):
– vectorization
– broadcasting
– aggregations
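The features listed above can be illustrated with a minimal sketch. The array contents and variable names are illustrative assumptions, not taken from the slides.

```python
import numpy as np

# Data: the array object, with an explicit dtype that maps elements to bytes.
a = np.arange(12, dtype=np.int32).reshape(3, 4)   # shaping
bytes_per_element = a.dtype.itemsize               # 4 bytes for int32
second_column = a[:, 1]                            # slicing (a view, no copy)

# Fast math (ufuncs): vectorization applies the operation element-wise.
squared = np.square(a)

# Broadcasting: the length-4 row is stretched across all 3 rows.
row_offsets = np.array([10, 20, 30, 40], dtype=np.int32)
shifted = a + row_offsets

# Aggregations: reduce along an axis.
column_sums = a.sum(axis=0)
```

Because the slice is a view, it shares memory with `a` rather than copying it, which is part of what makes slicing cheap.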
Brief History of NumPy
Person                                     Package        Year
Jim Fulton                                 Matrix Object  1994
Jim Hugunin                                Numeric        1995
Perry Greenfield, Rick White, Todd Miller  Numarray       2001
Travis Oliphant                            NumPy          2005
NumPy was created to unify array objects
in Python and unify PyData community
Numeric
Numarray
NumPy
I started this unification project and ended up sacrificing my tenure
at a University to write and release NumPy.
My little “side projects” became my life
Making “Array Oriented Programming” Popular
renamed
~20 million (Ana)conda users
spun-out
Past 5 years have seen a
resurgence of array-oriented
computing because of…
Machine Learning and AI
Google Search Trends, Jun 2019
(figure: search-trend curves for Java, JavaScript, and Python, and for NumPy, Tensorflow, Scikit Learn, PyTorch, and Pandas)
Python and in particular PyData keeps Growing
Python’s Scientific Ecosystem
Bokeh
Jake Vanderplas PyCon 2017 Keynote
Not all open-source is the same!
Community-Driven Open Source Software (CDOSS)
• Anyone can become the leader.
• Multiple stakeholders.
• Can look at community size for health.
• Users become contributors more often.
• Examples: Jupyter, NumPy, SciPy, Pandas
Company-Backed Open Source Software (CBOSS)
• Need to work at a company to be the leader.
• Many users, fewer developers.
• Need to understand incentive of company to understand health.
• Examples: Tensorflow, PyTorch, Conda
Both can be valuable, but have different implications!
Governance
models
Huge Impact (from diverse efforts of 1000s)
LIGO: Gravitational Waves
Higgs Boson
Discovery
Black Hole
Imaging
AI is everywhere
Example — Amazon Photo
Automatic Facial
recognition
User feedback
on face names
updates model
Neural network with several layers trained with ~130,000 images.
Matched trained dermatologists with 91% area under sensitivity-specificity curve.
Keys:
• Access to Data
• Access to Software
• Access to Compute
Python
has taken
over!
Thanks to 1000s of
my “closest”
friends who worked
on all the libraries
We won!
(sort of)
Downloads: 49 Million; Estimated Cost: $7.57 Million; Contributors: 866; Estimated Effort: 76 person-years; Current Maintainers: 3
Downloads: 27.7 Million; Estimated Cost: $7 Million; Contributors: 1,666; Estimated Effort: 70 person-years; Current Maintainers: 3
Downloads: 13.8 Million; Estimated Cost: $6.63 Million; Contributors: 860; Estimated Effort: 64 person-years; Current Maintainers: 2
Development began in 2003
Development began in 2005
Development began in 2008
The original developers were not paid to work on or improve these libraries!
OSS Sustainability
• Developers get “burned-out” when many
people use their tools but there is no
money to maintain or improve them.
• Developers can live unbalanced lives.
• Multi-billion dollar companies are
benefiting from volunteer labor and not
giving back.
• Foundational libraries are not maintained
and key insights from creators don’t get
back into the code.
For example: Here was my list for
NumPy in 2012
• NDArray improvements
– Indexes (esp. for Structured arrays)
– SQL front-end
– Multi-level, hierarchical labels
– selection via mappings (labeled arrays)
– Memory spaces (array made up of regions)
– Distributed arrays (global array)
– Compressed arrays
– Standard distributed persistence
– fancy indexing as view and optimizations
– streaming arrays
• Dtype improvements
– Enumerated types (including dynamic enumeration)
– Derived fields
– Specification as a class (or JSON)
– Pointer dtype (i.e. C++ object, or varchar)
– Finishing datetime
– Missing data with both bit-patterns and mask
– Parameterized field names
• Ufunc improvements
– Generalized ufuncs support more than just contiguous arrays
– Specification of ufuncs in Python
– Move most dtype “array functions” to ufuncs
– Unify error-handling for all computations
– Allow lazy-evaluation and remote computation --- streaming and generator data
– Structured and string dtype ufuncs
– Multi-core and GPU optimized ufuncs
– Group-by reduction
Multiple other unrealized epiphanies…
• In 2014, I finally realized how I should have built dtypes: inheriting from a new
“meta-type” so that all NumPy “dtypes” are actually real Python types. This would
have eliminated the need for the “ugly” array-scalars (which are semantically
necessary in the current system).
• NumPy should have a smaller interface API that other array libraries could
implement instead of the entire API becoming a de facto array API.
• GPU and parallel-executing UFuncs should be built-in
• Apply-by and reduce-by should be NumPy functions.
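As an illustration of what a "reduce-by" operation could look like, here is a minimal sketch built from existing NumPy calls. The helper name `reduce_by` and the sample data are hypothetical; NumPy does not provide a function with this name, and this is only one possible way to express a group-by reduction.

```python
import numpy as np

def reduce_by(keys, values, ufunc=np.add):
    """Reduce `values` within each group of equal `keys` using a binary ufunc.

    Hypothetical helper: NumPy has no function with this name.
    """
    keys = np.asarray(keys)
    values = np.asarray(values)
    # Sort so that equal keys are adjacent, then reduce each contiguous run.
    order = np.argsort(keys, kind="stable")
    sorted_keys = keys[order]
    sorted_values = values[order]
    unique_keys, starts = np.unique(sorted_keys, return_index=True)
    return unique_keys, ufunc.reduceat(sorted_values, starts)

keys = np.array(["b", "a", "b", "c", "a"])
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
groups, totals = reduce_by(keys, values)              # per-group sums
group_max = reduce_by(keys, values, np.maximum)[1]    # per-group maxima
```

Sorting first lets `ufunc.reduceat` handle every group in a single call, so any binary ufunc can serve as the reduction.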
I’ve never received budget to work on NumPy or SciPy (until this year with a CZI grant from
Facebook). Part of this is because I pursued other entrepreneurial mechanisms to generate
resources, but part of this is because granting mechanisms are not set up to “maintain”
community-driven open-source software.
Major Conclusions:
1. Python is the “Lingua Franca” for technical computing and
machine learning / AI.
2. Python reached this status because it embraced
array-oriented computing (NumPy and Pandas).
3. “Emergent” community-driven open source has a
sustainability problem.
We (basically) realized our ultimate goal when we started SciPy in 1999!
But we are still searching for the means to sustain.
What is array-oriented computing
• Organize data together logically (and in memory)
• Operate on “chunks” at a time with high-level
operations: (map, join, reduce, transform, apply,
filter)
Memory using Object-oriented
(figure: six separate objects scattered in memory, each holding its own Attr1, Attr2, Attr3)
Array-oriented (Table) approach
(figure: one table with columns Attr1, Attr2, Attr3 and one row each for Object1 through Object6)
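The contrast between the two layouts can be sketched as follows. The `Particle` class, its fields, and the kinetic-energy calculation are illustrative assumptions chosen only to make the comparison concrete.

```python
import numpy as np

# Object-oriented layout: each record is a separate Python object,
# so the attribute values are scattered across memory.
class Particle:
    def __init__(self, mass, x, v):
        self.mass, self.x, self.v = mass, x, v

particles = [Particle(1.0 + i, 0.5 * i, 2.0) for i in range(6)]
kinetic_loop = [0.5 * p.mass * p.v ** 2 for p in particles]

# Array-oriented (table) layout: one row per object, one named column
# per attribute, stored in a single contiguous block.
table = np.zeros(6, dtype=[("mass", "f8"), ("x", "f8"), ("v", "f8")])
table["mass"] = 1.0 + np.arange(6)
table["x"] = 0.5 * np.arange(6)
table["v"] = 2.0

# One high-level operation acts on whole columns at a time.
kinetic_array = 0.5 * table["mass"] * table["v"] ** 2
```

Both versions compute the same values; the table version stores all six records in one 144-byte buffer and expresses the calculation without an explicit loop.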
Benefits of Array-oriented
• Many technical problems are naturally
array-oriented (easy to vectorize)
• Algorithms can be expressed at a high-level
• These algorithms can be parallelized more simply
(quite often much information is lost in the
translation to typical “compiled” languages)
• Array-oriented algorithms map to modern
hardware caches and pipelines.
• Software stack now starting to re-focus with ML
frameworks emerging.
• There is a reason Fortran remains popular.
NumPy Array
shape
NumPy Examples
2d array
3d array
[439 472 477]
[217 205 261 222 245 238]
9.98330639789 2.96677717122
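NumPy Slicing (Selection)
>>> import numpy as np
>>> a = 10 * np.arange(6)[:, np.newaxis] + np.arange(6)  # a[i, j] == 10*i + j
>>> a
array([[ 0,  1,  2,  3,  4,  5],
       [10, 11, 12, 13, 14, 15],
       [20, 21, 22, 23, 24, 25],
       [30, 31, 32, 33, 34, 35],
       [40, 41, 42, 43, 44, 45],
       [50, 51, 52, 53, 54, 55]])
>>> a[0, 3:5]
array([3, 4])
>>> a[4:, 4:]
array([[44, 45],
       [54, 55]])
>>> a[:, 2]
array([ 2, 12, 22, 32, 42, 52])
>>> a[2::2, ::2]
array([[20, 22, 24],
       [40, 42, 44]])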
Quick History
1966: APL, Ken Iverson (IBM)
1984: APL2, IBM
1990: J, Ken Iverson
1993: K -> Q, Arthur Whitney (used by KDB)
2019: (new version of K), Arthur Whitney
1996: Numeric (Python), Jim Hugunin
2006: NumPy, Travis Oliphant
2012: Numba, Siu Kwan Lam
(figure: lineage diagram linking APL, J, K, Matlab, Numeric, and NumPy)
APL example: life←{↑1 ⍵∨.∧3 4=+/,¯1 0 1∘.⊖¯1 0 1∘.⌽⊂⍵}
Putting Science back in Comp Sci
• Much of the software stack is for systems programming --- C++,
Java, .NET, ObjC, web
• This has been great for desktop computing but terrible for science:
- Complex numbers?
- Vectorized primitives?
- Multidimensional arrays?
• Array-oriented programming was supplanted by Object-oriented
programming
• Software stack for scientists was not as helpful as it should be
• Fortran is still where many scientists ended up
• Past 5 years this is changing with emergence of Python, Jupyter,
Pandas, PyTorch (we still have a long way to go).
Array-Oriented Computing
Example 1: Fibonacci Numbers
f_n = f_{n-1} + f_{n-2}
f_0 = 0
f_1 = 1
f = 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, ...
Common Python approaches
Recursive
Iterative
Algorithm matters!!
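The slide's code for these two approaches is not preserved in the text, so the following is a minimal sketch of what a recursive and an iterative version typically look like. The function names are assumptions.

```python
def fib_recursive(n):
    """Direct translation of the recurrence; takes exponential time."""
    if n < 2:
        return n
    return fib_recursive(n - 1) + fib_recursive(n - 2)


def fib_iterative(n):
    """Carry the last two values forward; takes linear time."""
    previous, current = 0, 1
    for _ in range(n):
        previous, current = current, previous + current
    return previous
```

The recursive version recomputes the same subproblems many times, which is why the choice of algorithm matters far more here than the choice of language.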
Array-oriented approaches
Using LFilter
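The slide's code is not preserved, so this is a sketch of one way the recurrence can be computed with `scipy.signal.lfilter`: treat the Fibonacci sequence as the impulse response of a linear filter. The function name `fib_lfilter` and the coefficient choice are assumptions, and `count` is assumed to be at least 1.

```python
import numpy as np
from scipy.signal import lfilter

def fib_lfilter(count):
    """Return the first `count` Fibonacci numbers via a linear IIR filter."""
    # The recurrence y[n] = y[n-1] + y[n-2] + x[n-1], driven by a unit
    # impulse x = [1, 0, 0, ...], yields 0, 1, 1, 2, 3, 5, ...
    b = [0.0, 1.0]          # feed-forward taps: x[n-1]
    a = [1.0, -1.0, -1.0]   # feedback taps: y[n] - y[n-1] - y[n-2]
    impulse = np.zeros(count)
    impulse[0] = 1.0
    # lfilter works in floating point, so round back to integers.
    return np.rint(lfilter(b, a, impulse)).astype(np.int64)
```

The whole sequence comes out of a single library call, with the loop running in compiled code rather than in Python.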
Using Formula
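The slide's code is not preserved; a plausible sketch uses the closed-form (Binet) expression evaluated on a whole array of indices at once. The function name `fib_formula` is an assumption.

```python
import numpy as np

def fib_formula(count):
    """Return the first `count` Fibonacci numbers from the closed form."""
    n = np.arange(count)
    sqrt5 = np.sqrt(5.0)
    phi = (1.0 + sqrt5) / 2.0   # golden ratio
    psi = (1.0 - sqrt5) / 2.0   # its conjugate root
    # Rounding absorbs floating-point error; float64 stays exact only
    # for roughly the first 70 terms.
    return np.rint((phi ** n - psi ** n) / sqrt5).astype(np.int64)
```

Every term is computed independently, so there is no sequential dependency at all, at the cost of limited precision for large indices.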
Array-oriented approaches
Conway’s game of Life
• Dead cell with exactly 3 live neighbors
will come to life
• A live cell with 2 or 3 neighbors will
survive
• With too few or too many neighbors, the
cell dies
Interesting Patterns emerge
Conway’s Game of Life
APL
NumPy
Initialization
Update Step
life←{↑1 ⍵∨.∧3 4=+/,¯1 0 1∘.⊖¯1 0 1∘.⌽⊂⍵}
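The NumPy code from the slide is not preserved in the text. The sketch below shows one common way to write the initialization and update step, counting neighbors with `np.roll`. It is an assumption rather than the author's code, and it wraps around at the edges (a toroidal board).

```python
import numpy as np

def life_step(board):
    """Advance a 2-D array of 0/1 cells by one generation (edges wrap)."""
    # Count the eight neighbors of every cell by summing shifted copies.
    neighbors = sum(
        np.roll(np.roll(board, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # Birth on exactly 3 neighbors; survival on 2 or 3.
    born = (board == 0) & (neighbors == 3)
    survives = (board == 1) & ((neighbors == 2) | (neighbors == 3))
    return (born | survives).astype(board.dtype)

# Initialization: a "blinker", which oscillates with period 2.
board = np.zeros((5, 5), dtype=np.uint8)
board[2, 1:4] = 1

# Update step.
after_one = life_step(board)
after_two = life_step(after_one)
```

As in the APL one-liner, the rules are applied to the whole board at once through shifted copies and boolean masks instead of a loop over cells.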
Zen of NumPy
• strided is better than scattered
• contiguous is better than strided
• descriptive is better than imperative
• array-oriented is better than object-oriented
• broadcasting is a great idea
• vectorized is better than an explicit loop
• unless it’s too complicated or uses too much memory ---
then use Numba
• think in higher dimensions
Inspired by Tim Peters and “import this”
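A few of these maxims can be made concrete with a short sketch. The array shapes and the column-centering task are illustrative assumptions.

```python
import numpy as np

# "contiguous is better than strided": both arrays below expose the same
# interface, but only the first walks memory in unit steps.
contiguous = np.arange(12, dtype=np.float64).reshape(3, 4)
strided = contiguous[:, ::2]             # view that skips every other column
contiguous_strides = contiguous.strides  # bytes to step along each axis
strided_strides = strided.strides

# "vectorized is better than an explicit loop": subtract each column's mean.
loop_result = np.empty_like(contiguous)
for i in range(contiguous.shape[0]):
    for j in range(contiguous.shape[1]):
        loop_result[i, j] = contiguous[i, j] - contiguous[:, j].mean()

# "broadcasting is a great idea": the (4,) vector of column means
# broadcasts against the (3, 4) array.
vectorized_result = contiguous - contiguous.mean(axis=0)
```

The loop and the broadcast expression give the same result; the broadcast version states the intent in one line and leaves the iteration to NumPy.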
What is good about NumPy?
• Array-oriented
• Extensive Dtype System (including structures)
• C-API
• Simple to understand data-structure
• Memory mapping
• Syntax support from Python
• Large community of users
• Broadcasting
• Easy to interface C/C++/Fortran code
What is wrong with NumPy
• Dtype system is difficult to extend
• Immediate mode creates huge temporaries
• “Almost” an in-memory database comparable to
SQLite (missing indexes)
• Integration with sparse arrays
• Lots of un-optimized parts
• Minimal support for multi-core / GPU
• Code-base is organic and hard to extend
• Tied to CPython run-time (doesn’t work on
other Python implementations)
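The point about immediate mode creating temporaries can be sketched as follows. The array sizes and the expression are illustrative assumptions; the `out=` argument shown is the standard ufunc mechanism for reusing a buffer.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 100_000)
y = np.linspace(1.0, 2.0, 100_000)

# Immediate mode: each operator runs at once and allocates a full-size
# array, so 2 * x and 3 * y are temporaries discarded after the sum.
eager = 2.0 * x + 3.0 * y

# Reusing a preallocated buffer through the ufunc `out` argument
# needs one scratch array instead of two temporaries.
result = np.empty_like(x)
np.multiply(x, 2.0, out=result)      # result = 2 * x
scratch = np.multiply(y, 3.0)        # one temporary for 3 * y
np.add(result, scratch, out=result)  # result = 2 * x + 3 * y, in place
```

Managing buffers by hand quickly becomes awkward for longer expressions, which is one reason the Zen of NumPy suggests a compiler such as Numba when memory use gets out of hand.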
Python Origins.
Version Date
0.9.0 Feb. 1991
0.9.4 Dec. 1991
0.9.6 Apr. 1992
0.9.8 Jan. 1993
1.0.0 Jan. 1994
1.2 Apr. 1995
1.4 Oct. 1996
1.5.2 Apr. 1999
How I got involved…
Getting data into memory — fast!
http://www.python.org/doc/essays/refcnt/
Reference Counting Essay
May 1998
Guido van Rossum
TableIO
April 1998
Michael A. Miller
NumPyIO
June 1998
How SciPy started…
Discussions on the matrix-sig from 1997 to 1999 called for a complete data analysis environment: Paul Barrett, Joe Harrington,
Perry Greenfield, Paul Dubois, Konrad Hinsen, and others. Activity in 1998 led to increased interest in 1999.
In response, on 15 Jan 1999 I posted to matrix-sig a list of routines I felt needed to be present and began wrapping / writing in
earnest. On 6 April 1999, I announced I would be creating this uber-package, which eventually became SciPy.
Gaussian quadrature 5 Jan 1999
cephes 1.0 30 Jan 1999
sigtools 0.40 23 Feb 1999
Numeric docs March 1999
cephes 1.1 9 Mar 1999
multipack 0.3 13 Apr 1999
Helper routines 14 Apr 1999
multipack 0.6 (leastsq, ode, fsolve, quad) 29 Apr 1999
sparse plan described 30 May 1999
multipack 0.7 14 Jun 1999
SparsePy 0.1 5 Nov 1999
cephes 1.2 (vectorize) 29 Dec 1999
Joined with others…
Started as Multipack in 1998 and became
SciPy in 2001 with the help of other
colleagues
115 releases, 815 contributors
Used by: 156,525
Don’t underestimate the importance of Team!
Anaconda success also depended on going from individual to a team
>700 contributors
Other People Matter
Know your model is incomplete:
• see people as “ends” not your “means”
• Believe in, love, and trust other people.
The Social Brain Hypothesis and Human Evolution, Robin I. M. Dunbar
Use your brain to adapt to other people
— this is why your brain is so big!
Hypothesis: You carry and update “models of
people" in your head. From very detailed to
approximate. Dunbar numbers!
Keep Open Mind: Be open to critique
dtype ctypes
PEP 3118 debate
over how to describe memory
vs.
Current me disagrees with past me!
I am glad there were others in the
debate.
Return good for evil
Hard because of our brains!
https://github.com/josephmisiti/awesome-machine-learning#python-general-purpose
http://deeplearning.net/software_links/
http://scikit-learn.org/stable/related_projects.html
Explosion of ML Frameworks and libraries
TVM/NNVM
We have a “divided” community again!
Numeric
Numarray
NumPy
Examples of packages being built on
differing standards
FastAI
skorch
Pyro
Edward
anyrl
Braid
PyMC4
Horovod
MLFlow
But note
Unification Efforts
Train the
Model
Deploy the
Model
Platform1
Platform 2
Deploy the
Model
Platform 3
NNVM / TVM — Ambitious Plan at UW
What is next? What am I
working on for the next 20
years…
Technology and Economic problems
1. General interoperability — low-level libraries that reduce silos of data and analysis
2. Better High-level APIs (more interfaces in Python supported by multiple
implementations)
3. Data Management — in particular Data Catalogues
4. Fixing Python’s Extension problem (the ecosystem helped Python grow but is also an
anchor to its progress)
5. How to connect the trillions of dollars of market capital to the innovation available in
global, emergent, open-source communities.
High Level APIs for Arrays (Tensors),
DataFrames, and DataTypes
The extensions are an anchor to
Python runtime progress!
CPython C-API
What will work!
• Create a statically typed subset
of Python that is then used to
extend Python — EPython
• Port NumPy, SciPy, Scikits to
EPython (borrow heavily from
Cython ideas but use mypy-style
typing instead of new syntax).
Sustaining the Future
Open-source innovation and
maintenance around the entire data-
science and AI workflow.
• NumPy ecosystem maintenance (PyData Core Team)
• Improve connection of NumPy to ML Frameworks
• GPU Support for NumPy Ecosystem
• Improve foundations of Array computing
• JupyterLab and JupyterHub
• Data Catalog standards
• Packaging (conda-forge, PyPA, etc.)
PySparse - sparse n-d arrays
Ibis - Pandas-like front-end to SQL
uarray — unified array interface for SciPy refactor
xnd — re-factored NumPy (low-level cross-language
libraries for N-D (tensor) computing)
Collaborating with NumFOCUS!
Bokeh
Adapted from Jake Vanderplas
PyCon 2017 Keynote
Build and Connect
Companies and
Communities to
Solve Challenging
Problems with Data
Enables me to keep working on array-computing
problems *and* the meta-problem of
open-source funding.
Three Activities with One Mission
Services
Complete open-source service consulting in the PyData / NumFOCUS ecosystem, including data science and ML. We provide part-time CTO work, custom software, staff augmentation, support, training, staffing, and mentoring.
Open Source Lab
Open Source Research Lab supporting the NumFOCUS and PyData Community. Hiring developers, evangelists, tech writers, designers, and product managers for open-source projects.
Venture Fund
Early stage funding to companies that provide return to investors and support open source ecosystems with industry disrupting products and services.
Some of the projects we support
Sparse
Fast Foundational ND-Array (Tensor) object for Python
Extensive Library of Functions for NumPy
GPU-enabled Compiler for NumPy/Python
Parallel and Scaled Pandas and NumPy
DataFrames for general data-manipulation and statistics and
Notebook environments for rapid development and data analysis
Desktop IDE for data-science and ML
Rapid development of Dashboards for Python/PyData ecosystem.
Easy and fast web-based interactive plots using Python.
Turn even very large datasets into images, accurately.
General Sparse Arrays for Python
Cross-language libraries for array computing
General and powerful symbolic mathematical library
Very popular and powerful machine learning library
An early stage venture
capital firm investing in
startups that build on
open-source technology
and support the
communities they depend
on (11 companies)
supporting
FairOSS
$20m fund
Problem
Open Source Teams
• Burned out
• Underrepresented
• Underpaid
Organizations
• Disconnected from the Community
• Lack support and maintenance
There’s no easy way to connect the
community with organizations
Open Source Marketplace
Managing Partners
• Provide Open Source Services
• Training / Support
• Feature development / fixes
Funding Partners
• Hire from the community
• Collectively fund
• Get support they need to build effectively on open source.
Open-source Contributors create profiles for themselves and their
projects and participate as actors in the market.
FairOSS
A Public Benefit Company (goal is growing amount of freely available software)
• Owned by open-source contributors (will be doing a public fund-raise later this year)
• Those share-holders govern the organization (elect the board).
• Board appoints management and decides what is “fair”
Holds Companies accountable
• Allows usage of its trademarks only for companies that contribute back “fairly”
• Think “Kosher” or “Organic labeled”
• Companies give back by equity, revenue, and “in-kind” agreements with FairOSS
FairOSS is custodian of Revenue and Equity Agreements
• Equity agreements mean that FairOSS holds shares, options, or warrants of the company
(most companies are missing open-source community from their ‘cap-table’)
• Revenue agreements mean that companies pay FairOSS a portion of their revenue.
• FairOSS distributes almost all of the proceeds from these agreements to the
open-source communities.
If successful, this would make open source investable and
make more than $23 trillion ($23,000,000,000,000) of investment
capital available to open-source communities.
You can really change the world…
With Open Source Communities…
Let’s do more of that!

Array computing and the evolution of SciPy, NumPy, and PyData

  • 1. © 2017 Continuum Analytics - Confidential & Proprietary Array Computing and the Evolution of SciPy, NumPy, and PyData Travis E. Oliphant, PhD February 13, 2020 travis@quansight.com @teoliphant Distinguished Lecture Columbia University travis@openteams.com
  • 2. Published: February 3, 2020 Project Started: 1998 Patience and Persistence and Grit
  • 3. SciPy, NumPy, and PyData Time-Line (timeline figure; years marked: 1991, 1998, 2001, 2003, 2005, 2006, 2008, 2009, 2010, 2012, 2014, 2015, 2016, 2018)
  • 5. Started my career in computational science Satellites Measure Backscatter Computer Algorithms Produce Estimate of Earth Features • Wind Speed • Ice Cover • Vegetation • (and more)
  • 6. More Science led to Python Raja Muthupillai Armando Manduca Richard Ehman 1997 Jim Greenleaf
  • 7. First Project (1998 — ) Started as Multipack in 1998 and became SciPy in 2001 with the help of other colleagues 115 releases, 815 contributors Used by: 156,525
  • 8. SciPy: “Distribution of Python Numerical Tools masquerading as one Library”. Subpackages (name: description): cluster: KMeans and Vector Quantization; fftpack: Discrete Fourier Transform; integrate: Numerical Integration; interpolate: Interpolation routines; io: Data Input and Output; linalg: Fast Linear algebra; misc: Utilities; ndimage: N-dimensional Image processing; odr: Orthogonal Distance Regression; optimize: Constrained and Unconstrained Optimization; signal: Signal Processing Tools; sparse: Sparse Matrices and Algebra; spatial: Spatial Data Structures and Algorithms; special: Special functions (e.g. Bessel); stats: Statistical Functions and Distributions
  • 9. Professor at BYU Scanning Impedance Imaging
  • 10. My Open Source addiction continued… Gave up my chance at tenured academic position in 2005-2006 to bring together the diverging array community in Python and bring Numeric and Numarray together. 166 releases, 866 contributors Used by: 314,759
  • 11. NumPy: an Array Extension of Python • Data: the array object – slicing and shaping – data-type map to bytes (dtype) • Fast Math (ufuncs): – vectorization – broadcasting – aggregations
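The features listed on this slide can be illustrated with a minimal sketch. The array contents and variable names below are illustrative assumptions, not taken from the talk.

```python
import numpy as np

# Data: an array object whose dtype maps each element to 4 bytes.
a = np.arange(12, dtype=np.int32).reshape(3, 4)

# Slicing and shaping: select the second row, then view the data as 4x3.
second_row = a[1]
reshaped = a.reshape(4, 3)

# Aggregation: reduce along axis 0 to get one mean per column.
col_means = a.mean(axis=0)

# Broadcasting: the (4,) vector of means is stretched across all 3 rows.
centered = a - col_means

# Vectorization via a ufunc: sqrt is applied element-wise without a Python loop.
roots = np.sqrt(a)
```

Each line operates on the whole array at once, which is the style the rest of the talk calls array-oriented.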
  • 12. Brief History of NumPy (person: package, year): Jim Fulton: Matrix Object, 1994; Jim Hugunin: Numeric, 1995; Perry Greenfield, Rick White, Todd Miller: Numarray, 2001; Travis Oliphant: NumPy, 2005
  • 13. NumPy was created to unify array objects in Python and unify PyData community Numeric Numarray NumPy I started this unification project and ended up sacrificing my tenure at a University to write and release NumPy.
  • 14. My little “side projects” became my life
  • 15. Making “Array Oriented Programming” Popular renamed ~20 million (Ana)conda users spun-out
  • 16. Past 5 years have seen a resurgence of array-oriented computing because of… Machine Learning and AI
  • 19. Python and in particular PyData keeps Growing
  • 20. Python’s Scientific Ecosystem Bokeh Jake Vanderplas PyCon 2017 Keynote
  • 21. Not all open-source is the same! Governance models. Community-Driven Open Source Software (CDOSS): anyone can become the leader; multiple stakeholders; can look at community size for health; users become contributors more often; examples: Jupyter, NumPy, SciPy, Pandas. Company-Backed Open Source Software (CBOSS): need to work at a company to be the leader; many users, fewer developers; need to understand the incentive of the company to understand health; examples: Tensorflow, PyTorch, Conda. Both can be valuable, but have different implications!
  • 22. Huge Impact (from diverse efforts of 1000s) LIGO: Gravitational Waves Higgs Boson Discovery Black Hole Imaging
  • 24. Example — Amazon Photo Automatic Facial recognition User feedback on face names updates model
  • 25. Neural network with several layers trained with ~130,000 images. Matched trained dermatologists with 91% area under sensitivity-specificity curve. Keys: • Access to Data • Access to Software • Access to Compute
  • 26. Python has taken over! Thanks to 1000s of my “closest” friends who worked on all the libraries. We won! (sort of)
  • 27. Project statistics for three libraries. (1) Downloads: 49 Million; Estimated Cost: $7.57 Million; Contributors: 866; Estimated Effort: 76 person-years; 3 Current Maintainers. (2) Downloads: 27.7 Million; Estimated Cost: $7 Million; Contributors: 1,666; Estimated Effort: 70 person-years; 3 Current Maintainers. (3) Downloads: 13.8 Million; Estimated Cost: $6.63 Million; Contributors: 860; Estimated Effort: 64 person-years; 2 Current Maintainers. Development began in 2003. Development began in 2005. Development began in 2008. The original developers were not paid to work on or improve these libraries!
  • 28. OSS Sustainability • Developers get “burned-out” when many people use their tools but there is no money to maintain or improve them. • Developers can live unbalanced lives. • Multi-billion dollar companies are benefiting from volunteer labor and not giving back. • Foundational libraries are not maintained and key insights from creators don’t get back into the code.
  • 29. For example, here was my list for NumPy in 2012. NDArray improvements: indexes (esp. for structured arrays); SQL front-end; multi-level, hierarchical labels; selection via mappings (labeled arrays); memory spaces (array made up of regions); distributed arrays (global array); compressed arrays; standard distributed persistence; fancy indexing as view and optimizations; streaming arrays. Dtype improvements: enumerated types (including dynamic enumeration); derived fields; specification as a class (or JSON); pointer dtype (i.e. C++ object, or varchar); finishing datetime; missing data with both bit-patterns and mask; parameterized field names. Ufunc improvements: generalized ufuncs support more than just contiguous arrays; specification of ufuncs in Python; move most dtype “array functions” to ufuncs; unify error-handling for all computations; allow lazy-evaluation and remote computation (streaming and generator data); structured and string dtype ufuncs; multi-core and GPU optimized ufuncs; group-by reduction.
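The last item on this list, group-by reduction, has no single dedicated NumPy function, but it can be approximated with existing calls. The sketch below is one possible workaround, assuming small in-memory 1-D arrays; the keys and values are made up for illustration and this is not the design the speaker had in mind.

```python
import numpy as np

# Hypothetical data: one group key and one value per record.
keys = np.array(["a", "b", "a", "c", "b", "a"])
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Map each key to an integer group index (labels come back sorted).
labels, group_index = np.unique(keys, return_inverse=True)

# Unbuffered in-place add: every value is accumulated into its group's slot.
sums = np.zeros(len(labels))
np.add.at(sums, group_index, values)
```

Here `np.add.at` is used instead of `sums[group_index] += values` because the latter would apply only one update per repeated index.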
  • 30. Multiple other unrealized epiphanies… • In 2014, I finally realized how I should have built dtypes (inheriting from a new “meta-type” so all NumPy “dtypes” are actually real Python types). This would have eliminated the need for the “ugly” array-scalars (which are semantically necessary in the current system). • NumPy should have a smaller interface API that other array libraries could implement instead of the entire API becoming a de facto array API. • GPU and parallel-executing UFuncs should be built-in. • Apply-by and reduce-by should be NumPy functions. I’ve never received budget to work on NumPy or SciPy (until this year with a CZI grant from Facebook). Part of this is because I pursued other entrepreneurial mechanisms to generate resources, but part of this is because granting mechanisms are not set up to “maintain” community-driven open-source software.
  • 31. Major Conclusions: 1. Python is the “Lingua Franca” for technical computing and machine learning / AI. 2. Python reached this status because it embraced array-oriented computing (NumPy and Pandas). 3. “Emergent” community-driven open-source has a sustainability problem. We (basically) realized our ultimate goal when we started SciPy in 1999! But we are still searching for the means to sustain.
  • 32. What is array-oriented computing • Organize data together logically (and in memory) • Operate on “chunks” at a time with high-level operations: (map, join, reduce, transform, apply, filter)
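A small sketch of what operating on chunks with high-level operations can look like in NumPy. The wind-speed values and the threshold are invented for illustration.

```python
import numpy as np

# Hypothetical measurements stored together in one contiguous array.
speeds = np.array([3.0, 12.5, 7.2, 20.1, 0.4])

# map: apply the same operation to every element at once.
doubled = speeds * 2

# filter: keep only the elements that satisfy a condition (boolean mask).
fast = speeds[speeds > 5.0]

# reduce: collapse the selected chunk to a single number.
total_fast = fast.sum()
```

No explicit loop appears; each step names what should happen to the whole chunk.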
  • 34. Array-oriented (Table) approach: one table with a column per attribute (Attr1, Attr2, Attr3) and a row per object (Object1 through Object6)
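One way to express this table layout in NumPy is a structured array, sketched below. The field names, types, and values are assumptions chosen to mirror the slide's three attributes and six objects.

```python
import numpy as np

# Six "objects" (rows), each with three attributes (fields).
# Field names and types are hypothetical.
table = np.zeros(6, dtype=[("attr1", "f8"), ("attr2", "i4"), ("attr3", "U8")])
table["attr1"] = np.linspace(0.0, 1.0, 6)
table["attr2"] = np.arange(6)
table["attr3"] = "x"

# Column access returns a view onto the same memory: one attribute, all objects.
attr2_column = table["attr2"]
attr2_total = int(attr2_column.sum())

# Row access returns one object with all of its attributes.
third_object = table[2]

# Writing through the column view changes the table itself.
attr2_column[0] = 100
```

Because all attributes live in one block of memory, whole-column operations such as the sum need no per-object method calls.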
  • 35. Benefits of Array-oriented • Many technical problems are naturally array-oriented (easy to vectorize) • Algorithms can be expressed at a high-level • These algorithms can be parallelized more simply (quite often much information is lost in the translation to typical “compiled” languages) • Array-oriented algorithms map to modern hardware caches and pipelines. • Software stack now starting to re-focus with ML frameworks emerging. • There is a reason Fortran remains popular.
  • 37. NumPy Examples 2d array 3d array [439 472 477] [217 205 261 222 245 238] 9.98330639789 2.96677717122
  • 38. NumPy Slicing (Selection) on a 6x6 array a whose rows are [0 1 2 3 4 5], [10 11 12 13 14 15], [20 21 22 23 24 25], [30 31 32 33 34 35], [40 41 42 43 44 45], [50 51 52 53 54 55]: >>> a[0,3:5] array([3, 4]) >>> a[4:,4:] array([[44, 45], [54, 55]]) >>> a[:,2] array([2,12,22,32,42,52]) >>> a[2::2,::2] array([[20, 22, 24], [40, 42, 44]])
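The slicing results on this slide can be reproduced with the short script below. How the 6x6 array was built in the talk is not shown in the transcript, so the broadcasting construction here is an assumption.

```python
import numpy as np

# Build the 6x6 array from the slide: entry (i, j) equals 10*i + j.
# This construction is an assumption; only the values come from the slide.
a = 10 * np.arange(6)[:, None] + np.arange(6)

first_row_part = a[0, 3:5]    # row 0, columns 3 and 4
lower_right = a[4:, 4:]       # last two rows, last two columns
third_column = a[:, 2]        # every row, column 2
strided = a[2::2, ::2]        # rows 2 and 4, every other column
```

All four results are views into `a`, so no data is copied by the selections.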
  • 39. Quick History life←{↑1 ⍵∨.∧3 4=+/,¯1 0 1∘.⊖¯1 0 1∘.⌽⊂⍵} 1966: APL (Ken Iverson, IBM); 1984: APL2 (IBM); 1990: J (Ken Iverson); 1993: K -> Q (Arthur Whitney; used by KDB); 2019: new version of K (Arthur Whitney); 1996: Numeric for Python (Jim Hugunin); 2006: NumPy (Travis Oliphant); 2012: Numba (Siu Kwan Lam). Lineage shown: APL, J, K, Matlab, Numeric, NumPy
  • 40. Putting Science back in Comp Sci • Much of the software stack is for systems programming --- C++, Java, .NET, ObjC, web • This has been great for desktop computing but terrible for science: - Complex numbers? - Vectorized primitives? - Multidimensional arrays? • Array-oriented programming was supplanted by Object-oriented programming • Software stack for scientists was not as helpful as it should be • Fortran is still where many scientists ended up • Past 5 years this is changing with emergence of Python, Jupyter, Pandas, PyTorch (we still have a long way to go).
  • 41. Array-Oriented Computing Example 1: Fibonacci Numbers f_n = f_{n-1} + f_{n-2}, f_0 = 0, f_1 = 1, so f = 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, . . .
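The slides that worked this example are not in the transcript, so the following is only a sketch of two ways to compute the sequence with NumPy: a direct loop over the recurrence, and a matrix-power form that treats the recurrence as a linear map. The function names are hypothetical.

```python
import numpy as np

def fib_loop(count):
    """Return the first `count` Fibonacci numbers using the recurrence directly."""
    f = np.zeros(count, dtype=np.int64)
    if count > 1:
        f[1] = 1
    for i in range(2, count):
        f[i] = f[i - 1] + f[i - 2]
    return f

def fib_matrix(k):
    """Return f_k from the k-th power of [[1, 1], [1, 0]].

    The k-th power holds f_k in its off-diagonal entries.
    int64 overflows beyond f_92, so this sketch is for small k only.
    """
    q = np.array([[1, 1], [1, 0]], dtype=np.int64)
    return int(np.linalg.matrix_power(q, k)[0, 1])
```

For example, `fib_loop(10)` reproduces the ten values listed on the slide, and `fib_matrix(9)` gives the last of them.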
  • 45. Conway’s game of Life • Dead cell with exactly 3 live neighbors will come to life • A live cell with 2 or 3 neighbors will survive • With too few or too many neighbors, the cell dies
  • 47. Conway’s Game of Life APL NumPy Initialization Update Step life←{↑1 ⍵∨.∧3 4=+/,¯1 0 1∘.⊖¯1 0 1∘.⌽⊂⍵}
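The NumPy side of this comparison was shown as an image and is missing from the transcript. The sketch below is one possible NumPy update step, not the speaker's code. It assumes a periodic (wrap-around) board, implemented with `np.roll`, and a hypothetical function name `life_step`.

```python
import numpy as np

def life_step(board):
    """Advance a 0/1 board by one generation with wrap-around edges."""
    # Sum the eight shifted copies of the board to count live neighbors.
    neighbors = sum(
        np.roll(np.roll(board, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # Birth on exactly 3 neighbors; survival on 2 or 3 neighbors.
    alive = (neighbors == 3) | ((board == 1) & (neighbors == 2))
    return alive.astype(board.dtype)

# Initialization: a horizontal "blinker" in the middle of a 5x5 board.
board = np.zeros((5, 5), dtype=np.uint8)
board[2, 1:4] = 1

# Update step: the blinker flips to vertical, then back again.
next_board = life_step(board)
```

The whole board is updated with a handful of array operations and no per-cell loop, which is the point of the slide's comparison.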
  • 48. Zen of NumPy • strided is better than scattered • contiguous is better than strided • descriptive is better than imperative • array-oriented is better than object-oriented • broadcasting is a great idea • vectorized is better than an explicit loop • unless it’s too complicated or uses too much memory, then use Numba • think in higher dimensions. Inspired by Tim Peters and “import this”
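The first two lines of the Zen refer to how array elements are laid out in memory. A brief sketch of what strided and contiguous mean in practice, using an arbitrary example array:

```python
import numpy as np

# A C-contiguous 3x4 array of 8-byte integers:
# stepping one row skips 32 bytes, stepping one column skips 8 bytes.
a = np.arange(12, dtype=np.int64).reshape(3, 4)
row_major_strides = a.strides

# Transposing only swaps the strides; the result is a non-contiguous view.
transposed = a.T

# Taking every other column yields a strided view with gaps between elements.
every_other = a[:, ::2]

# Copying into a packed buffer makes the data contiguous again.
packed = np.ascontiguousarray(every_other)
```

Strided views avoid copies, while contiguous data is generally friendlier to caches, which is the ordering the Zen expresses.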
  • 49. What is good about NumPy? • Array-oriented • Extensive Dtype System (including structures) • C-API • Simple to understand data-structure • Memory mapping • Syntax support from Python • Large community of users • Broadcasting • Easy to interface C/C++/Fortran code
  • 50. What is wrong with NumPy • Dtype system is difficult to extend • Immediate mode creates huge temporaries • “Almost” an in-memory database comparable to SQLite (missing indexes) • Integration with sparse arrays • Lots of un-optimized parts • Minimal support for multi-core / GPU • Code-base is organic and hard to extend • Tied to CPython run-time (doesn’t work on other Python implementations)
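The point about immediate mode and temporaries can be seen in a simple expression. In the sketch below (array sizes and values are arbitrary), the one-line form allocates an intermediate array for the product, while the explicit `out=` form reuses a single buffer. This is a manual workaround, not a fix for the underlying design issue.

```python
import numpy as np

a = np.ones(1000)
b = np.full(1000, 2.0)
c = np.full(1000, 3.0)

# Immediate mode: a * b is evaluated first into a temporary array,
# and a second array is then allocated for the final sum.
naive = a * b + c

# Manual alternative: write each step into one preallocated buffer.
result = np.empty_like(a)
np.multiply(a, b, out=result)
np.add(result, c, out=result)
```

For large arrays or long expressions the temporaries dominate memory traffic, which is part of the motivation the talk gives for tools like Numba.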
  • 51. Python Origins (version: date): 0.9.0: Feb. 1991; 0.9.4: Dec. 1991; 0.9.6: Apr. 1992; 0.9.8: Jan. 1993; 1.0.0: Jan. 1994; 1.2: Apr. 1995; 1.4: Oct. 1996; 1.5.2: Apr. 1999
  • 52. How I got involved… Getting data into memory, fast! Reference Counting Essay (May 1998, Guido van Rossum, http://www.python.org/doc/essays/refcnt/); TableIO (April 1998, Michael A. Miller); NumPyIO (June 1998)
  • 53. How SciPy started… Discussions on the matrix-sig from 1997 to 1999 called for a complete data analysis environment: Paul Barrett, Joe Harrington, Perry Greenfield, Paul Dubois, Konrad Hinsen, and others. Activity in 1998 led to increased interest in 1999. In response, on 15 Jan 1999, I posted to matrix-sig a list of routines I felt needed to be present and began wrapping / writing in earnest. On 6 April 1999, I announced I would be creating this uber-package, which eventually became SciPy. Releases and posts in 1999: Gaussian quadrature (5 Jan); cephes 1.0 (30 Jan); sigtools 0.40 (23 Feb); Numeric docs (March); cephes 1.1 (9 Mar); multipack 0.3 (13 Apr); helper routines (14 Apr); multipack 0.6 with leastsq, ode, fsolve, quad (29 Apr); sparse plan described (30 May); multipack 0.7 (14 Jun); SparsePy 0.1 (5 Nov); cephes 1.2 with vectorize (29 Dec)
  • 54. Joined with others… Started as Multipack in 1998 and became SciPy in 2001 with the help of other colleagues 115 releases, 815 contributors Used by: 156,525
  • 55. Don’t underestimate the importance of Team! Anaconda success also depended on going from individual to a team >700 contributors
  • 56. Other People Matter Know your model is incomplete: • see people as “ends” not your “means” • Believe in, love, and trust other people. The Social Brain Hypothesis and Human Evolution, Robin I. M. Dunbar Use your brain to adapt to other people — this is why your brain is so big! Hypothesis: You carry and update “models of people" in your head. From very detailed to approximate. Dunbar numbers!
  • 57. Keep Open Mind: Be open to critique. dtype vs. ctypes: PEP 3118 debate over how to describe memory. Current me disagrees with past me! I am glad there were others in the debate.
  • 58. Return good for evil Hard because of our brains!
  • 60. We have a “divided” community again! Numeric Numarray NumPy
  • 61. Examples of packages being built on differing standards FastAI skorch Pyro Eduard anyrl Braid PyMC4 Horovod MLFlow But note
  • 62. Unification Efforts Train the Model Deploy the Model Platform1 Platform 2 Deploy the Model Platform 3
  • 63. NNVM / TVM — Ambitious Plan at UW
  • 64. What is next? What am I working on for the next 20 years…
  • 65. Technology and Economic problems 1. General interoperability: low-level libraries that reduce silos of data and analysis 2. Better High-level APIs (more interfaces in Python supported by multiple implementations) 3. Data Management, in particular Data Catalogues 4. Fixing Python’s Extension problem (the ecosystem helped Python grow but is also an anchor to its progress) 5. How to connect the trillions of dollars of market capital to the innovation available in global, emergent, open-source communities.
  • 66. High Level APIs for Arrays (Tensors), DataFrames, and DataTypes LABS
  • 67. The extensions are an anchor to Python runtime progress! CPython C-API
  • 68. What will work! • Create a statically typed subset of Python that is then used to extend Python — EPython • Port NumPy, SciPy, Scikits to EPython (borrow heavily from Cython ideas but use mypy-style typing instead of new syntax).
  • 69. LABS Sustaining the Future Open-source innovation and maintenance around the entire data- science and AI workflow. • NumPy ecosystem maintenance (PyData Core Team) • Improve connection of NumPy to ML Frameworks • GPU Support for NumPy Ecosystem • Improve foundations of Array computing • JupyterLab and JupyterHub • Data Catalog standards • Packaging (conda-forge, PyPA, etc.) PySparse - sparse n-d arrays Ibis - Pandas-like front-end to SQL uarray — unified array interface for SciPy refactor xnd — re-factored NumPy (low-level cross-language libraries for N-D (tensor) computing) Collaborating with NumFOCUS! Bokeh Adapted from Jake Vanderplas PyCon 2017 Keynote
  • 70. Build and Connect Companies and Communities to Solve Challenging Problems with Data Enables me to keep working on array- computing problems *and* meta- problem of open-source funding.
  • 71. Complete open-source service consulting in the PyData / NumFOCUS ecosystem including data-science and ML We provide part-time CTO work, custom software, staff augmentation, support, training, staffing, and mentoring Open Source Research Lab supporting the NumFOCUS and PyData Community. Hiring developers, evangelists, tech writers, designers, and product managers for open-source projects. Early stage funding to companies that provide return to investors and support open source ecosystems with industry disrupting products and services Services Open Source Lab Venture Fund Three Activities with One Mission
  • 72. Some of the projects we support Sparse Fast Foundational ND-Array (Tensor) object for Python Extensive Library of Functions for NumPy GPU-enabled Compiler for NumPy/Python Parallel and Scaled Pandas and NumPy DataFrames for general data-manipulation and statistics and Notebook environments for rapid development and data analysis Desktop IDE for data-science and ML Rapid development of Dashboards for Python/PyData ecosystem. Easy and fast web-based interactive plots using Python. Turn even very large datasets into images, accurately. General Sparse Arrays for Python Cross-language libraries for array computing General and powerful symbolic mathematical library Very popular and powerful machine learning library
  • 73. An early stage venture capital firm investing in startups that build on open-source technology and support the communities they depend on (11 companies) supporting FairOSS $20m fund
  • 74. Problem Open Source Teams ! Burned out ! Underrepresented ! Underpaid Organizations ! Disconnected from the Community ! Lack support and maintenance There’s no easy way to connect the community with organizations
  • 75. Open Source Marketplace Managing Partners ! Provide Open Source Services ! Training / Support ! Feature development / fixes Funding Partners ! Hire from the community ! Collectively fund ! Get support they need to build effectively on open- source. Open-source Contributors create profiles for themselves and their projects and participate as actors in the market.
  • 77. FairOSS A Public Benefit Company (goal is growing amount of freely available software) • Owned by open-source contributors (will be doing a public fund-raise later this year) • Those share-holders govern the organization (elect the board). • Board appoints management and decides what is “fair” Holds Companies accountable • Allows usage of its trademarks only for companies that contribute back “fairly” • Think “Kosher” or “Organic labeled” • Companies give back by equity, revenue, and “in-kind” agreements with FairOSS FairOSS is custodian of Revenue and Equity Agreements • Equity agreements mean that FairOSS holds shares, options, or warrants of the company (most companies are missing open-source community from their ‘cap-table’) • Revenue agreements mean that companies pay FairOSS a portion of their revenue. • FairOSS distributes almost all of the proceeds from these agreements to the open- source communities. If successful — this would make OpenSource investable and make available >$23,000,000,000,000 (trillion) of investment capital to open-source communities.
  • 78. You can really change the world… With Open Source Communities… Let’s do more of that!