"Number Crunching in Python": slides presented at EuroPython 2012, Florence, Italy
Slides have been authored by me and by Dr. Enrico Franchi.
The slides cover scientific and engineering computing, the Numpy NDArray implementation, and some worked case studies.
Number Crunching in Python
1. NUMBER CRUNCHING IN PYTHON
Enrico Franchi (efranchi@ce.unipr.it) &
Valerio Maggio (valerio.maggio@unina.it)
2. OUTLINE
• Scientific and Engineering Computing
• Common FP pitfalls
• Numpy NDArray (Memory and Indexing)
• Case Studies
5. number-crunching: n. [common] Computations of a
numerical nature, esp. those that make extensive
use of floating-point numbers. This term is in
widespread informal use outside hackerdom and even
in mainstream slang, but has additional hackish
connotations: namely, that the computations are
mindless and involve massive use of brute force.
This is not always evil, esp. if it involves ray tracing
or fractals or some other use that makes pretty
pictures, esp. if such pictures can be used as screen
backgrounds. See also crunch.
We are not evil.
Just chaotic neutral.
8. ALTERNATIVES
• Matlab (IDE, numeric-computation oriented, high-quality algorithms,
lots of packages, poor general-purpose programming support, commercial)
• Octave (Matlab clone)
• R (statistics oriented, poor general-purpose programming support)
• Fortran/C++ (very low level, very fast, more complex to use)
• In general, these tools are either low-level general-purpose languages or high-level DSLs
9. PYTHON
• Numpy (low-level numerical computations) +
Scipy (lots of additional packages)
• IPython (wonderful command-line interpreter) +
IPython Notebook (“Mathematica-like” interactive documents)
• HDF5 (PyTables, H5Py), databases
• Specific libraries for machine learning, etc.
• General-purpose object-oriented programming
13. NUMPY STACK
Layers: Our Code → Numpy → Atlas/MKL (improvements in a lower layer carry over to the layers above it)
Algorithms are fast because of highly optimized C/Fortran code.
c = a · b, i.e., c = dot(a, b), is a single function call at the bytecode level:
  4     30 LOAD_GLOBAL     1 (dot)
        33 LOAD_FAST       0 (a)
        36 LOAD_FAST       1 (b)
        39 CALL_FUNCTION   2
        42 STORE_FAST      2 (c)
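The bytecode listing can be reproduced with the standard `dis` module. This is a minimal sketch assuming Python 3 and a recent NumPy; the function name `product` is hypothetical, and opcode names and offsets will differ from the Python 2.7 listing on the slide (Python 3.11 and later, for instance, emit `CALL` instead of `CALL_FUNCTION`). Whatever the version, the whole product is one call into NumPy's compiled code.

```python
import dis
import numpy as np
from numpy import dot

def product(a, b):
    # Hypothetical helper: a single Python-level call; all the loops
    # run inside NumPy's compiled code (BLAS, if NumPy is linked to one)
    c = dot(a, b)
    return c

# Opcode names and offsets depend on the Python version
dis.dis(product)

a = np.arange(6.0).reshape(2, 3)
b = np.arange(12.0).reshape(3, 4)
c = product(a, b)
c_op = a @ b  # Python 3.5+ matrix-multiplication operator, same result
```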
14. ndarray
An ndarray combines memory, metadata (shape, strides, flags) and behavior:
• Memory: an n-dimensional array references some (usually contiguous) memory area.
• Metadata: an n-dimensional array has properties such as its shape, $(d_0, \ldots, d_{n-1})$ (e.g., 4x3), and the data type of the elements it contains. N-dimensional arrays are homogeneous.
• Behavior: it is an object, so it has behavior, e.g., the definition of __add__ and similar methods.
• Indexing maps a multi-index to a linear offset in memory: $(i_0, \ldots, i_{n-1}) \mapsto I$.
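The three parts of an ndarray (memory, metadata, behavior) can be inspected directly. A minimal sketch, assuming NumPy with 8-byte `int64` elements; note that NumPy reports `strides` in bytes, not in elements.

```python
import numpy as np

# Memory + metadata: a 4x3 array of 8-byte integers, C-contiguous by default
a = np.arange(12, dtype=np.int64).reshape(4, 3)
print(a.shape, a.dtype, a.strides)                        # (4, 3) int64 (24, 8)
print(a.flags["C_CONTIGUOUS"], a.flags["F_CONTIGUOUS"])   # True False

# The same values copied into Fortran (column-major) order
f = np.asfortranarray(a)
print(f.shape, f.strides)                                 # (4, 3) (8, 32)
print(f.flags["C_CONTIGUOUS"], f.flags["F_CONTIGUOUS"])   # False True

# Homogeneity: mixed inputs are upcast to a single dtype
mixed = np.array([1, 2.5])
print(mixed.dtype)                                        # float64

# Behavior: the + operator dispatches to ndarray.__add__
s = a + a
```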
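15. C-contiguous vs. F-contiguous
Multi-index to linear offset, $(i_0, \ldots, i_{n-1}) \mapsto I$, for shape $(d_0, \ldots, d_{n-1})$:
C-contiguous (row-major): $I_C = \sum_{k=0}^{n-1} i_k \prod_{j=k+1}^{n-1} d_j$
F-contiguous (column-major): $I_F = \sum_{k=0}^{n-1} i_k \prod_{j=0}^{k-1} d_j$
Example, a 4x3 array ($d_0 = 4$, $d_1 = 3$):
$I_C = i_0 \cdot d_1 + i_1$
$I_F = i_0 + i_1 \cdot d_0$
[Figure: element layout in memory, C-contiguous vs. F-contiguous]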
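The two offset formulas translate almost literally into Python. This sketch (the helper names `index_c` and `index_f` are hypothetical) checks them against NumPy's `ravel_multi_index`, which computes the same linear index for either memory order; `math.prod` requires Python 3.8+ and returns 1 for an empty product.

```python
import numpy as np
from math import prod

def index_c(idx, shape):
    """Row-major offset: I_C = sum_k i_k * prod_{j=k+1}^{n-1} d_j."""
    return sum(i * prod(shape[k + 1:]) for k, i in enumerate(idx))

def index_f(idx, shape):
    """Column-major offset: I_F = sum_k i_k * prod_{j=0}^{k-1} d_j."""
    return sum(i * prod(shape[:k]) for k, i in enumerate(idx))

shape = (4, 3)
for idx in np.ndindex(*shape):
    assert index_c(idx, shape) == np.ravel_multi_index(idx, shape, order="C")
    assert index_f(idx, shape) == np.ravel_multi_index(idx, shape, order="F")

print(index_c((2, 1), shape), index_f((2, 1), shape))   # 7 6
```

For element (2, 1) of the 4x3 array, row-major order gives 2 · 3 + 1 = 7, column-major order gives 2 + 1 · 4 = 6.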
16. Stride
C-contiguous: $s_C(k) = \prod_{j=k+1}^{n-1} d_j$, so that $I_C = \sum_{k=0}^{n-1} i_k \cdot s_C(k)$
F-contiguous: $s_F(k) = \prod_{j=0}^{k-1} d_j$, so that $I_F = \sum_{k=0}^{n-1} i_k \cdot s_F(k)$
2-D case: C-contiguous $(s_0 = d_1, s_1 = 1)$; F-contiguous $(s_0 = 1, s_1 = d_0)$
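The stride formulas can be checked against NumPy as well. In this sketch (helper names `strides_c` and `strides_f` are hypothetical) strides are computed in elements, while NumPy's `strides` attribute is in bytes, so the two differ by the item size. It also shows that transposing a C-contiguous array just reverses its strides, yielding an F-contiguous view without copying data.

```python
import numpy as np
from math import prod

def strides_c(shape):
    # s_C(k) = prod_{j=k+1}^{n-1} d_j, measured in elements
    return tuple(prod(shape[k + 1:]) for k in range(len(shape)))

def strides_f(shape):
    # s_F(k) = prod_{j=0}^{k-1} d_j, measured in elements
    return tuple(prod(shape[:k]) for k in range(len(shape)))

shape = (4, 3, 2)
c = np.zeros(shape, order="C")   # float64: itemsize 8
f = np.zeros(shape, order="F")

# NumPy reports strides in bytes: element strides times itemsize
print(strides_c(shape), c.strides)   # (6, 2, 1) (48, 16, 8)
print(strides_f(shape), f.strides)   # (1, 4, 12) (8, 32, 96)

# Transposition reverses the strides: a view, no data is copied
t = c.T
print(t.shape, t.strides, t.flags["F_CONTIGUOUS"])   # (2, 3, 4) (8, 16, 48) True
```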
37. AWFUL MATHS
General Disclaimer:
All the maths appearing in the next slides is only intended to better introduce the case studies considered. The speakers are not
responsible for any possible disease or “brain consumption” caused by too many formulas.
So BEWARE: use this information at your own risk!
Its intention is solely educational. We strongly encourage you to use this information in cooperation with a medical or
health professional.
38. BEFORE STARTING
What you need to get started:
• A handful of Unix command-line tools:
• Linux / Mac OS X users: you’re done.
• Windows users: it might be time to change your OS :-)
• [I]Python (you say?!)
• A DBMS:
• Relational: e.g., SQLite3, PostgreSQL
• NoSQL: e.g., MongoDB
40. CASE STUDIES ON NUMERICAL EFFICIENCY
• Vectorization (NumPy vs. “pure” Python)
• Loops and math functions (e.g., sin(x))
• Matrix-vector product
• Different implementations of the matrix-vector product
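A minimal benchmark sketch of these case studies, assuming NumPy and the standard `timeit` module; all function names and sizes are hypothetical choices, not the speakers' original code. It compares a pure-Python loop over `math.sin` with `np.sin`, and three matrix-vector implementations: a double loop, one Python dot product per row, and NumPy's `@`. Absolute timings depend entirely on the machine, so none are quoted here.

```python
import math
import timeit
import numpy as np

n = 100_000
xs = [i / n for i in range(n)]
x = np.array(xs)

def sin_loop(values):
    # "Pure" Python: one interpreted iteration and one call per element
    out = []
    for v in values:
        out.append(math.sin(v))
    return out

def sin_vectorized(arr):
    # Vectorized: a single call; the loop runs in compiled code
    return np.sin(arr)

def matvec_loops(A, v):
    # Textbook double loop over rows and columns (lists of lists)
    y = [0.0] * len(A)
    for i in range(len(A)):
        for j in range(len(v)):
            y[i] += A[i][j] * v[j]
    return y

def matvec_rows(A, v):
    # One dot product per row, still computed in Python
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in A]

def matvec_numpy(A, v):
    # Delegates to NumPy (and to BLAS, when available)
    return A @ v

M = np.random.default_rng(0).random((300, 300))
w = np.random.default_rng(1).random(300)
M_list, w_list = M.tolist(), w.tolist()

cases = [
    ("sin loop", lambda: sin_loop(xs)),
    ("sin vectorized", lambda: sin_vectorized(x)),
    ("matvec loops", lambda: matvec_loops(M_list, w_list)),
    ("matvec rows", lambda: matvec_rows(M_list, w_list)),
    ("matvec numpy", lambda: matvec_numpy(M, w)),
]
for name, fn in cases:
    print(f"{name:15s} {timeit.timeit(fn, number=5):.4f} s")
```

All variants compute the same values; only where the loop runs (interpreter vs. compiled code) changes.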
62. MACHINE LEARNING
• Machine Learning = Learning by Machine(s)
• Algorithms and techniques to gain insights from data (a dataset)
• Supervised or unsupervised learning
• Machine learning is actively used today, perhaps in many more places than
you’d expect:
• Mail spam filtering
• Search engine results ranking
• Preference selection
• e.g., Amazon “Customers Who Bought This Item Also Bought”
63. CLUSTERING: BRIEF INTRODUCTION
• Clustering is a type of unsupervised learning that automatically forms
clusters (groups) of similar things.
It’s like automatic classification.
You can cluster almost anything, and the more similar the items in a
cluster are, the better your clusters are.
• k-means is an algorithm that finds k clusters for a given dataset.
• The number of clusters k is user-defined.
• Each cluster is described by a single point known as the centroid.
• The centroid lies at the center of all the points in the cluster.
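For numeric data, "the center of all the points" is usually taken to be the coordinate-wise mean. The sketch below uses hypothetical points and also computes the sum of squared errors (SSE) around the centroid, one common way to quantify how similar the items in a cluster are (the slide itself does not name a measure; SSE is the quantity k-means tries to reduce).

```python
import numpy as np

# Hypothetical 2-D points already grouped into one cluster
cluster = np.array([[1.0, 2.0],
                    [3.0, 4.0],
                    [5.0, 0.0]])

# The centroid is the coordinate-wise mean of the cluster's points
centroid = cluster.mean(axis=0)
print(centroid)   # [3. 2.]

# Sum of squared distances to the centroid: smaller means a tighter cluster
sse = ((cluster - centroid) ** 2).sum()
print(sse)        # 16.0
```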
71. EXAMPLE: CLUSTERING POINTS ON A MAP
Here’s the situation:
your friend <NAME> wants you to take him out in the greater Portland, Oregon,
area (US) for his birthday.
A number of other friends are going to come also, so you need to provide a plan
that everyone can follow.
Your friend has given you a list of places he wants to go.
This list is long; it has 70 establishments in it.
73. Spherical Distance Measure
$\phi_s, \lambda_s$ and $\phi_f, \lambda_f$: latitude and longitude coordinates of two points (s and f)
$\Delta\phi, \Delta\lambda$: corresponding differences
$\Delta\hat{\sigma} = \arccos(\sin\phi_s \sin\phi_f + \cos\phi_s \cos\phi_f \cos\Delta\lambda)$
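A direct transcription of the formula, as a sketch: $\Delta\hat{\sigma}$ is a central angle in radians, so multiplying by the sphere's radius gives a distance. The radius of 6371 km (mean Earth radius) and the function name `spherical_distance` are assumptions for illustration. The argument of `arccos` is clipped because floating-point rounding can push it slightly outside [-1, 1].

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius, in km

def spherical_distance(lat_s, lon_s, lat_f, lon_f, radius=EARTH_RADIUS_KM):
    """Great-circle distance (spherical law of cosines); inputs in degrees."""
    phi_s, phi_f = np.radians(lat_s), np.radians(lat_f)
    dlambda = np.radians(lon_f - lon_s)
    cos_sigma = (np.sin(phi_s) * np.sin(phi_f)
                 + np.cos(phi_s) * np.cos(phi_f) * np.cos(dlambda))
    # Guard against rounding slightly outside the domain of arccos
    return radius * np.arccos(np.clip(cos_sigma, -1.0, 1.0))

# Two points on the equator, 90 degrees of longitude apart
print(spherical_distance(0.0, 0.0, 0.0, 90.0))   # about 10007.5 km
```

For very short distances this form loses precision (the cosine is close to 1); the haversine formula is the usual alternative in that regime.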
75. TRIVIAL EXAMPLE: INVERSE MATRIX
• Problem: given an input matrix A,
calculate, if possible, its inverse matrix.
• Definition:
In linear algebra, an n-by-n (square) matrix A is
invertible (a.k.a. nonsingular or
nondegenerate) if there exists an n-by-n matrix B
($A^{-1}$) such that $AB = BA = I_n$.
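The definition can be checked numerically: compute B with `numpy.linalg.inv` and verify both products against the identity with `np.allclose` (exact equality is not expected in floating point). The matrices below are hypothetical examples; for a singular matrix, `inv` raises `numpy.linalg.LinAlgError`.

```python
import numpy as np

A = np.array([[4.0, 7.0],
              [2.0, 6.0]])
B = np.linalg.inv(A)   # det(A) = 10, so A is invertible
I = np.eye(2)
print(np.allclose(A @ B, I), np.allclose(B @ A, I))   # True True

S = np.array([[1.0, 2.0],
              [2.0, 4.0]])   # second row = 2 * first row, so det(S) = 0
singular_detected = False
try:
    np.linalg.inv(S)
except np.linalg.LinAlgError as err:
    singular_detected = True
    print("not invertible:", err)
```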
76. Solution(s)
✓ Eigendecomposition:
• If A can be eigendecomposed, $A = Q \Lambda Q^{-1}$, and none of its eigenvalues is equal to zero (so A is nonsingular):
$A^{-1} = Q \Lambda^{-1} Q^{-1}$
✓ Cholesky decomposition:
• If A is Hermitian positive definite, $A = L L^*$, where L is a lower triangular matrix and $L^*$ is its conjugate transpose:
$A^{-1} = (L^*)^{-1} L^{-1}$
✓ LU factorization: $A = LU$, with L (U) a lower (upper) triangular matrix:
$A^{-1} = U^{-1} L^{-1}$
✓ Analytic solution: writing the matrix of cofactors C, a.k.a. the Cramer method:
$(A^{-1})_{i,j} = \frac{1}{\det(A)} (C^T)_{i,j} = \frac{1}{\det(A)} C_{j,i}$, i.e.,
$A^{-1} = \frac{1}{\det(A)} \begin{pmatrix} C_{1,1} & C_{1,2} & \cdots & C_{1,n} \\ C_{2,1} & C_{2,2} & \cdots & C_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ C_{n,1} & C_{n,2} & \cdots & C_{n,n} \end{pmatrix}^T$
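Three of these routes can be sketched with NumPy alone and compared with `np.linalg.inv`. The test matrix is a hypothetical symmetric positive definite one, so every method applies; the helper names are illustrative. NumPy exposes no LU routine (SciPy's `scipy.linalg.lu` does), so the LU route is left out. For brevity the triangular inverse uses `np.linalg.solve`; `scipy.linalg.solve_triangular` would exploit the structure.

```python
import numpy as np

# Hypothetical symmetric positive definite matrix: all methods apply
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

def inv_eigen(A):
    # A = Q Lambda Q^{-1}  =>  A^{-1} = Q Lambda^{-1} Q^{-1}
    w, Q = np.linalg.eig(A)
    return Q @ np.diag(1.0 / w) @ np.linalg.inv(Q)

def inv_cholesky(A):
    # A = L L*  =>  A^{-1} = (L*)^{-1} L^{-1} = (L^{-1})* L^{-1}
    L = np.linalg.cholesky(A)                       # lower triangular
    L_inv = np.linalg.solve(L, np.eye(A.shape[0]))
    return L_inv.conj().T @ L_inv

def inv_cofactor(A):
    # A^{-1} = C^T / det(A), with C_ij = (-1)^(i+j) * det(minor_ij)
    m = A.shape[0]
    C = np.empty_like(A)
    for i in range(m):
        for j in range(m):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C.T / np.linalg.det(A)

for method in (inv_eigen, inv_cholesky, inv_cofactor):
    print(method.__name__, np.allclose(method(A), np.linalg.inv(A)))
```

The cofactor route computes $n^2$ determinants, so it is only sensible for small matrices.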
85. Type: function
String Form:<function inv at 0x105f72b90>
File: /Library/Python/2.7/site-packages/numpy/linalg/linalg.py
Definition: linalg.inv(a)
Source:
def inv(a):
    """
    Compute the (multiplicative) inverse of a matrix. [...]
    Parameters
    ----------
    a : array_like, shape (M, M)
        Matrix to be inverted.
    Returns
    -------
    ainv : ndarray or matrix, shape (M, M)
        (Multiplicative) inverse of the matrix `a`.
    Raises
    ------
    LinAlgError
        If `a` is singular or not square.
    [...]
    """
    a, wrap = _makearray(a)
    return wrap(solve(a, identity(a.shape[0], dtype=a.dtype)))
Under the hood
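The listing shows that, in the NumPy release on the slide, `inv` simply solves $AX = I$ with `solve`; newer releases may implement it differently internally. The sketch below (hypothetical random matrix, made diagonally dominant so it is guaranteed invertible) reproduces that equivalence and the common practice it suggests: to solve $Ax = b$, call `solve` directly rather than forming `inv(A)` first.

```python
import numpy as np

rng = np.random.default_rng(42)
A = rng.random((4, 4)) + 4 * np.eye(4)   # strictly diagonally dominant
b = rng.random(4)

# What inv() does in the listing: solve A X = I for X
X = np.linalg.solve(A, np.eye(4))
print(np.allclose(X, np.linalg.inv(A)))   # True

# For A x = b, solving directly avoids computing the full inverse
x_solve = np.linalg.solve(A, b)
x_inv = np.linalg.inv(A) @ b
print(np.allclose(x_solve, x_inv))        # True
```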
94. K-means
Create k points for starting centroids (often randomly)
While any point has changed cluster assignment:
    for every point in dataset:
        for every centroid:
            d = distance(centroid, point)
        assign the point to the cluster with the nearest centroid
    for each cluster:
        mean = average of the points in the cluster
        centroid[cluster] = mean
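A runnable NumPy version of the pseudocode, as a sketch under stated assumptions: the starting centroids are k distinct data points picked at random (one common reading of "often randomly"), the distance is Euclidean (a different metric, such as the spherical distance of slide 73, could be substituted), and an empty cluster keeps its previous centroid. The function name `kmeans` and the sample points are hypothetical.

```python
import numpy as np

def kmeans(data, k, rng=None, max_iter=100):
    """Plain k-means following the pseudocode (Euclidean distance assumed)."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(rng)
    # Assumption: start from k distinct data points chosen at random
    centroids = data[rng.choice(len(data), size=k, replace=False)]
    labels = np.full(len(data), -1)
    for _ in range(max_iter):
        # Distance from every point to every centroid: shape (n_points, k)
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)   # nearest centroid per point
        if np.array_equal(new_labels, labels):
            break                           # no point changed cluster
        labels = new_labels
        for c in range(k):
            members = data[labels == c]
            if len(members):                # empty cluster: keep old centroid
                centroids[c] = members.mean(axis=0)
    return centroids, labels

# Hypothetical data: two well-separated groups of three points each
pts = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
centroids, labels = kmeans(pts, k=2, rng=0)
print(labels)
print(centroids)
```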