enter.the.matrix
core.matrix
Array programming as a language extension for Clojure
(with a numerical computing focus)
Plug-in paradigms
Paradigm                 Exemplar language   Clojure implementation
Functional programming   Haskell             clojure.core
Meta-programming         Lisp                clojure.core
Logic programming        Prolog              core.logic
Process algebras / CSP   Go                  core.async
Array programming        APL                 core.matrix
APL
Venerable history
• Notation invented in 1957 by Ken Iverson
• Implemented at IBM around 1960-64

Has its own keyboard

Interesting perspective on code readability

life←{↑1 ⍵∨.∧3 4=+/,¯1 0 1∘.⊖¯1 0 1∘.⌽⊂⍵}
Modern array programming
• Standalone environment for statistical programming / graphics
• Python library for array programming
• A new language (2012) based on array programming principles
• ... and many others
Why Clojure for array programming?
1. Data Science
2. Platform
3. Philosophy
Elements of core.matrix
Abstraction: N-dimensional arrays – what and why?
API: What can you do with arrays?
Implementation: How is everything implemented?
Abstraction

or: “What is the matrix?”
Design wisdom
"It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures."
- Alan Perlis
What is an array?
Dimensions   Terminology
1            Vector
2            Matrix
3            3D Array (3rd order Tensor)
...          ...
N            ND Array

[Figure: an example array for each dimensionality]
Multi-dimensional array properties
Dimensions (ordered and indexed):

                 Dimension 1
                 0   1   2
Dimension 0  0   0   1   2
             1   3   4   5
             2   6   7   8

Dimension sizes together define the shape of the array (e.g. 3 x 3)

Each of the array elements is a regular value
Arrays = data about relationships
                Set Y
             :R  :S  :T  :U
Set X   :A    0   1   2   3
        :B    4   5   6   7
        :C    8   9  10  11

Each element is a fact about a relationship between a value in Set X and a value in Set Y

(foo :A :T) => 2

ND array lookup is analogous to arity-N functions!
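To make the analogy concrete, here is a minimal sketch in plain Clojure (no core.matrix required). The name `table` and the nested-map representation are assumptions for illustration; `foo` is the name used on the slide.

```clojure
;; Hypothetical data: the table above as nested maps,
;; keyed first by the Set X value, then by the Set Y value.
(def table
  {:A {:R 0 :S 1 :T 2  :U 3}
   :B {:R 4 :S 5 :T 6  :U 7}
   :C {:R 8 :S 9 :T 10 :U 11}})

;; An arity-2 function view of the same 2D data.
(defn foo [x y]
  (get-in table [x y]))

(foo :A :T) ; => 2
```

A 3D array would correspond to an arity-3 function in the same way, with one more level of nesting.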
Why arrays instead of functions?
     0  1  2
0    0  1  2
1    3  4  5
2    6  7  8

vs.

(fn [i j]
  (+ j (* 3 i)))

1. Precomputed values with O(1) access
2. Efficient computation with optimised bulk operations
3. Data driven representation
Expressivity
Java

for (int i=0; i<n; i++) {
  for (int j=0; j<m; j++) {
    for (int k=0; k<p; k++) {
      result[i][j][k] = a[i][j][k] + b[i][j][k];
    }
  }
}

Clojure

(mapv
  (fn [a b]
    (mapv
      (fn [a b]
        (mapv + a b))
      a b))
  a b)

Clojure + core.matrix

(+ a b)
Principle of array programming:
generalise operations on regular (scalar) values to multi-dimensional data

(+ 1 2) => 3

[Figure: the same + applied to two arrays]
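One way to picture this principle is a single function that adds scalars directly and recurses into nested vectors. This is only a sketch in plain Clojure under the assumption that both arguments have the same shape; `add*` is a hypothetical helper, not the core.matrix implementation.

```clojure
;; Hypothetical helper: works on scalars and on nested vectors of equal shape.
(defn add* [a b]
  (if (vector? a)
    (mapv add* a b)   ; array case: add element by element, one level down
    (+ a b)))         ; scalar case: ordinary addition

(add* 1 2)                             ; => 3
(add* [1 2 3] [4 5 6])                 ; => [5 7 9]
(add* [[0 1] [2 3]] [[10 10] [10 10]]) ; => [[10 11] [12 13]]
```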
API
Equivalence to Clojure vectors
[0 1 2]  ↔  a vector (1D array) holding 0 1 2

[[0 1 2]
 [3 4 5]
 [6 7 8]]  ↔  a 3 x 3 matrix (2D array) holding 0 to 8

Nested Clojure vectors of regular shape are arrays!
Array creation
;; Build an array from a sequence
(array (range 5))
=> [0 1 2 3 4]

;; ... or from nested arrays/sequences
(array
  (for [i (range 3)]
    (for [j (range 3)]
      (str i j))))
=> [["00" "01" "02"]
    ["10" "11" "12"]
    ["20" "21" "22"]]
Shape
;; Shape of a 3 x 2 matrix
(shape [[1 2]
        [3 4]
        [5 6]])
=> [3 2]

;; Regular values have no shape
(shape 10.0)
=> nil
Dimensionality
;; Dimensionality = number of dimensions
;;                = length of shape vector
;;                = nesting level
(dimensionality [[1 2]
                 [3 4]
                 [5 6]])
=> 2

(dimensionality [1 2 3 4 5])
=> 1

;; Regular values have zero dimensionality
(dimensionality "Foo")
=> 0
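The three equivalent definitions of dimensionality can be checked with a small sketch in plain Clojure. It assumes arrays are regular nested vectors; `shape*` and `dimensionality*` are hypothetical names and do not reproduce how core.matrix computes these values.

```clojure
;; Hypothetical helper: walks down the first element at each nesting level
;; and records the length found there. Returns nil for non-vectors (scalars).
(defn shape* [x]
  (when (vector? x)
    (loop [v x, dims []]
      (if (vector? v)
        (recur (first v) (conj dims (count v)))
        dims))))

;; Dimensionality as the length of the shape vector; (count nil) is 0.
(defn dimensionality* [x]
  (count (shape* x)))

(shape* [[1 2] [3 4] [5 6]])          ; => [3 2]
(shape* 10.0)                         ; => nil
(dimensionality* [[1 2] [3 4] [5 6]]) ; => 2
(dimensionality* [1 2 3 4 5])         ; => 1
(dimensionality* "Foo")               ; => 0
```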
Scalars vs. arrays
(array? [[1 2] [3 4]])
=> true

(array? 12.3)
=> false

(scalar? [1 2 3])
=> false

(scalar? "foo")
=> true

Everything is either an array or a scalar
A scalar works like a 0-dimensional array
Indexed element access
                 Dimension 1
                 0   1   2
Dimension 0  0   0   1   2
             1   3   4   5
             2   6   7   8

(def M [[0 1 2]
        [3 4 5]
        [6 7 8]])

(mget M 1 2)
=> 5
Slicing access
                 Dimension 1
                 0   1   2
Dimension 0  0   0   1   2
             1   3   4   5
             2   6   7   8

(def M [[0 1 2]
        [3 4 5]
        [6 7 8]])

(slice M 1)
=> [3 4 5]

A slice of an array is itself an array!
Arrays as a composition of slices
(def M [[0 1 2]
        [3 4 5]
        [6 7 8]])

(slices M)
=> ([0 1 2] [3 4 5] [6 7 8])

(apply + (slices M))
=> [9 12 15]
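The slice-and-sum idea can be reproduced without core.matrix. In this sketch, `slices*` and `add-rows` are hypothetical helpers that assume a matrix stored as a vector of row vectors.

```clojure
(def M [[0 1 2]
        [3 4 5]
        [6 7 8]])

;; For nested vectors, the slices along dimension 0 are simply the rows.
(defn slices* [m]
  (seq m))

;; Elementwise addition of any number of equal-length vectors.
(defn add-rows [& rows]
  (apply mapv + rows))

(slices* M)                  ; => ([0 1 2] [3 4 5] [6 7 8])
(apply add-rows (slices* M)) ; => [9 12 15]
```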
Operators
(use 'clojure.core.matrix.operators)

(+ [1 2 3] [4 5 6])
=> [5 7 9]

(* [1 2 3] [0 2 -1])
=> [0 4 -3]

(- [1 2] [3 4 5 6])
=> RuntimeException Incompatible shapes

(/ [1 2 3] 10.0)
=> [0.1 0.2 0.3]
Broadcasting scalars

(+ [[0 1 2]
    [3 4 5]
    [6 7 8]]
   1)
= ?

“Broadcasting” expands the scalar 1 to the shape of the matrix:

(+ [[0 1 2]     [[1 1 1]
    [3 4 5]      [1 1 1]
    [6 7 8]]     [1 1 1]])
= [[1 2 3]
   [4 5 6]
   [7 8 9]]
Broadcasting arrays

(+ [[0 1 2]
    [3 4 5]
    [6 7 8]]
   [2 1 0])
= ?

“Broadcasting” expands the vector [2 1 0] to the shape of the matrix:

(+ [[0 1 2]     [[2 1 0]
    [3 4 5]      [2 1 0]
    [6 7 8]]     [2 1 0]])
= [[2 2 2]
   [5 5 5]
   [8 8 8]]
Functional operations on sequences
map
(map inc [1 2 3 4])
=> (2 3 4 5)

reduce
(reduce * [1 2 3 4])
=> 24

seq
(seq [1 2 3 4])
=> (1 2 3 4)
Functional operations on arrays
map ↔ emap (“element map”)
(emap inc [[1 2]
           [3 4]])
=> [[2 3]
    [4 5]]

reduce ↔ ereduce (“element reduce”)
(ereduce * [[1 2]
            [3 4]])
=> 24

seq ↔ eseq (“element seq”)
(eseq [[1 2]
       [3 4]])
=> (1 2 3 4)
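The correspondence between the sequence functions and their element-wise counterparts can be sketched in a few lines of plain Clojure. The starred names are hypothetical stand-ins that assume nested vectors; they are not the core.matrix implementations.

```clojure
;; All elements of a nested vector, in row-major order.
(defn eseq* [a]
  (if (vector? a)
    (mapcat eseq* a)
    (list a)))

;; Apply f to every element, preserving the array's shape.
(defn emap* [f a]
  (if (vector? a)
    (mapv #(emap* f %) a)
    (f a)))

;; Reduce over all elements, ignoring the array's shape.
(defn ereduce* [f a]
  (reduce f (eseq* a)))

(emap* inc [[1 2] [3 4]])  ; => [[2 3] [4 5]]
(ereduce* * [[1 2] [3 4]]) ; => 24
(eseq* [[1 2] [3 4]])      ; => (1 2 3 4)
```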
Specialised matrix constructors
(zero-matrix 4 3)
;; 4 x 3 matrix of zeros

(identity-matrix 4)
;; 4 x 4 identity matrix

(permutation-matrix [3 1 0 2])
;; 4 x 4 permutation matrix

[Figure: the three resulting matrices drawn as grids of 0s and 1s]
Array transformations

[Figure: (transpose ...) applied to a matrix holding the elements 0 to 5]

Transpose reverses the order of all dimensions and indices
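For the 2D case only, transposition over nested vectors is a one-liner in plain Clojure. `transpose-2d` is a hypothetical name, and the 2 x 3 input is an assumed example rather than the matrix from the figure.

```clojure
;; Swaps rows and columns of a matrix stored as a vector of row vectors,
;; so element (i, j) of the input becomes element (j, i) of the result.
(defn transpose-2d [m]
  (apply mapv vector m))

(transpose-2d [[0 1 2]
               [3 4 5]])
;; => [[0 3] [1 4] [2 5]]
```

The general N-dimensional case reverses the whole shape vector, which this sketch does not attempt.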
Matrix multiplication

(mmul [[9 2 7] [6 4 8]]
      [[2 8] [3 4] [5 9]])
=> [[59 143] [64 136]]
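The result above can be checked by hand with a naive matrix product in plain Clojure. `dot` and `mmul-2d` are hypothetical helpers for 2D nested vectors; real implementations use optimised routines.

```clojure
;; Inner product of two equal-length vectors.
(defn dot [u v]
  (reduce + (map * u v)))

;; Each result element is the dot product of a row of a with a column of b.
(defn mmul-2d [a b]
  (let [cols (apply mapv vector b)]   ; columns of b, as vectors
    (mapv (fn [row] (mapv #(dot row %) cols)) a)))

(mmul-2d [[9 2 7] [6 4 8]]
         [[2 8] [3 4] [5 9]])
;; => [[59 143] [64 136]]
```

For example, the top-left element is 9*2 + 2*3 + 7*5 = 59.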
Geometry
(def π 3.141592653589793)
(def τ (* 2.0 π))

(defn rot [turns]
  (let [a (* τ turns)]
    [[   (cos a)  (sin a)]
     [(- (sin a)) (cos a)]]))

(mmul (rot 1/8) [3 4])
=> [4.9497 0.7071]

45° = 1/8 turn

NB: See Tau Manifesto (http://tauday.com/) regarding the use of Tau (τ)
Demo
Mutability?
Mutability – the tradeoffs
Pros
✓ Faster
✓ Reduces GC pressure
✓ Standard in many existing matrix libraries

Cons
✘ Mutability is evil
✘ Harder to maintain / debug
✘ Hard to write concurrent code
✘ Not idiomatic in Clojure
✘ Not supported by all core.matrix implementations
✘ “Place Oriented Programming”

Avoid mutability. But it’s an option if you really need it.
Mutability – performance benefit
Time for addition of vectors* (ns)

Immutable add    120
Mutable add!      28

4x performance benefit

* Length 10 double vectors, using :vectorz implementation
Mutability – syntax
(add [1 2] 1)
=> [2 3]

(add! [1 2] 1)
=> RuntimeException ...... not mutable!

;; coerce to a mutable format
(def a (mutable [1 2]))
=> #<Vector2 [1.0,2.0]>

(add! a 1)
=> #<Vector2 [2.0,3.0]>

A core.matrix function name ending with “!” performs mutation (usually on the first argument only)
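The difference between the two styles can be illustrated without core.matrix by contrasting a persistent vector with a Java double[]. The function names `add-scalar` and `add-scalar!` are hypothetical; only the “!” naming convention is taken from the slide.

```clojure
;; Immutable style: returns a new vector and leaves the input untouched.
(defn add-scalar [v x]
  (mapv #(+ % x) v))

;; Mutable style: updates the Java double[] in place and returns it.
(defn add-scalar! [^doubles arr x]
  (let [x (double x)]
    (dotimes [i (alength arr)]
      (aset arr i (+ (aget arr i) x)))
    arr))

(def v [1 2])
(add-scalar v 1)   ; => [2 3]
v                  ; => [1 2], unchanged

(def a (double-array [1 2]))
(add-scalar! a 1)
(vec a)            ; => [2.0 3.0], the array itself has changed
```

The in-place version allocates no new array, which is where benefits such as reduced GC pressure come from.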
Implementation
Many Matrix libraries…

MTJ

UJMP
javax.vecmath

ojAlgo
Lots of trade-offs
Native Libraries  vs.  Pure JVM
Mutability  vs.  Immutability
Specialized elements (e.g. doubles)  vs.  Generalised elements (Object, Complex)
Multi-dimensional  vs.  2D matrices only
Memory efficiency  vs.  Runtime efficiency
Concrete types  vs.  Abstraction (interfaces / wrappers)
Specified storage format  vs.  Multiple / arbitrary storage formats
License A  vs.  License B
Lightweight (zero-copy) views  vs.  Heavyweight copying / cloning
What’s the best data structure?
Length 50 “range” vector: 0 1 2 3 .. 49

1. Clojure Vector
   [0 1 2 …. 49]

2. Java double[] array
   new double[] {0, 1, 2, …. 49};

3. Custom deftype
   (deftype RangeVector
     [^long start
      ^long end])

4. Native vector format
   (org.jblas.DoubleMatrix. params)
There is no spoon
Secret weapon time!
Clojure Protocols
clojure.core.matrix.protocols

(defprotocol PSummable
  "Protocol to support the summing of all elements in
   an array. The array must hold numeric values only,
   or an exception will be thrown."
  (element-sum [m]))

1. Abstract Interface
2. Open Extension
3. Fast dispatch
Protocols are fast and open
Function call costs (ns)

Call type                  Cost (ns)   Open extension
Static / inlined code      1.2         ✘
Primitive function call    1.9         ✘
Boxed function call        7.9         ✘
Protocol call              13.8        ✓
Multimethod*               89          ✓

* Using class of first argument as dispatch function
Typical core.matrix call path
User code

(esum [1 2 3 4])

core.matrix API (matrix.clj)

(defn esum
  "Calculates the sum of all the elements in a
   numerical array."
  [m]
  (mp/element-sum m))

Impl. code

(extend-protocol mp/PSummable
  SomeImplementationClass
  (element-sum [a]
    ………))
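The three layers of this call path can be tried out in a self-contained form. The sketch below defines its own stand-in protocol so that it runs without core.matrix; `PSummable*`, `element-sum*` and `esum*` are hypothetical names mirroring the real ones, and the vector implementation is an assumption chosen for illustration.

```clojure
;; Stand-in for the protocol layer.
(defprotocol PSummable*
  "Hypothetical stand-in for mp/PSummable."
  (element-sum* [m]))

;; Stand-in for the implementation layer: one entry per supported type.
(extend-protocol PSummable*
  Number
  (element-sum* [a] a)
  clojure.lang.IPersistentVector
  (element-sum* [a] (reduce + (map element-sum* a))))

;; Stand-in for the API layer: a plain function that delegates to the protocol.
(defn esum* [m]
  (element-sum* m))

;; User code.
(esum* [1 2 3 4])     ; => 10
(esum* [[1 2] [3 4]]) ; => 10
```

Supporting a new array type means adding another `extend-protocol` entry; the API function and user code stay unchanged.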
Most protocols are optional
PImplementation
PDimensionInfo
PIndexedAccess
PIndexedSetting
PMatrixEquality
PSummable
PRowOperations
PVectorCross
PCoercion
PTranspose
PVectorDistance
PMatrixMultiply
PAddProductMutable
PReshaping
PMathsFunctionsMutable
PMatrixRank
PArrayMetrics
PAddProduct
PVectorOps
PMatrixScaling
PMatrixOps
PMatrixPredicates
PSparseArray
…..

MANDATORY
• Required for a working core.matrix implementation

OPTIONAL
• Everything in the API will work without these
• core.matrix provides a “default implementation”
• Implement for improved performance
Default implementations
clojure.core.matrix.impl.default

;; Protocol name - from namespace clojure.core.matrix.protocols
(extend-protocol mp/PSummable
  ;; Implementation for any Number
  Number
  (element-sum [a] a)

  ;; Implementation for an arbitrary Object (assumed to be an array)
  Object
  (element-sum [a]
    (mp/element-reduce a +)))
Extending a protocol

(extend-protocol mp/PSummable
  ;; Class to implement protocol for, in this case a Java array: double[]
  (Class/forName "[D")
  (element-sum [m]
    ;; Add type hint to avoid reflection
    (let [^doubles m m]
      ;; Optimised code to add up all the elements of a double[] array
      (areduce m i res 0.0 (+ res (aget m i))))))
Speedup vs. default implementation
Timing for element sum of length 100 double array (ns)

(esum v) "Default"        3690
(reduce + v)              2859
(esum v) "Specialised"     201

15-20x benefit
Internal Implementations
Implementation: Key Features

:persistent-vector
• Support for Clojure vectors
• Immutable
• Not so fast, but great for quick testing

:double-array
• Treats Java double[] objects as 1D arrays
• Mutable – useful for accumulating results etc.

:sequence
• Treats Clojure sequences as arrays
• Mostly useful for interop / data loading

:ndarray, :ndarray-double, :ndarray-long, .....
• Google Summer of Code project by Dmitry Groshev
• Pure Clojure
• N-Dimensional arrays similar to NumPy
• Support arbitrary dimensions and data types

:scalar-wrapper, :slice-wrapper, :nd-wrapper
• Internal wrapper formats
• Used to provide efficient default implementations for various protocols
NDArray
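(deftype NDArrayDouble
  [^doubles data
   ^int     ndims
   ^ints    shape
   ^ints    strides
   ^int     offset])

[Figure: a 2 x 3 array (ndims = 2, shape = [2 3]) holding 0 to 5, stored inside a larger Java array "data"; offset, strides[0] and strides[1] are marked on the data array, and cells marked ? are not part of the array]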
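How offset and strides map an index to a position in the backing data can be sketched as follows. Everything here is an assumption for illustration: the map-based representation, the names `flat-index` and `nd-get`, and the concrete layout (offset 3, strides [5 1], with nil standing in for the unused ? cells).

```clojure
;; Position in the flat data for a given multi-dimensional index:
;; offset + sum of (index * stride) over all dimensions.
(defn flat-index [offset strides indices]
  (reduce + offset (map * strides indices)))

(defn nd-get [{:keys [data offset strides]} & indices]
  (nth data (flat-index offset strides indices)))

;; A 2 x 3 array embedded in a longer backing vector.
(def arr
  {:data    [nil nil nil 0 1 2 nil nil 3 4 5 nil]
   :shape   [2 3]
   :strides [5 1]
   :offset  3})

(nd-get arr 0 0) ; => 0
(nd-get arr 1 2) ; => 5

;; A transposed view shares the same data; only shape and strides are swapped.
(def arr-t (assoc arr :shape [3 2] :strides [1 5]))

(nd-get arr-t 2 1) ; => 5
```

Because views such as `arr-t` reuse the backing data, operations like transpose or slicing need no copying in this scheme.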
External Implementations
Implementation: Key Features

vectorz-clj
• Pure JVM (wraps the Java library Vectorz)
• Very fast, especially for vectors and small-medium matrices
• Most mature core.matrix implementation at present

Clatrix
• Uses native BLAS libraries by wrapping the jblas library
• Very fast, especially for large 2D matrices
• Used by Incanter

parallel-colt-matrix
• Wraps the Parallel Colt library from Java
• Support for multithreaded matrix computations

arrayspace
• Experimental
• Ideas around distributed matrix computation
• Builds on ideas from Blaze, Chapel, ZPL

image-matrix
• Treats a Java BufferedImage as a core.matrix array
• Because you can?
Switching implementations
(array (range 5))
=> [0 1 2 3 4]
;; switch implementations
(set-current-implementation :vectorz)

;; create array with current implementation
(array (range 5))
=> #<Vector [0.0,1.0,2.0,3.0,4.0]>
;; explicit implementation usage
(array :persistent-vector (range 5))
=> [0 1 2 3 4]
Mixing implementations
(def A (array :persistent-vector (range 5)))
=> [0 1 2 3 4]
(def B (array :vectorz (range 5)))
=> #<Vector [0.0,1.0,2.0,3.0,4.0]>
(* A B)
=> [0.0 1.0 4.0 9.0 16.0]
(* B A)
=> #<Vector [0.0,1.0,4.0,9.0,16.0]>
core.matrix implementations can be mixed
(but: behaviour depends on the first argument)
Future roadmap
• Version 1.0 release
• Data types: Complex numbers
• Expression compilation
• Domain specific extensions, e.g.:
  – symbolic computation (expresso)
  – stats
  – geometry
  – linear algebra
• Incanter integration
END
Incanter Integration

• A great environment for statistical computing, data science and visualisation in Clojure
• Uses the Clatrix matrix library – great performance
• Work in progress to support core.matrix fully for Incanter 2.0
Benchmarks: Clojure vs. Python
Domain specific extensions
Extension library    Focus
core.matrix.stats    Statistical functions
core.matrix.geom     2D and 3D Geometry
expresso             Manipulation of array expressions
Broadcasting Rules
1. Designed for elementwise operations
   – other uses must be explicit
2. Extends shape vector by adding new leading dimensions
   • original shape [4 5]
   • can broadcast to any shape [x y ... z 4 5]
   • scalars can broadcast to any shape
3. Fills the new array space by duplication of the original array over the new dimensions
4. Smart implementations can avoid making full copies by structural sharing or clever indexing tricks
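Rules 2 and 3 can be sketched in plain Clojure for nested vectors. The helpers `shape*`, `broadcast*` and `add*` are hypothetical, and the sketch assumes the trailing dimensions of the target shape already match the original shape (it performs no validation).

```clojure
;; Shape of a regular nested vector; nil for scalars.
(defn shape* [x]
  (when (vector? x)
    (loop [v x, dims []]
      (if (vector? v)
        (recur (first v) (conj dims (count v)))
        dims))))

;; Adds the missing leading dimensions by repeating the original value,
;; innermost new dimension first.
(defn broadcast* [a target-shape]
  (let [extra (- (count target-shape) (count (shape* a)))]
    (reduce (fn [acc n] (vec (repeat n acc)))
            a
            (reverse (take extra target-shape)))))

;; Elementwise addition of two arrays of identical shape.
(defn add* [a b]
  (if (vector? a) (mapv add* a b) (+ a b)))

(def M [[0 1 2]
        [3 4 5]
        [6 7 8]])

(broadcast* [2 1 0] [3 3])               ; => [[2 1 0] [2 1 0] [2 1 0]]
(add* M (broadcast* 1 (shape* M)))       ; => [[1 2 3] [4 5 6] [7 8 9]]
(add* M (broadcast* [2 1 0] (shape* M))) ; => [[2 2 2] [5 5 5] [8 8 8]]
```

Since `repeat` reuses the same immutable value, the broadcast rows here are shared rather than copied, a small instance of the structural sharing mentioned in rule 4.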
Vectorz
Enter The Matrix

  • 2. core.matrix Array programming as a language extension for Clojure (with a Numerical computing focus)
  • 3. Plug-in paradigms Paradigm Exemplar language Functional programming Clojure implementation Haskell clojure.core Meta-programming Lisp Logic programming Prolog core.logic Process algebras / CSP Go core.async Array programming APL core.matrix
  • 4. APL Venerable history • • Notation invented in 1957 by Ken Iverson Implemented at IBM around 1960-64 Has its own keyboard Interesting perspective on code readability life←{↑1 ⍵∨.∧3 4=+/,¯1 0 1∘.⊖¯1 0 1∘.⌽⊂⍵}
  • 5. Modern array programming Standalone environment for statistical programming / graphics Python library for array programming A new language (2012) based on array programming principles .... and many others
  • 6. Why Clojure for array programming? 1. Data Science 2. Platform 3. Philosophy
  • 7. Elements of core.matrix. Abstraction: N-dimensional arrays, what and why? API: what can you do with arrays? Implementation: how is everything implemented?
  • 8. Abstraction or: “What is the matrix?”
  • 9. Design wisdom: "It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures." (Alan Perlis) [Slide overlay replaces "data structure" with "abstraction"]
  • 10. What is an array? (Dimensions / Terminology) 1: Vector; 2: Matrix; 3: 3D Array (3rd order Tensor); ...; N: ND Array [example grids omitted]
  • 11. Multi-dimensional array properties. Dimensions are ordered and indexed (Dimension 0, Dimension 1). Dimension sizes together define the shape of the array (e.g. 3 x 3). Each of the array elements is a regular value. [Figure: 3 x 3 grid holding the values 0 to 8]
  • 12. Arrays = data about relationships. Set X (rows): :A :B :C. Set Y (columns): :R :S :T :U. Row :A = 0 1 2 3; row :B = 4 5 6 7; row :C = 8 9 10 11. Each element is a fact about a relationship between a value in Set X and a value in Set Y: (foo :A :T) => 2. ND array lookup is analogous to arity-N functions!
  • 13. Why arrays instead of functions? [[0 1 2] [3 4 5] [6 7 8]] vs. (fn [i j] (+ j (* 3 i))) 1. Precomputed values with O(1) access 2. Efficient computation with optimised bulk operations 3. Data driven representation
  • 14. Expressivity. Java: for (int i=0; i<n; i++) { for (int j=0; j<m; j++) { for (int k=0; k<p; k++) { result[i][j][k] = a[i][j][k] + b[i][j][k]; } } } Clojure: (mapv (fn [a b] (mapv (fn [a b] (mapv + a b)) a b)) a b) Clojure + core.matrix: (+ a b)
  • 15. Principle of array programming: generalise operations on regular (scalar) values to multi-dimensional data. (+ 1 2) => 3; (+ array array) => array [array figures omitted]
  • 16. API
  • 17. Equivalence to Clojure vectors. The 1D array 0 1 2 ↔ [0 1 2]; the 2D array with rows 0 1 2 / 3 4 5 / 6 7 8 ↔ [[0 1 2] [3 4 5] [6 7 8]]. Nested Clojure vectors of regular shape are arrays!
  • 18. Array creation ;; Build an array from a sequence (array (range 5)) => [0 1 2 3 4] ;; ... or from nested arrays/sequences (array (for [i (range 3)] (for [j (range 3)] (str i j)))) => [["00" "01" "02"] ["10" "11" "12"] ["20" "21" "22"]]
  • 19. Shape ;; Shape of a 3 x 2 matrix (shape [[1 2] [3 4] [5 6]]) => [3 2] ;; Regular values have no shape (shape 10.0) => nil
  • 20. Dimensionality ;; Dimensionality = number of dimensions ;; = length of shape vector ;; = nesting level (dimensionality [[1 2] [3 4] [5 6]]) => 2 (dimensionality [1 2 3 4 5]) => 1 ;; Regular values have zero dimensionality (dimensionality "Foo") => 0
  • 21. Scalars vs. arrays (array? [[1 2] [3 4]]) => true (array? 12.3) => false (scalar? [1 2 3]) => false (scalar? "foo") => true Everything is either an array or a scalar. A scalar works like a 0-dimensional array
  • 22. Indexed element access (Dimension 0 indexes rows, Dimension 1 indexes columns) (def M [[0 1 2] [3 4 5] [6 7 8]]) (mget M 1 2) => 5
  • 23. Slicing access (def M [[0 1 2] [3 4 5] [6 7 8]]) (slice M 1) => [3 4 5] A slice of an array is itself an array!
  • 24. Arrays as a composition of slices (def M [[0 1 2] [3 4 5] [6 7 8]]) (slices M) => ([0 1 2] [3 4 5] [6 7 8]) (apply + (slices M)) => [9 12 15]
  • 25. Operators (use 'clojure.core.matrix.operators) (+ [1 2 3] [4 5 6]) => [5 7 9] (* [1 2 3] [0 2 -1]) => [0 4 -3] (- [1 2] [3 4 5 6]) => RuntimeException Incompatible shapes (/ [1 2 3] 10.0) => [0.1 0.2 0.3]
  • 26. Broadcasting scalars (+ [[0 1 2] [3 4 5] [6 7 8]] 1) = ? “Broadcasting” 1 to [[1 1 1] [1 1 1] [1 1 1]]: (+ [[0 1 2] [3 4 5] [6 7 8]] [[1 1 1] [1 1 1] [1 1 1]]) = [[1 2 3] [4 5 6] [7 8 9]]
  • 27. Broadcasting arrays (+ [[0 1 2] [3 4 5] [6 7 8]] [2 1 0]) = ? “Broadcasting” [2 1 0] to [[2 1 0] [2 1 0] [2 1 0]]: (+ [[0 1 2] [3 4 5] [6 7 8]] [[2 1 0] [2 1 0] [2 1 0]]) = [[2 2 2] [5 5 5] [8 8 8]]
  • 28. Functional operations on sequences. map: (map inc [1 2 3 4]) => (2 3 4 5) reduce: (reduce * [1 2 3 4]) => 24 seq: (seq [1 2 3 4]) => (1 2 3 4)
  • 29. Functional operations on arrays. map ↔ emap (“element map”): (emap inc [[1 2] [3 4]]) => [[2 3] [4 5]] reduce ↔ ereduce (“element reduce”): (ereduce * [[1 2] [3 4]]) => 24 seq ↔ eseq (“element seq”): (eseq [[1 2] [3 4]]) => (1 2 3 4)
  • 30. Specialised matrix constructors: (zero-matrix 4 3), (identity-matrix 4), (permutation-matrix [3 1 0 2]) [result grids omitted]
  • 32. Matrix multiplication (mmul [[9 2 7] [6 4 8]] [[2 8] [3 4] [5 9]]) => [[59 143] [64 136]]
  • 33. Geometry (def π 3.141592653589793) (def τ (* 2.0 π)) (defn rot [turns] (let [a (* τ turns)] [[(cos a) (sin a)] [(- (sin a)) (cos a)]])) (mmul (rot 1/8) [3 4]) => [4.9497 0.7071] 45° = 1/8 turn. NB: See Tau Manifesto (http://tauday.com/) regarding the use of Tau (τ)
  • 34. Demo
  • 36. Mutability – the tradeoffs. Pros: ✓ Faster; ✓ Reduces GC pressure; ✓ Standard in many existing matrix libraries. Cons: ✘ Mutability is evil; ✘ Harder to maintain / debug; ✘ Hard to write concurrent code; ✘ Not idiomatic in Clojure; ✘ Not supported by all core.matrix implementations; ✘ “Place Oriented Programming”. Avoid mutability. But it’s an option if you really need it.
  • 37. Mutability – performance benefit. Time for addition of vectors* (ns): immutable add = 120; mutable add! = 28 (4x performance benefit). * Length 10 double vectors, using :vectorz implementation
  • 38. Mutability – syntax (add [1 2] 1) => [2 3] (add! [1 2] 1) => RuntimeException ...... not mutable! ;; coerce to a mutable format (def a (mutable [1 2])) => #<Vector2 [1.0,2.0]> (add! a 1) => #<Vector2 [2.0,3.0]> A core.matrix function name ending with “!” performs mutation (usually on the first argument only)
  • 41.
  • 42. Lots of trade-offs: Native Libraries vs. Pure JVM; Mutability vs. Immutability; Specialized elements (e.g. doubles) vs. Generalised elements (Object, Complex); Multi-dimensional vs. 2D matrices only; Memory efficiency vs. Runtime efficiency; Concrete types vs. Abstraction (interfaces / wrappers); Specified storage format vs. Multiple / arbitrary storage formats; License A vs. License B; Lightweight (zero-copy) views vs. Heavyweight copying / cloning
  • 43. What’s the best data structure? Length 50 “range” vector: 0 1 2 3 .. 49. 1. Clojure Vector: [0 1 2 …. 49] 2. Java double[] array: new double[] {0, 1, 2, …. 49}; 3. Custom deftype: (deftype RangeVector [^long start ^long end]) 4. Native vector format: (org.jblas.DoubleMatrix. params)
  • 44. There is no spoon
  • 46. Clojure Protocols clojure.core.matrix.protocols (defprotocol PSummable "Protocol to support the summing of all elements in an array. The array must hold numeric values only, or an exception will be thrown." (element-sum [m])) 1. Abstract Interface 2. Open Extension 3. Fast dispatch
  • 47. Protocols are fast and open. Function call costs (ns) and whether open extension is supported: Static / inlined code = 1.2 (✘); Primitive function call = 1.9 (✘); Boxed function call = 7.9 (✘); Protocol call = 13.8 (✓); Multimethod* = 89 (✓). * Using class of first argument as dispatch function
  • 48. Typical core.matrix call path. User code: (esum [1 2 3 4]) core.matrix API (matrix.clj): (defn esum "Calculates the sum of all the elements in a numerical array." [m] (mp/element-sum m)) Impl. code: (extend-protocol mp/PSummable SomeImplementationClass (element-sum [a] ………))
  • 49. Most protocols are optional. Protocols: PImplementation PDimensionInfo PIndexedAccess PIndexedSetting PMatrixEquality PSummable PRowOperations PVectorCross PCoercion PTranspose PVectorDistance PMatrixMultiply PAddProductMutable PReshaping PMathsFunctionsMutable PMatrixRank PArrayMetrics PAddProduct PVectorOps PMatrixScaling PMatrixOps PMatrixPredicates PSparseArray ….. MANDATORY: required for a working core.matrix implementation. OPTIONAL: everything in the API will work without these; core.matrix provides a “default implementation”; implement for improved performance
  • 50. Default implementations, in clojure.core.matrix.impl.default: (extend-protocol mp/PSummable Number (element-sum [a] a) Object (element-sum [a] (mp/element-reduce a +))) Annotations: mp/PSummable is the protocol name, from namespace clojure.core.matrix.protocols; the Number clause is the implementation for any Number; the Object clause is the implementation for an arbitrary Object (assumed to be an array)
  • 51. Extending a protocol (extend-protocol mp/PSummable (Class/forName "[D") (element-sum [m] (let [^doubles m m] (areduce m i res 0.0 (+ res (aget m i)))))) Annotations: (Class/forName "[D") is the class to implement the protocol for, in this case a Java array: double[]; ^doubles adds a type hint to avoid reflection; the areduce form is optimised code to add up all the elements of a double[] array
  • 52. Speedup vs. default implementation. Timing for element sum of length 100 double array (ns): (esum v) "Default" = 3690; (reduce + v) = 2859; (esum v) "Specialised" = 201 (15-20x benefit)
  • 53. Internal Implementations (Implementation: Key Features). :persistent-vector: support for Clojure vectors; immutable; not so fast, but great for quick testing. :double-array: treats Java double[] objects as 1D arrays; mutable, useful for accumulating results etc. :sequence: treats Clojure sequences as arrays; mostly useful for interop / data loading. :ndarray, :ndarray-double, :ndarray-long, .....: Google Summer of Code project by Dmitry Groshev; pure Clojure; N-dimensional arrays similar to NumPy; support arbitrary dimensions and data types. :scalar-wrapper, :slice-wrapper, :nd-wrapper: internal wrapper formats; used to provide efficient default implementations for various protocols
  • 55. External Implementations (Implementation: Key Features). vectorz-clj: pure JVM (wraps Java library Vectorz); very fast, especially for vectors and small-medium matrices; most mature core.matrix implementation at present. Clatrix: uses native BLAS libraries by wrapping the jblas library; very fast, especially for large 2D matrices; used by Incanter. parallel-colt-matrix: wraps the Parallel Colt library from Java; support for multithreaded matrix computations. arrayspace: experimental; ideas around distributed matrix computation; builds on ideas from Blaze, Chapel, ZPL. image-matrix: treats a Java BufferedImage as a core.matrix array; because you can?
  • 56. Switching implementations (array (range 5)) => [0 1 2 3 4] ;; switch implementations (set-current-implementation :vectorz) ;; create array with current implementation (array (range 5)) => #<Vector [0.0,1.0,2.0,3.0,4.0]> ;; explicit implementation usage (array :persistent-vector (range 5)) => [0 1 2 3 4]
  • 57. Mixing implementations (def A (array :persistent-vector (range 5))) => [0 1 2 3 4] (def B (array :vectorz (range 5))) => #<Vector [0.0,1.0,2.0,3.0,4.0]> (* A B) => [0.0 1.0 4.0 9.0 16.0] (* B A) => #<Vector [0.0,1.0,4.0,9.0,16.0]> core.matrix implementations can be mixed (but: behaviour depends on the first argument)
  • 58. Future roadmap: Version 1.0 release; Data types: Complex numbers; Expression compilation; Domain specific extensions, e.g. symbolic computation (expresso), stats, geometry, linear algebra; Incanter integration
  • 59. END
  • 60. Incanter Integration: a great environment for statistical computing, data science and visualisation in Clojure; uses the Clatrix matrix library, great performance; work in progress to support core.matrix fully for Incanter 2.0
  • 62. Domain specific extensions (Extension library: Focus). core.matrix.stats: statistical functions; core.matrix.geom: 2D and 3D geometry; expresso: manipulation of array expressions
  • 63. Broadcasting Rules 1. Designed for elementwise operations - other uses must be explicit 2. Extends shape vector by adding new leading dimensions • original shape [4 5] • can broadcast to any shape [x y ... z 4 5] • scalars can broadcast to any shape 3. Fills the new array space by duplication of the original array over the new dimensions 4. Smart implementations can avoid making full copies by structural sharing or clever indexing tricks
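
The broadcasting rules on slide 63 can be illustrated with a small sketch in plain Clojure on nested vectors. This is not how core.matrix implements broadcasting: the names shape-of, broadcast-to, eadd and add-broadcast are hypothetical, only the second argument is broadcast, and full copies are made (so rule 4 is ignored).

```clojure
;; Illustrative sketch only: plain Clojure, no core.matrix dependency.
;; All function names here are hypothetical, chosen for this example.

(defn shape-of
  "Shape of a regular nested vector; a scalar has the empty shape []."
  [a]
  (if (vector? a)
    (into [(count a)] (shape-of (first a)))
    []))

(defn broadcast-to
  "Broadcasts a to target-shape by adding new leading dimensions (rule 2)
  and duplicating a over those dimensions (rule 3)."
  [a target-shape]
  (let [s     (shape-of a)
        extra (- (count target-shape) (count s))]
    ;; The original shape must match the trailing part of the target shape.
    (when (or (neg? extra)
              (not= s (vec (drop extra target-shape))))
      (throw (ex-info "Incompatible shapes" {:from s :to target-shape})))
    ;; Wrap a in one new leading dimension at a time, innermost first.
    (reduce (fn [acc n] (vec (repeat n acc)))
            a
            (reverse (take extra target-shape)))))

(defn eadd
  "Elementwise addition of two arrays that already have the same shape."
  [a b]
  (if (vector? a)
    (mapv eadd a b)
    (+ a b)))

(defn add-broadcast
  "Adds b to a, first broadcasting b to the shape of a."
  [a b]
  (eadd a (broadcast-to b (shape-of a))))

(add-broadcast [[0 1 2] [3 4 5] [6 7 8]] 1)
;; => [[1 2 3] [4 5 6] [7 8 9]]

(add-broadcast [[0 1 2] [3 4 5] [6 7 8]] [2 1 0])
;; => [[2 2 2] [5 5 5] [8 8 8]]
```

The two calls reproduce the results shown on slides 26 and 27. A shape such as [3 4] cannot be broadcast to [3 3] here, because only leading dimensions may be added.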

Editor's Notes

  1. Today I’m going to be talking about core.matrix, and it’s quite appropriate that I’m talking about it here today at the ClojureConj, because this project actually came about as a direct result of conversations I had with many people at last year’s Conj. The focus of those discussions was very much about how we could make numerical computing better in Clojure. And the solution I’ve been working on over the past year, along with a number of collaborators, is core.matrix, which offers array programming as a language extension to Clojure.
  2. When I say language extension, it is of course in the sense that Clojure seems to have this ability to absorb new paradigms just by plugging in new libraries. Clojure already stole many good pure functional programming techniques from languages like Haskell. And of course we have the macro meta-programming capabilities from Lisp. More recently we’ve got core.logic bringing in logic programming, inspired by Prolog and miniKanren, and core.async bringing in Communicating Sequential Processes with some syntax similar to Go. And core.matrix is designed very much in the same way, to provide array programming capabilities. And if we want to trace the roots of array programming, we can go all the way back to this language called APL.
  3. About the same age as Lisp? First specified in 1958. Love the fact that it has its own keyboard, with all these symbols inspired by mathematical notation. And you get some crazy code. Might seem like a bit of a dinosaur now.
  4. Array programming has had quite a renaissance in recent years. This is because of the increasing importance of data science and numerical computing in many fields. So we’ve seen languages like R that provide an environment for statistical computing. Highlight value of paradigm: clearly a demand for these kinds of numerical computing capabilities.
  5. Why bring array programming to Clojure? 1. Data science focus: lots of interest in doing data crunching work in Clojure. 2. Provides a powerful platform: why should you have to introduce a whole new stack to get access to the array programming paradigm? You shouldn’t have to give up the advantages of a good general purpose language to do data science. Clojure is already a great platform to build on: the JVM platform has lots of advantages. 3. Clojure is compelling for many philosophical reasons: concurrency, immutability, state, a focus on data. Array programming seems to be a good fit for this philosophy.
  6. So today I’m going to talk about core.matrix through three different lenses. First I want to talk about the abstraction: what are these arrays? Then I’m going to talk about the core.matrix API. Finally, the implementation: how does this all work, and some of the engineering choices we’ve made.
  7. Start off with one of my favourite quotes, because it contains a pretty important insight. “It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures.” There is of course one error here….. (click) We should of course be talking about an abstraction here, not a concrete data structure. A great example of this is the sequence abstraction in Clojure: there are literally hundreds of functions that operate on Clojure sequences. Because so many functions produce and consume sequences, it gives you many different ways to compose them together. And it’s more than just the clojure.core API: other code can build on the same abstraction, which means that the composability extends to any code you write that uses the same abstraction. It makes entire libraries composable. In some ways I think the key to building systems using simple, composable components is about having shared abstractions. We’ve taken this principle very much to heart in core.matrix. Our abstraction of course is the array, more specifically the multi-dimensional array. And the rest of core.matrix is really all about giving you a powerful set of composable operations you can do with arrays.
  8. Overloaded terminology! Vector: a 1D array (in the maths / array programming sense), but also a Clojure vector. Matrix: conventionally used to indicate a 2 dimensional numerical array. Array: in the sense of the N-dimensional array, but also the specific concrete example of a Java array. Dimensions: also overloaded! Here used in the sense of the number of dimensions in an array, but it’s also used to refer to the number of dimensions in a vector space, e.g. 3 dimensional Euclidean space. If we’re lucky it should be clear from the context what we’re talking about.
  9. To give you an idea of how general array programming can be: an array is a way of representing a function using data. Instead of computing a value for each combination of inputs, we’re typically pre-computing all such values.
  11. Example of adding a 3D array. In Java it’s just a big nested loop… In Clojure you can do it with nested maps, which is a bit more of a functional style, but you’ve still got this three-level nesting. With core.matrix it’s really simple. We just generalise + to arbitrary multi-dimensional arrays and it all just works. Does conciseness matter? Well, if you’re writing a lot of code manipulating arrays it’s going to save you quite a bit of time, but more importantly it makes it much easier to avoid errors. It’s very easy to get off-by-one errors in this kind of code. core.matrix gives you a nice DSL that does all the index juggling for you. It also helps you to be mentally much closer to the problem that you are modelling. You ideally want an API that reflects the way that you think about the problem you are solving.
  12. So let’s talk about the core.matrix API. This isn’t going to be an exhaustive tour, but I’m going to highlight a few of the key features to give you a taste of what is possible.
  13. One of the important API design objectives was to exploit the “natural equivalence of arrays to nested Clojure vectors”. A 1D array is a Clojure vector; a 2D array is like a vector of vectors. Most things in the core.matrix API work with nested Clojure vectors. This is nice: it gives a natural syntax, and it’s great for dynamic, exploratory work at the REPL.
  14. The most fundamental attribute of an array is probably the shape
  16. Arrays are compositions of arrays! This is one of the best signs that you have a good abstraction: the abstraction can be recursively defined as a composition of the same abstraction.
  17. So of course we have quite a few different functions that let you work with slices of arrays. The most useful is probably the slices function, which cuts an array into a sequence of its slices. It’s pretty common to want to do this: imagine that each slice is a row in your data set.
  18. We define array versions of the common mathematical operators. These use the same names as clojure.core. You have to use the clojure.core.matrix.operators namespace if you want to use these names instead of the standard clojure.core operators.
  19. Question: what should happen if we add a scalar number to an array? We have a feature called broadcasting, which allows a lower dimensional array to be treated as a higher dimensional array.
  20. The idea of broadcasting also generalises to arrays! Here the semantics are the same: we just duplicate the smaller array to fill out the shape of the larger array.
  21. So let’s talk about some higher order functions. Two of my favourite Clojure functions, map and reduce, are extremely useful higher order functions.
  22. One of the interesting observations about array programming is that you can also see it as a generalisation of sequences to multiple dimensions, so it probably isn’t too surprising that many of the sequence functions in Clojure actually have a nice array programming equivalent. emap is the equivalent of map: it maps a function over all elements of an array. The key difference is that it preserves the structure of the array, so here we’re mapping over a 2x2 matrix and therefore we get a 2x2 result. ereduce is the equivalent of reduce over all elements. eseq is a handy bridge between core.matrix arrays and regular Clojure sequences: it just returns all the elements of an array in order. Note the row-major ordering of eseq and ereduce.
  23. Basically mutability is horrible. You should be avoiding it as much as you can. But it turns out that it is needed in some cases: performance matters for numerical work. Mutability is OK for library implementers, e.g. accumulation of a result in a temporary array. Once a value is constructed, it shouldn’t be mutated any more.
  24. Usually a 4x performance benefit isn’t a big deal, unless it happens to be your bottleneck. There are cases where it might be important: e.g. if you are crunching through a lot of data and need to add to some sort of accumulator…
  25. Mutability is OK for library implementers, e.g. accumulation of a result in a temporary array. Once a value is constructed, it shouldn’t be mutated any more.
  26. Clearly this is insane – why so many matrix libraries?
  27. This explains the problem. But doesn’t really help us….
  28. The point is – there isn’t ever going to be a perfect right answer when choosing a concrete data type to implement an abstraction. There are always going to be inherent advantages of different approaches
  29. Luckily we have a secret weapon, and I think this is actually what really distinguishes core.matrix from all other array programming systems
  30. Of course the secret weapon is Clojure protocols. Here’s an example: the PSummable protocol is a very simple protocol that allows you to compute the sum of all values in an array. Three things are important to know about protocols. First, they define an abstract interface, which is exactly what we need to define operations that work on our array abstraction. Secondly, they feature open extension, which means that we can solve the expression problem and use protocols with arbitrary types. Importantly, this includes types that weren’t written with the protocol in mind, e.g. arbitrary Java classes. The third feature is really fast dispatch, which is important if we want core.matrix to be useful in high performance situations.
  31. Protocols are really the “sweet spot” of being both fast and open. We benchmarked a pretty wide variety of different function calls.
  32. It’s easy to make a working core.matrix implementation! It’s more work if you want to make it perform across the whole API. But that’s OK because it can be done incrementally. So hopefully this provides a smooth development path for core.matrix implementations to integrate.
  33. The secret is having default implementations for all protocols, which get used if you haven’t extended the protocol for your particular type. Note that the default implementation delegates to another protocol call. This is generally the case: ultimately all these protocol calls have to be implemented in terms of the lower-level mandatory protocols if we want them to work on any array.
  34. Value of a specialised implementation
  35. Makes some operations very efficient. For example, if you want to transpose an NDArray, you just need to reverse the shape and reverse the strides.
  36. vectorz-clj: probably the best choice if you want general purpose double numerics. clatrix: probably the best choice if you want linear algebra with big matrices.
  37. Not only can you switch implementations: you can also mix them! This is actually quite a unique capability. How do we do this? We provide generic coercion functionality, so implementations typically use this to coerce the second argument to the type of the first.
  38. So we have some rules for broadcasting. Note that it only really makes sense for elementwise operations. You can broadcast arrays explicitly if you want to, but it only happens automatically for elementwise operations at present. You can only add leading dimensions.
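
The protocol mechanism described in notes 30 to 33 (an abstract interface, a default implementation, and an optional specialised implementation) can be sketched in plain Clojure. This is a minimal illustration, assuming a locally defined PSummable rather than the real clojure.core.matrix.protocols namespace; the recursive default below is an assumption made to keep the example self-contained, whereas the slides show the real default delegating to mp/element-reduce.

```clojure
;; Illustrative sketch only: a local stand-in for the PSummable protocol,
;; not the definition from clojure.core.matrix.protocols.
(defprotocol PSummable
  (element-sum [m] "Returns the sum of all elements of m."))

;; Default implementations: a Number is its own sum; any other Object is
;; assumed to be a seqable (possibly nested) collection of numbers.
(extend-protocol PSummable
  Number
  (element-sum [a] a)
  Object
  (element-sum [a] (reduce + (map element-sum a))))

;; Specialised implementation for Java double[] arrays, following slide 51.
;; The class expression has to be the first type named in this form.
(extend-protocol PSummable
  (Class/forName "[D")
  (element-sum [m]
    (let [^doubles m m]                        ; type hint avoids reflection
      (areduce m i res 0.0 (+ res (aget m i))))))

;; API function: user code calls esum, which dispatches through the protocol.
(defn esum
  "Calculates the sum of all the elements in a numerical array."
  [m]
  (element-sum m))

(esum [[1 2] [3 4]])              ;; => 10   (default Object implementation)
(esum (double-array [1 2 3 4]))   ;; => 10.0 (specialised double[] implementation)
```

Removing the second extend-protocol form would still leave esum working on double[] values through the Object default, only more slowly; that is the incremental path note 32 describes.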