Advanced Functional Programming in Scala

Patrick Nicolas
Oct 2013
Rev. July 2015
patricknicolas.blogspot.com
www.slideshare.net/pnicolas

This is an overview of some interesting advanced
features of Scala. It is not meant to be a tutorial and
assumes that you are familiar with the key constructs
of the language.
Some of the examples are extracted from Scala for
Machine Learning (Packt Publishing).

Scala has a lot of features ...
Actors
Composed futures
F-bound
Reactive
Advanced functional programming?
... among them

Higher kind projection
Contravariant functors
Monadic composition
Streams
Views
Type classes
Stacked mixins models
Cake pattern
Magnet pattern
View bounds
F-bound polymorphism
Dataflow back pressure
Continuation passing style

Functors and monads are defined as single-type higher kinds:
M[_]. The problem is to define monadic composition for
objects that belong to categories with two or more types,
M[_, _] (e.g. Function1[U, V]).
Scala supports functorial and monadic operations for
multi-type categories using higher-kind type projection.

Let us consider a covariant functor F that applies a morphism f
within a category C defined as
$\forall a, b \in C, \quad f: a \to b$
$F(a \to b) = F(a) \to F(b)$
The definition of a functor in Scala relies on a single-type
higher kind M.
(*) Functors are important concepts in algebraic topology, used
for example in defining an algebra for tensors.

How can we define a functor for classes that have multiple
parameterized types?
Let’s consider the definition of a tensor using Scala Function1.
The covariant CoVector (resp. contravariant Vector) vectors are
created through a projection onto the covariant (resp.
contravariant) parameterized type T of Function1.

The implementation of the functor for the Vector type uses the
projection of the higher kind Function1 onto its covariant
component, obtained by accessing the inner type Vector of Tensor
with the # operator.
The map applies covariant composition, compose of Function1.
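The projection can be sketched as follows. This is a minimal illustration, not the code from the slide: the body of Tensor, the value vectorFunctor and the sample functions are assumptions made here; only the names Functor, Tensor, Vector and CoVector come from the text.

```scala
object CovariantFunctorSketch {
  // A functor over a single-type higher kind M[_]
  trait Functor[M[_]] {
    def map[U, V](m: M[U])(f: U => V): M[V]
  }

  // Hypothetical Tensor: fixes one type argument of Function1 and exposes
  // the remaining one through an inner, single-parameter type
  trait Tensor[T] {
    type Vector[X] = Function1[T, X]   // X is the covariant (output) type
    type CoVector[X] = Function1[X, T] // X is the contravariant (input) type
  }

  // Tensor[Double]#Vector projects Function1[Double, _] onto a kind M[_]
  val vectorFunctor: Functor[Tensor[Double]#Vector] =
    new Functor[Tensor[Double]#Vector] {
      // Covariant composition: apply m first, then the morphism f
      def map[U, V](m: Double => U)(f: U => V): Double => V = f compose m
    }

  val half: Double => Double = _ / 2.0
  // x => (x / 2) + 1
  val halfPlusOne: Double => Double =
    vectorFunctor.map[Double, Double](half)(_ + 1.0)
}
```

The type arguments of map are written explicitly to keep the sketch independent of how well the compiler infers them through the type projection.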

Some categories of objects, such as covariant tensors or
functions parameterized on the input or contravariant
type (i.e. T => Function1[T, U] for a given type U),
require that the order of morphisms be reversed.
Morphisms on a contravariant argument type are transported
through contravariant functors.

Let us consider a contravariant functor F that applies a
morphism f within a category C defined as
$\forall a, b \in C, \quad f: a \to b$
$F(a \to b) = F(b) \to F(a)$
The definition of a contravariant functor in Scala relies on a
single-type higher kind M.

The implementation of the contravariant functor for the CoVector
type uses the projection of the higher kind Function1 onto its
contravariant (input) component, obtained by accessing the inner
type CoVector of Tensor with the # operator.
The map applies contravariant composition, andThen of Function1.
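A minimal sketch of such a contravariant functor is shown below. The trait name CoFunctor, the body of Tensor and the sample functions are assumptions made for illustration; the slide's own code is not part of this transcript.

```scala
object ContravariantFunctorSketch {
  // Contravariant functor: the morphism f goes from V to U, i.e. in the
  // opposite direction of the transformation M[U] => M[V]
  trait CoFunctor[M[_]] {
    def map[U, V](m: M[U])(f: V => U): M[V]
  }

  // Hypothetical Tensor exposing the input type of Function1
  trait Tensor[T] {
    type CoVector[X] = Function1[X, T]
  }

  val coVectorFunctor: CoFunctor[Tensor[Double]#CoVector] =
    new CoFunctor[Tensor[Double]#CoVector] {
      // The morphism f is applied first, then the covector m
      def map[U, V](m: U => Double)(f: V => U): V => Double = f andThen m
    }

  // A covector on sequences: the Euclidean norm
  val norm: Seq[Double] => Double = xs => math.sqrt(xs.map(x => x * x).sum)

  // The same covector transported onto pairs by pre-composition
  val normOfPair: ((Double, Double)) => Double =
    coVectorFunctor.map[Seq[Double], (Double, Double)](norm)(p => Seq(p._1, p._2))
}
```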

Monadic composition
It is quite common to compose functions, methods or data
transformations, iteratively or recursively.
Monads extend the concept of a functor to support the
composition (or chaining) of computations.
Monads are abstract structures in algebraic topology related to
category theory.
A category C is a structure which has
● objects {a, b, c...}
● morphisms or maps on objects f: a->b
● composition of morphisms
f: a->b, g: b->c => g o f: a->c
Monads enable the “monadic” composition or chaining of
functions or computations on a single type argument.

Let’s consider the definition of a kernel function Kf as the composition
of 2 functions g o h.
$\mathcal{K}_f(\mathbf{x}, \mathbf{y}) = g\left(\sum_i h(x_i, y_i)\right)$
We create a monad to generate any kind of kernel function Kf by
composing their components g: g1 o g2 o … o gn o h

A monad extends a functor with a binding method (flatMap).
The monadic definition of the kernel function component h
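The relation between a functor and a monad can be sketched as below. The container KernelComponent and the value kernelMonad are hypothetical names introduced for this illustration; the slide's actual definition is not available in the transcript.

```scala
object MonadSketch {
  type F2 = (Double, Double) => Double

  trait Functor[M[_]] {
    def map[U, V](m: M[U])(f: U => V): M[V]
  }

  // A monad adds unit and the binding method flatMap to a functor
  trait Monad[M[_]] extends Functor[M] {
    def unit[T](t: T): M[T]
    def flatMap[U, V](m: M[U])(f: U => M[V]): M[V]
    // map is recovered from flatMap and unit
    def map[U, V](m: M[U])(f: U => V): M[V] =
      flatMap[U, V](m)(u => unit[V](f(u)))
  }

  // Hypothetical container for a kernel function component
  final case class KernelComponent[T](h: T)

  val kernelMonad: Monad[KernelComponent] = new Monad[KernelComponent] {
    def unit[T](t: T): KernelComponent[T] = KernelComponent(t)
    def flatMap[U, V](m: KernelComponent[U])(f: U => KernelComponent[V]): KernelComponent[V] =
      f(m.h)
  }

  val difference: F2 = (x, y) => x - y
  // Transform the component h into its square through the functor map
  val squared: KernelComponent[F2] =
    kernelMonad.map[F2, F2](KernelComponent(difference)) { h =>
      (x: Double, y: Double) => { val d = h(x, y); d * d }
    }
}
```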

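Examples of kernel functions
Radial basis function kernel
$\mathcal{K}(\mathbf{x}, \mathbf{y}) = e^{-\frac{1}{2}\left(\frac{\|\mathbf{x}-\mathbf{y}\|}{\sigma}\right)^2}$
$h: (x, y) \to x - y \qquad g: x \to e^{-\frac{1}{2\sigma^2} x^2}$
Polynomial kernel
$\mathcal{K}(\mathbf{x}, \mathbf{y}) = (1 + \mathbf{x} \cdot \mathbf{y})^d$
$h: (x, y) \to x \cdot y \qquad g: x \to (1 + x)^d$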

The monadic composition consists of chaining the flatMap invocations
on the functor, map, that preserves morphisms on kernel functions.
The for-comprehension closure is syntactic sugar for the iterative
monadic composition.
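As a rough sketch of this composition, the class KF and the factories rbf and polynomial below are assumptions introduced for illustration; they follow the decomposition of a kernel into an outer component g and an inner component h, but are not the code shown on the slide.

```scala
object KernelComposition {
  type F1 = Double => Double
  type F2 = (Double, Double) => Double

  // Hypothetical kernel container: g is the outer component, h the inner one
  final case class KF[G](g: G, h: F2) {
    def map[U](f: G => U): KF[U] = KF(f(g), h)
    def flatMap[U](f: G => KF[U]): KF[U] = f(g)
    // K(x, y) = g(sum_i h(x_i, y_i)), available once G is a function F1
    def metric(x: Seq[Double], y: Seq[Double])(implicit ev: G <:< F1): Double =
      ev(g)(x.zip(y).map { case (a, b) => h(a, b) }.sum)
  }

  def rbf(sigma: Double): KF[F1] =
    KF[F1]((x: Double) => math.exp(-0.5 * x * x / (sigma * sigma)),
           (x: Double, y: Double) => x - y)

  def polynomial(d: Int): KF[F1] =
    KF[F1]((x: Double) => math.pow(1.0 + x, d),
           (x: Double, y: Double) => x * y)

  // Desugars to rbf(1.0).flatMap(g1 => polynomial(2).map(g2 => g1 compose g2))
  val composed: KF[F1] = for {
    g1 <- rbf(1.0)
    g2 <- polynomial(2)
  } yield g1 compose g2
}
```

With this particular flatMap, the composed kernel keeps the inner component h of the last generator (here the polynomial one) and evaluates g1(g2(sum of h)).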

Streams
Streams reduce memory consumption by allocating and
releasing chunks of data (slices of a time series) while
allowing reuse of intermediate results.
Some problems lend themselves to processing very large data
sets of unknown size, for which the execution may have to be
aborted or re-applied.

The large data set is converted into a stream, then broken
down into manageable slices. The slices are instantiated,
processed (e.g. by a loss function) and released back to the
garbage collector, one at a time.
[Figure: the data stream X0, X1, ..., Xn, ..., Xm is traversed one slice Xi at a time; each slice is allocated on the heap with .take, used to evaluate the loss function, then released to the garbage collector with .drop.]
Loss function: $\frac{1}{2m} \sum_{n} \left(y_n - f(\mathbf{w}|x_n)\right)^2 + \lambda \|\mathbf{w}\|^2$

Views and streams
Slices of NOBS observations are allocated one at a time (take),
processed, then released (drop).
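The take/drop traversal can be sketched as follows. This assumes Scala 2.13 or later, where LazyList replaces Stream; the one-weight linear model f(w|x) = w * x, the object StreamLoss and its parameter names are assumptions chosen to keep the example small.

```scala
import scala.annotation.tailrec

object StreamLoss {
  type Obs = (Double, Double) // an observation (x, y)

  // Loss = 1/(2m) * sum_n (y_n - w * x_n)^2 + lambda * w^2
  def loss(data: LazyList[Obs], w: Double, lambda: Double, sliceSize: Int): Double = {
    @tailrec
    def traverse(rest: LazyList[Obs], sum: Double, m: Int): (Double, Int) =
      if (rest.isEmpty) (sum, m)
      else {
        // Allocate one slice of at most sliceSize observations
        val slice = rest.take(sliceSize).toList
        val partial = slice.map { case (x, y) => val e = y - w * x; e * e }.sum
        // Move past the slice so that it is no longer needed by the traversal
        traverse(rest.drop(sliceSize), sum + partial, m + slice.size)
      }

    val (sum, m) = traverse(data, 0.0, 0)
    val penalty = lambda * w * w
    if (m == 0) penalty else sum / (2.0 * m) + penalty
  }
}
```

Whether processed slices are actually reclaimed depends on nothing else holding a reference to the head of the stream.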

The reference streamRef has to be weak, in order to have the slices
garbage collected. Otherwise the memory consumption increases
with each new batch of data.
(*) Alternatives: define strmRef as a def or use StreamIterator
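The first alternative, exposing the stream through a def, might look like the sketch below. It assumes Scala 2.13 or later (LazyList instead of Stream); the object StreamAsDef and its members are hypothetical names, and the memory behavior is the expected one rather than a measured result.

```scala
import scala.annotation.tailrec

object StreamAsDef {
  // A def builds a new stream on each call, so no field keeps a strong
  // reference to the head of the stream while it is traversed
  def observations: LazyList[Int] = LazyList.from(0).take(100000)

  def sumBySlice(sliceSize: Int): Long = {
    @tailrec
    def loop(rest: LazyList[Int], acc: Long): Long =
      if (rest.isEmpty) acc
      else loop(rest.drop(sliceSize), acc + rest.take(sliceSize).foldLeft(0L)(_ + _))

    // The stream is passed directly to the loop and never stored in a val
    loop(observations, 0L)
  }
}
```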

[Figure: comparison of a list, a stream and a stream with weak references, with the operating zone marked.]

Views
Scientific computations require chaining complex data
transformations on large data sets. There is not always a
need to process all elements of the data set.
Scala allows the creation of a view on collections that are
the result of a data transformation. The elements are
instantiated only when needed.

Accessing an element of the list requires allocating
the entire list in memory.
Accessing an element of the view requires allocating
only this element in memory.
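The difference can be observed by counting how many times the transformation runs. The sketch below is an illustration with hypothetical names (ViewSketch, transform, evaluations), not the code from the slide.

```scala
object ViewSketch {
  // Counts how many times the transformation is applied
  var evaluations = 0

  private def transform(x: Int): Double = {
    evaluations += 1
    math.sqrt(x.toDouble)
  }

  val data: Vector[Int] = Vector.tabulate(1000)(identity)

  // Strict: the whole collection is transformed before the lookup
  def strictLookup(i: Int): Double = {
    val mapped = data.map(transform)
    mapped(i)
  }

  // View: only the requested element is transformed
  def lazyLookup(i: Int): Double = {
    val mapped = data.view.map(transform)
    mapped(i)
  }
}
```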

Type classes
Scala library classes cannot always be sub-classed.
Wrapping a library component in a helper class clutters the
design.
Type classes extend the functionality of classes without
cluttering name spaces (an alternative to helper classes).
The purpose of reusability goes beyond refactoring code.
It includes leveraging existing, well understood concepts
and semantics.

Let’s consider the definition of a tensor as being either a vector
or a covector.
Let’s extend the concept of tensor with a metric. A metric is computed as
the inner product or composition of a Covector and a vector.
The computation is implemented by the method Metric.apply

The inner object Metric defines the implicit conversion
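A type class for the metric might be sketched as follows. Only the name Metric and its apply method come from the text; the aliases Vec and CoVec, the wrapper MetricOps and the method inner are assumptions made for this example.

```scala
object TensorMetricSketch {
  type Vec = Seq[Double]
  type CoVec = Vec => Double

  // Hypothetical type class: how a covector of type C acts on a vector of type V
  trait Metric[C, V] {
    def apply(c: C, v: V): Double
  }

  object Metric {
    // Instance for plain functions: the metric is the application of c to v
    implicit val dual: Metric[CoVec, Vec] = new Metric[CoVec, Vec] {
      def apply(c: CoVec, v: Vec): Double = c(v)
    }
  }

  // Adds the method inner to any type that has a Metric instance,
  // without sub-classing or modifying that type
  implicit class MetricOps[C](c: C) {
    def inner[V](v: V)(implicit metric: Metric[C, V]): Double = metric(c, v)
  }

  val weights: Vec = Seq(0.5, 0.25)
  val weighted: CoVec = v => v.zip(weights).map { case (a, b) => a * b }.sum
  val result: Double = weighted.inner(Seq(2.0, 4.0): Vec)
}
```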

Scala stacked traits and abstract values preserve the core
formalism of mathematical expressions.
Traditional programming languages compare unfavorably to
scientific languages such as R because of their inability
to follow a strict mathematical formalism:
1. Variable declaration
2. Model definition
3. Instantiation

Declaration: $f \in \mathbb{R}^n \to \mathbb{R}^n$, $g \in \mathbb{R}^n \to \mathbb{R}$
Model: $h = g \circ f$
Instantiation: $f(x) = e^x$, $g(\mathbf{x}) = \sum_i x_i$
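The three steps map onto stacked traits with abstract values, as in this minimal sketch. The trait names F, G, H and the alias Rn are assumptions chosen to mirror the formulas.

```scala
object FormalismSketch {
  type Rn = Vector[Double]

  // 1. Declaration: f in R^n -> R^n, g in R^n -> R
  trait F { val f: Rn => Rn }
  trait G { val g: Rn => Double }

  // 2. Model: h = g o f (lazy, so f and g are initialized before use)
  trait H extends F with G {
    lazy val h: Rn => Double = g compose f
  }

  // 3. Instantiation: f(x) = e^x, g(x) = sum_i x_i
  val model: H = new H {
    val f: Rn => Rn = _.map(math.exp)
    val g: Rn => Double = _.sum
  }
}
```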

Building machine learning apps requires configurable,
dynamic workflows.
Leverage mixins, inheritance and abstract values to create
models and weave data transformations.
Factory design patterns have been used to model dynamic
systems (GoF). Dependency injection has gained popularity
for creating configurable systems (e.g. the Spring framework).

Multiple models and algorithms are typically evaluated by
weaving computation tasks.
A learning platform is a framework that
• Defines computational tasks
• Wires the tasks (data flow)
• Deploys the tasks (*)
It overcomes the limitations of monadic composition (3 levels of
dynamic binding…)
(*) Actor-based deployment

Even the simplest workflow, defined as a pipeline of data transformations,
requires a flexible design …

Summary of the 3 configurability layers of the Cake pattern
1. Given the objective of the computation, select the best
sequence of modules/tasks (e.g. Modeling: Preprocessing +
Training + Validating)
2. Given the profile of the input data, select the best data
transformation for each module (e.g. Data preprocessing:
Kalman, DFT, Moving average….)
3. Given the computing platform, select the best
implementation for each data transformation (e.g. Kalman:
KalmanOnAkka, Spark…)

Implementation of Preprocessing module

Implementation of Preprocessing module using discrete Fourier
… and discrete Kalman filter

[Figure: workflow diagram with the modules Loading, Preprocessing, Reducing, Training and Validating, and the components Preprocessor, DFTFilter, Kalman, Reducer, EM, PCA, Supervisor, SVM and MLP.]
Clustering workflow = preprocessing task -> reducing task
Modeling workflow = preprocessing task -> model training task
-> model validation

A simple clustering workflow requires a preprocessor and a
reducer. The computation sequence exec transforms a time
series of elements of type U and returns a time series of
type W as an option.
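A reduced version of such a workflow can be sketched with the Cake pattern as follows. To stay short, it works on Double values instead of generic types U and W, and the components MovingAverage and Mean are stand-ins chosen for this example rather than the filters named on the slides.

```scala
object CakeSketch {
  type TS = Vector[Double]

  // Module 1: the component is an abstract value bound at instantiation
  trait PreprocessingModule {
    val preprocessor: Preprocessor
    trait Preprocessor { def execute(xt: TS): TS }
    final class MovingAverage(period: Int) extends Preprocessor {
      def execute(xt: TS): TS = xt.sliding(period).map(_.sum / period).toVector
    }
  }

  // Module 2: reduces a time series to an optional summary value
  trait ReducingModule {
    val reducer: Reducer
    trait Reducer { def execute(xt: TS): Option[Double] }
    final class Mean extends Reducer {
      def execute(xt: TS): Option[Double] =
        if (xt.isEmpty) None else Some(xt.sum / xt.size)
    }
  }

  // Clustering workflow = preprocessing task -> reducing task.
  // The self-type states which modules must be mixed in.
  trait Clustering { self: PreprocessingModule with ReducingModule =>
    def exec(xt: TS): Option[Double] = reducer.execute(preprocessor.execute(xt))
  }

  // Wiring: the modules are stacked and the components selected here
  val workflow = new Clustering with PreprocessingModule with ReducingModule {
    val preprocessor = new MovingAverage(2)
    val reducer = new Mean
  }
}
```

Swapping MovingAverage for another Preprocessor only changes the wiring at the bottom, which is the configurability the pattern is used for.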

A model is created by processing the original time series of type TS[T]
through a preprocessor, a training supervisor and a validator

Putting all together for a conditional path execution …

Magnet pattern
Method overloading in Scala has limitations:
• Type erasure in the JVM causes collisions between the types of
arguments in overloaded methods
• Overloaded methods cannot be lifted into a function
• Code may be unnecessarily duplicated
The magnet pattern overcomes these limitations by
encapsulating the return type and redefining the overloaded
methods as implicit functions.

Let’s consider the following three incarnations of the method test
These methods have different return types. The first and last
methods conflict because of type erasure on T => List[Double]

Step 1: Define generic return type and constructor
Step 2: Implement the test methods as implicits

Step 3: Implement the lifted function test as follows
The first call invokes the implicit fromTN and the second
triggers the implicit fromT.
The return type is inferred from the type of argument
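The three steps can be sketched as below. The names test, fromT and fromTN come from the text, but their signatures and bodies here are assumptions, since the slide's code is not part of the transcript. The tuple argument is written explicitly rather than relying on argument adaptation.

```scala
import scala.language.implicitConversions

object MagnetSketch {
  // Step 1: a generic return type carried by the magnet
  sealed trait Magnet {
    type Result
    def apply(): Result
  }

  // Step 2: each former overload becomes an implicit conversion to Magnet
  object Magnet {
    implicit def fromT(xs: List[Double]): Magnet { type Result = Double } =
      new Magnet {
        type Result = Double
        def apply(): Double = xs.sum
      }

    implicit def fromTN(args: (List[Double], Int)): Magnet { type Result = List[Double] } =
      new Magnet {
        type Result = List[Double]
        def apply(): List[Double] = args._1.take(args._2)
      }
  }

  // Step 3: a single method whose return type depends on the magnet
  def test(magnet: Magnet): magnet.Result = magnet()

  val firstTwo: List[Double] = test((List(1.0, 2.0, 3.0), 2)) // uses fromTN
  val total: Double = test(List(1.0, 2.0, 3.0))               // uses fromT
}
```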

View bound
A context bound cannot be used to bind the parameterized
type of a generic class to a primitive type.
Scala view bounds allow developers to create classes
with parameterized types associated to a Scala or
Java primitive type.

Let’s consider a class whose parameterized type can be
manipulated as a Float.
A context bound is not permissible.
Constraining the type with an upper bound Float does not
work as Float is a final class.

The solution is to bind the class type to a Float using an
implicit conversion (or view)
The <% directive is the short notation for
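A view bound T <% Float stands for an extra implicit parameter of type T => Float. The sketch below writes that parameter out explicitly; the types Celsius and Average are hypothetical and only serve the illustration.

```scala
object ViewBoundSketch {
  // Hypothetical type with no subtype relation to Float
  final case class Celsius(degrees: Float)

  // The view: how a Celsius value is seen as a Float
  implicit val celsiusToFloat: Celsius => Float = _.degrees

  // Explicit form of: class Average[T <% Float](values: Seq[T])
  class Average[T](values: Seq[T])(implicit toFloat: T => Float) {
    def mean: Float = values.map(toFloat).sum / values.size
  }

  val mean: Float = new Average(Seq(Celsius(10.0f), Celsius(20.0f))).mean
}
```

The explicit form has the practical advantage of not depending on the <% syntax being available in the compiler version at hand.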

F-Bound polymorphism
F-Bound polymorphism is a parametric type polymorphism
that constrains the subtypes to themselves using bounds.
It is important to write code that catches errors at compile
time. How can we enforce type integrity in subclasses?

Let’s create a trait that defines a discriminative learning model
with methods to manipulate data.
The classes Svm and Mlp implement the Discriminative trait.
The problem is that nothing prevents the creation of a class Nnet
that impersonates an Svm class.

One solution is to restrict (or bound) the type to a Discriminative
class.
It prevents a new class from inserting itself into the hierarchy,
but does not guarantee type integrity for existing classes.

The self reference guarantees the integrity of each existing
and new subclass. F-Bound polymorphism is a
self-referenced bound polymorphism.
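Both constraints, the bound and the self reference, appear in the sketch below. The names Discriminative, Svm, Mlp and Nnet come from the text; the methods weights and update are assumptions made for the example.

```scala
object FBoundSketch {
  // The bound restricts M to subtypes of Discriminative[M];
  // the self-type forces each implementation to be its own M
  trait Discriminative[M <: Discriminative[M]] { self: M =>
    def weights: Vector[Double]
    def update(delta: Vector[Double]): M
  }

  final class Svm(val weights: Vector[Double]) extends Discriminative[Svm] {
    def update(delta: Vector[Double]): Svm =
      new Svm(weights.zip(delta).map { case (w, d) => w + d })
  }

  final class Mlp(val weights: Vector[Double]) extends Discriminative[Mlp] {
    def update(delta: Vector[Double]): Mlp =
      new Mlp(weights.zip(delta).map { case (w, d) => w - d })
  }

  // Rejected at compile time: Nnet does not conform to the self-type Svm
  // final class Nnet(val weights: Vector[Double]) extends Discriminative[Svm] {
  //   def update(delta: Vector[Double]): Svm = new Svm(weights)
  // }
}
```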

Monadic composition
Streams
Views
Type classes
Cake pattern
Magnet pattern
View bounds
F-bound polymorphism
Data flow control
Continuation passing style

Data flow back pressure
A data flow control mechanism handling back pressure
on the bounded mailboxes of upstream actors.
Scala actors provide a reliable way to deploy workflows
in a distributed environment. However, some nodes
may experience slow processing and create performance
bottlenecks.

Actor-based workflow has to consider
- Cascading failures => supervision strategy
- Cascading bottleneck => Mailbox back-pressure strategy
Workers
Router, Dispatcher, …

Message-passing scheme to process various data streams
with transformations.
[Figure: message flow between the Dataset, the Controller, the Workers (with bounded mailboxes) and the Watcher, using the messages Load, Compute, Completed, GetStatus and Status.]

Worker actors process the data chunk msg.xt sent by the
controller with the transformation msg.fct
Message sent by collector to trigger computation

The watcher actor monitors the message queues and reports to the
collector with a Status message.
The GetStatus message sent by the collector has no payload

Controller creates the workers, bounded mailbox for each worker
actor (msgQueues) and the watcher actor.

The Controller loads the data sets per chunk upon receiving the
message Load from the main program. It processes the results of
the computation from the workers (Completed) and throttles the
input to the workers for each Status message.

The Load message is implemented as a loop that creates data chunks
whose size is adjusted according to the load computed by the
watcher and forwarded to the controller (Status).

Simple throttle increases/decreases size of the batch of
observations given the current load and specified watermark.
Selecting faster/slower and less/more accurate version of algorithm
can also be used in the regulation strategy
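The simple throttle can be sketched as a pure function, independent of any actor library. The halving and doubling policy, the size limits and the names Watermark and nextBatchSize are assumptions made for illustration; the load is taken to be the fraction of the mailbox capacity in use.

```scala
object ThrottleSketch {
  // Hypothetical watermarks on the mailbox load, expressed in [0, 1]
  final case class Watermark(low: Double, high: Double)

  // Shrink the batch when the load exceeds the high watermark,
  // grow it when the load is below the low watermark
  def nextBatchSize(
      current: Int,
      load: Double,
      watermark: Watermark,
      minSize: Int = 16,
      maxSize: Int = 4096): Int =
    if (load > watermark.high) math.max(minSize, current / 2)
    else if (load < watermark.low) math.min(maxSize, current * 2)
    else current
}
```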

The feedback control loop adjusts the size of the batches given the
load in the mailboxes and the complexity of the computation

Notes
• The feedback control loop should be smoothed (moving
average, Kalman…)
• A larger variety of data flow control actions is possible, such as
adding more workers, increasing queue capacity, …
• The watch dog should handle dead letters, in case of a
failure of the feedback control or the workers.
• Reactive streams introduced in Akka 2.2+ have
sophisticated TCP-based propagation and back pressure
control flows

Delimited continuation
Continuation Passing Style (CPS) is a technique that
abstracts a computation unit as a data structure in order to
control the state of a computer program, workflow or
sequence of data transformations.
Continuations are used to ‘jump’ to a method that
produces a call to the current method. They can be
regarded as ‘functional GOTO’

A data transformation (or computation unit) can be
extended (continued) with another transformation known
as a continuation. The continuation is provided as an argument
of the original transformation.
Let’s consider the following workflow
The first workflow is not a continuation, the second is
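The contrast between the two styles can be sketched as follows. The transformations normalize and max are assumptions chosen for this example; the workflow shown on the slide is not part of the transcript.

```scala
object CpsSketch {
  // Direct style: each step returns its result to the caller
  def normalize(xs: Vector[Double]): Vector[Double] = {
    val sum = xs.sum
    xs.map(_ / sum)
  }
  def directWorkflow(xs: Vector[Double]): Double = normalize(xs).max

  // Continuation passing style: each step receives the next
  // transformation k as an argument and hands its result to it
  def normalizeK[R](xs: Vector[Double])(k: Vector[Double] => R): R = {
    val sum = xs.sum
    k(xs.map(_ / sum))
  }
  def maxK[R](xs: Vector[Double])(k: Double => R): R = k(xs.max)

  def cpsWorkflow(xs: Vector[Double]): Double =
    normalizeK[Double](xs)(ys => maxK[Double](ys)(m => m))
}
```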

A delimited continuation is a section of the workflow that
is reified into a function returning a value. This technique
relies on control delimiters (shift/reset) to make the
continuation composable and reusable.

More Scala nuggets…
• Domain specific language
• Reactive streams
• Back-pressure strategy using connection state
Wait a minute, there is more…..
