MBrace is a programming model and cluster infrastructure for defining and executing large-scale computation in the cloud. Based on the .NET framework, it builds upon and extends F# asynchronous workflows.
https://skillsmatter.com/skillscasts/5157-mbrace-large-scale-distributed-computation-with-f
2. About Nessos IT
Athens-based ISV.
Specializes in the .NET framework and C#/F#.
Various business fields:
◦ Business process management
◦ GIS
◦ Application framework development
R&D:
◦ OR mappers
◦ MBrace and related frameworks
◦ Open-source development
3. What is MBrace?
A Programming Model.
◦ Leverages the power of the F# language.
◦ Inspired by F#’s asynchronous workflows.
◦ Declarative, compositional, higher-order.
A Cluster Infrastructure.
◦ Based on the .NET framework.
◦ Elastic, fault tolerant, multitasking.
4. HelloWorld
The MBrace Programming Model
val hello : Cloud<unit>

let hello = cloud {
    printfn "hello, world!"
    return ()
}

MBrace.CreateProcess <@ hello @>
5. Sequential Composition
The MBrace Programming Model
let first = cloud { return 15 }
let second = cloud { return 27 }

cloud {
    let! x = first
    let! y = second
    return x + y
}
6. Example : Sequential fold
The MBrace Programming Model
val foldl : ('S -> 'T -> Cloud<'S>) -> 'S -> 'T list -> Cloud<'S>

let rec foldl f s ts = cloud {
    match ts with
    | [] -> return s
    | t :: ts' ->
        let! s' = f s t
        return! foldl f s' ts'
}
7. Parallel Composition
The MBrace Programming Model
val (<||>) : Cloud<'T> -> Cloud<'S> -> Cloud<'T * 'S>

cloud {
    let first = cloud { return 15 }
    let second = cloud { return 27 }
    let! x, y = first <||> second
    return x + y
}
8. Parallel Composition (Variadic)
The MBrace Programming Model
val Cloud.Parallel : Cloud<'T> [] -> Cloud<'T []>
cloud {
    let sqr x = cloud { return x * x }
    let jobs = Array.map sqr [| 1 .. 100 |]
    let! sqrs = Cloud.Parallel jobs
    return Array.sum sqrs
}
9. Non-Deterministic Parallelism
The MBrace Programming Model
val Cloud.Choice : Cloud<'T option> [] -> Cloud<'T option>
let tryPick (f : 'T -> Cloud<'S option>) (ts : 'T []) = cloud {
    let jobs = Array.map f ts
    return! Cloud.Choice jobs
}
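As a hypothetical usage sketch (tryFindFactor and its search bounds are illustrative, not from the talk), tryPick can search for a nontrivial factor of a number across the cluster; Cloud.Choice completes as soon as any job returns Some:

```fsharp
// Hypothetical example: find any nontrivial factor of n.
// Each candidate divisor is checked as a separate cloud job;
// the first Some result wins and the rest are cancelled.
let tryFindFactor (n : int) =
    cloud {
        let check (d : int) = cloud {
            return if n % d = 0 then Some d else None
        }
        return! tryPick check [| 2 .. n - 1 |]
    }
```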
10. Exception handling
The MBrace Programming Model
let first = cloud { return 17 }
let second = cloud { return 25 / 0 }

cloud {
    try
        let! x, y = first <||> second
        return x + y
    with :? DivideByZeroException ->
        return -1
}
11. Example: Map-Reduce
The MBrace Programming Model
let mapReduce (mapF : 'T -> Cloud<'S>)
              (reduceF : 'S -> 'S -> Cloud<'S>)
              (identity : 'S) (inputs : 'T list) =
    let rec aux inputs = cloud {
        match inputs with
        | [] -> return identity
        | [t] -> return! mapF t
        | _ ->
            // List.split: helper that splits the input list in half
            let left, right = List.split inputs
            let! s1, s2 = aux left <||> aux right
            return! reduceF s1 s2
    }
    aux inputs
14. About that MapReduce workflow…
Communication Overhead.
◦ Data captured in cloud workflow closures.
◦ Needlessly passed between worker machines.
Granularity issues.
◦ Each input entails a scheduling decision by the cluster.
◦ Cluster size not taken into consideration.
◦ Multicore capacity of worker nodes ignored.
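One way to address both points is to chunk the inputs before going to the cluster, so the number of cluster-scheduled jobs tracks the number of workers rather than the number of inputs. A sketch under assumed helpers — Cloud.GetWorkerCount and partitionInto are hypothetical, not the MBrace API:

```fsharp
// Sketch only: coarser-grained map-reduce.
// Cloud.GetWorkerCount and partitionInto are assumed helpers.
let mapReduceBalanced (mapF : 'T -> Cloud<'S>)
                      (reduceF : 'S -> 'S -> Cloud<'S>)
                      (identity : 'S) (inputs : 'T list) = cloud {
    let! workerCount = Cloud.GetWorkerCount () // hypothetical
    // one chunk per worker: one scheduling decision per chunk
    let chunks : 'T list list = partitionInto workerCount inputs // hypothetical
    // fold a whole chunk sequentially on a single worker
    let reduceChunk ts = cloud {
        let rec aux s ts = cloud {
            match ts with
            | [] -> return s
            | t :: rest ->
                let! s' = mapF t
                let! s'' = reduceF s s'
                return! aux s'' rest
        }
        return! aux identity ts
    }
    let! partials = chunks |> List.map reduceChunk |> Array.ofList |> Cloud.Parallel
    // combine the per-chunk results sequentially
    let rec combine s = function
        | [] -> cloud { return s }
        | p :: rest ->
            cloud {
                let! s' = reduceF s p
                return! combine s' rest
            }
    return! combine identity (List.ofArray partials)
}
```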
15. The Cloud Ref
Distributed Data in MBrace
let createRef (data : string list) = cloud {
    let! cref = CloudRef.New data
    return (cref : CloudRef<string list>)
}

let deRef (cref : CloudRef<string list>) = cloud {
    return cref.Value
}
16. The Cloud Ref
Distributed Data in MBrace
Simplest data primitive in MBrace.
References a value stored in the cluster.
Conceptually similar to ML ref types.
Immutable by design.
Values are cached on worker nodes for performance.
17. Disposable types
Distributed Data in MBrace
cloud {
    use! data = CloudRef.New [| 1 .. 1000000 |]
    let! x, y = doSomething data <||> doSomethingElse data
    return x + y
}
19. Performance
We tested MBrace against Hadoop.
Tests were staged on Windows Azure.
Clusters of 4, 8, 16 and 32 Large Azure instances.
Two algorithms were tested: grep and k-means.
Source code available on GitHub.
20. Distributed grep
Performance
Find occurrences of given pattern in text files.
Straightforward Map-Reduce algorithm.
Input data was 32, 64, 128 and 256 GB of text.
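As an illustration (the actual benchmark source differs; countMatches and line-based file reading are assumptions), grep fits the mapReduce function from the Map-Reduce slide directly:

```fsharp
// Illustrative sketch, not the benchmark implementation:
// count pattern occurrences per file, then sum the counts.
let grep (pattern : string) (files : string list) : Cloud<int> =
    let countMatches (file : string) = cloud {
        let lines = System.IO.File.ReadAllLines file
        return lines |> Seq.filter (fun l -> l.Contains pattern) |> Seq.length
    }
    let sum x y = cloud { return x + y }
    mapReduce countMatches sum 0 files
```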
23. K-means
Performance
Centroid computation over a set of vectors.
Iterative algorithm.
Not naturally expressible in Map-Reduce workflows.
Hadoop implementation using Apache Mahout.
Input was 10^6 randomly generated 100-dimensional points.
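To make the iteration step concrete, here is a minimal local sketch of one k-means update in plain F# (unrelated to the benchmark code): assign each point to its nearest centroid, then recompute each centroid as the mean of its cluster.

```fsharp
// Minimal local sketch of one k-means iteration.
// Squared Euclidean distance between two points.
let distSq (p : float []) (q : float []) =
    Array.fold2 (fun acc a b -> acc + (a - b) * (a - b)) 0.0 p q

// Group points by nearest centroid, then average each cluster.
let iterate (centroids : float [] []) (points : float [] []) =
    points
    |> Array.groupBy (fun p -> centroids |> Array.minBy (distSq p))
    |> Array.map (fun (_, cluster) ->
        Array.init cluster.[0].Length (fun i ->
            (cluster |> Array.sumBy (fun p -> p.[i])) / float cluster.Length))
```

Repeating this step until the centroids stop moving gives the full algorithm; the iterative dependency between steps is what makes it awkward to express as a single Map-Reduce pass.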