Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1GHEzwu.
Anil Madhavapeddy introduces the Irmin library by means of a functional queue, shows how the Git mirroring works, and then demonstrates some more complex applications, such as its use as a unikernel storage layer. Irmin also works in the browser as a pure JavaScript library, and he demonstrates the "CueKeeper" GTD application built using it. Filmed at qconnewyork.com.
Anil Madhavapeddy is a Senior Research Fellow at the University of Cambridge, based in the Systems Research Group. He is co-chair of the Commercial Uses of Functional Programming workshop, and an author of the O'Reilly Real World OCaml book.
1. Functional Distributed Programming with Irmin
QCon NYC 2015, New York
Anil Madhavapeddy (speaker)
with Benjamin Farinier, Thomas Gazagnaire, Thomas Leonard
University of Cambridge Computer Laboratory
June 12, 2015
Anil Madhavapeddy (speaker) with Benjamin Farinier, Thomas Gazagnaire, Thomas Leonard. University of Cambridge Computer Laboratory. Functional Distributed Programming with Irmin.
2. InfoQ.com: News & Community Site
Watch the video with slide synchronization on InfoQ.com!
http://www.infoq.com/presentations/irmin
3. Presented at QCon New York
www.qconnewyork.com
4. Background
Git in the datacenter
Irmin, a large-scale, immutable, branch-consistent storage
Weakly consistent data structures
Mergeable queues
Mergeable ropes
Benchmarking Irmin
Use Cases
5. Background Git in the datacenter
Common features every distributed system needs
• Persistence for fault tolerance and scaling
• Scheduling of communication between nodes
• Tracing across nodes for debugging and profiling
Most distributed systems run over an operating system, and so are
stuck with the OS kernel exerting control. We use unikernels, which
are application VMs that have complete control over their resources.
6. Background Git in the datacenter
What if we just used Git?
• Persistence
  • git clone of a shared repository across nodes
  • git commit of local operations in the node
• Scheduling
  • git pull to receive events from other nodes
  • git push to publish events to other nodes
• Tracing and Debugging
  • git log to see global operations
  • git checkout to roll back time to a snapshot
  • git bisect to locate problem messages
7. Background Git in the datacenter
Problems with using Git?
• Garbage Collection
  • Git records all operations permanently, so our database will keep growing!
  • git rebase is needed to compact history.
• Shell Control
  • Calling the git command line is slow and lacks fine control.
  • It is also hard to extend the Git protocol with additional features.
• Programming Model
  • Git is designed for distributed source code manipulation.
  • Built-in merge functions are designed around text files.
  • Let’s use it for distributed data structures instead!
8. Background Irmin, a large-scale, immutable, branch-consistent storage
Irmin, large-scale, immutable, branch-consistent storage
• Irmin is a library to persist and synchronize distributed data
structures both on-disk and in-memory
• It enables a style of programming very similar to the Git workflow,
where distributed nodes fork, fetch, merge and push data between
each other
• The general idea is that you want every active node to get a local
(partial) copy of a global database and always be very explicit
about how and when data is shared and migrated
10. Background Irmin, a large-scale, immutable, branch-consistent storage
type t = ...
(** User-defined contents. *)
type result = [
  `Ok of t
| `Conflict of string
]
val merge:
  old:t -> t -> t -> result
(** 3-way merge functions. *)
[Diagram: a common ancestor "old" with two descendants "x" and "y", whose merge result is shown as "?"]
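As a minimal sketch of what a function with this shape can look like, here are two three-way merges over plain OCaml values. The names merge_counter and merge_default, and the policies they implement, are assumptions made for illustration; they are not taken from the Irmin API.

```ocaml
(* Result of a merge: a merged value or a conflict message. *)
type 'a result = [ `Ok of 'a | `Conflict of string ]

(* Hypothetical counter merge: apply the change made by each branch
   relative to the common ancestor [old]. It never conflicts. *)
let merge_counter ~old x y : int result =
  `Ok (old + (x - old) + (y - old))

(* Hypothetical default policy for an opaque value: keep a change made
   on one side only, and report a conflict when both sides changed the
   value in different ways. *)
let merge_default ~old x y : 'a result =
  if x = y then `Ok x
  else if x = old then `Ok y
  else if y = old then `Ok x
  else `Conflict "both branches changed the value"

let () =
  match merge_counter ~old:10 13 8 with
  | `Ok n -> Printf.printf "merged counter: %d\n" n
  | `Conflict msg -> print_endline msg
```

The counter case shows why a merge function defined per data type can avoid conflicts that a text-based merge would report.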
11. Background Irmin, a large-scale, immutable, branch-consistent storage
Demo: Distributed Logging
Multiple nodes all logging to a central store:
1. Design the logging data structure.
   • A log is a list of (string + timestamp)
   • When merging, the timestamps must be in increasing order
   • Equal timestamps can be in any order
   • With this logic, merge conflicts are impossible
2. Every node clones the log repository.
3. A log is recorded locally, then pushed centrally.
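The merge rule for logs described above can be sketched as follows. This is an illustration under stated assumptions, not the code from the demo: timestamps are plain integers, each log is kept sorted by (timestamp, message), and an entry that appears in both branches (for example one inherited from the common ancestor) is kept once.

```ocaml
(* A log entry is (timestamp, message); a log is sorted by entry. *)
type entry = int * string
type log = entry list

(* Merge two sorted logs into one sorted log. Entries found in both
   inputs are kept once. Entries with equal timestamps are ordered by
   message, which is one arbitrary but deterministic choice. *)
let rec merge_logs (x : log) (y : log) : log =
  match x, y with
  | [], l | l, [] -> l
  | ex :: xs, ey :: ys ->
    if ex = ey then ex :: merge_logs xs ys
    else if compare ex ey < 0 then ex :: merge_logs xs y
    else ey :: merge_logs x ys

(* Three-way merge wrapper: the ancestor is not needed because the
   sorted union above can never fail. *)
let merge ~old:_ x y = `Ok (merge_logs x y)

let node_a = [ (1, "boot"); (3, "a: ready") ]
let node_b = [ (1, "boot"); (2, "b: ready"); (3, "b: done") ]
let merged = merge_logs node_a node_b
```

Because every pair of sorted logs has a sorted union, this merge always returns `Ok, which matches the claim that conflicts are impossible.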
12. Weakly consistent data structures
Weakly consistent data structures
13. Weakly consistent data structures Mergeable queues
Mergeable queues
14. Weakly consistent data structures Mergeable queues
module type IrminQueue.S = sig
  type t
  type elt
  val create : unit -> t
  val length : t -> int
  val is_empty : t -> bool
  val push : t -> elt -> t
  val pop : t -> (elt * t)
  val peek : t -> (elt * t)
  val merge : IrminMerge.t
end
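For readers unfamiliar with functional queues, here is a minimal in-memory sketch of the non-merge part of such a signature, using the classic pair of a pop list and a push list. It is an assumption-laden illustration rather than Irmin's implementation: the field names are invented, and pop and peek return an option instead of the bare pair shown in the signature.

```ocaml
(* A persistent FIFO queue: elements are popped from [front] and
   pushed onto [back], which is stored newest-first. *)
type 'a t = { front : 'a list; back : 'a list }

let create () = { front = []; back = [] }
let length q = List.length q.front + List.length q.back
let is_empty q = q.front = [] && q.back = []

(* Pushing is O(1) and returns a new queue; the old one is unchanged. *)
let push q x = { q with back = x :: q.back }

(* When the pop list runs out, reverse the push list into it. *)
let normalize q =
  match q.front with
  | [] -> { front = List.rev q.back; back = [] }
  | _ -> q

(* Amortized O(1): each element is reversed at most once. *)
let pop q =
  match normalize q with
  | { front = x :: rest; back } -> Some (x, { front = rest; back })
  | { front = []; _ } -> None

let peek q =
  match normalize q with
  | { front = x :: _; _ } as q' -> Some (x, q')
  | { front = []; _ } -> None

let q = push (push (push (create ()) 1) 2) 3
```

Every operation returns a new queue value, so older versions stay valid, which is the property a branch-based store relies on.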
15. Weakly consistent data structures Mergeable queues
[Figure: queue representation. Labels: indexes I0 and I1; nodes n01 to n07 and n11 to n14; top and bottom pointers; pop list and push list. Legend: Index, Node, Elt.]
16. Weakly consistent data structures Mergeable queues
[Figure: merging two queues that share a common ancestor]
I old: A B C D
I 1: A B C D E
I 2: B C D F G
Merged: B C D E (from I 1) followed by F G (from I 2)
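The merge in the figure can be reproduced with a small list-based model. This is a sketch under assumptions, not the algorithm used by Irmin: the record fields are invented, a queue tracks how many elements were ever popped, an element popped on either branch is removed once, and a branch that pops elements it pushed itself is reported as a conflict to keep the example short.

```ocaml
(* [pops] counts every element ever popped; [items] is front-first.
   The list model is for illustration only (push is O(n) here). *)
type 'a queue = { pops : int; items : 'a list }

let push q x = { q with items = q.items @ [ x ] }

let pop q =
  match q.items with
  | [] -> None
  | x :: rest -> Some (x, { pops = q.pops + 1; items = rest })

let rec drop n l =
  if n <= 0 then l else match l with [] -> [] | _ :: t -> drop (n - 1) t

(* Three-way merge: remove what either branch popped from the
   ancestor, then append the pushes of [x] followed by those of [y]. *)
let merge ~old x y =
  let n = List.length old.items in
  let kx = x.pops - old.pops and ky = y.pops - old.pops in
  if kx < 0 || ky < 0 || kx > n || ky > n then
    `Conflict "history not supported by this sketch"
  else
    let pushes_x = drop (n - kx) x.items in
    let pushes_y = drop (n - ky) y.items in
    let k = max kx ky in
    `Ok { pops = old.pops + k; items = drop k old.items @ pushes_x @ pushes_y }

(* The example from the figure. *)
let old = { pops = 0; items = [ "A"; "B"; "C"; "D" ] }
let i1 = push old "E"
let i2 =
  match pop old with
  | Some (_, q) -> push (push q "F") "G"
  | None -> old
let merged = merge ~old i1 i2
```

With these inputs the merged queue holds B C D E F G, as in the figure.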
18. Weakly consistent data structures Mergeable queues
Current state
Operation   Read           Write          Complexity
Push        0              2              O(1)
Pop         2 on average   1 on average   O(1)
Merge       n              1              O(n)
With a little more work
Operation   Read           Write          Complexity
Push        0              2              O(1)
Pop         2 on average   1 on average   O(1)
Merge       log n          1              O(log n)
19. Weakly consistent data structures Mergeable ropes
Mergeable ropes
20. Weakly consistent data structures Mergeable ropes
module type IrminRope.S = sig
  type t
  type value (* e.g. char *)
  type cont (* e.g. string *)
  val create : unit -> t
  val make : cont -> t
  ...
  val set : t -> int -> value -> t
  val get : t -> int -> value
  val insert : t -> int -> cont -> t
  val delete : t -> int -> int -> t
  val append : t -> t -> t
  val split : t -> int -> (t * t)
  val merge : IrminMerge.t
end
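To make the rope operations concrete, here is a minimal in-memory sketch with char as the value type and string as the container. It is an illustration only: the tree is not rebalanced, so the logarithmic bounds quoted for ropes are not guaranteed, the merge function is omitted, and the constructor names are assumptions. The two int arguments of delete are read here as a position and a length, which the signature does not state.

```ocaml
(* A rope is a binary tree of string fragments. Each inner node
   caches the total length of its left subtree. *)
type rope =
  | Leaf of string
  | Node of int * rope * rope

let make s = Leaf s

let rec length = function
  | Leaf s -> String.length s
  | Node (w, _, r) -> w + length r

(* Concatenation builds one new node and copies no text. *)
let append l r = Node (length l, l, r)

let rec get t i =
  match t with
  | Leaf s -> s.[i]
  | Node (w, l, r) -> if i < w then get l i else get r (i - w)

(* Split into the first [i] characters and the rest. *)
let rec split t i =
  match t with
  | Leaf s ->
    (Leaf (String.sub s 0 i), Leaf (String.sub s i (String.length s - i)))
  | Node (w, l, r) ->
    if i < w then
      let a, b = split l i in
      (a, append b r)
    else if i = w then (l, r)
    else
      let a, b = split r (i - w) in
      (append l a, b)

let insert t i s =
  let a, b = split t i in
  append (append a (Leaf s)) b

(* Remove [n] characters starting at position [i]. *)
let delete t i n =
  let a, rest = split t i in
  let _, b = split rest n in
  append a b

let set t i c = insert (delete t i 1) i (String.make 1 c)

let rec to_string = function
  | Leaf s -> s
  | Node (_, l, r) -> to_string l ^ to_string r

let r = append (make "lorem ") (make "ipsum")
let r' = insert r 6 "dolor "
```

Insert and delete are both expressed through split and append, which is why they share the same cost on a balanced rope.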
21. Weakly consistent data structures Mergeable ropes
Operation     Rope        String
Set/Get       O(log n)    O(1)
Split         O(log n)    O(1)
Concatenate   O(log n)    O(n)
Insert        O(log n)    O(n)
Delete        O(log n)    O(n)
Merge         log(f(n))   f(n)
22. Weakly consistent data structures Mergeable ropes
[Figure, built up step by step over nine slides: three ropes and the result of merging them, drawn as trees with numeric node labels and text leaves.]
Rope 1: node labels 10; 5 2; 2 2 1. Leaves: lo rem ip sum do a met
Rope 2: node labels 10; 5 5; 2 2 2 1. Leaves: lo rem ip sum do lor a met
Rope 3: node labels 10; 5 2; 2 2 4; 3. Leaves: lo rem ip sum do sit a met
Merged rope: node labels 10; 5 5; 2 2 2 4; 3. Leaves: lo rem ip sum do lor sit a met
31. Benchmarking Irmin
Benchmarking Irmin
32. Benchmarking Irmin
[Plot: time spent for whole operations (µs), axis from 0 to 500000, against the number of push/pop operations successively applied, axis from 0 to 4500. Series: Core, Obj, Memory, GitMem, GitDsk.]
33. Benchmarking Irmin
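(* Backend that stores nothing: a key is the value itself, cast with
   Obj.magic, so add and read only convert between the two types. *)
module ObjBackend ... = struct
  type t = unit
  type key = K.t
  type value = V.t
  let create () = return ()
  let clear () = return ()
  (* Return the value itself, unsafely cast, as its key. *)
  let add t value =
    return (Obj.magic (Obj.repr value))
  (* Cast the key back to the value it really is. *)
  let read t key =
    return (Obj.obj (Obj.magic key))
  (* Every key is reported as present. *)
  let mem t key = return true
  ...
end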
34. Benchmarking Irmin
[Plot: time spent for whole operations (µs), axis from 0 to 500, against the number of push/pop operations successively applied, axis from 0 to 4500. Series: Core, Obj, Memory, GitMem, GitDsk.]
35. Use Cases
Use Cases
36. Use Cases
Demo: Dog, a loyal synchronization tool
Command line interface to logging clients at
https://github.com/samoht/dog
1. dog listen sets up the server listener
   • Server maintains list of clients in a subtree
   • It regularly merges all clients in parallel to master branch
2. dog init starts up a client logger
3. dog push syncs the client with the server
37. Use Cases
Demo: CueKeeper, an Irmin TODO manager
Do Git programming in the browser
https://github.com/talex5/cuekeeper
http://test.roscidus.com/CueKeeper/
1. Irmin is written in OCaml, and compiles to efficient JavaScript
   • Git objects are mapped into IndexedDB
   • Uses LocalStorage to sync between tabs
2. DOM elements are computed from the Git store (a React-like model)
3. Client has full history, snapshotting and custom merge logic.
38. Use Cases
Demo: XenStore TNG
The Xen hypervisor toolstack
https://www.youtube.com/watch?v=DSzvFwIVm5s
1. Xen is a widely deployed hypervisor (Amazon EC2, Rackspace Cloud, ...)
   • Every VM boot needs a lot of communication
   • Tracing when something goes wrong is hard
   • Programming model is quite reactive
2. Dave Scott from Citrix ported the core toolstack to use Irmin, and made it faster!
39. Use Cases
Why OCaml?
• Lets us prototype complex functional data structures very quickly
• Efficient compilation to native code (x86, ARM, PowerPC,
Sparc, ...), unikernels (MirageOS), JavaScript and Java
• Execution model is strict and predictable, important for
systems programming
• Native code compilation is statically linked, or can be used as a
normal shared library
40. Use Cases
Irmin Status (“Not Entirely Insane”)
• Still pre-1.0, but already provides several useful data structures,
such as distributed queues and efficient ropes.
• Usable over HTTP REST for remote clients, as an OCaml library, or
through a command-line interface.
• Bidirectional operation, so git commits map to Irmin commits
from any direction.
• Open source at https://irmin.io, installable via the OPAM
package manager at https://opam.ocaml.org
• Feedback welcome at
mirageos-devel@lists.xenproject.org or
https://github.com/mirage/irmin/issues