6. UNIX File API
● read(), write()
● fsync() (Sync(), in Go; a Flush() alone only drains user-space buffers)
"The fsync() function is intended to force a physical write of data from
the buffer cache, and to assure that after a system crash or other failure
that all data up to the time of the fsync() call is recorded on the disk.
Since the concepts of 'buffer cache', 'system crash', 'physical write', and
'non-volatile storage' are not defined here, the wording has to be more
abstract." -- POSIX fsync() rationale
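A minimal Go sketch of that call sequence (the file name is made up): a
bufio Flush() only empties the application's own buffer into the kernel,
while (*os.File).Sync() is the actual fsync() wrapper.

```go
package main

import (
	"bufio"
	"log"
	"os"
)

func main() {
	f, err := os.Create("data.bin") // illustrative path
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	w := bufio.NewWriter(f)
	if _, err := w.WriteString("hello, durable world\n"); err != nil {
		log.Fatal(err)
	}
	// Flush() only moves bytes from the app's buffer into the kernel.
	if err := w.Flush(); err != nil {
		log.Fatal(err)
	}
	// Sync() is Go's fsync(): ask the kernel to push the data to the device.
	if err := f.Sync(); err != nil {
		log.Fatal(err)
	}
}
```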
8. Caches all the way down
● App
● Filesystem
● Block Device
● Hardware
11. Magical File Pointer
● Distributed/Sharded
● Replicated
● Error-corrected
● Versioned
● Available anywhere on the cluster
The Dream of Plan 9 is alive in Golang
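One way to picture that in Go (names here are illustrative, not Torus's
actual API): everything on the list above hides behind the plain old file
interface, which is part of why the dream of Plan 9 is alive in Golang.

```go
// Hypothetical sketch, not Torus's real types: the "magical file pointer"
// presents a flat, seekable run of bytes; sharding, replication,
// error-correction, and versioning all hide behind the interface.
package mfp

import "io"

type MagicalFilePointer interface {
	io.ReaderAt // ReadAt(p []byte, off int64) (n int, err error)
	io.WriterAt // WriteAt(p []byte, off int64) (n int, err error)
	Size() (int64, error)
	Sync() error // durability barrier: the distributed analogue of fsync()
}
```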
12. Storage is Generic (when you squint)
Bytes, persisted somewhere
● Block Devices
○ One, fixed size, “magical file pointer”
● Object Stores
○ HTTP API to some magic bucket of (often append-only) MFPs
● Filesystems
○ POSIX semantics and metadata referring to MFPs
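Continuing the hypothetical sketch from above: a plain local file, standing
in for a fixed-size block device, already satisfies that interface; an
object store or filesystem backend would implement the same four methods
over HTTP or POSIX calls instead.

```go
// Hypothetical: a local file as a stand-in for a fixed-size block device.
package mfp

import "os"

type localMFP struct{ f *os.File }

func (l localMFP) ReadAt(p []byte, off int64) (int, error)  { return l.f.ReadAt(p, off) }
func (l localMFP) WriteAt(p []byte, off int64) (int, error) { return l.f.WriteAt(p, off) }
func (l localMFP) Sync() error                              { return l.f.Sync() }

func (l localMFP) Size() (int64, error) {
	fi, err := l.f.Stat()
	if err != nil {
		return 0, err
	}
	return fi.Size(), nil
}

var _ MagicalFilePointer = localMFP{} // compile-time interface check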
13. Reminder: The only way to know something is durable is to sync()
(and pray you’re not being lied to)
15. Storage is a HARD problem.
...But Torus’ major claim is that it’s a SEPARABLE problem.
16. We all agree on the architecture
[Diagram: a Client connected to a Metadata node and several Storage nodes]
17. Architecture, Explained
● Client
○ Mounts a data source
○ Gets data from storage directly
○ Updates metadata
● Metadata
○ Presents a consistent view for multiple clients
○ Knows where the data lives
○ Service discovery, etc
● Storage
○ Truthfully stores data it’s responsible for
18. Architecture, Explained
● Client <-> Metadata
○ Updates metadata about what it’s reading/writing
○ Learns where to find the storage in question
● Client <-> Storage
○ Bulk data transport
● Metadata <-> Storage
○ Storage is told what data it owns
○ Metadata tells it what to keep or throw away
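A hypothetical Go sketch of those three surfaces (none of these names are
Torus's real types; this is just the shape of the arrows above):

```go
package arch

// BlockRef identifies one block of one volume.
type BlockRef struct {
	Volume uint64
	Index  uint64
}

// Client <-> Metadata: small, strongly consistent control traffic.
type Metadata interface {
	Lookup(ref BlockRef) (peers []string, err error) // where does this block live?
	Commit(ref BlockRef, peers []string) error       // record what was written, and where
}

// Client <-> Storage: bulk data transport, no consensus in the hot path.
type Storage interface {
	GetBlock(ref BlockRef) ([]byte, error)
	PutBlock(ref BlockRef, data []byte) error
}

// Metadata <-> Storage: ownership and garbage collection.
type Ownership interface {
	Assign(refs []BlockRef) error // "you own these now"
	Drop(refs []BlockRef) error   // "throw these away"
}
```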
19. Distributed Consensus is Hard
Any form of distributed storage (FS, database, …) requires that at least
some of its data be strongly consistent. Everyone has had to reimplement
this in some form or another, if for no other reason than locking -- but
for many good reasons too, e.g., declaring something written/available to
all clients.
Yo, turns out we have a secret weapon….
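Torus comes out of CoreOS, so an educated guess at that weapon is etcd.
Purely as a sketch of the locking use case above (endpoint and key are
placeholders, and nothing here is Torus's own code), a distributed lock
with etcd's stock clientv3/concurrency package looks like:

```go
package main

import (
	"context"
	"log"

	clientv3 "go.etcd.io/etcd/client/v3"
	"go.etcd.io/etcd/client/v3/concurrency"
)

func main() {
	// Placeholder endpoint; point this at a real etcd cluster.
	cli, err := clientv3.New(clientv3.Config{Endpoints: []string{"localhost:2379"}})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// A session ties the lock's lifetime to a lease, so a crashed
	// client releases it automatically.
	sess, err := concurrency.NewSession(cli)
	if err != nil {
		log.Fatal(err)
	}
	defer sess.Close()

	// Hypothetical key: one lock per volume.
	mu := concurrency.NewMutex(sess, "/lock/volume-1")
	if err := mu.Lock(context.TODO()); err != nil {
		log.Fatal(err)
	}
	defer mu.Unlock(context.TODO())
	// ... critical section: e.g., declare a block written/available ...
}
```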
20. So what’s the separation?
[Diagram: the same picture, now with only the Client and Storage nodes;
the Metadata role is what gets separated out]
23. Short term
● Stress testing
● More block device features
● Even better Kubernetes integration
24. Long term
● Object storage volumes
○ Multi R/W, various object-y APIs
● Experiments with transports
○ e.g., QUIC
● More intricate rings
○ Rack-awareness, data locality, drive speed….
○ (Lots of ‘fun’ here; see the ring sketch below)
● Kubernetes awareness
○ Schedule work near data
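To make "more intricate rings" concrete, here is a toy sketch of a
rack-aware placement ring (all names hypothetical; Torus's real ring is
more involved): hash each key to a point on the ring, then walk clockwise
collecting replicas from distinct racks.

```go
package ring

import (
	"crypto/sha1"
	"encoding/binary"
	"sort"
)

type Node struct {
	Name string
	Rack string
}

type Ring struct {
	points []uint64
	nodes  map[uint64]Node
}

func hash(s string) uint64 {
	h := sha1.Sum([]byte(s))
	return binary.BigEndian.Uint64(h[:8])
}

// New places each node at one point on the ring (a real ring would use
// many virtual points per node for smoother balance).
func New(nodes []Node) *Ring {
	r := &Ring{nodes: map[uint64]Node{}}
	for _, n := range nodes {
		p := hash(n.Name)
		r.points = append(r.points, p)
		r.nodes[p] = n
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// Replicas walks clockwise from the key's position, skipping racks it has
// already used; it may return fewer than n nodes if there are fewer
// distinct racks.
func (r *Ring) Replicas(key string, n int) []Node {
	if len(r.points) == 0 {
		return nil
	}
	target := hash(key)
	start := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= target })
	seen := map[string]bool{}
	var out []Node
	for i := 0; len(out) < n && i < len(r.points); i++ {
		node := r.nodes[r.points[(start+i)%len(r.points)]]
		if seen[node.Rack] {
			continue
		}
		seen[node.Rack] = true
		out = append(out, node)
	}
	return out
}
```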