This document discusses the design of a scalable file server system. It begins by defining the basic requirements of the system, including file creation, reading, writing, deletion and benchmarks for measuring success. It then addresses various design challenges such as handling reads, accommodating writes while allowing for multiple readers, improving performance through caching and reducing contention, providing fault tolerance through mechanisms like logging and replication, and electing leaders during failure recovery. The document works through these challenges by drawing analogies to concepts from other domains like page caching, file systems, disk storage, and libraries.
2. Getting off, on the wrong foot
A(0) = A(n) for any n
Fn - the n-th filter
initial conditions
step 0 : C(0) = A(0), V(0) = [], D(0) = []
step n :
on an exclude filter Fn...
- D(n) = C(n-1).filter(Fn)
- C(n) = C(n-1).exclude(Fn)
- V(n) = V(n-1) + D(n)
on an include filter Fn ...
- D(n) = V(n-1).filter(Fn)
- C(n) = C(n-1) + D(n)
- V(n) = V(n-1).exclude(Fn)
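The step rules above can be sketched in a few lines of Python. This is only an illustration, under the assumption that C is the currently visible set, V is the set of items filtered out so far (which matches V(n) = A(n) - C(n) on the next slide), and D is the set that moves at each step. The function name `run_filters` and the `(kind, predicate)` encoding of a filter are hypothetical, not part of the talk.

```python
def run_filters(items, filters):
    """Apply a sequence of exclude/include filters step by step.

    items   -- the full collection A(0)
    filters -- list of (kind, predicate), kind is "exclude" or "include"
               (hypothetical encoding, chosen for this sketch)
    Returns (C, V): the visible items and the filtered-out items.
    """
    c = list(items)  # C(0) = A(0)
    v = []           # V(0) = []
    for kind, f in filters:
        if kind == "exclude":
            d = [x for x in c if f(x)]      # D(n) = C(n-1).filter(Fn)
            c = [x for x in c if not f(x)]  # C(n) = C(n-1).exclude(Fn)
            v = v + d                       # V(n) = V(n-1) + D(n)
        elif kind == "include":
            d = [x for x in v if f(x)]      # D(n) = V(n-1).filter(Fn)
            c = c + d                       # C(n) = C(n-1) + D(n)
            v = [x for x in v if not f(x)]  # V(n) = V(n-1).exclude(Fn)
        else:
            raise ValueError(f"unknown filter kind: {kind}")
    return c, v


if __name__ == "__main__":
    steps = [
        ("exclude", lambda x: x % 2 == 0),  # hide even numbers
        ("include", lambda x: x > 5),       # bring back anything above 5
    ]
    print(run_filters(range(10), steps))
```

Note that both C and V have to be carried from step to step, which is the cost the following slides reduce away.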
3. Getting off, on the wrong foot
at step n
on an exclude filter F(n) ...
- C(n) = C(n-1).exclude(F(n))
= C(0).exclude(G(n-1)).exclude(F(n))
= C(0).exclude(G(n-1) | F(n))
- G(n) = G(n-1) | F(n)
- V(n) = A(n) - C(n)
= A(0) - C(0).exclude(G(n))
= C(0).exclude(!G(n))
- H(n) = !G(n)
on an include filter F(n) ...
- V(n) = V(n-1).exclude(F(n))
= C(0).exclude(H(n-1)).exclude(F(n))
= C(0).exclude(H(n-1) | F(n))
= C(0).exclude(H(n))
4. Getting off, on the wrong foot
reducing it further to eliminate H...
G(n) = G(n-1) | F(n) if F(n) is an exclude filter
G(n) = !H(n) if F(n) is an include filter
= !(H(n-1) | F(n))
= !H(n-1) & !F(n)
= G(n-1) & !F(n)
you can work out D(n) similarly
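One way to read the reduction: instead of carrying the sets C and V along, carry a single composed predicate G and apply it once to C(0). The sketch below assumes G(0) is false everywhere (nothing excluded yet), which is implied by C(0) = A(0) but not stated on the slide. The helper names are hypothetical.

```python
def either(g, f):
    """G(n) = G(n-1) | F(n), used when F(n) is an exclude filter."""
    return lambda x: g(x) or f(x)


def but_not(g, f):
    """G(n) = G(n-1) & !F(n), used when F(n) is an include filter."""
    return lambda x: g(x) and not f(x)


def compose_exclusion(filters):
    """Fold a list of (kind, predicate) filters into one predicate G.

    An item x is hidden exactly when G(x) is true, so
    C(n) = C(0).exclude(G(n)).
    """
    g = lambda x: False  # assumed G(0): nothing is excluded initially
    for kind, f in filters:
        if kind == "exclude":
            g = either(g, f)
        elif kind == "include":
            g = but_not(g, f)
        else:
            raise ValueError(f"unknown filter kind: {kind}")
    return g


if __name__ == "__main__":
    g = compose_exclusion([
        ("exclude", lambda x: x % 2 == 0),
        ("include", lambda x: x > 5),
    ])
    print([x for x in range(10) if not g(x)])
```

The composed form yields the same members as the step-by-step version, though not necessarily in the same order, since the stepwise rule appends re-included items at the end.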
5. The rest of this talk …
We’ll (pseudo) design a system
Share some experiences from a real effort
Touch upon the human angle to systems research
but not necessarily in that order …
6. The system, “defined”
Scalable file server – pretty simple spec.
Interface
Create/Read/Write/Delete
Measures of success & benchmarks
Is there a gold standard?
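To make the four-operation interface concrete, here is a toy in-memory sketch. Everything in it is an assumption made for illustration: the class name, the method signatures, and the choice of addressing reads and writes by filename, offset and length. It says nothing about scalability; it only pins down what Create/Read/Write/Delete might mean.

```python
class InMemoryFileServer:
    """Toy, single-process stand-in for the Create/Read/Write/Delete spec."""

    def __init__(self):
        self._files = {}  # filename -> bytearray holding the contents

    def create(self, name):
        if name in self._files:
            raise FileExistsError(name)
        self._files[name] = bytearray()

    def read(self, name, offset, length):
        # Reads past the end of the file simply return fewer bytes.
        return bytes(self._files[name][offset:offset + length])

    def write(self, name, offset, data):
        buf = self._files[name]
        if offset > len(buf):
            # Zero-fill any gap between the current end and the offset.
            buf.extend(b"\x00" * (offset - len(buf)))
        buf[offset:offset + len(data)] = data

    def delete(self, name):
        del self._files[name]


if __name__ == "__main__":
    fs = InMemoryFileServer()
    fs.create("notes")
    fs.write("notes", 0, b"hello")
    print(fs.read("notes", 1, 3))
    fs.delete("notes")
```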
8. page cache & virtual memory
Picture frame            Memory pages
Video library            File system
Album, skip, duration    Filename, offset, length
9. file systems & disks
Library classification      Directories & files
Books on shelves            Files on disks
Shelves : place | remove    Disk : read | write
Copies                      Mirroring
Reserving shelf space       Disk allocation
10. Accommodating writes
Single writer, multiple readers (SWMR) – big deal.
Copies & stale data – hmm …
Partitioned writes -> SWMR
Coherency with multiple writers
Reader/Writer locks
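As an illustration of the last bullet, here is a minimal reader/writer lock built on Python's `threading.Condition`. It is a sketch, not the design from the talk: the class and method names are hypothetical, and it is reader-preferring, so a steady stream of readers can starve a writer.

```python
import threading


class ReadWriteLock:
    """Many concurrent readers, or exactly one writer, never both."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0     # number of readers currently holding the lock
        self._writer = False  # True while a writer holds the lock

    def acquire_read(self):
        with self._cond:
            while self._writer:  # readers only wait for an active writer
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # wake any waiting writer

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()  # wake waiting readers and writers


if __name__ == "__main__":
    lock = ReadWriteLock()
    lock.acquire_read()
    lock.acquire_read()   # a second reader is admitted immediately
    lock.release_read()
    lock.release_read()
    lock.acquire_write()  # the writer gets in once all readers have left
    lock.release_write()
```

With a single writer per partition, the write side of such a lock is never contended, which is one reading of why partitioning writes reduces the problem back to SWMR.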