This document discusses the author's experience reading computer science research papers as an industry practitioner in distributed systems. It covers several topics from papers, including staged event-driven architectures, the use of leases for distributed locking and heartbeats, and designing systems to allow some inaccuracy for improved performance and resilience. The author advocates using insights from academic research to inform design decisions and mitigate technical risks when building large-scale, robust systems in industry.
50. Leases
● Distributed locking
● Lease term tradeoffs
○ short vs long
● Use of leases in modern applications
○ Leader election TTL (in etcd)
51. Leases
● Distributed locking
● Lease term tradeoffs
○ short vs long
● Use of leases in modern applications
○ Leader election TTL (in etcd)
○ Liveness detection
102. [Designing with]
Inaccuracy for Performance & Resilience
simplified implementation
focus on observability
applicable to
some problem
domains
103. [Designing with]
Inaccuracy for Performance & Resilience
fuzz testing
generative testing
simplified implementation
fault injection testing
focus on observability
applicable to
some problem
domains
104. References
● T. Wurthinger, C. Wimmer et al. "One VM to Rule Them
All"
● M. Rinard "Probabilistic Accuracy Bounds for Fault-
Tolerant Computations that Discard Tasks"
● F. Corbato, M. Daggett, R. Daley "An Experimental Time-
Sharing System"
● E. Dijkstra "Cooperating Sequential Processes"
● L. Lamport "Time, Clocks, and the Ordering of Events in a
Distributed System"
● http://blinkdb.org/
105. References
● B. Oki, B. Liskov "Viewstamped Replication: A New Primary Copy
Method to Support Highly-Available Distributed Systems"
● L. Lamport "The Part-Time Parliament"
● M. Welsh, D. Culler, E. Brewer "SEDA: An Architecture for Well-
Conditioned, Scalable Internet Services"
● C. Gray, D. Cheriton "Leases: An Efficient Fault-Tolerant
Mechanism for Distributed File Cache Consistency"
● S. Agarwal, B. Mozafari et al. "BlinkDB: Queries with Bounded
Errors and Bounded Response Times on Very Large Data"
109. Robust & scalable pipelines
Leases for sharing &
heartbeat
Inaccuracy for resilience &
performance
110. Robust & scalable pipelines
Leases for sharing &
heartbeat
Inaccuracy for resilience &
performance
CS research is timeless:
use it to mitigate risk