Diego Souza talks about distributed systems, presenting an introduction to the basic concepts and some practical considerations that can affect our day-to-day work.
Watch this talk at https://www.eventials.com/locaweb/sistemas-distribuidos/
4. the basics
what is a distributed system? (cont.)
● a distributed system is a piece of software
that ensures that a collection of
independent computers appears to its
users as a single coherent system;
5. the basics
what is a distributed system? (cont.)
● a distributed system is a software system in
which components located on networked
computers communicate and coordinate
their actions by passing messages;
6. the basics
what is a distributed system?
● a distributed system is one in which the
failure of a computer you didn't even know
existed can render your own computer
unusable [Lamport];
7. the basics
fallacies of distributed computing
1. the network is reliable;
2. latency is zero;
3. bandwidth is infinite;
4. the network is secure;
5. topology doesn't change;
6. there is one administrator;
7. transport cost is zero;
8. the network is homogeneous;
9. the basics
why?
● things no longer fit in a single machine;
● scalability [size, geographic, organizational];
● availability;
● fault tolerance;
● performance;
10. the basics
scalability
● is the ability of a system, network, or
process, to handle a growing amount of
work in a capable manner or its ability to be
enlarged to accommodate that growth;
11. the basics
performance
● depends on the context and what we want
to achieve:
○ response time/low latency;
○ throughput;
○ utilization of computer resources;
12. the basics
latency
● the state of being latent; delay, a period
between the initiation of something and
the occurrence;
● a wise man once said:
○ Bandwidth is easy. Engineers build bandwidth. But
latency is hard. Only God gives us latency;
13. the basics
availability
● the proportion of time a system is in a
functioning condition. If a user cannot
access the system, it is said to be
unavailable;
14. the basics
fault tolerance
● ability of a system to behave in a well-defined
manner once faults occur;
16. models
availability metrics
availability = uptime / (uptime + downtime)
availability = mtbf / (mtbf + mttr)
mtbf: mean time between failures
mttr: mean time to repair
● q: is every second the same?
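The second formula above can be checked with a few lines of Python. This is only a sketch: the function names and the sample figures (999 hours between failures, 1 hour to repair) are made up for illustration and do not come from the talk.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability as mtbf / (mtbf + mttr), both given in hours."""
    if mtbf_hours < 0 or mttr_hours < 0 or mtbf_hours + mttr_hours == 0:
        raise ValueError("mtbf and mttr must be non-negative and not both zero")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_per_year_hours(avail: float, hours_per_year: float = 365 * 24) -> float:
    """Expected yearly downtime implied by an availability figure."""
    return (1.0 - avail) * hours_per_year


# Hypothetical service: fails on average every 999 h, takes 1 h to repair.
a = availability(mtbf_hours=999.0, mttr_hours=1.0)
print(f"availability  = {a:.3f}")
print(f"downtime/year = {downtime_per_year_hours(a):.2f} h")
```

The single number hides when the downtime happens, which is the point of the question on the slide: an hour lost at peak traffic usually costs more than an hour lost at night.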
22. models
replication [strong consistency]
● primary/copy [e.g. mysql master]
● 2pc [e.g. mysql cluster]
● paxos, zab, raft
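As an illustration of the 2pc bullet, the sketch below simulates the two phases in memory. It is not how any particular database implements it: the Participant class and its will_vote_yes switch are hypothetical, and timeouts, logging and coordinator failure are left out.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    name: str
    will_vote_yes: bool = True  # hypothetical switch used to simulate a "no" vote
    state: str = "init"

    def prepare(self) -> bool:
        # A real participant would persist its decision before answering.
        self.state = "prepared" if self.will_vote_yes else "aborted"
        return self.will_vote_yes

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: list[Participant]) -> bool:
    # Phase 1 (voting): every participant is asked to prepare.
    votes = [p.prepare() for p in participants]
    # Phase 2 (completion): commit only if all votes were "yes".
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False


nodes = [Participant("db1"), Participant("db2", will_vote_yes=False)]
print(two_phase_commit(nodes), [p.state for p in nodes])
```

A single "no" vote aborts the transaction everywhere, which is what keeps the replicas consistent. The price is that participants block while waiting for the coordinator's decision.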
23. models
replication [weak consistency]
● amazon dynamo
○ consistent hashing [partitioning]
○ partial quorums
○ failure detection and read repair
○ gossip protocol
● note: r + w > n != strong consistency
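To make the consistent hashing bullet concrete, here is a minimal ring with virtual nodes. It is a sketch and not Dynamo's actual implementation: the HashRing class, the md5-based hash and the choice of 64 virtual nodes per server are assumptions made for the example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # md5 is used only to spread keys over the ring, not for security.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class HashRing:
    def __init__(self, nodes, vnodes=64):
        # Each node is placed on the ring at `vnodes` pseudo-random points.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def preference_list(self, key, n):
        """First n distinct nodes found walking clockwise from the key."""
        start = bisect.bisect_right(self._points, _hash(key))
        found = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in found:
                found.append(node)
                if len(found) == n:
                    break
        return found


ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
print(ring.preference_list("user:42", n=3))
```

With n replicas taken from the preference list, reads wait for r answers and writes for w. Choosing r + w > n makes read and write sets overlap, but as the note on the slide says, that alone is not strong consistency: concurrent writes and sloppy quorums can still expose stale or conflicting versions.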
24. models
time
● global clock [ntp, total order]
● local clock [partial order]
● logical clock [partial order; lamport clock,
vector clocks]
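The logical clocks named in the last bullet can be sketched in a few lines. The class and function names are illustrative, and vector clocks are represented here as plain dicts from node name to counter, which is an assumption of this example.

```python
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event or message send: advance the counter."""
        self.time += 1
        return self.time

    def receive(self, remote_time):
        """Message receive: move past the sender's timestamp."""
        self.time = max(self.time, remote_time) + 1
        return self.time


def vector_compare(a, b):
    """Compare two vector clocks given as dicts of node -> counter."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"  # neither happened before the other


p, q = LamportClock(), LamportClock()
stamp = p.tick()          # p sends a message stamped with its clock
print(q.receive(stamp))   # q's clock moves past the stamp
print(vector_compare({"p": 2}, {"q": 1}))
```

The "concurrent" result is where the partial order shows: a vector clock can tell that two events are causally unrelated, while a single Lamport counter cannot.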
25. models
consensus & atomic broadcast
● consensus: vote & agreement;
● atomic broadcast: reliable message
transmission and order guarantees;
● they are equivalent
26. models
flp impossibility
● there is no deterministic algorithm that
always solves the consensus problem in an
asynchronous system subject to failures, even
if messages are never lost, at most one process
may fail, and it can only fail by crashing
● note: it's not that bad! :)
28. models
cap: [note: "pick only two" is misleading]
● consistency: the same data at the same
time;
● availability;
● partition tolerance: continues to operate
despite message loss [network or node
failure];