SlideShare a Scribd company logo
1 of 43
Download to read offline
distributed systems 
diego souza @ infra-dev
agenda 
● the basics 
● models 
● practical aspects
the basics
the basics 
what is a distributed system? (cont.) 
● a distributed system is a piece of software 
that ensures that a collection of 
independent computers appears to its 
users as a single coherent system;
the basics 
what is a distributed system? (cont.) 
● a distributed system is a software system in 
which components located on networked 
computers communicate and coordinate 
their actions by passing messages;
the basics 
what is a distributed system? 
● a distributed system is one in which the 
failure of a computer you didn't even know 
existed can render your own computer 
unusable [Lamport];
the basics 
fallacies of a distributed system 
1. the network is reliable; 
2. latency is zero; 
3. bandwidth is infinite; 
4. the network is secure; 
5. topology doesn't change; 
6. there is one administrator; 
7. transport cost is zero; 
8. the network is homogeneous;
the basics 
examples: 
● cassandra 
● hadoop 
● www 
● internet 
● etc.
the basics 
why? 
● things no longer fit in a single machine; 
● scalability [size, geographic, organizational]; 
● availability; 
● fault tolerance; 
● performance;
the basics 
scalability 
● is the ability of a system, network, or 
process, to handle a growing amount of 
work in a capable manner or its ability to be 
enlarged to accommodate that growth;
the basics 
performance 
● depends on the context and what we want 
to achieve: 
○ response time/low latency; 
○ throughput; 
○ utilization of computer resources;
the basics 
latency 
● the state of being latent; delay, a period 
between the initiation of something and 
the occurrence; 
● a wise man once said: 
○ Bandwidth is easy. Engineers build bandwidth. But 
latency is hard. Only God gives us latency;
the basics 
availability 
● the proportion of time a system is in a 
functioning condition. If a user cannot 
access the system, it is said to be 
unavailable;
the basics 
fault tolerance 
● ability of a system to behave in a well-defined 
manner once faults occur;
models
models 
availability metrics 
availability = uptime / (uptime + downtime) 
availability = mtbf / (mtbf + mttr) 
mtbf: mean time between failure 
mttr: mean time to repair 
● q: is every second the same?
models 
availability metrics 
yield = successes / requests 
● a: very unlikely!
models 
availability metrics 
harvest = data_available / total_data 
● how incomplete is this [think of 
websearch]?
models 
distributing the dataset 
● partition 
● replication
models 
partition 
● improves performance [reduces dataset]; 
● improves availability [partial failures]; 
● usually application specific [random, time, 
user];
models 
replication 
● improves performance [full copy]; 
● improves availability [full copy, reed-solomon 
codes]; 
○ synchronous, asynchronous; 
○ single copy, multi-master 
○ crdts
models 
replication [strong consistency] 
● primary/copy [eg. mysql master] 
● 2pc [eg. mysql cluster] 
● paxos, zab, raft
models 
replication [weak consistency] 
● amazon dynamo 
○ consistent hashing [partitioning] 
○ partial quorums 
○ failure detection and read repair 
○ gossip protocol 
● note: r + w > n != strong consistency
models 
time 
● global clock [ntp, total order] 
● local clock [partial order] 
● logical clock [partial order; lamport clock, 
vector clocks]
models 
consensus & atomic broadcast 
● consensus: vote & agreement; 
● atomic broadcast: reliable message 
transmission and order guarantees; 
● they are equivalent
models 
flp impossibility 
● does not exist an algorithm for the 
consensus problem in an asynchronous 
system subject to failures, even if messages 
can never be lost, at most one process may 
fail, and it can only fail by crashing 
● note: its not that bad! :)
models
models 
cap: [note: pick only two is misleading] 
● consistency: the same data at the same 
time; 
● availability; 
● partition tolerance: continues to operate 
despite message loss [network or node 
failure];
practical aspects
I find latency one of the most important 
aspects of performance
hard to develop, even hard to operate: they 
are not unbreakable
consensus is a hard problem
failures are the norm
metrics, metrics, metrics
what to do in presence of failures
think about backpressure mechanisms
think about timeouts
feature flag as a deploy mechanism
think hard about scalability
thanks :) 
questions or comments?
appendix
appendix: what we have here 
● cassandra 
● zookeeper 
● ceph 
● etcd 
● consul 
● leela
links 
● http://book.mixu.net/distsys/

More Related Content

What's hot

Reader/writer problem
Reader/writer problemReader/writer problem
Reader/writer problemRinkuMonani
 
9 fault-tolerance
9 fault-tolerance9 fault-tolerance
9 fault-tolerance4020132038
 
Replication in the Wild - Warsaw Cloud Native Meetup - May 2017
Replication in the Wild - Warsaw Cloud Native Meetup - May 2017Replication in the Wild - Warsaw Cloud Native Meetup - May 2017
Replication in the Wild - Warsaw Cloud Native Meetup - May 2017Ensar Basri Kahveci
 
Consistency protocols
Consistency protocolsConsistency protocols
Consistency protocolsZongYing Lyu
 
Operating system 27 semaphores
Operating system 27 semaphoresOperating system 27 semaphores
Operating system 27 semaphoresVaibhav Khanna
 
BKK16-300 Benchmarking 102
BKK16-300 Benchmarking 102BKK16-300 Benchmarking 102
BKK16-300 Benchmarking 102Linaro
 
Sara Afshar: Scheduling and Resource Sharing in Multiprocessor Real-Time Systems
Sara Afshar: Scheduling and Resource Sharing in Multiprocessor Real-Time SystemsSara Afshar: Scheduling and Resource Sharing in Multiprocessor Real-Time Systems
Sara Afshar: Scheduling and Resource Sharing in Multiprocessor Real-Time Systemsknowdiff
 
Real time scheduling - basic concepts
Real time scheduling - basic conceptsReal time scheduling - basic concepts
Real time scheduling - basic conceptsStudent
 
Coordination in distributed systems
Coordination in distributed systemsCoordination in distributed systems
Coordination in distributed systemsAndrea Monacchi
 
Nondeterminism is unavoidable, but data races are pure evil
Nondeterminism is unavoidable, but data races are pure evilNondeterminism is unavoidable, but data races are pure evil
Nondeterminism is unavoidable, but data races are pure evilracesworkshop
 
Mike Bartley - Innovations for Testing Parallel Software - EuroSTAR 2012
Mike Bartley - Innovations for Testing Parallel Software - EuroSTAR 2012Mike Bartley - Innovations for Testing Parallel Software - EuroSTAR 2012
Mike Bartley - Innovations for Testing Parallel Software - EuroSTAR 2012TEST Huddle
 
Unit v-Distributed Transaction and Replication
Unit v-Distributed Transaction and ReplicationUnit v-Distributed Transaction and Replication
Unit v-Distributed Transaction and ReplicationDhivyaa C.R
 
Computer Architecture Seminar
Computer Architecture SeminarComputer Architecture Seminar
Computer Architecture SeminarNaman Kumar
 
Scalable concurrency control in a dynamic membership
Scalable concurrency control  in a dynamic membershipScalable concurrency control  in a dynamic membership
Scalable concurrency control in a dynamic membershipAugusto Ciuffoletti
 
Analysis of mutual exclusion algorithms with the significance and need of ele...
Analysis of mutual exclusion algorithms with the significance and need of ele...Analysis of mutual exclusion algorithms with the significance and need of ele...
Analysis of mutual exclusion algorithms with the significance and need of ele...Govt. P.G. College Dharamshala
 

What's hot (17)

Reader/writer problem
Reader/writer problemReader/writer problem
Reader/writer problem
 
9 fault-tolerance
9 fault-tolerance9 fault-tolerance
9 fault-tolerance
 
Replication in the Wild - Warsaw Cloud Native Meetup - May 2017
Replication in the Wild - Warsaw Cloud Native Meetup - May 2017Replication in the Wild - Warsaw Cloud Native Meetup - May 2017
Replication in the Wild - Warsaw Cloud Native Meetup - May 2017
 
Consistency protocols
Consistency protocolsConsistency protocols
Consistency protocols
 
Operating system 27 semaphores
Operating system 27 semaphoresOperating system 27 semaphores
Operating system 27 semaphores
 
BKK16-300 Benchmarking 102
BKK16-300 Benchmarking 102BKK16-300 Benchmarking 102
BKK16-300 Benchmarking 102
 
Sara Afshar: Scheduling and Resource Sharing in Multiprocessor Real-Time Systems
Sara Afshar: Scheduling and Resource Sharing in Multiprocessor Real-Time SystemsSara Afshar: Scheduling and Resource Sharing in Multiprocessor Real-Time Systems
Sara Afshar: Scheduling and Resource Sharing in Multiprocessor Real-Time Systems
 
Real time scheduling - basic concepts
Real time scheduling - basic conceptsReal time scheduling - basic concepts
Real time scheduling - basic concepts
 
Coordination in distributed systems
Coordination in distributed systemsCoordination in distributed systems
Coordination in distributed systems
 
Nondeterminism is unavoidable, but data races are pure evil
Nondeterminism is unavoidable, but data races are pure evilNondeterminism is unavoidable, but data races are pure evil
Nondeterminism is unavoidable, but data races are pure evil
 
Homework solution1
Homework solution1Homework solution1
Homework solution1
 
Mike Bartley - Innovations for Testing Parallel Software - EuroSTAR 2012
Mike Bartley - Innovations for Testing Parallel Software - EuroSTAR 2012Mike Bartley - Innovations for Testing Parallel Software - EuroSTAR 2012
Mike Bartley - Innovations for Testing Parallel Software - EuroSTAR 2012
 
Homework solutionsch9
Homework solutionsch9Homework solutionsch9
Homework solutionsch9
 
Unit v-Distributed Transaction and Replication
Unit v-Distributed Transaction and ReplicationUnit v-Distributed Transaction and Replication
Unit v-Distributed Transaction and Replication
 
Computer Architecture Seminar
Computer Architecture SeminarComputer Architecture Seminar
Computer Architecture Seminar
 
Scalable concurrency control in a dynamic membership
Scalable concurrency control  in a dynamic membershipScalable concurrency control  in a dynamic membership
Scalable concurrency control in a dynamic membership
 
Analysis of mutual exclusion algorithms with the significance and need of ele...
Analysis of mutual exclusion algorithms with the significance and need of ele...Analysis of mutual exclusion algorithms with the significance and need of ele...
Analysis of mutual exclusion algorithms with the significance and need of ele...
 

Viewers also liked

Princípios de Concorrência em Ruby e Além
Princípios de Concorrência em Ruby e AlémPrincípios de Concorrência em Ruby e Além
Princípios de Concorrência em Ruby e AlémLocaweb
 
API Do Email Marketing Locaweb
API Do Email Marketing LocawebAPI Do Email Marketing Locaweb
API Do Email Marketing LocawebLocaweb
 
Overview Sobre Varnish
Overview Sobre VarnishOverview Sobre Varnish
Overview Sobre VarnishLocaweb
 
Isolamento e mvcc
Isolamento e mvccIsolamento e mvcc
Isolamento e mvccLocaweb
 
Ambient Light Events- Wylkon Queiroz
Ambient Light Events- Wylkon QueirozAmbient Light Events- Wylkon Queiroz
Ambient Light Events- Wylkon QueirozLocaweb
 
Comercio eletronico - Dicas práticas
Comercio eletronico - Dicas práticasComercio eletronico - Dicas práticas
Comercio eletronico - Dicas práticasLocaweb
 
Tech talkrubocop
Tech talkrubocopTech talkrubocop
Tech talkrubocopLocaweb
 
Debian no limite - como ter um desktop atualizado
Debian no limite - como ter um desktop atualizadoDebian no limite - como ter um desktop atualizado
Debian no limite - como ter um desktop atualizadoClaudio Ferreira Filho
 
Soluções para sua empresa vender na Internet
Soluções para sua empresa vender na InternetSoluções para sua empresa vender na Internet
Soluções para sua empresa vender na InternetLocaweb
 
Estripando o Elefante - (Trabalhando com extensões no PostgreSQL)
Estripando o Elefante - (Trabalhando com extensões no PostgreSQL)Estripando o Elefante - (Trabalhando com extensões no PostgreSQL)
Estripando o Elefante - (Trabalhando com extensões no PostgreSQL)Dickson S. Guedes
 
Dojo PHP (treinanto programação orientada a objetos em PHP)
Dojo PHP (treinanto programação orientada a objetos em PHP)Dojo PHP (treinanto programação orientada a objetos em PHP)
Dojo PHP (treinanto programação orientada a objetos em PHP)Fabrízio Mello
 
Como encontrar uma agulha num palheiro de logs
Como encontrar uma agulha num palheiro de logsComo encontrar uma agulha num palheiro de logs
Como encontrar uma agulha num palheiro de logsDickson S. Guedes
 
Celery for SysAdmins
Celery for SysAdminsCelery for SysAdmins
Celery for SysAdminsLocaweb
 
Postgres Wonderland - Campus Party 2013
Postgres Wonderland - Campus Party 2013Postgres Wonderland - Campus Party 2013
Postgres Wonderland - Campus Party 2013Fabio Telles Rodriguez
 

Viewers also liked (20)

Princípios de Concorrência em Ruby e Além
Princípios de Concorrência em Ruby e AlémPrincípios de Concorrência em Ruby e Além
Princípios de Concorrência em Ruby e Além
 
API Do Email Marketing Locaweb
API Do Email Marketing LocawebAPI Do Email Marketing Locaweb
API Do Email Marketing Locaweb
 
Overview Sobre Varnish
Overview Sobre VarnishOverview Sobre Varnish
Overview Sobre Varnish
 
Isolamento e mvcc
Isolamento e mvccIsolamento e mvcc
Isolamento e mvcc
 
Storage em Oracle RAC
Storage em Oracle RACStorage em Oracle RAC
Storage em Oracle RAC
 
Postgres Big data
Postgres Big dataPostgres Big data
Postgres Big data
 
Ambient Light Events- Wylkon Queiroz
Ambient Light Events- Wylkon QueirozAmbient Light Events- Wylkon Queiroz
Ambient Light Events- Wylkon Queiroz
 
Comercio eletronico - Dicas práticas
Comercio eletronico - Dicas práticasComercio eletronico - Dicas práticas
Comercio eletronico - Dicas práticas
 
Tech talkrubocop
Tech talkrubocopTech talkrubocop
Tech talkrubocop
 
Debian no limite - como ter um desktop atualizado
Debian no limite - como ter um desktop atualizadoDebian no limite - como ter um desktop atualizado
Debian no limite - como ter um desktop atualizado
 
Soluções para sua empresa vender na Internet
Soluções para sua empresa vender na InternetSoluções para sua empresa vender na Internet
Soluções para sua empresa vender na Internet
 
Estripando o Elefante - (Trabalhando com extensões no PostgreSQL)
Estripando o Elefante - (Trabalhando com extensões no PostgreSQL)Estripando o Elefante - (Trabalhando com extensões no PostgreSQL)
Estripando o Elefante - (Trabalhando com extensões no PostgreSQL)
 
Dojo PHP (treinanto programação orientada a objetos em PHP)
Dojo PHP (treinanto programação orientada a objetos em PHP)Dojo PHP (treinanto programação orientada a objetos em PHP)
Dojo PHP (treinanto programação orientada a objetos em PHP)
 
Postgres Wonderland - PGDay CE2013
Postgres  Wonderland - PGDay CE2013Postgres  Wonderland - PGDay CE2013
Postgres Wonderland - PGDay CE2013
 
Como encontrar uma agulha num palheiro de logs
Como encontrar uma agulha num palheiro de logsComo encontrar uma agulha num palheiro de logs
Como encontrar uma agulha num palheiro de logs
 
Revisão do postgresql.conf
Revisão do postgresql.confRevisão do postgresql.conf
Revisão do postgresql.conf
 
Celery for SysAdmins
Celery for SysAdminsCelery for SysAdmins
Celery for SysAdmins
 
Postgres Wonderland - Campus Party 2013
Postgres Wonderland - Campus Party 2013Postgres Wonderland - Campus Party 2013
Postgres Wonderland - Campus Party 2013
 
Freenas
FreenasFreenas
Freenas
 
Trabalhando com Logs no PostgreSQL
Trabalhando com Logs no PostgreSQLTrabalhando com Logs no PostgreSQL
Trabalhando com Logs no PostgreSQL
 

Similar to Sistemas Distribuidos

02 Models of Distribution Systems.pdf
02 Models of Distribution Systems.pdf02 Models of Distribution Systems.pdf
02 Models of Distribution Systems.pdfRobeliaJoyVillaruz
 
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)Brian Brazil
 
Prometheus Overview
Prometheus OverviewPrometheus Overview
Prometheus OverviewBrian Brazil
 
Dependable Systems -Fault Tolerance Patterns (4/16)
Dependable Systems -Fault Tolerance Patterns (4/16)Dependable Systems -Fault Tolerance Patterns (4/16)
Dependable Systems -Fault Tolerance Patterns (4/16)Peter Tröger
 
Overview of Site Reliability Engineering (SRE) & best practices
Overview of Site Reliability Engineering (SRE) & best practicesOverview of Site Reliability Engineering (SRE) & best practices
Overview of Site Reliability Engineering (SRE) & best practicesAshutosh Agarwal
 
FAULT TOLERANCE OF RESOURCES IN COMPUTATIONAL GRIDS
FAULT TOLERANCE OF RESOURCES IN COMPUTATIONAL GRIDSFAULT TOLERANCE OF RESOURCES IN COMPUTATIONAL GRIDS
FAULT TOLERANCE OF RESOURCES IN COMPUTATIONAL GRIDSMaurvi04
 
Distributed systems and scalability rules
Distributed systems and scalability rulesDistributed systems and scalability rules
Distributed systems and scalability rulesOleg Tsal-Tsalko
 
Simplifying debugging for multi-core Linux devices and low-power Linux clusters
Simplifying debugging for multi-core Linux devices and low-power Linux clusters Simplifying debugging for multi-core Linux devices and low-power Linux clusters
Simplifying debugging for multi-core Linux devices and low-power Linux clusters Rogue Wave Software
 
Cloud Computing-UNIT 1 claud computing basics
Cloud Computing-UNIT 1 claud computing basicsCloud Computing-UNIT 1 claud computing basics
Cloud Computing-UNIT 1 claud computing basicsmoeincanada007
 
Stability Patterns for Microservices
Stability Patterns for MicroservicesStability Patterns for Microservices
Stability Patterns for Microservicespflueras
 
Prometheus for Monitoring Metrics (Fermilab 2018)
Prometheus for Monitoring Metrics (Fermilab 2018)Prometheus for Monitoring Metrics (Fermilab 2018)
Prometheus for Monitoring Metrics (Fermilab 2018)Brian Brazil
 
Types of Operating System
Types of Operating SystemTypes of Operating System
Types of Operating SystemTHEFPS
 
introduction to cloud computing for college.pdf
introduction to cloud computing for college.pdfintroduction to cloud computing for college.pdf
introduction to cloud computing for college.pdfsnehan789
 
Performance Test Automation With Gatling
Performance Test Automation  With GatlingPerformance Test Automation  With Gatling
Performance Test Automation With GatlingKnoldus Inc.
 

Similar to Sistemas Distribuidos (20)

02 Models of Distribution Systems.pdf
02 Models of Distribution Systems.pdf02 Models of Distribution Systems.pdf
02 Models of Distribution Systems.pdf
 
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
 
Prometheus Overview
Prometheus OverviewPrometheus Overview
Prometheus Overview
 
Realtime
RealtimeRealtime
Realtime
 
Dependable Systems -Fault Tolerance Patterns (4/16)
Dependable Systems -Fault Tolerance Patterns (4/16)Dependable Systems -Fault Tolerance Patterns (4/16)
Dependable Systems -Fault Tolerance Patterns (4/16)
 
Overview of Site Reliability Engineering (SRE) & best practices
Overview of Site Reliability Engineering (SRE) & best practicesOverview of Site Reliability Engineering (SRE) & best practices
Overview of Site Reliability Engineering (SRE) & best practices
 
FAULT TOLERANCE OF RESOURCES IN COMPUTATIONAL GRIDS
FAULT TOLERANCE OF RESOURCES IN COMPUTATIONAL GRIDSFAULT TOLERANCE OF RESOURCES IN COMPUTATIONAL GRIDS
FAULT TOLERANCE OF RESOURCES IN COMPUTATIONAL GRIDS
 
Distributed systems and scalability rules
Distributed systems and scalability rulesDistributed systems and scalability rules
Distributed systems and scalability rules
 
Simplifying debugging for multi-core Linux devices and low-power Linux clusters
Simplifying debugging for multi-core Linux devices and low-power Linux clusters Simplifying debugging for multi-core Linux devices and low-power Linux clusters
Simplifying debugging for multi-core Linux devices and low-power Linux clusters
 
Introduction
IntroductionIntroduction
Introduction
 
Cloud Computing-UNIT 1 claud computing basics
Cloud Computing-UNIT 1 claud computing basicsCloud Computing-UNIT 1 claud computing basics
Cloud Computing-UNIT 1 claud computing basics
 
Stability Patterns for Microservices
Stability Patterns for MicroservicesStability Patterns for Microservices
Stability Patterns for Microservices
 
Prometheus for Monitoring Metrics (Fermilab 2018)
Prometheus for Monitoring Metrics (Fermilab 2018)Prometheus for Monitoring Metrics (Fermilab 2018)
Prometheus for Monitoring Metrics (Fermilab 2018)
 
Types of Operating System
Types of Operating SystemTypes of Operating System
Types of Operating System
 
Monolith to microservices journey
Monolith to microservices journeyMonolith to microservices journey
Monolith to microservices journey
 
introduction to cloud computing for college.pdf
introduction to cloud computing for college.pdfintroduction to cloud computing for college.pdf
introduction to cloud computing for college.pdf
 
Operating System
Operating SystemOperating System
Operating System
 
distributed system original.pdf
distributed system original.pdfdistributed system original.pdf
distributed system original.pdf
 
Gatling
Gatling Gatling
Gatling
 
Performance Test Automation With Gatling
Performance Test Automation  With GatlingPerformance Test Automation  With Gatling
Performance Test Automation With Gatling
 

More from Locaweb

Random testing
Random testingRandom testing
Random testingLocaweb
 
AngularJS
AngularJSAngularJS
AngularJSLocaweb
 
Testes utilizando cucumber + PhantomJs
Testes utilizando cucumber + PhantomJsTestes utilizando cucumber + PhantomJs
Testes utilizando cucumber + PhantomJsLocaweb
 
Lua tech talk
Lua tech talkLua tech talk
Lua tech talkLocaweb
 
Uso de estatísticas pelo postgre sql
Uso de estatísticas pelo postgre sqlUso de estatísticas pelo postgre sql
Uso de estatísticas pelo postgre sqlLocaweb
 
Linux cgroups and namespaces
Linux cgroups and namespacesLinux cgroups and namespaces
Linux cgroups and namespacesLocaweb
 

More from Locaweb (6)

Random testing
Random testingRandom testing
Random testing
 
AngularJS
AngularJSAngularJS
AngularJS
 
Testes utilizando cucumber + PhantomJs
Testes utilizando cucumber + PhantomJsTestes utilizando cucumber + PhantomJs
Testes utilizando cucumber + PhantomJs
 
Lua tech talk
Lua tech talkLua tech talk
Lua tech talk
 
Uso de estatísticas pelo postgre sql
Uso de estatísticas pelo postgre sqlUso de estatísticas pelo postgre sql
Uso de estatísticas pelo postgre sql
 
Linux cgroups and namespaces
Linux cgroups and namespacesLinux cgroups and namespaces
Linux cgroups and namespaces
 

Recently uploaded

The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 

Recently uploaded (20)

The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 

Sistemas Distribuidos

  • 1. distributed systems diego souza @ infra-dev
  • 2. agenda ● the basics ● models ● practical aspects
  • 4. the basics what is a distributed system? (cont.) ● a distributed system is a piece of software that ensures that a collection of independent computers appears to its users as a single coherent system;
  • 5. the basics what is a distributed system? (cont.) ● a distributed system is a software system in which components located on networked computers communicate and coordinate their actions by passing messages;
  • 6. the basics what is a distributed system? ● a distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable [Lamport];
  • 7. the basics fallacies of a distributed system 1. the network is reliable; 2. latency is zero; 3. bandwidth is infinite; 4. the network is secure; 5. topology doesn't change; 6. there is one administrator; 7. transport cost is zero; 8. the network is homogeneous;
  • 8. the basics examples: ● cassandra ● hadoop ● www ● internet ● etc.
  • 9. the basics why? ● things no longer fit in a single machine; ● scalability [size, geographic, organizational]; ● availability; ● fault tolerance; ● performance;
  • 10. the basics scalability ● is the ability of a system, network, or process, to handle a growing amount of work in a capable manner or its ability to be enlarged to accommodate that growth;
  • 11. the basics performance ● depends on the context and what we want to achieve: ○ response time/low latency; ○ throughput; ○ utilization of computer resources;
  • 12. the basics latency ● the state of being latent; delay, a period between the initiation of something and the occurrence; ● a wise man once said: ○ Bandwidth is easy. Engineers build bandwidth. But latency is hard. Only God gives us latency;
  • 13. the basics availability ● the proportion of time a system is in a functioning condition. If a user cannot access the system, it is said to be unavailable;
  • 14. the basics fault tolerance ● ability of a system to behave in a well-defined manner once faults occur;
  • 16. models availability metrics availability = uptime / (uptime + downtime) availability = mtbf / (mtbf + mttr) mtbf: mean time between failure mttr: mean time to repair ● q: is every second the same?
  • 17. models availability metrics yield = successes / requests ● a: very unlikely!
  • 18. models availability metrics harvest = data_available / total_data ● how incomplete is this [think of websearch]?
  • 19. models distributing the dataset ● partition ● replication
  • 20. models partition ● improves performance [reduces dataset]; ● improves availability [partial failures]; ● usually application specific [random, time, user];
  • 21. models replication ● improves performance [full copy]; ● improves availability [full copy, reed-solomon codes]; ○ synchronous, asynchronous; ○ single copy, multi-master ○ crdts
  • 22. models replication [strong consistency] ● primary/copy [eg. mysql master] ● 2pc [eg. mysql cluster] ● paxos, zab, raft
  • 23. models replication [weak consistency] ● amazon dynamo ○ consistent hashing [partitioning] ○ partial quorums ○ failure detection and read repair ○ gossip protocol ● note: r + w > n != strong consistency
  • 24. models time ● global clock [ntp, total order] ● local clock [partial order] ● logical clock [partial order; lamport clock, vector clocks]
  • 25. models consensus & atomic broadcast ● consensus: vote & agreement; ● atomic broadcast: reliable message transmission and order guarantees; ● they are equivalent
  • 26. models flp impossibility ● does not exist an algorithm for the consensus problem in an asynchronous system subject to failures, even if messages can never be lost, at most one process may fail, and it can only fail by crashing ● note: its not that bad! :)
  • 28. models cap: [note: pick only two is misleading] ● consistency: the same data at the same time; ● availability; ● partition tolerance: continues to operate despite message loss [network or node failure];
  • 30. I find latency one of the most important aspects of performance
  • 31. hard to develop, even hard to operate: they are not unbreakable
  • 32. consensus is a hard problem
  • 35. what to do in presence of failures
  • 38. feature flag as a deploy mechanism
  • 39. think hard about scalability
  • 40. thanks :) questions or comments?
  • 42. appendix: what we have here ● cassandra ● zookeeper ● ceph ● etcd ● consul ● leela