Distributed Systems

 scalability and high availability




Renato Lucindo - lucindo.github.com - @rlucindo
Renato Lucindo

 Call me Lucindo (or Linus)
 2002 - Bachelor Computer Science
 2007 - M.Sc. Computer Science (Combinatorial
 Optimization)
 7+ years developing Distributed Systems




 My default answer: "I don't know."
Agenda


 Scalability

 High Availability

 Problems

 Tips and Tricks

 Learning More
Distributed Systems

  Multiple computers that interact with each other over a
  network to achieve a common goal
  Purpose
     Scalability
     High availability




                                     source: http://www.cnds.jhu.edu/
Scalability

  A system's ability to gracefully handle a growing amount of
  work

  Scale up (vertical)
     Add resources to a single node
     Improve existing code to handle more work

  Scale out (horizontal)
     Add more nodes to a system
     Linear (or better) scalability
Scalability - Vertical

  Add: CPU, Memory, Disks (bigger box)
  Handling more simultaneous:
     Connections
     Operations
     Users
  Choose a good I/O and concurrency model
     Non-blocking I/O
     Asynchronous I/O
     Threads (single, pool, per-connection)
     Event handling patterns (Reactor, Proactor, ...)
  Memory model?
     STM
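
A minimal sketch of the non-blocking, event-driven model listed above, assuming Python's asyncio as the event loop. The names handle_request, serve_all and run_demo, and the simulated delay, are hypothetical and only illustrate how one thread can keep many slow connections in flight at once.

```python
import asyncio
import time
from typing import List, Tuple


async def handle_request(request_id: int, delay_s: float) -> int:
    # Simulated slow client: awaiting hands control back to the event loop,
    # so other requests make progress while this one waits.
    await asyncio.sleep(delay_s)
    return request_id


async def serve_all(n: int, delay_s: float) -> List[int]:
    # One thread, n concurrent "connections".
    return await asyncio.gather(*(handle_request(i, delay_s) for i in range(n)))


def run_demo(n: int = 50, delay_s: float = 0.05) -> Tuple[List[int], float]:
    start = time.monotonic()
    results = asyncio.run(serve_all(n, delay_s))
    return results, time.monotonic() - start
```

With blocking calls the total time would be roughly n times the delay; here the waits overlap, so the total stays close to a single delay.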
Scalability - Vertical

  Careful with numbers
      Requests per second
      # of Connections
      Simultaneous operations
  Event handling
      Think front-end
      Slow connections/clients
      It's slower than other options
  In doubt, go async
  Back-end
      Thread pool (thread per-connection)
      No events
      Process per-core
Scalability - Horizontal

  Add nodes to handle more work
  Front-end
     Straightforward
     Stateless
  Back-end
     Master/Slave(s)
     Partitioning
         DHT
         Volatile Index
Scalability - Horizontal

  Master/Slave
  Write on single Master
  Read on Slaves (one or more)
  Scales reads
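
The read/write split above can be sketched as follows. This is an illustration only: the class MasterSlaveStore is hypothetical, the nodes are in-memory dicts, and replication is assumed to be synchronous, which real systems often relax.

```python
import itertools


class MasterSlaveStore:
    def __init__(self, n_slaves):
        self.master = {}
        self.slaves = [{} for _ in range(n_slaves)]
        self.reads_per_slave = [0] * n_slaves
        self._next_slave = itertools.cycle(range(n_slaves))

    def write(self, key, value):
        # All writes go through the single master...
        self.master[key] = value
        # ...and are copied to every slave (synchronous here, by assumption).
        for slave in self.slaves:
            slave[key] = value

    def read(self, key):
        # Reads are spread round-robin over the slaves, so adding slaves
        # adds read capacity while write capacity stays that of the master.
        index = next(self._next_slave)
        self.reads_per_slave[index] += 1
        return self.slaves[index].get(key)
```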
Scalability - Horizontal

  Partitioning (Sharding)
     Distribute data across nodes
  Generally involves data de-normalization
  Where is some specific data?
     Master Index
     Hash (DHT, Consistent Hashing)
     Volatile Index
  Joins done in application level
  NoSQL friendly
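
A minimal consistent-hashing sketch for the "where is some specific data?" question. The class name ConsistentHashRing, the choice of MD5, and the number of virtual nodes are assumptions for illustration, not something the slides prescribe.

```python
import bisect
import hashlib


def _hash(value):
    # Any stable hash works; MD5 is used here only for its even spread.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    def __init__(self, nodes=(), vnodes=100):
        self._vnodes = vnodes
        self._points = {}   # ring position -> node name
        self._sorted = []   # sorted ring positions
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each node owns several points on the ring to even out the load.
        for i in range(self._vnodes):
            self._points[_hash("%s#%d" % (node, i))] = node
        self._sorted = sorted(self._points)

    def remove_node(self, node):
        self._points = {h: n for h, n in self._points.items() if n != node}
        self._sorted = sorted(self._points)

    def node_for(self, key):
        if not self._sorted:
            raise LookupError("ring has no nodes")
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._sorted, _hash(key)) % len(self._sorted)
        return self._points[self._sorted[idx]]
```

The useful property is that removing a node only moves the keys that lived on that node; every other key keeps its placement.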
Scalability - Horizontal

  Volatile Index: build and maintain data index as cached
  information (all clients)
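
The slide gives only one line on the volatile index, so the following is one possible reading, not the author's design: each client keeps a cached key-to-node map, treats it as disposable, and rebuilds entries by asking the nodes when an entry is missing or stale. Node, VolatileIndexClient and the probes counter are hypothetical names.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}


class VolatileIndexClient:
    def __init__(self, nodes):
        self._nodes = {node.name: node for node in nodes}
        self._index = {}   # key -> node name; cached, may be stale or empty
        self.probes = 0    # how often the slow path had to ask every node

    def _locate(self, key):
        # Slow path: ask each node whether it holds the key.
        self.probes += 1
        for node in self._nodes.values():
            if key in node.data:
                return node.name
        return None

    def get(self, key):
        name = self._index.get(key)
        if name is not None and key in self._nodes[name].data:
            return self._nodes[name].data[key]   # fast path: index was right
        name = self._locate(key)                 # missing or stale: rebuild
        if name is None:
            self._index.pop(key, None)
            return None
        self._index[key] = name
        return self._nodes[name].data[key]
```

Because the index is only a cache, losing it costs a slower lookup rather than correctness.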
High Availability

            "Processes, as well as people, die"


  Handle hardware and software failures
      Eliminate single point of failure
  Redundancy
  Failover
  Replicas
High Availability - Failover/Redundancy
High Availability - Replicas

  Two or more copies of same data
  Replica granularity
     From node replica to "row" replica
  Load balancing
  Write concurrency
  Replica updates
  Key for high availability and root of several problems
Problems
Problems - CAP Theorem
Problems - CAP Theorem

 Consistency: all operations (reads/writes) yield a globally
 consistent state

 Availability: all requests (on non-failed servers) must have
 a response

 Partition Tolerance: the system keeps operating even when
 nodes cannot communicate with each other.



                     Pick Two
Problems - CAP Theorem

 C + A: network problems might stop the system

 Examples:
    Oracle RAC, IBM DB2 Parallel
    RDBMS (Master/Slave)
    Google File System
    HDFS (Hadoop)
Problems - CAP Theorem

 C + P: clients can't always perform operations

 Examples:
    Distributed lock-systems: Chubby, ZooKeeper
    Paxos protocol (consensus)
    BigTable, HBase
    Hypertable
    MongoDB
Problems - CAP Theorem

 A + P: clients may read inconsistent (old or undone) data

 Examples:
    Amazon Dynamo
    Cassandra
    Voldemort
    CouchDB
    Riak
    Caches
Problem with CAP Theorem

 In practice, C + A and C + P systems are the same.
     C + A: not tolerant of network partitions
     C + P: not available when a network partition occurs
 Big problem: network partition
     Not so big (how often does it happen?)
 Pick two
     Availability
     Consistency
 The forgotten: Latency
     Or: how long does the system wait before deciding the
     network is partitioned?
Problems - Real World

Every component may fail:
   Network failure
   Hardware failure
   Electricity
   Natural disasters
   Code failure
Tips & Tricks
Tips & Tricks - Pyramid

  Capacity (connections, operations, ...) Pyramid
Tips & Tricks - Reply Fast

  FAIL Fast
  Break complex requests into smaller ones
  Use timeouts
  No transactions
  Be aware that a single slow operation or component can
  generate contention
  Self-denial attack
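
A small sketch of "use timeouts" and "fail fast", assuming the slow operation can be run in a thread pool. The helper call_with_timeout, the stand-in slow_backend and the pool size are hypothetical. Note the limitation stated in the comment: the caller stops waiting, but the worker thread itself is not interrupted.

```python
import concurrent.futures
import time

# Shared pool; the size is an arbitrary assumption for this sketch.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def slow_backend(delay_s):
    # Stand-in for a remote call that may take too long.
    time.sleep(delay_s)
    return "reply"


def call_with_timeout(fn, timeout_s, *args):
    future = _POOL.submit(fn, *args)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # Best effort only: a call that is already running keeps running.
        future.cancel()
        raise TimeoutError("no reply within %.2fs" % timeout_s) from None
```

The caller gets an error after a bounded wait instead of queueing behind a slow component, which is what keeps one slow operation from causing contention everywhere.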
Tips & Tricks - Cache

  Cache: component location, data, DNS lookups, previous
  requests, etc.
  Use negative cache for failed requests (low expiration)
  Don't rely on cache
  Your system must work with no cache
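
A sketch of a cache with a negative entry for failed requests, assuming a fetch function that raises on failure. LookupCache, its TTL values and the injectable clock are hypothetical. Failures are remembered only briefly (low expiration), and every miss falls through to fetch, so the system still works with an empty cache.

```python
import time


class LookupCache:
    def __init__(self, fetch, ttl_s=60.0, negative_ttl_s=2.0, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_s
        self._negative_ttl = negative_ttl_s
        self._clock = clock
        self._entries = {}   # key -> (expires_at, ok, value_or_exception)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            _, ok, payload = entry
            if ok:
                return payload
            raise payload   # negative hit: fail again without calling fetch
        try:
            value = self._fetch(key)
        except Exception as exc:
            # Remember the failure, but only for a short time.
            self._entries[key] = (now + self._negative_ttl, False, exc)
            raise
        self._entries[key] = (now + self._ttl, True, value)
        return value
```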
Tips & Tricks - Queues

  Easy way to add asynchronous processing and decouple
  your system.
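
A minimal in-process sketch of the idea, assuming Python's standard queue and threading modules. A real deployment would more likely put a message broker between components; start_worker and the doubling "work" are hypothetical placeholders.

```python
import queue
import threading


def start_worker(jobs, results):
    def run():
        while True:
            job = jobs.get()
            if job is None:            # sentinel: no more work
                jobs.task_done()
                break
            results.append(job * 2)    # stand-in for slow processing
            jobs.task_done()

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker
```

The producer only pays the cost of put and never waits for the processing itself. With a bounded queue, a full queue also acts as back-pressure on the producer.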
Tips & Tricks - DNS
Tips & Tricks - Logs

  Log everything
  Use several log levels
  On every log message
       User
       Request host
       Component involved
       Version
       Filename and line
  If log level not enabled do not process log message
       Avoid lookup calls (gettimeofday)
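
A sketch of these logging points using Python's standard logging module. The logger name, the context keys and the helper names (make_logger, expensive_summary, handle) are hypothetical. The level check keeps the expensive argument from being computed when DEBUG is disabled, and the format string adds file name and line number to every message.

```python
import io
import logging

calls = {"expensive": 0}


def make_logger(stream, level=logging.INFO):
    logger = logging.getLogger("demo.component")
    logger.handlers.clear()
    logger.propagate = False
    logger.setLevel(level)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(
        "%(levelname)s %(filename)s:%(lineno)d user=%(user)s host=%(host)s "
        "component=%(component)s version=%(version)s %(message)s"))
    logger.addHandler(handler)
    return logger


def expensive_summary():
    # Stand-in for a costly lookup that should not run for disabled levels.
    calls["expensive"] += 1
    return "summary"


def handle(logger, context):
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("state=%s", expensive_summary(), extra=context)
    logger.info("request served", extra=context)
```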
Tips & Tricks - Domino Effect

  Make sure your load balancer won't overload components
  Use smart algorithms
     Load Balance
     Resource Allocation
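
One way to keep a load balancer from overloading components is to cap in-flight requests per backend and shed load when every backend is full. The sketch below assumes that policy. Backend, LeastLoadedBalancer and the capacity numbers are hypothetical, not taken from the slides.

```python
class Backend:
    def __init__(self, name, max_in_flight):
        self.name = name
        self.max_in_flight = max_in_flight
        self.in_flight = 0


class LeastLoadedBalancer:
    def __init__(self, backends):
        self._backends = list(backends)

    def acquire(self):
        # Never hand work to a backend that is already at capacity.
        candidates = [b for b in self._backends if b.in_flight < b.max_in_flight]
        if not candidates:
            raise RuntimeError("all backends at capacity; shedding load")
        # Pick the backend with the lowest relative load.
        chosen = min(candidates, key=lambda b: b.in_flight / b.max_in_flight)
        chosen.in_flight += 1
        return chosen

    def release(self, backend):
        backend.in_flight -= 1
```

Rejecting a request early is cheaper than letting one overloaded component fail and push its traffic onto the next one.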
Tips & Tricks - (Zero) Configuration

  No configuration files
  Use good defaults
  Auto-discovery (multicast, gossip, ...)
  Make everything configurable
     Administrative command
     No need to stop for changes
  Self-adjust automatically when possible
Tips & Tricks - STOP Test

  With your system under load: kill -STOP <component>
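
The same test can be scripted. The sketch below assumes a POSIX system and a known process id. stop_test is a hypothetical helper that freezes the process with SIGSTOP, as kill -STOP does, and resumes it with SIGCONT after a pause.

```python
import os
import signal
import subprocess
import sys
import time


def stop_test(pid, pause_s):
    # Freeze the component without killing it: its sockets stay open but it
    # stops answering, which is harder on callers than a clean crash.
    os.kill(pid, signal.SIGSTOP)
    try:
        time.sleep(pause_s)
    finally:
        os.kill(pid, signal.SIGCONT)   # always resume the process
```

While the component is frozen, watch whether the rest of the system times out and routes around it or piles up behind it.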
Tips & Tricks - Know your tools

  load average (uptime)
  stats tools
      vmstat
      iostat
      mpstat
      tcpstat, tcprstat, etc
  tcpdump, nc, netstat
  tuning
      /proc/net/*
      ulimit
      sysctl
  oprofile
  debugging tools (gdb, valgrind)
  ...
Tips & Tricks - Count

  Count everything
     Connections
     Operations
     Failures
     Successes
     Request times (granularity)
  Total, average, standard deviation
  Monitor counters
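
A sketch of a counter that tracks total, average and standard deviation without storing every sample. It uses Welford's online algorithm, which is a choice made here and not something the slides name; the class RequestStats is hypothetical.

```python
import math


class RequestStats:
    def __init__(self):
        self.count = 0
        self.total = 0.0
        self._mean = 0.0
        self._m2 = 0.0   # running sum of squared distances from the mean

    def record(self, value):
        self.count += 1
        self.total += value
        delta = value - self._mean
        self._mean += delta / self.count
        self._m2 += delta * (value - self._mean)

    @property
    def average(self):
        return self._mean if self.count else 0.0

    @property
    def stddev(self):
        # Population standard deviation of the recorded values.
        return math.sqrt(self._m2 / self.count) if self.count else 0.0
```

Each record call is constant time and constant memory, so the counter is cheap enough to update on every request.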
Tips & Tricks - Stability Patterns

  Use Timeouts
  Circuit Breaker
  Bulkheads
  Steady State
  Fail Fast
  Handshaking
  Test Harness
  Decoupling Middleware
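
A minimal circuit breaker sketch for the pattern named above. The thresholds, the injectable clock and the names CircuitBreaker and CircuitOpenError are assumptions for illustration. After a number of consecutive failures the breaker opens and rejects calls immediately; once the reset interval passes it lets one trial call through.

```python
import time


class CircuitOpenError(Exception):
    pass


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after_s=30.0, clock=time.monotonic):
        self._max_failures = max_failures
        self._reset_after = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at = None   # None means the circuit is closed

    def call(self, fn, *args):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after:
                raise CircuitOpenError("circuit open; failing fast")
            # Reset interval elapsed: half-open, allow one trial call.
        try:
            result = fn(*args)
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self._max_failures:
                self._opened_at = self._clock()   # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```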
Tips & Tricks - Don't Panic!
Learning More - Books

TCP/IP Illustrated, Vol. 1: The Protocols
Learning More - Books

Unix Network Programming, Vol. 1: The Sockets Networking API
Learning More - Books

Pattern Oriented Software Architecture, Vol. 2
Learning More - Books

Release It!
Learning More - Papers

 The Google File System
 Bigtable: A Distributed Storage System for Structured Data
 Dynamo: Amazon's Highly Available Key-Value Store
 PNUTS: Yahoo!’s Hosted Data Serving Platform
 MapReduce: Simplified Data Processing on Large Clusters

 Towards robust distributed systems
 Brewer's conjecture and the feasibility of consistent,
 available, partition-tolerant web services
 BASE: An Acid Alternative
 Looking up data in P2P systems
Thanks!!! Questions?

lucindo.github.com - @rlucindo
