SlideShare a Scribd company logo
1 of 29
Megastore  - Providing Scalable, Highly Available J. Baker, C. Bond,  J.C. Corbett, JJ Furman,  A. Khorlin, J. Larson, J-M Léon, Y. Li, A. Lloyd, V. Yushprakh Google Inc. [email_address] May. 2011
Agenda ,[object Object],[object Object],[object Object],[object Object],[object Object]
Motivation ,[object Object]
Motivation ,[object Object],[object Object],[object Object]
Agenda ,[object Object],[object Object],[object Object],[object Object],[object Object]
Megastore Overview ,[object Object],[object Object],[object Object],[object Object],[object Object]
Architecture ,[object Object],[object Object],[object Object]
Architecture
Architecture ,[object Object]
Architecture ,[object Object]
Agenda ,[object Object],[object Object],[object Object],[object Object],[object Object]
Data Model ,[object Object],[object Object],[object Object],[object Object],[object Object]
Sample Schema
Mapping to Bigtable ,[object Object],[object Object],[object Object],[object Object]
Indexes ,[object Object],[object Object],[object Object]
Transactions & Concurrency ,[object Object],[object Object],[object Object]
Transactions & Concurrency ,[object Object],[object Object],[object Object],[object Object]
Transactions & Concurrency ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Transactions & Concurrency ,[object Object],[object Object],[object Object]
Agenda ,[object Object],[object Object],[object Object],[object Object],[object Object]
Paxos ,[object Object],[object Object]
Reads
Writes
Failure Detection ,[object Object],[object Object],[object Object],[object Object]
Agenda ,[object Object],[object Object],[object Object],[object Object],[object Object]
Distribution of Availability
Distribution of Average Latencies
Conclusion ,[object Object],[object Object],[object Object]
Questions?

More Related Content

What's hot

17. Recovery System in DBMS
17. Recovery System in DBMS17. Recovery System in DBMS
17. Recovery System in DBMS
koolkampus
 

What's hot (20)

Running MariaDB in multiple data centers
Running MariaDB in multiple data centersRunning MariaDB in multiple data centers
Running MariaDB in multiple data centers
 
Importance & Principles of Modeling from UML Designing
Importance & Principles of Modeling from UML DesigningImportance & Principles of Modeling from UML Designing
Importance & Principles of Modeling from UML Designing
 
Distributed shred memory architecture
Distributed shred memory architectureDistributed shred memory architecture
Distributed shred memory architecture
 
17. Recovery System in DBMS
17. Recovery System in DBMS17. Recovery System in DBMS
17. Recovery System in DBMS
 
Ooad
OoadOoad
Ooad
 
The CAP Theorem
The CAP Theorem The CAP Theorem
The CAP Theorem
 
Grasp patterns and its types
Grasp patterns and its typesGrasp patterns and its types
Grasp patterns and its types
 
Introduction to Galera Cluster
Introduction to Galera ClusterIntroduction to Galera Cluster
Introduction to Galera Cluster
 
Distributed Shared Memory
Distributed Shared MemoryDistributed Shared Memory
Distributed Shared Memory
 
An Overview of Distributed Debugging
An Overview of Distributed DebuggingAn Overview of Distributed Debugging
An Overview of Distributed Debugging
 
Introducing Saga Pattern in Microservices with Spring Statemachine
Introducing Saga Pattern in Microservices with Spring StatemachineIntroducing Saga Pattern in Microservices with Spring Statemachine
Introducing Saga Pattern in Microservices with Spring Statemachine
 
Processes and threads
Processes and threadsProcesses and threads
Processes and threads
 
SUN Network File system - Design, Implementation and Experience
SUN Network File system - Design, Implementation and Experience SUN Network File system - Design, Implementation and Experience
SUN Network File system - Design, Implementation and Experience
 
Message and Stream Oriented Communication
Message and Stream Oriented CommunicationMessage and Stream Oriented Communication
Message and Stream Oriented Communication
 
2 PHASE COMMIT PROTOCOL
2 PHASE COMMIT PROTOCOL2 PHASE COMMIT PROTOCOL
2 PHASE COMMIT PROTOCOL
 
Clock synchronization in distributed system
Clock synchronization in distributed systemClock synchronization in distributed system
Clock synchronization in distributed system
 
8. mutual exclusion in Distributed Operating Systems
8. mutual exclusion in Distributed Operating Systems8. mutual exclusion in Distributed Operating Systems
8. mutual exclusion in Distributed Operating Systems
 
Distributed shared memory shyam soni
Distributed shared memory shyam soniDistributed shared memory shyam soni
Distributed shared memory shyam soni
 
Design Patterns
Design PatternsDesign Patterns
Design Patterns
 
Distributed System Management
Distributed System ManagementDistributed System Management
Distributed System Management
 

Viewers also liked

Viewers also liked (10)

Learning from google megastore (Part-1)
Learning from google megastore (Part-1)Learning from google megastore (Part-1)
Learning from google megastore (Part-1)
 
MORE Mega Store .........
MORE Mega Store .........MORE Mega Store .........
MORE Mega Store .........
 
Cassandra Compression and Performance Evaluation
Cassandra Compression and Performance EvaluationCassandra Compression and Performance Evaluation
Cassandra Compression and Performance Evaluation
 
CS295 Week5: Megastore - Providing Scalable, Highly Available Storage for Int...
CS295 Week5: Megastore - Providing Scalable, Highly Available Storage for Int...CS295 Week5: Megastore - Providing Scalable, Highly Available Storage for Int...
CS295 Week5: Megastore - Providing Scalable, Highly Available Storage for Int...
 
Megastore
MegastoreMegastore
Megastore
 
Noha mega store
Noha mega storeNoha mega store
Noha mega store
 
Db presentation google_megastore
Db presentation google_megastoreDb presentation google_megastore
Db presentation google_megastore
 
Google Spanner - Synchronously-Replicated, Globally-Distributed, Multi-Versio...
Google Spanner - Synchronously-Replicated, Globally-Distributed, Multi-Versio...Google Spanner - Synchronously-Replicated, Globally-Distributed, Multi-Versio...
Google Spanner - Synchronously-Replicated, Globally-Distributed, Multi-Versio...
 
Spanner
SpannerSpanner
Spanner
 
An Overview of Spanner: Google's Globally Distributed Database
An Overview of Spanner: Google's Globally Distributed DatabaseAn Overview of Spanner: Google's Globally Distributed Database
An Overview of Spanner: Google's Globally Distributed Database
 

Similar to Google Megastore

CS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduceCS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduce
J Singh
 
Maruthi_YH_resume
Maruthi_YH_resumeMaruthi_YH_resume
Maruthi_YH_resume
Maruthi YH
 

Similar to Google Megastore (20)

About "Apache Cassandra"
About "Apache Cassandra"About "Apache Cassandra"
About "Apache Cassandra"
 
JCConf 2016 - Cloud Computing Applications - Hazelcast, Spark and Ignite
JCConf 2016 - Cloud Computing Applications - Hazelcast, Spark and IgniteJCConf 2016 - Cloud Computing Applications - Hazelcast, Spark and Ignite
JCConf 2016 - Cloud Computing Applications - Hazelcast, Spark and Ignite
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?
 
Climbing the beanstalk
Climbing the beanstalkClimbing the beanstalk
Climbing the beanstalk
 
Software architecture for data applications
Software architecture for data applicationsSoftware architecture for data applications
Software architecture for data applications
 
Near Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark StreamingNear Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
 
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
 
NoSQL Introduction, Theory, Implementations
NoSQL Introduction, Theory, ImplementationsNoSQL Introduction, Theory, Implementations
NoSQL Introduction, Theory, Implementations
 
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
 
IntelliJ IDEA Architecture and Performance
IntelliJ IDEA Architecture and PerformanceIntelliJ IDEA Architecture and Performance
IntelliJ IDEA Architecture and Performance
 
Elastic search overview
Elastic search overviewElastic search overview
Elastic search overview
 
NoSql Databases
NoSql DatabasesNoSql Databases
NoSql Databases
 
Front Range PHP NoSQL Databases
Front Range PHP NoSQL DatabasesFront Range PHP NoSQL Databases
Front Range PHP NoSQL Databases
 
Working with kubernetes
Working with kubernetesWorking with kubernetes
Working with kubernetes
 
Project seminar
Project seminarProject seminar
Project seminar
 
Pro Techniques for the SSAS MD Developer
Pro Techniques for the SSAS MD DeveloperPro Techniques for the SSAS MD Developer
Pro Techniques for the SSAS MD Developer
 
CS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduceCS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduce
 
Microsoft Database Options
Microsoft Database OptionsMicrosoft Database Options
Microsoft Database Options
 
Maruthi_YH_resume
Maruthi_YH_resumeMaruthi_YH_resume
Maruthi_YH_resume
 
Jack Gudenkauf sparkug_20151207_7
Jack Gudenkauf sparkug_20151207_7Jack Gudenkauf sparkug_20151207_7
Jack Gudenkauf sparkug_20151207_7
 

More from bergwolf

More from bergwolf (11)

NFS updates for CLSF
NFS updates for CLSFNFS updates for CLSF
NFS updates for CLSF
 
Linux aio
Linux aioLinux aio
Linux aio
 
RCU
RCURCU
RCU
 
CLFS 2010
CLFS 2010CLFS 2010
CLFS 2010
 
vmfs intro
vmfs introvmfs intro
vmfs intro
 
pnfs status
pnfs statuspnfs status
pnfs status
 
linux trim
linux trimlinux trim
linux trim
 
network filesystem briefs
network filesystem briefsnetwork filesystem briefs
network filesystem briefs
 
logfs
logfslogfs
logfs
 
gsoc and grub4ext4
gsoc and grub4ext4gsoc and grub4ext4
gsoc and grub4ext4
 
grub4ext4 status-plans
grub4ext4 status-plansgrub4ext4 status-plans
grub4ext4 status-plans
 

Recently uploaded

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Recently uploaded (20)

Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 

Google Megastore

Editor's Notes

  1. 1. Query Local : Query the local replica's coordinator to determine if the entity group is up-to-date locally. 2. Find Position : Determine the highest possibly-committed log position, and select a replica that has applied through that log position. (a) ( Local read ) If step 1 indicates that the local replica is up-to-date, read the highest accepted log position and timestamp from the local replica. (b) ( Majority read ) If the local replica is not up-to-date (or if step 1 or step 2a times out), read from a majority of replicas to nd the maximum log position that any replica has seen, and pick a replica to read from. We select the most responsive or up-to-date replica, not always the local replica. 3. Catchup : As soon as a replica is selected, catch it up to the maximum known log position as follows: (a) For each log position in which the selected replica does not know the consensus value, read the value from another replica. For any log positions without a known-committed value available, invoke Paxos to propose a no-op write. Paxos will drive a majority of replicas to converge on a single value|either the no-op or a previously proposed write. (b) Sequentially apply the consensus value of all unapplied log positions to advance the replica's state to the distributed consensus state.In the event of failure, retry on another replica. 4. Validate : If the local replica was selected and was not previously up-to-date, send the coordinator a validate message asserting that the ( entity group; replica ) pair reflects all committed writes. Do not wait for a reply if the request fails, the next read will retry. 5. Query Data : Read the selected replica using the timestamp of the selected log position. If the selected replica becomes unavailable, pick an alternate replica, perform catchup, and read from it instead. The results of a single large query may be assembled transparently from multiple replicas. Note that in practice 1 and 2a are executed in parallel.
  2. 1. Accept Leader : Ask the leader to accept the value as proposal number zero. If successful, skip to step 3. 2. Prepare : Run the Paxos Prepare phase at all replicas with a higher proposal number than any seen so far at this log position. Replace the value being written with the highest-numbered proposal discovered, if any. 3. Accept : Ask remaining replicas to accept the value. If this fails on a majority of replicas, return to step 2 after a randomized backoff. 4. Invalidate : Invalidate the coordinator at all full replicas that did not accept the value. Fault handling at this step is described in Section 4.7 below. 5. Apply : Apply the value's mutations at as many replicas as possible. If the chosen value differs from that originally proposed, return a conflict error.