PNUTS: Yahoo!’s Hosted Data Serving Platform
Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver and Ramana Yerneni
Yahoo! Research

Motivation
• Web applications need:
  o Scalability
    - Architectural scalability; capacity that scales linearly
  o Geographic scope
    - Data replicas on multiple continents
  o High availability
    - Applications can still read data despite failures
  o Relaxed consistency
    - Tolerate stale or reordered data

Relaxed Consistency
• Not strict consistency
  o Very expensive
• Not eventual consistency
  o Ex: a photo-sharing application
  o U1: Remove someone from the list of people who can view his photos
  o U2: Post spring-break photos
  o If a replica applies U2 before U1, the removed person can briefly see the new photos

What is PNUTS?
• PNUTS is a massively parallel, geographically distributed database system for Yahoo!’s web applications.
• Its architecture is based on record-level, asynchronous geographic replication and on a guaranteed message-delivery service rather than a persistent log.

System architecture
• Storage units
  o Each stores several hundred tablets; a tablet is usually several hundred megabytes
• Routers
  o Store an interval mapping that defines the boundaries of each tablet and maps each tablet to a storage unit
• Tablet controller
  o Owns the interval mapping; routers hold only a cached copy of it
• YMB (Yahoo! Message Broker)
  o Topic-based pub/sub system

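A minimal sketch of how a router could resolve a key to a storage unit using an interval mapping. The sorted split points, the tablet indices, and the storage-unit names are hypothetical, and the binary search via `bisect` is an illustrative choice, not a description of the PNUTS implementation.

```python
import bisect


class IntervalMapping:
    """Toy interval mapping: sorted split points define tablet boundaries."""

    def __init__(self, split_points, tablet_to_su):
        # split_points[i] is the first key of tablet i + 1 (hypothetical layout).
        self.split_points = sorted(split_points)
        # tablet_to_su[i] names the storage unit that holds tablet i.
        assert len(tablet_to_su) == len(self.split_points) + 1
        self.tablet_to_su = list(tablet_to_su)

    def tablet_for(self, key):
        # bisect_right counts the split points <= key, which is the tablet index.
        return bisect.bisect_right(self.split_points, key)

    def storage_unit_for(self, key):
        return self.tablet_to_su[self.tablet_for(key)]


# Three tablets: keys below "g", keys from "g" up to "p", keys from "p" on.
mapping = IntervalMapping(["g", "p"], ["su1", "su2", "su1"])
```

Because the router's copy is only a cache, a real router would also need a way to refresh the mapping from the tablet controller when it is out of date; that step is left out here.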
Yahoo! Message Broker
• Distributed publish-subscribe service.
• Guarantees delivery once a message is published.
• Published updates are propagated asynchronously to the other regions and applied to their replicas.

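The publish-then-deliver behavior can be sketched with a toy in-memory broker. The class, the topic name `tablet-7`, and the region dictionaries are hypothetical; a real broker would persist a message before acknowledging it, which the in-memory queue here only stands in for.

```python
from collections import defaultdict, deque


class Broker:
    """Toy topic-based broker: publish returns at once, delivery happens later."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of callbacks
        self.queue = deque()                  # accepted but not yet delivered

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Stand-in for durable logging: once queued, the message will be delivered.
        self.queue.append((topic, message))

    def deliver_pending(self):
        # Asynchronous step: drain the queue in publish order.
        while self.queue:
            topic, message = self.queue.popleft()
            for callback in self.subscribers[topic]:
                callback(message)


broker = Broker()
west, east = {}, {}  # hypothetical per-region replicas
broker.subscribe("tablet-7", west.update)
broker.subscribe("tablet-7", east.update)
broker.publish("tablet-7", {"k": "v1"})
published_not_delivered = dict(east)  # the replica has not seen the update yet
broker.deliver_pending()
```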
Types of Table
[Figure not recovered]

Tablet splitting and balancing
• Each storage unit has many tablets (horizontal partitions of the table)
• Tablets may grow over time
• Overfull tablets split
• A storage unit may become a hotspot
• Shed load by moving tablets to other servers

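The split and load-shedding steps can be sketched as follows. This is an illustration under stated assumptions: tablets are sorted lists of records, "overfull" means exceeding a hypothetical `max_records` threshold, splits happen at the midpoint, and load is measured simply by tablet count.

```python
def split_tablet(records, max_records):
    """Split a sorted list of (key, value) records in half when it is overfull.

    Returns a list with one tablet (no split needed) or two tablets.
    """
    if len(records) <= max_records:
        return [records]
    mid = len(records) // 2
    return [records[:mid], records[mid:]]


def rebalance_once(storage_units):
    """Move one tablet from the busiest storage unit to the least loaded one."""
    busiest = max(storage_units, key=lambda su: len(storage_units[su]))
    target = min(storage_units, key=lambda su: len(storage_units[su]))
    # Only move a tablet if doing so actually narrows the gap.
    if len(storage_units[busiest]) - len(storage_units[target]) > 1:
        storage_units[target].append(storage_units[busiest].pop())
    return storage_units


tablets = split_tablet([(1, "a"), (2, "b"), (3, "c"), (4, "d")], max_records=3)
units = rebalance_once({"su1": ["t1", "t2", "t3"], "su2": []})
```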
Query processing

Accessing data

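1. The client sends "Get key k" to the router
2. The router forwards "Get key k" to the storage unit (SU) that holds the key
3. The storage unit returns the record for key k to the router
4. The router returns the record for key k to the client
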
Bulk read
1. The client sends the key set {k1, k2, … kn} to the scatter/gather engine
2. The scatter/gather engine issues Get k1, Get k2, Get k3, … to the storage units (SU)

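A minimal sketch of the scatter/gather pattern for a bulk read. The in-memory `STORAGE_UNITS` dictionary, the `locate` helper, and the use of a thread pool are assumptions made for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical storage units, each holding a slice of the key space.
STORAGE_UNITS = {
    "su1": {"k1": "r1"},
    "su2": {"k2": "r2"},
    "su3": {"k3": "r3"},
}


def locate(key):
    """Stand-in for the router lookup: find the storage unit holding key."""
    for name, records in STORAGE_UNITS.items():
        if key in records:
            return name
    return None


def get(key):
    """Single-record get against the owning storage unit."""
    su = locate(key)
    return None if su is None else STORAGE_UNITS[su][key]


def bulk_read(keys):
    """Scatter one get per key, then gather the results in request order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        values = list(pool.map(get, keys))
    return dict(zip(keys, values))
```

The client makes one call and receives one combined result; the fan-out to individual storage units stays hidden behind `bulk_read`.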
Per-record timeline consistency
• All replicas of a given record apply all updates to the record in the same order.
• An example sequence of updates to a record contains three kinds of events: insert, update, and delete.
• One replica is designated as the master for each record.
• Generation: advances with each new insert. Version: advances with each update.

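The ordering guarantee can be sketched with a master that numbers each update and replicas that apply updates strictly by number. The sequence numbers and the replica-side buffer for early arrivals are illustrative assumptions, not the mechanism PNUTS itself uses to deliver updates.

```python
class Replica:
    """Holds one record and applies updates strictly in sequence order."""

    def __init__(self):
        self.value = None
        self.applied_seq = 0
        self.pending = {}  # seq -> new value, for updates that arrived early

    def deliver(self, seq, value):
        self.pending[seq] = value
        # Apply every update whose predecessor has already been applied.
        while self.applied_seq + 1 in self.pending:
            self.applied_seq += 1
            self.value = self.pending.pop(self.applied_seq)


class Master:
    """The record's master assigns a sequence number to each update."""

    def __init__(self):
        self.next_seq = 1

    def order(self, value):
        seq = self.next_seq
        self.next_seq += 1
        return seq, value


master = Master()
updates = [master.order(v) for v in ["a", "b", "c"]]

in_order, shuffled = Replica(), Replica()
for seq, value in updates:
    in_order.deliver(seq, value)
for seq, value in reversed(updates):  # simulate out-of-order arrival
    shuffled.deliver(seq, value)
```

Both replicas end in the same state because neither ever applies an update before its predecessor.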
Consistency model
 • Goal: make it easier for applications to reason about updates and cope with asynchrony
 • Web applications typically manipulate one record at a time

[Figure: timeline of a single record within generation 1, from insert through successive updates (v. 1 to v. 8) to delete]

Consistency model
• Read-any: returns a possibly stale version of the record.

[Figure: record timeline v. 1 to v. 8 (generation 1) with stale versions and the current version marked]

Consistency model
• Read-latest: returns the latest copy of the record, reflecting all writes that have succeeded.

[Figure: record timeline v. 1 to v. 8 (generation 1) with stale versions and the current version marked]

Consistency model
• Read-critical(required version): returns a version of the record that is the same as or newer than the required version. Example: Read ≥ v. 6.

[Figure: record timeline v. 1 to v. 8 (generation 1) with stale versions and the current version marked]

Consistency model
• Test-and-set-write(required version): performs the requested write if and only if the present version of the record is the same as the required version. Example: "Write if = v. 7" returns an error when the current version is v. 8.

[Figure: record timeline v. 1 to v. 8 (generation 1) with stale versions and the current version marked]

Consistency model
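• Mechanism: per-record mastership
• All updates to a record are forwarded to its master replica, so a check such as "Write if = v. 7" is evaluated against the current version and fails with an error if the record has already moved on.

[Figure: record timeline v. 1 to v. 8 (generation 1) with stale versions and the current version marked]
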
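The four calls can be sketched against a toy record with one master copy and one lagging replica copy. The class, method names, and exception are hypothetical Python renderings of the calls named on the slides, and replication is triggered by hand rather than asynchronously.

```python
class VersionMismatch(Exception):
    """Raised when a test-and-set write sees an unexpected version."""


class Record:
    """Toy record with one master copy and one possibly stale replica copy."""

    def __init__(self, value):
        self.master = (1, value)   # (version, value) at the record's master
        self.replica = (1, value)  # local copy that may lag behind

    def write(self, value):
        self.master = (self.master[0] + 1, value)
        return self.master[0]

    def replicate(self):
        self.replica = self.master  # stand-in for asynchronous propagation

    def read_any(self):
        return self.replica  # may be stale

    def read_latest(self):
        return self.master  # reflects all writes that have succeeded

    def read_critical(self, required_version):
        # Serve locally if the replica is new enough, otherwise ask the master.
        if self.replica[0] >= required_version:
            return self.replica
        return self.master

    def test_and_set_write(self, required_version, value):
        if self.master[0] != required_version:
            raise VersionMismatch(f"current version is {self.master[0]}")
        return self.write(value)
```

A typical use of test-and-set is read-modify-write: read a record, compute the new value, and write it back only if nobody else has written in between.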
Consistency levels
   • Eventual consistency
       o Transactions:
           • Alice changes status from “Sleeping” to “Awake”
           • Alice changes location from “Home” to “Work”




       o Region 1: (Alice, Home, Sleeping) → (Alice, Home, Awake) → (Alice, Work, Awake)
       o Region 2: (Alice, Home, Sleeping) → (Alice, Work, Sleeping) → (Alice, Work, Awake)
       o Region 2 applies "Work" before "Awake", so the “invalid” state (Alice, Work, Sleeping) is visible
       o The final state is consistent

Consistency levels
   • Timeline consistency
       o Transactions:
           • Alice changes status from “Sleeping” to “Awake”
           • Alice changes location from “Home” to “Work”




       o Region 1: (Alice, Home, Sleeping) → (Alice, Home, Awake) → (Alice, Work, Awake)
       o Region 2: (Alice, Home, Sleeping) → (Alice, Work, Awake)
       o Region 2 only ever shows states that also occurred in Region 1

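The Alice example can be replayed in a few lines. This sketch simplifies the slides by letting every region show each intermediate state, and the dictionary field names are assumptions made for illustration.

```python
def apply_updates(state, updates):
    """Apply field updates in order and return every state that was visible."""
    visible = [dict(state)]
    for field, value in updates:
        state = {**state, field: value}
        visible.append(state)
    return visible


start = {"name": "Alice", "location": "Home", "status": "Sleeping"}
u1 = ("status", "Awake")   # issued first
u2 = ("location", "Work")  # issued second

# Eventual consistency: region 2 may apply the updates in the opposite order.
eventual_r1 = apply_updates(start, [u1, u2])
eventual_r2 = apply_updates(start, [u2, u1])

# Timeline consistency: every region applies the updates in the same order.
timeline_r2 = apply_updates(start, [u1, u2])

# The state that never should have existed.
invalid = {"name": "Alice", "location": "Work", "status": "Sleeping"}
```

Under the reordered schedule the invalid state appears in region 2 even though both regions converge; under the shared order it never appears.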
Experiments

Experimental setup
• Production PNUTS code
  o Enhanced with ordered table type

• Three PNUTS regions
  o 2 west coast, 1 east coast
  o 5 storage units, 2 message brokers, 1 router

• Workload parameters
  o Request rate: 1200-3600 requests/second
  o Read:write mix: 0-50% writes
  o Locality: 80%

Inserts
• Inserts required:
   o 75.6 ms per insert in West 1 (tablet master)
   o 131.5 ms per insert into the non-master West 2
   o 315.5 ms per insert into the non-master East
• As expected, an insert costs significantly more when it is initiated in a non-master region that is far away from the tablet master.

[Figure not recovered; caption: 10% writes by default]

Lessons learned (1)
• Simpler is better than clever
   o Clever approaches are hard to
     implement, test, debug and maintain

• Incremental is better than big-bang
Lessons learned (2)
• Non-algorithmic challenges can be hard
   o Dealing with network config, legacy software
     and requirements, the “corporate way,” multiple
     stakeholders…

• Researchers should get their hands dirty
   o Being a part of shipping a real system can
     radically readjust your worldview
   o Write some test cases to understand system
     complexity
