Google Spanner: our understanding
   of concepts and implications
                Harisankar H
          DOS lab weekly seminar
                 8/Dec/2012
     http://harisankarh.wordpress.com

        "Google Spanner: our understanding of concepts and
        implications" by Harisankar H is licensed under a
        Creative Commons Attribution 3.0 Unported License.
Outline
• Spanner
  – User perspective
     • User = application programmer/administrator
  – System architecture
  – Implications
Spanner: user perspective
• Global scale database with strict transactional
  guarantees
   – Global scale
      • designed to work across datacenters in different continents
      • Claim: “designed to scale up to millions of nodes, hundreds of
        datacenters, trillions of database rows”
   – Strict transactional guarantees
      • Supports general transactions (even inter-row)
      • Stronger properties than serializability*
          – Replaced the MySQL cluster storing Google's critical ad-related data
      • Reliable even during wide-area natural disasters
   – Supports hierarchical schema of tables
      • Semi-relational
          – Supports SQL-like query and definition language
   – User-defined locality and availability
                                * means: explained in later slides
Need for Spanner
• Limitations of existing systems
   – BigTable (could apply to NoSQL systems in general)
       • Needed complex, evolving schemas
       • Only eventual consistency across data centers
            – Needed wide-area replication with strong consistency
       • Transactional scope limited to single row
            – Needed general cross-row transactions
   – Megastore (relational DB-like system)
       • Low performance
            – Layered on top of BigTable
                » High communication costs
            – Less efficient replica consistency algorithms*
       • Better transactional guarantees in Spanner*
Spanner: transactional guarantee
• External consistency
  – Stricter than serializability
  – E.g., [Figure: T1, T2 and T3 drawn on a physical-time axis; T2 starts after T1]

     Serial ordering
        T1   T3   T2     (T2 after T1)
        T1   T2   T3     (T2 after T1)
        T2   T3   T1
        T2   T1   T3
External consistency: motivation
  • Facebook-like example from OSDI talk
          [Figure: timeline in physical time]
             by Jerry:  T1: unfriend Tom, then T2: post comment
             by Tom:    T3: view Jerry’s profile

          Jerry unfriends Tom to write a controversial comment

          Serial order:  T2: Jerry posts comment,  T3: Tom views Jerry’s profile,  T1: Jerry unfriends Tom

          If serial order is as above, Jerry will be in trouble!


   Formally, “If the commit of T1 preceded the initiation of a new transaction T2 in
   wall-clock (physical) time, then the commit of T1 should precede the commit of T2 in
   the serial ordering also.”
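The formal condition above can be checked mechanically. The following is a minimal sketch, not part of the original slides: the `Txn` record, the function name and the concrete timings are hypothetical, chosen only so that T1 commits before T2 starts while T3 overlaps both.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Txn:
    name: str
    start: float   # wall-clock time at which the transaction was initiated
    commit: float  # wall-clock time at which the transaction committed


def is_externally_consistent(serial_order, txns):
    """True if every txn that committed before another started also precedes it serially."""
    position = {name: i for i, name in enumerate(serial_order)}
    for a in txns:
        for b in txns:
            # a committed before b was initiated, so a must come first in the serial order
            if a.commit < b.start and position[a.name] > position[b.name]:
                return False
    return True


# Assumed timings (illustrative only): T1 commits at 2, T2 starts at 3, T3 overlaps both.
txns = [Txn("T1", 0, 2), Txn("T2", 3, 5), Txn("T3", 1, 4)]

allowed = [order for order in permutations(["T1", "T2", "T3"])
           if is_externally_consistent(order, txns)]
```

With these timings only the orderings that place T1 before T2 survive; the position of T3 is unconstrained because it overlaps both transactions.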
Spanner: transactional guarantee
• Additional (weaker) transaction modes for
  performance
  – Read-only transaction supporting snapshot isolation
     • Snapshot isolation
         – Transactions read a consistent snapshot of the database
         – Values written should not have conflicting updates after the
           snapshot was read
         – E.g., R1(X)R1(Y) R2(X)R2(Y) W2(Y) W1(X) is allowed
         – Weaker than serializability, but more efficient (lock-free)
         – Spanner does not allow writes in these transactions
             » Probably, that is how they preserve isolation
  – Snapshot read
     • Read of a consistent state of the database in the past
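A snapshot read can be pictured as a lookup in a multi-version store: every write is kept with its commit timestamp, and a read at timestamp t sees, for each key, the latest version committed at or before t. The sketch below is a single-process illustration under that assumption; `VersionedStore` and its methods are hypothetical names, not Spanner's API.

```python
class VersionedStore:
    """Keeps every committed version of each key, tagged with its commit timestamp."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value)

    def write(self, key, value, commit_ts):
        # Old versions are never overwritten, so past snapshots stay readable.
        self._versions.setdefault(key, []).append((commit_ts, value))

    def snapshot_read(self, key, read_ts):
        """Return the latest value committed at or before read_ts, or None."""
        latest = None
        for commit_ts, value in self._versions.get(key, []):
            if commit_ts <= read_ts and (latest is None or commit_ts > latest[0]):
                latest = (commit_ts, value)
        return None if latest is None else latest[1]

    def read_only_txn(self, keys, read_ts):
        """Read several keys at one timestamp, without taking any locks."""
        return {key: self.snapshot_read(key, read_ts) for key in keys}


store = VersionedStore()
store.write("X", "x0", commit_ts=10)
store.write("Y", "y0", commit_ts=10)
store.write("X", "x1", commit_ts=20)

snapshot_at_15 = store.read_only_txn(["X", "Y"], read_ts=15)
snapshot_at_25 = store.read_only_txn(["X", "Y"], read_ts=25)
```

Because a reader only ever looks at versions that are already committed, later writes cannot disturb it, which is why such reads need no locks.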
Hierarchical data model
    – Universes (Spanner deployments)
      • Databases (collections of tables)
         – Tables with schemas
             » Ordered rows, columns
             » One or more primary-key columns
                  • Rows named by primary keys
         – Hierarchies of tables
             » Directory tables (top of table hierarchy)
                  • Directories
                      • Each row in a directory table (with
                         key K), along with the rows in
                         descendant tables that start with
                         K, forms a directory
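The directory rule above amounts to grouping rows by the leading component of their primary key. A minimal sketch follows, assuming primary keys are tuples and using hypothetical table names (`Users` as the directory table, `Albums` as a descendant table); none of these names come from the slides.

```python
from collections import defaultdict


def group_into_directories(rows):
    """Group (table, primary_key) rows by the first primary-key component K."""
    directories = defaultdict(list)
    # Sorting by key places each directory-table row directly before its descendants.
    for table, key in sorted(rows, key=lambda row: row[1]):
        directories[key[0]].append((table, key))
    return dict(directories)


# Hypothetical rows: Users is the directory table, Albums keys start with the user key.
rows = [
    ("Albums", (1, 2)),
    ("Users", (2,)),
    ("Users", (1,)),
    ("Albums", (2, 1)),
    ("Albums", (1, 1)),
]

dirs = group_into_directories(rows)
```

Here `dirs[1]` holds the `Users` row with key 1 followed by the two `Albums` rows whose keys start with 1, that is, one directory.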



                               Figures (a),(b) from Spanner, OSDI 2012 paper

                      Fig: a
User perspective: database configuration
• Database placement and reliability
  – Administrator:
     • Create options which specify number of replicas and
       placement
         – E.g., option (a): North America: 5 replicas, Europe: 3 replicas
                 option (b): Latin America: 3 replicas …
  – Application
     • Directory is the smallest unit for which these properties can
       be specified
     • Tag each directory or database with these options
         – E.g., TomDir1: option (b)
                 JerryDir3: option (a) ….
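The two-level scheme (administrator defines named options, application tags directories with them) can be sketched as two lookup tables. This is only an illustration of the idea, not Spanner's configuration syntax; the dictionary layout and the helper `replicas_for` are assumptions, and option (b) lists only the part shown on the slide.

```python
# Administrator side: named options mapping region -> number of replicas.
placement_options = {
    "a": {"North America": 5, "Europe": 3},
    "b": {"Latin America": 3},  # the remainder of option (b) is elided on the slide
}

# Application side: each directory is tagged with one option.
directory_tags = {
    "TomDir1": "b",
    "JerryDir3": "a",
}


def replicas_for(directory):
    """Return the region -> replica-count mapping that applies to a directory."""
    return placement_options[directory_tags[directory]]


jerry_placement = replicas_for("JerryDir3")
jerry_total_replicas = sum(jerry_placement.values())
```

Keeping the options separate from the tags means an administrator can change a placement policy once and have it apply to every directory tagged with it.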


                     Next:    System architecture
Spanner architecture: basics
• Replica consistency
   – Using Paxos protocol
       • Different Paxos groups for different sets of directories
            – Can be across data centers
• Concurrency control
   – Using two-phase locking
       • Chosen over optimistic methods because of long-lived transactions (on the order of
         minutes)
• Transaction coordination
   – Two-phase commit
       • Two-phase commit on top of Paxos ensures availability
• Timestamps for transactions and data items
   – To support snapshot isolation and snapshot reads
   – Multiple timestamped versions of data items maintained
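The transaction-coordination bullet can be made concrete with a bare-bones two-phase commit. The sketch below is a single-process illustration with hypothetical class and function names; it leaves out locking, timeouts and the Paxos replication of each participant's state that, per the slide, is what provides availability.

```python
class Participant:
    """Stands in for one participant; in Spanner each would be a whole Paxos group."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1: vote yes only if this participant is able to commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator logic: commit only if every participant votes yes."""
    # Phase 1 (prepare): collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    decision = all(votes)
    # Phase 2 (commit/abort): broadcast the single decision to all participants.
    for p in participants:
        if decision:
            p.commit()
        else:
            p.abort()
    return "committed" if decision else "aborted"


ok_group = [Participant("g1"), Participant("g2")]
ok_result = two_phase_commit(ok_group)

failing_group = [Participant("g1"), Participant("g2", can_commit=False)]
failing_result = two_phase_commit(failing_group)
```

In plain two-phase commit a crashed coordinator can leave prepared participants blocked; replicating coordinator and participant state through Paxos is the slide's answer to that weakness.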
Spanner components
• Universe master (status + interactive debugging)
• Placement driver (move data across zones automatically)
• TrueTime service*
• Zones (physical locations), connected by the network: Zone 1, Zone 2, …
   – Zone master (assign data)
   – Location proxies (locate data)
   – Span servers (data)
Zones, directories and Paxos groups




              Fig: (b)
                         Figures (a),(b) from Spanner, OSDI 2012 paper
Replication-related components
• Tablet: unit of storage
  – Bag of directories
  – Abstraction on top of underlying DFS Colossus
• Single Paxos state machine (replica) per tablet
• Replicas of each tablet form a Paxos group
• Leader elected among a Paxos group

[Figure: a Paxos group of tablet replicas (DC1,n2; DC2,n8; …), each holding directories (dirs); one replica is the Paxos leader]
Transaction-related components
[Figure: transaction T5 spanning several Paxos groups of tablet replicas (DC1,n2; DC2,n8; …)]
• Paxos group (Participant)
   – Participant leader: the group's Paxos leader
   – Participant slaves: the other tablet replicas
• Paxos group (Coordinator)
   – Coordinator leader (2PC + 2PL): the group's Paxos leader
   – Coordinator slaves: the other tablet replicas
Next:
• Serializability is ensured by the components
  already explained
• External consistency is implemented with the help
  of the TrueTime service
  – The TrueTime service is also used for leader election
    using timed leases
TrueTime + transaction
   implementation



     [by Aditya]
Implications of Spanner



     [REMOVED]
Thank you




• Image credits
  – Figures (a),(b) from Spanner, OSDI 2012 paper

Weitere ähnliche Inhalte

Was ist angesagt?

Building zero data loss pipelines with apache kafka
Building zero data loss pipelines with apache kafkaBuilding zero data loss pipelines with apache kafka
Building zero data loss pipelines with apache kafkaAvinash Ramineni
 
Disaster Recovery Plans for Apache Kafka
Disaster Recovery Plans for Apache KafkaDisaster Recovery Plans for Apache Kafka
Disaster Recovery Plans for Apache Kafkaconfluent
 
Webinar: Deep Dive on Apache Flink State - Seth Wiesman
Webinar: Deep Dive on Apache Flink State - Seth WiesmanWebinar: Deep Dive on Apache Flink State - Seth Wiesman
Webinar: Deep Dive on Apache Flink State - Seth WiesmanVerverica
 
Oracle RAC 12c Best Practices with Appendices DOAG2013
Oracle RAC 12c Best Practices with Appendices DOAG2013Oracle RAC 12c Best Practices with Appendices DOAG2013
Oracle RAC 12c Best Practices with Appendices DOAG2013Markus Michalewicz
 
GC Tuning in the HotSpot Java VM - a FISL 10 Presentation
GC Tuning in the HotSpot Java VM - a FISL 10 PresentationGC Tuning in the HotSpot Java VM - a FISL 10 Presentation
GC Tuning in the HotSpot Java VM - a FISL 10 PresentationLudovic Poitou
 
Enable GoldenGate Monitoring with OEM 12c/JAgent
Enable GoldenGate Monitoring with OEM 12c/JAgentEnable GoldenGate Monitoring with OEM 12c/JAgent
Enable GoldenGate Monitoring with OEM 12c/JAgentBobby Curtis
 
Sql server replication step by step
Sql server replication step by stepSql server replication step by step
Sql server replication step by steplaonap166
 
Oracle_Multitenant_19c_-_All_About_Pluggable_D.pdf
Oracle_Multitenant_19c_-_All_About_Pluggable_D.pdfOracle_Multitenant_19c_-_All_About_Pluggable_D.pdf
Oracle_Multitenant_19c_-_All_About_Pluggable_D.pdfSrirakshaSrinivasan2
 
Why Splunk Chose Pulsar_Karthik Ramasamy
Why Splunk Chose Pulsar_Karthik RamasamyWhy Splunk Chose Pulsar_Karthik Ramasamy
Why Splunk Chose Pulsar_Karthik RamasamyStreamNative
 
OOUG - Oracle Performance Tuning with AAS
OOUG - Oracle Performance Tuning with AASOOUG - Oracle Performance Tuning with AAS
OOUG - Oracle Performance Tuning with AASKyle Hailey
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkFlink Forward
 
PostgreSQL Replication High Availability Methods
PostgreSQL Replication High Availability MethodsPostgreSQL Replication High Availability Methods
PostgreSQL Replication High Availability MethodsMydbops
 
Scaling Apache Spark at Facebook
Scaling Apache Spark at FacebookScaling Apache Spark at Facebook
Scaling Apache Spark at FacebookDatabricks
 
Oracle rac cachefusion - High Availability Day 2015
Oracle rac cachefusion - High Availability Day 2015Oracle rac cachefusion - High Availability Day 2015
Oracle rac cachefusion - High Availability Day 2015aioughydchapter
 
MAA Best Practices for Oracle Database 19c
MAA Best Practices for Oracle Database 19cMAA Best Practices for Oracle Database 19c
MAA Best Practices for Oracle Database 19cMarkus Michalewicz
 
Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...
Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...
Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...DataStax Academy
 

Was ist angesagt? (20)

Building zero data loss pipelines with apache kafka
Building zero data loss pipelines with apache kafkaBuilding zero data loss pipelines with apache kafka
Building zero data loss pipelines with apache kafka
 
Disaster Recovery Plans for Apache Kafka
Disaster Recovery Plans for Apache KafkaDisaster Recovery Plans for Apache Kafka
Disaster Recovery Plans for Apache Kafka
 
Webinar: Deep Dive on Apache Flink State - Seth Wiesman
Webinar: Deep Dive on Apache Flink State - Seth WiesmanWebinar: Deep Dive on Apache Flink State - Seth Wiesman
Webinar: Deep Dive on Apache Flink State - Seth Wiesman
 
Oracle RAC 12c Best Practices with Appendices DOAG2013
Oracle RAC 12c Best Practices with Appendices DOAG2013Oracle RAC 12c Best Practices with Appendices DOAG2013
Oracle RAC 12c Best Practices with Appendices DOAG2013
 
Analyzing awr report
Analyzing awr reportAnalyzing awr report
Analyzing awr report
 
AlwaysON Basics
AlwaysON BasicsAlwaysON Basics
AlwaysON Basics
 
GC Tuning in the HotSpot Java VM - a FISL 10 Presentation
GC Tuning in the HotSpot Java VM - a FISL 10 PresentationGC Tuning in the HotSpot Java VM - a FISL 10 Presentation
GC Tuning in the HotSpot Java VM - a FISL 10 Presentation
 
Enable GoldenGate Monitoring with OEM 12c/JAgent
Enable GoldenGate Monitoring with OEM 12c/JAgentEnable GoldenGate Monitoring with OEM 12c/JAgent
Enable GoldenGate Monitoring with OEM 12c/JAgent
 
Galera Cluster Best Practices for DBA's and DevOps Part 1
Galera Cluster Best Practices for DBA's and DevOps Part 1Galera Cluster Best Practices for DBA's and DevOps Part 1
Galera Cluster Best Practices for DBA's and DevOps Part 1
 
Sql server replication step by step
Sql server replication step by stepSql server replication step by step
Sql server replication step by step
 
Oracle_Multitenant_19c_-_All_About_Pluggable_D.pdf
Oracle_Multitenant_19c_-_All_About_Pluggable_D.pdfOracle_Multitenant_19c_-_All_About_Pluggable_D.pdf
Oracle_Multitenant_19c_-_All_About_Pluggable_D.pdf
 
Why Splunk Chose Pulsar_Karthik Ramasamy
Why Splunk Chose Pulsar_Karthik RamasamyWhy Splunk Chose Pulsar_Karthik Ramasamy
Why Splunk Chose Pulsar_Karthik Ramasamy
 
OOUG - Oracle Performance Tuning with AAS
OOUG - Oracle Performance Tuning with AASOOUG - Oracle Performance Tuning with AAS
OOUG - Oracle Performance Tuning with AAS
 
Lost with data consistency
Lost with data consistencyLost with data consistency
Lost with data consistency
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
PostgreSQL Replication High Availability Methods
PostgreSQL Replication High Availability MethodsPostgreSQL Replication High Availability Methods
PostgreSQL Replication High Availability Methods
 
Scaling Apache Spark at Facebook
Scaling Apache Spark at FacebookScaling Apache Spark at Facebook
Scaling Apache Spark at Facebook
 
Oracle rac cachefusion - High Availability Day 2015
Oracle rac cachefusion - High Availability Day 2015Oracle rac cachefusion - High Availability Day 2015
Oracle rac cachefusion - High Availability Day 2015
 
MAA Best Practices for Oracle Database 19c
MAA Best Practices for Oracle Database 19cMAA Best Practices for Oracle Database 19c
MAA Best Practices for Oracle Database 19c
 
Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...
Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...
Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...
 

Andere mochten auch

Try Cloud Spanner
Try Cloud SpannerTry Cloud Spanner
Try Cloud SpannerSimon Su
 
Spanner - Google distributed database
Spanner - Google distributed databaseSpanner - Google distributed database
Spanner - Google distributed databaseAbhra Basak
 
[SSA] 03.newsql database (2014.02.05)
[SSA] 03.newsql database (2014.02.05)[SSA] 03.newsql database (2014.02.05)
[SSA] 03.newsql database (2014.02.05)Steve Min
 
Google Cloud Monitoring
Google Cloud MonitoringGoogle Cloud Monitoring
Google Cloud MonitoringSimon Su
 
Decentralized cloud an industrial reality with higher resilience by jean-pa...
Decentralized cloud   an industrial reality with higher resilience by jean-pa...Decentralized cloud   an industrial reality with higher resilience by jean-pa...
Decentralized cloud an industrial reality with higher resilience by jean-pa...Khazret Sapenov
 
Docker presentation
Docker presentationDocker presentation
Docker presentationEugen Oskin
 
Get more from Analytics 360 with BigQuery and the Google Cloud Platform
Get more from Analytics 360 with BigQuery and the Google Cloud PlatformGet more from Analytics 360 with BigQuery and the Google Cloud Platform
Get more from Analytics 360 with BigQuery and the Google Cloud Platformjavier ramirez
 
DockerCon US 2016 - Scaling Open Source operations
DockerCon US 2016 - Scaling Open Source operationsDockerCon US 2016 - Scaling Open Source operations
DockerCon US 2016 - Scaling Open Source operationsArnaud Porterie
 
Google BigQuery for Everyday Developer
Google BigQuery for Everyday DeveloperGoogle BigQuery for Everyday Developer
Google BigQuery for Everyday DeveloperMárton Kodok
 
Complex realtime event analytics using BigQuery @Crunch Warmup
Complex realtime event analytics using BigQuery @Crunch WarmupComplex realtime event analytics using BigQuery @Crunch Warmup
Complex realtime event analytics using BigQuery @Crunch WarmupMárton Kodok
 
Hilscher netIOT - Industrial Cloud Communication
Hilscher netIOT - Industrial Cloud CommunicationHilscher netIOT - Industrial Cloud Communication
Hilscher netIOT - Industrial Cloud CommunicationArmin Pühringer
 
Pub/Sub for the masses- Ein Einführungsworkshop in MQTT [GERMAN]
Pub/Sub for the masses- Ein Einführungsworkshop in MQTT [GERMAN]Pub/Sub for the masses- Ein Einführungsworkshop in MQTT [GERMAN]
Pub/Sub for the masses- Ein Einführungsworkshop in MQTT [GERMAN]Dominik Obermaier
 
WSO2 Cloud Platform: Vision and Roadmap
WSO2 Cloud Platform: Vision and RoadmapWSO2 Cloud Platform: Vision and Roadmap
WSO2 Cloud Platform: Vision and RoadmapWSO2
 
From stream to recommendation using apache beam with cloud pubsub and cloud d...
From stream to recommendation using apache beam with cloud pubsub and cloud d...From stream to recommendation using apache beam with cloud pubsub and cloud d...
From stream to recommendation using apache beam with cloud pubsub and cloud d...Neville Li
 

Andere mochten auch (18)

Try Cloud Spanner
Try Cloud SpannerTry Cloud Spanner
Try Cloud Spanner
 
Google Cloud Spanner Preview
Google Cloud Spanner PreviewGoogle Cloud Spanner Preview
Google Cloud Spanner Preview
 
Spanner - Google distributed database
Spanner - Google distributed databaseSpanner - Google distributed database
Spanner - Google distributed database
 
Spanner
SpannerSpanner
Spanner
 
MapReduce basics
MapReduce basicsMapReduce basics
MapReduce basics
 
[SSA] 03.newsql database (2014.02.05)
[SSA] 03.newsql database (2014.02.05)[SSA] 03.newsql database (2014.02.05)
[SSA] 03.newsql database (2014.02.05)
 
Google Cloud Monitoring
Google Cloud MonitoringGoogle Cloud Monitoring
Google Cloud Monitoring
 
Decentralized cloud an industrial reality with higher resilience by jean-pa...
Decentralized cloud   an industrial reality with higher resilience by jean-pa...Decentralized cloud   an industrial reality with higher resilience by jean-pa...
Decentralized cloud an industrial reality with higher resilience by jean-pa...
 
The Real Time Cloud
The Real Time CloudThe Real Time Cloud
The Real Time Cloud
 
Docker presentation
Docker presentationDocker presentation
Docker presentation
 
Get more from Analytics 360 with BigQuery and the Google Cloud Platform
Get more from Analytics 360 with BigQuery and the Google Cloud PlatformGet more from Analytics 360 with BigQuery and the Google Cloud Platform
Get more from Analytics 360 with BigQuery and the Google Cloud Platform
 
DockerCon US 2016 - Scaling Open Source operations
DockerCon US 2016 - Scaling Open Source operationsDockerCon US 2016 - Scaling Open Source operations
DockerCon US 2016 - Scaling Open Source operations
 
Google BigQuery for Everyday Developer
Google BigQuery for Everyday DeveloperGoogle BigQuery for Everyday Developer
Google BigQuery for Everyday Developer
 
Complex realtime event analytics using BigQuery @Crunch Warmup
Complex realtime event analytics using BigQuery @Crunch WarmupComplex realtime event analytics using BigQuery @Crunch Warmup
Complex realtime event analytics using BigQuery @Crunch Warmup
 
Hilscher netIOT - Industrial Cloud Communication
Hilscher netIOT - Industrial Cloud CommunicationHilscher netIOT - Industrial Cloud Communication
Hilscher netIOT - Industrial Cloud Communication
 
Pub/Sub for the masses- Ein Einführungsworkshop in MQTT [GERMAN]
Pub/Sub for the masses- Ein Einführungsworkshop in MQTT [GERMAN]Pub/Sub for the masses- Ein Einführungsworkshop in MQTT [GERMAN]
Pub/Sub for the masses- Ein Einführungsworkshop in MQTT [GERMAN]
 
WSO2 Cloud Platform: Vision and Roadmap
WSO2 Cloud Platform: Vision and RoadmapWSO2 Cloud Platform: Vision and Roadmap
WSO2 Cloud Platform: Vision and Roadmap
 
From stream to recommendation using apache beam with cloud pubsub and cloud d...
From stream to recommendation using apache beam with cloud pubsub and cloud d...From stream to recommendation using apache beam with cloud pubsub and cloud d...
From stream to recommendation using apache beam with cloud pubsub and cloud d...
 

Ähnlich wie Google Spanner : our understanding of concepts and implications

Bigdata and Hadoop
 Bigdata and Hadoop Bigdata and Hadoop
Bigdata and HadoopGirish L
 
Memory-Based Cloud Architectures
Memory-Based Cloud ArchitecturesMemory-Based Cloud Architectures
Memory-Based Cloud Architectures小新 制造
 
Realtime olap architecture in apache kylin 3.0
Realtime olap architecture in apache kylin 3.0Realtime olap architecture in apache kylin 3.0
Realtime olap architecture in apache kylin 3.0Shi Shao Feng
 
Technologies For Appraising and Managing Electronic Records
Technologies For Appraising and Managing Electronic RecordsTechnologies For Appraising and Managing Electronic Records
Technologies For Appraising and Managing Electronic Recordspbajcsy
 
Webinar: Understanding Storage for Performance and Data Safety
Webinar: Understanding Storage for Performance and Data SafetyWebinar: Understanding Storage for Performance and Data Safety
Webinar: Understanding Storage for Performance and Data SafetyMongoDB
 
Apache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - VerisignApache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - VerisignMichael Noll
 
Scaling Out With Hadoop And HBase
Scaling Out With Hadoop And HBaseScaling Out With Hadoop And HBase
Scaling Out With Hadoop And HBaseAge Mooij
 
Advanced databases ben stopford
Advanced databases   ben stopfordAdvanced databases   ben stopford
Advanced databases ben stopfordBen Stopford
 
Introduction to search engine-building with Lucene
Introduction to search engine-building with LuceneIntroduction to search engine-building with Lucene
Introduction to search engine-building with LuceneKai Chan
 
Linux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsBrendan Gregg
 
Presentation by TachyonNexus & Intel at Strata Singapore 2015
Presentation by TachyonNexus & Intel at Strata Singapore 2015Presentation by TachyonNexus & Intel at Strata Singapore 2015
Presentation by TachyonNexus & Intel at Strata Singapore 2015Tachyon Nexus, Inc.
 
Linux Perf Tools
Linux Perf ToolsLinux Perf Tools
Linux Perf ToolsRaj Pandey
 
VDI storage and storage virtualization
VDI storage and storage virtualizationVDI storage and storage virtualization
VDI storage and storage virtualizationSisimon Soman
 
Chorus - Distributed Operating System [ case study ]
Chorus - Distributed Operating System [ case study ]Chorus - Distributed Operating System [ case study ]
Chorus - Distributed Operating System [ case study ]Akhil Nadh PC
 

Ähnlich wie Google Spanner : our understanding of concepts and implications (20)

Spanner (may 19)
Spanner (may 19)Spanner (may 19)
Spanner (may 19)
 
Bigdata and Hadoop
 Bigdata and Hadoop Bigdata and Hadoop
Bigdata and Hadoop
 
Bayesian Counters
Bayesian CountersBayesian Counters
Bayesian Counters
 
Memory-Based Cloud Architectures
Memory-Based Cloud ArchitecturesMemory-Based Cloud Architectures
Memory-Based Cloud Architectures
 
Realtime olap architecture in apache kylin 3.0
Realtime olap architecture in apache kylin 3.0Realtime olap architecture in apache kylin 3.0
Realtime olap architecture in apache kylin 3.0
 
Technologies For Appraising and Managing Electronic Records
Technologies For Appraising and Managing Electronic RecordsTechnologies For Appraising and Managing Electronic Records
Technologies For Appraising and Managing Electronic Records
 
Webinar: Understanding Storage for Performance and Data Safety
Webinar: Understanding Storage for Performance and Data SafetyWebinar: Understanding Storage for Performance and Data Safety
Webinar: Understanding Storage for Performance and Data Safety
 
Apache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - VerisignApache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - Verisign
 
Scaling Out With Hadoop And HBase
Scaling Out With Hadoop And HBaseScaling Out With Hadoop And HBase
Scaling Out With Hadoop And HBase
 
Advanced databases ben stopford
Advanced databases   ben stopfordAdvanced databases   ben stopford
Advanced databases ben stopford
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Introduction to search engine-building with Lucene
Introduction to search engine-building with LuceneIntroduction to search engine-building with Lucene
Introduction to search engine-building with Lucene
 
Ch3-2
Ch3-2Ch3-2
Ch3-2
 
Linux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old Secrets
 
Presentation by TachyonNexus & Intel at Strata Singapore 2015
Presentation by TachyonNexus & Intel at Strata Singapore 2015Presentation by TachyonNexus & Intel at Strata Singapore 2015
Presentation by TachyonNexus & Intel at Strata Singapore 2015
 
1mb copy of newdoc
1mb copy of newdoc 1mb copy of newdoc
1mb copy of newdoc
 
Linux Perf Tools
Linux Perf ToolsLinux Perf Tools
Linux Perf Tools
 
VDI storage and storage virtualization
VDI storage and storage virtualizationVDI storage and storage virtualization
VDI storage and storage virtualization
 
Chorus - Distributed Operating System [ case study ]
Chorus - Distributed Operating System [ case study ]Chorus - Distributed Operating System [ case study ]
Chorus - Distributed Operating System [ case study ]
 
1.1 Overview.pdf
1.1 Overview.pdf1.1 Overview.pdf
1.1 Overview.pdf
 

Kürzlich hochgeladen

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Google Spanner: our understanding of concepts and implications

  • 1. Google Spanner: our understanding of concepts and implications Harisankar H DOS lab weekly seminar 8/Dec/2012 http://harisankarh.wordpress.com "Google Spanner: our understanding of concepts and implications" by Harisankar H is licensed under a Creative Commons Attribution 3.0 Unported License.
  • 2. Outline
      • Spanner
          – User perspective
              • User = application programmer/administrator
          – System architecture
          – Implications
  • 3. Spanner: user perspective
      • Global scale database with strict transactional guarantees
          – Global scale
              • Designed to work across datacenters in different continents
              • Claim: “designed to scale up to millions of nodes, hundreds of datacenters, trillions of database rows”
          – Strict transactional guarantees
              • Supports general transactions (even inter-row)
              • Stronger properties than serializability*
                  – Replaced MySQL cluster storing their critical ad-related data
              • Reliable even during wide-area natural disasters
          – Supports hierarchical schema of tables
              • Semi-relational
                  – Supports SQL-like query and definition language
          – User-defined locality and availability
      * means: explained in later slides
  • 4. Need for Spanner
      • Limitations of existing systems
          – BigTable (could apply to NoSQL systems in general)
              • Needed complex, evolving schemas
              • Only eventual consistency across data centers
                  – Needed wide-area replication with strong consistency
              • Transactional scope limited to single row
                  – Needed general cross-row transactions
          – Megastore (relational db-like system)
              • Low performance
                  – Layered on top of BigTable
                      » High communication costs
                  – Less efficient replica consistency algorithms*
              • Better transactional guarantees in Spanner*
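  • 5. Spanner: transactional guarantee
      • External consistency
          – Stricter than serializability
          – E.g., three transactions T1, T2 and T3, with T2 after T1 in physical time
              [Figure: T1, T2 and T3 on a physical-time axis, followed by candidate serial orderings: T1 T3 T2; T1 T2 T3; T2 T3 T1; T2 T1 T3. Since T2 comes after T1, external consistency admits only the orderings that place T1 before T2.]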
  • 6. External consistency: motivation
      • Facebook-like example from the OSDI talk
          – Jerry unfriends Tom to write a controversial comment
          – [Figure: physical-time axis with T1: unfriend Tom (by Jerry), T2: post comment, T3: view Jerry’s profile (by Tom)]
          – Serial order shown: T2: Jerry posts comment, T3: Tom views Jerry’s profile, T1: Jerry unfriends Tom
          – If the serial order is as above, Jerry will be in trouble!
      • Formally, “If commit of T1 preceded the initiation of a new transaction T2 in wall-clock(physical) time, then commit of T1 should precede commit of T2 in the serial ordering also.”
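The formal condition on slide 6 can be checked mechanically. The following is a minimal sketch, not Spanner's implementation: the `intervals` values are made-up wall-clock (start, commit) times chosen so that T1 commits before T2 starts, and `externally_consistent` is a hypothetical helper name.

```python
from itertools import permutations

# Hypothetical wall-clock (start, commit) times; assumption for illustration only.
# T1 commits before T2 starts; T3 overlaps both.
intervals = {"T1": (0, 2), "T2": (3, 5), "T3": (1, 4)}

def externally_consistent(order, intervals):
    """True if every transaction that committed before another one started
    also appears before it in the given serial order."""
    pos = {name: i for i, name in enumerate(order)}
    for a, (_, a_commit) in intervals.items():
        for b, (b_start, _) in intervals.items():
            if a != b and a_commit < b_start and pos[a] > pos[b]:
                return False
    return True

# Enumerate every serial order and keep the externally consistent ones.
allowed = [o for o in permutations(intervals) if externally_consistent(o, intervals)]
for order in allowed:
    print(" ".join(order))
```

With these assumed times, every order that places T2 before T1 is rejected, which is the situation that would get Jerry into trouble.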
  • 7. Spanner: transactional guarantee
      • Additional (weaker) transaction modes for performance
          – Read-only transaction supporting snapshot isolation
              • Snapshot isolation
                  – Transactions read a consistent snapshot of the database
                  – Values written should not have conflicting updates after the snapshot was read
                  – E.g., R1(X) R1(Y) R2(X) R2(Y) W2(Y) W1(X) is allowed
                  – Weaker than serializability, but more efficient (lock-free)
                  – Spanner does not allow writes in these transactions
                      » Probably, that is how they preserve isolation
          – Snapshot read
              • Read of a consistent state of the database in the past
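To see why the schedule on slide 7 is allowed, here is a small sketch of the usual "first committer wins" rule for snapshot isolation. It is a generic textbook model, not Spanner code; the `Txn` class, the logical timestamps and `si_commits` are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Txn:
    name: str
    start: int         # logical time at which the snapshot was taken
    commit: int        # logical time of the commit attempt
    writes: frozenset  # items the transaction writes

def si_commits(txns):
    """Return the names of transactions that commit under first-committer-wins."""
    committed = []
    for t in sorted(txns, key=lambda t: t.commit):
        # Conflict: an already committed transaction wrote one of t's items
        # after t's snapshot was taken.
        conflict = any(c.commit > t.start and (c.writes & t.writes) for c in committed)
        if not conflict:
            committed.append(t)
    return [t.name for t in committed]

# Slide schedule: both read X and Y, then T2 writes Y and T1 writes X.
slide = [Txn("T1", 1, 4, frozenset({"X"})), Txn("T2", 2, 3, frozenset({"Y"}))]
# Variant where both write Y: the later committer must abort.
clash = [Txn("T1", 1, 4, frozenset({"Y"})), Txn("T2", 2, 3, frozenset({"Y"}))]

print(si_commits(slide))
print(si_commits(clash))
```

The write sets in the slide's schedule are disjoint, so both transactions commit even though neither saw the other's write. That is the sense in which snapshot isolation is weaker than serializability.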
  • 8. Hierarchical data model
      – Universes (Spanner deployments)
          • Databases (collections of tables)
              – Tables with schemas
                  » Ordered rows, columns
                  » One or more primary-key columns
                      • Rows are named using their primary keys
              – Hierarchies of tables
                  » Directory tables (top of the table hierarchy)
          • Directories
              • Each row in a directory table (with key K), along with the rows in descendant tables whose keys start with K, forms a directory
      Fig: (a). Figures (a),(b) from Spanner, OSDI 2012 paper
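The directory definition on slide 8 amounts to grouping rows by key prefix. The sketch below illustrates that under assumptions: the table names `Users` and `Albums`, the key values and the `directories` helper are invented for the example and are not taken from the slides.

```python
from collections import defaultdict

# Hypothetical rows as (table name, primary-key tuple); "Users" plays the
# role of the directory table and "Albums" is a descendant table.
rows = [
    ("Users", (1,)),
    ("Albums", (1, 10)),
    ("Albums", (1, 11)),
    ("Users", (2,)),
    ("Albums", (2, 10)),
]

def directories(rows, directory_table):
    """Group each directory-table row (key K) with the descendant rows
    whose keys start with K."""
    roots = {key for table, key in rows if table == directory_table}
    dirs = defaultdict(list)
    for table, key in sorted(rows, key=lambda r: r[1]):
        for root in roots:
            if key[:len(root)] == root:
                dirs[root].append((table, key))
    return dict(dirs)

for root, members in directories(rows, "Users").items():
    print(root, members)
```

Because rows are ordered by primary key, the members of one directory are contiguous in key order, which is what makes a directory a convenient unit of placement.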
  • 9. User perspective: database configuration
      • Database placement and reliability
          – Administrator:
              • Creates options which specify the number of replicas and their placement
                  – E.g., option (a): North America: 5 replicas, Europe: 3 replicas; option (b): Latin America: 3 replicas …
          – Application:
              • A directory is the smallest unit for which these properties can be specified
              • Tags each directory or database with these options
                  – E.g., TomDir1: option (b); JerryDir3: option (a) ….
      Next: System architecture
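The split between administrator-defined options and application-level tags on slide 9 can be pictured as two lookup tables. This is only an illustration of the idea; the dictionary layout and the helper names `placement` and `total_replicas` are assumptions, and only the values listed on the slide are included.

```python
# Administrator side: named options mapping region -> number of replicas.
# Only the entries shown on the slide are listed.
options = {
    "a": {"North America": 5, "Europe": 3},
    "b": {"Latin America": 3},
}

# Application side: each directory is tagged with one option.
tags = {"TomDir1": "b", "JerryDir3": "a"}

def placement(directory):
    """Return the region -> replica-count mapping for a directory."""
    return options[tags[directory]]

def total_replicas(directory):
    """Total number of replicas the directory's option asks for."""
    return sum(placement(directory).values())

print(placement("JerryDir3"), total_replicas("JerryDir3"))
print(placement("TomDir1"), total_replicas("TomDir1"))
```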
  • 10. Spanner architecture: basics
      • Replica consistency
          – Using the Paxos protocol
              • Different Paxos groups for different sets of directories
                  – Can be across data centers
      • Concurrency control
          – Using two-phase locking
              • Chosen over optimistic methods because of long-lived transactions (order of minutes)
      • Transaction coordination
          – Two-phase commit
              • Two-phase commit on top of Paxos ensures availability
      • Timestamps for transactions and data items
          – To support snapshot isolation and snapshot reads
          – Multiple timestamped versions of data items maintained
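The last point on slide 10, keeping multiple timestamped versions so that reads can be served as of a past time, can be sketched with a tiny in-memory store. This is a generic multi-version illustration, not Spanner's storage format; `VersionedStore` and its methods are hypothetical names.

```python
import bisect

class VersionedStore:
    """Keeps every written version of a key, tagged with its timestamp."""

    def __init__(self):
        # key -> (sorted list of timestamps, list of values in the same order)
        self._versions = {}

    def write(self, key, value, ts):
        stamps, values = self._versions.setdefault(key, ([], []))
        i = bisect.bisect_right(stamps, ts)
        stamps.insert(i, ts)
        values.insert(i, value)

    def snapshot_read(self, key, ts):
        """Return the latest value written at or before ts, or None."""
        stamps, values = self._versions.get(key, ([], []))
        i = bisect.bisect_right(stamps, ts)
        return values[i - 1] if i else None

store = VersionedStore()
store.write("x", "v1", ts=10)
store.write("x", "v2", ts=20)
print(store.snapshot_read("x", 15))
print(store.snapshot_read("x", 25))
```

A read at timestamp 15 sees only the version written at 10 and ignores the later write, so readers do not need to lock out writers.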
  • 11. Spanner components
      [Figure: component diagram]
      • Universe master (status + interactive debugging)
      • Placement driver (move data across zones automatically)
      • Network
      • TrueTime service*
      • Zone 1, Zone 2, … (each a physical location), each containing:
          – Zone master (assign data)
          – Location proxies (locate data)
          – Span servers (data)
  • 12. Zones, directories and Paxos groups Fig: (b) Figures (a),(b) from Spanner, OSDI 2012 paper
  • 13. Replication-related components
      • Tablet: unit of storage
          – Bag of directories
          – Abstraction on top of the underlying DFS, Colossus
      • Single Paxos state machine (replica) per tablet
      • Replicas of each tablet form a Paxos group
      • Leader elected among a Paxos group
      [Figure: a Paxos group of tablet replicas (e.g., DC1,n2 and DC2,n8), each holding directories, with one replica acting as Paxos leader]
  • 14. Transaction-related components
      [Figure: transaction T5 spanning two Paxos groups]
      • Paxos group (participant): participant leader (the Paxos leader) and participant slaves, each a tablet replica
      • Paxos group (coordinator): coordinator leader (the Paxos leader; 2PC + 2PL) and coordinator slaves, each a tablet replica (e.g., DC1,n2 and DC2,n8)
  • 15. Next:
      • Serializability is ensured by the components already explained
      • External consistency is implemented with the help of the TrueTime service
          – The TrueTime service is also used for leader election using timed leases
  • 16. TrueTime + transaction implementation [by Aditya]
  • 18. Thank you • Image credits – Figures (a),(b) from Spanner, OSDI 2012 paper