1. Google Spanner: our understanding of concepts and implications
Harisankar H
DOS lab weekly seminar
8/Dec/2012
http://harisankarh.wordpress.com
"Google Spanner: our understanding of concepts and
implications" by Harisankar H is licensed under a
Creative Commons Attribution 3.0 Unported License.
2. Outline
• Spanner
– User perspective
• User = application programmer/administrator
– System architecture
– Implications
3. Spanner: user perspective
• Global scale database with strict transactional
guarantees
– Global scale
• designed to work across datacenters in different continents
• Claim: “designed to scale up to millions of nodes, hundreds of
datacenters, trillions of database rows”
– Strict transactional guarantees
• Supports general transactions (even inter-row)
• Stronger properties than serializability*
– Replaced the MySQL cluster storing Google's critical ad-related data
• Reliable even during wide-area natural disasters
– Supports hierarchical schema of tables
• Semi-relational
– Supports SQL-like query and definition language
– User-defined locality and availability
* means: explained in later slides
4. Need for Spanner
• Limitations of existing systems
– BigTable (could apply to NoSQL systems in general)
• Needed complex, evolving schemas
• Only eventual consistency across data centers
– Needed wide-area replication with strong consistency
• Transactional scope limited to single row
– Needed general cross-row transactions
– Megastore (relational DB-like system)
• Low performance
– Layered on top of BigTable
» High communication costs
– Less efficient replica consistency algorithms*
• Better transactional guarantees in Spanner*
5. Spanner: transactional guarantee
• External consistency
– Stricter than serializability
– E.g., [Fig: on the physical timeline, T1 commits before T2 starts, and T3 overlaps both]
• Serializability allows any serial ordering: T1 T3 T2, T1 T2 T3, T2 T3 T1, T2 T1 T3
• External consistency additionally requires T2 after T1, so only T1 T3 T2 and T1 T2 T3 remain
6. External consistency: motivation
• Facebook-like example from OSDI talk
by Tom T3: view Jerry’s profile
T1: unfriend Tom
by Jerry T2: post comment
physical time
Jerry unfriends Tom to write a controversial comment
T2: Jerry posts comment T3: Tom views Jerry’s profile T1: Jerry unfriends Tom
If serial order is as above, Jerry will be in trouble!
Formally, “If commit of T1 preceded the initiation of a new transaction T2 in
wall-clock(physical) time, then commit of T1 should precede commit of T2 in
the serial ordering also. ”
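As an illustration of this definition, here is a minimal Python sketch (ours, not from the talk; the function name and timestamps are made up) that checks whether a proposed serial order is externally consistent given wall-clock start and commit times. Applied to the scenario above, only the orders that place T1 before T2 pass:

def externally_consistent(order, start, commit):
    # order: transaction ids in the proposed serial order
    # start/commit: wall-clock times per transaction id
    for i, a in enumerate(order):
        for b in order[i + 1:]:
            # b is serialized after a; if b actually committed before a
            # even started, the serial order contradicts wall-clock order
            if commit[b] < start[a]:
                return False
    return True

start  = {"T1": 1, "T2": 4, "T3": 2}   # T1 commits before T2 starts
commit = {"T1": 3, "T2": 6, "T3": 7}   # T3 overlaps both

for order in (["T1", "T3", "T2"], ["T1", "T2", "T3"],
              ["T2", "T3", "T1"], ["T2", "T1", "T3"]):
    print(order, externally_consistent(order, start, commit))
# only the two orderings with T1 before T2 print True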
7. Spanner: transactional guarantee
• Additional (weaker) transaction modes for
performance
– Read-only transaction supporting snapshot isolation
• Snapshot isolation
– Transactions read a consistent snapshot of the database
– Values written should not have conflicting updates after the
snapshot was read
– E.g., R1(X) R1(Y) R2(X) R2(Y) W2(Y) W1(X) is allowed
– Weaker than serializability, but more efficient (lock-free)
– Spanner does not allow writes in these transactions
» Probably, that is how they preserve isolation
– Snapshot read
• Read of a consistent state of the database in the past
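A minimal Python sketch (ours; a simplified first-committer-wins check, not Spanner's implementation, and the function name is made up) of why the schedule above is allowed under snapshot isolation: the two transactions write disjoint items, so neither sees a conflicting update after its snapshot (the classic write-skew anomaly):

def si_allows(txns):
    # txns: list of (write_set, snapshot_ts, commit_ts)
    # first-committer-wins: a txn may commit only if no overlapping write
    # was committed by another txn between its snapshot and its commit
    for w1, snap1, c1 in txns:
        for w2, _, c2 in txns:
            if c2 != c1 and snap1 < c2 < c1 and (w1 & w2):
                return False
    return True

# R1(X) R1(Y) R2(X) R2(Y) W2(Y) W1(X): both take a snapshot at ts 0;
# T2 writes Y and commits at ts 1; T1 writes X and commits at ts 2
T1 = ({"X"}, 0, 2)
T2 = ({"Y"}, 0, 1)
print(si_allows([T1, T2]))   # True: write sets are disjoint (write skew)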
8. Hierarchical data model
– Universes (Spanner deployments)
• Databases (collections of tables)
– Tables with schemas
» Ordered rows, columns
» One or more primary-key columns
• Rows named by their primary keys
– Hierarchies of tables
» Directory tables (top of the table hierarchy)
• Directories
• Each row in a directory table (with key K), along with the rows in descendant tables whose keys start with K, forms a directory
[Fig. (a) from the Spanner OSDI 2012 paper]
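A minimal Python sketch (ours; the table names and row contents are made up) of how directories are formed: every row of the directory table with key K is grouped together with the descendant-table rows whose keys start with K:

from collections import defaultdict

# keys are tuples; descendant-table keys extend the parent's key K
rows = [
    ("Users",  (1,),    {"email": "tom@example.com"}),   # directory table
    ("Albums", (1, 10), {"name": "holiday"}),            # descendant rows
    ("Albums", (1, 11), {"name": "pets"}),
    ("Users",  (2,),    {"email": "jerry@example.com"}),
    ("Albums", (2, 10), {"name": "work"}),
]

directories = defaultdict(list)
for table, key, cols in rows:
    directories[key[:1]].append((table, key))   # group by prefix K

for k, contents in directories.items():
    print(k, "->", contents)
# each Users row and the Albums rows sharing its key prefix form one directory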
9. User perspective: database configuration
• Database placement and reliability
– Administrator:
• Create options which specify number of replicas and
placement
– E.g., option (a): North America: 5 replicas, Europe: 3 replicas
option (b): Latin America: 3 replicas …
– Application
• Directory is the smallest unit for which these properties can
be specified
• Tag each directory or database with these options
– E.g., TomDir1: option (b)
JerryDir3: option (a) ….
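A minimal Python sketch (ours; the option and directory names are the hypothetical ones from the slide) of this two-step configuration: the administrator defines named options, and the application tags each directory with one:

# administrator-defined options: region -> number of replicas
options = {
    "a": {"North America": 5, "Europe": 3},
    "b": {"Latin America": 3},
}

# application tags directories (the smallest placement unit) with options
directory_tags = {"TomDir1": "b", "JerryDir3": "a"}

def placement(directory):
    """Replica placement chosen for a directory."""
    return options[directory_tags[directory]]

print(placement("TomDir1"))   # {'Latin America': 3}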
Next: System architecture
10. Spanner architecture: basics
• Replica consistency
– Using Paxos protocol
• Different Paxos groups for different sets of directories
– Can be across data centers
• Concurrency control
– Using two-phase locking
• Chosen over optimistic methods because of long-lived transactions (on the order of minutes)
• Transaction coordination
– Two-phase commit
• Running two-phase commit on top of Paxos groups mitigates its availability problems
• Timestamps for transactions and data items
– To support snapshot isolation and snapshot reads
– Multiple timestamped versions of data items maintained
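A minimal Python sketch (ours) of the last point: each data item keeps multiple timestamped versions, and a snapshot read at timestamp ts returns the latest version at or before ts:

import bisect
from collections import defaultdict

class MVStore:
    """Multi-versioned store: each write is kept with its timestamp."""
    def __init__(self):
        self.versions = defaultdict(list)   # key -> sorted [(ts, value)]

    def write(self, key, value, ts):
        bisect.insort(self.versions[key], (ts, value))

    def snapshot_read(self, key, ts):
        """Latest version with timestamp <= ts: a consistent past state."""
        vs = self.versions[key]
        i = bisect.bisect_right([t for t, _ in vs], ts)
        return vs[i - 1][1] if i else None

s = MVStore()
s.write("x", "v1", ts=10)
s.write("x", "v2", ts=20)
print(s.snapshot_read("x", 15))   # 'v1': the state of x at time 15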
11. Spanner components
[Fig: Spanner components]
• Universe master: status + interactive debugging
• Placement driver: moves data across zones automatically
• Zones (physical locations), each containing:
– Zone master: assigns data
– Location proxies: locate data
– Span servers: store and serve the data
• TrueTime service*
• All components communicate over the network
12. Zones, directories and Paxos groups
[Fig. (b) from the Spanner OSDI 2012 paper]
13. Replication-related components
• Tablet: unit of storage
– Bag of directories
– Abstraction on top of the underlying DFS, Colossus
• Single Paxos state machine (replica) per tablet
• Replicas of each tablet form a Paxos group
• Leader elected among a Paxos group
[Fig: a Paxos group = the replicas of one tablet (e.g., a tablet replica at DC1, node n2; another at DC2, node n8; …), each holding the same dirs, with one replica elected Paxos leader]
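A minimal Python sketch (ours) of the structures in the figure above: a tablet replica holds a bag of directories, and the replicas of one tablet form a Paxos group with an elected leader:

from dataclasses import dataclass, field

@dataclass
class TabletReplica:
    datacenter: str
    node: str
    directories: set = field(default_factory=set)   # the tablet's bag of dirs

@dataclass
class PaxosGroup:
    replicas: list                  # one TabletReplica per datacenter
    leader: TabletReplica = None    # elected among the group (next slide)

group = PaxosGroup(replicas=[
    TabletReplica("DC1", "n2", {"TomDir1", "JerryDir3"}),
    TabletReplica("DC2", "n8", {"TomDir1", "JerryDir3"}),
])
group.leader = group.replicas[0]    # elected via Paxos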
15. Next:
• Serializability is ensured by the components already explained
• External consistency is implemented with the help of the TrueTime service
– The TrueTime service is also used for leader election via timed leases
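Looking ahead, a minimal Python sketch (our simplification of the TrueTime API described in the paper): TT.now() returns an interval [earliest, latest] guaranteed to contain the true time, which lets a leader decide conservatively whether its timed lease has definitely expired:

import time

class TrueTime:
    """Simplified stand-in for the TrueTime API (TT.now, TT.after)."""
    def __init__(self, uncertainty=0.007):   # epsilon; ~1-7 ms in the paper
        self.eps = uncertainty

    def now(self):
        t = time.time()
        return (t - self.eps, t + self.eps)  # true time lies in this interval

    def after(self, t):
        # definitely-true check: even the earliest possible time exceeds t
        return self.now()[0] > t

tt = TrueTime()
lease_expiry = time.time() + 10   # hypothetical 10-second leader lease
# a leader steps down once its lease has *definitely* expired
print("lease definitely expired:", tt.after(lease_expiry))   # False for now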