Beyond True Time
Authors:
M. Demirbas and S. Kulkarni
Presenter: Vahid Mirjalili
http://vahidmirjalili.com
Outline
● Spanner True Time (TT)
– Clock uncertainty
● Issues:
– Special hardware needed to maintain tightly synchronized clocks
– Transaction delays (commit-wait)
● Proposed: Augmented Time (AT)
Spanner & TrueTime (Review)
● Spanner reads and writes:
– T1, T2: two transactions
– If ph.T1 < ph.T2 in physical (wall-clock) time, then their timestamps satisfy ts.T1 < ts.T2
● TT API:
– TT.now() → TT interval [earliest, latest]
– Error bound: ε=(latest – earliest)/2
● Error bound ε < 6 ms (using special-purpose time master machines)
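A minimal Python sketch of the TT API, assuming a simulated local clock whose error is bounded by a fixed `epsilon` (in real Spanner ε varies over time and is maintained with the help of the time masters). `SimulatedTrueTime` and `TTInterval` are hypothetical names; `after` and `before` mirror the two companion calls that the Spanner paper lists next to `TT.now()`.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: float
    latest: float

    @property
    def epsilon(self) -> float:
        # Error bound as defined on the slide: half the interval width.
        return (self.latest - self.earliest) / 2


class SimulatedTrueTime:
    """Hypothetical stand-in for TT, assuming |local clock - true time| <= epsilon."""

    def __init__(self, epsilon: float, clock=time.time) -> None:
        self.epsilon = epsilon
        self.clock = clock

    def now(self) -> TTInterval:
        # TT.now() -> [earliest, latest], an interval of width 2 * epsilon.
        t = self.clock()
        return TTInterval(t - self.epsilon, t + self.epsilon)

    def after(self, t: float) -> bool:
        # True if t has definitely passed.
        return t < self.now().earliest

    def before(self, t: float) -> bool:
        # True if t has definitely not arrived yet.
        return t > self.now().latest
```

Injecting the clock as a callable keeps the sketch deterministic in tests; a real deployment would read a disciplined system clock instead.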
Implementation
● Each zone (a unit of administrative deployment) has:
– A zone master, which assigns data to span servers
– 100–1000 span servers, which serve data to clients
● Each span server is responsible for 100–1000 tablets (a bag of mappings):
– (key:string, timestamp:int64) → string
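The tablet mapping (key:string, timestamp:int64) → string can be sketched as a multi-version map in which a read at time t returns the newest version with timestamp ≤ t. This is a hypothetical in-memory illustration (`Tablet`, `write`, `read` are assumed names), not Spanner's storage format.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class Tablet:
    """Hypothetical tablet: a bag of (key, timestamp) -> value mappings."""

    def __init__(self) -> None:
        # Per key, the version timestamps kept in sorted order.
        self._timestamps: Dict[str, List[int]] = defaultdict(list)
        self._values: Dict[Tuple[str, int], str] = {}

    def write(self, key: str, timestamp: int, value: str) -> None:
        if (key, timestamp) not in self._values:
            bisect.insort(self._timestamps[key], timestamp)
        self._values[(key, timestamp)] = value

    def read(self, key: str, timestamp: int) -> Optional[str]:
        # Newest version whose timestamp is <= the requested timestamp.
        ts_list = self._timestamps.get(key, [])
        i = bisect.bisect_right(ts_list, timestamp)
        if i == 0:
            return None  # no version of key existed at that time
        return self._values[(key, ts_list[i - 1])]
```

Keeping old versions around is what makes snapshot reads at a past timestamp possible.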
Paxos state machines
● A Paxos state machine is implemented on top of each tablet to support replication
– Leader replica → lock table (maps ranges of keys to lock states)
– Transaction manager → supports distributed transactions
● If a transaction involves:
– Only 1 Paxos group: bypass the transaction manager
– More than 1 Paxos group: perform two-phase commit
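The routing rule above (one Paxos group commits directly, several groups run two-phase commit) can be sketched as follows. `ParticipantGroup` and `commit_transaction` are hypothetical; the sketch leaves out Paxos logging, locks and timestamps and only shows the commit decision.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ParticipantGroup:
    """Hypothetical leader of one Paxos group touched by a transaction."""
    name: str
    can_commit: bool = True
    state: str = "idle"

    def prepare(self) -> bool:
        # Phase 1: vote; a real leader would log the prepare record via Paxos.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, commit: bool) -> None:
        # Phase 2: apply the coordinator's decision.
        self.state = "committed" if commit else "aborted"


def commit_transaction(groups: List[ParticipantGroup]) -> str:
    if len(groups) == 1:
        # Single Paxos group: bypass the transaction manager.
        groups[0].finish(True)
        return "committed (single group)"
    # More than one group: two-phase commit, commit only if all vote yes.
    votes = [g.prepare() for g in groups]
    decision = all(votes)
    for g in groups:
        g.finish(decision)
    return "committed (2PC)" if decision else "aborted (2PC)"
```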
Types of Transactions
● Read-write
● Read-only
● Snapshot reads
Transaction Types
● Read-write:
– Use two-phase locking and two-phase commit
● The client issues reads to the leader of the appropriate group
● The leader acquires read locks and reads the most recent data
● Two-phase commit:
– The leader assigns the transaction timestamp s_i = TT.now().latest once all locks are acquired (but not yet released)
– Commit-wait for the uncertainty in TT:
● The leader blocks clients from seeing data committed by T_i until s_i < TT.now().earliest
● Read-only:
– Assign a timestamp s_read = TT.now().latest
– Execute a snapshot read at s_read without locking
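The timestamp rules above can be sketched with a deterministic fake clock, assuming integer milliseconds and a fixed ε. `FakeClock`, `read_write_commit` and `read_only_timestamp` are hypothetical names; locking and Paxos replication are omitted so that only s_i, commit-wait and s_read remain.

```python
from typing import Callable, Tuple

Interval = Tuple[int, int]  # (earliest, latest) in milliseconds


def read_write_commit(now: Callable[[], Interval],
                      sleep: Callable[[int], None]) -> int:
    s_i = now()[1]              # s_i = TT.now().latest, taken while locks are held
    while not s_i < now()[0]:   # commit-wait until s_i < TT.now().earliest
        sleep(1)
    return s_i                  # only now may clients see T_i's writes


def read_only_timestamp(now: Callable[[], Interval]) -> int:
    return now()[1]             # s_read = TT.now().latest


class FakeClock:
    """Hypothetical deterministic clock with a fixed error bound (ms)."""

    def __init__(self, start: int, epsilon: int) -> None:
        self.t, self.epsilon = start, epsilon

    def now(self) -> Interval:
        return (self.t - self.epsilon, self.t + self.epsilon)

    def sleep(self, ms: int) -> None:
        self.t += ms
```

With `FakeClock(start=100, epsilon=4)` the commit gets s_i = 104 and the wait ends at t = 109, i.e. slightly more than 2ε after the timestamp was chosen; this per-write delay is what AT sets out to remove.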
Causality in Distributed Systems
● Approaches for ordering events:
1. Ignore wall clocks (i.e., assume infinite uncertainty) and track events using logical clocks, e.g. vector clocks: e → f iff vc.e < vc.f
2. Use dedicated time-synchronization machines and order events by wall-clock time
● Pros and cons:
– Vector clocks: the size of a VC can become too large
– Spanner TrueTime:
● No communication required
● Commit-wait for the uncertainty interval
Snapshot read: reading a consistent cut in the past is very difficult to achieve without TT
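The vector-clock rule e → f iff vc.e < vc.f can be sketched with sparse dictionaries in which a missing entry counts as 0. The helper names are hypothetical; the update rules shown (increment own entry on an event, element-wise max on receipt) are the standard vector-clock ones.

```python
from typing import Dict

VC = Dict[str, int]


def tick(vc: VC, node: str) -> VC:
    # Local event at node: increment its own entry.
    out = dict(vc)
    out[node] = out.get(node, 0) + 1
    return out


def receive(local: VC, msg: VC, node: str) -> VC:
    # Element-wise max with the sender's clock, then a local tick.
    keys = local.keys() | msg.keys()
    merged = {k: max(local.get(k, 0), msg.get(k, 0)) for k in keys}
    return tick(merged, node)


def vc_leq(a: VC, b: VC) -> bool:
    return all(a.get(k, 0) <= b.get(k, 0) for k in a.keys() | b.keys())


def happened_before(e: VC, f: VC) -> bool:
    # e -> f iff vc.e < vc.f: componentwise <= and not equal.
    return vc_leq(e, f) and not vc_leq(f, e)


def concurrent(e: VC, f: VC) -> bool:
    return not vc_leq(e, f) and not vc_leq(f, e)
```

Every node that ever influenced an event keeps an entry, which is why the vector can grow with the size of the system.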
Augmented Time (AT)
● at.j is a vector maintained by node j:
● at.j[k]: the knowledge that j has about the wall clock of k
● Same update rule as VC
● at.j[j] is updated with the wall clock of j
– Synchronization assumption: at.j[k] cannot be less than at.j[j] − ε
– Hence j need not keep track of at.j[k] if j and k have not communicated within the last ε
– This limits the size of at.j to the nodes that communicated with j in the last ε
[Figure: spectrum over ε; with small ε AT behaves like TT, with large ε AT behaves like VC]
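A minimal sketch of the AT rules for one node j, assuming integer-millisecond timestamps, an explicitly passed wall clock and eager pruning (all three are assumptions; the slides only state the rules). `AugmentedTime`, `local_event`, `receive` and `knowledge_of` are hypothetical names. Because at.j[k] can never be below at.j[j] − ε, any stored entry at or below that floor adds nothing and is dropped.

```python
from typing import Dict


class AugmentedTime:
    """Hypothetical sketch of the AT vector at.j kept by node j."""

    def __init__(self, node: str, epsilon: int) -> None:
        self.node = node
        self.epsilon = epsilon
        self.at: Dict[str, int] = {node: 0}

    def _prune(self) -> None:
        # Entries <= at.j[j] - eps carry no extra knowledge: drop them.
        floor = self.at[self.node] - self.epsilon
        self.at = {k: v for k, v in self.at.items()
                   if k == self.node or v > floor}

    def local_event(self, wallclock: int) -> Dict[str, int]:
        # Update at.j[j] with j's wall clock (never moving backwards).
        self.at[self.node] = max(self.at[self.node], wallclock)
        self._prune()
        return dict(self.at)

    def receive(self, msg_at: Dict[str, int], wallclock: int) -> Dict[str, int]:
        # Same update rule as VC: element-wise max with the sender's vector.
        for k, v in msg_at.items():
            self.at[k] = max(self.at.get(k, v), v)
        return self.local_event(wallclock)

    def knowledge_of(self, k: str) -> int:
        # Effective at.j[k]: stored value, or the synchronization bound.
        floor = self.at[self.node] - self.epsilon
        return max(self.at.get(k, floor), floor)
```

For example, with ε = 10 a node at time 106 keeps an entry k: 98 but prunes m: 95, and still reports knowledge_of("m") = 96. Shrinking ε prunes more aggressively (towards a single TT-like entry), while a large ε retains every contact (VC-like).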
Augmented Time (AT)
● Integrating AT into Spanner's tablets:
– Backward compatible
– (key:string, timestamp:TTstamp) → (string, at.j)
Transactions with AT
● Read-write transactions:
– Similar to TT execution, except that the leader's timestamp is an AT vector instead of a single TT timestamp
– No commit-wait delay
● Snapshot reads:
– Similar to TT
– Two cases when reading x whose latest update time is tx:
1. The latest update to x is older than ε → trivial case
2. x has another update within ε before tx (after tx − ε)
Snapshot Read: handling overlapping uncertainty intervals
● The issue arises because there is no commit-wait
● If x has another update at t'x:
– Order the events at tx and t'x according to AT
– Identify the latest version of x
– This ordering is not possible with a single TT timestamp
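The two snapshot-read cases can be sketched as follows, assuming each version of x stores its wall-clock time tx, the writer's AT vector and the writer id, and that a missing AT entry takes the synchronization bound at[writer] − ε. `Version`, `at_before`, `latest_version` and `EPSILON` are hypothetical. If no other update lies within ε of the newest tx, wall-clock order decides (case 1); otherwise the versions are ordered by AT (case 2).

```python
from dataclasses import dataclass
from typing import Dict, List

EPSILON = 10  # hypothetical clock-uncertainty bound, in ms


@dataclass
class Version:
    """One version of x: value, wall-clock time tx, writer's AT vector."""
    value: str
    tx: int
    at: Dict[str, int]
    writer: str

    def eff(self, k: str) -> int:
        # Missing entries default to the bound at[writer] - EPSILON.
        floor = self.at[self.writer] - EPSILON
        return max(self.at.get(k, floor), floor)


def at_before(a: Version, b: Version) -> bool:
    # a -> b iff a's AT is componentwise <= b's and strictly smaller somewhere.
    keys = a.at.keys() | b.at.keys() | {a.writer, b.writer}
    return (all(a.eff(k) <= b.eff(k) for k in keys)
            and any(a.eff(k) < b.eff(k) for k in keys))


def latest_version(versions: List[Version]) -> List[Version]:
    newest = max(versions, key=lambda v: v.tx)
    overlapping = [v for v in versions
                   if v is not newest and newest.tx - v.tx < EPSILON]
    if not overlapping:
        return [newest]  # case 1: wall clock alone identifies the latest version
    # Case 2: uncertainty intervals overlap, so keep the AT-maximal versions
    # (if AT leaves versions concurrent, all of them are returned).
    return [v for v in versions
            if not any(at_before(v, w) for w in versions if w is not v)]
```

As a usage example, take v1 written by A at tx = 104 and v2 written by B at tx = 100 after B heard from A (v2.at = {"B": 100, "A": 104}). Wall-clock order alone would pick v1, but AT shows v1 → v2, so v2 is returned as the latest version.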
Discussion
● AT is wait-free → higher throughput
● Hidden backchannel dependencies
– Can be resolved by a client-notification-wait, which does not reduce write throughput
● No need for dedicated GPS and atomic clocks
→ NTP with moderate uncertainty can be used
Take Home Message
● TT:
– Demands highly synchronized systems (with GPS and atomic clocks)
– Provides snapshot reads in the past
– Has to wait out the uncertainty interval
– Keeping vector clocks instead would make the vectors too large
● AT:
– The vector size stays reasonably small
– No need to wait out the uncertainty
– Higher write throughput
Questions?
