Deterministic Global
Database Consistency
Chris Anderson | @jchris
Chris Anderson
Director of Developer Evangelism at Fauna
- Cofounder of Couchbase
- Architect of Couchbase Mobile
- Bachelor’s degree in philosophy from Reed College
- O'Reilly technical book author
- Leads Fauna's developer community
About the Speaker
Meeting Agenda
• Fauna overview
• Storage engine
• Transaction protocol
• Founded in 2012 by the team that scaled Twitter’s infrastructure
• $25M Series A, largest ever by an OLTP database company
• Headquartered in San Francisco, CA
• FaunaDB – Operational database for the modern enterprise
Our Angels
Kevin Scott
CTO, Microsoft
Olivier Pomel
CEO, Datadog
Mazen Al-Rawashdeh
VP, eBay
Larry Gadea
CEO, Envoy
Our Investors
Company Background
© Fauna, Inc. 2018
FaunaDB: A Relational NoSQL Database
[Slide diagram: Relational DBs + NoSQL DBs + Cloud-Native innovations = FaunaDB: distributed, scalable, flexible, consistent, transactional, secure.]
Core Innovations
Distributed ACID Transactions
Single-phase distributed transaction algorithm
maximizes throughput and minimizes latency
Fault tolerance
Redundant, self-healing clustering with no loss
of liveness or durability
Operational simplicity
Dramatically simple multi-cluster management
Unified Multi-model Interface
Interact with all your data using the model
that suits your business requirements
High Security
Row-level identity, authentication,
and access control
Multi-tenancy
Shared services environments with QoS
prioritization and chargeback
Temporality
Run queries on historical data at any
point-in-time or as change feeds
Horizontal scalability
Scale from a single machine to multiple
datacenters and clouds with no downtime
Unified vs Partitioned
Consensus
Thanks to Daniel Abadi for contributing research
and material. Learn more from this blog post
https://fauna.com/blog/faunadb-transaction-protocol
If a write completes, all future reads must observe that write (or a later
write)
(This is the formal meaning of “consistency” in the CAP theorem)
The term “strict serializability” is used for the analogous guarantee for
transactions:
• If transaction A writes to data item X and commits
• And transaction B starts after A commits (in real-world time) and reads X
• Then B’s read of X must return the value that A wrote (or some value written by a transaction that committed after A)
Example:
• Alice wants to throw surprise party for Charlie
• (1) First, she blocks Charlie in our social messaging app
• (2) Then she posts message saying “Surprise party for Charlie tonight!”
• (3) Charlie logs in. His client reads the writes from both (1) and (2) in order to figure out which of Alice’s messages
to display
• Since (2) started after (1) completed, and (3) started after (2), it should be impossible for Charlie to see Alice’s new message
Linearizability
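The real-time rule above can be expressed as a check over an operation history. A minimal sketch (illustrative Python, not Fauna code; the field names and monotonic version numbers are assumptions made for the example):

```python
# Minimal sketch of the strict-serializability rule: if A commits before B
# starts (in real time), B's read of X must return A's write or a later one.
# Versions are monotonically numbered so "older value" means a smaller number.

def violates_linearizability(history):
    """history: list of ops with key, value, and real-time start/commit."""
    writes = [h for h in history if h["op"] == "write"]
    for r in (h for h in history if h["op"] == "read"):
        for w in writes:
            if w["key"] == r["key"] and w["commit"] < r["start"]:
                # The read started after the write committed, so it must
                # not observe a value older than the one written.
                if r["value"] < w["value"]:
                    return True
    return False

history = [
    {"op": "write", "key": "blocks", "value": 1, "start": 0, "commit": 1},  # (1) block Charlie
    {"op": "write", "key": "msgs",   "value": 1, "start": 2, "commit": 3},  # (2) post message
    {"op": "read",  "key": "blocks", "value": 0, "start": 4, "commit": 5},  # (3) stale read
]
print(violates_linearizability(history))  # True: (3) missed the block from (1)
```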
Consensus protocols ensure that events are
never delivered out of order
• So it is impossible for (2) to arrive at a replica
without (1)
Linearizability and consensus
[Slide diagram: replicas in Palo Alto, Boston, and London. Consensus delivers (1), the block on Charlie, before (2), the party message; one replica has not yet received (2), but no replica holds (2) without (1).]
Updates happen on different partitions
How should we do consensus?
But what if data is partitioned?
[Slide diagram: within each replica (Palo Alto, Boston, London), the Blocks and Messages tables now live on different partitions; neither (1) nor (2) has arrived yet.]
[Slide diagram (build): (1) the block and (2) the party message have now been applied to their respective partitions on one replica; the other replicas have not yet received them.]
Option 1: Have separate consensus
protocols per partition
But what if data is partitioned?
[Slide diagram: with a separate consensus protocol per partition, the Blocks and Messages partitions reach agreement independently across Palo Alto, Boston, and London; both (1) and (2) are eventually applied everywhere.]
But now a delayed message can result in
out-of-order updates
• If Charlie’s browser reads from replica in
London, his surprise will be ruined!
Partitioned consensus
[Slide diagram: a delayed message leaves one replica’s Blocks partition without (1) while its Messages partition already shows (2), the party message.]
Another problem: what if (1) and (2) were
part of same atomic xact?
• Very unlikely that messages for (1) and (2) will
get to different nodes on replica at exactly same
time
• So how do we create a snapshot that doesn’t violate
atomicity?
Partitioned consensus
[Slide diagram: the same replicas, with (1) and (2) now parts of a single atomic transaction whose messages arrive at different partitions at different times.]
Give each transaction a timestamp
• If transaction A commits before transaction B starts, it must
have an earlier timestamp
• But how do we ensure this if A and B access disjoint sets of machines?
• Use real time!
• But then clocks have to be synchronized across machines (within a maximum error
bound)
To take a snapshot that doesn’t violate atomicity
• Choose a timestamp
• At each machine, read data as of that timestamp
• Read the last version of each data item written before this timestamp
Spanner’s solution
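The snapshot-read rule above can be sketched in a few lines: pick a timestamp, then at each machine return the last version of each item committed at or before it. This is an illustrative MVCC sketch under assumed data shapes, not Spanner’s actual implementation:

```python
# Sketch of timestamp-based snapshot reads: at each machine, return the
# last version of each data item written at or before the snapshot time.

def read_at(versions, key, snapshot_ts):
    """versions: {key: [(commit_ts, value), ...]} sorted by commit_ts."""
    candidates = [(ts, v) for ts, v in versions.get(key, []) if ts <= snapshot_ts]
    return candidates[-1][1] if candidates else None

machine_a = {"blocks": [(10, ["Bob"]), (20, ["Bob", "Charlie"])]}
machine_b = {"msgs": [(5, 2), (25, 3)]}  # third message committed at ts 25

# A snapshot at ts=22 sees the block (ts 20) but not the party message (ts 25),
# even though the two items live on different machines:
print(read_at(machine_a, "blocks", 22))  # ['Bob', 'Charlie']
print(read_at(machine_b, "msgs", 22))    # 2
```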
Option 2: Have just one (unified)
consensus protocol
• Combine an atomic transaction’s ops into
one log event
But what if data is partitioned?
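The idea can be sketched as follows: a transaction’s operations travel together as a single log entry, so one consensus round orders them and every replica applies them atomically and in order (illustrative sketch, not FaunaDB internals):

```python
# Sketch: with a single global log, a transaction's ops are one log entry,
# so replicas apply them together and in the same order everywhere.
log = []

def submit(txn_ops):
    log.append(txn_ops)   # one consensus round assigns one log position
    return len(log) - 1   # the position doubles as the global commit order

pos = submit([("blocks", "add", "Charlie"),
              ("msgs", "append", "Surprise party for Charlie at 8PM!")])
# Every replica applies entry `pos` in full before any later entry,
# so no replica can hold the message without the block.
print(pos)  # 0
```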
[Slide diagram: one unified log delivers (2) Alice, “Surprise party for Charlie at 8PM!” to every replica after (1); all replicas apply the block and the message in log order.]
Two huge requirements
• Clocks must be synchronized
• Hard to accomplish in practice
• Google relies on help from hardware: atomic clocks and GPS
• Rumored to have 16 full-time engineers helping to maintain synchronization
• Correctness of the protocol assumes certain knowledge of the maximum possible
skew
• All transactions must wait out this maximum before commit
• Extremely hard to maintain certain knowledge (stuff happens!)
• Assumptions can get violated by sudden clock jumps (e.g. VM migration)
Consequences of these requirements
• IT staff must devote resources to clock synchronization, or pay a cloud provider
a premium for this functionality
• Application developers must code around the possibility of consistency violations
that may arise from clock-skew jumps
• Increases complexity
Partitioned Consensus with Spanner
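The commit-wait requirement in the bullets above can be sketched like this (the skew bound is an assumed illustrative value, not Spanner’s actual figure):

```python
import time

# Sketch of "commit wait": after choosing a commit timestamp, the system
# waits out the maximum clock uncertainty before acknowledging the commit,
# so no machine can later assign an earlier timestamp to a newer transaction.
MAX_CLOCK_SKEW = 0.007  # assumed 7 ms uncertainty bound, for illustration

def commit(txn_timestamp):
    # Every transaction pays this wait on commit; if the real skew ever
    # exceeds the assumed bound, consistency can be violated.
    time.sleep(MAX_CLOCK_SKEW)
    return txn_timestamp
```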
Unified consensus does not require clock synchronization
• Reduces IT complexity / costs
• No assumptions about a maximum synchronization error bound
• Reduces complexity for the application developer
Disadvantages of unified consensus
• Can be a scale bottleneck
• Batching multiple events in each consensus round can alleviate scale problems
• Calvin is able to perform consensus on 500,000 transactions a second
• Can go higher with hierarchical consensus
• Temporary outages during consensus leader election are more global (the entire
system cannot accept writes until a new leader is elected)
• An outage for partitioned consensus affects only that partition
• But either way, this outage lasts only a few seconds
• Slightly reduced flexibility in achieving low-latency read-only transactions
• Partitioned consensus can have different leaders in different regions
Unified consensus (Calvin/FaunaDB)
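The batching point above can be sketched as follows: amortizing one consensus round over many transactions is what lets a single log keep up. The batch size and the `run_consensus_round` placeholder are assumptions made for the illustration:

```python
# Sketch: batching many transactions into one consensus round amortizes
# the round's cost, which is how a unified log scales throughput.

def run_consensus_round(batch):
    # placeholder for one Raft/Paxos round agreeing on the whole batch
    return batch

pending = [f"txn{i}" for i in range(1000)]
log = []
while pending:
    batch, pending = pending[:100], pending[100:]  # e.g. 100 txns per round
    log.extend(run_consensus_round(batch))         # 10 rounds order 1000 txns

print(len(log))  # 1000
```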
Deterministic Global
Database Consistency
Transaction Log and MVCC Storage
Horizontally scalable storage
Each replica can be sharded into its own
number of partitions.
Reads use the version for the correct snapshot
Replicas in multiple regions
A single-phase commit protocol
is uniquely well suited to
applications with widely
dispersed regions.
Deterministic Transaction Protocol
Different from Google
Spanner
No atomic clocks
Software only
Stronger guarantees
Deterministic Transaction Protocol
Use a preprocessor to handle client communications and create a log of
submitted xacts
Send the log in batches to the DBMS
Every xact immediately requests all locks it will need (in log order)
If it doesn’t know what it will need:
• Run enough of the xact to find out, but do not change the database state
• Reissue the xact to the preprocessor with the lock requirements included as a parameter
• Run enough of the new xact to find out whether it locked the correct items
(the database state might have changed in the meantime)
• If so, the xact can proceed as normal
• If not, reissue it to the preprocessor again and repeat as necessary
It is trivial to prove this is deterministic and deadlock-free
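The steps above can be sketched as a deterministic executor: since every transaction declares its read/write set up front and locks are granted strictly in log order, replaying the same log always yields the same state. A minimal illustration (not FaunaDB code; the declared `keys` stand in for the lock set):

```python
# Sketch: transactions execute in log order with their full lock sets
# granted up front, so the same log always produces the same database
# state (deterministic) and no waiting cycle can form (deadlock-free).

def execute_log(log, db):
    for txn_id, keys, apply_fn in log:  # log order == lock-grant order
        # `keys` is the declared read/write set; all its locks are held
        # before the next transaction's requests are serviced.
        apply_fn(db)
    return db

def t1(db): db["blocks"].append("Charlie")
def t2(db): db["msgs"].append("Surprise party for Charlie at 8PM!")

db = execute_log([("t1", ["blocks"], t1), ("t2", ["msgs"], t2)],
                 {"blocks": ["Bob"], "msgs": []})
print(db["blocks"])  # ['Bob', 'Charlie']
```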
Transaction Protocol Example
Two customers want to purchase the same item and submit transactions to
geographically dispersed replicas
Coordinators issue reads
The set of read data is
calculated by the coordinator
before the transaction is
proposed to the log. Some data
may be fetched from other
hosts in the replica. Snapshot
reads make this consistent.
Coordinators calculate write effects
Writes are known locally before
the transaction is submitted,
and are submitted as part of
the transaction.
Coordinators submit transactions to the log
All transactions have a
consensus position in the log,
determined by a Raft-like
protocol.
Replicas validate transaction reads
Transactions are validated before
they are applied.
Reads that were made on the
coordinator, at an earlier
snapshot, are rerun at the log’s
current snapshot, preserving
serializability.
Replicas apply transactions
If the reads are the same, the
transaction can commit its
buffered writes.
Validating a transaction with changed reads
In this case, re-running the
reads gives IN STOCK = 0
instead of 1, which the
coordinator was expecting.
Abort a transaction when reads don’t match
This is deterministic, so replicas
can decide to abort without
coordination.
The transaction is retried by the
coordinator, and fails with the
expected “out of stock” error.
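The validate/commit/abort steps of the last few slides can be sketched together: rerun the coordinator’s reads at the log’s snapshot, commit the buffered writes only if the values are unchanged, otherwise abort (illustrative Python, not FaunaDB internals):

```python
# Sketch of deterministic validation: every replica reruns the reads at
# the log's snapshot and reaches the same commit/abort decision without
# any cross-replica coordination.

def apply_transaction(db, txn):
    current_reads = {k: db.get(k) for k in txn["reads"]}
    if current_reads != txn["reads"]:
        return "ABORT"          # reads changed; same decision on every replica
    db.update(txn["writes"])    # commit the buffered writes
    return "COMMIT"

db = {"in_stock": 1}
buy = {"reads": {"in_stock": 1}, "writes": {"in_stock": 0}}
print(apply_transaction(db, buy))  # COMMIT: first buyer succeeds
print(apply_transaction(db, buy))  # ABORT: in_stock is now 0, not the expected 1
```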
Capability Drill-down
Strong global data consistency in real-time
• Patent-pending single-phase commit
provides lowest possible global latency
• Transactions are not restricted to
data in a single site or shard
• Distributed log-based algorithm
scales throughput with cluster size
• Reads are always site-local and
transactionally consistent
• Global transaction order preserves
external cause-and-effect
100% ACID
NoSQL, yet Relational
• Semi-structured documents that can store any
application data model
• Real-time indexing system enables querying using
multiple models within the same query
• Transaction expressiveness is similar to a modern
programming language
• Query language designed to fit into the current
development paradigms
• Embedded in application code, similar to LINQ or an
ORM
• Record-level access control enforces application
permissions
Unified Multi-Model Data Interactions
[Slide diagram: reads and writes flow through the FQL API interface.]
Multi-Model Interface
Operational Simplicity
Operational simplicity unlike any other
• Set up multi-node, multi-region clusters in minutes
• Automatically manages database state to match
declared topology
• Built-in request routing shields cluster topology
from clients
• Rich command line interface / API to manipulate
nodes and clusters
• ACID compliant macro commands drastically
reduce risk
Elastic DevOps
Policy-based data isolation and security
• All client interactions require an API key by default
• Per-record access policy is highly customizable to application needs
• End-user credentials management secures data access on per-user basis
• QoS: Identity-based allocation of cluster resources
• Client/server and server/server network connections encrypted via TLS
• Temporal data model builds in auditing capabilities
High Security
Secure by Design
Roll out your shared services without risk
• Separates developer, schema admin, and cluster management
• Allows a single database group to service multiple internal teams
• Application teams can be self-service and isolated
• Tenant-aware QoS allows for safe mixed workloads
• Track who is consuming what and how much (for chargeback)
• Policy-based access to shared transactional data
Multi-Tenancy
Prioritized Requests | Schema Isolation | Seamless Clustering
Built-in Multi-tenancy
Snapshot isolation and configurable data retention
• Maintains the evolution of data over time, much like change tracking
in documents
• Queries can be run against the current time, a snapshot point in
time, or a composite
• Event queries can alert you when new results match specific criteria
• Data retention supports intelligent change management, including
retroactive event updates to repair data input errors
• Extremely useful in use cases such as social activity feeds, recent
updates, syncing occasionally connected mobile and IoT devices
Temporality
Temporality: Track Your Data over Time
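The point-in-time and event-feed queries described above can be sketched over a simple version history (illustrative only, not FaunaDB’s storage format or query API):

```python
# Sketch: keep every version of a record, then answer point-in-time reads
# and "events since t" change feeds from the same history.
history = [(1, "doc1", "draft"), (5, "doc1", "published"), (9, "doc1", "archived")]

def at(ts):
    """Reconstruct the state of all records as of time ts."""
    state = {}
    for t, key, value in history:
        if t <= ts:
            state[key] = value
    return state

def events_since(ts):
    """Return the change feed of events after time ts."""
    return [(t, k, v) for t, k, v in history if t > ts]

print(at(6))            # {'doc1': 'published'}
print(events_since(6))  # [(9, 'doc1', 'archived')]
```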
FaunaDB Node
Self-contained FaunaDB Node: Single Binary
[Slide diagram: node components]
• Fauna Query Interface: schemaless docs, relational indexes, tenant/role access control
• Priority-based Scheduler
• Fauna ACID Transaction Engine (FATE)
• Temporal Data Storage
• Replicated Write-Ahead Log
• Cluster Manager
Future-proofs your data architecture
• Limitless horizontal and vertical scale
• Infrastructure agnostic, minimal dependencies (JVM)
• Easily deploys and performs on commodity hardware
• Multi-cloud and hybrid deployment options
Designed for Modern Datacenters
Roadmap
• Workload Consolidation: search, graph, geo; HTAP; streaming data
• Data Governance: policy-based control; encrypted queries; locality control
• Datacenter Automation: compute convergence; automated recovery; automated scaling
• Mission Critical Platform: 100% ACID; complete flexibility; operational simplicity
Thank You

Weitere ähnliche Inhalte

Ähnlich wie QCONSF - FaunaDB Deterministic Transactions

Dmk blackops2006
Dmk blackops2006Dmk blackops2006
Dmk blackops2006
Dan Kaminsky
 

Ähnlich wie QCONSF - FaunaDB Deterministic Transactions (20)

Debunking Six Common Myths in Stream Processing
Debunking Six Common Myths in Stream ProcessingDebunking Six Common Myths in Stream Processing
Debunking Six Common Myths in Stream Processing
 
Kostas Tzoumas - Stream Processing with Apache FlinkÂŽ
Kostas Tzoumas - Stream Processing with Apache FlinkÂŽKostas Tzoumas - Stream Processing with Apache FlinkÂŽ
Kostas Tzoumas - Stream Processing with Apache FlinkÂŽ
 
Debunking Common Myths in Stream Processing
Debunking Common Myths in Stream ProcessingDebunking Common Myths in Stream Processing
Debunking Common Myths in Stream Processing
 
Building Real-Time Web Applications
Building Real-Time Web ApplicationsBuilding Real-Time Web Applications
Building Real-Time Web Applications
 
Big data and APIs for PHP developers - SXSW 2011
Big data and APIs for PHP developers - SXSW 2011Big data and APIs for PHP developers - SXSW 2011
Big data and APIs for PHP developers - SXSW 2011
 
What is Blockchain and why should we care?
What is Blockchain and why should we care?What is Blockchain and why should we care?
What is Blockchain and why should we care?
 
Ecosystem WG
Ecosystem WGEcosystem WG
Ecosystem WG
 
InfluxDB Community Office Hours September 2020
InfluxDB Community Office Hours September 2020 InfluxDB Community Office Hours September 2020
InfluxDB Community Office Hours September 2020
 
Griffey: Gadgets in the Library
Griffey: Gadgets in the LibraryGriffey: Gadgets in the Library
Griffey: Gadgets in the Library
 
Dmk blackops2006
Dmk blackops2006Dmk blackops2006
Dmk blackops2006
 
Head in the_clouds_feet_firmly_on_the_ground mar.ppt
Head in the_clouds_feet_firmly_on_the_ground mar.pptHead in the_clouds_feet_firmly_on_the_ground mar.ppt
Head in the_clouds_feet_firmly_on_the_ground mar.ppt
 
The Internet You Want
The Internet You WantThe Internet You Want
The Internet You Want
 
Interactive real-time dashboards on data streams using Kafka, Druid, and Supe...
Interactive real-time dashboards on data streams using Kafka, Druid, and Supe...Interactive real-time dashboards on data streams using Kafka, Druid, and Supe...
Interactive real-time dashboards on data streams using Kafka, Druid, and Supe...
 
Robust stream processing with Apache Flink
Robust stream processing with Apache FlinkRobust stream processing with Apache Flink
Robust stream processing with Apache Flink
 
Wham lan party
Wham lan partyWham lan party
Wham lan party
 
CarolinaCon Presentation on Streaming Analytics
CarolinaCon Presentation on Streaming AnalyticsCarolinaCon Presentation on Streaming Analytics
CarolinaCon Presentation on Streaming Analytics
 
Some thoughts on IoT, HKNOG 4.0
Some thoughts on IoT, HKNOG 4.0Some thoughts on IoT, HKNOG 4.0
Some thoughts on IoT, HKNOG 4.0
 
Caterpillar’s move to the cloud: cutting edge tools for a cutting-edge business
Caterpillar’s move to the cloud: cutting edge tools for a cutting-edge businessCaterpillar’s move to the cloud: cutting edge tools for a cutting-edge business
Caterpillar’s move to the cloud: cutting edge tools for a cutting-edge business
 
BigchainDB and Beyond
BigchainDB and BeyondBigchainDB and Beyond
BigchainDB and Beyond
 
Release the Monkeys ! Testing in the Wild at Netflix
Release the Monkeys !  Testing in the Wild at NetflixRelease the Monkeys !  Testing in the Wild at Netflix
Release the Monkeys ! Testing in the Wild at Netflix
 

KĂźrzlich hochgeladen

KĂźrzlich hochgeladen (20)

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 

QCONSF - FaunaDB Deterministic Transactions

  • 2. 2 Chris Anderson Director of Developer Evangelism at Fauna - Cofounder of Couchbase - Architect of Couchbase Mobile - Bachelor’s degree in philosophy from Reed College - O'Reilly technical book author - Leads Fauna's developer community About the Speakers
  • 3. 3 Meeting Agenda • Fauna overview • Storage engine • Transaction protocol
  • 4. • Founded in 2012 by the team that scaled Twitter’s infrastructure • $25M Series A, largest ever by an OLTP database company • Headquartered in San Francisco, CA • FaunaDB – Operational database for the modern enterprise Our Angels Kevin Scott CTO, Microsoft Olivier Pomel CEO, Datadog Mazen Al-Rawashdeh VP, eBay Larry Gadea CEO, Envoy Our Investors Company Background Š Fauna, Inc. 2018 4
  • 5. FaunaDB: A Relational NoSQL Database Relational DBs NoSQL DBsDistributed Scalable Flexible Consistent Transactional Secure + + Innovations Cloud-Native Š Fauna, Inc. 2018 5
  • 6. Š Fauna, Inc. 2018 6 Core Innovations Distributed ACID Transactions Single-phase distributed txns algorithm maximizes throughput and minimizes latency Fault tolerance Redundant, self-healing clustering with no loss of liveness or durability Operational simplicity Dramatically simple multi- cluster management Unified Multi-model Interface Interact with all your data using the model that suits your business requirements High Security Row-level identity, authentication, and access control Multi-tenancy Shared services environments with QoS prioritization and chargeback Temporality Run queries on historical data at any point-in-time or as change feeds Horizontal scalability Scale from a single machine to multiple datacenters and clouds with no downtime
  • 7. Unified vs Partitioned Consensus Thanks to Daniel Abadi and for contributing research and material. Learn more from this blog post https://fauna.com/blog/faunadb-transaction-protocol
  • 8. If write completes, all future reads must read that write (or a later write) (Formal meaning of “consistency” in CAP theorem) The term “strict serializability” used for similar guarantee for transactions • If transaction A writes to data item X and commits • If transactions B starts after A commits (in real world time) and reads X • B’s read of X must be the value that A wrote (or some value written by a transaction that committed after A) Example: • Alice wants to throw surprise party for Charlie • (1) First, she blocks Charlie in our social messaging app • (2) Then she posts message saying “Surprise party for Charlie tonight!” • (3) Charlie logs in. His client reads the writes from both (1) and (2) in order to figure out which of Alice’s messages to display • Since (2) started after (1) completed, and (3) after (2), it should be impossible for Charlie to see Alice’s new Linearizability Š Fauna, Inc. 2018 8
  • 9. Consensus protocols ensure that events never delivered out of order • So it is impossible for (2) to arrive at a replica without (1) Linearizability and consensus Š Fauna, Inc. 2018 9 Palo Alto Boston London User Blocks Alice Bob Alice Charlie User Message Alice I will be hosting a party tonight Alice What a great party! Alice Surprise party for Charlie at 8PM! User Blocks Alice Bob Alice Charlie User Message Alice I will be hosting a party tonight Alice What a great party! User Blocks Alice Bob Alice Charlie User Message Alice I will be hosting a party tonight Alice What a great party! Alice Surprise party for Charlie at 8PM! (1) (2)
  • 10. Updates happen on different partitions How should we do consensus? But what if data is partitioned? Š Fauna, Inc. 2018 10 Palo Alto Boston London User Message Alice I will be hosting a party tonight Alice What a great party! User Message Alice I will be hosting a party tonight Alice What a great party! User Blocks Alice Bob User Blocks Alice Bob User Message Alice I will be hosting a party tonight Alice What a great party! User Blocks Alice Bob
  • 11. Updates happen on different partitions How should we do consensus? But what if data is partitioned? Š Fauna, Inc. 2018 11 Palo Alto Boston London User Message Alice I will be hosting a party tonight Alice What a great party! User Message Alice I will be hosting a party tonight Alice What a great party! Alice Surprise party for Charlie at 8PM! User Blocks Alice Bob User Blocks Alice Bob User Message Alice I will be hosting a party tonight Alice What a great party! (1) (2) User Blocks Alice Bob Alice Charlie
  • 12. Option 1: Have separate consensus protocols per partition But what if data is partitioned? Š Fauna, Inc. 2018 12 Palo Alto Boston London User Blocks Alice Bob Alice Charlie User Message Alice I will be hosting a party tonight Alice What a great party! Alice Surprise party for Charlie at 8PM! User Blocks Alice Bob Alice Charlie User Blocks Alice Bob Alice Charlie User Message Alice I will be hosting a party tonight Alice What a great party! Alice Surprise party for Charlie at 8PM! User Message Alice I will be hosting a party tonight Alice What a great party! Alice Surprise party for Charlie at 8PM! (1) (2)
  • 13. Option 1: Have separate consensus protocols per partition But what if data is partitioned? Š Fauna, Inc. 2018 13 Palo Alto Boston London User Blocks Alice Bob Alice Charlie User Message Alice I will be hosting a party tonight Alice What a great party! Alice Surprise party for Charlie at 8PM! User Blocks Alice Bob Alice Charlie User Blocks Alice Bob Alice Charlie User Message Alice I will be hosting a party tonight Alice What a great party! Alice Surprise party for Charlie at 8PM! User Message Alice I will be hosting a party tonight Alice What a great party! Alice Surprise party for Charlie at 8PM! (1) (2)
  • 14. But now a delayed message can result in out of order updates • If Charlie’s browser reads from replica in London, his surprise will be ruined! Partitioned consensus Š Fauna, Inc. 2018 14 Palo Alto Boston London User Blocks Alice Bob Alice Charlie User Message Alice I will be hosting a party tonight Alice What a great party! Alice Surprise party for Charlie at 8PM! User Blocks Alice Bob User Blocks Alice Bob Alice Charlie User Message Alice I will be hosting a party tonight Alice What a great party! Alice Surprise party for Charlie at 8PM! User Message Alice I will be hosting a party tonight Alice What a great party! Alice Surprise party for Charlie at 8PM! (1) (2)
  • 15. Another problem: what if (1) and (2) were part of same atomic xact? • Very unlikely that messages for (1) and (2) will get to different nodes on replica at exactly same time • So how do we create a snapshot that doesn’t violate atomicity? Partitioned consensus Š Fauna, Inc. 2018 15 Palo Alto Boston London User Blocks Alice Bob Alice Charlie User Message Alice I will be hosting a party tonight Alice What a great party! Alice Surprise party for Charlie at 8PM! User Blocks Alice Bob Alice Charlie User Blocks Alice Bob Alice Charlie User Message Alice I will be hosting a party tonight Alice What a great party! Alice Surprise party for Charlie at 8PM! User Message Alice I will be hosting a party tonight Alice What a great party! Alice Surprise party for Charlie at 8PM! (1) (2)
  • 16. Give each transaction a timestamp • If transaction A commits before transaction B starts, it must have an earlier timestamp • But how do we ensure this if A and B access disjoint sets of machines? • Use real time! • But then clocks have to be synchronized across machines (within a maximum error bound) To take a snapshot that doesn’t violate atomicity • Choose a timestamp • At each machine, read data as of that timestamp • Read the last version of each data item written before this timestamp Spanner’s solution © Fauna, Inc. 2018 16
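The snapshot-read rule above (read the last version of each item written at or before the chosen timestamp) can be sketched with a minimal multi-versioned store. This is illustrative Python, not Spanner's or Fauna's code; the class and key names are assumptions for the example.

```python
import bisect

class VersionedStore:
    """Toy multi-versioned key/value store: every write keeps its timestamp."""

    def __init__(self):
        self.ts = {}     # key -> sorted list of write timestamps
        self.vals = {}   # key -> values, parallel to self.ts

    def write(self, key, ts, value):
        i = bisect.bisect_right(self.ts.setdefault(key, []), ts)
        self.ts[key].insert(i, ts)
        self.vals.setdefault(key, []).insert(i, value)

    def read_at(self, key, snapshot_ts):
        """Latest value written at or before snapshot_ts (None if none)."""
        i = bisect.bisect_right(self.ts.get(key, []), snapshot_ts)
        return self.vals[key][i - 1] if i else None

store = VersionedStore()
store.write("blocks:alice", 5, ["bob"])
store.write("blocks:alice", 8, ["bob", "charlie"])      # (1) block Charlie
store.write("msg:alice", 9, "Surprise party at 8PM!")   # (2) post message

# Any reader at snapshot T=8 sees block update (1) but not yet message (2),
# no matter which machine holds each partition:
assert store.read_at("blocks:alice", 8) == ["bob", "charlie"]
assert store.read_at("msg:alice", 8) is None
```

Because every machine answers "last version at or before T" the same way, readers at the same snapshot timestamp see a consistent cut even across partitions.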
  • 17. Option 2: Have just one (unified) consensus protocol • Combine atomic transactions ops into one log event But what if data is partitioned? Š Fauna, Inc. 2018 17 Palo Alto Boston London (2)Alice,“SurprisepartyforCharlieat 8PM!” User Blocks Alice Bob Alice Charlie User Message Alice I will be hosting a party tonight Alice What a great party! Alice Surprise party for Charlie at 8PM! User Blocks Alice Bob Alice Charlie User Blocks Alice Bob Alice Charlie User Message Alice I will be hosting a party tonight Alice What a great party! Alice Surprise party for Charlie at 8PM! User Message Alice I will be hosting a party tonight Alice What a great party! Alice Surprise party for Charlie at 8PM! (1) (2)
  • 18. Two huge requirements • Clocks must be synchronized • Hard to accomplish in practice • Google relies on help from hardware --- atomic clocks and GPS • Rumored to have 16 full-time engineers helping to maintain synchronization • Correctness of the protocol assumes certain knowledge of the maximum possible skew • All transactions must wait out this maximum before commit • Extremely hard to maintain certain knowledge --- stuff happens! • Assumptions can get violated with sudden clock jumps (e.g. VM migration) Consequences of these requirements • IT staff must devote resources to clock synchronization, or pay a cloud provider a premium for this functionality • Application developers must code around the possibility of consistency violations that may arise as a result of clock skew jumps • Increases complexity Partitioned Consensus with Spanner © Fauna, Inc. 2018 18
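The commit-wait described above can be sketched as follows: a transaction picks a commit timestamp at the upper edge of its clock-uncertainty interval, then stalls until the lower edge has passed that timestamp, so every machine agrees the timestamp is in the past. This is a toy sketch with an assumed skew bound, not TrueTime's API; it shows why latency grows with the uncertainty bound.

```python
import time

MAX_SKEW_MS = 7.0  # assumed maximum clock error bound (epsilon)

def now_interval():
    """Current time as an uncertainty interval (earliest, latest) in ms."""
    t = time.monotonic() * 1000.0
    return (t - MAX_SKEW_MS, t + MAX_SKEW_MS)

def commit_with_wait():
    _, latest = now_interval()
    commit_ts = latest                       # timestamp at the upper bound
    while now_interval()[0] <= commit_ts:    # commit-wait: roughly 2 * epsilon
        time.sleep(0.001)
    return commit_ts

start = time.monotonic()
commit_with_wait()
waited_ms = (time.monotonic() - start) * 1000.0
# The transaction stalled for about twice the skew bound before committing.
assert waited_ms >= 2 * MAX_SKEW_MS - 1.0
```

If the real skew ever exceeds the assumed bound (a VM migration, a misbehaving clock), the wait is too short and the correctness argument breaks, which is the risk the slide calls out.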
  • 19. Unified consensus does not require clock synchronization • Reduces IT complexity / costs • No assumptions about a maximum synchronization error bound • Reduces complexity for the application developer Disadvantages of unified consensus • Can be a scale bottleneck • Batching multiple events in each consensus round can alleviate scale problems • Calvin is able to perform consensus on 500,000 transactions a second • Can go higher with hierarchical consensus • Temporary outages during consensus leader election are more global (the entire system cannot accept writes until a new leader is elected) • Outage for partitioned consensus is only for that partition • But either way, this outage lasts only a few seconds • Slightly reduced flexibility in achieving low-latency read-only transactions • Partitioned consensus can have different leaders in different regions Unified consensus (Calvin/FaunaDB) © Fauna, Inc. 2018 19
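The batching arithmetic behind throughput figures like Calvin's 500,000 transactions per second can be sketched as follows. The round latency, batch size, and pipelining factor are illustrative assumptions, not measured numbers: one consensus round commits a whole batch, so throughput scales with batch size rather than rounds per second.

```python
def max_throughput(txns_per_batch, round_latency_ms, pipelined_rounds=1):
    """Transactions/sec when each consensus round carries a batch of txns."""
    rounds_per_sec = 1000.0 / round_latency_ms * pipelined_rounds
    return txns_per_batch * rounds_per_sec

# e.g. 10 ms consensus rounds with 5,000 transactions batched per round:
assert max_throughput(5_000, 10.0) == 500_000.0

# Pipelining overlapping rounds raises the ceiling further:
assert max_throughput(5_000, 10.0, pipelined_rounds=2) == 1_000_000.0
```

The point is that a single global log is not inherently capped at one transaction per consensus round; batching amortizes the round cost across arbitrarily many transactions.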
  • 20. Deterministic Global Database Consistency Thanks to Daniel Abadi for contributing research and material. Learn more from this blog post: https://fauna.com/blog/faunadb-transaction-protocol
  • 21. Transaction Log and MVCC Storage Š Fauna, Inc. 2018 21
  • 22. Horizontally scalable storage © Fauna, Inc. 2018 22 Each replica can be sharded into its own number of partitions. Reads use the version for the correct snapshot.
  • 23. Replicas in multiple regions Š Fauna, Inc. 2018 23 Single phase commit protocol is uniquely well suited to applications with widely dispersed regions.
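The latency advantage of single-phase commit in widely dispersed regions comes down to counting global round trips. A toy sketch with an assumed inter-region round-trip time (the numbers are illustrative, not benchmarks):

```python
RTT_MS = 120.0  # assumed round-trip time between distant regions, in ms

def commit_latency(global_round_trips, rtt_ms=RTT_MS):
    """Write-commit latency dominated by cross-region round trips."""
    return global_round_trips * rtt_ms

# A two-phase protocol needs at least two global round trips per write;
# a single-phase protocol achieves consensus in one round.
assert commit_latency(2) == 240.0   # e.g. two-phase commit
assert commit_latency(1) == 120.0   # single-phase commit
```

When replicas span continents, halving the number of global round trips roughly halves write latency, which is why the slide calls single-phase commit well suited to dispersed deployments.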
  • 24. Deterministic Transaction Protocol Š Fauna, Inc. 2018 24 Different from Google Spanner No atomic clocks Software only Stronger guarantees
  • 25. Deterministic Transaction Protocol © Fauna, Inc. 2018 25 Use a preprocessor to handle client communications, and create a log of submitted xacts Send the log in batches to the DBMS Every xact immediately requests all locks it will need (in order of the log) If it doesn’t know what it will need • Run enough of the xact to find out, but do not change the database state • Reissue the xact to the preprocessor with the lock requirements included as a parameter • Run enough of the new xact to find out if it locked the correct items (database state might have changed in the meantime) • If so, then the xact can proceed as normal • If not, reissue again to the preprocessor and repeat as necessary It is trivial to prove this is deterministic and deadlock-free
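The steps above can be sketched as follows. This is illustrative Python with assumed names, not FaunaDB's implementation: every replica processes the same log in the same order, each transaction's locks are determined in log order (so interleaving is identical everywhere and deadlock is impossible), and a transaction whose lock set depends on data it must first read goes through a read-only "reconnaissance" pass and is reissued with the discovered lock set as a parameter.

```python
def execute_log(log, db):
    for txn in log:                      # strict, agreed-upon log order
        locks = txn.get("locks")
        while locks is None or txn["lock_set"](db) != locks:
            # Reconnaissance: discover the lock set without changing any
            # state, then retry with the discovered set as a parameter
            # (a real system would resubmit to the preprocessor).
            locks = txn["lock_set"](db)
        txn["apply"](db, locks)

db = {"index": {"hot": "a"}, "counts": {"a": 0, "b": 0}}

# This transaction's write target depends on data it must first read,
# so its lock set cannot be known up front.
txn = {
    "lock_set": lambda db: [db["index"]["hot"]],
    "apply": lambda db, keys: [
        db["counts"].__setitem__(k, db["counts"][k] + 1) for k in keys
    ],
}
execute_log([txn], db)
assert db["counts"] == {"a": 1, "b": 0}
```

Because the loop only exits when the lock set discovered against the current state matches the set the transaction was issued with, every replica reaches the same decision and applies the same writes, which is the determinism the slide claims.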
  • 26. Transaction Protocol Example Š Fauna, Inc. 2018 26 Two customers want to purchase the same item, submit transactions to geographically dispersed replicas
  • 27. Transaction Protocol Example Š Fauna, Inc. 2018 27 Two customers want to purchase the same item, submit transactions to geographically dispersed replicas
  • 28. Coordinators issue reads Š Fauna, Inc. 2018 28 The set of read data is calculated by the coordinator before the transaction is proposed to the log. Some data may be fetched from other hosts in the replica. Snapshot reads make this consistent.
  • 29. Coordinators calculate write effects Š Fauna, Inc. 2018 29 Writes are known locally before the transaction is submitted, and are submitted as part of the transaction.
  • 30. Coordinators submit transactions to the log © Fauna, Inc. 2018 30 All transactions have a consensus position in the log, determined by a Raft-like protocol.
  • 31. Replicas validate transaction reads Š Fauna, Inc. 2018 31 Transactions are validated before they are applied. Reads that were made on the coordinator, at an earlier snapshot, are rerun at the log’s current snapshot, preserving serializability.
  • 32. Replicas apply transactions Š Fauna, Inc. 2018 32 If the reads are the same, the transaction can commit its buffered writes.
  • 33. Validating a transaction with changed reads Š Fauna, Inc. 2018 33 In this case, re-running the reads gives IN STOCK = 0 instead of 1, which the coordinator was expecting.
  • 34. Abort a transaction when reads don’t match Š Fauna, Inc. 2018 34 This is deterministic, so replicas can decide to abort without coordination. The transaction is retried by the coordinator, and fails with the expected “out of stock” error.
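The validate-then-apply steps walked through on the preceding slides can be sketched as follows (illustrative Python, not FaunaDB's code; the out-of-stock example mirrors the slides). The coordinator ships each transaction with its read set — the values it observed at its snapshot — and its buffered writes; each replica re-runs the reads at the transaction's log position and commits only if they still match. Because the check is deterministic, every replica reaches the same verdict with no coordination.

```python
def apply_txn(db, txn):
    """Re-run the transaction's reads at the current state; commit or abort."""
    reread = {k: db.get(k) for k in txn["reads"]}
    if reread != txn["reads"]:
        return "ABORT"          # e.g. the item went out of stock meanwhile
    db.update(txn["writes"])    # reads matched: apply buffered writes
    return "COMMIT"

db = {"in_stock": 1}

# Two customers buy the last item; both coordinators read in_stock = 1
# at their snapshots and buffer the same decrement.
t1 = {"reads": {"in_stock": 1}, "writes": {"in_stock": 0, "order:ann": "ok"}}
t2 = {"reads": {"in_stock": 1}, "writes": {"in_stock": 0, "order:bob": "ok"}}

# The log orders t1 before t2. When t2 is validated, its read no longer
# matches, so every replica deterministically aborts it.
assert apply_txn(db, t1) == "COMMIT"
assert apply_txn(db, t2) == "ABORT"
assert "order:bob" not in db
```

The aborted transaction is then retried by its coordinator, which now reads in_stock = 0 and can fail with the expected "out of stock" error.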
  • 36. Š Fauna, Inc. 2018 44 Strong global data consistency in real-time • Patent-pending single-phase commit provides lowest possible global latency • Transactions are not restricted to data in a single site or shard • Distributed log based algorithm scales throughput with cluster size • Reads are always site-local and transactionally consistent • Global transaction order preserves external cause-and-effect 100% ACID
  • 37. NoSQL, yet Relational • Semi-structured documents that can store any application data model • Real-time indexing system enables querying using multiple models within the same query • Transaction expressiveness is similar to a modern programming language • Query language designed to fit into the current development paradigms • Embedded in application code, similar to LINQ or an ORM • Record-level access control enforces application permissions Unified Multi-Model Data Interactions Read Write FQL API Interface 45Š Fauna, Inc. 2018 Multi-Model Interface
  • 38. Š Fauna, Inc. 2018 46 Operational Simplicity Operational simplicity unlike any other • Setup multi-node, multi-region clusters in minutes • Automatically manages database state to match declared topology • Built-in request routing shields cluster topology from clients • Rich command line interface / API to manipulate nodes and clusters • ACID compliant macro commands drastically reduce risk Elastic DevOps
  • 39. Š Fauna, Inc. 2018 47 Policy-based data isolation and security • All client interactions require an API key by default • Per-record access policy is highly customizable to application needs • End-user credentials management secures data access on per-user basis • QoS: Identity-based allocation of cluster resources • Client/server and server/server network connections encrypted via TLS • Temporal data model builds in auditing capabilities High Security Secure by Design
  • 40. Š Fauna, Inc. 2018 48 Roll out your shared services without risk • Separates developer, schema admin, and cluster management • Allows a single database group to service multiple internal teams • Application teams can be self-service and isolated • Tenant-aware QoS allows for safe mixed workloads • Track who is consuming what and how much (for chargeback) • Policy-based access to shared transactional data Multi-Tenancy Prioritized Requests | Schema Isolation | Seamless Clustering Built-in Multi-tenancy
  • 41. Š Fauna, Inc. 2018 49 Snapshot isolation and configurable data retention • Maintains the evolution of data over time, much like change tracking in documents • Queries can be run against the current time or a snapshot point in time or a composite • Event queries can alert you when new results match specific criteria • Data retention supports intelligent change management, including retroactive event updates to repair data input errors • Extremely useful in use cases such as social activity feeds, recent updates, syncing occasionally connected mobile and IoT devices Temporality Temporality: Track Your Data over Time
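The temporal model described above (each write is an appended event, not an overwrite) can be sketched as a minimal record type. This is an illustrative sketch, not Fauna's temporal API; the class and method names are assumptions.

```python
class TemporalRecord:
    """Toy temporal record: writes append events; reads target a snapshot."""

    def __init__(self):
        self.events = []  # list of (ts, value), appended in timestamp order

    def write(self, ts, value):
        self.events.append((ts, value))

    def at(self, ts):
        """Value as of snapshot ts: the last event at or before ts."""
        val = None
        for ets, v in self.events:
            if ets > ts:
                break
            val = v
        return val

    def changes(self, since, until):
        """Change feed: events in the half-open interval (since, until]."""
        return [(t, v) for t, v in self.events if since < t <= until]

rec = TemporalRecord()
rec.write(1, {"status": "draft"})
rec.write(5, {"status": "published"})

assert rec.at(3) == {"status": "draft"}        # point-in-time query
assert rec.at(9) == {"status": "published"}    # current state
assert rec.changes(1, 9) == [(5, {"status": "published"})]  # change feed
```

The same structure supports the use cases the slide lists: auditing (replay the events), activity feeds (the change feed), and syncing occasionally connected devices (fetch changes since the last sync timestamp).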
  • 42. Š Fauna, Inc. 2018 50 FaunaDB Node Self-contained FaunaDB Node: Single Binary Priority-based Scheduler Fauna ACID Transaction Engine (FATE) Temporal Data Storage Fauna Query Interface Schemaless Docs Relational Indexes Tenant / Role Access Control Replicated Write Ahead Log Cluster Manager
  • 43. Š Fauna, Inc. 2018 52 Future-proofs your data architecture • Limitless horizontal and vertical scale • Infrastructure agnostic, minimal dependencies (jvm) • Easily deploys and performs on commodity hardware • Multi-cloud and hybrid deployment options Designed for Modern Datacenters
  • 44. Š Fauna, Inc. 2018 53 Roadmap Workload Consolidation Search, Graph, Geo HTAP Streaming Data Data Governance Policy-based Control Encrypted Queries Locality Control Datacenter Automation Compute Convergence Automated Recovery Automated Scaling Mission Critical Platform 100% ACID Complete Flexibility Operational Simplicity

Editor's notes

  1. Fauna was founded by Matt/Evan based on their experiences at Twitter. We’ve raised almost $30M to date from premier VCs and are based in SF. Every application needs a database. Apps have evolved. Databases haven’t kept pace. We’re building a new database and we’re excited to tell you about it. It is interesting to note that Google built Spanner and still invested in Fauna.
  4. Multi-model interface: read and write documents, but query in multiple models such as relational, graph, etc. Distributed ACID transactions: patent pending algorithm ensures ACID in every cluster configuration High security: row-level identity, authentication, and access control protects against application error and transport encryption protects against adversaries [end-to-end encryption coming soon]. Horizontal scalability: dynamically scale from a single machine to multiple datacenters on commodity and cloud infrastructure with no downtime. High availability: redundant, self-healing clustering reacts to machine, datacenter, and network failures in milliseconds with no loss of liveness or durability. Temporality: manage data retention and run queries on historical data at any point-in-time or as change feeds [realtime streaming coming soon]. Multi-tenancy: operate a shared services environment safely with dynamic quality-of-service management and chargeback consumption to LoBs. - Operational simplicity: Rest easy with self-driving cluster management that does what you mean every time with a few strokes of the command line
  5. Any questions thus far? Otherwise, let’s jump into some product details.
  7. The coordinator executes the transaction code. In most cases, it will not have all of the relevant data locally, and thus will have to read data from nearby servers within the same replica that have the required partitions of data that are involved in the transaction request. It chooses a recent snapshot time (this choice can be arbitrary), and makes requests to the nearby servers to read data as of that snapshot. In our example, let’s assume that the coordinator for each of our two competing transactions chooses to read as of T9 (the most recent transaction in the global transaction log):
  9. Nvidia is one of our marquee customers. Their most recent customer-facing applications are based on Fauna, where it is used for user identity management. They started out with a single site and quickly scaled worldwide to support their explosive user growth. The numbers speak for themselves. The identity service is now in use by multiple consumer-facing Nvidia apps. It just works. Most importantly, they run the entire cluster with just one person who is dedicated half-time to this role. No outages yet.
  10. About the User: Nextdoor is a large social network provider, focusing on neighborhood conversations. Project Overview: Nextdoor’s neighborhood organization means municipal and safety services see them as a useful channel for connecting with residents. One of the most heavily used features in the Nextdoor app is the ability for these local government services to send broadcast alerts to users in particular areas. However, the queries to compile lists of users based on group membership are complex and were creating performance and operational headaches with their existing Postgres deployment. Similar behavior was also seen in other portions of the Nextdoor app. Therefore, Nextdoor embarked on an effort to create a new “groups subsystem” to offload such queries and minimize impact on the application’s performance. Requirements and Challenges: Nextdoor had multiple business and technical requirements driving the shape of the groups subsystem. Firstly, Nextdoor is seeing a boom in the usage of its mobile app. They anticipate traffic to the group subsystem to increase steadily. The subsystem itself is expected to find use in multiple functions in the Nextdoor app. Therefore, the subsystem must be supported by a database that can scale up with the application traffic, without custom hardware or specialized solutions. Secondly, the Nextdoor mobile app serves users distributed globally. As such, they wanted data to be available worldwide, and highly available in multiple regions. Thirdly, group subsystem queries are IO intensive, including index lookups, nested joins, and set intersections. The group membership model is graph-like, with users as members of neighborhoods, but also of other groups. Neighborhoods are part of larger groups, which can also be nested into regions. Nextdoor required a database with support for complex queries. Fourthly, Nextdoor is built in the cloud. They wanted to adopt a database backend that would operate in the cloud as well, thereby minimizing the administrative and operational overheads associated with running their application. Lastly, various teams had been migrating workloads from RDBMS to NoSQL for almost a decade, to take advantage of the scalability of NoSQL systems. What remained were the workloads that require relational features like joins and ACID transactions, leaving Nextdoor with a hodgepodge of Postgres and NoSQL databases. Nextdoor saw the groups subsystem effort as an opportunity for consolidating workloads to lower their total cost of ownership. Why Fauna: FaunaDB gave Nextdoor the general-purpose platform they needed for running mission-critical workloads in cloud-native environments. Unlike Postgres, FaunaDB was designed from the ground up as a cloud-native and horizontally scalable database. It delivers the same set of data management capabilities, no matter the data distribution topology. Most importantly, it does so without sacrificing the relational features desired by Nextdoor. Robust multi-region replication with strong consistency means that data committed to FaunaDB is available across all regions, so data is correct and complete even in the face of disasters. This gave Nextdoor the horizontal scalability they require. FaunaDB features a multi-model interface that includes relational primitives such as ACID transactions, consistent indexes, and joins, as well as document- and graph-styled querying, all stored with configurable temporal snapshot retention. Nextdoor’s groups subsystem queries include nested joins and set intersections -- a perfect fit for FaunaDB’s expressive query language. In FaunaDB, Nextdoor found a data platform designed to grow with the business, not just a scalable transaction engine with powerful queries. The full suite of platform features, including multi-tenancy and object-level security, means Nextdoor can expand their FaunaDB installation to support more applications and use cases. Results: By choosing FaunaDB for the group subsystem, Nextdoor was able to isolate the workload so that group queries do not contend with other application traffic. FaunaDB’s query language allows them to express complex queries and compose queries programmatically. This flexibility means Nextdoor can expand the use cases for targeted content, while also exploring more ways to use FaunaDB. Because a single FaunaDB cluster is designed to support multi-tenancy, they can easily add new workloads while continuing to grow the groups subsystem.
  11. I just added this use case as it relates to what we talked about earlier with the evolution of business application platforms and the latest, voice apps. VoiceConnect is a new company that is creating cool applications for Amazon’s Alexa platform. They wanted to use Amazon AWS, and in particular AWS Lambda, to create a serverless architecture.
  13. Fauna’s transactions are key to its correctness and productivity benefits. We have built a cutting-edge system designed to deliver fully ACID-compliant transactions with as few tradeoffs as possible. It completely eliminates restrictions other systems must place on transaction functionality, such as limiting transactions to a single record or a single shard. The way this works is that a Fauna cluster internally manages a distributed write-ahead log which all transactions are written to. The log’s throughput scales with the size of the cluster, which eliminates any sort of capped transaction throughput that other systems suffer from. The log has the important job of ordering all write transactions with respect to each other. This provides the ACID property of strict serializability. Once transactions are written to the log, each node can then independently play through its set of transactions based on the data it owns. One other goal we achieved was to reduce transaction latency to the minimum possible. In a global environment, network round trips are very costly. Every other system based on two-phase commit (such as Spanner or CockroachDB) requires at least two global round trips of communication, but write transactions in Fauna require only one. Furthermore, this design gives you consistent, fast, locally served reads.
  14. As you saw in the clustering sequence, we’ve worked hard to make Fauna the simplest possible database to operate and scale. You can literally set up a 5-node cluster replicated across 5 regions within minutes. Commands are coarse-grained and easily integrated into your devops workflow/automation. The build illustrates the simplicity: add a node to an existing cluster and the system does the rest.
  15. Unlike MongoDB, Cassandra, and YugaByte, FaunaDB secures client access by default (it seems like only SQL systems and DBaaS offerings actually do this). Fauna offers fine-grained per-record access control policy (Oracle, Firebase, and Postgres with an extension do this). Unique to Fauna: end-user credentials management and access control. The temporal data model enables detailed auditing: you can see how your data has evolved along a timeline. Also unique to Fauna is the QoS tied into the security model. It ensures that a single client cannot take over your cluster. We will introduce end-to-end encryption soon, and that will create the most comprehensive security capabilities in the market.
  16. Fauna’s multitenancy support is designed to solve two problems, which are really two sides of the same coin: Different teams within an organization cannot share hardware resources due to the lack of performance isolation, which leads to inefficient hardware utilization in support of a data silo per team. Shared systems are extremely sensitive to changes in workload, which requires strict control over who has access to transactional data. It’s easy for a prototype or an analytics job to take a production system offline. Fauna solves both of these problems by letting operators allocate a finite amount of cluster resources to a given team or application. You can have a higher-leverage ops team managing fewer clusters (self-service dev-ops). You can provide wider access to real-time transactional datasets. No downtime is required to onboard or manage tenants in a cluster. It all just works. (That is the theme here.)
  17. Fauna features built-in temporality. Each write is an update, instead of an overwrite. This enables change tracking for all data based on your retention policies. Temporality enables new use cases such as fine-grained auditing of data, social activity feeds, and edge computing scenarios with occasionally connected devices.
  18. Quick overview of a Fauna node. Unlike other databases, FaunaDB has a self-contained node with all the core services built in. You deploy a node to get started, and you add more nodes to scale. They work in a peer-to-peer fashion for scale and availability. There are no additional pieces to install. More about that in a bit. A few things to point out: The query interface is built to integrate very easily into modern programming patterns. As we noted, it combines documents with relational indexing to give you a structure that significantly simplifies your data model. Each Fauna client has a unique identifier. The identifier is associated with policy that determines the level of access as well as tenant priority for QoS. Queries flow to a scheduler that determines which workloads to prioritize based on the QoS settings for that client. FATE is our secret sauce. Based on a patent-pending technology, it processes strongly consistent ACID transactions using a distributed write-ahead log. You scale out the log for greater throughput. We can whiteboard the details in a follow-on conversation if there is interest; there is much to discuss here. The cluster manager takes care of all the replication and management of nodes. It is highly optimized for performance and minimal chatter.
  19. To summarize: FaunaDB has real customers and real production deployments.