SlideShare ist ein Scribd-Unternehmen logo
1 von 31
Downloaden Sie, um offline zu lesen
F1 - The Distributed SQL
Database Supporting
Google's Ad Business
Bart Samwel
bsamwel@google.com
Distributed Matters
Barcelona, November 2015
What's This Talk About?
● Part 1: Distributed databases and the challenges of "web scale"
● Part 2: F1, the distributed SQL database from Google
Distributed Databases
and the challenges of
"web scale"
Traditional Relational Databases
● A typical traditional database setup:
○ Relational (SQL) databases with ACID transactions
○ Master/slave replication
○ Scale by partitioning or "sharding" data over multiple databases
● Transaction = unit of database work.
○ Set of related reads + writes followed by a "commit".
● An ACID transaction is:
○ Atomic: All-or-nothing.
○ Consistent: Database moves from one consistent state to another.
○ Isolated: Transactions (seem to) happen one after the other.
○ Durable: After transaction commits, the data is safe.
Web Challenge #1: Scaling
● Web company data sets are huge and tend to grow
exponentially¹
● Traditional SQL database model takes a lot of effort to scale:
adding servers, repartitioning data. Not impossible, but hard.
● Move towards distributed database systems with equal peers
○ All servers created equal, dynamic partitioning
○ Workload can shift dynamically between servers
○ Add more servers = handle more data and more requests
● Rise of "NoSQL"
○ Non-relational data stores without full ACID transaction support.
○ Can be key/value, document stores or other non-relational formats.
○ Examples: Google Bigtable, Apache HBase, MongoDB, Amazon
DynamoDB.
¹For companies that do well.
● Simple setups use master/slave replication even with NoSQL
systems and across datacenter boundaries
○ Only one datacenter can commit writes
○ commit != fully replicated across datacenters. Too slow.
● Web companies want to run multiple symmetric data centers
that each serve fraction of web requests.
● Multimaster replication:
○ Every datacenter can process write requests
○ Often eventually consistent
○ Prevent or resolve conflicts
● Example systems:
○ Amazon Dynamo (!= DynamoDB)
○ Apache Cassandra
Web challenge #2:
Multiple Datacenters
More Observations
● Tradeoff between consistency and latency.
○ Latency matters. High page load time => users go somewhere else
● Availability is king.
○ If you aren't available then you can't serve your users / sell things
○ Continue operating even if data centers can't talk to each other
● Trade-offs result in:
○ Complexity for applications
○ "Interesting" failure recovery
Google F1
What is Google F1?
F1 - A Hybrid Database combining the
● Scalability of Bigtable
● Usability and functionality of SQL databases
Built on Spanner, Google's globally replicated storage system
Key Ideas
● Scalability: Auto-sharded storage
● Availability & Consistency: Synchronous replication
● High commit latency, but can be hidden using
○ Hierarchical schema
○ Protocol buffer column types
○ Efficient client code
Can you have a scalable database without going NoSQL? Yes.
The AdWords Ecosystem
One shared database backing Google's core AdWords business
DB
log aggregation
ad logs
ad approvalsad servers
SOAP API web UI reports
advertiser
Java / "frontend"
C++ / "backend"
spam analysis
ad-hoc
SQL users
Our Legacy DB: Sharded MySQL
Sharding Strategy
● Sharded by customer
● Apps optimized using shard awareness
Limitations
● Availability
○ Master / slave replication -> downtime during failover
○ Schema changes -> downtime for table locking
● Scaling
○ Grow by adding shards
○ Rebalancing shards is extremely difficult and risky
○ Therefore, limit size and growth of data stored in database
● Functionality
○ Can't do cross-shard transactions or joins
Demanding Users
Critical applications driving Google's core ad business
● 24/7 availability, even with datacenter outages
● Consistency required
○ Can't afford to process inconsistent data
○ Eventual consistency too complex and painful
● Scale: 10s of TB, replicated to 1000s of machines
Shared schema
● Dozens of systems sharing one database
● Constantly evolving - several schema changes per week
SQL Query
● Query without code
Our Solution: F1
A new database,
● built from scratch,
● designed to operate at Google scale,
● without compromising on RDBMS features.
Co-developed with new lower-level storage system, Spanner
Google's globally distributed storage system (OSDI, 2012)
Scalable: transparent sharding, data movement
Replication
● Synchronous cross-datacenter replication
○ Paxos protocol: majority of replicas must acknowledge
● Master/slave replication with consistent snapshot reads at slaves
ACID Transactions
● Standard 2-phase row-level locking
● Local or cross-machine (using two-phase commit, 2PC)
● Transactions serializable because of TrueTime (see paper)
What is Spanner?
F1
Architecture
● Sharded Spanner servers
○ data on GFS and in memory
● Stateless F1 server
● Worker pools for distributed SQL execution
Features
● Relational schema
○ Consistent indexes
○ Extensions for hierarchy and rich data types
○ Non-blocking schema changes
● Multiple interfaces
○ SQL, key/value R/W, MapReduce
● Change history & notifications
F1 server
Spanner
server
GFS
Client
F1 query
workers
Hierarchical Schema
Relational tables, with hierarchical clustering. Example:
● Customer: Key (CustomerId)
● Campaign: Key (CustomerId, CampaignId)
● AdGroup: Key (CustomerId, CampaignId, AdGroupId)
Customer (1)
Campaign (1,3)
AdGroup (1,3,5)
AdGroup (1,3,6)
Campaign (1,4)
AdGroup (1,4,7)
Customer (2)
Campaign (2,5)
AdGroup (2,5,8)
1
1,3 1,4
1,4,71,3,61,3,5
2
2,5
2,5,8
Storage LayoutRows and PKs
Clustered Storage
● Child rows under one root row form a cluster
● Cluster stored on one machine (unless huge)
● Transactions within one cluster are most efficient
● Very efficient joins inside cluster (can merge with no sorting)
Customer (1)
Campaign (1,3)
AdGroup (1,3,5)
AdGroup (1,3,6)
Campaign (1,4)
AdGroup (1,4,7)
Customer (2)
Campaign (2,5)
AdGroup (2,5,8)
1
1,3 1,4
1,4,71,3,61,3,5
2
2,5
2,5,8
Storage LayoutRows and PKs
Protocol Buffer Column Types
Protocol Buffers
● Structured data types with optional and repeated fields
● Open-sourced by Google, APIs in several languages
Column data types are mostly Protocol Buffers
● Stored as blobs in Spanner
● SQL syntax extensions for reading nested fields
● Coarser schema with fewer tables - inlined objects instead
Why useful?
● Protocol Buffers pervasive at Google -> no impedance mismatch
● Simplified schema and code - apps use the same objects
○ Don't need foreign keys or joins if data is inlined
SQL on Protocol Buffers
SELECT CustomerId, f.*
FROM Customer c
JOIN c.Whitelist.feature f
WHERE f.feature_id IN (269, 302)
AND f.status = 'ENABLED'
SELECT CustomerId, Whitelist
FROM Customer
CustomerId Whitelist
123 feature {
feature_id: 18
status: DISABLED
}
feature {
feature_id: 269
status: ENABLED
}
feature {
feature_id: 302
status: ENABLED
}
CustomerId feature_id status
123 269 ENABLED
123 302 ENABLED
F1 SQL Query Engine
● Fully functional SQL - joins, aggregations, indexes, etc.
● Highly parallel global scans
○ Complex plans: arbitrary joins, partitioning and shuffling, DAGs
○ In-memory and streaming whenever possible
● Local joining in Spanner when possible (hierarchical schema!)
● Reading data at timestamp T
○ Always consistent
○ Always current data (a few seconds old)
One query engine used for
● User-facing applications (OLTP)
● Live reporting
● Analysis (OLAP)
● Joining to external data sources with stats and logs data
Example Distributed Query Plan
SELECT *
FROM Campaign JOIN Customer USING (CustomerId)
WHERE Customer.Info.country = 'US'
Spanner Spanner SpannerSpanner
HASH
JOIN
F1 Server Collect
SCAN Customer
HASH TABLE
by CustomerId
SCAN Campaign
Send to worker #
CustomerId MOD 200Send to worker #
CustomerId MOD 200
200 F1 workers
200 F1 workers
200 F1 workers
User
F1 Change History
● Every F1 transaction writes a Change History row
○ Keys, before & after values (as protocol buffer)
● Publish notifications to subscribers
○ "Customer X changed at time T"
Subscriber
● Checkpoint time CT per Customer
● Read changes in (CT, T)
● Process changes, update checkpoint
Uses
● Incremental extraction, streaming processing of changes
● Caching data in clients
○ Force catch-up to T, with no invalidation protocol
How We Deploy
Five replicas needed for high availability
● Why not three?
○ Assume one datacenter down
○ Then one more machine crash => partial outage
Geography
● Replicas spread across the country to survive regional disasters
○ Up to 100ms apart
Performance
● Very high commit latency - 50-100ms
● Reads have extra network hops - 5-10ms
● High throughput - 100s of kQPS
Coping with High Latency
Preferred transaction structure
● One read phase: Avoid serial reads
○ Read in batches
○ Read asynchronously in parallel
● Buffer writes in client, send as one RPC
Use coarse schema and hierarchy
● Fewer tables and columns
● Fewer joins, less "foreign key chasing"
For bulk operations
● Use small transactions in parallel - high throughput
Avoid ORMs that add hidden costs
Adjusting Clients
Typical MySQL ORM:
● Obscures database operations from app developers
● for loops doing one query per iteration
● Unwanted joins, loading unnecessary data
F1: ORM without the "R"
● Never uses relational joins
● All objects are loaded explicitly
○ Hierarchical schema and protocol buffers make this easy
○ Don't join - just load child objects with a range read
● Ask explicitly for parallel and async reads
Results: Development
● Code is slightly more complex
○ But predictable performance, scales well by default
● Developers happy
○ Simpler schema
○ Rich data types -> lower impedance mismatch
● One system for OLTP and OLAP workloads
○ No need for copies in bigtable
Results: Performance
User-Facing Latency
● Avg user action: ~200ms - on par with legacy system
● Flatter distribution of latencies
SQL Query Latency
● Similar or faster than MySQL
● More resources -> more parallelism -> faster
No Compromise Storage
Sharded
MySQL
NoSQL
(Bigtable)
F1 &
Spanner
Consistent reads
and ACID
✔
(per shard)
✔
(global)
SQL Query ✔
(per shard)
✔
(global)
Schema mgmt. ✔
(downtime required)
✔
(nonblocking)
Indexes ✔ ✔
"Infinite" scaling ✔ ✔
MapReduce ✔ ✔
High Availability Mostly ✔
We moved a large and critical application suite from MySQL to F1.
This gave us
● Better scalability
● Better availability
● Strong consistency guarantees
● More scalable SQL query
And also similar application latency, using
● Coarser schema with rich column types
● Smarter client coding patterns
In short, we made our database scale, without giving up
key database features along the way.
Summary
Questions...
Links
Research at Google: research.google.com
Careers at Google: google.com/about/careers/
Bart Samwel: bsamwel@google.com

Weitere ähnliche Inhalte

Mehr von distributed matters

Jepsen V - Kyle Kingsbury - Key Note distributed matters Berlin 2015
Jepsen V - Kyle Kingsbury - Key Note distributed matters Berlin 2015Jepsen V - Kyle Kingsbury - Key Note distributed matters Berlin 2015
Jepsen V - Kyle Kingsbury - Key Note distributed matters Berlin 2015distributed matters
 
Replication and Synchronization Algorithms for Distributed Databases - Lena W...
Replication and Synchronization Algorithms for Distributed Databases - Lena W...Replication and Synchronization Algorithms for Distributed Databases - Lena W...
Replication and Synchronization Algorithms for Distributed Databases - Lena W...distributed matters
 
Conflict resolution with guns - Mark Nadal
Conflict resolution with guns - Mark NadalConflict resolution with guns - Mark Nadal
Conflict resolution with guns - Mark Nadaldistributed matters
 
A tale of queues — from ActiveMQ over Hazelcast to Disque - Philipp Krenn
A tale of queues — from ActiveMQ over Hazelcast to Disque - Philipp KrennA tale of queues — from ActiveMQ over Hazelcast to Disque - Philipp Krenn
A tale of queues — from ActiveMQ over Hazelcast to Disque - Philipp Krenndistributed matters
 
NoSQL's biggest lie: SQL never went away - Martin Esmann
NoSQL's biggest lie: SQL never went away - Martin EsmannNoSQL's biggest lie: SQL never went away - Martin Esmann
NoSQL's biggest lie: SQL never went away - Martin Esmanndistributed matters
 
NoSQL meets Microservices - Michael Hackstein
NoSQL meets Microservices - Michael HacksteinNoSQL meets Microservices - Michael Hackstein
NoSQL meets Microservices - Michael Hacksteindistributed matters
 
No Free Lunch, Indeed: Three Years of Microservices at SoundCloud - Phil Calcado
No Free Lunch, Indeed: Three Years of Microservices at SoundCloud - Phil CalcadoNo Free Lunch, Indeed: Three Years of Microservices at SoundCloud - Phil Calcado
No Free Lunch, Indeed: Three Years of Microservices at SoundCloud - Phil Calcadodistributed matters
 
Disque: a detailed overview of the distributed implementation - Salvatore San...
Disque: a detailed overview of the distributed implementation - Salvatore San...Disque: a detailed overview of the distributed implementation - Salvatore San...
Disque: a detailed overview of the distributed implementation - Salvatore San...distributed matters
 
Microservices - stress-free and without increased heart-attack risk - Uwe Fri...
Microservices - stress-free and without increased heart-attack risk - Uwe Fri...Microservices - stress-free and without increased heart-attack risk - Uwe Fri...
Microservices - stress-free and without increased heart-attack risk - Uwe Fri...distributed matters
 
Microservices with Netflix OSS & Spring Cloud - Arnaud Cogoluègnes
 Microservices with Netflix OSS & Spring Cloud - Arnaud Cogoluègnes Microservices with Netflix OSS & Spring Cloud - Arnaud Cogoluègnes
Microservices with Netflix OSS & Spring Cloud - Arnaud Cogoluègnesdistributed matters
 

Mehr von distributed matters (11)

Actors evolved- Rotem Hermon
Actors evolved- Rotem HermonActors evolved- Rotem Hermon
Actors evolved- Rotem Hermon
 
Jepsen V - Kyle Kingsbury - Key Note distributed matters Berlin 2015
Jepsen V - Kyle Kingsbury - Key Note distributed matters Berlin 2015Jepsen V - Kyle Kingsbury - Key Note distributed matters Berlin 2015
Jepsen V - Kyle Kingsbury - Key Note distributed matters Berlin 2015
 
Replication and Synchronization Algorithms for Distributed Databases - Lena W...
Replication and Synchronization Algorithms for Distributed Databases - Lena W...Replication and Synchronization Algorithms for Distributed Databases - Lena W...
Replication and Synchronization Algorithms for Distributed Databases - Lena W...
 
Conflict resolution with guns - Mark Nadal
Conflict resolution with guns - Mark NadalConflict resolution with guns - Mark Nadal
Conflict resolution with guns - Mark Nadal
 
A tale of queues — from ActiveMQ over Hazelcast to Disque - Philipp Krenn
A tale of queues — from ActiveMQ over Hazelcast to Disque - Philipp KrennA tale of queues — from ActiveMQ over Hazelcast to Disque - Philipp Krenn
A tale of queues — from ActiveMQ over Hazelcast to Disque - Philipp Krenn
 
NoSQL's biggest lie: SQL never went away - Martin Esmann
NoSQL's biggest lie: SQL never went away - Martin EsmannNoSQL's biggest lie: SQL never went away - Martin Esmann
NoSQL's biggest lie: SQL never went away - Martin Esmann
 
NoSQL meets Microservices - Michael Hackstein
NoSQL meets Microservices - Michael HacksteinNoSQL meets Microservices - Michael Hackstein
NoSQL meets Microservices - Michael Hackstein
 
No Free Lunch, Indeed: Three Years of Microservices at SoundCloud - Phil Calcado
No Free Lunch, Indeed: Three Years of Microservices at SoundCloud - Phil CalcadoNo Free Lunch, Indeed: Three Years of Microservices at SoundCloud - Phil Calcado
No Free Lunch, Indeed: Three Years of Microservices at SoundCloud - Phil Calcado
 
Disque: a detailed overview of the distributed implementation - Salvatore San...
Disque: a detailed overview of the distributed implementation - Salvatore San...Disque: a detailed overview of the distributed implementation - Salvatore San...
Disque: a detailed overview of the distributed implementation - Salvatore San...
 
Microservices - stress-free and without increased heart-attack risk - Uwe Fri...
Microservices - stress-free and without increased heart-attack risk - Uwe Fri...Microservices - stress-free and without increased heart-attack risk - Uwe Fri...
Microservices - stress-free and without increased heart-attack risk - Uwe Fri...
 
Microservices with Netflix OSS & Spring Cloud - Arnaud Cogoluègnes
 Microservices with Netflix OSS & Spring Cloud - Arnaud Cogoluègnes Microservices with Netflix OSS & Spring Cloud - Arnaud Cogoluègnes
Microservices with Netflix OSS & Spring Cloud - Arnaud Cogoluègnes
 

Kürzlich hochgeladen

Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxolyaivanovalion
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 

Kürzlich hochgeladen (20)

Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 

F1 The Distributed SQL Database Supporting Google's Ad Business -Bart Samwell

  • 1. F1 - The Distributed SQL Database Supporting Google's Ad Business Bart Samwel bsamwel@google.com Distributed Matters Barcelona, November 2015
  • 2. What's This Talk About? ● Part 1: Distributed databases and the challenges of "web scale" ● Part 2: F1, the distributed SQL database from Google
  • 3. Distributed Databases and the challenges of "web scale"
  • 4. Traditional Relational Databases ● A typical traditional database setup: ○ Relational (SQL) databases with ACID transactions ○ Master/slave replication ○ Scale by partitioning or "sharding" data over multiple databases ● Transaction = unit of database work. ○ Set of related reads + writes followed by a "commit". ● An ACID transaction is: ○ Atomic: All-or-nothing. ○ Consistent: Database moves from one consistent state to another. ○ Isolated: Transactions (seem to) happen one after the other. ○ Durable: After transaction commits, the data is safe.
  • 5. Web Challenge #1: Scaling ● Web company data sets are huge and tend to grow exponentially¹ ● Traditional SQL database model takes a lot of effort to scale: adding servers, repartitioning data. Not impossible, but hard. ● Move towards distributed database systems with equal peers ○ All servers created equal, dynamic partitioning ○ Workload can shift dynamically between servers ○ Add more servers = handle more data and more requests ● Rise of "NoSQL" ○ Non-relational data stores without full ACID transaction support. ○ Can be key/value, document stores or other non-relational formats. ○ Examples: Google Bigtable, Apache HBase, MongoDB, Amazon DynamoDB. ¹For companies that do well.
  • 6. ● Simple setups use master/slave replication even with NoSQL systems and across datacenter boundaries ○ Only one datacenter can commit writes ○ commit != fully replicated across datacenters. Too slow. ● Web companies want to run multiple symmetric data centers that each serve fraction of web requests. ● Multimaster replication: ○ Every datacenter can process write requests ○ Often eventually consistent ○ Prevent or resolve conflicts ● Example systems: ○ Amazon Dynamo (!= DynamoDB) ○ Apache Cassandra Web challenge #2: Multiple Datacenters
  • 7. More Observations ● Tradeoff between consistency and latency. ○ Latency matters. High page load time => users go somewhere else ● Availability is king. ○ If you aren't available then you can't serve your users / sell things ○ Continue operating even if data centers can't talk to each other ● Trade-offs result in: ○ Complexity for applications ○ "Interesting" failure recovery
  • 9. What is Google F1? F1 - A Hybrid Database combining the ● Scalability of Bigtable ● Usability and functionality of SQL databases Built on Spanner, Google's globally replicated storage system Key Ideas ● Scalability: Auto-sharded storage ● Availability & Consistency: Synchronous replication ● High commit latency, but can be hidden using ○ Hierarchical schema ○ Protocol buffer column types ○ Efficient client code Can you have a scalable database without going NoSQL? Yes.
  • 10. The AdWords Ecosystem One shared database backing Google's core AdWords business DB log aggregation ad logs ad approvalsad servers SOAP API web UI reports advertiser Java / "frontend" C++ / "backend" spam analysis ad-hoc SQL users
  • 11. Our Legacy DB: Sharded MySQL Sharding Strategy ● Sharded by customer ● Apps optimized using shard awareness Limitations ● Availability ○ Master / slave replication -> downtime during failover ○ Schema changes -> downtime for table locking ● Scaling ○ Grow by adding shards ○ Rebalancing shards is extremely difficult and risky ○ Therefore, limit size and growth of data stored in database ● Functionality ○ Can't do cross-shard transactions or joins
  • 12. Demanding Users Critical applications driving Google's core ad business ● 24/7 availability, even with datacenter outages ● Consistency required ○ Can't afford to process inconsistent data ○ Eventual consistency too complex and painful ● Scale: 10s of TB, replicated to 1000s of machines Shared schema ● Dozens of systems sharing one database ● Constantly evolving - several schema changes per week SQL Query ● Query without code
  • 13. Our Solution: F1 A new database, ● built from scratch, ● designed to operate at Google scale, ● without compromising on RDBMS features. Co-developed with new lower-level storage system, Spanner
  • 14. Google's globally distributed storage system (OSDI, 2012) Scalable: transparent sharding, data movement Replication ● Synchronous cross-datacenter replication ○ Paxos protocol: majority of replicas must acknowledge ● Master/slave replication with consistent snapshot reads at slaves ACID Transactions ● Standard 2-phase row-level locking ● Local or cross-machine (using two-phase commit, 2PC) ● Transactions serializable because of TrueTime (see paper) What is Spanner?
  • 15. F1 Architecture ● Sharded Spanner servers ○ data on GFS and in memory ● Stateless F1 server ● Worker pools for distributed SQL execution Features ● Relational schema ○ Consistent indexes ○ Extensions for hierarchy and rich data types ○ Non-blocking schema changes ● Multiple interfaces ○ SQL, key/value R/W, MapReduce ● Change history & notifications F1 server Spanner server GFS Client F1 query workers
  • 16. Hierarchical Schema Relational tables, with hierarchical clustering. Example: ● Customer: Key (CustomerId) ● Campaign: Key (CustomerId, CampaignId) ● AdGroup: Key (CustomerId, CampaignId, AdGroupId) Customer (1) Campaign (1,3) AdGroup (1,3,5) AdGroup (1,3,6) Campaign (1,4) AdGroup (1,4,7) Customer (2) Campaign (2,5) AdGroup (2,5,8) 1 1,3 1,4 1,4,71,3,61,3,5 2 2,5 2,5,8 Storage LayoutRows and PKs
  • 17. Clustered Storage ● Child rows under one root row form a cluster ● Cluster stored on one machine (unless huge) ● Transactions within one cluster are most efficient ● Very efficient joins inside cluster (can merge with no sorting) Customer (1) Campaign (1,3) AdGroup (1,3,5) AdGroup (1,3,6) Campaign (1,4) AdGroup (1,4,7) Customer (2) Campaign (2,5) AdGroup (2,5,8) 1 1,3 1,4 1,4,71,3,61,3,5 2 2,5 2,5,8 Storage LayoutRows and PKs
  • 18. Protocol Buffer Column Types Protocol Buffers ● Structured data types with optional and repeated fields ● Open-sourced by Google, APIs in several languages Column data types are mostly Protocol Buffers ● Stored as blobs in Spanner ● SQL syntax extensions for reading nested fields ● Coarser schema with fewer tables - inlined objects instead Why useful? ● Protocol Buffers pervasive at Google -> no impedance mismatch ● Simplified schema and code - apps use the same objects ○ Don't need foreign keys or joins if data is inlined
  • 19. SQL on Protocol Buffers SELECT CustomerId, f.* FROM Customer c JOIN c.Whitelist.feature f WHERE f.feature_id IN (269, 302) AND f.status = 'ENABLED' SELECT CustomerId, Whitelist FROM Customer CustomerId Whitelist 123 feature { feature_id: 18 status: DISABLED } feature { feature_id: 269 status: ENABLED } feature { feature_id: 302 status: ENABLED } CustomerId feature_id status 123 269 ENABLED 123 302 ENABLED
  • 20. F1 SQL Query Engine ● Fully functional SQL - joins, aggregations, indexes, etc. ● Highly parallel global scans ○ Complex plans: arbitrary joins, partitioning and shuffling, DAGs ○ In-memory and streaming whenever possible ● Local joining in Spanner when possible (hierarchical schema!) ● Reading data at timestamp T ○ Always consistent ○ Always current data (a few seconds old) One query engine used for ● User-facing applications (OLTP) ● Live reporting ● Analysis (OLAP) ● Joining to external data sources with stats and logs data
  • 21. Example Distributed Query Plan SELECT * FROM Campaign JOIN Customer USING (CustomerId) WHERE Customer.Info.country = 'US' Spanner Spanner SpannerSpanner HASH JOIN F1 Server Collect SCAN Customer HASH TABLE by CustomerId SCAN Campaign Send to worker # CustomerId MOD 200Send to worker # CustomerId MOD 200 200 F1 workers 200 F1 workers 200 F1 workers User
  • 22. F1 Change History ● Every F1 transaction writes a Change History row ○ Keys, before & after values (as protocol buffer) ● Publish notifications to subscribers ○ "Customer X changed at time T" Subscriber ● Checkpoint time CT per Customer ● Read changes in (CT, T) ● Process changes, update checkpoint Uses ● Incremental extraction, streaming processing of changes ● Caching data in clients ○ Force catch-up to T, with no invalidation protocol
  • 23. How We Deploy Five replicas needed for high availability ● Why not three? ○ Assume one datacenter down ○ Then one more machine crash => partial outage Geography ● Replicas spread across the country to survive regional disasters ○ Up to 100ms apart Performance ● Very high commit latency - 50-100ms ● Reads have extra network hops - 5-10ms ● High throughput - 100s of kQPS
  • 24. Coping with High Latency Preferred transaction structure ● One read phase: Avoid serial reads ○ Read in batches ○ Read asynchronously in parallel ● Buffer writes in client, send as one RPC Use coarse schema and hierarchy ● Fewer tables and columns ● Fewer joins, less "foreign key chasing" For bulk operations ● Use small transactions in parallel - high throughput Avoid ORMs that add hidden costs
  • 25. Adjusting Clients Typical MySQL ORM: ● Obscures database operations from app developers ● for loops doing one query per iteration ● Unwanted joins, loading unnecessary data F1: ORM without the "R" ● Never uses relational joins ● All objects are loaded explicitly ○ Hierarchical schema and protocol buffers make this easy ○ Don't join - just load child objects with a range read ● Ask explicitly for parallel and async reads
  • 26. Results: Development ● Code is slightly more complex ○ But predictable performance, scales well by default ● Developers happy ○ Simpler schema ○ Rich data types -> lower impedance mismatch ● One system for OLTP and OLAP workloads ○ No need for copies in bigtable
  • 27. Results: Performance User-Facing Latency ● Avg user action: ~200ms - on par with legacy system ● Flatter distribution of latencies SQL Query Latency ● Similar or faster than MySQL ● More resources -> more parallelism -> faster
  • 28. No Compromise Storage Sharded MySQL NoSQL (Bigtable) F1 & Spanner Consistent reads and ACID ✔ (per shard) ✔ (global) SQL Query ✔ (per shard) ✔ (global) Schema mgmt. ✔ (downtime required) ✔ (nonblocking) Indexes ✔ ✔ "Infinite" scaling ✔ ✔ MapReduce ✔ ✔ High Availability Mostly ✔
  • 29. We moved a large and critical application suite from MySQL to F1. This gave us ● Better scalability ● Better availability ● Strong consistency guarantees ● More scalable SQL query And also similar application latency, using ● Coarser schema with rich column types ● Smarter client coding patterns In short, we made our database scale, without giving up key database features along the way. Summary
  • 31. Links Research at Google: research.google.com Careers at Google: google.com/about/careers/ Bart Samwel: bsamwel@google.com