F1 - The Distributed SQL Database Supporting Google's Ad Business - Bart Samwel

Large-scale internet operations such as Google, Facebook, and Amazon manage staggering amounts of data. Doing so requires databases that are distributed across multiple servers or even multiple data centers, with high throughput, strict latency requirements, "five nines" of availability, and often with strict data consistency requirements. This talk starts by introducing relational SQL databases, NoSQL databases, and the current state of the art in such databases as deployed in industry. It then provides an introduction to Google F1, a SQL database based on Google's Spanner distributed storage system. F1 is used to store the data for AdWords, Google's search advertising product. F1 and Spanner represent a new, hybrid approach to distributed databases that combines the scalability and availability of NoSQL storage systems like Google's Bigtable and Amazon's DynamoDB, with the convenience and consistency guarantees provided by traditional SQL relational databases.

1. F1 - The Distributed SQL Database Supporting Google's Ad Business
   Bart Samwel, bsamwel@google.com
   Distributed Matters, Barcelona, November 2015
2. What's This Talk About?
   ● Part 1: Distributed databases and the challenges of "web scale"
   ● Part 2: F1, the distributed SQL database from Google
3. Distributed Databases and the challenges of "web scale"
4. Traditional Relational Databases
   ● A typical traditional database setup:
     ○ Relational (SQL) databases with ACID transactions
     ○ Master/slave replication
     ○ Scale by partitioning or "sharding" data over multiple databases
   ● Transaction = unit of database work.
     ○ Set of related reads + writes followed by a "commit".
   ● An ACID transaction is:
     ○ Atomic: All-or-nothing.
     ○ Consistent: Database moves from one consistent state to another.
     ○ Isolated: Transactions (seem to) happen one after the other.
     ○ Durable: After transaction commits, the data is safe.
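   To make the all-or-nothing property concrete, here is a minimal sketch using Python's built-in sqlite3 module (a toy single-node example, unrelated to F1 or MySQL): either both updates of a transfer commit together, or neither takes effect.

     import sqlite3

     # Minimal ACID illustration: a two-row "transfer" inside one transaction.
     con = sqlite3.connect(":memory:")
     con.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
     con.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 0)])
     con.commit()

     try:
         con.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
         con.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")
         con.commit()          # atomic: both writes become durable together
     except sqlite3.Error:
         con.rollback()        # all-or-nothing: neither write is applied

     print(con.execute("SELECT id, balance FROM accounts").fetchall())
     # [(1, 70), (2, 30)]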
5. Web Challenge #1: Scaling
   ● Web company data sets are huge and tend to grow exponentially¹
   ● Traditional SQL database model takes a lot of effort to scale: adding servers, repartitioning data. Not impossible, but hard.
   ● Move towards distributed database systems with equal peers
     ○ All servers created equal, dynamic partitioning
     ○ Workload can shift dynamically between servers
     ○ Add more servers = handle more data and more requests
   ● Rise of "NoSQL"
     ○ Non-relational data stores without full ACID transaction support.
     ○ Can be key/value, document stores or other non-relational formats.
     ○ Examples: Google Bigtable, Apache HBase, MongoDB, Amazon DynamoDB.
   ¹For companies that do well.
6. Web Challenge #2: Multiple Datacenters
   ● Simple setups use master/slave replication even with NoSQL systems and across datacenter boundaries
     ○ Only one datacenter can commit writes
     ○ commit != fully replicated across datacenters. Too slow.
   ● Web companies want to run multiple symmetric data centers that each serve a fraction of web requests.
   ● Multimaster replication:
     ○ Every datacenter can process write requests
     ○ Often eventually consistent
     ○ Prevent or resolve conflicts
   ● Example systems:
     ○ Amazon Dynamo (!= DynamoDB)
     ○ Apache Cassandra
7. More Observations
   ● Tradeoff between consistency and latency.
     ○ Latency matters. High page load time => users go somewhere else
   ● Availability is king.
     ○ If you aren't available then you can't serve your users / sell things
     ○ Continue operating even if data centers can't talk to each other
   ● Trade-offs result in:
     ○ Complexity for applications
     ○ "Interesting" failure recovery
8. Google F1
9. What is Google F1?
   F1 - A Hybrid Database combining the
   ● Scalability of Bigtable
   ● Usability and functionality of SQL databases
   Built on Spanner, Google's globally replicated storage system
   Key Ideas
   ● Scalability: Auto-sharded storage
   ● Availability & Consistency: Synchronous replication
   ● High commit latency, but can be hidden using
     ○ Hierarchical schema
     ○ Protocol buffer column types
     ○ Efficient client code
   Can you have a scalable database without going NoSQL? Yes.
10. The AdWords Ecosystem
   One shared database backing Google's core AdWords business
   [Diagram: the shared DB at the center, surrounded by Java "frontend" systems (web UI, SOAP API, reports, advertiser-facing tools) and C++ "backend" systems (ad servers, ad logs, log aggregation, ad approvals, spam analysis), plus ad-hoc SQL users]
11. Our Legacy DB: Sharded MySQL
   Sharding Strategy
   ● Sharded by customer
   ● Apps optimized using shard awareness
   Limitations
   ● Availability
     ○ Master/slave replication -> downtime during failover
     ○ Schema changes -> downtime for table locking
   ● Scaling
     ○ Grow by adding shards
     ○ Rebalancing shards is extremely difficult and risky
     ○ Therefore, limit size and growth of data stored in database
   ● Functionality
     ○ Can't do cross-shard transactions or joins
12. Demanding Users
   Critical applications driving Google's core ad business
   ● 24/7 availability, even with datacenter outages
   ● Consistency required
     ○ Can't afford to process inconsistent data
     ○ Eventual consistency too complex and painful
   ● Scale: 10s of TB, replicated to 1000s of machines
   Shared schema
   ● Dozens of systems sharing one database
   ● Constantly evolving - several schema changes per week
   SQL Query
   ● Query without code
13. Our Solution: F1
   A new database,
   ● built from scratch,
   ● designed to operate at Google scale,
   ● without compromising on RDBMS features.
   Co-developed with new lower-level storage system, Spanner
14. What is Spanner?
   Google's globally distributed storage system (OSDI, 2012)
   Scalable: transparent sharding, data movement
   Replication
   ● Synchronous cross-datacenter replication
     ○ Paxos protocol: majority of replicas must acknowledge
   ● Master/slave replication with consistent snapshot reads at slaves
   ACID Transactions
   ● Standard 2-phase row-level locking
   ● Local or cross-machine (using two-phase commit, 2PC)
   ● Transactions serializable because of TrueTime (see paper)
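   As a sketch of the cross-machine case, the following toy coordinator shows the shape of two-phase commit; the Participant class and its prepare/commit/abort methods are hypothetical stand-ins, not Spanner's actual interfaces.

     # Hypothetical two-phase commit (2PC) coordinator sketch.
     class Participant:
         """A shard that can tentatively apply writes, then commit or abort them."""
         def __init__(self, name):
             self.name = name
             self.prepared = False

         def prepare(self, writes):
             # Acquire locks, log the changes durably, then vote.
             self.prepared = True
             return True  # vote "yes"; a failure here would return False

         def commit(self):
             assert self.prepared
             print(f"{self.name}: commit")

         def abort(self):
             print(f"{self.name}: abort")

     def two_phase_commit(participants, writes):
         # Phase 1: ask every participant to prepare; any "no" vote aborts the txn.
         votes = [p.prepare(writes.get(p.name, {})) for p in participants]
         if all(votes):
             for p in participants:      # Phase 2a: all voted yes -> commit everywhere
                 p.commit()
             return True
         for p in participants:          # Phase 2b: at least one "no" -> abort everywhere
             p.abort()
         return False

     shards = [Participant("shard-A"), Participant("shard-B")]
     two_phase_commit(shards, {"shard-A": {"row1": 42}, "shard-B": {"row9": 7}})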
15. F1 Architecture
   ● Sharded Spanner servers
     ○ data on GFS and in memory
   ● Stateless F1 server
   ● Worker pools for distributed SQL execution
   Features
   ● Relational schema
     ○ Consistent indexes
     ○ Extensions for hierarchy and rich data types
     ○ Non-blocking schema changes
   ● Multiple interfaces
     ○ SQL, key/value R/W, MapReduce
   ● Change history & notifications
   [Diagram: client -> F1 server (with F1 query workers) -> Spanner server -> GFS]
16. Hierarchical Schema
   Relational tables, with hierarchical clustering. Example:
   ● Customer: Key (CustomerId)
   ● Campaign: Key (CustomerId, CampaignId)
   ● AdGroup: Key (CustomerId, CampaignId, AdGroupId)
   [Diagram: "Rows and PKs" vs. "Storage Layout" - Customer (1) with Campaign (1,3) and AdGroups (1,3,5), (1,3,6), Campaign (1,4) with AdGroup (1,4,7); Customer (2) with Campaign (2,5) and AdGroup (2,5,8). In storage, each root row's descendants are laid out with it in primary-key order.]
17. Clustered Storage
   ● Child rows under one root row form a cluster
   ● Cluster stored on one machine (unless huge)
   ● Transactions within one cluster are most efficient
   ● Very efficient joins inside cluster (can merge with no sorting)
   [Diagram: same "Rows and PKs" vs. "Storage Layout" example as the previous slide]
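   The "merge with no sorting" point follows from both tables being stored in the same primary-key order inside a cluster. A minimal sketch under that assumption (the table data below is illustrative):

     # Joining Campaign and AdGroup rows inside one customer cluster. Because both
     # are stored sorted by (CustomerId, CampaignId, ...), one streaming merge pass
     # suffices - no sort, no hash table.
     campaigns = [(1, 3), (1, 4)]                       # (CustomerId, CampaignId), key order
     ad_groups = [(1, 3, 5), (1, 3, 6), (1, 4, 7)]      # (CustomerId, CampaignId, AdGroupId)

     def merge_join(campaigns, ad_groups):
         """Yield (campaign, ad_group) pairs whose (CustomerId, CampaignId) match."""
         i = 0
         for campaign in campaigns:
             # Skip ad groups belonging to earlier campaigns.
             while i < len(ad_groups) and ad_groups[i][:2] < campaign:
                 i += 1
             # Emit all ad groups under this campaign; they are contiguous in key order.
             j = i
             while j < len(ad_groups) and ad_groups[j][:2] == campaign:
                 yield campaign, ad_groups[j]
                 j += 1
             i = j

     for pair in merge_join(campaigns, ad_groups):
         print(pair)
     # ((1, 3), (1, 3, 5)), ((1, 3), (1, 3, 6)), ((1, 4), (1, 4, 7))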
18. Protocol Buffer Column Types
   Protocol Buffers
   ● Structured data types with optional and repeated fields
   ● Open-sourced by Google, APIs in several languages
   Column data types are mostly Protocol Buffers
   ● Stored as blobs in Spanner
   ● SQL syntax extensions for reading nested fields
   ● Coarser schema with fewer tables - inlined objects instead
   Why useful?
   ● Protocol Buffers pervasive at Google -> no impedance mismatch
   ● Simplified schema and code - apps use the same objects
     ○ Don't need foreign keys or joins if data is inlined
19. SQL on Protocol Buffers

   SELECT CustomerId, Whitelist
   FROM Customer

   CustomerId | Whitelist
   123        | feature { feature_id: 18  status: DISABLED }
              | feature { feature_id: 269 status: ENABLED }
              | feature { feature_id: 302 status: ENABLED }

   SELECT CustomerId, f.*
   FROM Customer c
   JOIN c.Whitelist.feature f
   WHERE f.feature_id IN (269, 302)
     AND f.status = 'ENABLED'

   CustomerId | feature_id | status
   123        | 269        | ENABLED
   123        | 302        | ENABLED
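   A sketch of what the repeated-field join computes, with plain Python dicts standing in for protocol buffer messages (illustrative data, not F1's implementation):

     # Semantics of "JOIN c.Whitelist.feature f": each repeated element of the
     # Whitelist.feature field becomes one joined output row.
     customers = [
         {
             "CustomerId": 123,
             "Whitelist": {
                 "feature": [
                     {"feature_id": 18, "status": "DISABLED"},
                     {"feature_id": 269, "status": "ENABLED"},
                     {"feature_id": 302, "status": "ENABLED"},
                 ]
             },
         }
     ]

     wanted_ids = {269, 302}
     rows = [
         (c["CustomerId"], f["feature_id"], f["status"])
         for c in customers
         for f in c["Whitelist"]["feature"]      # the "join" over the repeated field
         if f["feature_id"] in wanted_ids and f["status"] == "ENABLED"
     ]

     print(rows)
     # [(123, 269, 'ENABLED'), (123, 302, 'ENABLED')]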
20. F1 SQL Query Engine
   ● Fully functional SQL - joins, aggregations, indexes, etc.
   ● Highly parallel global scans
     ○ Complex plans: arbitrary joins, partitioning and shuffling, DAGs
     ○ In-memory and streaming whenever possible
   ● Local joining in Spanner when possible (hierarchical schema!)
   ● Reading data at timestamp T
     ○ Always consistent
     ○ Always current data (a few seconds old)
   One query engine used for
   ● User-facing applications (OLTP)
   ● Live reporting
   ● Analysis (OLAP)
   ● Joining to external data sources with stats and logs data
21. Example Distributed Query Plan

   SELECT *
   FROM Campaign JOIN Customer USING (CustomerId)
   WHERE Customer.Info.country = 'US'

   [Plan diagram: Spanner servers feed two parallel scans (SCAN Campaign and SCAN Customer), each running on 200 F1 workers; rows are repartitioned by sending each row to worker # (CustomerId MOD 200); 200 workers build a HASH TABLE by CustomerId and perform a HASH JOIN; results are collected at the F1 server and returned to the user.]
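   A toy sketch of the repartition-then-hash-join pattern in this plan; the four-worker setup and in-memory lists are illustrative, whereas the real plan streams rows between 200 workers over the network.

     from collections import defaultdict

     # Repartition both inputs by CustomerId MOD N, then each "worker" builds a
     # hash table of Customer rows and probes it with Campaign rows.
     NUM_WORKERS = 4
     customers = [(1, "US"), (2, "DE"), (3, "US")]     # (CustomerId, country)
     campaigns = [(1, 3), (1, 4), (2, 5), (3, 9)]      # (CustomerId, CampaignId)

     # Shuffle phase: send each row to worker # (CustomerId MOD NUM_WORKERS).
     customer_parts, campaign_parts = defaultdict(list), defaultdict(list)
     for row in customers:
         if row[1] == "US":                            # country filter applied at scan time
             customer_parts[row[0] % NUM_WORKERS].append(row)
     for row in campaigns:
         campaign_parts[row[0] % NUM_WORKERS].append(row)

     # Join phase: each worker hash-joins only its own partition.
     results = []
     for w in range(NUM_WORKERS):
         hash_table = {cust[0]: cust for cust in customer_parts[w]}   # HASH TABLE by CustomerId
         for camp in campaign_parts[w]:                               # probe with Campaign rows
             cust = hash_table.get(camp[0])
             if cust is not None:
                 results.append(camp + cust[1:])       # (CustomerId, CampaignId, country)

     print(sorted(results))
     # [(1, 3, 'US'), (1, 4, 'US'), (3, 9, 'US')]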
22. F1 Change History
   ● Every F1 transaction writes a Change History row
     ○ Keys, before & after values (as protocol buffer)
   ● Publish notifications to subscribers
     ○ "Customer X changed at time T"
   Subscriber
   ● Checkpoint time CT per Customer
   ● Read changes in (CT, T)
   ● Process changes, update checkpoint
   Uses
   ● Incremental extraction, streaming processing of changes
   ● Caching data in clients
     ○ Force catch-up to T, with no invalidation protocol
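   A hedged sketch of the subscriber loop; read_change_history(), process(), and the notification shape are hypothetical placeholders rather than F1 APIs. The point is that the checkpoint advances only after the changes up to T are processed, so a crash simply replays from the old CT.

     checkpoints = {}  # CustomerId -> last fully processed change timestamp (CT)

     def read_change_history(customer_id, after_ts, up_to_ts):
         """Placeholder: would query the Change History rows for this customer
         with change timestamps in (after_ts, up_to_ts)."""
         return []  # list of (timestamp, keys, before_proto, after_proto)

     def process(change):
         """Placeholder: apply the change to a downstream system or cache."""
         pass

     def on_notification(customer_id, notified_ts):
         """Handle a 'Customer X changed at time T' notification."""
         ct = checkpoints.get(customer_id, 0)
         if notified_ts <= ct:
             return  # already caught up past this notification
         for change in sorted(read_change_history(customer_id, ct, notified_ts)):
             process(change)                  # apply in timestamp order
         # Advance the checkpoint only after all changes were processed.
         checkpoints[customer_id] = notified_ts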
23. How We Deploy
   Five replicas needed for high availability
   ● Why not three?
     ○ Assume one datacenter down
     ○ Then one more machine crash => partial outage
   Geography
   ● Replicas spread across the country to survive regional disasters
     ○ Up to 100ms apart
   Performance
   ● Very high commit latency - 50-100ms
   ● Reads have extra network hops - 5-10ms
   ● High throughput - 100s of kQPS
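   The "why not three" argument is a majority-quorum calculation; the following sketch (illustrative failure counts only) works it out for three and five replicas.

     # Paxos-style commit needs a majority of replicas to acknowledge.
     def can_commit(total_replicas, failed_replicas):
         alive = total_replicas - failed_replicas
         majority = total_replicas // 2 + 1
         return alive >= majority

     for total in (3, 5):
         failures = 1 + 1   # one replica down with its datacenter, plus one machine crash
         print(f"{total} replicas, {failures} failures -> commits possible:",
               can_commit(total, failures))
     # 3 replicas, 2 failures -> commits possible: False  (1 alive < majority of 2)
     # 5 replicas, 2 failures -> commits possible: True   (3 alive >= majority of 3)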
24. Coping with High Latency
   Preferred transaction structure
   ● One read phase: Avoid serial reads
     ○ Read in batches
     ○ Read asynchronously in parallel
   ● Buffer writes in client, send as one RPC
   Use coarse schema and hierarchy
   ● Fewer tables and columns
   ● Fewer joins, less "foreign key chasing"
   For bulk operations
   ● Use small transactions in parallel - high throughput
   Avoid ORMs that add hidden costs
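   A hypothetical client-side sketch of this transaction structure; the db client and its read/batch_read/commit methods are placeholders, not F1's client library.

     from concurrent.futures import ThreadPoolExecutor

     # One parallel read phase, then buffered writes flushed as a single commit RPC.
     def update_campaign_budgets(db, customer_id, campaign_ids, new_budget):
         # Read phase: issue all reads up front, in parallel, not one per loop iteration.
         with ThreadPoolExecutor() as pool:
             customer_future = pool.submit(db.read, "Customer", (customer_id,))
             campaigns_future = pool.submit(
                 db.batch_read, "Campaign",
                 [(customer_id, cid) for cid in campaign_ids])
             customer = customer_future.result()
             campaigns = campaigns_future.result()

         # Write phase: buffer all mutations locally...
         mutations = []
         for campaign in campaigns:
             if campaign["budget"] != new_budget:
                 mutations.append(("Campaign", campaign["key"], {"budget": new_budget}))

         # ...and send them as one commit RPC, paying the 50-100ms commit latency once.
         db.commit(mutations)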
25. Adjusting Clients
   Typical MySQL ORM:
   ● Obscures database operations from app developers
   ● for loops doing one query per iteration
   ● Unwanted joins, loading unnecessary data
   F1: ORM without the "R"
   ● Never uses relational joins
   ● All objects are loaded explicitly
     ○ Hierarchical schema and protocol buffers make this easy
     ○ Don't join - just load child objects with a range read
   ● Ask explicitly for parallel and async reads
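   A before/after sketch contrasting ORM-style access with explicit loading; orm, db, and their methods are hypothetical placeholders used only to illustrate the pattern.

     def load_campaigns_orm_style(orm, customer_id):
         # Anti-pattern: the ORM hides N+1 queries behind attribute access.
         customer = orm.get("Customer", customer_id)
         return [c.ad_groups for c in customer.campaigns]   # one query per campaign

     def load_campaigns_explicitly(db, customer_id):
         # F1 style: the hierarchical key (CustomerId, CampaignId, AdGroupId) means
         # every child row of this customer lives in one contiguous key range, so a
         # single explicit range read fetches campaigns and ad groups together.
         rows = db.range_read(start_key=(customer_id,), end_key=(customer_id + 1,))
         campaigns = [r for r in rows if r["table"] == "Campaign"]
         ad_groups = [r for r in rows if r["table"] == "AdGroup"]
         return campaigns, ad_groups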
26. Results: Development
   ● Code is slightly more complex
     ○ But predictable performance, scales well by default
   ● Developers happy
     ○ Simpler schema
     ○ Rich data types -> lower impedance mismatch
   ● One system for OLTP and OLAP workloads
     ○ No need for copies in Bigtable
27. Results: Performance
   User-Facing Latency
   ● Avg user action: ~200ms - on par with legacy system
   ● Flatter distribution of latencies
   SQL Query Latency
   ● Similar or faster than MySQL
   ● More resources -> more parallelism -> faster
28. No Compromise Storage

   Feature                   | Sharded MySQL         | NoSQL (Bigtable) | F1 & Spanner
   Consistent reads and ACID | ✔ (per shard)         |                  | ✔ (global)
   SQL Query                 | ✔ (per shard)         |                  | ✔ (global)
   Schema mgmt.              | ✔ (downtime required) |                  | ✔ (nonblocking)
   Indexes                   | ✔                     |                  | ✔
   "Infinite" scaling        |                       | ✔                | ✔
   MapReduce                 |                       | ✔                | ✔
   High Availability         |                       | Mostly           | ✔
29. Summary
   We moved a large and critical application suite from MySQL to F1. This gave us
   ● Better scalability
   ● Better availability
   ● Strong consistency guarantees
   ● More scalable SQL queries
   And also similar application latency, using
   ● Coarser schema with rich column types
   ● Smarter client coding patterns
   In short, we made our database scale, without giving up key database features along the way.
30. Questions...
31. Links
   Research at Google: research.google.com
   Careers at Google: google.com/about/careers/
   Bart Samwel: bsamwel@google.com
