During this session Greg Brandt and Liyin Tang, Data Infrastructure engineers from Airbnb, will discuss the design and architecture of Airbnb's streaming ETL infrastructure, which exports data from RDS for MySQL and DynamoDB into Airbnb's data warehouse, using a system called SpinalTap. We will also discuss how we leverage Spark Streaming to compute derived data from tracking topics and/or database tables, and HBase to provide immediate data access and generate cleanly time-partitioned Hive tables.
5. New Challenges
• Co-processing logic breaks down outside of the process/transaction context
• Primary tables/indices on many machines, not single RDBMS
• Specialized systems needed for certain use cases (analytics, search, etc.)
6. Architectural Tenets
• Build for production
• Plan for the future, build for today
• Prefer existing solutions and patterns that we have experience with in production
• Services should own their data and not share their storage
• Mutations to data should be propagated via standardized events
7. Change Data Capture (CDC)
Goal: Provide streams of data mutations
• In near real time
• With timeline consistency
To keep all these systems in sync
8. Option 1: Application-Driven Dual Writes
• Consistency is hard (2PC/consensus needed)
• Data model is easy (schema controlled by the application)
• Development is easy
• Uses a queue (e.g. Kafka, RabbitMQ) in addition to the RDBMS
10. We Chose Database Log Mining
• Parsing is easier than consensus
• Many libraries/APIs exist to make parsing easy
• Consuming the stream of commits gives timeline consistency by default
12. Requirements
• Timeline consistency with at-least-once message delivery
• Easily add new sources to consume (new machines if necessary)
• Support low latency and high throughput use cases
• High availability with automatic failover
• Heterogeneous data sources (MySQL, Amazon DynamoDB)
14. DynamoDB Streams
• Using the DynamoDB Streams Kinesis Adapter
• Guarantees:
• Each stream record appears exactly once in the stream
• Stream records appear in the same sequence as the actual modifications to the item
• Monotonically increasing logical clock is hard:
• Need to incorporate shard id and parent/child splitting semantics
• SequenceNumber is not global
15. Abstract Mutation
• Provide monotonically increasing* id from logical clock
• Source-specific metadata (e.g. MySQL binlog filename/offset)
• The beforeImage of the row in DB (possibly null)
• The afterImage of the row in DB (possibly null)
• Encode this using a source-agnostic format (e.g. Thrift)
• Write this object to a message bus (e.g. Kafka)
{
id: Long,
opCode: [
INSERT,
UPDATE,
DELETE
],
metadata: Map<String, String>,
beforeImage: Record,
afterImage: Record
}
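The mutation envelope above can be sketched as a small data structure. This is an illustrative sketch only, assuming Python dataclasses; the field names mirror the slide, but the class itself is not SpinalTap's actual API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional

class OpCode(Enum):
    INSERT = 1
    UPDATE = 2
    DELETE = 3

@dataclass
class Mutation:
    id: int                       # monotonically increasing logical-clock id
    op_code: OpCode
    metadata: Dict[str, str]      # source-specific, e.g. MySQL binlog filename/offset
    before_image: Optional[dict]  # row state before the change (None for INSERT)
    after_image: Optional[dict]   # row state after the change (None for DELETE)

# An UPDATE carries both images, so downstream consumers can diff the row.
m = Mutation(
    id=42,
    op_code=OpCode.UPDATE,
    metadata={"binlog_file": "mysql-bin.00000", "binlog_pos": "101"},
    before_image={"id": 1, "city": "SanFrancisco"},
    after_image={"id": 1, "city": "NewYork"},
)
```

In practice the envelope would be serialized with a source-agnostic format such as Thrift before being produced to the message bus.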
16. Clustering/Configuration
• LEADER/STANDBY state model
• Each machine is LEADER for a subset of sources
• Workload distributed evenly
• Use the ZooKeeper-based Apache Helix framework for cluster management
• http://helix.apache.org/
• Dynamic source configuration changes
• Helix instance group tags to separate MySQL/DynamoDB nodes
17. Fault Tolerance
• Controller handles node failure and elects a new LEADER for sources
• Maintain a leader_epoch counter in the Helix ZooKeeper property store
• Prefix generated ids with leader_epoch for monotonicity
• E.g. (leader_epoch, binlog_file, binlog_pos)
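The epoch-prefixing idea can be shown with a tiny sketch (hypothetical, not SpinalTap's code): because tuples compare lexicographically, ids remain monotonic across failover even if the new leader replays from an earlier binlog position.

```python
# Id = (leader_epoch, binlog_file, binlog_pos); the epoch component dominates
# the comparison, so a new leader's ids always sort after the old leader's.
def mutation_id(leader_epoch, binlog_file, binlog_pos):
    return (leader_epoch, binlog_file, binlog_pos)

# Old leader produced up to position 500 in epoch 1 ...
old = mutation_id(1, "mysql-bin.00017", 500)
# ... the new leader may rewind to an earlier position, but its epoch is
# higher, so its ids still sort after everything the old leader emitted.
new = mutation_id(2, "mysql-bin.00017", 450)
assert new > old
```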
18. Pub/Sub
• Produce mutations to Kafka with durable configuration*
• Async coprocessors consume messages, produce new streams
• Model streaming library allows encapsulation of DB table schema
• Service controls both the API endpoint and the streaming view of its data
• Keep 24 hours of MySQL binlog
• Alert / rewind on failures in this tier
19. Online Validation
• Download binlog after it is flushed/immutable
• Check for holes/ordering violations by consuming stream from Kafka
• Allows us to maintain low latency with confidence in consistency of stream
• Auto-healing
• Reset binlog position to earlier if too many failures
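The hole/ordering check above can be sketched as follows. This is a hypothetical simplification, not the real validator: it compares the ids consumed from Kafka against the ids expected from the flushed (immutable) binlog.

```python
# Detect ordering violations (an id arriving at or below the previous one)
# and holes (expected ids that never arrived) in the consumed stream.
def validate(consumed_ids, expected_ids):
    errors = []
    last = None
    for i in consumed_ids:
        if last is not None and i <= last:
            errors.append(("ordering", i))
        last = i
    missing = set(expected_ids) - set(consumed_ids)
    for i in sorted(missing):
        errors.append(("hole", i))
    return errors

# id 3 never arrived, and 5 arrives after 6:
print(validate([1, 2, 4, 6, 5], [1, 2, 3, 4, 5, 6]))
# [('ordering', 5), ('hole', 3)]
```

When too many such failures accumulate, the binlog position is reset to an earlier point, as the auto-healing bullet above describes.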
20. Production Lessons
• Need a schema history store for regions of the commit log to support rewind
• E.g. write DDL to the commit log, apply to a local MySQL while processing the stream to obtain the range/schema mapping
• Be careful about table encodings! (latin1, utf8...)
• request.required.acks = all can potentially hit every broker…
• (Group produce requests by broker to avoid hitting too many)
• Per-source produce buffer size
• (Tune for throughput/latency)
25. Point-in-Time Restore based DB Export
• Pros:
• Simple
• Especially for schema change
• Consistent
• Cons:
• No SLA for RDS PITR restoration time
• No near real time ad hoc query
• No hourly snapshot
• High storage cost
35. Key Space Design
• Multiplex all DB tables onto a single HBase table
• Fast point look up based on primary keys
• Efficient sequential scans for one table
• Load balance
36. HBase Row Keys – Primary Keys
• Hash Key = md5(DB_TABLE, PK1=v1, PK2=v2)
• Row Key = Hash Key + DB_TABLE + PK1=v1 + PK2=v2
• Fast point lookup based on primary keys
• Efficient sequential scan for all the keys in the same DB/table
• Balanced based on hash key
Layout: [ Hash | DB_TABLE | PK1=v1 | PK2=v2 ]
37. HBase Row Keys – Secondary Keys
• Hash Key = md5(DB_TABLE, Index_1=v1)
• Row Key = Hash Key + DB_TABLE + Index_1=v1 + PK1=vpk1
• Prefix scan for a given secondary index
Layout: [ Hash | DB_TABLE | Index=v1 | PK1=vpk1 ]
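The two row-key layouts can be sketched as string concatenations. This is a minimal sketch of the layouts only; the actual byte encoding and separators are assumptions not specified in the slides.

```python
import hashlib

# Primary-key row key: md5 prefix for load balancing, followed by the
# readable DB_TABLE + primary-key components for point lookups.
def primary_row_key(db_table, pks):
    readable = db_table + "".join(f"{k}={v}" for k, v in pks)
    prefix = hashlib.md5(readable.encode()).hexdigest()
    return prefix + readable

# Secondary-index row key: the hash covers only DB_TABLE + the index value,
# so all entries for one index value share a prefix and can be prefix-scanned;
# the primary key is appended to keep each entry unique.
def secondary_row_key(db_table, index, pks):
    readable = db_table + f"{index[0]}={index[1]}"
    prefix = hashlib.md5(readable.encode()).hexdigest()
    return prefix + readable + "".join(f"{k}={v}" for k, v in pks)

key = primary_row_key("users", [("PK1", "1"), ("PK2", "a")])
idx = secondary_row_key("users", ("Index_1", "x"), [("PK1", "1")])
```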
38. HBase Versioning
Rows                              CF:Columns  Version                   Value
<ShardKey><DB_TABLE_#1><PK_a=A>   id          Fri May 19 00:33:19 2016  101
<ShardKey><DB_TABLE_#1><PK_a=A>   city        Fri May 19 00:33:19 2016  SanFrancisco
<ShardKey><DB_TABLE_#1><PK_a=A>   city        Fri May 10 00:34:19 2016  NewYork
<ShardKey><DB_TABLE_#2><PK_a=A’>  id          Fri May 19 00:33:19 2016  1
39. Version by Timestamp
Binlog order: TXN 1 (COMMIT_TS: 101) → TXN 2 (COMMIT_TS: 102) → TXN 3 (COMMIT_TS: 103) → … → TXN N (COMMIT_TS: N’)
40. Version by Timestamp
Binlog order: TXN 1 (COMMIT_TS: T1) → TXN 2 (COMMIT_TS: T3) → TXN 3 (COMMIT_TS: T2) → … → TXN N (COMMIT_TS: N’)
Binlog positions: mysql-bin.00000:100, mysql-bin.00000:101, mysql-bin.00000:102, …, mysql-bin.00000:N
(NTP: commit timestamps can appear out of binlog order due to clock skew)
41. HBase Versioning
Rows                             CF:Columns  Version              CommitTS
<ShardKey><DB_TABLE_#1><PK_a=A>  id          mysql-bin.00000:100  T0
<ShardKey><DB_TABLE_#1><PK_a=A>  id          mysql-bin.00000:101  T1
<ShardKey><DB_TABLE_#1><PK_a=A>  id          mysql-bin.00000:102  T3
<ShardKey><DB_TABLE_#1><PK_a=A>  id          mysql-bin.00000:103  T2
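A small sketch (hypothetical, not the production code) of why cells are versioned by binlog offset rather than commit timestamp: with clock skew, T3 can be later than T2 in wall-clock terms even though its transaction committed earlier in the binlog, so "latest" must be decided by the logical offset.

```python
# Versions of one cell, keyed by binlog offset, with the (possibly skewed)
# commit timestamps as values — mirroring the table above.
cell_versions = {
    "mysql-bin.00000:100": "T0",
    "mysql-bin.00000:101": "T1",
    "mysql-bin.00000:102": "T3",
    "mysql-bin.00000:103": "T2",
}

def latest(versions):
    # Compare by numeric binlog offset, not by the commit timestamps,
    # so the result is deterministic regardless of NTP skew.
    return max(versions, key=lambda v: int(v.rsplit(":", 1)[1]))

print(latest(cell_versions))  # mysql-bin.00000:103
```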
43. PITR Semantics: Binlog Commit Time Index
Rows                                         Version (LogicalOffset)  Value
<ShardKey><DB_TABLE_#1><2016-05-23 23><100>  100                      mysql-bin.00000:100
<ShardKey><DB_TABLE_#1><2016-05-23 23><101>  101                      mysql-bin.00000:101
<ShardKey><DB_TABLE_#1><2016-05-23 23><103>  103                      mysql-bin.00000:103  ← the last mutation before PITR
<ShardKey><DB_TABLE_#1><2016-05-24 00><102>  102                      mysql-bin.00000:102  ← first mutation across PITR
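The commit-time index can be exercised with a sketch (hypothetical data and function names): to build a snapshot as of a PITR cutoff, scan the index rows whose commit hour falls before the cutoff and take the largest binlog offset seen.

```python
# (commit_hour, logical_offset, binlog_position) rows, mirroring the index above.
# Note that offset 102 committed after the hour boundary even though its
# binlog offset precedes 103.
index = [
    ("2016-05-23 23", 100, "mysql-bin.00000:100"),
    ("2016-05-23 23", 101, "mysql-bin.00000:101"),
    ("2016-05-23 23", 103, "mysql-bin.00000:103"),
    ("2016-05-24 00", 102, "mysql-bin.00000:102"),
]

def last_offset_before(index, cutoff_hour):
    # Everything committed in an hour earlier than the cutoff is eligible;
    # the largest offset among them is the last mutation before PITR.
    eligible = [off for hour, off, _ in index if hour < cutoff_hour]
    return max(eligible)

print(last_offset_before(index, "2016-05-24 00"))  # 103
```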
44. Streaming DB Export
• Pros:
• Consistent
• High SLA for the daily snapshot
• Consistent with PITR semantics
• Near real time ad hoc query
• Hive/Spark compatible
• Hourly snapshot view
• Low storage cost
• Cons:
• Schema changes (require extra handling)