4. Reproducing these results
https://d0.awsstatic.com/product-marketing/Aurora/RDS_Aurora_Performance_Assessment_Benchmarking_v1-2.pdf
[Diagram: four R3.8XLARGE SysBench client instances driving a single R3.8XLARGE Amazon Aurora instance]
• Create an Amazon VPC (or use an existing one).
• Create four EC2 R3.8XL client instances to run the SysBench
client. All four should be in the same AZ.
• Enable enhanced networking on your clients.
• Tune your Linux settings (see whitepaper).
• Install SysBench version 0.5.
• Launch an r3.8xlarge Amazon Aurora DB instance in the same VPC and AZ as your clients.
• Start your benchmark! A minimal driver sketch follows below.
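The setup above can be scripted. Below is a minimal, unofficial Python sketch that shells out to SysBench 0.5 from one of the client instances; the endpoint, credentials, table counts, thread count, and the Lua script path are placeholders, and the exact workload parameters behind the published numbers are in the whitepaper linked above.

    # Unofficial sketch: drive SysBench 0.5 against the Aurora endpoint.
    import subprocess

    AURORA_ENDPOINT = "my-cluster.cluster-xxxx.us-east-1.rds.amazonaws.com"  # placeholder
    COMMON = [
        "sysbench",
        "--test=/usr/share/doc/sysbench/tests/db/oltp.lua",  # path varies by install
        "--mysql-host=" + AURORA_ENDPOINT,
        "--mysql-user=admin", "--mysql-password=secret", "--mysql-db=sbtest",  # placeholders
        "--oltp-tables-count=250", "--oltp-table-size=25000",  # illustrative sizes only
    ]

    # Load the data set once, then run a 30 minute timed workload.
    subprocess.run(COMMON + ["prepare"], check=True)
    subprocess.run(COMMON + ["--num-threads=1000", "--max-time=1800",
                             "--max-requests=0", "run"], check=True)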
6. How do we achieve these results?
DO LESS WORK
• Do fewer IOs
• Minimize network packets
• Cache prior results
• Offload the database engine
BE MORE EFFICIENT
• Process asynchronously
• Reduce latency path
• Use lock-free data structures
• Batch operations together
DATABASES ARE ALL ABOUT I/O
NETWORK-ATTACHED STORAGE IS ALL ABOUT PACKETS/SECOND
HIGH-THROUGHPUT PROCESSING DOES NOT ALLOW CONTEXT SWITCHES
7. IO traffic in RDS MySQL
MYSQL WITH STANDBY
TYPE OF WRITE: BINLOG, DATA, DOUBLE-WRITE, LOG, FRM FILES
IO FLOW
• Issue write to EBS – EBS issues to mirror, ack when both done
• Stage write to standby instance using DRBD
• Issue write to EBS on standby instance
OBSERVATIONS
• Steps 1, 3, 5 are sequential and synchronous
• This amplifies both latency and jitter
• Many types of writes for each user operation
• Have to write data blocks twice to avoid torn writes
PERFORMANCE
• 780K transactions
• 7,388K I/Os per million txns (excludes mirroring, standby)
• Average 7.4 I/Os per transaction
30 minute SysBench write-only workload, 100 GB data set, RDS Single AZ, 30K PIOPS
[Diagram: primary instance (AZ 1) and standby instance (AZ 2), each writing to mirrored Amazon Elastic Block Store (EBS) volumes, with backup to Amazon S3; numbered callouts 1-5 mark the write path]
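To see why chaining synchronous steps hurts, here is a toy Python simulation (illustrative only, with made-up latencies): because steps 1, 3, and 5 run one after another, the commit time is the sum of the hops, so a single slow hop inflates both the average and the tail.

    # Illustrative sketch, not RDS code; latencies are invented to model jitter.
    import random

    def ebs_write_ms():
        # ~1-3 ms typical, with an occasional 20 ms outlier
        return random.uniform(1.0, 3.0) + (20.0 if random.random() < 0.05 else 0.0)

    def mysql_standby_commit_ms():
        primary_ebs = ebs_write_ms()   # step 1: write to EBS (and its mirror)
        drbd_stage  = ebs_write_ms()   # step 3: stage the write to the standby via DRBD
        standby_ebs = ebs_write_ms()   # step 5: write to EBS on the standby
        return primary_ebs + drbd_stage + standby_ebs   # steps run back to back

    samples = sorted(mysql_standby_commit_ms() for _ in range(100_000))
    print("p50 %.1f ms" % samples[50_000])
    print("p99 %.1f ms" % samples[99_000])   # tail is far worse than any single hop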
8. IO traffic in Aurora (database)
AMAZON AURORA: ASYNC 4/6 QUORUM DISTRIBUTED WRITES
TYPE OF WRITE: BINLOG, DATA, DOUBLE-WRITE, LOG, FRM FILES
IO FLOW
• Boxcar redo log records – fully ordered by LSN
• Shuffle to appropriate segments – partially ordered
• Boxcar to storage nodes and issue writes
OBSERVATIONS
• Only write redo log records; all steps asynchronous
• No data block writes (checkpoint, cache replacement)
• 6X more log writes, but 9X less network traffic
• Tolerant of network and storage outlier latency
PERFORMANCE
• 27,378K transactions (35X MORE)
• 950K I/Os per 1M txns, 6X amplification (7.7X LESS)
30 minute SysBench write-only workload, 100 GB data set
[Diagram: primary instance (AZ 1) and replica instance (AZ 2) issuing asynchronous 4/6 quorum distributed writes to shared storage spanning AZ 1, AZ 2, and AZ 3, with continuous backup to Amazon S3]
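A minimal sketch of the 4/6 quorum write idea, assuming a hypothetical send_to_storage_node() call; this is not Aurora's implementation, but it shows why one or two slow or unreachable storage nodes stay off the latency path.

    # Sketch: a write is durable once any 4 of the 6 replicas acknowledge it.
    import concurrent.futures, random, time

    def send_to_storage_node(node_id, log_records):
        # Placeholder for the real network call; simulate variable node latency.
        time.sleep(random.uniform(0.001, 0.050))
        return node_id

    def quorum_write(log_records, num_nodes=6, quorum=4):
        pool = concurrent.futures.ThreadPoolExecutor(max_workers=num_nodes)
        futures = [pool.submit(send_to_storage_node, n, log_records) for n in range(num_nodes)]
        acks = []
        for fut in concurrent.futures.as_completed(futures):
            acks.append(fut.result())
            if len(acks) >= quorum:
                break                      # durable once 4 of 6 have acknowledged
        pool.shutdown(wait=False)          # stragglers finish in the background
        if len(acks) < quorum:
            raise RuntimeError("quorum not reached")
        return acks

    print(quorum_write([b"redo-record-1", b"redo-record-2"]))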
9. IO traffic in Aurora (storage node)
IO FLOW
① Receive record and add to in-memory queue
② Persist record and ACK
③ Organize records and identify gaps in log
④ Gossip with peers to fill in holes
⑤ Coalesce log records into new data block versions
⑥ Periodically stage log and new block versions to S3
⑦ Periodically garbage collect old versions
⑧ Periodically validate CRC codes on blocks
OBSERVATIONS
• All steps are asynchronous
• Only steps 1 and 2 are in the foreground latency path
• Input queue is 46X less than MySQL (unamplified, per node)
• Favor latency-sensitive operations
• Use disk space to buffer against spikes in activity
[Diagram: log records from the primary instance enter the storage node's incoming queue, are persisted to the hot log and ACKed, sorted and grouped into the update queue, coalesced into data blocks, and staged to S3 backup as point-in-time snapshots, with peer-to-peer gossip to peer storage nodes, GC, and scrub running in the background]
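A toy Python sketch of the flow above (assumed structure, not the storage node's code): the foreground path is only "enqueue, persist, ACK"; everything else drains queues on a background thread.

    import queue, threading

    incoming = queue.Queue()                 # step 1: in-memory incoming queue

    def handle_write(record, hot_log):
        incoming.put(record)                 # step 1: receive and enqueue
        hot_log.append(record)               # step 2: persist to the hot log (a list stands in here)
        return "ACK"                         # acknowledge; nothing below is waited on

    def background_worker(data_blocks):
        while True:
            lsn, page_id, payload = incoming.get()   # step 3: organize records, look for gaps
            # step 4 (not shown): gossip with peer storage nodes to fill any holes
            data_blocks[page_id] = (lsn, payload)    # step 5: coalesce into new block versions
            # steps 6-8 (not shown): stage to S3, garbage collect, validate CRCs

    hot_log, data_blocks = [], {}
    threading.Thread(target=background_worker, args=(data_blocks,), daemon=True).start()
    print(handle_write((10, "page-1", b"new bytes"), hot_log))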
10. Asynchronous group commits
TRADITIONAL APPROACH
• Maintain a buffer of log records to write out to disk
• Issue write when buffer full or time out waiting for writes
• First writer has latency penalty when write rate is low
AMAZON AURORA
• Request I/O with first write, fill buffer till write picked up
• Individual write durable when 4 of 6 storage nodes ACK
• Advance DB durable point up to earliest pending ACK
[Diagram: transactions T1..Tn (read, write, commit) generate log records with growing LSNs (e.g. LSN 10, 12, 22, 30, 34, 41, 47, 49, 50); pending commits wait in a commit queue in LSN order and are group-committed as the durable LSN at the head node advances over time]
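A sketch of the bookkeeping this implies, under the stated assumptions rather than Aurora internals: commits queue in LSN order, and the durable point advances only up to just below the earliest log write still waiting for its quorum ACK; every pending commit at or below that point is then released as a group.

    import heapq

    class GroupCommit:
        def __init__(self):
            self.pending_writes = []   # LSNs boxcarred to storage, quorum ACK not yet seen
            self.commit_queue = []     # (commit_lsn, txn) waiting to be acknowledged to clients
            self.durable_lsn = 0
            self.highest_acked = 0

        def log_write_issued(self, lsn):
            heapq.heappush(self.pending_writes, lsn)

        def commit_requested(self, lsn, txn):
            heapq.heappush(self.commit_queue, (lsn, txn))

        def write_acked(self, lsn):
            # 4 of 6 storage nodes have acknowledged this log write.
            self.pending_writes.remove(lsn)
            heapq.heapify(self.pending_writes)
            self.highest_acked = max(self.highest_acked, lsn)
            self.durable_lsn = (self.pending_writes[0] - 1) if self.pending_writes else self.highest_acked
            released = []
            while self.commit_queue and self.commit_queue[0][0] <= self.durable_lsn:
                released.append(heapq.heappop(self.commit_queue)[1])
            return released            # these transactions commit together

    gc = GroupCommit()
    for lsn in (10, 12, 22):
        gc.log_write_issued(lsn)
    gc.commit_requested(12, "T1")
    gc.commit_requested(22, "T2")
    print(gc.write_acked(12))   # [] - LSN 10 is still pending, so nothing is durable yet
    print(gc.write_acked(10))   # ['T1'] - durable point reaches 21, releasing T1's commit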
11. Adaptive thread pool
MYSQL THREAD MODEL
• Standard MySQL – one thread per connection; doesn't scale with connection count
• MySQL EE – connections assigned to a thread group; requires careful stall threshold tuning
AURORA THREAD MODEL
§ Re-entrant connections multiplexed to active threads
§ Kernel-space epoll() inserts into latch-free event queue
§ Dynamically sized thread pool
§ Gracefully handles 5,000+ concurrent client sessions on r3.8xl
[Diagram: client connections feed through epoll() into a latch-free task queue served by the adaptive thread pool]
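A minimal sketch of that shape in Python (assumptions: a toy echo protocol, a fixed pool size, and a placeholder port), using the standard selectors module, which is epoll-backed on Linux: one selector watches every client socket and pushes ready connections onto a shared task queue, so thousands of mostly idle sessions do not each pin a thread.

    import queue, selectors, socket, threading

    tasks = queue.SimpleQueue()          # stands in for the latch-free task queue
    sel = selectors.DefaultSelector()    # uses epoll on Linux

    def worker():
        while True:
            conn = tasks.get()
            data = conn.recv(4096)                           # handle this connection's request
            if data:
                conn.sendall(b"ok\n")
                sel.register(conn, selectors.EVENT_READ)     # re-arm for its next request
            else:
                conn.close()

    for _ in range(8):                   # Aurora sizes its pool dynamically; fixed here
        threading.Thread(target=worker, daemon=True).start()

    server = socket.create_server(("127.0.0.1", 9999))       # placeholder port
    sel.register(server, selectors.EVENT_READ)

    while True:
        for key, _events in sel.select():
            if key.fileobj is server:
                conn, _addr = server.accept()
                sel.register(conn, selectors.EVENT_READ)
            else:
                sel.unregister(key.fileobj)                  # hand off to a worker thread
                tasks.put(key.fileobj)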
12. IO traffic in Aurora (read replica)
MYSQL READ SCALING
§ Logical: ship SQL statements to the replica
§ Write workload similar on both instances
§ Independent storage
§ Can result in data drift between master and replica
AMAZON AURORA READ SCALING
§ Physical: ship redo from master to replica
§ Replica shares storage; no writes performed
§ Cached pages have redo applied
§ Advance read view when all commits seen
[Diagram: MySQL master (30% read, 70% write) feeds a MySQL replica (30% new reads, 70% write) through single-threaded binlog apply, each on its own data volume; the Aurora master (30% read, 70% write) ships redo page-cache updates to an Aurora replica serving 100% new reads over shared multi-AZ storage]
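A small sketch of the replica-side idea (assumed structure, not engine code): redo is applied only to pages already in the replica's cache, and the read view advances only when all commits up to an LSN have been seen, so readers never observe a half-applied transaction.

    class AuroraReplicaCache:
        def __init__(self):
            self.page_cache = {}        # page_id -> (lsn, payload); shared storage holds the truth
            self.read_view_lsn = 0

        def on_redo_record(self, lsn, page_id, payload):
            if page_id in self.page_cache:                    # uncached pages are simply skipped;
                self.page_cache[page_id] = (lsn, payload)     # storage serves them on demand

        def on_commit_advance(self, all_commits_seen_up_to_lsn):
            self.read_view_lsn = all_commits_seen_up_to_lsn   # new reads see up to here

    replica = AuroraReplicaCache()
    replica.page_cache["page-7"] = (10, b"old")
    replica.on_redo_record(34, "page-7", b"new")
    replica.on_redo_record(41, "page-9", b"ignored: not cached")
    replica.on_commit_advance(41)
    print(replica.page_cache["page-7"], replica.read_view_lsn)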
13. Improvements over the past few months
BATCH OPERATIONS
• Write batch size tuning
• Asynchronous send for read/write I/Os
• Purge thread performance
• Bulk insert performance
LOCK CONTENTION
• Hot row contention
• Dictionary statistics
• Mini-transaction commit code path
• Query cache read/write conflicts
• Dictionary system mutex
CUSTOMER FEEDBACK
• Binlog and distributed transactions
• Lock compression
• Read-ahead
OTHER
• Failover time reductions
• Malloc reduction
• System call reductions
• Undo slot caching patterns
• Cooperative log apply
15. Storage node availability
§ Quorum system for read/write; latency tolerant
§ Peer-to-peer gossip replication to fill in holes
§ Continuous backup to S3 (designed for 11 9s durability)
§ Continuous scrubbing of data blocks
§ Continuous monitoring of nodes and disks for repair
§ 10 GB segments as the unit of repair or hotspot rebalance, so load can be rebalanced quickly
§ Quorum membership changes do not stall writes
[Diagram: storage nodes spread across AZ 1, AZ 2, and AZ 3, with continuous backup to Amazon S3]
16. Instant crash recovery
Traditional databases
§ Have to replay logs since the last checkpoint
§ Typically 5 minutes between checkpoints
§ Single-threaded in MySQL; requires a large number of disk accesses
§ Crash at T0 requires reapplication of the redo log since the last checkpoint
Amazon Aurora
§ Underlying storage replays redo records on demand as part of a disk read
§ Parallel, distributed, asynchronous
§ No replay for startup
§ Crash at T0 results in redo logs being applied to each segment on demand, in parallel, asynchronously
[Diagram: traditional timeline shows checkpointed data plus a redo log tail to replay at T0; Aurora timeline shows per-segment on-demand apply at T0]
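A toy sketch of on-demand replay (assumption: redo payloads stand in for fully applied page images): each segment keeps its last coalesced block plus any newer redo records, and a read applies just that page's outstanding records, so there is no volume-wide replay phase at startup.

    class Segment:
        def __init__(self):
            self.blocks = {}        # page_id -> (lsn, payload): last coalesced version
            self.redo = {}          # page_id -> [(lsn, payload), ...]: not yet coalesced

        def append_redo(self, page_id, lsn, payload):
            self.redo.setdefault(page_id, []).append((lsn, payload))

        def read_block(self, page_id):
            lsn, payload = self.blocks.get(page_id, (0, b""))
            for rec_lsn, rec_payload in sorted(self.redo.pop(page_id, [])):
                if rec_lsn > lsn:                  # apply only records newer than the block
                    lsn, payload = rec_lsn, rec_payload
            self.blocks[page_id] = (lsn, payload)  # coalesced as a side effect of the read
            return payload

    seg = Segment()
    seg.blocks["page-3"] = (10, b"v10")
    seg.append_redo("page-3", 22, b"v22")   # a crash here leaves nothing to replay at startup
    print(seg.read_block("page-3"))         # b"v22": redo applied on demand, at read time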
17. Survivable caches
§ We moved the cache out of the database process
§ Cache remains warm in the event of a database restart
§ Lets you resume fully loaded operations much faster
§ Instant crash recovery + survivable cache = quick and easy recovery from DB failures
[Diagram: three database stacks (SQL, Transactions, Caching); the caching process is outside the DB process and remains warm across a database restart]
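A minimal sketch of the idea using Python's multiprocessing (purely illustrative, not how Aurora does it): the cache lives in a separate long-running process, so the process playing the database engine can crash and restart while the cache stays warm.

    import multiprocessing as mp

    def database_engine(page_cache, crash):
        page_cache.setdefault("page-1", b"hot page loaded from storage")
        if crash:
            raise SystemExit(1)       # simulated engine crash / restart
        print("after restart, cache already warm:", dict(page_cache))

    if __name__ == "__main__":
        manager = mp.Manager()        # separate process owning the cache
        cache = manager.dict()

        p = mp.Process(target=database_engine, args=(cache, True))
        p.start(); p.join()           # engine "crashes", cache process survives
        p = mp.Process(target=database_engine, args=(cache, False))
        p.start(); p.join()           # restarted engine finds the page still cached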
18. Faster, more predictable failover
MYSQL (15 - 20 sec): app running → DB failure → failure detection → DNS propagation → recovery → recovery → app running
AURORA WITH MARIADB DRIVER (3 - 20 sec): app running → DB failure → failure detection → DNS propagation → recovery → app running
20. § “Amazon Aurora was able to satisfy all of our scale
requirements with no degradation in performance. With
Alfresco on Amazon Aurora we scaled to 1 billion documents
with a throughput of 3 million per hour, which is 10 times
faster than our MySQL environment. It just works!"
- John Newton, Founder and CTO of Alfresco
21. § “We ran our compatibility test suites against Amazon Aurora and
everything just worked. Amazon Aurora paired with Tableau means
data users can take advantage of the 5x throughput Amazon Aurora
provides and deliver faster analytic insights throughout their
organizations. We look forward to offering our Amazon Aurora Tableau
connector."
- Dan Jewett, Vice President of Product Management at Tableau
22. § "기존 RDS는 스토리지의 용량과 IOPS의 필요치를 예측해서 설정해야 했습니다. 하지만
Aurora에서는 이를 예측할 필요 없이 필요한 만큼 사용할 수 있습니다.덕분에 비용 절감은
물론이고, I/O 병목에 대한 걱정을 덜 수 있었습니다."
§ "기존 DB에서 Aurora로 migration하는 데에 어려움이 있었습니다.하지만 최근에 나온 AWS
Database Migration Service를 활용하면 좀 더 쉽게 migration을 진행할 수 있으리라
봅니다."