Predictable Big Data Performance in Real-time
- 2. Database Landscape
STRUCTURED DATA
➤ TRANSACTIONS (OLTP)
Response time: Seconds
Gigabytes of data
Balanced Reads/Writes
➤ ANALYTICS (OLAP)
Response time: Seconds
Terabytes of data
Read Intensive
UNSTRUCTURED DATA
➤ REAL-TIME BIG DATA (Real-time Transactions)
Response time: < 10 ms
1-20 TB
Balanced Reads/Writes
24x7x365 Availability
➤ BIG DATA ANALYTICS
Response time: Hours, Weeks
TB to PB
Read Intensive
© 2013 Aerospike. All rights reserved. Confidential Pg. 2
- 3. Requirements for Internet Enterprises
1. Know who the interaction is with
Monitor 200+ Million US Consumers, 5+ Billion mobile devices and sensors
2. Determine intent based on current context
Page views, search terms, game state, last purchase, friends list, ads served, location
3. Respond now, use big data for more accurate decisions
Display the most relevant Ad
Recommend the best product
Deliver the richest gaming experience
Eliminate fraud…
4. Service can NEVER go down!
- 4. Challenges
1. Handle extremely high rates of persistent read/write transactions
2. Avoid hot spots to maintain tight latency SLAs
3. Provide immediate consistency with replication
4. Allow long running tasks with transactions
5. Scale linearly as data sizes increase
6. Add capacity with no service interruption
- 5. Native Flash Performance
➤ Low Latency at High Throughput
- 6. “Only Aerospike was able to function in synchronous mode with a replication factor of two … it is a significant advantage that Aerospike is able to function reliably on a smaller amount of hardware while still maintaining true consistency.”
- 7. Shared-Nothing Architecture
OHIO Data Center
➤ Every node in a cluster is identical and handles both transactions and long running tasks
➤ Data is replicated synchronously with immediate consistency within the cluster
➤ Data is replicated asynchronously across data centers
- 8. Distributed Hash Table
How Data Is Distributed (Replication Factor 2)
➤ Every key is hashed into a 20-byte (fixed-length) string using the RIPEMD160 hash function
➤ This hash plus additional data (fixed 64 bytes) is stored in RAM in the index
➤ 4 bytes of this hash are used to compute the partition id
➤ There are 4096 partitions
➤ Partition id maps to node id based on cluster membership
Example key: cookie-abcdefg-12345678
Example hash: 182023kh15hh3kahdjsh

Partition ID    Master node    Replica node
…               1              4
1820            2              3
1821            3              2
4096            4              1
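The key-to-partition steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide does not say which 4 bytes of the hash are used or how they are reduced to a partition id, so the choice of the first 4 bytes, little-endian byte order, and a modulo by 4096 are all guesses. Some OpenSSL builds do not expose RIPEMD160 through `hashlib`, so the sketch falls back to SHA-1 (also 20 bytes) purely as a stand-in.

```python
import hashlib

N_PARTITIONS = 4096  # from the slide


def key_digest(key: bytes) -> bytes:
    """Return a 20-byte digest of the key.

    The slide names RIPEMD160. If the local OpenSSL build does not provide
    it, fall back to SHA-1 as a stand-in so the example still runs.
    """
    try:
        return hashlib.new("ripemd160", key).digest()
    except ValueError:
        return hashlib.sha1(key).digest()  # stand-in, not what the slide describes


def partition_id(digest: bytes) -> int:
    # Assumption: first 4 bytes, little-endian, reduced modulo 4096.
    return int.from_bytes(digest[:4], "little") % N_PARTITIONS


digest = key_digest(b"cookie-abcdefg-12345678")
pid = partition_id(digest)
print(len(digest), pid)
```

Because the partition id depends only on the key, any client can compute it locally and then look up the owning node in its copy of the partition table.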
- 9. Organizing the cluster
➤ Automatic multicast gossip protocol for node discovery
➤ Paxos consensus algorithm determines nodes in cluster
➤ Ordered list of nodes determines data location
➤ Data partitions balanced for minimal data motion
➤ Vote initiated and terminated in 100 milliseconds
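How an agreed node list can determine data location with little data motion is sketched below. The slides do not describe the actual assignment algorithm, so this uses a generic rendezvous-style ranking as a stand-in: the `rank` function, the SHA-256 hash, and the node names are all hypothetical.

```python
import hashlib

N_PARTITIONS = 4096


def rank(node: str, pid: int) -> int:
    # Hypothetical scoring function: hash of (node, partition) as an integer.
    h = hashlib.sha256(f"{node}:{pid}".encode()).digest()
    return int.from_bytes(h[:8], "big")


def partition_map(nodes, replication_factor=2):
    """Map each partition id to [master, replica, ...]."""
    table = {}
    for pid in range(N_PARTITIONS):
        ordered = sorted(nodes, key=lambda n: rank(n, pid))
        table[pid] = ordered[:replication_factor]
    return table


before = partition_map(["n1", "n2", "n3", "n4"])
after = partition_map(["n1", "n2", "n3", "n4", "n5"])
moved = sum(1 for pid in before if before[pid][0] != after[pid][0])
print(f"master changed for {moved} of {N_PARTITIONS} partitions")
```

In this scheme every node computes the same table from the same node list, and adding a node only changes the master of partitions that the new node now owns.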
- 10. How it Works
Writing with Immediate Consistency (master, replica)
1. Write sent to row master
2. Latch against simultaneous writes
3. Apply write to master memory and replica memory synchronously
4. Queue operations to disk
5. Signal completed transaction (optional storage commit wait)
6. Master applies conflict resolution policy (rollback/rollforward)
Adding a Node (transactions continue)
1. Cluster discovers new node via gossip protocol
2. Paxos vote determines new data organization
3. Partition migrations scheduled
4. When a partition migration starts, write journal starts on destination
5. Partition moves atomically
6. Journal is applied and source data deleted
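Steps 1 to 5 of the write path can be modelled with a toy in-process example. It is a sketch only: the `Node` class and `write` function are hypothetical, a single lock per node stands in for the per-record latch, both nodes live in one process, and step 6 (conflict resolution) is not modelled.

```python
import threading
from collections import deque


class Node:
    def __init__(self, name):
        self.name = name
        self.memory = {}            # in-memory copy of the records
        self.disk_queue = deque()   # writes waiting to be flushed to storage
        self.latch = threading.Lock()


def write(master, replica, key, value):
    """Toy version of steps 1-5 of the write path."""
    with master.latch:                            # step 2: block simultaneous writes
        master.memory[key] = value                # step 3: apply to master memory...
        replica.memory[key] = value               # ...and replica memory, synchronously
        master.disk_queue.append((key, value))    # step 4: queue operations to disk
        replica.disk_queue.append((key, value))
    return "ok"                                   # step 5: signal completion


master, replica = Node("master"), Node("replica")
status = write(master, replica, "user:1", {"segment": "sports"})
print(status, replica.memory["user:1"])
```

The point of the ordering is that the caller is acknowledged only after both memory copies agree, while the slower write to storage happens from the queue.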
- 11. Intelligent Client
Shields Applications from the Complexity of the Cluster
➤ Implements Aerospike API
➤ Optimistic row locking
➤ Optimized binary protocol
➤ Cluster tracking
Learns about cluster changes, partition map
Gossip protocol
➤ Transaction semantics
Global transaction ID
Retransmit and timeout
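The "retransmit and timeout" behaviour can be illustrated with a generic retry helper. This is not the Aerospike client API: `call_with_retry`, `flaky_send`, the request dictionary, and the idea of resending the same transaction ID on each attempt are all assumptions made for the sake of the sketch.

```python
import time


def call_with_retry(send, request, timeout_s=0.05, max_attempts=3):
    """Retransmit `request` until `send` succeeds, attempts run out, or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    last_error = None
    for attempt in range(1, max_attempts + 1):
        if time.monotonic() >= deadline:
            break
        try:
            return send(request), attempt
        except ConnectionError as exc:
            last_error = exc          # remember the failure and retransmit
    raise TimeoutError(f"gave up on {request!r}") from last_error


calls = []


def flaky_send(request):
    # Hypothetical transport: fails once, then succeeds.
    calls.append(request)
    if len(calls) < 2:
        raise ConnectionError("node unreachable")
    return "ok"


result, attempts = call_with_retry(flaky_send, {"txn_id": 42, "op": "get"}, timeout_s=1.0)
print(result, attempts)
```

Keeping this logic in the client is what lets the application see a single call with a single deadline, regardless of what happens inside the cluster.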
- 12. Cross Data Center Replication (XDR)
➤ Asynchronous replication for long link delays and outages
➤ Namespace is configured to replicate to a destination cluster – master / slave, including star and ring
➤ Replication process
Transaction journal on partition master and replica
XDR process writes batches to destination
Transmission state shared with source replica
Retransmission in case of network fault
When data arrives back at originating cluster, transaction ID matching prevents subsequent application and forwarding
➤ In master / master replication, conflict resolution via multiple versions, or timestamp
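Two of the ideas above, transaction ID matching and timestamp-based conflict resolution, can be sketched together. The function and data layout are hypothetical, and the "newest timestamp wins" rule is only one reading of the slide's "timestamp" option.

```python
def apply_remote(store, seen_txn_ids, txn_id, key, value, timestamp):
    """Apply a replicated write; return True if it should be forwarded onward."""
    if txn_id in seen_txn_ids:
        return False                      # already seen here: do not re-apply or forward
    seen_txn_ids.add(txn_id)
    current = store.get(key)
    if current is None or timestamp > current[1]:
        store[key] = (value, timestamp)   # assumption: the newer timestamp wins
    return True


store, seen = {}, set()
first = apply_remote(store, seen, "t1", "k", "a", 100)   # new write, applied
older = apply_remote(store, seen, "t2", "k", "b", 90)    # older timestamp, loses the conflict
loop = apply_remote(store, seen, "t1", "k", "a", 100)    # t1 came back around a ring
print(first, older, loop, store["k"])
```

The duplicate check is what stops a write from circulating forever in a ring topology.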
- 13. Multi-core Optimization
Russ’s 10 Ingredient Recipe for Making 1 Million TPS on $5K Hardware
➤ Right Architecture
Shared nothing
In-memory (or multiple SSDs)
Tight code loop
Lock free isolation
➤ OS, Programming Language, Libraries
Modern Linux kernel
C language
Use epoll
➤ Tweaks
Pin threads to processor cores
IRQ affinity settings for NIC
CPU Socket Isolation via pairing of CPU to NIC
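The "pin threads to processor cores" tweak can be tried from Python on Linux through `os.sched_setaffinity`. The slides describe a server written in C, so this is only an illustration of the operating system facility; the helper `pin_to_core` is hypothetical and returns False on platforms where the call is missing or refused.

```python
import os


def pin_to_core(core: int) -> bool:
    """Pin the caller to one CPU core; return False where this is unsupported."""
    if not hasattr(os, "sched_setaffinity"):
        return False                       # e.g. macOS or Windows
    if core not in os.sched_getaffinity(0):
        return False                       # core not available to this process
    try:
        os.sched_setaffinity(0, {core})
    except OSError:
        return False                       # refused by the environment
    return True


if hasattr(os, "sched_getaffinity"):
    original = os.sched_getaffinity(0)
    first = min(original)
    pinned = pin_to_core(first)
    while_pinned = os.sched_getaffinity(0)
    os.sched_setaffinity(0, original)      # restore the original CPU set
else:
    first, pinned, while_pinned = None, False, None

print("pinned:", pinned)
```

Pinning keeps a worker on one core so its cache stays warm; the IRQ affinity and CPU-to-NIC pairing items on the slide are configured outside the process and are not shown here.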
- 14. Flash-optimized Storage Layer
➤ Direct device access
Direct attach performance
Data written in flash optimal large block patterns
All indexes in RAM for low wear
Constant background defragmentation
Log structured file system, “copy on write”
Clean restart through shared memory
➤ Random distribution using hash does not require RAID hardware
SSD performance varies widely
• Aerospike has a certified hardware list
• Free SSD certification tool, CIO, is also available
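The combination of a log-structured, copy-on-write data layer, an index held in RAM, and background defragmentation can be shown with a toy store. This is a conceptual sketch, not Aerospike's storage engine: the `LogStore` class is hypothetical, a Python list stands in for the flash device, and defragmentation is run on demand rather than continuously.

```python
class LogStore:
    """Toy append-only store: data in a log, index in RAM."""

    def __init__(self):
        self.log = []     # stands in for large blocks written sequentially to flash
        self.index = {}   # key -> position of the live version in the log

    def put(self, key, value):
        self.index[key] = len(self.log)   # copy on write: never overwrite in place
        self.log.append((key, value))

    def get(self, key):
        return self.log[self.index[key]][1]

    def defragment(self):
        """Rewrite only live records, dropping superseded versions."""
        live = [(k, self.log[pos][1]) for k, pos in self.index.items()]
        self.log = []
        self.index = {}
        for k, v in live:
            self.put(k, v)


store = LogStore()
store.put("a", 1)
store.put("b", 2)
store.put("a", 3)              # supersedes the first record for "a"
size_before = len(store.log)
store.defragment()
size_after = len(store.log)
print(size_before, size_after, store.get("a"))
```

Updates only touch the in-RAM index and append to the log, which is why index lookups cause no device wear and why stale versions have to be reclaimed in the background.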
- 15. Native Flash 17x better TCO
“…data-in-DRAM implementations like SAP HANA … should be bypassed … current leading data-in-flash database for transactional analytic apps is Aerospike.” - David Floyer, CTO, Wikibon
http://wikibon.org/wiki/v/Data_in_DRAM_is_a_Flash_in_the_Pan
- 17. Proven in Production
➤ AppNexus - #2 RTB after Google
27 Billion auctions per day
600+ QPS
Aerospike servers in 6 clusters in 3 data centers
➤ Chango – #2 Search after Google
Sees more Searches than Yahoo! + Bing
Data on 300 Million users
➤ TradeDesk – first Ad Exchange
Facebook Exchange partner
FBX serves 25% of Ads on the Internet
1200% growth in 2012
“Aerospike has operated without interruptions and easily scaled to meet our performance demands.” – Mike Nolet, CTO, AppNexus
- 18. Proven in Production
➤ eXelate – Data on 500 Million users
Online data plus Nielsen, Mastercard, Autobytel, Bizo data
Data on 400 million users
20 Billion Transactions per month
4x2 TB data per cluster
4 clusters across 4 data centers
“Scale. Real-time performance. Real-time replication at 4 datacenters. Aerospike delivered.” - Elad Efraim, eXelate CTO
➤ BlueKai – Serves half the Fortune 30
#1 Data Exchange
2 Trillion Transactions per month
- 19. What makes Aerospike… Fast? Scale & Never Fail?
➤ Cluster-aware Client Layer
➤ Per Node Optimizations
Thread-core-pinning
Real-time prioritization
➤ Extremely efficient primary index scheme
Index in DRAM
64 byte index entry size
Kernel quality C code; no degradation due to Java garbage collection
➤ Flash-optimized Data Layer
➤ Shared-nothing Distribution Layer
Intelligent data migration and rebalancing
Smart data expiration and eviction
Rolling upgrades and background backups
➤ Cross Datacenter Replication (XDR)
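The fixed 64-byte index entry makes RAM sizing a simple multiplication. The sketch below works through it for a hypothetical data set; the record count, the four-node cluster, and the assumption that every replica copy also needs its own index entry are illustrative and not taken from the slides.

```python
INDEX_ENTRY_BYTES = 64  # from the slides


def index_ram_bytes(records: int, replication_factor: int = 2) -> int:
    """RAM needed cluster-wide for the primary index.

    Assumption: each replica copy of a record has its own index entry.
    """
    return records * INDEX_ENTRY_BYTES * replication_factor


total = index_ram_bytes(1_000_000_000)           # 1 billion records, 2 copies
print(total / 10**9, "GB across the cluster")    # 128.0 GB across the cluster
per_node = total / 4                             # hypothetical 4-node cluster
print(per_node / 10**9, "GB per node")           # 32.0 GB per node
```

Because the entry size does not depend on the key or value size, the index footprint grows with the number of records rather than with the volume of data on flash.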
- 20. Mission
➤ Build the Modern Real-time Data Platform
1. Scaling the Internet of Everything
2. Pushing the limits of modern hardware
3. No data loss and No downtime
AEROSPIKE REAL-TIME DATA PLATFORM
• Publish & Subscribe
• ASQL & NoSQL
• Powerful Aggregations (MapReduce++)
• Secondary Index Queries
• Transactions
• User Defined Functions (UDF)
• Security, Encryption, Compression
• Distribution – Shared Nothing, ACID, Scale-out, Multiple datacenters
• Data Types – Int, Str, Blob, List, Map, Large Stack, Large Set, Large List
• Storage – DRAM, SSD, HDD
- 21. How to get Aerospike?
Free Community Edition
➤ For developers looking for speed, stability, and transparent scaling as they grow
All features for 2 nodes, 100GB
1 cluster
1 datacenter
Community support
Enterprise Edition
➤ For mission critical apps needing to scale right from the start
Unlimited number of nodes, clusters, data centers
Cross data center replication
Premium 24x7 support
Priced by TBs of unique data (not replicas)