From financial services to digital advertising, omni-channel marketing, and retail, companies are pushing to grow revenue by personalizing the customer experience in real time, based on knowing what customers care about, where they are, and what they are doing now. For a growing number of these businesses, this means developing applications that combine the historical analysis provided by Hadoop with real-time analysis through Storm and within the NoSQL databases themselves. This session will examine the design considerations and development approaches for successfully delivering interactive applications that incorporate real-time and batch analysis using a combination of Hadoop, Storm and NoSQL. Key topics will include:
• A review of the respective roles that Hadoop, Storm and NoSQL databases play.
• Considerations in choosing which technology to use in areas where their capabilities overlap.
• An overview of a typical solution architecture.
• Strategies for addressing the diverse data types required for providing a complete view of the customer.
• Approaches to managing large data types to ensure reliable real-time responses.
Throughout the discussion, concepts will be illustrated by use cases of businesses that have implemented real-time applications using Hadoop, Storm and NoSQL, which are in production today.
This presentation was given at the 2014 NoSQL Matters conference in Cologne, Germany.
37. Aerospike: the gold standard for high-throughput, low-latency, high-reliability transactions
Performance
• Over ten trillion transactions per month
• 99% of transactions faster than 2 ms
• 150K TPS per server
Scalability
• Billions of Internet users
• Clustered Software
• Automatic Data Rebalancing
Reliability
• 50 customers; zero service downtime
• Immediate Consistency
• Rapid Failover; Data Center Replication
Price/Performance
• Makes impossible projects affordable
• Flash-optimized
• 1/10 the servers required
41. OHIO
1) No Hotspots – DHT with RIPEMD160 simplifies data partitioning
2) Smart Client – 1 hop to data, no load balancers
3) Shared Nothing Architecture, every node identical
4) Single row ACID – synchronous replication in cluster
5) Smart Cluster, Zero Touch – auto-failover, rebalancing, rolling upgrades
6) Transactions and long-running tasks prioritized in real time
7) XDR – asynchronous replication across data centers ensures Zero Downtime
Simpler Scaling: Fewer Servers, ACID, Zero Touch
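The "No Hotspots" and "Smart Client" points above can be sketched as follows. This is a minimal illustration, not Aerospike's client code: the partition count of 4096, the round-robin partition map, and the names `digest`, `partition_id`, and `owner` are assumptions made for the example. Because some OpenSSL builds disable RIPEMD-160, the sketch falls back to SHA-1, which has the same 160-bit width, purely as a stand-in.

```python
import hashlib

# Assumption for illustration: the slide does not state a partition count.
N_PARTITIONS = 4096


def digest(key: bytes) -> bytes:
    """Return a 20-byte digest of the key, using RIPEMD-160 when available."""
    try:
        h = hashlib.new("ripemd160")
    except ValueError:
        # RIPEMD-160 is unavailable in this build; SHA-1 is a same-width stand-in.
        h = hashlib.sha1()
    h.update(key)
    return h.digest()


def partition_id(key: bytes) -> int:
    """Map a key to a partition using bits of its digest.

    A uniform hash spreads keys evenly over partitions, which is what
    avoids hotspots regardless of how the keys themselves are distributed.
    """
    return int.from_bytes(digest(key)[:2], "little") % N_PARTITIONS


def owner(key: bytes, nodes: list) -> str:
    """Hypothetical partition map: partitions are assigned to nodes round-robin.

    A client holding this map can compute the owning node locally and
    contact it directly, i.e. one hop and no load balancer.
    """
    return nodes[partition_id(key) % len(nodes)]
```

A client would call `owner(b"user:42", ["node-a", "node-b", "node-c"])` and send the request straight to the returned node. In a real cluster the partition map would be learned from the cluster and updated on rebalancing rather than computed round-robin.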
42. DISTRIBUTED QUERIES
1. “Scatter” requests to all nodes
2. Indexes in DRAM for fast map of secondary → primary keys
3. Indexes co-located with data to guarantee ACID, manage migrations
4. Records read in parallel from all SSDs using lock-free concurrency control
5. Aggregate results on each node
6. “Gather” results from all nodes on client
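The six steps above follow a scatter-gather pattern, which can be sketched in a few lines. This is a toy in-memory model under stated assumptions, not Aerospike's query engine: the `Node` class, the `city` and `amount` fields, and `scatter_gather` are hypothetical names, and a thread pool stands in for parallel network requests.

```python
from concurrent.futures import ThreadPoolExecutor


class Node:
    """Hypothetical node holding its own records plus a local secondary index."""

    def __init__(self, records):
        self.records = records  # primary key -> record dict
        # Secondary index kept next to the data: city -> list of primary keys.
        self.index = {}
        for pk, rec in records.items():
            self.index.setdefault(rec["city"], []).append(pk)

    def query(self, city):
        """Resolve secondary -> primary keys, read records, aggregate locally."""
        pks = self.index.get(city, [])
        amounts = [self.records[pk]["amount"] for pk in pks]
        return {"count": len(amounts), "total": sum(amounts)}


def scatter_gather(nodes, city):
    """Send the query to every node in parallel, then merge partial results."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(lambda n: n.query(city), nodes))  # scatter
    return {  # gather on the client
        "count": sum(p["count"] for p in partials),
        "total": sum(p["total"] for p in partials),
    }
```

Aggregating on each node before gathering keeps the data returned to the client small: each node sends back one partial result rather than every matching record.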
STREAM AGGREGATIONS
1. Push Code / Security Policies / Rules to Data with UDFs
2. Pipe Query results through UDFs to Filter, Transform, Aggregate (Map, Reduce)
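The filter, transform, and aggregate pipeline described above can be illustrated in plain Python. This is only a sketch of the data flow, not the database's UDF API: `stream_aggregate`, `add_to_totals`, and the sample records are hypothetical, and in the real system the functions would run inside the cluster next to the data.

```python
from functools import reduce


def stream_aggregate(records, keep, transform, combine, initial):
    """Pipe records through filter -> map -> reduce, one record at a time."""
    filtered = (r for r in records if keep(r))      # filter
    mapped = (transform(r) for r in filtered)       # transform (map)
    return reduce(combine, mapped, initial)         # aggregate (reduce)


def add_to_totals(totals, pair):
    """Reducer: accumulate the amount for each user."""
    user, amount = pair
    totals[user] = totals.get(user, 0) + amount
    return totals


# Hypothetical query results flowing into the pipeline.
records = [
    {"user": "a", "amount": 5},
    {"user": "b", "amount": 12},
    {"user": "a", "amount": 20},
]

totals = stream_aggregate(
    records,
    keep=lambda r: r["amount"] >= 10,               # drop small transactions
    transform=lambda r: (r["user"], r["amount"]),   # keep only needed fields
    combine=add_to_totals,
    initial={},
)
```

Generators are used so that each record passes through all three stages without an intermediate list being built, which mirrors the streaming nature of the approach.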
REAL-TIME ANALYTICS on OPERATIONAL DATA (No ETL)
➤ In Database, within the same Cluster
➤ On the same Data, on XDR Replicated Clusters
Real-time Analytics on Operational Data