Distributed Data Storage & Streaming for Real-time Decisioning Using Kafka, Spark and Aerospike (Kiran Matty, Aerospike) Kafka Summit 2020

Real-time connectivity between databases and other systems is critical for enterprises undergoing digital transformation: it supports the very fast decisioning that drives applications such as fraud detection, digital payments, and recommendation engines. This talk focuses on the many functions that database streaming serves with Kafka, Spark and Aerospike. We will explore how to eliminate the wall between transaction processing and analytics by combining streaming data with system-of-record data to gain key insights in real time.
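
As a rough illustration of combining streaming data with system-of-record data, the following minimal sketch enriches each incoming payment event with a stored customer profile before making a decision. Everything in it is an assumption made for illustration: the `Payment` type, the `decide` function, the thresholds, and the in-memory `CUSTOMER_PROFILES` dictionary, which merely stands in for a database lookup. It does not use the Kafka, Spark or Aerospike APIs and is not taken from the talk.

```python
from dataclasses import dataclass

# Hypothetical system-of-record data. In a real deployment this would be a
# low-latency database read, not an in-memory dictionary.
CUSTOMER_PROFILES = {
    "c-1": {"home_country": "US", "avg_amount": 40.0},
    "c-2": {"home_country": "DE", "avg_amount": 900.0},
}


@dataclass
class Payment:
    """Hypothetical streaming event, e.g. one message read from a topic."""
    customer_id: str
    amount: float
    country: str


def decide(event, profiles, amount_factor=10.0):
    """Join one streaming event with its stored profile and return a decision."""
    profile = profiles.get(event.customer_id)
    if profile is None:
        return "review"  # no system-of-record data to compare against
    foreign = event.country != profile["home_country"]
    unusual = event.amount > amount_factor * profile["avg_amount"]
    if foreign and unusual:
        return "decline"
    if foreign or unusual:
        return "review"
    return "approve"


# A list stands in for the event stream in this sketch.
stream = [
    Payment("c-1", 35.0, "US"),
    Payment("c-1", 900.0, "BR"),
    Payment("c-3", 10.0, "US"),
]
decisions = [decide(event, CUSTOMER_PROFILES) for event in stream]
print(decisions)
```

The point of the sketch is only the shape of the flow: the decision depends on both the live event and the stored record, so the lookup has to be fast enough to sit on the request path.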

  1. Distributed Data Storage and Streaming for Real-time Decisioning using Kafka, Spark and Aerospike. Kiran Matty, August 27, 2020
  2. Who Am I? Director of PM for Ecosystem @ Aerospike; domain experience spans Big Data Infrastructure and Data Security @ Visa, Hortonworks, HPE, and Cisco; interests include large-scale distributed systems and AI/ML; Lego builder in spare time
  3. About Aerospike: Aerospike Delivers Superior Reliability and Performance at the Lowest TCO. Chart labels: TCO ($) vs. Scale (TB), Alternative TCO, Aerospike TCO, Lowest TCO; Performance vs. Scale (TB); Large, growing, unmet need; Significant functional overlap (commodity DB problem set); Strategic Operational Apps (superior uptime and resiliency, transactional, low latency)
  4. The Aerospike Difference. Patented Flash Optimized Storage Layer: ✓ significantly higher performance and IOPS. Multi-threaded, Massively Parallel: ✓ "scale up" and "scale out". Self-healing clusters: ✓ superior uptime, availability and reliability; ✓ single hop to data. Storage indices in DRAM, data on optimized SSDs: ✓ predictable performance regardless of scale. Patented Aerospike Hybrid Memory Architecture™
  5. A Seamless Bridge Between Transactional Systems and Business Apps and Solutions
  6. Aerospike Connect for Real-time Streaming Use Cases: real-time monitoring; fintech; IIoT/predictive maintenance; AI; personalization/360° profile; fraud detection
  7. What is Aerospike Connect for Kafka? Diagram labels: Outbound*, Inbound*, Kafka Producer, IoT/edge devices, Supported Formats, Change Notification. *Works on both Apache Kafka and Confluent Platform
  8. Aerospike Connect for AI/ML (a Blueprint). Diagram labels: Spark/Kafka Connector, Python Client, Storage, Notebook & ML Packages, Platform, Orchestration, Compute
  9. Aerospike Connect @ Scale. Diagram labels: Edge Systems, System of Record, Aerospike Database, XDR, API, Streaming API, Spark Connector, Kafka Connector, Spark Cluster (300 nodes), Training, Inference, 3rd party data, Data Warehouse, Data Lake, Legacy RDBMS, HDFS-based filesystem. Notes: ✓ 33-node Aerospike cluster used; ✓ 4,096 Aerospike partitions mapped to a maximum of 2^15 (32,768) Spark partitions per namespace to achieve massive parallelization; ✓ a maximum of 32 namespaces is supported per cluster
  10. Real-Time Processing of Trading Data with Aerospike. Diagram labels: Aerospike Database, real-time stock ticker data. Note: conceptual view
  11. HPE COVID-19 Response with Aerospike. Solution: schema-less data mining architecture for rapid COVID-19 knowledge discovery ("How quickly is COVID-19 spreading?", "How likely is its recurrence after recovery?", etc.). Requirements for Aerospike: (1) sub-millisecond read/write latency; (2) millions of IOPS; (3) low TCO; (4) strong consistency. Results: Aerospike was successfully used for high-velocity ingest to enable real-time analytics downstream. Source: Theresa Melvin, Chief Architect of AI Driven Big Data Solutions, HPE. Diagram labels: Aerospike Python Client, Aerospike Flink Connector* (*not a GA'd product)
  12. Why Aerospike Connect for Apache Kafka? Lower TCO: combines the cost efficiencies of Kafka and Aerospike. Reduce time to insight: combines the speed and parallelism of Kafka and Aerospike. Deploy anywhere
  13. Thank you. Questions? Email me at kmatty@aerospike.com
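
The partition figures on slide 9 imply some simple arithmetic, sketched below. The two constants come from the slide; the functions `split_factor` and `group_partitions` are hypothetical helpers written for this illustration and are not part of the Aerospike Spark connector. In particular, the contiguous grouping shown here is only one plausible way to assign partitions, not a description of how the connector actually does it.

```python
AEROSPIKE_PARTITIONS = 4096      # partitions per namespace (from slide 9)
MAX_SPARK_PARTITIONS = 2 ** 15   # 32,768, the maximum quoted on slide 9


def split_factor(spark_partitions):
    """Spark partitions per Aerospike partition for a given setting."""
    if not 1 <= spark_partitions <= MAX_SPARK_PARTITIONS:
        raise ValueError("spark_partitions must be between 1 and 32,768")
    return spark_partitions / AEROSPIKE_PARTITIONS


def group_partitions(spark_partitions):
    """Hypothetical grouping: give each Spark partition an equal, contiguous
    range of Aerospike partition IDs. Only handles counts that divide 4,096."""
    if spark_partitions < 1 or AEROSPIKE_PARTITIONS % spark_partitions != 0:
        raise ValueError("spark_partitions must evenly divide 4,096")
    size = AEROSPIKE_PARTITIONS // spark_partitions
    return [range(i * size, (i + 1) * size) for i in range(spark_partitions)]


# At the quoted maximum, each Aerospike partition is split 8 ways.
print(split_factor(MAX_SPARK_PARTITIONS))

# With 1,024 Spark partitions, each one covers 4 Aerospike partitions.
groups = group_partitions(1024)
print(len(groups), groups[0], groups[-1])
```

Under these assumptions, the 2^15 ceiling corresponds to eight Spark partitions per Aerospike partition, which is where the extra read parallelism on a large Spark cluster would come from.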
