SlideShare a Scribd company logo
1 of 28
Using Apache Cassandra and
Apache Kafka to Scale Next Gen
Applications
Adam Zegelin
Founding Software Engineer, Instaclustr
1.Xxxxxxxxx
xxxxx
Introduction
• Adam Zegelin
• Co-founded Instaclustr 5 years ago
• In Canberra, Australia
• Current focus is Cassandra on Kubenetes
• Instaclustr
• Managed Apache Cassandra, Spark and Kafka in the ☁️
 AWS, GCP, Azure & IBM
 3000 nodes under management
 24×7×365 support
• Consulting
 Schema & application design
 Workshops & Training
• 2nd-level on-call support for on-premise deployments
Agenda
• Introduction to Cassandra and Kafka
• Real-world Use Cases
• Worldpay
• Lendi
• Instaclustr
• Partitioning: the key to scale
• Fitting and architecting for your use case
• Linearly Scalable
• Always Available
• Multi-Region Data
Store
• Apache Cassandra is the leading NoSQL operational
database for high-scale and high-reliability applications.
• Shared nothing peer-to-peer architecture provides
reliability up to 100% (with Instaclustr SLAs).
• replicated data and multiple nodes capable of fulfilling queries
 Node outage? Service just keeps running
• full online maintenance and in-place upgrades
• Low latency for operational applications
• Sub-10ms P95 reads and writes achievable
• Native active-active multi data center support
• Geographic distribution (to meet latency requirements)
• Disaster resilience
• Workload isolation (analytics)
• Cassandra is a data storage system, not an
analytics/query engine or place to run logic
Typical Use Cases
• High write to read ratio
• Data is rarely updated
• Including explicit deletes
• The Primary Key is known at read time
• Limited filtering & aggregation
• No JOINs or referential integrity
• Transaction logging
• Time series data
• IoT status and event history
• Health tracker data
• Order & package statuses & tracking
• Weather service history
• Messages and email envelopes
Queuing, Pub/Sub and
Streaming at Scale
• Apache Kafka is a distributed streaming platform
• Publish and subscribe to streams of records
 Similar to a message queue or EMS
• Store streams of records
 Fault-tolerant
 Durable
• Process streams of records
 as they occur
 randomly, any position in the stream
• Replicated architecture
• High-level similarities to Cassandra
• Scalability
• Reliability
Typical Use Cases
• As a message bus
• Loose coupling between producers and consumers
• Basis for micro-services
• As a commit log
• A store of logical transactions
• Populating analytical data stores or edge caches
• As a buffer
• Manage backpressure & workload spikes
And when combined with Kafka Streams/Spark Streaming…
• As the basis of a streaming architecture
• (near) real-time analytics
• Data processing pipelines
Typical Use Cases
cont’d
• Website activity tracking
• Page views
• Searches
• Other user actions
• Metrics
• Operational monitoring data
• Log aggregation
• Centralized logging
• Event sourcing
• Application state changes
• “we don't just want to see where we are, we also want to know
how we got there”
Case study
• Payment processor
• spun out of RBS in 2010
• merged with Vantive in US in Jan 2018 for USD 10.4B to form
WorldPay Inc.
• Processes
• >40 Million transactions per day
• for 400,000 merchants
• 42% of all UK non-cash transactions
Case study
cont’d
• Re-architecting of WorldPay’s XML Payment API
• facilitates ~40M transactions per month
• New architecture based on open source technologies
• including Cassandra and Kafka
• to provide scalability, availability and reduced costs
• New Idempotency Service
• first project to use the new architecture
• provides capabilities to ensure payments are not repeated
Case study
cont’d
• Challenges
• Tight deployment timeframe
• Very high availability expectations
• Low latency requirements
• Utilises Cassandra to provide highest levels of availability
and scalability
• 18 node cluster
• 3 AWS regions (in Europe)
• Leverages Cassandras tuneable consistency
 QUORUM = strong consistency across regions
 still able to operate with a whole region unavailable
 Latency is tolerable (restricted to EU)
• Simple data model with atomic reads/writes
 fits well with Cassandra capability
Case study
cont’d
• Worked with Instaclustr to accelerate development and
time to stable service:
• Consulting engagement assisted with data model design
• Cassandra cluster run on Instaclustr managed service
 production ready in weeks
• Initial preference was to run on-prem
• security compliance
• did not expect cloud to meet latency requirements
• However, timeframes did not allow establishment of
internal deployment
• Used Instaclustr’s managed Cassandra service on AWS for
initial go-live.
• Now satisfied as a long-term solution
Case study
• Australia’s leading online home loan lender
• Processing over 90% of Australia’s online lending enquiries.
• Re-architecture of their platform following a major
funding round
• customer and data-centric
Case study
cont’d
• Integration-heavy environment
• Bespoke interfaces with banks, etc.
• Moving to a micro-services architecture
• Kafka as a message bus
• New architecture
• Decoupled application code from embedded data sets from
various business applications
• Unified data models from the various point solutions and
market segments
• Enabled extensive scale
 supports rapid and large growth in data as the consumer base
grows
Case study
• Cassandra
• Storage for monitoring metrics & events
• Custom collector
• RabbitMQ transport
 Will eventually move to Kafka as the transport
• Metrics are processed by Riemann
 Raises PagerDuty alerts, tickets, emails
 Writes to Cassandra
• Kafka
• Centralised logging
• Events are collected by fluentd
• Pumped into LogStash via Kafka
• Indexed via ElasticSearch
• Viewed with Kibana
Partitioning
The key to scale
• Partitioning
• using a key in your data to split the data across multiple
servers
• Manual partitioning is possible but painful
• Cassandra and Kafka make partitioning transparent
• needs conscious consideration
1.Xxxxxxxxx
xxxxx
Cassandra Cluster
Cluster
Data Center (optional)
Rack (optional, recommended)
Node
1.Xxxxxxxxx
xxxxx
Partitioning
Partitioning
Partitioning
1.Xxxxxxxxx
xxxxx
Cassandra Partitions
Queuing and Streaming at Scale
1.Xxxxxxxxx
xxxxxQueuing and Streaming at Scale
● Broker
○ Node/server/VM
● Topic
○ Logical grouping of data (category/feed/name)
○ Settings:
○ Replication
○ Partition count
○ Retention
○ Compaction
○ …
Kafka Brokers, Topics and Partitions
1.Xxxxxxxxx
xxxxxQueuing and Streaming at Scale
Partition
○ Subset of messages in a topic
■ Have a single master broker
■ Guarantee ordered delivery within that
subset
○ Number of partitions is set on topic creation
Kafka Topics and Partitions (cont’d)
1.Xxxxxxxxx
xxxxxQueuing and Streaming at Scale
• Messages are mapped to a partition by the Producer
• Randomly/round-robin
• Hash of record key
• Consumers are members of Consumer Groups
• Consumer Groups register to consume records from
Topics
• Each Consumer in a Consumer Group is the exclusive
consumer of a “fair share” of partitions in the topic.
Kafka Partitions in Action
Fitting and
architecting
for your
use case
Cassandra
• Big data
• one or more individually big (>1TB) tables
• Need to pre-determine read pattern
• at least to partition key
• Very low cost writes
• great for high write / read ratio use cases
• Ideal for small reads
• 1, 10, 100, 1000 rows at a time
• No limits to horizontal scaling (data size or ops/sec)
• provided you can find a partition that fits.
• No relational integrity
• No Foreign Keys, no JOIN’s
• Limited filtering, aggregation
Fitting and
architecting
for your
use case
Kafka
• Big data
• 5k+ message/topic/second
• Not transactional
• unlike traditional MQ tech
• although guaranteed once delivery now available
• Kafka Streams very powerful tool for analysis and
mutations on data streams
Adam Zegelin
adam@instaclustr.com
Founding Software
Engineer
Using Apache Cassandra and Apache Kafka to Scale Next Gen Applications

More Related Content

What's hot

Kafka in Context, Cloud, & Community (Simon Elliston Ball, Cloudera) Kafka Su...
Kafka in Context, Cloud, & Community (Simon Elliston Ball, Cloudera) Kafka Su...Kafka in Context, Cloud, & Community (Simon Elliston Ball, Cloudera) Kafka Su...
Kafka in Context, Cloud, & Community (Simon Elliston Ball, Cloudera) Kafka Su...HostedbyConfluent
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream ProcessingGuido Schmutz
 
Building Event Driven (Micro)services with Apache Kafka
Building Event Driven (Micro)services with Apache KafkaBuilding Event Driven (Micro)services with Apache Kafka
Building Event Driven (Micro)services with Apache KafkaGuido Schmutz
 
Streaming and Social Media
Streaming and Social MediaStreaming and Social Media
Streaming and Social MediaJoe Olson
 
Event Hub (i.e. Kafka) in Modern Data Architecture
Event Hub (i.e. Kafka) in Modern Data ArchitectureEvent Hub (i.e. Kafka) in Modern Data Architecture
Event Hub (i.e. Kafka) in Modern Data ArchitectureGuido Schmutz
 
Building event-driven Microservices with Kafka Ecosystem
Building event-driven Microservices with Kafka EcosystemBuilding event-driven Microservices with Kafka Ecosystem
Building event-driven Microservices with Kafka EcosystemGuido Schmutz
 
Event Broker (Kafka) in a Modern Data Architecture
Event Broker (Kafka) in a Modern Data ArchitectureEvent Broker (Kafka) in a Modern Data Architecture
Event Broker (Kafka) in a Modern Data ArchitectureGuido Schmutz
 
Ingesting streaming data into Graph Database
Ingesting streaming data into Graph DatabaseIngesting streaming data into Graph Database
Ingesting streaming data into Graph DatabaseGuido Schmutz
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream ProcessingGuido Schmutz
 
Kafka as an Event Store - is it Good Enough?
Kafka as an Event Store - is it Good Enough?Kafka as an Event Store - is it Good Enough?
Kafka as an Event Store - is it Good Enough?Guido Schmutz
 
Kafka Connect & Kafka Streams/KSQL - the ecosystem around Kafka
Kafka Connect & Kafka Streams/KSQL - the ecosystem around KafkaKafka Connect & Kafka Streams/KSQL - the ecosystem around Kafka
Kafka Connect & Kafka Streams/KSQL - the ecosystem around KafkaGuido Schmutz
 
Big Data Architectures
Big Data ArchitecturesBig Data Architectures
Big Data ArchitecturesGuido Schmutz
 
Streaming Visualisation
Streaming VisualisationStreaming Visualisation
Streaming VisualisationGuido Schmutz
 
The Bridge to Cloud (Peter Gustafsson, Confluent) London 2019 Confluent Strea...
The Bridge to Cloud (Peter Gustafsson, Confluent) London 2019 Confluent Strea...The Bridge to Cloud (Peter Gustafsson, Confluent) London 2019 Confluent Strea...
The Bridge to Cloud (Peter Gustafsson, Confluent) London 2019 Confluent Strea...confluent
 
Modernising Change - Lime Point - Confluent - Kong
Modernising Change - Lime Point - Confluent - KongModernising Change - Lime Point - Confluent - Kong
Modernising Change - Lime Point - Confluent - Kongconfluent
 
Architecting Microservices Applications with Instant Analytics
Architecting Microservices Applications with Instant AnalyticsArchitecting Microservices Applications with Instant Analytics
Architecting Microservices Applications with Instant Analyticsconfluent
 
Building event-driven (Micro)Services with Apache Kafka
Building event-driven (Micro)Services with Apache Kafka Building event-driven (Micro)Services with Apache Kafka
Building event-driven (Micro)Services with Apache Kafka Guido Schmutz
 
Microservices with Kafka Ecosystem
Microservices with Kafka EcosystemMicroservices with Kafka Ecosystem
Microservices with Kafka EcosystemGuido Schmutz
 
Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...
Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...
Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...confluent
 
Building Event-Driven Services with Apache Kafka
Building Event-Driven Services with Apache KafkaBuilding Event-Driven Services with Apache Kafka
Building Event-Driven Services with Apache Kafkaconfluent
 

What's hot (20)

Kafka in Context, Cloud, & Community (Simon Elliston Ball, Cloudera) Kafka Su...
Kafka in Context, Cloud, & Community (Simon Elliston Ball, Cloudera) Kafka Su...Kafka in Context, Cloud, & Community (Simon Elliston Ball, Cloudera) Kafka Su...
Kafka in Context, Cloud, & Community (Simon Elliston Ball, Cloudera) Kafka Su...
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
 
Building Event Driven (Micro)services with Apache Kafka
Building Event Driven (Micro)services with Apache KafkaBuilding Event Driven (Micro)services with Apache Kafka
Building Event Driven (Micro)services with Apache Kafka
 
Streaming and Social Media
Streaming and Social MediaStreaming and Social Media
Streaming and Social Media
 
Event Hub (i.e. Kafka) in Modern Data Architecture
Event Hub (i.e. Kafka) in Modern Data ArchitectureEvent Hub (i.e. Kafka) in Modern Data Architecture
Event Hub (i.e. Kafka) in Modern Data Architecture
 
Building event-driven Microservices with Kafka Ecosystem
Building event-driven Microservices with Kafka EcosystemBuilding event-driven Microservices with Kafka Ecosystem
Building event-driven Microservices with Kafka Ecosystem
 
Event Broker (Kafka) in a Modern Data Architecture
Event Broker (Kafka) in a Modern Data ArchitectureEvent Broker (Kafka) in a Modern Data Architecture
Event Broker (Kafka) in a Modern Data Architecture
 
Ingesting streaming data into Graph Database
Ingesting streaming data into Graph DatabaseIngesting streaming data into Graph Database
Ingesting streaming data into Graph Database
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
 
Kafka as an Event Store - is it Good Enough?
Kafka as an Event Store - is it Good Enough?Kafka as an Event Store - is it Good Enough?
Kafka as an Event Store - is it Good Enough?
 
Kafka Connect & Kafka Streams/KSQL - the ecosystem around Kafka
Kafka Connect & Kafka Streams/KSQL - the ecosystem around KafkaKafka Connect & Kafka Streams/KSQL - the ecosystem around Kafka
Kafka Connect & Kafka Streams/KSQL - the ecosystem around Kafka
 
Big Data Architectures
Big Data ArchitecturesBig Data Architectures
Big Data Architectures
 
Streaming Visualisation
Streaming VisualisationStreaming Visualisation
Streaming Visualisation
 
The Bridge to Cloud (Peter Gustafsson, Confluent) London 2019 Confluent Strea...
The Bridge to Cloud (Peter Gustafsson, Confluent) London 2019 Confluent Strea...The Bridge to Cloud (Peter Gustafsson, Confluent) London 2019 Confluent Strea...
The Bridge to Cloud (Peter Gustafsson, Confluent) London 2019 Confluent Strea...
 
Modernising Change - Lime Point - Confluent - Kong
Modernising Change - Lime Point - Confluent - KongModernising Change - Lime Point - Confluent - Kong
Modernising Change - Lime Point - Confluent - Kong
 
Architecting Microservices Applications with Instant Analytics
Architecting Microservices Applications with Instant AnalyticsArchitecting Microservices Applications with Instant Analytics
Architecting Microservices Applications with Instant Analytics
 
Building event-driven (Micro)Services with Apache Kafka
Building event-driven (Micro)Services with Apache Kafka Building event-driven (Micro)Services with Apache Kafka
Building event-driven (Micro)Services with Apache Kafka
 
Microservices with Kafka Ecosystem
Microservices with Kafka EcosystemMicroservices with Kafka Ecosystem
Microservices with Kafka Ecosystem
 
Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...
Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...
Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...
 
Building Event-Driven Services with Apache Kafka
Building Event-Driven Services with Apache KafkaBuilding Event-Driven Services with Apache Kafka
Building Event-Driven Services with Apache Kafka
 

Similar to Using Apache Cassandra and Apache Kafka to Scale Next Gen Applications

Cassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction GuideCassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction GuideMohammed Fazuluddin
 
Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...
Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...
Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...Lviv Startup Club
 
Better, faster, cheaper infrastructure with apache cloud stack and riak cs redux
Better, faster, cheaper infrastructure with apache cloud stack and riak cs reduxBetter, faster, cheaper infrastructure with apache cloud stack and riak cs redux
Better, faster, cheaper infrastructure with apache cloud stack and riak cs reduxJohn Burwell
 
Apache Cassandra in the Real World
Apache Cassandra in the Real WorldApache Cassandra in the Real World
Apache Cassandra in the Real WorldJeremy Hanna
 
Data Pipelines with Spark & DataStax Enterprise
Data Pipelines with Spark & DataStax EnterpriseData Pipelines with Spark & DataStax Enterprise
Data Pipelines with Spark & DataStax EnterpriseDataStax
 
Highly available, scalable and secure data with Cassandra and DataStax Enterp...
Highly available, scalable and secure data with Cassandra and DataStax Enterp...Highly available, scalable and secure data with Cassandra and DataStax Enterp...
Highly available, scalable and secure data with Cassandra and DataStax Enterp...Johnny Miller
 
Hacking apache cloud stack
Hacking apache cloud stackHacking apache cloud stack
Hacking apache cloud stackNitin Mehta
 
[RightScale Webinar] Architecting Databases in the cloud: How RightScale Doe...
[RightScale Webinar] Architecting Databases in the cloud:  How RightScale Doe...[RightScale Webinar] Architecting Databases in the cloud:  How RightScale Doe...
[RightScale Webinar] Architecting Databases in the cloud: How RightScale Doe...RightScale
 
cassandra_presentation_final
cassandra_presentation_finalcassandra_presentation_final
cassandra_presentation_finalSergioBruno21
 
Solving Office 365 Big Challenges using Cassandra + Spark
Solving Office 365 Big Challenges using Cassandra + Spark Solving Office 365 Big Challenges using Cassandra + Spark
Solving Office 365 Big Challenges using Cassandra + Spark Anubhav Kale
 
Lessons learnt from building a globally distributed database service from the...
Lessons learnt from building a globally distributed database service from the...Lessons learnt from building a globally distributed database service from the...
Lessons learnt from building a globally distributed database service from the...J On The Beach
 
NoSQL – Data Center Centric Application Enablement
NoSQL – Data Center Centric Application EnablementNoSQL – Data Center Centric Application Enablement
NoSQL – Data Center Centric Application EnablementDATAVERSITY
 
NephoScale Elastic Networking
NephoScale Elastic NetworkingNephoScale Elastic Networking
NephoScale Elastic NetworkingNephoScale
 
Building real time data-driven products
Building real time data-driven productsBuilding real time data-driven products
Building real time data-driven productsLars Albertsson
 
Application Development with Apache Cassandra as a Service
Application Development with Apache Cassandra as a ServiceApplication Development with Apache Cassandra as a Service
Application Development with Apache Cassandra as a ServiceWSO2
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningVitsRangannavar
 
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Fwdays
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and AkkaStreaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and AkkaHelena Edelson
 
M6d cassandrapresentation
M6d cassandrapresentationM6d cassandrapresentation
M6d cassandrapresentationEdward Capriolo
 

Similar to Using Apache Cassandra and Apache Kafka to Scale Next Gen Applications (20)

Cassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction GuideCassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction Guide
 
Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...
Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...
Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...
 
Better, faster, cheaper infrastructure with apache cloud stack and riak cs redux
Better, faster, cheaper infrastructure with apache cloud stack and riak cs reduxBetter, faster, cheaper infrastructure with apache cloud stack and riak cs redux
Better, faster, cheaper infrastructure with apache cloud stack and riak cs redux
 
Apache Cassandra in the Real World
Apache Cassandra in the Real WorldApache Cassandra in the Real World
Apache Cassandra in the Real World
 
Data Pipelines with Spark & DataStax Enterprise
Data Pipelines with Spark & DataStax EnterpriseData Pipelines with Spark & DataStax Enterprise
Data Pipelines with Spark & DataStax Enterprise
 
Highly available, scalable and secure data with Cassandra and DataStax Enterp...
Highly available, scalable and secure data with Cassandra and DataStax Enterp...Highly available, scalable and secure data with Cassandra and DataStax Enterp...
Highly available, scalable and secure data with Cassandra and DataStax Enterp...
 
Hacking apache cloud stack
Hacking apache cloud stackHacking apache cloud stack
Hacking apache cloud stack
 
[RightScale Webinar] Architecting Databases in the cloud: How RightScale Doe...
[RightScale Webinar] Architecting Databases in the cloud:  How RightScale Doe...[RightScale Webinar] Architecting Databases in the cloud:  How RightScale Doe...
[RightScale Webinar] Architecting Databases in the cloud: How RightScale Doe...
 
cassandra_presentation_final
cassandra_presentation_finalcassandra_presentation_final
cassandra_presentation_final
 
Solving Office 365 Big Challenges using Cassandra + Spark
Solving Office 365 Big Challenges using Cassandra + Spark Solving Office 365 Big Challenges using Cassandra + Spark
Solving Office 365 Big Challenges using Cassandra + Spark
 
Lessons learnt from building a globally distributed database service from the...
Lessons learnt from building a globally distributed database service from the...Lessons learnt from building a globally distributed database service from the...
Lessons learnt from building a globally distributed database service from the...
 
NoSQL – Data Center Centric Application Enablement
NoSQL – Data Center Centric Application EnablementNoSQL – Data Center Centric Application Enablement
NoSQL – Data Center Centric Application Enablement
 
NephoScale Elastic Networking
NephoScale Elastic NetworkingNephoScale Elastic Networking
NephoScale Elastic Networking
 
Building real time data-driven products
Building real time data-driven productsBuilding real time data-driven products
Building real time data-driven products
 
Application Development with Apache Cassandra as a Service
Application Development with Apache Cassandra as a ServiceApplication Development with Apache Cassandra as a Service
Application Development with Apache Cassandra as a Service
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learning
 
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and AkkaStreaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and Akka
 
BigData Developers MeetUp
BigData Developers MeetUpBigData Developers MeetUp
BigData Developers MeetUp
 
M6d cassandrapresentation
M6d cassandrapresentationM6d cassandrapresentation
M6d cassandrapresentation
 

More from Data Con LA

Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA
 
Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA
 
Data Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA
 
Data Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA
 
Data Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA
 
Data Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA
 
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA
 
Data Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA
 
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA
 
Data Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA
 
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA
 
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA
 
Data Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA
 
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA
 
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA
 
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA
 
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA
 
Data Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA
 

More from Data Con LA (20)

Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
 
Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 Keynote
 
Data Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup Showcase
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 Keynote
 
Data Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendations
 
Data Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA 2022 - AI Ethics
Data Con LA 2022 - AI Ethics
 
Data Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learning
 
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
 
Data Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentation
 
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
 
Data Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWS
 
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
 
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
 
Data Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data Science
 
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
 
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
 
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
 
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
 
Data Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with Kafka
 

Recently uploaded

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 

Recently uploaded (20)

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 

Using Apache Cassandra and Apache Kafka to Scale Next Gen Applications

  • 1. Using Apache Cassandra and Apache Kafka to Scale Next Gen Applications Adam Zegelin Founding Software Engineer, Instaclustr
  • 2. 1.Xxxxxxxxx xxxxx Introduction • Adam Zegelin • Co-founded Instaclustr 5 years ago • In Canberra, Australia • Current focus is Cassandra on Kubenetes • Instaclustr • Managed Apache Cassandra, Spark and Kafka in the ☁️  AWS, GCP, Azure & IBM  3000 nodes under management  24×7×365 support • Consulting  Schema & application design  Workshops & Training • 2nd-level on-call support for on-premise deployments
  • 3. Agenda • Introduction to Cassandra and Kafka • Real-world Use Cases • Worldpay • Lendi • Instaclustr • Partitioning: the key to scale • Fitting and architecting for your use case
  • 4. • Linearly Scalable • Always Available • Multi-Region Data Store • Apache Cassandra is the leading NoSQL operational database for high-scale and high-reliability applications. • Shared nothing peer-to-peer architecture provides reliability up to 100% (with Instaclustr SLAs). • replicated data and multiple nodes capable of fulfilling queries  Node outage? Service just keeps running • full online maintenance and in-place upgrades • Low latency for operational applications • Sub-10ms P95 reads and writes achievable • Native active-active multi data center support • Geographic distribution (to meet latency requirements) • Disaster resilience • Workload isolation (analytics) • Cassandra is a data storage system, not an analytics/query engine or place to run logic
  • 5. Typical Use Cases • High write to read ratio • Data is rarely updated • Including explicit deletes • The Primary Key is known at read time • Limited filtering & aggregation • No JOINs or referential integrity • Transaction logging • Time series data • IoT status and event history • Health tracker data • Order & package statuses & tracking • Weather service history • Messages and email envelopes
  • 6. Queuing, Pub/Sub and Streaming at Scale • Apache Kafka is a distributed streaming platform • Publish and subscribe to streams of records  Similar to a message queue or EMS • Store streams of records  Fault-tolerant  Durable • Process streams of records  as they occur  randomly, any position in the stream • Replicated architecture • High-level similarities to Cassandra • Scalability • Reliability
  • 7. Typical Use Cases • As a message bus • Loose coupling between producers and consumers • Basis for micro-services • As a commit log • A store of logical transactions • Populating analytical data stores or edge caches • As a buffer • Manage backpressure & workload spikes And when combined with Kafka Streams/Spark Streaming… • As the basis of a streaming architecture • (near) real-time analytics • Data processing pipelines
  • 8. Typical Use Cases cont’d • Website activity tracking • Page views • Searches • Other user actions • Metrics • Operational monitoring data • Log aggregation • Centralized logging • Event sourcing • Application state changes • “we don't just want to see where we are, we also want to know how we got there”
  • 9. Case study • Payment processor • spun out of RBS in 2010 • merged with Vantive in US in Jan 2018 for USD 10.4B to form WorldPay Inc. • Processes • >40 Million transactions per day • for 400,000 merchants • 42% of all UK non-cash transactions
  • 10. Case study cont’d • Re-architecting of WorldPay’s XML Payment API • facilitates ~40M transactions per month • New architecture based on open source technologies • including Cassandra and Kafka • to provide scalability, availability and reduced costs • New Idempotency Service • first project to use the new architecture • provides capabilities to ensure payments are not repeated
  • 11. Case study cont’d • Challenges • Tight deployment timeframe • Very high availability expectations • Low latency requirements • Utilises Cassandra to provide highest levels of availability and scalability • 18 node cluster • 3 AWS regions (in Europe) • Leverages Cassandras tuneable consistency  QUORUM = strong consistency across regions  still able to operate with a whole region unavailable  Latency is tolerable (restricted to EU) • Simple data model with atomic reads/writes  fits well with Cassandra capability
  • 12. Case study cont’d • Worked with Instaclustr to accelerate development and time to stable service: • Consulting engagement assisted with data model design • Cassandra cluster run on Instaclustr managed service  production ready in weeks • Initial preference was to run on-prem • security compliance • did not expect cloud to meet latency requirements • However, timeframes did not allow establishment of internal deployment • Used Instaclustr’s managed Cassandra service on AWS for initial go-live. • Now satisfied as a long-term solution
  • 13. Case study • Australia’s leading online home loan lender • Processing over 90% of Australia’s online lending enquiries. • Re-architecture of their platform following a major funding round • customer and data-centric
  • 14. Case study cont’d • Integration-heavy environment • Bespoke interfaces with banks, etc. • Moving to a micro-services architecture • Kafka as a message bus • New architecture • Decoupled application code from embedded data sets from various business applications • Unified data models from the various point solutions and market segments • Enabled extensive scale  supports rapid and large growth in data as the consumer base grows
  • 15. Case study • Cassandra • Storage for monitoring metrics & events • Custom collector • RabbitMQ transport  Will eventually move to Kafka as the transport • Metrics are processed by Riemann  Raises PagerDuty alerts, tickets, emails  Writes to Cassandra • Kafka • Centralised logging • Events are collected by fluentd • Pumped into LogStash via Kafka • Indexed via ElasticSearch • Viewed with Kibana
  • 16. Partitioning The key to scale • Partitioning • using a key in your data to split the data across multiple servers • Manual partitioning is possible but painful • Cassandra and Kafka make partitioning transparent • needs conscious consideration
  • 17. 1.Xxxxxxxxx xxxxx Cassandra Cluster Cluster Data Center (optional) Rack (optional, recommended) Node
  • 22. 1.Xxxxxxxxx xxxxxQueuing and Streaming at Scale ● Broker ○ Node/server/VM ● Topic ○ Logical grouping of data (category/feed/name) ○ Settings: ○ Replication ○ Partition count ○ Retention ○ Compaction ○ … Kafka Brokers, Topics and Partitions
  • 23. 1.Xxxxxxxxx xxxxxQueuing and Streaming at Scale Partition ○ Subset of messages in a topic ■ Have a single master broker ■ Guarantee ordered delivery within that subset ○ Number of partitions is set on topic creation Kafka Topics and Partitions (cont’d)
  • 24. 1.Xxxxxxxxx xxxxxQueuing and Streaming at Scale • Messages are mapped to a partition by the Producer • Randomly/round-robin • Hash of record key • Consumers are members of Consumer Groups • Consumer Groups register to consume records from Topics • Each Consumer in a Consumer Group is the exclusive consumer of a “fair share” of partitions in the topic. Kafka Partitions in Action
  • 25. Fitting and architecting for your use case Cassandra • Big data • one or more individually big (>1TB) tables • Need to pre-determine read pattern • at least to partition key • Very low cost writes • great for high write / read ratio use cases • Ideal for small reads • 1, 10, 100, 1000 rows at a time • No limits to horizontal scaling (data size or ops/sec) • provided you can find a partition that fits. • No relational integrity • No Foreign Keys, no JOIN’s • Limited filtering, aggregation
  • 26. Fitting and architecting for your use case Kafka • Big data • 5k+ message/topic/second • Not transactional • unlike traditional MQ tech • although guaranteed once delivery now available • Kafka Streams very powerful tool for analysis and mutations on data streams

Editor's Notes

  1. Lower throughput system