SlideShare a Scribd company logo
1 of 25
Distributed Data Processing
for Real-time Applications
Distributed Data Expert and
Solutions Architect at AWS
Mahee Gunturu
Outline
1. Modern Application Architectures
i. Microservices
ii. Serverless
2. Fundamentals of Event Processing
i. Basic Terminology
ii. Advancements in Event Processing Platforms
iii. Connectors
iv. Benefits
3. Patterns in Event Processing
4. State Propagation in Distributed Systems
i. Data mesh architectures
2
Modern
Application
Architectures
1
4
4
Application architectures are evolving!
EAI
EII
ETL ESB
SOA
(WS-*)
CEP
iP
aaS
Service
Mesh
Data
Mesh
2020’s
2010’s
2000’s
1990’s
1980’s
Modern Real-time Applications Architecture
Business logic
Purpose-built
databases
Events Events
APIs
Queues/Messages
5
5
6
6
Benefits of Serverless
No servers to manage
Automatically scales
Only pay for consumption
Event-Driven Architectures
Write less code
7
7
If your application is cloud-native,
large-scale, or distributed, and
doesn’t include a messaging
component, that’s probably a bug.
Tim Bray
General-purpose internet-software geek
Fundamentals of
Event Processing
2
What is an Event?
■ An event is a change in state, or an update that carries entropy
and has a timestamp associated with it. Events can either carry
state or can be identifiers.
Event based architectures have three key components:
● event producers
● event routers
● event consumers
9
What is an Event Stream?
■ A series of continuous events that represent the changing
behaviour of a system form an event stream.
Event stream processing usually combines the information
from many streams and/or data sources in real-time,
to derive actionable insights.
10
Stream Processing
11
Microservices
IOT
Operational
Applications
Databases
Producers
Consumers
Database
changes
Microservices
events
IOT
Logs
Point of Sale
txns
Streams of real time events
Stream processing apps
Databases
Point of sale
Basic Terminology
12
Producer
API
Consumer
API
Publisher Subscriber
● Producer
● Broker
● Topic
● Topic Partition
● Offset
● Consumer
● Schema Registry
Brokers
Topics
Pulsar Terminology
13
● Producer
● Broker
● Topic
● Cursor
● Consumer
● Namespace/bundles
● Subscription
■ Key-Shared
■ Failover
■ Exclusive
■ Shared
● Reader
● Schema Registry
● ack/nack
● Message Delivery
■ At-most Once
■ At-least once
■ Exactly once
● Bookie/Bookkeeper
● Segment
● Dispatcher
● Ledger
● Tenant
Tiered Storage
Stateless
Stateful
Multi-tenancy Model in Pulsar
14
Tenants
(Compliance)
Tenants
(Data Services)
Namespace
(Microservices)
Topic-1
(Cust Auth)
Topic-1
(Location Resolution)
Topic-2
(Demographics)
Topic-1
(Budgeted Spend)
Topic-1
(Acct History)
Topic-1
(Risk Detection)
Namespace
(ETL)
Namespace
(Campaigns)
Namespace
(ETL)
Tenants
(Marketing)
Namespace
(Risk Assessment)
Pulsar Instance
Pulsar Cluster
Pulsar Cluster Pulsar Cluster
Global Replication
15
Data Center 3
Data Center 2
Data Center 1
● Data is replicated
Asynchronously
● Kafka uses Mirror-Maker to
replicate the data
● Pulsar has built-in cross data
center replication that is used in
production already
Clients
■ Client API
● Language bindings for Java, Go, Python, C++, Node.js and C#.
● Community client drivers for rust, erlang, haskell, websockets etc…
■ SQL
● Trino (Presto SQL)
■ Kafka on Pulsar
● Client wrapper for Kafka API compatibility
■ Apache Flink native integration
● Exactly once guarantees.
■ Pulsar adaptor for Apache Spark streaming
● Receives raw data from pulsar.
16
Pulsar Connectors
17
Support for multiple
protocols and multi-tenancy
FunctionMesh for K8
Connector Management
Preserves data schema
Integrated Observability
Processing Guarantees,
Load Balancing
Benefits of Event Stream Processing Platforms
■ Centralized infrastructure
■ Intermediate layer for buffering
■ Impedance mismatch between applications
■ Data export and import capabilities
■ Publish CDC (Change Data Capture) streams
■ Ability to recreate state
■ Streaming Data Transformations
■ Integrate with various applications
■ Event based custom Streaming workflows
18
Patterns in Event-
Processing
3
Patterns in Event Processing
■ Event Driven
● Message queues/routing
● Loosely coupled
● Async communication
■ Event Streaming
● Transformations
● Aggregator
● Filtering
20
■ Event Sourcing
● CQRS
● Stateful service pattern
● Saga pattern
State Propagation
in Distributed
Systems
4
2
Event Streaming DDD Microservices Data Warehouses
Domain
Inventory
Orders
Shipments
Data Product
Data Mesh
...
What is Data Mesh?
22
22
Design Principles of a Data Mesh
1. Data is a product
i. Each organization owns their data end-to-end
ii. Responsible for building, operating and serving their data
iii. Publish CDC logs across bounded contexts
iv. Organization needs to resolve any issues and make it easy for data
discoverability
2. Federated data governance
i. Provides data lineage
ii. Validates data quality
iii. Data is shared securely and appropriate access controls are enforced.
iv. Auditing and reporting are enabled
3. Platform should be self-service
i. Accessible via purpose built infra tooling
23
Keep in touch!
Mahee Gunturu
Solutions Architect
AWS
linkedin.com/in/maheedhargunturu/
@Vanguard_space
Implement a Data Mesh: Cheat Sheet
25
■ Centralize data in motion
Introduce a central event streaming platform
■ Nominate data owners
Firm owners for all key datasets in the organization
Make ownership information broadly accessible
■ Data on demand
Events are either stored in messaging systems
or can be republished by data products on demand
■ Handle schema change
Owners publish schema information to the mesh
Process for schema change approval
■ Secure event streams
Access to event streams is permissioned by a central
body
■ Connect from any database
Sink Connectors are made available for all
supported database types to ease the provisioning
of new output data ports in the mesh
■ Central user interface
Events are either stored in messaging systems
or can be republished by data products on demand
■ Discovery & registration of event streams
■ Search data schemas
■ Data lineage views
■ Request to access event streams
■ Previewing event streams

More Related Content

Similar to Distributed Data Processing for Real-time Applications

High Availability HPC ~ Microservice Architectures for Supercomputing
High Availability HPC ~ Microservice Architectures for SupercomputingHigh Availability HPC ~ Microservice Architectures for Supercomputing
High Availability HPC ~ Microservice Architectures for Supercomputinginside-BigData.com
 
Horizontal Scaling for Millions of Customers!
Horizontal Scaling for Millions of Customers! Horizontal Scaling for Millions of Customers!
Horizontal Scaling for Millions of Customers! elangovans
 
From Monoliths to Microservices - A Journey With Confluent With Gayathri Veal...
From Monoliths to Microservices - A Journey With Confluent With Gayathri Veal...From Monoliths to Microservices - A Journey With Confluent With Gayathri Veal...
From Monoliths to Microservices - A Journey With Confluent With Gayathri Veal...HostedbyConfluent
 
Database@Home : Data Driven Apps - Data-driven Microservices Architecture wit...
Database@Home : Data Driven Apps - Data-driven Microservices Architecture wit...Database@Home : Data Driven Apps - Data-driven Microservices Architecture wit...
Database@Home : Data Driven Apps - Data-driven Microservices Architecture wit...Tammy Bednar
 
apidays New York 2022 - Leveraging Event Streaming to Super-Charge your Busin...
apidays New York 2022 - Leveraging Event Streaming to Super-Charge your Busin...apidays New York 2022 - Leveraging Event Streaming to Super-Charge your Busin...
apidays New York 2022 - Leveraging Event Streaming to Super-Charge your Busin...apidays
 
Microservices Patterns with GoldenGate
Microservices Patterns with GoldenGateMicroservices Patterns with GoldenGate
Microservices Patterns with GoldenGateJeffrey T. Pollock
 
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMeshThe Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMeshIanFurlong4
 
Confluent Partner Tech Talk with BearingPoint
Confluent Partner Tech Talk with BearingPointConfluent Partner Tech Talk with BearingPoint
Confluent Partner Tech Talk with BearingPointconfluent
 
Deconstructing Monoliths with Domain Driven Design
Deconstructing Monoliths with Domain Driven DesignDeconstructing Monoliths with Domain Driven Design
Deconstructing Monoliths with Domain Driven DesignVMware Tanzu
 
Enabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSetsEnabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSetsStreamsets Inc.
 
Best Practices for Streaming IoT Data with MQTT and Apache Kafka
Best Practices for Streaming IoT Data with MQTT and Apache KafkaBest Practices for Streaming IoT Data with MQTT and Apache Kafka
Best Practices for Streaming IoT Data with MQTT and Apache KafkaKai Wähner
 
Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...
Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...
Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...confluent
 
QRadar, ArcSight and Splunk
QRadar, ArcSight and Splunk QRadar, ArcSight and Splunk
QRadar, ArcSight and Splunk M sharifi
 
Digital Business Transformation for Energy & Utility company
Digital Business Transformation for Energy & Utility companyDigital Business Transformation for Energy & Utility company
Digital Business Transformation for Energy & Utility companyIlham Ahmed
 
Wicsa2011 cloud tutorial
Wicsa2011 cloud tutorialWicsa2011 cloud tutorial
Wicsa2011 cloud tutorialAnna Liu
 

Similar to Distributed Data Processing for Real-time Applications (20)

High Availability HPC ~ Microservice Architectures for Supercomputing
High Availability HPC ~ Microservice Architectures for SupercomputingHigh Availability HPC ~ Microservice Architectures for Supercomputing
High Availability HPC ~ Microservice Architectures for Supercomputing
 
Horizontal Scaling for Millions of Customers!
Horizontal Scaling for Millions of Customers! Horizontal Scaling for Millions of Customers!
Horizontal Scaling for Millions of Customers!
 
VAS - VMware CMP
VAS - VMware CMPVAS - VMware CMP
VAS - VMware CMP
 
From Monoliths to Microservices - A Journey With Confluent With Gayathri Veal...
From Monoliths to Microservices - A Journey With Confluent With Gayathri Veal...From Monoliths to Microservices - A Journey With Confluent With Gayathri Veal...
From Monoliths to Microservices - A Journey With Confluent With Gayathri Veal...
 
Database@Home : Data Driven Apps - Data-driven Microservices Architecture wit...
Database@Home : Data Driven Apps - Data-driven Microservices Architecture wit...Database@Home : Data Driven Apps - Data-driven Microservices Architecture wit...
Database@Home : Data Driven Apps - Data-driven Microservices Architecture wit...
 
apidays New York 2022 - Leveraging Event Streaming to Super-Charge your Busin...
apidays New York 2022 - Leveraging Event Streaming to Super-Charge your Busin...apidays New York 2022 - Leveraging Event Streaming to Super-Charge your Busin...
apidays New York 2022 - Leveraging Event Streaming to Super-Charge your Busin...
 
Microservices Patterns with GoldenGate
Microservices Patterns with GoldenGateMicroservices Patterns with GoldenGate
Microservices Patterns with GoldenGate
 
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMeshThe Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
 
Confluent Partner Tech Talk with BearingPoint
Confluent Partner Tech Talk with BearingPointConfluent Partner Tech Talk with BearingPoint
Confluent Partner Tech Talk with BearingPoint
 
kumarResume
kumarResumekumarResume
kumarResume
 
Brad stack - Digital Health and Well-Being Festival
Brad stack - Digital Health and Well-Being Festival Brad stack - Digital Health and Well-Being Festival
Brad stack - Digital Health and Well-Being Festival
 
Technical Skillwise
Technical SkillwiseTechnical Skillwise
Technical Skillwise
 
Deconstructing Monoliths with Domain Driven Design
Deconstructing Monoliths with Domain Driven DesignDeconstructing Monoliths with Domain Driven Design
Deconstructing Monoliths with Domain Driven Design
 
Enabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSetsEnabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSets
 
Best Practices for Streaming IoT Data with MQTT and Apache Kafka
Best Practices for Streaming IoT Data with MQTT and Apache KafkaBest Practices for Streaming IoT Data with MQTT and Apache Kafka
Best Practices for Streaming IoT Data with MQTT and Apache Kafka
 
Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...
Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...
Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...
 
QRadar, ArcSight and Splunk
QRadar, ArcSight and Splunk QRadar, ArcSight and Splunk
QRadar, ArcSight and Splunk
 
Digital Business Transformation for Energy & Utility company
Digital Business Transformation for Energy & Utility companyDigital Business Transformation for Energy & Utility company
Digital Business Transformation for Energy & Utility company
 
inmation Presentation_2017
inmation Presentation_2017inmation Presentation_2017
inmation Presentation_2017
 
Wicsa2011 cloud tutorial
Wicsa2011 cloud tutorialWicsa2011 cloud tutorial
Wicsa2011 cloud tutorial
 

More from ScyllaDB

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
What Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLWhat Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLScyllaDB
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasScyllaDB
 
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBBeyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasScyllaDB
 
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...ScyllaDB
 
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...ScyllaDB
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaScyllaDB
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBScyllaDB
 
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityPowering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityScyllaDB
 
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptxScyllaDB
 
Getting the most out of ScyllaDB
Getting the most out of ScyllaDBGetting the most out of ScyllaDB
Getting the most out of ScyllaDBScyllaDB
 
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationNoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationScyllaDB
 
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsNoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsScyllaDB
 
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesNoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesScyllaDB
 
ScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB
 
DBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsDBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsScyllaDB
 
Build Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBBuild Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBScyllaDB
 
NoSQL Data Modeling 101
NoSQL Data Modeling 101NoSQL Data Modeling 101
NoSQL Data Modeling 101ScyllaDB
 

More from ScyllaDB (20)

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
What Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLWhat Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQL
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & Pitfalls
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBBeyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
 
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDB
 
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityPowering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
 
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
 
Getting the most out of ScyllaDB
Getting the most out of ScyllaDBGetting the most out of ScyllaDB
Getting the most out of ScyllaDB
 
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationNoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
 
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsNoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
 
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesNoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
 
ScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB Virtual Workshop
ScyllaDB Virtual Workshop
 
DBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsDBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & Tradeoffs
 
Build Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBBuild Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDB
 
NoSQL Data Modeling 101
NoSQL Data Modeling 101NoSQL Data Modeling 101
NoSQL Data Modeling 101
 

Recently uploaded

A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 

Recently uploaded (20)

A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 

Distributed Data Processing for Real-time Applications

  • 1. Distributed Data Processing for Real-time Applications Distributed Data Expert and Solutions Architect at AWS Mahee Gunturu
  • 2. Outline 1. Modern Application Architectures i. Microservices ii. Serverless 2. Fundamentals of Event Processing i. Basic Terminology ii. Advancements in Event Processing Platforms iii. Connectors iv. Benefits 3. Patterns in Event Processing 4. State Propagation in Distributed Systems i. Data mesh architectures 2
  • 4. 4 4 Application architectures are evolving! EAI EII ETL ESB SOA (WS-*) CEP iP aaS Service Mesh Data Mesh 2020’s 2010’s 2000’s 1990’s 1980’s
  • 5. Modern Real-time Applications Architecture Business logic Purpose-built databases Events Events APIs Queues/Messages 5 5
  • 6. 6 6 Benefits of Serverless No servers to manage Automatically scales Only pay for consumption Event-Driven Architectures Write less code
  • 7. 7 7 If your application is cloud-native, large-scale, or distributed, and doesn’t include a messaging component, that’s probably a bug. Tim Bray General-purpose internet-software geek
  • 9. What is an Event? ■ An event is a change in state, or an update that carries entropy and has a timestamp associated with it. Events can either carry state or can be identifiers. Event based architectures have three key components: ● event producers ● event routers ● event consumers 9
  • 10. What is an Event Stream? ■ A series of continuous events that represent the changing behaviour of a system form an event stream. Event stream processing usually combines the information from many streams and/or data sources in real-time, to derive actionable insights. 10
  • 12. Basic Terminology 12 Producer API Consumer API Publisher Subscriber ● Producer ● Broker ● Topic ● Topic Partition ● Offset ● Consumer ● Schema Registry Brokers Topics
  • 13. Pulsar Terminology 13 ● Producer ● Broker ● Topic ● Cursor ● Consumer ● Namespace/bundles ● Subscription ■ Key-Shared ■ Failover ■ Exclusive ■ Shared ● Reader ● Schema Registry ● ack/nack ● Message Delivery ■ At-most Once ■ At-least once ■ Exactly once ● Bookie/Bookkeeper ● Segment ● Dispatcher ● Ledger ● Tenant Tiered Storage Stateless Stateful
  • 14. Multi-tenancy Model in Pulsar 14 Tenants (Compliance) Tenants (Data Services) Namespace (Microservices) Topic-1 (Cust Auth) Topic-1 (Location Resolution) Topic-2 (Demographics) Topic-1 (Budgeted Spend) Topic-1 (Acct History) Topic-1 (Risk Detection) Namespace (ETL) Namespace (Campaigns) Namespace (ETL) Tenants (Marketing) Namespace (Risk Assessment) Pulsar Instance Pulsar Cluster Pulsar Cluster Pulsar Cluster
  • 15. Global Replication 15 Data Center 3 Data Center 2 Data Center 1 ● Data is replicated Asynchronously ● Kafka uses Mirror-Maker to replicate the data ● Pulsar has built-in cross data center replication that is used in production already
  • 16. Clients ■ Client API ● Language bindings for Java, Go, Python, C++, Node.js and C#. ● Community client drivers for rust, erlang, haskell, websockets etc… ■ SQL ● Trino (Presto SQL) ■ Kafka on Pulsar ● Client wrapper for Kafka API compatibility ■ Apache Flink native integration ● Exactly once guarantees. ■ Pulsar adaptor for Apache Spark streaming ● Receives raw data from pulsar. 16
  • 17. Pulsar Connectors 17 Support for multiple protocols and multi-tenancy FunctionMesh for K8 Connector Management Preserves data schema Integrated Observability Processing Guarantees, Load Balancing
  • 18. Benefits of Event Stream Processing Platforms ■ Centralized infrastructure ■ Intermediate layer for buffering ■ Impedance mismatch between applications ■ Data export and import capabilities ■ Publish CDC (Change Data Capture) streams ■ Ability to recreate state ■ Streaming Data Transformations ■ Integrate with various applications ■ Event based custom Streaming workflows 18
  • 20. Patterns in Event Processing ■ Event Driven ● Message queues/routing ● Loosely coupled ● Async communication ■ Event Streaming ● Transformations ● Aggregator ● Filtering 20 ■ Event Sourcing ● CQRS ● Stateful service pattern ● Saga pattern
  • 22. 2 Event Streaming DDD Microservices Data Warehouses Domain Inventory Orders Shipments Data Product Data Mesh ... What is Data Mesh? 22 22
  • 23. Design Principles of a Data Mesh 1. Data is a product i. Each organization owns their data end-to-end ii. Responsible for building, operating and serving their data iii. Publish CDC logs across bounded contexts iv. Organization needs to resolve any issues and make it easy for data discoverability 2. Federated data governance i. Provides data lineage ii. Validates data quality iii. Data is shared securely and appropriate access controls are enforced. iv. Auditing and reporting are enabled 3. Platform should be self-service i. Accessible via purpose built infra tooling 23
  • 24. Keep in touch! Mahee Gunturu Solutions Architect AWS linkedin.com/in/maheedhargunturu/ @Vanguard_space
  • 25. Implement a Data Mesh: Cheat Sheet 25 ■ Centralize data in motion Introduce a central event streaming platform ■ Nominate data owners Firm owners for all key datasets in the organization Make ownership information broadly accessible ■ Data on demand Events are either stored in messaging systems or can be republished by data products on demand ■ Handle schema change Owners publish schema information to the mesh Process for schema change approval ■ Secure event streams Access to event streams is permissioned by a central body ■ Connect from any database Sink Connectors are made available for all supported database types to ease the provisioning of new output data ports in the mesh ■ Central user interface Events are either stored in messaging systems or can be republished by data products on demand ■ Discovery & registration of event streams ■ Search data schemas ■ Data lineage views ■ Request to access event streams ■ Previewing event streams