Streaming patterns revolutionary architectures
- 1. © 2017 MapR Technologies
Streaming Patterns, Revolutionary
Architectures
Carol McDonald
@caroljmcdonald
- 2. © 2017 MapR Technologies
Agenda
Streams Core Components
Patterns
• Event Sourcing
• Duality of Streams and Databases
• Command Query Responsibility Segregation
• Polyglot Persistence, Multiple Materialized Views
• Turning the Database Upside Down
Real World Examples
• Retail Monolith to Microservice
• Healthcare Exchange
- 3. © 2017 MapR Technologies
What’s a Stream?
Producers -> Events Stream -> Consumers
A stream is an unbounded sequence of events carried
from a set of producers to a set of consumers.
Events
- 4. © 2017 MapR Technologies
What is Streaming Data? Got Some Examples?
Data Collection Devices:
Smart Machinery, Phones and Tablets, Home Automation,
RFID Systems, Digital Signage, Security Systems, Medical Devices
- 5. © 2017 MapR Technologies
Why Streams?
Trigger Events:
• Stock Prices
• User Activity
• Sensor Data
Topic
Many Big Data sources are Event Oriented
Stream
Event Data
Topic
Real-Time Analytics
- 6. © 2017 MapR Technologies
Analyze Data
What if you need to analyze data as it arrives?
- 7. © 2017 MapR Technologies
It was hot at 6:05 yesterday!
Batch Processing
Analyze
6:01 P.M.: 72°
6:02 P.M.: 75°
6:03 P.M.: 77°
6:04 P.M.: 85°
6:05 P.M.: 90°
6:06 P.M.: 85°
6:07 P.M.: 77°
6:08 P.M.: 75°
90°
- 8. © 2017 MapR Technologies
Event Processing with Streams
6:05 P.M.: 90°
Topic
Stream
Temperature
Turn on the air
conditioning!
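The contrast between batch processing and event processing on these two slides can be sketched in Python. The readings are the ones shown on the slide; the 90° threshold and the function names are assumptions made for illustration.

```python
from typing import Iterable, Iterator, Tuple

Reading = Tuple[str, int]  # (time of day, temperature in degrees)

READINGS = [
    ("6:01 P.M.", 72), ("6:02 P.M.", 75), ("6:03 P.M.", 77), ("6:04 P.M.", 85),
    ("6:05 P.M.", 90), ("6:06 P.M.", 85), ("6:07 P.M.", 77), ("6:08 P.M.", 75),
]


def batch_hottest(readings: Iterable[Reading]) -> Reading:
    """Batch style: wait for all the data, then analyze it after the fact."""
    return max(readings, key=lambda reading: reading[1])


def stream_alerts(readings: Iterable[Reading], threshold: int = 90) -> Iterator[str]:
    """Streaming style: react to each event as soon as it arrives."""
    for time, temperature in readings:
        if temperature >= threshold:
            yield f"{time}: {temperature} degrees, turn on the air conditioning!"


if __name__ == "__main__":
    print("Batch:", batch_hottest(READINGS))
    for alert in stream_alerts(iter(READINGS)):
        print("Stream:", alert)
```

The batch function can only answer once the whole day has been collected, while the generator produces its alert as the 6:05 reading passes through.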
- 9. © 2017 MapR Technologies
Organize Data
What if you need to organize data as it arrives?
- 10. © 2017 MapR Technologies
Integrating Many Data Sources and Applications
Sources (Producers)
Applications (Consumers)
Unorganized, Complicated, and Tightly Coupled.
- 11. © 2017 MapR Technologies
Organize Data into Topics with MapR Streams
Topics Organize Events into Categories and Decouple Producers from Consumers
Consumers
MapR Cluster
Topic: Pressure
Topic: Temperature
Topic: Warnings
Consumers
Consumers
Kafka API Kafka API
- 12. © 2017 MapR Technologies
Process High Volume of Data
What if you need to process a high volume of data as it arrives?
- 13. © 2017 MapR Technologies
What if BP had detected problems before the oil hit the water?
• 1M samples/sec
• High performance at scale is necessary!
- 14. © 2017 MapR Technologies
Traditional Message Queue
Huge performance hit:
• Lots of disk I/O
- 15. © 2017 MapR Technologies
Scalable Messaging with MapR Streams
Server 1
Partition1: Topic - Pressure
Partition1: Topic - Temperature
Partition1: Topic - Warning
Server 2
Partition2: Topic - Pressure
Partition2: Topic - Temperature
Partition2: Topic - Warning
Server 3
Partition3: Topic - Pressure
Partition3: Topic - Temperature
Partition3: Topic - Warning
Topics are partitioned for throughput and scalability
- 16. © 2017 MapR Technologies
Scalable Messaging with MapR Streams
Partition1: Topic - Pressure
Partition1: Topic - Temperature
Partition1: Topic - Warning
Partition2: Topic - Pressure
Partition2: Topic - Temperature
Partition2: Topic - Warning
Partition3: Topic - Pressure
Partition3: Topic - Temperature
Partition3: Topic - Warning
Producers are load balanced between partitions
Kafka API
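One common way to spread a topic's events over partitions is to hash a message key, falling back to round-robin when there is no key. The sketch below illustrates that idea only; the `Partitioner` class and the choice of CRC32 are assumptions, not the partitioner used by MapR Streams or Kafka.

```python
import itertools
import zlib
from typing import Optional


class Partitioner:
    """Hypothetical partition chooser for illustration."""

    def __init__(self, num_partitions: int) -> None:
        if num_partitions < 1:
            raise ValueError("need at least one partition")
        self.num_partitions = num_partitions
        self._next = itertools.count()  # round-robin counter for keyless messages

    def choose(self, key: Optional[str] = None) -> int:
        if key is None:
            # No key: rotate through the partitions to balance the load.
            return next(self._next) % self.num_partitions
        # The same key always maps to the same partition, preserving per-key order.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions


if __name__ == "__main__":
    partitioner = Partitioner(3)
    for key in ["pump-1", "pump-2", "pump-1", None, None, None]:
        print(key, "-> partition", partitioner.choose(key))
```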
- 17. © 2017 MapR Technologies
Scalable Messaging with MapR Streams
Partition1: Topic - Pressure
Partition1: Topic - Temperature
Partition1: Topic - Warning
Partition2: Topic - Pressure
Partition2: Topic - Temperature
Partition2: Topic - Warning
Partition3: Topic - Pressure
Partition3: Topic - Temperature
Partition3: Topic - Warning
Consumers
Consumers
Consumers
Consumer groups can read in parallel
Kafka API
- 18. © 2017 MapR Technologies
Partition is like a Queue
Consumers
MapR Cluster
Topic: Admission / Server 1
Topic: Admission / Server 2
Topic: Admission / Server 3
Consumers
Consumers
Partition 1
Partition 2
Partition 3
New messages are appended to the end
6 5 4 3 2 1
3 2 1
5 4 3 2 1
Producers
Producers
Producers
New Message 6 5 4 3 2 1 Old Message
- 19. © 2017 MapR Technologies
Events are delivered in the order they are received, like a queue
Messages are delivered in the order they are received
MapR Cluster
6 5 4 3 2 1
Consumer group
Producers
Read cursors
Consumer group
- 20. © 2017 MapR Technologies
Unlike a queue, events are persisted even after they’re delivered
Messages remain on the partition, available to other consumers
Minimizes non-sequential disk reads and writes
MapR Cluster (1 Server)
Topic: Warning
Partition 1
3 2 1 Unread Events
Get Unread
3 2 1
Client Library ConsumerPoll
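The queue-like behavior described on the last three slides (append at the end, ordered delivery, a read cursor per consumer group, messages kept after delivery) can be modeled with a small in-memory class. This is a simplified sketch; the `Partition` class and its method names are hypothetical and are not the Kafka API.

```python
from collections import defaultdict
from typing import Dict, List


class Partition:
    """Append-only log; each consumer group keeps its own read cursor."""

    def __init__(self) -> None:
        self._log: List[str] = []
        self._cursors: Dict[str, int] = defaultdict(int)

    def append(self, message: str) -> int:
        """Add a message at the end of the log and return its offset."""
        self._log.append(message)
        return len(self._log) - 1

    def poll(self, group: str, max_messages: int = 10) -> List[str]:
        """Return the group's unread messages in order and advance its cursor.

        Nothing is removed from the log, so other groups can still read it.
        """
        start = self._cursors[group]
        batch = self._log[start:start + max_messages]
        self._cursors[group] = start + len(batch)
        return batch


if __name__ == "__main__":
    partition = Partition()
    for message in ["1", "2", "3"]:
        partition.append(message)
    print(partition.poll("analytics"))  # all three messages, oldest first
    print(partition.poll("audit"))      # the same three messages again
```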
- 21. © 2017 MapR Technologies
When Are Messages Deleted?
• Messages can be persisted forever, or
• Older messages can be deleted automatically based on time to live
MapR Cluster (1 Server)
Partition 1: 6 5 4 3 2 1
Older
message
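The two retention choices can be sketched as a single function. The function name, the timestamp representation, and the use of `None` to mean "keep forever" are assumptions for illustration, not the actual stream configuration.

```python
from typing import List, Optional, Tuple

Message = Tuple[float, str]  # (timestamp in seconds, payload)


def retained(messages: List[Message], now: float,
             ttl_seconds: Optional[float]) -> List[Message]:
    """Return the messages still kept on the partition.

    ttl_seconds=None models "persist forever"; otherwise messages older
    than the time to live are dropped, oldest first.
    """
    if ttl_seconds is None:
        return list(messages)
    return [(ts, payload) for ts, payload in messages if now - ts <= ttl_seconds]


if __name__ == "__main__":
    log = [(0.0, "1"), (50.0, "2"), (100.0, "3")]
    print(retained(log, now=100.0, ttl_seconds=None))
    print(retained(log, now=100.0, ttl_seconds=60.0))
```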
- 22. © 2017 MapR Technologies
Processing Same Message for Different Purposes
Consumers
Consumers
Consumers
Producers
Producers
Producers
MapR-FS
Kafka API Kafka API
- 24. © 2017 MapR Technologies
Message Recovery
What if you need to recover messages in case of server failure?
- 25. © 2017 MapR Technologies
Partitions are Replicated for Fault Tolerance
Producer
Producer
Server 2 Partition2: Topic - Warning
Producer
Server 1 Partition1: Topic - Warning
Server 3 Partition3: Topic - Warning
Server 2
Server 3
Server 1
Server 3
Server 1
Server 2
- 26. © 2017 MapR Technologies
Partition1: Warning
Partition2: Warning Replica
Partition3: Warning Replica
Partition1: Warning Replica
Partition3: Warning Replica
Partition1: Warning Replica
Partition2: Warning Replica
Partition3: Warning
Producer
Producer
Producer
Server 1
Server 2
Server 3
Security Investigation & Event Management
Operational Intelligence
Real-time Analytics
Partition2: Warning
Partitions are Replicated for Fault Tolerance
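The layout on this slide, where each server holds one primary partition and replicas of the other two, can be sketched as a rotation over the server list. The placement and failover rules below are assumptions chosen to match the picture; they are not MapR's actual replication algorithm.

```python
from typing import Dict, List


def place_replicas(partitions: List[str], servers: List[str]) -> Dict[str, List[str]]:
    """Map each partition to an ordered server list; the first entry is the primary."""
    placement = {}
    for index, partition in enumerate(partitions):
        shift = index % len(servers)
        # Rotate the server list so that primaries are spread evenly.
        placement[partition] = servers[shift:] + servers[:shift]
    return placement


def fail_server(placement: Dict[str, List[str]], failed: str) -> Dict[str, List[str]]:
    """Drop a failed server; the next replica in each list becomes the primary."""
    return {
        partition: [server for server in owners if server != failed]
        for partition, owners in placement.items()
    }


if __name__ == "__main__":
    layout = place_replicas(
        ["Partition1", "Partition2", "Partition3"],
        ["Server 1", "Server 2", "Server 3"],
    )
    print(layout)
    print(fail_server(layout, "Server 2"))
```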
- 29. © 2017 MapR Technologies
Streams and High Availability
- 30. © 2017 MapR Technologies
Real-time Access
What if you need real-time access to live data distributed across multiple clusters and multiple data centers?
- 31. © 2017 MapR Technologies
Streams and Replication
Streams:
• can be replicated worldwide
Topic: A
Topic: B
Topic: C
Topic: A
Topic: B
Topic: C
Replicating to another cluster
- 32. © 2017 MapR Technologies
Streams:
• high availability
• disaster recovery
Streams and Replication
Topic: A
Topic: B
Topic: C
Fail Over
- 36. © 2017 MapR Technologies
Event Sourcing
Imagine each event as a change to an entry in a database.
Change log (queue of all deposit and withdrawal events): 4 3 2 1
1: WillO : Deposit : 100.00
2: BradA : Deposit : 50.00
3: BradA : Withdraw : 30.00
4: WillO : Withdraw : 20.00
Updates (current account balances):
Account Id  Balance
WillO       80.00
BradA       20.00
https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
- 37. © 2017 MapR Technologies
Replication
Change Log
https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
3 2 1 3 2 1
3 2 1
Duality of Streams and Tables
Master: Append writes
Slave: Apply writes in order
- 38. © 2017 MapR Technologies
Which Makes a Better System of Record?
Which of these can be used to reconstruct the other?
1: WillO : Deposit : 100.00
2: BradA : Deposit : 50.00
3: BradA : Withdraw : 30.00
4: WillO : Withdraw: 20.00
Account Id Balance
WillO 80.00
BradA 20.00
Change Log
3 2 1
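The question on this slide can be answered by replaying the change log: the balances table falls out of the events, but the events cannot be recovered from the balances. A minimal sketch, using the four events from the slide (the `replay` function and the tuple layout are assumptions for illustration):

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, Iterable, Tuple

Event = Tuple[int, str, str, Decimal]  # (offset, account id, kind, amount)

CHANGE_LOG = [
    (1, "WillO", "Deposit", Decimal("100.00")),
    (2, "BradA", "Deposit", Decimal("50.00")),
    (3, "BradA", "Withdraw", Decimal("30.00")),
    (4, "WillO", "Withdraw", Decimal("20.00")),
]


def replay(events: Iterable[Event]) -> Dict[str, Decimal]:
    """Rebuild the balances table by applying every event in offset order."""
    balances: Dict[str, Decimal] = defaultdict(Decimal)
    for _offset, account, kind, amount in sorted(events):
        if kind == "Deposit":
            balances[account] += amount
        elif kind == "Withdraw":
            balances[account] -= amount
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return dict(balances)


if __name__ == "__main__":
    for account, balance in replay(CHANGE_LOG).items():
        print(account, balance)
```

Replaying the log yields WillO at 80.00 and BradA at 20.00, the same table shown on the slide.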
- 39. © 2017 MapR Technologies
Rewind: Reprocessing Events
MapR Cluster
Producers
6 5 4 3 2 1
Reprocess from oldest
Consumer
Create new view, index, cache
- 40. © 2017 MapR Technologies
Rewind: Reprocessing Events
MapR Cluster
Producers
6 5 4 3 2 1
To Newest
Consumer new view
Read from
new view
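Because the events stay in the log, a new consumer can rewind to the oldest offset and build a view that nobody planned for when the events were written. The sketch below assumes an in-memory list as the log; `build_view` and the two view functions are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple

Event = Tuple[str, str, float]  # (account id, kind, amount)

LOG: List[Event] = [
    ("WillO", "Deposit", 100.00),
    ("BradA", "Deposit", 50.00),
    ("BradA", "Withdraw", 30.00),
    ("WillO", "Withdraw", 20.00),
]


def build_view(log: List[Event], apply: Callable[[dict, Event], None],
               from_offset: int = 0) -> dict:
    """Replay the log from the given offset (0 = oldest) into a fresh view."""
    view: dict = {}
    for event in log[from_offset:]:
        apply(view, event)
    return view


def count_by_account(view: Dict[str, int], event: Event) -> None:
    """View 1: number of transactions per account."""
    account = event[0]
    view[account] = view.get(account, 0) + 1


def withdrawals_by_account(view: Dict[str, List[float]], event: Event) -> None:
    """View 2: list of withdrawal amounts per account."""
    account, kind, amount = event
    if kind == "Withdraw":
        view.setdefault(account, []).append(amount)


if __name__ == "__main__":
    print(build_view(LOG, count_by_account))
    print(build_view(LOG, withdrawals_by_account))
```

Once the new view has caught up to the newest event, readers can be switched over to it.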
- 41. © 2017 MapR Technologies
Event Sourcing, Command Query Responsibility Segregation:
Turning the Database Upside Down
Key-Val, Document, Graph, Wide Column, Time Series, Relational
???
Events
Updates
- 42. © 2017 MapR Technologies
What Else Do I Use My Stream For?
Lineage - “how did BradA’s balance get so low?”
Auditing - “who deposited/withdrew from BradA’s account?”
History - to see the status of the accounts last year
Integrity - “can I trust this data hasn’t been tampered with?”
• Yup - Streams are immutable
0: WillO : Deposit : 100.00
1: BradA : Deposit : 50.00
2: BradA : Withdraw : 30.00
3: WillO : Withdraw: 20.00
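The lineage and history questions above can be answered directly from the immutable log. A minimal sketch using the four events on this slide; the `lineage` and `balance_as_of` helpers are hypothetical.

```python
from typing import List, Tuple

Event = Tuple[int, str, str, float]  # (offset, account id, kind, amount)

LOG: List[Event] = [
    (0, "WillO", "Deposit", 100.00),
    (1, "BradA", "Deposit", 50.00),
    (2, "BradA", "Withdraw", 30.00),
    (3, "WillO", "Withdraw", 20.00),
]


def lineage(log: List[Event], account: str) -> List[Tuple[int, str, float, float]]:
    """Every event that touched the account, with the running balance after it."""
    balance = 0.0
    trail = []
    for offset, acct, kind, amount in log:
        if acct != account:
            continue
        balance += amount if kind == "Deposit" else -amount
        trail.append((offset, kind, amount, balance))
    return trail


def balance_as_of(log: List[Event], account: str, last_offset: int) -> float:
    """Historical balance: replay only the events up to and including last_offset."""
    trail = [step for step in lineage(log, account) if step[0] <= last_offset]
    return trail[-1][3] if trail else 0.0


if __name__ == "__main__":
    print(lineage(LOG, "BradA"))           # how did BradA's balance get so low?
    print(balance_as_of(LOG, "BradA", 1))  # balance before the withdrawal
```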
- 43. © 2017 MapR Technologies
What Do I Need For This to Work?
Infinitely persisted events
A way to query your persisted stream data
An integrated security model across the stream and databases
- 45. © 2017 MapR Technologies
Breaking Up Online Shopping Item Ratings into Microservices
Concurrency bottleneck
- 46. © 2017 MapR Technologies
Separate Write from Read using CQRS
Command Query Responsibility Segregation
Separate the Rate Item write “command” from the Get Item Ratings read “query” using event sourcing
{
"itemid": "sku124",
"rating": "4",
"userid": "cmcdonald",
"comment": "works well"
}
{
"itemid": "sku124",
"pname": "bluetooth earbud",
"ratings": [
{
"rating": "4",
"userid": "cmcdonald",
"comment": "works well"
},
{
"rating": "1",
"userid": "diego",
"comment": "hated it"
}]
}
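The two JSON documents above are the write-side event and the read-side view. The sketch below shows how a consumer might fold rating events into the per-item document that the read query returns. The second event (from `diego`), the `PRODUCT_NAMES` lookup, and the function names are assumptions inferred from the read document on the slide.

```python
from typing import Dict

PRODUCT_NAMES = {"sku124": "bluetooth earbud"}  # assumed product catalog lookup

RATING_EVENTS = [
    {"itemid": "sku124", "rating": "4", "userid": "cmcdonald", "comment": "works well"},
    {"itemid": "sku124", "rating": "1", "userid": "diego", "comment": "hated it"},
]


def apply_rating(view: Dict[str, dict], event: dict) -> None:
    """Command side: fold one Rate Item event into the per-item read document."""
    item = view.setdefault(event["itemid"], {
        "itemid": event["itemid"],
        "pname": PRODUCT_NAMES.get(event["itemid"], ""),
        "ratings": [],
    })
    item["ratings"].append(
        {field: event[field] for field in ("rating", "userid", "comment")}
    )


def get_item_ratings(view: Dict[str, dict], itemid: str) -> dict:
    """Query side: a single lookup, with no joins and no writes."""
    return view[itemid]


if __name__ == "__main__":
    item_view: Dict[str, dict] = {}
    for rating_event in RATING_EVENTS:
        apply_rating(item_view, rating_event)
    print(get_item_ratings(item_view, "sku124"))
```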
- 47. © 2017 MapR Technologies
NoSQL Scaling Fast Reads and Writes
Design your schema so that the data that is read together is stored together
- 48. © 2017 MapR Technologies
Event Sourcing: New Uses of Data
Add new Services like Recommendations
- 49. © 2017 MapR Technologies
Fraud Detection
Point of Sale -> Data Center: is the transaction fraud?
• Lots of requests
• Need answer within ~50-100 milliseconds
Data Center
Point of Sale
Location, time, card#
Fraud yes/no?
- 50. © 2017 MapR Technologies
Traditional Solution
POS 1..n
Fraud detector
Last card use
1. Look up last card use
2. Compute the card velocity:
• Subtract last location, time from current location, time
3. Update last card use
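Step 2 can be sketched as distance divided by elapsed time. The slide does not say how locations are represented, so the latitude/longitude coordinates, the haversine distance, and the 900 km/h threshold below are all assumptions for illustration.

```python
import math
from typing import NamedTuple


class CardUse(NamedTuple):
    lat: float     # degrees
    lon: float     # degrees
    time_s: float  # seconds since some epoch


def distance_km(a: CardUse, b: CardUse) -> float:
    """Great-circle (haversine) distance between two card uses."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlambda = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(h))


def card_velocity_kmh(last: CardUse, current: CardUse) -> float:
    """Speed the card would need to travel between the two uses."""
    hours = (current.time_s - last.time_s) / 3600.0
    if hours <= 0:
        return math.inf  # no elapsed time: treat as infinitely fast
    return distance_km(last, current) / hours


def is_suspicious(last: CardUse, current: CardUse, max_kmh: float = 900.0) -> bool:
    """Flag the transaction if the card moved faster than the assumed limit."""
    return card_velocity_kmh(last, current) > max_kmh


if __name__ == "__main__":
    new_york = CardUse(40.7128, -74.0060, 0.0)
    los_angeles = CardUse(34.0522, -118.2437, 600.0)  # ten minutes later
    print(is_suspicious(new_york, los_angeles))
```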
- 51. © 2017 MapR Technologies
What Happens Next?
POS 1..n
Fraud detector
Last card use
POS 1..n
Fraud detector
POS 1..n
Fraud detector
1. Read last card use
2. Compute the card velocity
3. Update last card use
- 52. © 2017 MapR Technologies
Service Isolation: Separate Read from Write
POS 1..n
Fraud detector
Last card use
Updater
card activity
Read
Read last card use
- 53. © 2017 MapR Technologies
Separate Read Model from the Write Model:
Command Query Responsibility Segregation
POS 1..n
Fraud detector
Last card use
Updater
card activity
Read
Event last card use
Write last card use
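The separation on this slide can be sketched with two functions: the fraud detector only reads the last-card-use table and publishes an event, and the updater is the only writer of that table. The class name, the event fields, and the simplified "different location within 60 seconds" rule are assumptions for illustration.

```python
from typing import Dict, List


class CardActivityService:
    """In-memory stand-in for the card activity stream and the last-card-use table."""

    def __init__(self) -> None:
        self.card_activity: List[dict] = []       # append-only event stream
        self.last_card_use: Dict[str, dict] = {}  # read model (materialized view)
        self._updater_offset = 0                  # the updater's read cursor

    def fraud_detector(self, swipe: dict, window_s: int = 60) -> bool:
        """Read side: look up the last use, decide, publish the event. No table writes."""
        last = self.last_card_use.get(swipe["card"])
        suspicious = bool(
            last is not None
            and last["location"] != swipe["location"]
            and swipe["time"] - last["time"] < window_s
        )
        self.card_activity.append(swipe)
        return suspicious

    def updater(self) -> int:
        """Write side: apply unread events, in order, to the last-card-use table."""
        new_events = self.card_activity[self._updater_offset:]
        for event in new_events:
            self.last_card_use[event["card"]] = event
        self._updater_offset += len(new_events)
        return len(new_events)


if __name__ == "__main__":
    service = CardActivityService()
    print(service.fraud_detector({"card": "1234", "location": "Austin", "time": 0}))
    service.updater()
    print(service.fraud_detector({"card": "1234", "location": "Paris", "time": 30}))
```

Because the detector never writes the table, several detectors can run side by side without contending on it.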
- 54. © 2017 MapR Technologies
Event Sourcing: New Uses of Data
Processing Same Message for Different Views
POS 1..n
Fraud detector
Last card use
Updater
Card location history
Other
card activity
- 55. © 2017 MapR Technologies
Scaling Through Isolation
POS 1..n
Last card use
Updater
POS 1..n
Last card use
Updater
card activity
Fraud detector
Fraud detector
Multiple fraud detectors can use the same message queue
- 56. © 2017 MapR Technologies
Lessons
De-coupling and isolation are key
Propagate events, not table updates
- 58. © 2017 MapR Technologies
Use Case: Streaming System of Record for Healthcare
Objective:
• Build a flexible, secure healthcare exchange
Records Analysis
Applications
Challenges:
• Many different data models
• Security and privacy issues
• HIPAA compliance
Records
- 59. © 2017 MapR Technologies
ALLOY Health:
Exchange State HIE
Clinical Data Viewer
Reporting and Analytics
Clinical Data
Financial Data
Provider Organizations
- 60. © 2017 MapR Technologies
This is a PAIN!
COMPLIANCE
SECURITY CONTROLS
FEATURES
PRIVACY
PCI DSS 3.0
21 CFR Part 11
SSAE16 / SOC2
HIPAA/HITECH
- 61. © 2017 MapR Technologies
WHY NOW?
2014 FQ4 profit
$ -440 M
Total Cost Estimate
$ -12 B
- 62. © 2017 MapR Technologies
Why Now? The relational database is not the only tool
Context and Relationships
Patient 1234
Attribute    Value
patient_id   1234
Name         Jon Smith
Age          50
Patient 999
Attribute    Value
patient_id   999
Name         Jonathan Smith
DOB          Jun 1965
Provider 86
Attribute    Value
provider_id  86
Name         Dr. Nora Paige
Specialty    Diabetes
Prescription 9876
Attribute    Value
rx_id        9876
Name         Sitagliptin
Dosage       325mg
Relationships: Visited, Prescribed, WasPrescribed
- 64. © 2017 MapR Technologies
Streaming System of Record for Healthcare
Stream
Topic
Records
Applications
6 5 4 3 2 1
Search
Graph DB
JSON
HBase
Micro Service
Micro Service
Micro Service
Micro Service
Micro Service
Micro Service
API
Streaming System of Record
Materialized Views
Consumer workflow
Consumer workflow
Consumer workflow
Immutable Log
pre-processor
- 65. © 2017 MapR Technologies
The Promised Land
Immutable Log
Raw Data
pre-processor
Document Log (MapR-FS)
CEP
workflow: Key/Value (MapR-DB) materialized view
workflow: Search Engine materialized view
workflow: Graph (ArangoDB) materialized view
workflow: Time Series (OpenTSDB) materialized view
log
API
micro service (x8)
App, App, App, App, ...
Compliance Auditor
smiley faces
Data Lineage
Audit Logging
- 66. © 2017 MapR Technologies
Solution
Design/architecture solved some
• Streams
• Data Lineage/System of Record
• Kappa Architecture (Kreps/Kleppmann)
MapR solved others
• Unified Security
• Replication DC to DC
• Converge Kafka/HBase/Hadoop to one cluster
• Multi-tenancy (lots of topics, for lots of tenants)
- 68. © 2017 MapR Technologies
Challenge: Major Latency from Batch File Transfer
20-30 Minutes
- 69. © 2017 MapR Technologies
Regional Datacenter
Topic
Elasticsearch
Kibana
File Server
Producer (Java)
• Monitoring directory
• Parsing CSV files
• Publishing messages to topic
Consumer (Java)
• Parsing master data
• Subscribing to topic
• Joining tables
• Aggregation
Index
Filtering config
Dashboard
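The producer and consumer steps listed on this slide can be sketched end to end. The deck's implementation is in Java and is not shown, so this Python version is only a stand-in: the CSV columns, the host-to-region master data, and the in-memory list used as the topic are all hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List

SAMPLE_CSV = "host,latency_ms\nweb1,10\nweb2,20\nweb1,5\n"
MASTER_DATA = {"web1": "east", "web2": "west"}  # hypothetical host -> region table


def produce(csv_text: str, topic: List[dict]) -> int:
    """Producer: parse a CSV file and publish one message per row to the topic."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    topic.extend(rows)
    return len(rows)


def consume(topic: List[dict], master: Dict[str, str]) -> Dict[str, float]:
    """Consumer: join each message with master data, then aggregate by region."""
    totals: Dict[str, float] = defaultdict(float)
    for message in topic:
        region = master.get(message["host"], "unknown")
        totals[region] += float(message["latency_ms"])
    return dict(totals)


if __name__ == "__main__":
    stream_topic: List[dict] = []
    produce(SAMPLE_CSV, stream_topic)
    print(consume(stream_topic, MASTER_DATA))  # totals that would feed the index
```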
- 70. © 2017 MapR Technologies
Streams and Replication
Streams:
Topic: A
Topic: B
Topic: C
Topic: A
Topic: B
Topic: C
Replicating to another cluster
- 71. © 2017 MapR Technologies
Central Data Center
Ad-hoc analysis
Other Data Sources
Real-time analysis
Reporting
Streaming
Stream
Topic
Replicating
Regional Data Centers
Stream
Topic
Stream
Topic
Performance and other monitoring related data.
Aggregation of data across all regional data centers
- 72. © 2017 MapR Technologies
Stream Processing
Building a Complete Data Architecture
MapR File System (MapR-FS)
MapR Converged Data Platform
MapR Database (MapR-DB)
MapR Streams
Sources/Apps Bulk Processing
- 73. © 2017 MapR Technologies
To Learn More:
• Streaming Architecture ebook
• https://mapr.com/streaming-architecture-using-apache-kafka-mapr-streams/
- 75. © 2017 MapR Technologies
MapR Blog
• https://www.mapr.com/blog/
- 76. © 2017 MapR Technologies
To Learn More:
• End to End Application for Monitoring Uber Data using Spark ML
• https://mapr.com/blog/monitoring-real-time-uber-data-using-spark-machine-learning-streaming-and-kafka-api-part-1/
- 77. © 2017 MapR Technologies
…helping you put data technology to work
● Find answers
● Ask technical questions
● Join on-demand training course discussions
● Follow release announcements
● Share and vote on product ideas
● Find Meetup and event listings
Connect with fellow Apache Hadoop and Spark professionals
community.mapr.com
- 78. © 2017 MapR Technologies
To Learn More:
• MapR Free ODT http://learn.mapr.com/