MapR Seattle Streams Meetup Oct 2016
- 1. © 2016 MapR Technologies, MapR Confidential
When Your Stream is the System of Record
Seattle Kafka Meetup
Will Ochandarena, Sr Dir, Product
October 24, 2016
- 2.
Agenda
• Streaming System of Record - What?
• A Little About MapR Streams
• Versioning a Real-time Data Pipeline
  - Demo: MapR + StreamSets
- 3.
Streaming System of Record
System of Record (n): information storage system that is
the authoritative data source for a given data element or
piece of information.
- 4.
Who Does This Today?
[Diagram labels: Events, Processing, DB, More Processing, Long Term Storage]
- 5.
Reprocessing is Hard
[Diagram labels: Events, Processing, DB, More Processing, Long Term Storage, "?", Medium Term Storage]
[Time ranges shown: 3d ago -> Now; 1 Year ago -> ~an hour ago]
- 6.
Easy Fix - Streaming System of Persistence
[Diagram labels: Events (shown twice), Processing, DB, More Processing, Long Term Storage (shown twice)]
- 7.
How Can a Stream Be a System of Record?
Imagine each event as a change to an entry in a database.
DMV_Updates
0: { WillO : {City : Mountain View}, ts : 7/5/2009 04:01:01, src : dmv201 }
1: { BradA : {City : Atlanta}, ts : 5/11/2010 05:11:31, src : dmv1341 }
2: { BradA : {Points : +2}, ts : 6/22/2011 03:31:10, src : officer1213 }
3: { WillO : {City : San Jose}, ts : 11/1/2012 04:01:01, src : dmv1661 }
DL_ID  City                       Points
WillO  Mountain View -> San Jose  0
BradA  Atlanta                    0 -> 2
- 8.
Streams and Databases in Harmony
[Diagram labels: Key-Val, Document, Graph, Wide Column, Time Series, Relational, "???"; Inserts, Updates]
- 9.
Which Makes a Better System of Record?
Which of these can be used to reconstruct the other?
0: { WillO : {City : Mountain View}, ts : 7/5/2009 04:01:01, src : dmv201 }
1: { BradA : {City : Atlanta}, ts : 5/11/2010 05:11:31, src : dmv1341 }
2: { BradA : {Points : +2}, ts : 6/22/2011 03:31:10, src : officer1213 }
3: { WillO : {City : San Jose}, ts : 11/1/2012 04:01:01, src : dmv1661 }
DL_ID  City      Points
WillO  San Jose  0
BradA  Atlanta   2
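To illustrate the question on this slide, here is a minimal Python sketch that replays the four example events into the table. The dictionary field names (key, change, ts, src) and the replay function are hypothetical choices made for this example. It assumes that a Points change is a delta and a City change is an overwrite, and that every license starts with 0 points.

```python
# The four DMV_Updates events from the slide, transcribed as Python dicts.
# Field names ("key", "change", "ts", "src") are illustrative, not a MapR format.
EVENTS = [
    {"key": "WillO", "change": {"City": "Mountain View"}, "ts": "7/5/2009 04:01:01", "src": "dmv201"},
    {"key": "BradA", "change": {"City": "Atlanta"}, "ts": "5/11/2010 05:11:31", "src": "dmv1341"},
    {"key": "BradA", "change": {"Points": "+2"}, "ts": "6/22/2011 03:31:10", "src": "officer1213"},
    {"key": "WillO", "change": {"City": "San Jose"}, "ts": "11/1/2012 04:01:01", "src": "dmv1661"},
]


def replay(events):
    """Fold events, in offset order, into a DL_ID -> row table."""
    table = {}
    for event in events:
        # Assumption: a license not seen before starts with no city and 0 points.
        row = table.setdefault(event["key"], {"City": None, "Points": 0})
        for column, value in event["change"].items():
            if column == "Points":
                row["Points"] += int(value)  # "+2" is treated as a delta
            else:
                row[column] = value  # City is treated as an overwrite
    return table


table = replay(EVENTS)
for dl_id, row in table.items():
    print(dl_id, row["City"], row["Points"])
```

Replaying only a prefix of the list, for example replay(EVENTS[:2]), yields the table as it stood at that point in time. The reverse direction does not work: the final table no longer records that WillO once lived in Mountain View or which source added BradA's points.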
- 10.
Other Benefits of Streaming System of Record
• Auditing - "how did BradA's points get so high?"
• Lineage - "who added points to BradA's license?"
• History - "where did WillO used to live?"
• Integrity - "can I trust this data hasn't been tampered with?"
• Yup - Streams are immutable
0: { WillO : {City : Mountain View}, ts : 7/5/2009 04:01:01, src : dmv201 }
1: { BradA : {City : Atlanta}, ts : 5/11/2010 05:11:31, src : dmv1341 }
2: { BradA : {Points : +2}, ts : 6/22/2011 03:31:10, src : officer1213 }
3: { WillO : {City : San Jose}, ts : 11/1/2012 04:01:01, src : dmv1661 }
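The history and lineage questions above can be answered by filtering the event log directly. The following is a small sketch under assumed names: the dictionary layout and the history and lineage helpers are hypothetical, and a real deployment would read the events from the stream instead of an in-memory list.

```python
# The slide's four events with an explicit offset; the layout is illustrative only.
EVENTS = [
    {"offset": 0, "key": "WillO", "change": {"City": "Mountain View"}, "ts": "7/5/2009 04:01:01", "src": "dmv201"},
    {"offset": 1, "key": "BradA", "change": {"City": "Atlanta"}, "ts": "5/11/2010 05:11:31", "src": "dmv1341"},
    {"offset": 2, "key": "BradA", "change": {"Points": "+2"}, "ts": "6/22/2011 03:31:10", "src": "officer1213"},
    {"offset": 3, "key": "WillO", "change": {"City": "San Jose"}, "ts": "11/1/2012 04:01:01", "src": "dmv1661"},
]


def history(events, key, column):
    """History: every change recorded for one column of one key, oldest first."""
    return [
        (e["ts"], e["change"][column])
        for e in events
        if e["key"] == key and column in e["change"]
    ]


def lineage(events, key, column):
    """Lineage: the sources that wrote to one column of one key."""
    return [
        e["src"]
        for e in events
        if e["key"] == key and column in e["change"]
    ]


print(history(EVENTS, "WillO", "City"))   # where did WillO used to live?
print(lineage(EVENTS, "BradA", "Points")) # who added points to BradA's license?
```

Neither question can be answered from the final table alone, because an update in place discards the earlier value and its source.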
- 11.
What Do I Need For This to Work?
• Infinitely persisted events
• A way to query your persisted stream data
• An integrated security model across data services
• Applied Streaming System of Record @ Liaison Blog
- 12.
About MapR & MapR Streams
- 13.
MapR Streams: Global Pub-sub Event Streaming System for Big Data
Producers publish billions of events/sec to a topic in a stream.
Events persisted and immediately delivered to all consumers, guaranteed.
Tie together geo-dispersed clusters. Worldwide.
Standard real-time API (Kafka). Integrates with Spark Streaming, Storm, Apex, and Flink.
Direct data access (OJAI API) from analytics frameworks.
[Diagram labels: Producers, Stream, Topic, Consumers, Remote sites and consumers, Batch analytics]
- 14.
Streams Offers a Durable, Persistent System of Record
[
  {"Topic1Part0Seq5001": {
    "timestamp" : 1456246886,
    "topic" : "Topic1",
    "partition" : 0,
    "producer" : "wochanda",
    "offset" : 5001,
    "key" : "MsgKey",
    "data" : {...}
  }},
  {"Topic2Part0Seq5002": { … } },
  …
]
  - Reliable
  - Secure
  - Immutable
  - Auditable
  - Replayable
- 15.
Streams Enables Global Applications and Analytics
Provides
  - Arbitrary topology of thousands of clusters
  - Automatic loop prevention
  - DNS-based discovery
  - Globally synchronized message offsets and consumer cursors
Enables
  - Global applications & data collection
  - Producer & consumer failover
  - Analysis/filtering/aggregation at the edge
  - "Occasional" connections
[Diagram labels: Producers, Consumers]
- 16.
Fun Facts: MapR Streams
Converged | Global Scale | Secure & Multi-Tenant
• Single cluster for files, tables, and streams.
• Global, IoT-scale "fabrics" with failover.
• Tenant-owned streams, logical grouping of topics and messages.
• Authentication, authorization, encryption.
• Unified policy with all other platform services.
• Infinite "system of record" persistence.
• Metadata tracked internally, no dependencies on ZK.
• Consumers, topics scale into millions.
- 17.
MapR Converged Data Platform
[Architecture diagram labels]
Applications: Open Source Engines & Tools; Commercial Engines & Applications; Cloud and Managed Services; Search and Others; Custom Apps
Layers: Data Processing; Unified Management and Monitoring
APIs: HDFS API; POSIX, NFS; HBase API; JSON API; Kafka API
MapR Data Platform Services: Web-Scale Storage (MapR-FS); Database (MapR-DB); Event Streaming (MapR Streams)
Global Namespace | No Single Point of Failure | Data Protection | Multi-tenancy | Workload Management
Multi Temperature | Global Multi Datacenter | High Performance Low Latency | Security | Management & Monitoring
Commodity Hardware/Storage, Clouds, & Containers
- 18.
Versioning a Real-time Data Pipeline
- 19.
Challenges of a Streaming App Developer
Pre-Production
[Diagram labels: App Environment, Streaming System, Database, Hadoop Cluster; items shown: events, logs, events2, logs2, v2, v2, /clicks, /clicks2, ...]
- 20.
Challenges with Versioning
Post-Production
Input Data + App Logic = Output Data
[Outputs shown: Output Streams; Database Tables; Logs, Metrics]
What if you deploy a new version of your application?
What happens to all of this?
- 21.
Example: Versioning in Production
Input_Stream: 45 40 60 30 37 39 72 79 60
App versions shown: Calculate_Mean_3, Calculate_Median_3
Output_Stream: 45 35 70
Output_Table
Time      Value
00:00:00  70
00:00:05  35
00:00:10  45
- 22.
Versioning with Converged App Volumes
Input_Stream: 45 40 60 30 37 39 72 79 60
App versions shown: Calculate_Mean_3, Calculate_Median_3
Calculate_Mean_3 Volume
Output_Stream: 35 70
Output_Table
Time      Value
00:00:00  70
00:00:05  35
00:00:10
Calculate_Median_3 Volume
Output_Stream: 45 37 72
Output_Table
Time      Value
00:00:00  72
00:00:05  37
00:00:10  45
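The numbers in this example can be checked with a short Python sketch. It assumes the input is split into non-overlapping windows of three values and that the mean is truncated to an integer; both are assumptions, since the slides do not state them. The function names mirror the slide labels, and the outputs dictionary is a hypothetical stand-in for per-version volumes.

```python
from statistics import median

# Input values as displayed on the slide.
INPUT_STREAM = [45, 40, 60, 30, 37, 39, 72, 79, 60]


def tumbling_windows(values, size=3):
    """Split values into consecutive, non-overlapping windows of `size`."""
    return [values[i:i + size] for i in range(0, len(values) - size + 1, size)]


def calculate_mean_3(window):
    # Assumption: integer (truncated) mean.
    return sum(window) // len(window)


def calculate_median_3(window):
    return median(window)


windows = tumbling_windows(INPUT_STREAM)

# Each app version writes to its own output, keyed by version name,
# so deploying the new version does not mix results with the old one.
outputs = {
    "Calculate_Mean_3": [calculate_mean_3(w) for w in windows],
    "Calculate_Median_3": [calculate_median_3(w) for w in windows],
}

for version, values in outputs.items():
    print(version, values)
```

Under these assumptions the medians are 45, 37 and 72, matching the Calculate_Median_3 volume, and the truncated means of the windows (30, 37, 39) and (72, 79, 60) are 35 and 70, matching the Calculate_Mean_3 volume. The single-output case on the earlier slide (45 35 70) appears consistent with two windows processed by the mean version and one by the median version, which is the mixing that separate volumes avoid.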
- 23.
Versioning & A/B Testing
[Diagram: traffic split 80% / 10% / 10% across versions A, B, C]
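One way such a split could be implemented is deterministic hash-based routing, sketched below in Python. This is not taken from the talk: the assignment of 80% to A and 10% each to B and C is an assumption about the diagram, and the route function and the user-N keys are hypothetical.

```python
import hashlib

# Assumed mapping of the slide's percentages to versions.
SPLIT = [("A", 80), ("B", 10), ("C", 10)]


def route(event_key: str) -> str:
    """Map a key to a version; the same key always gets the same version."""
    digest = hashlib.sha256(event_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    upper = 0
    for version, percent in SPLIT:
        upper += percent
        if bucket < upper:
            return version
    raise ValueError("SPLIT percentages must sum to 100")


counts = {"A": 0, "B": 0, "C": 0}
for i in range(10_000):
    counts[route(f"user-{i}")] += 1
print(counts)
```

Hashing the key, instead of drawing a random number per event, keeps every event for a given key on the same version, so each version's output stays internally consistent.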
- 24.
DEMO - MapR & StreamSets
Versioning a Production Data Pipeline
Rupal Shah - StreamSets
- 25.
StreamSets Data Collector™
Adaptable Pipelines -> Efficiency
  - Intent-driven ingest (minimal schema specification).
  - Data drift handling.
Pipeline KPIs -> Visibility
  - Real-time stage, edge and bad data metrics.
  - Alerts via profiling, sampling and threshold-based rules.
Containerized Architecture -> Agility
  - Flexible deployment: edge, cluster, embedded, pipeline, pub/sub.
  - Zero-downtime upgrades due to logical component isolation.
StreamSets Data Collector™ is open source software for building and deploying individual any-to-any ingest pipelines in the face of data drift.
- 26.
StreamSets Dataflow Performance Manager™
StreamSets Dataflow Performance Manager (DPM™) provides a single pane of glass to map, measure and master big data in motion.
MAP: Dataflow Lineage, Live Data Architecture
MEASURE: Any Path, Any Time
MASTER: Availability & Accuracy, Proactive Remediation
- 27.
…helping you put data technology to work
  - Find answers
  - Ask technical questions
  - Join on-demand training course discussions
  - Follow release announcements
  - Share and vote on product ideas
  - Find Meetup and event listings
Connect with fellow Apache Hadoop and Spark professionals
community.mapr.com
- 28.
Backup
- 29.
Find my slides & other related materials to this talk here: bit.ly/tbd
or search: