© 2016 MapR Technologies, MapR Confidential
When Your Stream is the System of Record
Seattle Kafka Meetup
Will Ochandarena, Sr Dir, Product
October 24, 2016
Agenda
• Streaming System of Record - What?
• A Little About MapR Streams
• Versioning a Real-time Data Pipeline
  - Demo - MapR + StreamSets
Streaming System of Record
System of Record (n): information storage system that is
the authoritative data source for a given data element or
piece of information.
Who Does This Today?
[Diagram labels: Events, Processing, DB, More Processing, Long Term Storage]
Reprocessing is Hard
[Diagram labels: Events, Processing, DB, More Processing, Long Term Storage, Medium Term Storage, "?"]
[Time ranges shown: 3d ago -> now; 1 year ago -> ~an hour ago]
Easy Fix - Streaming System of Persistence
[Diagram labels: Events (shown twice), Processing, DB, More Processing, Long Term Storage (shown twice)]
How Can a Stream Be a System of Record?
Imagine each event as a change to an entry in a database.
DMV_Updates
0: { WillO : {City : Mountain View}, ts : 7/5/2009 04:01:01, src : dmv201 }
1: { BradA : {City : Atlanta}, ts : 5/11/2010 05:11:31, src : dmv1341 }
2: { BradA : {Points : +2}, ts : 6/22/2011 03:31:10, src : officer1213}
3: { WillO : {City : San Jose}, ts : 11/1/2012 04:01:01, src : dmv1661 }
DL_ID  City                       Points
WillO  Mountain View -> San Jose  0
BradA  Atlanta                    0 -> 2
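The idea on this slide, that a table is what you get by applying each event in order, can be sketched in a few lines of Python. The dictionary layout (`id`, `change`, `ts`, `src`) and the `materialize` function are assumptions made for illustration; they are not part of MapR Streams or the Kafka API.

```python
# Events transcribed from the slide; the dict layout is an assumption for illustration.
events = [
    {"id": "WillO", "change": {"City": "Mountain View"}, "ts": "7/5/2009 04:01:01", "src": "dmv201"},
    {"id": "BradA", "change": {"City": "Atlanta"}, "ts": "5/11/2010 05:11:31", "src": "dmv1341"},
    {"id": "BradA", "change": {"Points": "+2"}, "ts": "6/22/2011 03:31:10", "src": "officer1213"},
    {"id": "WillO", "change": {"City": "San Jose"}, "ts": "11/1/2012 04:01:01", "src": "dmv1661"},
]


def materialize(events):
    """Apply events in stream order and return the resulting table."""
    table = {}
    for event in events:
        # A license starts with no city and zero points until an event says otherwise.
        row = table.setdefault(event["id"], {"City": None, "Points": 0})
        for field, value in event["change"].items():
            if field == "Points":
                row["Points"] += int(value)  # "+2" is a delta, not a new total
            else:
                row[field] = value  # later values overwrite earlier ones
    return table


table = materialize(events)
print(table)
```

Replaying only the first few events (for example `materialize(events[:2])`) gives the table as it stood at that point in the stream.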
Streams and Databases in Harmony
[Diagram labels: Key-Val, Document, Graph, Wide Column, Time Series, Relational, "???", Inserts, Updates]
Which Makes a Better System of Record?
Which of these can be used to reconstruct the other?
0: { WillO : {City : Mountain View}, ts : 7/5/2009 04:01:01, src : dmv201 }
1: { BradA : {City : Atlanta}, ts : 5/11/2010 05:11:31, src : dmv1341 }
2: { BradA : {Points : +2}, ts : 6/22/2011 03:31:10, src : officer1213}
3: { WillO : {City : San Jose}, ts : 11/1/2012 04:01:01, src : dmv1661 }
DL_ID  City      Points
WillO  San Jose  0
BradA  Atlanta   2
Other Benefits of Streaming System of Record
• Auditing - "how did BradA's points get so high?"
• Lineage - "who added points to BradA's license?"
• History - "where did WillO use to live?"
• Integrity - "can I trust this data hasn't been tampered with?"
  • Yup - Streams are immutable
0: { WillO : {City : Mountain View}, ts : 7/5/2009 04:01:01, src : dmv201 }
1: { BradA : {City : Atlanta}, ts : 5/11/2010 05:11:31, src : dmv1341 }
2: { BradA : {Points : +2}, ts : 6/22/2011 03:31:10, src : officer1213}
3: { WillO : {City : San Jose}, ts : 11/1/2012 04:01:01, src : dmv1661 }
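The auditing, lineage, and history questions above are all filters over the same event log. A minimal sketch, assuming the four slide events are held in memory as tuples; the tuple layout and the `changes` helper are hypothetical and only serve the illustration.

```python
# The four events from the slide as (offset, license, field, value, ts, src) tuples.
# This layout is an assumption for illustration, not a MapR Streams record format.
EVENTS = [
    (0, "WillO", "City", "Mountain View", "7/5/2009 04:01:01", "dmv201"),
    (1, "BradA", "City", "Atlanta", "5/11/2010 05:11:31", "dmv1341"),
    (2, "BradA", "Points", "+2", "6/22/2011 03:31:10", "officer1213"),
    (3, "WillO", "City", "San Jose", "11/1/2012 04:01:01", "dmv1661"),
]


def changes(dl_id, field):
    """Return (offset, value, ts, src) for every change to one field of one license."""
    return [
        (offset, value, ts, src)
        for offset, key, changed_field, value, ts, src in EVENTS
        if key == dl_id and changed_field == field
    ]


# Auditing / lineage: which events changed BradA's points, and which source sent them?
points_sources = [src for _, _, _, src in changes("BradA", "Points")]

# History: every city WillO has had, oldest first; all but the last are former cities.
willo_cities = [value for _, value, _, _ in changes("WillO", "City")]
former_cities = willo_cities[:-1]

print(points_sources, former_cities)
```

None of these questions can be answered from the final table alone, since it keeps only the latest value per field.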
What Do I Need For This to Work?
• Infinitely persisted events
• A way to query your persisted stream data
• An integrated security model across data services
• Applied Streaming System of Record @ Liaison Blog
About MapR & MapR Streams
MapR Streams:
Global Pub-sub Event Streaming System for Big Data
● Producers publish billions of events/sec to a topic in a stream.
● Events persisted and immediately delivered to all consumers, guaranteed.
● Tie together geo-dispersed clusters. Worldwide.
● Standard real-time API (Kafka). Integrates with Spark Streaming, Storm, Apex, and Flink.
● Direct data access (OJAI API) from analytics frameworks.
[Diagram labels: Producers, Stream, Topic, Consumers, Remote sites and consumers, Batch analytics]
Streams Offers a Durable,
Persistent System of Record
[
  {"Topic1Part0Seq5001": {
    "timestamp" : 1456246886,
    "topic" : "Topic1",
    "partition" : 0,
    "producer" : "wochanda",
    "offset" : 5001,
    "key" : "MsgKey",
    "data" : {...}
  } },
  {"Topic2Part0Seq5002": { ... } },
  ...
  ...
]
● Reliable
● Secure
● Immutable
● Auditable
● Replayable
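The "Immutable" and "Replayable" properties can be illustrated with a toy append-only log. This is a sketch under stated assumptions: `AppendOnlyLog` is a hypothetical in-memory class written for this example, not a MapR Streams or Kafka API, and it ignores durability, partitions, and security entirely.

```python
class AppendOnlyLog:
    """Toy in-memory stand-in for one topic partition (hypothetical, for illustration)."""

    def __init__(self):
        self._records = []

    def append(self, key, data):
        """Store a record at the end of the log and return the offset assigned to it."""
        offset = len(self._records)
        # Existing entries are never updated or deleted; new records only go on the end.
        self._records.append((offset, key, data))
        return offset

    def read_from(self, offset=0):
        """Replay every record at or after the given offset, in order."""
        return list(self._records[offset:])


log = AppendOnlyLog()
log.append("MsgKey", {"City": "Mountain View"})
log.append("MsgKey", {"City": "San Jose"})

# Independent consumers can replay the same history from any offset they choose.
full_replay = log.read_from(0)
tail_replay = log.read_from(1)
print(full_replay, tail_replay)
```

Because reads never remove records, a new consumer added later sees exactly the same history as the first one.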
Streams Enables Global Applications and Analytics
Provides
● Arbitrary topology of thousands of clusters
● Automatic loop prevention
● DNS-based discovery
● Globally synchronized message offsets and consumer cursors
Enables
● Global applications & data collection
● Producer & consumer failover
● Analysis/filtering/aggregation at the edge
● "Occasional" connections
[Diagram labels: Producers, Consumers]
Fun Facts
MapR Streams
Column headings: Converged | Global Scale | Secure & Multi-Tenant
● Single cluster for files, tables, and streams.
● Global, IoT-scale "fabrics" with failover.
● Tenant-owned streams, logical grouping of topics and messages.
● Authentication, authorization, encryption.
● Unified policy with all other platform services.
● Infinite "system of record" persistence.
● Metadata tracked internally, no dependencies on ZK.
● Consumers, topics scale into millions.
MapR Converged Data Platform
[Architecture diagram, top to bottom]
Applications: Open Source Engines & Tools, Commercial Engines & Applications, Custom Apps, Cloud and Managed Services, Data Processing, Search and Others
Side label: Unified Management and Monitoring
APIs: HDFS API, POSIX, NFS, HBase API, JSON API, Kafka API
MapR Data Platform Services: Web-Scale Storage (MapR-FS), Database (MapR-DB), Event Streaming (MapR Streams)
Platform capabilities: Global Namespace | No Single Point of Failure | Data Protection | Multi-tenancy | Workload Management | Multi Temperature | Global Multi Datacenter | High Performance Low Latency | Security | Management & Monitoring
Infrastructure: Commodity Hardware/Storage, Clouds, & Containers
Versioning a Real-time Data Pipeline
Challenges of a Streaming App Developer
Pre-Production
[Diagram labels: App Environment, Streaming System, Database, Hadoop Cluster; events, logs, events2, logs2; v2, v2; /clicks, /clicks2]
Challenges with Versioning
Post-Production
Input Data + App Logic = Output Data
Output Data: Output Streams, Database Tables, Logs, Metrics
What if you deploy a new version of your application? What happens to all of this?
Example: Versioning in Production
Input_Stream: 45 40 60 30 37 39 72 79 60
App versions: Calculate_Mean_3, Calculate_Median_3
Output_Stream: 45 35 70
Output_Table:
Time      Value
00:00:00  70
00:00:05  35
00:00:10  45
Versioning with Converged App Volumes
Input_Stream: 45 40 60 30 37 39 72 79 60
App versions: Calculate_Mean_3, Calculate_Median_3
Calculate_Mean_3 Volume
Output_Stream: 35 70
Output_Table:
Time      Value
00:00:00  70
00:00:05  35
00:00:10
Calculate_Median_3 Volume
Output_Stream: 45 37 72
Output_Table:
Time      Value
00:00:00  72
00:00:05  37
00:00:10  45
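The per-version outputs can be reproduced by replaying the same input through each version and keeping the results apart. A minimal sketch, assuming the slide lists the newest input value first and that each version aggregates non-overlapping windows of three values and rounds the result; the `tumbling_windows` and `run_version` helpers and the `outputs` dictionary are hypothetical stand-ins for the app and its volumes.

```python
from statistics import mean, median

# Input values from the slide. Assumption: the slide lists the newest value first,
# so the list is reversed here to put it in arrival order.
input_stream = [45, 40, 60, 30, 37, 39, 72, 79, 60][::-1]


def tumbling_windows(values, size=3):
    """Split values into consecutive, non-overlapping windows of `size`."""
    return [values[i:i + size] for i in range(0, len(values) - size + 1, size)]


def run_version(aggregate, values):
    """Replay the whole input through one app version and return its own output."""
    return [round(aggregate(window)) for window in tumbling_windows(values)]


# Each version writes to its own output, standing in for a per-version volume.
outputs = {
    "Calculate_Mean_3": run_version(mean, input_stream),
    "Calculate_Median_3": run_version(median, input_stream),
}
print(outputs)
```

Under these assumptions the median version yields 72, 37, 45, matching the Calculate_Median_3 table, and the mean version's first two values are 70 and 35. Its third value is computed here only because the sketch replays the full input through both versions.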
Versioning & A/B Testing
[Diagram labels: versions A, B, C; percentages 80%, 10%, 10%]
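One common way to realize a percentage split like this is to hash a stable key and map it to a version. The sketch below assumes the slide means 80% of traffic goes to version A and 10% each to B and C; the `SPLIT` table, the `assign_version` function, and the `user-N` keys are hypothetical and not taken from the talk.

```python
import hashlib

# Assumed split: 80% of keys to version A, 10% to B, 10% to C.
SPLIT = [("A", 80), ("B", 10), ("C", 10)]


def assign_version(key: str) -> str:
    """Map a key to a version deterministically, so a given key always gets the same one."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    upper = 0
    for version, percent in SPLIT:
        upper += percent
        if bucket < upper:
            return version
    raise ValueError("SPLIT percentages must sum to 100")


# Count how many of 10,000 sample keys land on each version.
counts = {"A": 0, "B": 0, "C": 0}
for i in range(10_000):
    counts[assign_version(f"user-{i}")] += 1
print(counts)
```

Hashing the key instead of drawing a random number keeps each key on one version across restarts, which makes the outputs of A, B, and C easier to compare.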
DEMO - MapR & StreamSets
Versioning a Production Data Pipeline
Rupal Shah - StreamSets
StreamSets Data Collector™
Adaptable Pipelines -> Efficiency
❑ Intent-driven ingest (minimal schema specification).
❑ Data drift handling.
Pipeline KPIs -> Visibility
❑ Real-time stage, edge and bad data metrics.
❑ Alerts via profiling, sampling and threshold-based rules.
Containerized Architecture -> Agility
❑ Flexible deployment: edge, cluster, embedded, pipeline, pub/sub
❑ Zero-downtime upgrades due to logical component isolation.
StreamSets Data Collector™ is open source software for building and deploying individual any-to-any ingest pipelines in the face of data drift.
StreamSets Dataflow Performance Manager™
StreamSets Dataflow Performance Manager (DPM™) provides a single pane of glass to map, measure and master big data in motion.
MAP: Dataflow Lineage, Live Data Architecture
MEASURE: Any Path, Any Time
MASTER: Availability & Accuracy, Proactive Remediation

helping you put data technology to work
● Find answers
● Ask technical questions
● Join on-demand training course discussions
● Follow release announcements
● Share and vote on product ideas
● Find Meetup and event listings
Connect with fellow Apache
Hadoop and Spark professionals
community.mapr.com
Backup
Find my slides & other materials related to this talk here: bit.ly/tbd
or search:

 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
BDSM⚡Call Girls in Mandawali Delhi >àŒ’8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >àŒ’8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >àŒ’8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >àŒ’8448380779 Escort Service
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 

MapR Seattle Streams Meetup, Oct 2016

  • 1. © 2016 MapR Technologies. MapR Confidential. When Your Stream is the System of Record. Seattle Kafka Meetup. Will Ochandarena, Sr Dir, Product. October 24 2016
  • 2. Agenda • Streaming System of Record - What? • A Little About MapR Streams • Versioning a Real-time Data Pipeline – Demo - MapR + StreamSets
  • 3. Streaming System of Record. System of Record (n): information storage system that is the authoritative data source for a given data element or piece of information.
  • 4. Who Does This Today? [Diagram: Events, Processing, DB, More Processing, Long Term Storage]
  • 5. Reprocessing is Hard [Diagram: Events, Processing, DB, More Processing, Long Term Storage, ?, Medium Term Storage; 3d ago -> Now; 1 Year ago -> ~an hour ago]
  • 6. Easy Fix - Streaming System of Persistence [Diagram: Events, Processing, DB, More Processing, Long Term Storage, Long Term Storage, Events]
  • 7. How Can a Stream Be a System of Record? Imagine each event as a change to an entry in a database. Stream DMV_Updates: 0: { WillO : {City : Mountain View}, ts : 7/5/2009 04:01:01, src : dmv201 } 1: { BradA : {City : Atlanta}, ts : 5/11/2010 05:11:31, src : dmv1341 } 2: { BradA : {Points : +2}, ts : 6/22/2011 03:31:10, src : officer1213 } 3: { WillO : {City : San Jose}, ts : 11/1/2012 04:01:01, src : dmv1661 } Table (DL_ID, City, Points): WillO, Mountain View then San Jose, 0; BradA, Atlanta, 0 then 2.
  • 8. Streams and Databases in Harmony [Diagram: Key-Val, Document, Graph, Wide Column, Time Series, Relational; ???; Inserts, Updates]
  • 9. Which Makes a Better System of Record? Which of these can be used to reconstruct the other? Stream: 0: { WillO : {City : Mountain View}, ts : 7/5/2009 04:01:01, src : dmv201 } 1: { BradA : {City : Atlanta}, ts : 5/11/2010 05:11:31, src : dmv1341 } 2: { BradA : {Points : +2}, ts : 6/22/2011 03:31:10, src : officer1213 } 3: { WillO : {City : San Jose}, ts : 11/1/2012 04:01:01, src : dmv1661 } Table (DL_ID, City, Points): WillO, San Jose, 0; BradA, Atlanta, 2.
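The answer the slide is driving at is that the stream can rebuild the table, but not the other way around. A minimal Python sketch of that replay follows. The field names `key` and `change` are assumptions made for illustration; the slide only shows the events in shorthand.

```python
# The four DMV_Updates events from the slide, in offset order.
# "key" and "change" are hypothetical field names chosen for this sketch.
events = [
    {"key": "WillO", "change": {"City": "Mountain View"}, "ts": "7/5/2009 04:01:01", "src": "dmv201"},
    {"key": "BradA", "change": {"City": "Atlanta"}, "ts": "5/11/2010 05:11:31", "src": "dmv1341"},
    {"key": "BradA", "change": {"Points": "+2"}, "ts": "6/22/2011 03:31:10", "src": "officer1213"},
    {"key": "WillO", "change": {"City": "San Jose"}, "ts": "11/1/2012 04:01:01", "src": "dmv1661"},
]


def replay(events):
    """Rebuild the current table state by applying every event in order."""
    table = {}
    for event in events:
        # A license starts with no city and zero points (assumed defaults).
        row = table.setdefault(event["key"], {"City": None, "Points": 0})
        for field, value in event["change"].items():
            if field == "Points":
                row["Points"] += int(value)  # "+2" is a delta, not a new value
            else:
                row[field] = value  # later events overwrite earlier ones
    return table


table = replay(events)
print(table)
```

The table keeps only the latest value per field, so WillO's earlier city is lost once the table is the only copy; the stream still has it.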
  • 10. Other Benefits of Streaming System of Record • Auditing - “how did BradA’s points get so high?” • Lineage - “who added points to BradA’s license?” • History - “where did WillO use to live?” • Integrity - “can I trust this data hasn’t been tampered with?” • Yup - Streams are immutable. Stream: 0: { WillO : {City : Mountain View}, ts : 7/5/2009 04:01:01, src : dmv201 } 1: { BradA : {City : Atlanta}, ts : 5/11/2010 05:11:31, src : dmv1341 } 2: { BradA : {Points : +2}, ts : 6/22/2011 03:31:10, src : officer1213 } 3: { WillO : {City : San Jose}, ts : 11/1/2012 04:01:01, src : dmv1661 }
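The auditing, lineage, and history questions above can all be answered by filtering the event log. The sketch below is illustrative only: the `key` and `change` field names and the `history` helper are assumptions, not part of any MapR API.

```python
# The same four events as on the slide; field names are hypothetical.
events = [
    {"key": "WillO", "change": {"City": "Mountain View"}, "ts": "7/5/2009 04:01:01", "src": "dmv201"},
    {"key": "BradA", "change": {"City": "Atlanta"}, "ts": "5/11/2010 05:11:31", "src": "dmv1341"},
    {"key": "BradA", "change": {"Points": "+2"}, "ts": "6/22/2011 03:31:10", "src": "officer1213"},
    {"key": "WillO", "change": {"City": "San Jose"}, "ts": "11/1/2012 04:01:01", "src": "dmv1661"},
]


def history(events, key, field):
    """Return every change to one field of one license, oldest first."""
    return [
        {"value": e["change"][field], "ts": e["ts"], "src": e["src"]}
        for e in events
        if e["key"] == key and field in e["change"]
    ]


# History: where did WillO use to live?
willo_cities = [h["value"] for h in history(events, "WillO", "City")]

# Lineage: who added points to BradA's license?
brada_point_sources = [h["src"] for h in history(events, "BradA", "Points")]

print(willo_cities, brada_point_sources)
```

Neither query is possible against the table alone, because the table stores only the current value and no source.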
  • 11. What Do I Need For This to Work? • Infinitely persisted events • A way to query your persisted stream data • An integrated security model across data services • Applied Streaming System of Record @ Liaison Blog
  • 12. About MapR & MapR Streams
  • 13. MapR Streams: Global Pub-sub Event Streaming System for Big Data. Producers publish billions of events/sec to a topic in a stream. Events persisted and immediately delivered to all consumers, guaranteed. Tie together geo-dispersed clusters. Worldwide. Standard real-time API (Kafka). Integrates with Spark Streaming, Storm, Apex, and Flink. Direct data access (OJAI API) from analytics frameworks. [Diagram: Topic, Stream, Topic, Producers, Consumers, Remote sites and consumers, Batch analytics]
  • 14. Streams Offers a Durable, Persistent System of Record [ {"Topic1Part0Seq5001": { "timestamp" : 1456246886, "topic" : "Topic1", "partition" : 0, "producer" : "wochanda", "offset" : 5001, "key" : "MsgKey", "data" : {...} } }, {"Topic2Part0Seq5002": { … } }, … ] ● Reliable ● Secure ● Immutable ● Auditable ● Replayable
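To make the record layout on this slide concrete, here is a small Python sketch of an immutable record and a replay from a given offset. It does not use the MapR or Kafka client libraries. The `StreamRecord` class, the `replay_from` helper, the second record, and the record-ID scheme (inferred from the `Topic1Part0Seq5001` example) are all assumptions for illustration.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class StreamRecord:
    timestamp: int
    topic: str
    partition: int
    producer: str
    offset: int
    key: str
    data: dict  # note: the dict's contents are not frozen, only the field

    @property
    def record_id(self) -> str:
        # Assumed naming scheme, inferred from "Topic1Part0Seq5001" on the slide.
        return f"{self.topic}Part{self.partition}Seq{self.offset}"


def replay_from(log, start_offset):
    """Return records at or after start_offset, in offset order."""
    return sorted((r for r in log if r.offset >= start_offset), key=lambda r: r.offset)


log = [
    StreamRecord(1456246886, "Topic1", 0, "wochanda", 5001, "MsgKey", {}),
    # Hypothetical second record, added only so the replay has something to skip.
    StreamRecord(1456246887, "Topic1", 0, "wochanda", 5002, "MsgKey", {}),
]

print(log[0].record_id)
print([r.offset for r in replay_from(log, 5002)])
```

Attempting `log[0].offset = 1` raises `FrozenInstanceError`, which mirrors the "Immutable" bullet at the level of a single in-memory record.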
  • 15. Streams Enables Global Applications and Analytics. Provides ● Arbitrary topology of thousands of clusters ● Automatic loop prevention ● DNS-based discovery ● Globally synchronized message offsets and consumer cursors. Enables ● Global applications & data collection ● Producer & consumer failover ● Analysis/filtering/aggregation at the edge ● “Occasional” connections [Diagram: Producers, Consumers]
  • 16. Fun Facts: MapR Streams. Converged, Global Scale, Secure & Multi-Tenant. Single cluster for files, tables, and streams. Global, IoT-scale “fabrics” with failover. Tenant-owned streams, logical grouping of topics and messages. Authentication, authorization, encryption. Unified policy with all other platform services. Infinite “system of record” persistence. Metadata tracked internally, no dependencies on ZK. Consumers, topics scale into millions.
  • 17. MapR Converged Data Platform [Diagram: Open Source Engines & Tools; Commercial Engines & Applications; Data Processing; Web-Scale Storage; Database; Event Streaming; MapR-FS; MapR-DB; MapR Streams; Search and Others; Cloud and Managed Services; Custom Apps; Unified Management and Monitoring; HDFS API; POSIX, NFS; HBase API; JSON API; Kafka API; MapR Data Platform Services: Global Namespace | No Single Point of Failure | Data Protection | Multi-tenancy | Workload Management | Multi Temperature | Global Multi Datacenter | High Performance Low Latency | Security | Management & Monitoring; Commodity Hardware/Storage, Clouds, & Containers]
  • 18. Versioning a Real-time Data Pipeline
  • 19. Challenges of a Streaming App Developer: Pre-Production [Diagram: Streaming System, Database, Hadoop Cluster, App Environment; events, logs, events2, logs2, v2, v2, /clicks, /clicks2, ...]
  • 20. Challenges with Versioning: Post-Production. Input Data + App Logic = Output Data (Output Streams, Database Tables, Logs, Metrics). What if you deploy a new version of your application? What happens to all of this?
  • 21. Example: Versioning in Production. Input_Stream: 45 40 60 30 37 39 72 79 60. Apps: Calculate_Mean_3, Calculate_Median_3. Output_Stream: 45 35 70. Output_Table (Time, Value): 00:00:00, 70; 00:00:05, 35; 00:00:10, 45.
  • 22. Versioning with Converged App Volumes. Input_Stream: 45 40 60 30 37 39 72 79 60. Calculate_Mean_3 Volume: Output_Stream: 35 70; Output_Table (Time, Value): 00:00:00, 70; 00:00:05, 35; 00:00:10, (empty). Calculate_Median_3 Volume: Output_Stream: 45 37 72; Output_Table (Time, Value): 00:00:00, 72; 00:00:05, 37; 00:00:10, 45.
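The numbers on the two versioning slides can be reproduced with a short Python sketch. It assumes the input stream is drawn with the newest values on the left, that each app works on windows of three values, and that the mean is truncated to an integer; those assumptions are inferred from the slide values, not stated by the speaker. The `run` helper is hypothetical.

```python
import statistics

# Input_Stream values from the slide, grouped into windows of three in time order
# (assumes the slide draws the newest values on the left).
windows = {
    "00:00:00": [72, 79, 60],
    "00:00:05": [30, 37, 39],
    "00:00:10": [45, 40, 60],
}


def calculate_mean_3(values):
    return int(statistics.mean(values))  # truncated, matching 70 and 35 on the slide


def calculate_median_3(values):
    return statistics.median(values)


def run(app, windows, times):
    """Apply one app version to the chosen windows and return its output table."""
    return {t: app(windows[t]) for t in times}


times = list(windows)

# Shared output: the median version replaces the mean version before the third
# window, so one table ends up mixing results from two different computations.
shared_table = {
    **run(calculate_mean_3, windows, times[:2]),
    **run(calculate_median_3, windows, times[2:]),
}

# Per-version volumes: each version writes to its own outputs, and the new
# version replays the persisted input stream from the beginning.
mean_volume = run(calculate_mean_3, windows, times[:2])
median_volume = run(calculate_median_3, windows, times)

print(shared_table)
print(mean_volume)
print(median_volume)
```

With a shared table, the value at 00:00:10 is a median sitting next to two means. With separate volumes, each table is internally consistent, which is only possible because the input stream was persisted and could be replayed.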
  • 23. Versioning & A/B Testing [Diagram: 80% / 10% / 10% split across A, B, C]
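One way to realize an 80/10/10 split like the one on this slide is to route each event key to a version deterministically by hashing it. The sketch below is an assumption about how such a split could be done, not the method shown in the talk; the `route` function and the `user-N` keys are hypothetical.

```python
import hashlib
from collections import Counter

# Cumulative upper bounds (out of 100) for each version: A 80%, B 10%, C 10%.
SPLIT = [("A", 80), ("B", 90), ("C", 100)]


def route(key: str) -> str:
    """Map a key to a version; the same key always lands on the same version."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in 0..99
    for version, upper in SPLIT:
        if bucket < upper:
            return version
    raise AssertionError("unreachable: bucket is always below 100")


counts = Counter(route(f"user-{i}") for i in range(10_000))
print(counts)
```

Hashing the key, rather than picking a version at random per event, keeps all events for one key on the same version, so each version's output stays consistent per key.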
  • 24. DEMO - MapR & StreamSets: Versioning a Production Data Pipeline. Rupal Shah - StreamSets
  • 25. StreamSets Data Collector™. StreamSets Data Collector™ is open source software for building and deploying individual any-to-any ingest pipelines in the face of data drift. Adaptable Pipelines -> Efficiency ❑ Intent-driven ingest (minimal schema specification). ❑ Data drift handling. Pipeline KPIs -> Visibility ❑ Real-time stage, edge and bad data metrics. ❑ Alerts via profiling, sampling and threshold-based rules. Containerized Architecture -> Agility ❑ Flexible deployment: edge, cluster, embedded, pipeline, pub/sub ❑ Zero-downtime upgrades due to logical component isolation.
  • 26. StreamSets Dataflow Performance Manager™. StreamSets Dataflow Performance Manager (DPM™) provides a single pane of glass to map, measure and master big data in motion. MAP: Dataflow Lineage, Live Data Architecture. MEASURE: Any Path, Any Time. MASTER: Availability & Accuracy, Proactive Remediation.
  • 27. … helping you put data technology to work ● Find answers ● Ask technical questions ● Join on-demand training course discussions ● Follow release announcements ● Share and vote on product ideas ● Find Meetup and event listings. Connect with fellow Apache Hadoop and Spark professionals: community.mapr.com
  • 28. Backup
  • 29. Find my slides & other related materials to this talk here: bit.ly/tbd or search: