SlideShare ist ein Scribd-Unternehmen logo
1 von 37
Downloaden Sie, um offline zu lesen
Fully Fault Tolerant Real Time
Data Pipeline with Docker and
Mesos
Rahul Kumar
Technical Lead
LinuxCon  /  ContainerCon  -­ Berlin,  Germany
Agenda
Data Pipeline
Mesos + Docker
Reactive Data Pipeline
Goal
Analyzing data always have great benefits and is one of the
greatest challenge for an organization.
Today’s business generates massive amount of digital data.
which is cumbersome to store, transport and analyze
Making distributed system and off-
loading workload to commodity
clusters is one of the better approach to
solve data problem
Characteristics Of a distributed
system
❏ Resource Sharing
❏ Openness
❏ Concurrency
❏ Scalability
❏ Fault Tolerance
❏ Transparency
Collect
Store
Process
Analyze
Data Center
Manually Scale Frameworks & Install services
Complex
Very Limited
Inefficient
Low Utilization
Static Partitioning Blocker for Fault
Tolerant data pipeline
Failure make it even more complex to
manage
Apache Mesos
“Apache Mesos abstracts CPU, memory, storage,
and other compute resources away from machines
(physical or virtual), enabling fault-tolerant and
elastic distributed systems to easily be built and run
effectively.”
Mesos Features
Scalability: scale up to 10,000s of nodes
Fault-tolerant: replicated master and slaves using ZooKeeper
Docker support: Support for Docker containers
Native Container: Linux Native isolation between tasks with Linux Containers
Scheduling: Multi-resource scheduling (memory, CPU, disk, and ports)
API supports: Java, Python and C++ APIs for developing new parallel applications
Monitoring: Web UI for viewing cluster state
Resource Isolation
Docker  Containerizer
Mesos adds the support for launching tasks that contains Docker
images
Users can either launch a Docker image as a Task, or as an
Executor.
To run the mesos-agent to enable the Docker Containerizer,
“docker” must be set as one of the containerizers option
mesos-agent --containerizers=docker,mesos
Mesos Frameworks
Aurora: Aurora was developed at Twitter and the migrated to Apache Project later.
Aurora is a framework that keeps service running across a shared pool of
machines, and responsible for keeping them running forever.
Marathon: It is a framework for container orchestration for Mesos. Marathon helps
to run other framework on Mesos. Marathon also runs other application container
such as Jetty, JBoss Server, Play Server.
Chronos: Fault tolerance job scheduler for Mesos, It was developed at Airbnb as
replacement of cron.
Resilient Distributed Datasets
(RDDs)
- Big collection of data
which is:
- Immutable
- Distributed
- Lazily evaluated
- Type Inferred
- Cacheable
Spark Stack
Many big-data applications need to process large data streams in near-real time
Monitoring  Systems
Alert  Systems
Computing  Systems
Why  Spark  Streaming?
Taken  from  Apache  Spark.
What  is  Spark  Streaming?
Framework  for  large  scale  stream  processing  
➔ Created  at  UC  Berkeley  
➔ Scales  to  100s  of  nodes
➔ Can  achieve  second  scale  latencies
➔ Provides  a  simple  batch-­like  API  for  implementing  complex  algorithm
➔ Can  absorb  live  data  streams  from  Kafka,  Flume,  ZeroMQ,  Kinesis  etc.
What  is  Spark  Streaming?
Run  a  streaming  computation  as  a  series  of  very  small,  deterministic  batch  jobs
-­ Chop  up  the  live  stream  into  batches  of  X  seconds
-­ Spark  treats  each  batch  of  data  as  RDDs  
and  processes  them  using  RDD  operations
-­ Finally,  the  processed  results  of  the  RDD  
operations  are  returned  in  batches
Spark  Streaming
Point  of  Failure
Simple  Streaming  Pipeline
● To  use  Mesos  from  Spark,  you  need  a  Spark  binary  package  available  in  a  
place  accessible  (http/s3/hdfs)  by  Mesos,  and  a  Spark  driver  program  
configured  to  connect  to  Mesos.
● Configuring  the  driver  program  to  connect  to  Mesos:
val sconf  =  new  SparkConf()
.setMaster("mesos://zk://10.121.93.241:2181,10.181.2.12:2181,10.107.48.112:2181/mesos")
.setAppName("MyStreamingApp")
.set("spark.executor.uri","hdfs://Sigmoid/executors/spark-­1.3.0-­bin-­hadoop2.4.tgz")
.set("spark.mesos.coarse", "true")
.set("spark.cores.max", "30")
.set("spark.executor.memory",  "10g")
val  sc  =  new  SparkContext(sconf)
val  ssc  =  new  StreamingContext(sc, Seconds(1))
...
Spark  Streaming  over  a  HA  Mesos  Cluster
Real-­time  stream  processing  systems  must  be  operational  24/7,  which  
requires  them  to  recover  from  all  kinds  of  failures  in  the  system.
● Spark  and  its  RDD  abstraction  is  designed  to  seamlessly  handle  failures  of  any  worker  nodes  in  
the  cluster.  
● In  Streaming,  driver  failure  can  be  recovered  with  checkpointing  application  state.
● Write  Ahead  Logs  (WAL)  &  Acknowledgements  can  ensure  0  data  loss.
Spark  Streaming  Fault-­tolerance
Simple  Fault-­tolerant  Streaming  Infra
● Figure out the bottleneck : CPU, Memory, IO, Network
● If parsing is involved, use the one which gives
high performance.
● Proper Data modeling
● Compression, Serialization
Creating  a  scalable  pipeline
Thank You
@rahul_kumar_aws
LinuxCon  /  ContainerCon  -­ Berlin,  Germany

Weitere ähnliche Inhalte

Was ist angesagt?

Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...
Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...
Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...
DataStax
 
Building a Real-Time Data Pipeline with Spark, Kafka, and Python
Building a Real-Time Data Pipeline with Spark, Kafka, and PythonBuilding a Real-Time Data Pipeline with Spark, Kafka, and Python
Building a Real-Time Data Pipeline with Spark, Kafka, and Python
SingleStore
 

Was ist angesagt? (20)

Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...
Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...
Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...
 
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...
 
Feeding Cassandra with Spark-Streaming and Kafka
Feeding Cassandra with Spark-Streaming and KafkaFeeding Cassandra with Spark-Streaming and Kafka
Feeding Cassandra with Spark-Streaming and Kafka
 
How to deploy Apache Spark 
to Mesos/DCOS
How to deploy Apache Spark 
to Mesos/DCOSHow to deploy Apache Spark 
to Mesos/DCOS
How to deploy Apache Spark 
to Mesos/DCOS
 
SMACK Stack - Fast Data Done Right by Stefan Siprell at Codemotion Dubai
SMACK Stack - Fast Data Done Right by Stefan Siprell at Codemotion DubaiSMACK Stack - Fast Data Done Right by Stefan Siprell at Codemotion Dubai
SMACK Stack - Fast Data Done Right by Stefan Siprell at Codemotion Dubai
 
Real-time personal trainer on the SMACK stack
Real-time personal trainer on the SMACK stackReal-time personal trainer on the SMACK stack
Real-time personal trainer on the SMACK stack
 
The How and Why of Fast Data Analytics with Apache Spark
The How and Why of Fast Data Analytics with Apache SparkThe How and Why of Fast Data Analytics with Apache Spark
The How and Why of Fast Data Analytics with Apache Spark
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
 
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
 
Kick-Start with SMACK Stack
Kick-Start with SMACK StackKick-Start with SMACK Stack
Kick-Start with SMACK Stack
 
An Introduction to Distributed Search with Datastax Enterprise Search
An Introduction to Distributed Search with Datastax Enterprise SearchAn Introduction to Distributed Search with Datastax Enterprise Search
An Introduction to Distributed Search with Datastax Enterprise Search
 
Using Spark, Kafka, Cassandra and Akka on Mesos for Real-Time Personalization
Using Spark, Kafka, Cassandra and Akka on Mesos for Real-Time PersonalizationUsing Spark, Kafka, Cassandra and Akka on Mesos for Real-Time Personalization
Using Spark, Kafka, Cassandra and Akka on Mesos for Real-Time Personalization
 
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
 
Building a Lambda Architecture with Elasticsearch at Yieldbot
Building a Lambda Architecture with Elasticsearch at YieldbotBuilding a Lambda Architecture with Elasticsearch at Yieldbot
Building a Lambda Architecture with Elasticsearch at Yieldbot
 
Lambda architecture
Lambda architectureLambda architecture
Lambda architecture
 
xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)
xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)
xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)
 
Akka in Production - ScalaDays 2015
Akka in Production - ScalaDays 2015Akka in Production - ScalaDays 2015
Akka in Production - ScalaDays 2015
 
Real-Time Anomaly Detection with Spark MLlib, Akka and Cassandra
Real-Time Anomaly Detection  with Spark MLlib, Akka and  CassandraReal-Time Anomaly Detection  with Spark MLlib, Akka and  Cassandra
Real-Time Anomaly Detection with Spark MLlib, Akka and Cassandra
 
Building a Real-Time Data Pipeline with Spark, Kafka, and Python
Building a Real-Time Data Pipeline with Spark, Kafka, and PythonBuilding a Real-Time Data Pipeline with Spark, Kafka, and Python
Building a Real-Time Data Pipeline with Spark, Kafka, and Python
 
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch AnalysisNoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
 

Andere mochten auch

ReactiveStream-meetup-Jan102015ppt
ReactiveStream-meetup-Jan102015pptReactiveStream-meetup-Jan102015ppt
ReactiveStream-meetup-Jan102015ppt
Rahul Kumar
 
Building High Scalable Distributed Framework on Apache Mesos
Building High Scalable Distributed Framework on Apache MesosBuilding High Scalable Distributed Framework on Apache Mesos
Building High Scalable Distributed Framework on Apache Mesos
Rahul Kumar
 
Composing and Scaling Data Platforms-2015
Composing and Scaling Data Platforms-2015Composing and Scaling Data Platforms-2015
Composing and Scaling Data Platforms-2015
Rahul Kumar
 

Andere mochten auch (20)

Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
ReactiveStream-meetup-Jan102015ppt
ReactiveStream-meetup-Jan102015pptReactiveStream-meetup-Jan102015ppt
ReactiveStream-meetup-Jan102015ppt
 
Building High Scalable Distributed Framework on Apache Mesos
Building High Scalable Distributed Framework on Apache MesosBuilding High Scalable Distributed Framework on Apache Mesos
Building High Scalable Distributed Framework on Apache Mesos
 
Composing and Scaling Data Platforms-2015
Composing and Scaling Data Platforms-2015Composing and Scaling Data Platforms-2015
Composing and Scaling Data Platforms-2015
 
OCF.tw's talk about "Introduction to spark"
OCF.tw's talk about "Introduction to spark"OCF.tw's talk about "Introduction to spark"
OCF.tw's talk about "Introduction to spark"
 
Deep Learning with Apache Spark: an Introduction
Deep Learning with Apache Spark: an IntroductionDeep Learning with Apache Spark: an Introduction
Deep Learning with Apache Spark: an Introduction
 
Deep dive into spark streaming
Deep dive into spark streamingDeep dive into spark streaming
Deep dive into spark streaming
 
Big data, Analytics and Beyond
Big data, Analytics and BeyondBig data, Analytics and Beyond
Big data, Analytics and Beyond
 
Hadoop Operations - Best practices from the field
Hadoop Operations - Best practices from the fieldHadoop Operations - Best practices from the field
Hadoop Operations - Best practices from the field
 
Apache Toree
Apache ToreeApache Toree
Apache Toree
 
Hadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data ModelHadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data Model
 
Apache Spark
Apache SparkApache Spark
Apache Spark
 
Energy analytics with Apache Spark workshop
Energy analytics with Apache Spark workshopEnergy analytics with Apache Spark workshop
Energy analytics with Apache Spark workshop
 
Apache Spark Introduction and Resilient Distributed Dataset basics and deep dive
Apache Spark Introduction and Resilient Distributed Dataset basics and deep diveApache Spark Introduction and Resilient Distributed Dataset basics and deep dive
Apache Spark Introduction and Resilient Distributed Dataset basics and deep dive
 
Apache Spark An Overview
Apache Spark An OverviewApache Spark An Overview
Apache Spark An Overview
 
Neural Networks, Spark MLlib, Deep Learning
Neural Networks, Spark MLlib, Deep LearningNeural Networks, Spark MLlib, Deep Learning
Neural Networks, Spark MLlib, Deep Learning
 
Data processing platforms with SMACK: Spark and Mesos internals
Data processing platforms with SMACK:  Spark and Mesos internalsData processing platforms with SMACK:  Spark and Mesos internals
Data processing platforms with SMACK: Spark and Mesos internals
 
Apache Spark: What's under the hood
Apache Spark: What's under the hoodApache Spark: What's under the hood
Apache Spark: What's under the hood
 
Apache kudu
Apache kuduApache kudu
Apache kudu
 
Sparkly Notebook: Interactive Analysis and Visualization with Spark
Sparkly Notebook: Interactive Analysis and Visualization with SparkSparkly Notebook: Interactive Analysis and Visualization with Spark
Sparkly Notebook: Interactive Analysis and Visualization with Spark
 

Ähnlich wie Fully fault tolerant real time data pipeline with docker and mesos

OSDC 2015: Bernd Mathiske | Why the Datacenter Needs an Operating System
OSDC 2015: Bernd Mathiske | Why the Datacenter Needs an Operating SystemOSDC 2015: Bernd Mathiske | Why the Datacenter Needs an Operating System
OSDC 2015: Bernd Mathiske | Why the Datacenter Needs an Operating System
NETWAYS
 
Fully Fault tolerant Streaming Workflows at Scale using Apache Mesos & Spark ...
Fully Fault tolerant Streaming Workflows at Scale using Apache Mesos & Spark ...Fully Fault tolerant Streaming Workflows at Scale using Apache Mesos & Spark ...
Fully Fault tolerant Streaming Workflows at Scale using Apache Mesos & Spark ...
Akhil Das
 

Ähnlich wie Fully fault tolerant real time data pipeline with docker and mesos (20)

Containerized Data Persistence on Mesos
Containerized Data Persistence on MesosContainerized Data Persistence on Mesos
Containerized Data Persistence on Mesos
 
OSDC 2015: Bernd Mathiske | Why the Datacenter Needs an Operating System
OSDC 2015: Bernd Mathiske | Why the Datacenter Needs an Operating SystemOSDC 2015: Bernd Mathiske | Why the Datacenter Needs an Operating System
OSDC 2015: Bernd Mathiske | Why the Datacenter Needs an Operating System
 
MANTL Data Platform, Microservices and BigData Services
MANTL Data Platform, Microservices and BigData ServicesMANTL Data Platform, Microservices and BigData Services
MANTL Data Platform, Microservices and BigData Services
 
Fully Fault tolerant Streaming Workflows at Scale using Apache Mesos & Spark ...
Fully Fault tolerant Streaming Workflows at Scale using Apache Mesos & Spark ...Fully Fault tolerant Streaming Workflows at Scale using Apache Mesos & Spark ...
Fully Fault tolerant Streaming Workflows at Scale using Apache Mesos & Spark ...
 
Deploy data analysis pipeline with mesos and docker
Deploy data analysis pipeline with mesos and dockerDeploy data analysis pipeline with mesos and docker
Deploy data analysis pipeline with mesos and docker
 
Enabling Microservices Frameworks to Solve Business Problems
Enabling Microservices Frameworks to Solve  Business ProblemsEnabling Microservices Frameworks to Solve  Business Problems
Enabling Microservices Frameworks to Solve Business Problems
 
Realtime Data Pipeline with Spark Streaming and Cassandra with Mesos (Rahul K...
Realtime Data Pipeline with Spark Streaming and Cassandra with Mesos (Rahul K...Realtime Data Pipeline with Spark Streaming and Cassandra with Mesos (Rahul K...
Realtime Data Pipeline with Spark Streaming and Cassandra with Mesos (Rahul K...
 
DevOps in Age of Kubernetes
DevOps in Age of KubernetesDevOps in Age of Kubernetes
DevOps in Age of Kubernetes
 
Choosing PaaS: Cisco and Open Source Options: an overview
Choosing PaaS:  Cisco and Open Source Options: an overviewChoosing PaaS:  Cisco and Open Source Options: an overview
Choosing PaaS: Cisco and Open Source Options: an overview
 
Scaling ArangoDB on Mesosphere DCOS
Scaling ArangoDB on Mesosphere DCOSScaling ArangoDB on Mesosphere DCOS
Scaling ArangoDB on Mesosphere DCOS
 
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
 
DevOps vs. Site Reliability Engineering (SRE) in Age of Kubernetes
DevOps vs. Site Reliability Engineering (SRE) in Age of KubernetesDevOps vs. Site Reliability Engineering (SRE) in Age of Kubernetes
DevOps vs. Site Reliability Engineering (SRE) in Age of Kubernetes
 
Datacenter Computing with Apache Mesos - シリコンバレー日本人駐在員Meetup
Datacenter Computing with Apache Mesos - シリコンバレー日本人駐在員MeetupDatacenter Computing with Apache Mesos - シリコンバレー日本人駐在員Meetup
Datacenter Computing with Apache Mesos - シリコンバレー日本人駐在員Meetup
 
The New Stack Container Summit Talk
The New Stack Container Summit TalkThe New Stack Container Summit Talk
The New Stack Container Summit Talk
 
Doing Dropbox the Native Cloud Native Way
Doing Dropbox the Native Cloud Native WayDoing Dropbox the Native Cloud Native Way
Doing Dropbox the Native Cloud Native Way
 
#ITsubbotnik Spring 2017: Roman Dimitrenko "Building Paas with the HashiStack"
#ITsubbotnik Spring 2017: Roman Dimitrenko "Building Paas with the HashiStack"#ITsubbotnik Spring 2017: Roman Dimitrenko "Building Paas with the HashiStack"
#ITsubbotnik Spring 2017: Roman Dimitrenko "Building Paas with the HashiStack"
 
Fast Data Analytics with Spark and Python
Fast Data Analytics with Spark and PythonFast Data Analytics with Spark and Python
Fast Data Analytics with Spark and Python
 
Episode 4: Operating Kubernetes at Scale with DC/OS
Episode 4: Operating Kubernetes at Scale with DC/OSEpisode 4: Operating Kubernetes at Scale with DC/OS
Episode 4: Operating Kubernetes at Scale with DC/OS
 
Episode 3: Kubernetes and Big Data Services
Episode 3: Kubernetes and Big Data ServicesEpisode 3: Kubernetes and Big Data Services
Episode 3: Kubernetes and Big Data Services
 
mesos-devoxx14
mesos-devoxx14mesos-devoxx14
mesos-devoxx14
 

Kürzlich hochgeladen

%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
masabamasaba
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
masabamasaba
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
masabamasaba
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 

Kürzlich hochgeladen (20)

Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
 
tonesoftg
tonesoftgtonesoftg
tonesoftg
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 

Fully fault tolerant real time data pipeline with docker and mesos

  • 1. Fully Fault Tolerant Real Time Data Pipeline with Docker and Mesos Rahul Kumar Technical Lead LinuxCon  /  ContainerCon  -­ Berlin,  Germany
  • 2. Agenda Data Pipeline Mesos + Docker Reactive Data Pipeline
  • 3. Goal Analyzing data always have great benefits and is one of the greatest challenge for an organization.
  • 4. Today’s business generates massive amount of digital data.
  • 5. which is cumbersome to store, transport and analyze
  • 6. Making distributed system and off- loading workload to commodity clusters is one of the better approach to solve data problem
  • 7.
  • 8. Characteristics Of a distributed system ❏ Resource Sharing ❏ Openness ❏ Concurrency ❏ Scalability ❏ Fault Tolerance ❏ Transparency
  • 11. Manually Scale Frameworks & Install services
  • 13. Static Partitioning Blocker for Fault Tolerant data pipeline
  • 14. Failure make it even more complex to manage
  • 15. Apache Mesos “Apache Mesos abstracts CPU, memory, storage, and other compute resources away from machines (physical or virtual), enabling fault-tolerant and elastic distributed systems to easily be built and run effectively.”
  • 16.
  • 17. Mesos Features Scalability: scale up to 10,000s of nodes Fault-tolerant: replicated master and slaves using ZooKeeper Docker support: Support for Docker containers Native Container: Linux Native isolation between tasks with Linux Containers Scheduling: Multi-resource scheduling (memory, CPU, disk, and ports) API supports: Java, Python and C++ APIs for developing new parallel applications Monitoring: Web UI for viewing cluster state
  • 18.
  • 20.
  • 21.
  • 22. Docker  Containerizer Mesos adds the support for launching tasks that contains Docker images Users can either launch a Docker image as a Task, or as an Executor. To run the mesos-agent to enable the Docker Containerizer, “docker” must be set as one of the containerizers option mesos-agent --containerizers=docker,mesos
  • 23.
  • 24. Mesos Frameworks Aurora: Aurora was developed at Twitter and the migrated to Apache Project later. Aurora is a framework that keeps service running across a shared pool of machines, and responsible for keeping them running forever. Marathon: It is a framework for container orchestration for Mesos. Marathon helps to run other framework on Mesos. Marathon also runs other application container such as Jetty, JBoss Server, Play Server. Chronos: Fault tolerance job scheduler for Mesos, It was developed at Airbnb as replacement of cron.
  • 25. Resilient Distributed Datasets (RDDs) - Big collection of data which is: - Immutable - Distributed - Lazily evaluated - Type Inferred - Cacheable Spark Stack
  • 26. Many big-data applications need to process large data streams in near-real time Monitoring  Systems Alert  Systems Computing  Systems Why  Spark  Streaming?
  • 27. Taken  from  Apache  Spark. What  is  Spark  Streaming?
  • 28. Framework  for  large  scale  stream  processing   ➔ Created  at  UC  Berkeley   ➔ Scales  to  100s  of  nodes ➔ Can  achieve  second  scale  latencies ➔ Provides  a  simple  batch-­like  API  for  implementing  complex  algorithm ➔ Can  absorb  live  data  streams  from  Kafka,  Flume,  ZeroMQ,  Kinesis  etc. What  is  Spark  Streaming?
  • 29. Run  a  streaming  computation  as  a  series  of  very  small,  deterministic  batch  jobs -­ Chop  up  the  live  stream  into  batches  of  X  seconds -­ Spark  treats  each  batch  of  data  as  RDDs   and  processes  them  using  RDD  operations -­ Finally,  the  processed  results  of  the  RDD   operations  are  returned  in  batches Spark  Streaming
  • 30. Point  of  Failure Simple  Streaming  Pipeline
  • 31.
  • 32. ● To  use  Mesos  from  Spark,  you  need  a  Spark  binary  package  available  in  a   place  accessible  (http/s3/hdfs)  by  Mesos,  and  a  Spark  driver  program   configured  to  connect  to  Mesos. ● Configuring  the  driver  program  to  connect  to  Mesos: val sconf  =  new  SparkConf() .setMaster("mesos://zk://10.121.93.241:2181,10.181.2.12:2181,10.107.48.112:2181/mesos") .setAppName("MyStreamingApp") .set("spark.executor.uri","hdfs://Sigmoid/executors/spark-­1.3.0-­bin-­hadoop2.4.tgz") .set("spark.mesos.coarse", "true") .set("spark.cores.max", "30") .set("spark.executor.memory",  "10g") val  sc  =  new  SparkContext(sconf) val  ssc  =  new  StreamingContext(sc, Seconds(1)) ... Spark  Streaming  over  a  HA  Mesos  Cluster
  • 33. Real-­time  stream  processing  systems  must  be  operational  24/7,  which   requires  them  to  recover  from  all  kinds  of  failures  in  the  system. ● Spark  and  its  RDD  abstraction  is  designed  to  seamlessly  handle  failures  of  any  worker  nodes  in   the  cluster.   ● In  Streaming,  driver  failure  can  be  recovered  with  checkpointing  application  state. ● Write  Ahead  Logs  (WAL)  &  Acknowledgements  can  ensure  0  data  loss. Spark  Streaming  Fault-­tolerance
  • 35.
  • 36. ● Figure out the bottleneck : CPU, Memory, IO, Network ● If parsing is involved, use the one which gives high performance. ● Proper Data modeling ● Compression, Serialization Creating  a  scalable  pipeline
  • 37. Thank You @rahul_kumar_aws LinuxCon  /  ContainerCon  -­ Berlin,  Germany