Hadoop Ecosystem and Low
Latency Streaming Architecture
InSemble Inc.
http://www.insemble.com
Agenda
1. What is Big Data and why is it relevant?
2. Hadoop Ecosystem
3. Reference Architecture for Low Latency Streaming
4. Flume, Kafka and Storm
5. Demo
Big Data Definitions
• Wikipedia defines it as “Data Sets with sizes beyond the ability of
commonly used software tools to capture, curate, manage and process
data within a tolerable elapsed time”
• Gartner defines it as data with the following characteristics:
– High Velocity
– High Variety
– High Volume
• Another definition: “Big Data is a large volume, unstructured data which cannot be handled by traditional database management systems”
Why It Is a Game Changer
• Schema on Read
– Data is interpreted at processing time
– Keys and values are not intrinsic properties of the data; they are chosen by the person analyzing it
• Move code to data
– With traditional systems, we bring the data to the code, and I/O becomes a bottleneck
– With hand-rolled distributed systems, we have to implement our own checkpointing/recovery
• More data beats better algorithms
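The schema-on-read idea can be illustrated with a small plain-Python sketch; this is not Hadoop code. The raw log lines are hypothetical. The same unparsed records are keyed by user in one analysis and by URL in another, and that choice is made only when the data is read.

```python
from collections import Counter

# Hypothetical raw log lines (timestamp, user, url), stored without any schema.
RAW_LINES = [
    "2014-05-01T10:00 alice /home",
    "2014-05-01T10:01 bob /cart",
    "2014-05-01T10:02 alice /cart",
]


def parse(line: str) -> dict:
    """Interpret a raw line only when it is read (schema on read)."""
    ts, user, url = line.split()
    return {"ts": ts, "user": user, "url": url}


def count_by(field: str, lines: list) -> Counter:
    """Count records keyed by whichever field the analyst chooses."""
    return Counter(parse(line)[field] for line in lines)


print(count_by("user", RAW_LINES))  # Counter({'alice': 2, 'bob': 1})
print(count_by("url", RAW_LINES))   # Counter({'/cart': 2, '/home': 1})
```

Nothing about keys was decided when the lines were stored, so a new question only needs a new `count_by` call, not a reload of the data.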
Enterprise Relevance
• Missed Opportunities
– Channels
– Data that is analyzed
• The constraint was high cost
– Storage
– Processing
• Future-proof your business
– Schema on Read
– Access patterns are less relevant
– Not just future-proofing your architecture
Hadoop Ecosystem
Source: Apache Hadoop Documentation
Hadoop 2 with YARN
Source: Hadoop In Practice by Alex Holmes
Big Data Journey
[Figure: maturity journey plotting Business Value against Journey Over Time, with effects ranging from GOOD to GREAT]
➢ Real-time insight from all channels
➢ IT is a key differentiator for your business
➢ Perfect alignment of business and IT
➢ Ad hoc data exploration
➢ Batch, interactive and real-time use cases
➢ Predictive analytics, machine learning
➢ Consolidated analytics
➢ ETL
➢ Time constraints
➢ Security standards defined
➢ Governance standards defined
➢ Integrated with the enterprise
➢ Evaluate business benefits
➢ Understand the ecosystem
➢ Identify a platform
Stages and focus:
- Aware of Benefits: scout for opportunities
- Execute: pilot project
- Expand: multiple use cases
- Managed: governance model
- Optimized: core competency
Real time Stream Processing
Architecture with Hadoop
Flume Architecture
• Distributed system for collecting, aggregating and moving data from many sources to a centralized data store
• An agent is a JVM process that hosts the Flume components (source, channel, sink)
• The channel stores events until they are consumed by a sink
• Many types of Flume sources are available (e.g., Avro, Exec, Spooling Directory, Twitter)
• Source and sink are decoupled through the channel
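Below is a minimal single-process sketch of the source, channel and sink flow described above. It is written in plain Python rather than Flume's Java API, and the class names are hypothetical. Real Flume runs sources and sinks on their own threads and uses transactional channels. The point here is only the decoupling: the source writes to the channel, and the sink reads from it.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class Event:
    """Flume-style event: a byte payload plus string headers."""
    body: bytes
    headers: dict = field(default_factory=dict)


class MemoryChannel:
    """Bounded in-memory buffer that decouples a source from a sink."""

    def __init__(self, capacity: int = 100):
        self._queue = queue.Queue(maxsize=capacity)

    def put(self, event: Event) -> None:
        # Raises queue.Full if the channel stays full for 1 s (back-pressure).
        self._queue.put(event, timeout=1)

    def take(self):
        """Return the next event, or None if the channel is empty."""
        try:
            return self._queue.get_nowait()
        except queue.Empty:
            return None


class Agent:
    """Hosts one source -> channel -> sink pipeline (sequential in this toy)."""

    def __init__(self, source, channel: MemoryChannel, sink):
        self.source, self.channel, self.sink = source, channel, sink

    def run(self) -> None:
        for event in self.source:      # the source only writes to the channel
            self.channel.put(event)
        while (event := self.channel.take()) is not None:
            self.sink(event)           # the sink only reads from the channel


collected = []
lines = [b"GET /home", b"GET /cart"]
Agent((Event(body=line) for line in lines), MemoryChannel(capacity=10), collected.append).run()
print([e.body for e in collected])  # [b'GET /home', b'GET /cart']
```

Because this toy drains the channel only after the source finishes, the channel capacity must exceed the number of events. In a real agent the sink drains concurrently.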
Consolidation Architecture
Multiplexing Architecture
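The diagrams for these two slides were not preserved. The sketch below illustrates the two patterns as they are commonly used with Flume:

* **Consolidation:** several first-tier agents forward events to one collector.
* **Multiplexing:** a channel selector routes each event by the value of one header. In Flume this is configured with `selector.type = multiplexing`, `selector.header`, `selector.mapping.*` and `selector.default`.

This is plain Python, not Flume code. Events are `(headers, body)` tuples, and the header name `type` and the channel names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def consolidate(*agent_streams):
    """Fan-in: events from several first-tier agents arrive at one collector."""
    return list(chain.from_iterable(agent_streams))


def multiplex(events, header, mapping, default):
    """Fan-out: route each event to the channels mapped from one header value.

    Events whose header is missing or unmapped go to the default channels.
    """
    channels = defaultdict(list)
    for headers, body in events:
        for name in mapping.get(headers.get(header), default):
            channels[name].append(body)
    return dict(channels)


web = [({"type": "click"}, "c1"), ({"type": "error"}, "e1")]
app = [({"type": "click"}, "c2"), ({}, "x1")]
routed = multiplex(consolidate(web, app), header="type",
                   mapping={"click": ["hdfs"], "error": ["hdfs", "alerts"]},
                   default=["other"])
print(routed)  # {'hdfs': ['c1', 'e1', 'c2'], 'alerts': ['e1'], 'other': ['x1']}
```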
Kafka Introduction
• Distributed, partitioned and replicated messaging system
• Kafka brokers run as a cluster
• Producers and consumers can be written in any language (clients talk to brokers over a language-agnostic TCP protocol)
Topic
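• Each partition is an ordered, immutable sequence of messages, each identified by a sequential ID called the “offset”
• Retains all published messages for a configurable period, whether or not they have been consumed
• Each consumer controls its own position (“offset”) in the log
• Each partition is replicated and has one “leader” and zero or more “followers”; reads and writes are handled by the leader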
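The behaviour of a single partition can be modelled with a toy log in plain Python; it is not the Kafka API, and the class name is hypothetical. Messages get sequential offsets, and retention removes old messages whether or not anyone has read them. The log keeps no per-consumer state: each consumer passes in its own offset.

Two simplifications apply. Real Kafka deletes whole log segments rather than single messages. A real consumer whose offset has expired applies its `auto.offset.reset` policy, whereas this toy simply skips forward.

```python
import time


class PartitionLog:
    """Toy model of one Kafka partition: an append-only log addressed by offset."""

    def __init__(self, retention_seconds: float):
        self.retention = retention_seconds
        self.base_offset = 0   # offset of the oldest message still retained
        self._entries = []     # (append_time, message), oldest first

    def append(self, message, now=None) -> int:
        """Append a message and return the offset assigned to it."""
        now = time.time() if now is None else now
        self._entries.append((now, message))
        return self.base_offset + len(self._entries) - 1

    def expire(self, now=None) -> None:
        """Drop messages older than the retention period, consumed or not."""
        now = time.time() if now is None else now
        while self._entries and now - self._entries[0][0] > self.retention:
            self._entries.pop(0)
            self.base_offset += 1

    def read(self, offset: int, max_messages: int = 10) -> list:
        """Return (offset, message) pairs starting at the consumer's own offset."""
        start = max(offset, self.base_offset)
        i = start - self.base_offset
        return [(start + k, m) for k, (_, m) in enumerate(self._entries[i:i + max_messages])]


log = PartitionLog(retention_seconds=60)
for t, msg in [(0, "a"), (10, "b"), (70, "c")]:
    log.append(msg, now=t)

fast_offset, slow_offset = 3, 1   # two consumers at different positions
print(log.read(fast_offset))      # []  (caught up)
print(log.read(slow_offset))      # [(1, 'b'), (2, 'c')]
log.expire(now=75)                # 'a' (75 s old) and 'b' (65 s old) expire
print(log.read(slow_offset))      # [(2, 'c')]
```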
Producers and Consumers
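• The producer controls which partition each message goes to (e.g., round-robin, or by hashing a message key)
• Supports both queuing and pub/sub
– Through a single abstraction called the consumer group
• Ordering is guaranteed only within a partition
– Within a consumer group, each partition is consumed by exactly one consumer, so that consumer sees the partition's messages in order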
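Keyed partitioning and consumer-group assignment can be sketched in a few lines of plain Python, under these assumptions:

* **Hash function:** CRC32 stands in for the partitioner's hash. Kafka's Java producer uses murmur2.
* **Assignment:** partitions are dealt out round-robin. Kafka ships several assignment strategies.
* **Names:** the consumer and group names are hypothetical.

Within one group, consumers split the partitions, which behaves like a queue. Separate groups each receive every partition, which behaves like pub/sub.

```python
import zlib


def choose_partition(key: str, num_partitions: int) -> int:
    """Keyed partitioning: the same key always maps to the same partition.

    CRC32 is a stand-in; Kafka's Java producer uses a murmur2 hash.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign(num_partitions: int, group: list) -> dict:
    """Spread partitions round-robin over the members of ONE consumer group.

    Each partition gets exactly one consumer in the group, so that consumer
    sees the partition's messages in order.
    """
    assignment = {consumer: [] for consumer in group}
    for p in range(num_partitions):
        assignment[group[p % len(group)]].append(p)
    return assignment


# Queuing: two consumers in one group share the partitions.
print(assign(4, ["c1", "c2"]))     # {'c1': [0, 2], 'c2': [1, 3]}
# Pub/sub: each separate group is assigned every partition.
for group in (["audit"], ["dashboard"]):
    print(assign(4, group))
# Consumers beyond the partition count sit idle.
print(assign(2, ["a", "b", "c"]))  # {'a': [0], 'b': [1], 'c': []}
```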
Storm Introduction
• Distributed real-time computation system
– Processes unbounded streams of data
– Can be used with any programming language
– Scalable, fault-tolerant, and guarantees that data will be processed
• Use Cases
– Real-time analytics, online machine learning
– Continuous computation
– Distributed RPC
– ETL
• Concepts
– Topology
– Spouts
– Bolts
Concepts
• Storm Cluster
– Master node (Nimbus)
• Distributes code
• Assigns tasks to machines
• Monitors for failures
– Worker nodes (Supervisor)
• Start/stop worker processes
• Each worker process executes a subset of a topology
– ZooKeeper
• Coordinates between Nimbus and Supervisors
• Nimbus and Supervisors are completely stateless
• State is maintained in ZooKeeper or on local disk
Details
• Stream
– Unbounded sequence of tuples
• Spout (user-written logic)
– Source of a stream; emits tuples
• Bolt (user-written logic)
– Processes input streams and emits new tuples
• Topology
– DAG of spouts and bolts
– Submitted to a Storm cluster to run
– Each node runs as parallel tasks, and the degree of parallelism is configurable
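The spout, bolt and topology concepts can be mimicked with a single-process Python pipeline, using generators as streams of tuples. The component names are hypothetical, and this is not Storm's API. In Storm itself the same DAG would be declared in Java with `TopologyBuilder` (`setSpout`/`setBolt`, each taking a parallelism hint) and then run distributed across workers.

```python
from collections import Counter


def sentence_spout(sentences):
    """Spout: the source of the stream; emits one-field tuples."""
    for sentence in sentences:
        yield (sentence,)


def split_bolt(stream):
    """Bolt: consumes tuples and emits a new tuple per word."""
    for (sentence,) in stream:
        for word in sentence.lower().split():
            yield (word,)


class CountBolt:
    """Terminal bolt: keeps running word counts as its local state."""

    def __init__(self):
        self.counts = Counter()

    def execute(self, stream):
        for (word,) in stream:
            self.counts[word] += 1


# Wire the DAG: sentence_spout -> split_bolt -> CountBolt
counter = CountBolt()
counter.execute(split_bolt(sentence_spout(["Storm processes streams", "Storm scales"])))
print(counter.counts.most_common(1))  # [('storm', 2)]
```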
Stream groupings
• Tell a topology how to send tuples between two components
• Because each component executes as many parallel tasks, the grouping controls which task each tuple is sent to (e.g., shuffle, fields, all or global grouping)
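Two common groupings can be compared in a plain-Python sketch; the helper names are hypothetical, and this is not Storm's implementation. Shuffle grouping sends each tuple to a randomly chosen task. Fields grouping hashes a field so that equal values always reach the same task, which is what a parallel counting bolt needs to produce complete per-word counts.

```python
import random
import zlib
from collections import Counter


def shuffle_grouping(tuples, num_tasks, seed=0):
    """Shuffle grouping: each tuple goes to a randomly chosen task."""
    rng = random.Random(seed)
    return [rng.randrange(num_tasks) for _ in tuples]


def fields_grouping(tuples, field_index, num_tasks):
    """Fields grouping: tuples with the same field value go to the same task."""
    return [zlib.crc32(str(t[field_index]).encode("utf-8")) % num_tasks for t in tuples]


def counts_per_task(tuples, targets, num_tasks):
    """What each parallel counting task would see, given the routing decisions."""
    per_task = [Counter() for _ in range(num_tasks)]
    for t, task in zip(tuples, targets):
        per_task[task][t[0]] += 1
    return per_task


words = [("storm",), ("kafka",), ("storm",), ("flume",), ("storm",)]
by_field = counts_per_task(words, fields_grouping(words, 0, 3), 3)
# With fields grouping, exactly one task holds the full count for "storm".
print([c["storm"] for c in by_field if c["storm"]])  # [3]
# With shuffle grouping, the three "storm" tuples may be split across tasks.
by_shuffle = counts_per_task(words, shuffle_grouping(words, 3), 3)
```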
Why Use Twitter as a Data Source
Demo - Twitter TopN Trending Topic
• Method 1 — Flume with interceptor
• Method 2 — Storm with custom Twitter
Spout
• Method 3 — Flume + Kafka + Storm
Demo - Twitter TopN Trending Topic
• Use the Flume Twitter source to ingest data and publish events to a Kafka topic
• Use Kafka as the messaging backbone
• Use Storm as a real-time event processing system to calculate the TopN trending topics
• Use Redis to store the TopN result
• Use Node.js/jQuery for visualization
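The demo's bolt design is only shown as an image in the original deck. One common way to compute "trending" topics is a sliding window of per-slot counts, ranked by the current total. The sketch below is plain Python, and the class name, window size and hashtags are hypothetical. Writing the result to Redis, the demo's final step, is omitted.

```python
import heapq
from collections import Counter, deque


class RollingTopN:
    """Rolling counts over the last `window` time slots, reporting the top N topics."""

    def __init__(self, window: int, n: int):
        self.n = n
        self.slots = deque(maxlen=window)  # one Counter per time slot
        self.totals = Counter()            # sum of all slots in the window

    def advance(self) -> None:
        """Open a new time slot; when the window is full the oldest slot is evicted."""
        if len(self.slots) == self.slots.maxlen:
            self.totals -= self.slots[0]   # also drops topics whose count reaches 0
        self.slots.append(Counter())       # deque(maxlen=...) discards the oldest slot

    def add(self, topic: str) -> None:
        self.slots[-1][topic] += 1
        self.totals[topic] += 1

    def top(self) -> list:
        """Top N (topic, count) pairs: highest count first, ties alphabetical."""
        return heapq.nsmallest(self.n, self.totals.items(), key=lambda kv: (-kv[1], kv[0]))


ranker = RollingTopN(window=2, n=2)
ranker.advance()
for tag in ["#hadoop", "#storm", "#storm"]:
    ranker.add(tag)
ranker.advance()
for tag in ["#kafka", "#kafka", "#kafka", "#hadoop"]:
    ranker.add(tag)
print(ranker.top())  # [('#kafka', 3), ('#hadoop', 2)]
ranker.advance()     # the first slot leaves the window
print(ranker.top())  # [('#kafka', 3), ('#hadoop', 1)]
```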
Flow Chart
Demo: Start Redis Server
Demo: Start Node.js server
Demo: Start Storm
Demo: Start Flume Agent
Demo: Storm Console Output
Demo: Trending Result
Flume Agent — Source
Flume Agent — Channel
Flume Agent — Sink
Storm Topology Design
Submit Topology to Storm Production Cluster
Submit Topology to Test Cluster
ParseTweetBolt Code
ParseTweetBolt Code
ParseTweetBolt Code
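The ParseTweetBolt source code was shown only as images and is not recoverable. The sketch below is a hypothetical version of the parsing step such a bolt might perform, written as standalone Python rather than against Storm's API. It assumes that a "topic" means a hashtag, and that tweets arrive as JSON in the Twitter API v1.1 layout (`entities.hashtags[].text`). The function name `parse_tweet` is made up for illustration.

```python
import json
import re

HASHTAG = re.compile(r"#(\w+)")


def parse_tweet(raw: str) -> list:
    """Extract lower-cased hashtags from one tweet's JSON string.

    Assumes the Twitter API v1.1 layout (entities.hashtags[].text) and falls
    back to scanning the tweet text with a regex.
    """
    try:
        tweet = json.loads(raw)
    except json.JSONDecodeError:
        return []  # drop malformed events instead of failing the bolt
    if not isinstance(tweet, dict):
        return []
    hashtags = (tweet.get("entities") or {}).get("hashtags") or []
    tags = [h.get("text", "") for h in hashtags]
    if not tags:
        tags = HASHTAG.findall(tweet.get("text") or "")
    return [t.lower() for t in tags if t]


raw = '{"text": "Loving #Storm and #Kafka", "entities": {"hashtags": [{"text": "Storm"}, {"text": "Kafka"}]}}'
print(parse_tweet(raw))                               # ['storm', 'kafka']
print(parse_tweet('{"text": "no entities #Flume"}'))  # ['flume']
print(parse_tweet("not json"))                        # []
```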
Questions?


Vijay Mandava: vijay@insemble.com
Lan Jiang: lan@insemble.com / @Lan_Jiang



Weitere ähnliche Inhalte

Was ist angesagt?

Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013Christopher Curtin
 
Apache pulsar - storage architecture
Apache pulsar - storage architectureApache pulsar - storage architecture
Apache pulsar - storage architectureMatteo Merli
 
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...confluent
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafkaemreakis
 
Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explainedconfluent
 
Introduction to Apache BookKeeper Distributed Storage
Introduction to Apache BookKeeper Distributed StorageIntroduction to Apache BookKeeper Distributed Storage
Introduction to Apache BookKeeper Distributed StorageStreamlio
 
Kafka and Spark Streaming
Kafka and Spark StreamingKafka and Spark Streaming
Kafka and Spark Streamingdatamantra
 
Apache Kafka - Free Friday
Apache Kafka - Free FridayApache Kafka - Free Friday
Apache Kafka - Free FridayOtávio Carvalho
 
Pulsar - flexible pub-sub for internet scale
Pulsar - flexible pub-sub for internet scalePulsar - flexible pub-sub for internet scale
Pulsar - flexible pub-sub for internet scaleMatteo Merli
 
Apache Pulsar at Yahoo! Japan
Apache Pulsar at Yahoo! JapanApache Pulsar at Yahoo! Japan
Apache Pulsar at Yahoo! JapanStreamNative
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013mumrah
 
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17Gwen (Chen) Shapira
 
Build a custom metrics on aws cloud
Build a custom metrics on aws cloudBuild a custom metrics on aws cloud
Build a custom metrics on aws cloudAhmad karawash
 

Was ist angesagt? (20)

Kafka 101
Kafka 101Kafka 101
Kafka 101
 
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
 
Apache pulsar - storage architecture
Apache pulsar - storage architectureApache pulsar - storage architecture
Apache pulsar - storage architecture
 
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explained
 
Introduction to Apache BookKeeper Distributed Storage
Introduction to Apache BookKeeper Distributed StorageIntroduction to Apache BookKeeper Distributed Storage
Introduction to Apache BookKeeper Distributed Storage
 
Kafka and Spark Streaming
Kafka and Spark StreamingKafka and Spark Streaming
Kafka and Spark Streaming
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Apache Kafka - Free Friday
Apache Kafka - Free FridayApache Kafka - Free Friday
Apache Kafka - Free Friday
 
Message queues
Message queuesMessage queues
Message queues
 
Pulsar - flexible pub-sub for internet scale
Pulsar - flexible pub-sub for internet scalePulsar - flexible pub-sub for internet scale
Pulsar - flexible pub-sub for internet scale
 
Kafka aws
Kafka awsKafka aws
Kafka aws
 
Apache Pulsar at Yahoo! Japan
Apache Pulsar at Yahoo! JapanApache Pulsar at Yahoo! Japan
Apache Pulsar at Yahoo! Japan
 
Kafka internals
Kafka internalsKafka internals
Kafka internals
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
 
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
 
Build a custom metrics on aws cloud
Build a custom metrics on aws cloudBuild a custom metrics on aws cloud
Build a custom metrics on aws cloud
 

Andere mochten auch

Performance Comparison of Intel Enterprise Edition Lustre and HDFS for MapRed...
Performance Comparison of Intel Enterprise Edition Lustre and HDFS for MapRed...Performance Comparison of Intel Enterprise Edition Lustre and HDFS for MapRed...
Performance Comparison of Intel Enterprise Edition Lustre and HDFS for MapRed...inside-BigData.com
 
Hssc i objective workbook
Hssc i objective workbookHssc i objective workbook
Hssc i objective workbookEngin Basturk
 
Iman kepada Malaikat
Iman kepada MalaikatIman kepada Malaikat
Iman kepada MalaikatNafika E.R.C
 
hivve.me - Collaborative messeneger
hivve.me - Collaborative messeneger hivve.me - Collaborative messeneger
hivve.me - Collaborative messeneger hivve
 
Public Sector Show - Speakers Presentation
Public Sector Show  - Speakers PresentationPublic Sector Show  - Speakers Presentation
Public Sector Show - Speakers Presentationacademiesshow
 
hivve.me - The first collaborative learning messenger
hivve.me - The first collaborative learning messengerhivve.me - The first collaborative learning messenger
hivve.me - The first collaborative learning messengerhivve
 
hivve.me Project Based Learning Messenger
hivve.me  Project Based Learning Messengerhivve.me  Project Based Learning Messenger
hivve.me Project Based Learning Messengerhivve
 
The Academies Show Birmingham 2014 - Session on Pupil Premium
The Academies Show Birmingham 2014 - Session on Pupil PremiumThe Academies Show Birmingham 2014 - Session on Pupil Premium
The Academies Show Birmingham 2014 - Session on Pupil Premiumacademiesshow
 
JessupJamesBIAComprehensiveAssignmentFINAL
JessupJamesBIAComprehensiveAssignmentFINALJessupJamesBIAComprehensiveAssignmentFINAL
JessupJamesBIAComprehensiveAssignmentFINALJames Jessup
 
VCR Presentation Jessup
VCR Presentation JessupVCR Presentation Jessup
VCR Presentation JessupJames Jessup
 

Andere mochten auch (20)

Performance Comparison of Intel Enterprise Edition Lustre and HDFS for MapRed...
Performance Comparison of Intel Enterprise Edition Lustre and HDFS for MapRed...Performance Comparison of Intel Enterprise Edition Lustre and HDFS for MapRed...
Performance Comparison of Intel Enterprise Edition Lustre and HDFS for MapRed...
 
NEGOSIASI
NEGOSIASINEGOSIASI
NEGOSIASI
 
Bunga
BungaBunga
Bunga
 
Hssc i objective workbook
Hssc i objective workbookHssc i objective workbook
Hssc i objective workbook
 
Iman kepada Malaikat
Iman kepada MalaikatIman kepada Malaikat
Iman kepada Malaikat
 
hivve.me - Collaborative messeneger
hivve.me - Collaborative messeneger hivve.me - Collaborative messeneger
hivve.me - Collaborative messeneger
 
Pharmacy slide share
Pharmacy slide sharePharmacy slide share
Pharmacy slide share
 
MATT CV ROEVIN
MATT CV ROEVINMATT CV ROEVIN
MATT CV ROEVIN
 
Public Sector Show - Speakers Presentation
Public Sector Show  - Speakers PresentationPublic Sector Show  - Speakers Presentation
Public Sector Show - Speakers Presentation
 
Luxury Wedding Venues in MA
Luxury Wedding Venues in MALuxury Wedding Venues in MA
Luxury Wedding Venues in MA
 
hivve.me - The first collaborative learning messenger
hivve.me - The first collaborative learning messengerhivve.me - The first collaborative learning messenger
hivve.me - The first collaborative learning messenger
 
hivve.me Project Based Learning Messenger
hivve.me  Project Based Learning Messengerhivve.me  Project Based Learning Messenger
hivve.me Project Based Learning Messenger
 
ENFERMERÍA
ENFERMERÍAENFERMERÍA
ENFERMERÍA
 
ankita cv final (2)
ankita cv final (2)ankita cv final (2)
ankita cv final (2)
 
Bunga
BungaBunga
Bunga
 
The Academies Show Birmingham 2014 - Session on Pupil Premium
The Academies Show Birmingham 2014 - Session on Pupil PremiumThe Academies Show Birmingham 2014 - Session on Pupil Premium
The Academies Show Birmingham 2014 - Session on Pupil Premium
 
Q distance
Q distanceQ distance
Q distance
 
JessupJamesBIAComprehensiveAssignmentFINAL
JessupJamesBIAComprehensiveAssignmentFINALJessupJamesBIAComprehensiveAssignmentFINAL
JessupJamesBIAComprehensiveAssignmentFINAL
 
Untitled Presentation
Untitled PresentationUntitled Presentation
Untitled Presentation
 
VCR Presentation Jessup
VCR Presentation JessupVCR Presentation Jessup
VCR Presentation Jessup
 

Ähnlich wie Hadoop Ecosystem and Low Latency Streaming Architecture

HDFS_architecture.ppt
HDFS_architecture.pptHDFS_architecture.ppt
HDFS_architecture.pptvijayapraba1
 
Crossing Analytics Systems: Case for Integrated Provenance in Data Lakes
Crossing Analytics Systems: Case for Integrated Provenance in Data LakesCrossing Analytics Systems: Case for Integrated Provenance in Data Lakes
Crossing Analytics Systems: Case for Integrated Provenance in Data LakesIsuru Suriarachchi
 
Building Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesBuilding Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesDavid Martínez Rego
 
Apache flume - an Introduction
Apache flume - an IntroductionApache flume - an Introduction
Apache flume - an IntroductionErik Schmiegelow
 
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...ssuserd3a367
 
Data Streaming For Big Data
Data Streaming For Big DataData Streaming For Big Data
Data Streaming For Big DataSeval Çapraz
 
Archiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan Volz
Archiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan VolzArchiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan Volz
Archiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan VolzDatabricks
 
Real-Time Inverted Search NYC ASLUG Oct 2014
Real-Time Inverted Search NYC ASLUG Oct 2014Real-Time Inverted Search NYC ASLUG Oct 2014
Real-Time Inverted Search NYC ASLUG Oct 2014Bryan Bende
 
Transforming Data Architecture Complexity at Sears - StampedeCon 2013
Transforming Data Architecture Complexity at Sears - StampedeCon 2013Transforming Data Architecture Complexity at Sears - StampedeCon 2013
Transforming Data Architecture Complexity at Sears - StampedeCon 2013StampedeCon
 
Introduction to Hadoop and Big Data
Introduction to Hadoop and Big DataIntroduction to Hadoop and Big Data
Introduction to Hadoop and Big DataJoe Alex
 
End to End Streaming Architectures
End to End Streaming ArchitecturesEnd to End Streaming Architectures
End to End Streaming ArchitecturesCloudera, Inc.
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiridatastack
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm Chandler Huang
 
Introduction To Hadoop Ecosystem
Introduction To Hadoop EcosystemIntroduction To Hadoop Ecosystem
Introduction To Hadoop EcosystemInSemble
 
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your MindDeliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your MindAvere Systems
 
Solving Real Problems with Apache Spark: Archiving, E-Discovery, and Supervis...
Solving Real Problems with Apache Spark: Archiving, E-Discovery, and Supervis...Solving Real Problems with Apache Spark: Archiving, E-Discovery, and Supervis...
Solving Real Problems with Apache Spark: Archiving, E-Discovery, and Supervis...Spark Summit
 

Ähnlich wie Hadoop Ecosystem and Low Latency Streaming Architecture (20)

HDFS_architecture.ppt
HDFS_architecture.pptHDFS_architecture.ppt
HDFS_architecture.ppt
 
Crossing Analytics Systems: Case for Integrated Provenance in Data Lakes
Crossing Analytics Systems: Case for Integrated Provenance in Data LakesCrossing Analytics Systems: Case for Integrated Provenance in Data Lakes
Crossing Analytics Systems: Case for Integrated Provenance in Data Lakes
 
Building Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesBuilding Big Data Streaming Architectures
Building Big Data Streaming Architectures
 
Apache flume - an Introduction
Apache flume - an IntroductionApache flume - an Introduction
Apache flume - an Introduction
 
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
 
Hadoop
HadoopHadoop
Hadoop
 
Data Streaming For Big Data
Data Streaming For Big DataData Streaming For Big Data
Data Streaming For Big Data
 
Archiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan Volz
Archiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan VolzArchiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan Volz
Archiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan Volz
 
Algorithmic Trading
Algorithmic TradingAlgorithmic Trading
Algorithmic Trading
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
Real-Time Inverted Search NYC ASLUG Oct 2014
Real-Time Inverted Search NYC ASLUG Oct 2014Real-Time Inverted Search NYC ASLUG Oct 2014
Real-Time Inverted Search NYC ASLUG Oct 2014
 
Transforming Data Architecture Complexity at Sears - StampedeCon 2013
Transforming Data Architecture Complexity at Sears - StampedeCon 2013Transforming Data Architecture Complexity at Sears - StampedeCon 2013
Transforming Data Architecture Complexity at Sears - StampedeCon 2013
 
Introduction to Hadoop and Big Data
Introduction to Hadoop and Big DataIntroduction to Hadoop and Big Data
Introduction to Hadoop and Big Data
 
Kafka & Hadoop in Rakuten
Kafka & Hadoop in RakutenKafka & Hadoop in Rakuten
Kafka & Hadoop in Rakuten
 
End to End Streaming Architectures
End to End Streaming ArchitecturesEnd to End Streaming Architectures
End to End Streaming Architectures
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiri
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
 
Introduction To Hadoop Ecosystem
Introduction To Hadoop EcosystemIntroduction To Hadoop Ecosystem
Introduction To Hadoop Ecosystem
 
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your MindDeliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
 
Solving Real Problems with Apache Spark: Archiving, E-Discovery, and Supervis...
Solving Real Problems with Apache Spark: Archiving, E-Discovery, and Supervis...Solving Real Problems with Apache Spark: Archiving, E-Discovery, and Supervis...
Solving Real Problems with Apache Spark: Archiving, E-Discovery, and Supervis...
 

Kürzlich hochgeladen

Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...amitlee9823
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsJoseMangaJr1
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...amitlee9823
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Pooja Nehwal
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...amitlee9823
 

Kürzlich hochgeladen (20)

Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 

Hadoop Ecosystem and Low Latency Streaming Architecture

  • 1. Hadoop Ecosystem and Low Latency Streaming Architecture InSemble Inc. http://www.insemble.com
  • 2. Agenda What is Big Data and why it is relevant ?1 Flume, Kafka and Storm4 Reference Architecture for Low Latency Streaming3 Hadoop Ecosystem2 Demo5
  • 3. Big Data Definitions • Wikipedia defines it as “Data Sets with sizes beyond the ability of commonly used software tools to capture, curate, manage and process data within a tolerable elapsed time” • Gartner defines it as Data with the following characteristics – High Velocity – High Variety – High Volume • Another Definition is “Big Data is a large volume, unstructured data which cannot be handled by traditional database management systems ”
  • 4. Why a game changer • Schema on Read – Interpreting data at processing time – Key, Values are not intrinsic properties of data but chosen by person analyzing the data • Move code to data – With traditional, we bring data to code and I/O becomes a bottleneck – With distributed systems, we have to deal with our own checkpointing/recovery • More data beats better algorithms
  • 5. Enterprise Relevance • Missed Opportunities – Channels – Data that is analyzed • The constraint was high cost – Storage – Processing • Future-proof your business – Schema on Read – Access patterns matter less – Not just future-proofing your architecture
  • 6. Hadoop Ecosystem Source: Apache Hadoop Documentation
  • 7. Hadoop 2 with YARN Source: Hadoop In Practice by Alex Holmes
  • 8. Big Data Journey ➢ Real time insight from all channels ➢ IT is a key differentiator for your business ➢ Perfect alignment of business and IT ➢ Ad hoc data exploration ➢ Batch, interactive and real time use cases ➢ Predictive analytics, machine learning ➢ Consolidated analytics ➢ ETL ➢ Time constraints ➢ Security standards defined ➢ Governance standards defined ➢ Integrated with the enterprise ➢ Evaluate business benefits ➢ Understand the ecosystem ➢ Identify a platform [Diagram: business value rising over the journey through five stages: Aware of Benefits (scout for opportunities), Execute (pilot project), Expand (multiple use cases), Managed (governance model), Optimized (core competency)]
  • 9. Real time Stream Processing Architecture with Hadoop
  • 10. Flume Architecture • A distributed system for collecting and aggregating data from multiple sources into a centralized data store • An agent is a JVM process that hosts the Flume components (sources, channels and sinks) • A channel stores events until a sink picks them up • Flume provides different types of sources • Source and sink are decoupled by the channel
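The source/channel/sink split can be illustrated without Flume itself. The Java sketch below is an analogy, not Flume's API: a bounded `BlockingQueue` plays the channel, one thread acts as the source and another as the sink, and the two sides interact only through the queue. The class name and the end-of-stream marker are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Analogy for a Flume agent: source -> channel -> sink.
// The source and the sink share only the channel, so neither calls the other.
class AgentSketch {

    // Hypothetical end-of-stream marker so the sink knows when to stop.
    static final String END = "__END__";

    static List<String> run(List<String> incoming, int channelCapacity) {
        // The "channel": a bounded buffer holding events until the sink takes them.
        BlockingQueue<String> channel = new ArrayBlockingQueue<>(channelCapacity);
        List<String> delivered = new ArrayList<>();

        // The "source": puts events into the channel, blocking while it is full.
        Thread source = new Thread(() -> {
            try {
                for (String event : incoming) {
                    channel.put(event);
                }
                channel.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // The "sink": takes events from the channel and writes them to a destination (a list here).
        Thread sink = new Thread(() -> {
            try {
                for (String event = channel.take(); !event.equals(END); event = channel.take()) {
                    delivered.add(event);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        source.start();
        sink.start();
        try {
            source.join();
            sink.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the agent", e);
        }
        return delivered;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("log line 1", "log line 2", "log line 3"), 2));
    }
}
```

Because the channel is bounded, a slow sink makes the source block instead of losing events, which is the buffering role the slide assigns to the channel.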
  • 13. Kafka Introduction • A distributed, partitioned and replicated messaging system • Kafka brokers run as a cluster • Producers and consumers can be written in any language, since clients talk to brokers over a language-agnostic TCP protocol
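  • 14. Topic • Each partition is an ordered, immutable sequence of messages, each identified by a sequential number called the offset • Messages are retained for a configurable period of time • The consumer controls its own offset (its position in the log) • Each partition is replicated and has one “leader” and zero or more “followers”; reads and writes go only through the leader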
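A partition behaves like an append-only log in which each message gets the next sequential offset, and the consumer, not the broker, decides which offset to read next. The Java sketch below is an in-memory model of that idea with hypothetical class names; it is not the Kafka client API, and it leaves out retention and replication.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory model of one Kafka-style partition: an ordered, append-only log.
class PartitionLog {

    private final List<String> messages = new ArrayList<>();

    // Appends a message and returns its offset (its position in the log).
    long append(String message) {
        messages.add(message);
        return messages.size() - 1;
    }

    // Reads up to maxCount messages starting at offset; reading never changes the log.
    List<String> read(long offset, int maxCount) {
        int from = (int) Math.min(offset, messages.size());
        int to = Math.min(from + maxCount, messages.size());
        return new ArrayList<>(messages.subList(from, to));
    }

    // A consumer that tracks its own position in the log.
    static final class Consumer {
        private long position = 0;

        List<String> poll(PartitionLog log, int maxCount) {
            List<String> batch = log.read(position, maxCount);
            position += batch.size(); // the consumer advances its own offset
            return batch;
        }

        // Rewinding is just resetting the offset, e.g. to reprocess old messages.
        void seek(long offset) {
            position = offset;
        }
    }

    public static void main(String[] args) {
        PartitionLog log = new PartitionLog();
        log.append("m0");
        log.append("m1");
        log.append("m2");

        Consumer consumer = new Consumer();
        System.out.println(consumer.poll(log, 2));  // [m0, m1]
        System.out.println(consumer.poll(log, 2));  // [m2]
        consumer.seek(0);
        System.out.println(consumer.poll(log, 10)); // [m0, m1, m2]
    }
}
```

Since the log is immutable and position lives with the consumer, several consumers can read the same partition independently, each at its own offset.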
  • 15. Producers and Consumers • The producer controls which partition each message goes to • Kafka supports both queuing and publish/subscribe – through an abstraction called the consumer group • Ordering is guaranteed only within a partition – For ordered consumption, each partition must be read by only one consumer in the group
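Two of these ideas fit in a few lines. The sketch below uses hypothetical names and is not the Kafka client API. It picks a partition from a message key, so all messages with the same key land in the same partition and stay ordered. It also spreads a topic's partitions over the consumers of one group so that each partition has exactly one reader in that group. Kafka's default partitioner hashes the key with murmur2; `String.hashCode` is used here only for illustration, and round-robin is just one simple assignment strategy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PartitioningSketch {

    // Producer side: the same key always maps to the same partition.
    // floorMod keeps the result non-negative even when hashCode() is negative.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Consumer-group side: each partition goes to exactly one consumer of the group.
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> assignment = new TreeMap<>();
        for (String c : consumers) {
            assignment.put(c, new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            String owner = consumers.get(p % consumers.size());
            assignment.get(owner).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        int partitions = 4;
        System.out.println(partitionFor("user-42", partitions) == partitionFor("user-42", partitions)); // true
        // Queue semantics: one group with two consumers splits the partitions.
        System.out.println(assign(List.of("c1", "c2"), partitions));   // {c1=[0, 2], c2=[1, 3]}
        // A group with a single consumer sees every partition.
        System.out.println(assign(List.of("solo"), partitions));       // {solo=[0, 1, 2, 3]}
    }
}
```

Publish/subscribe follows from running several groups: each group gets its own full assignment, so every group receives every message.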
  • 16. Storm Introduction • Distributed real-time computation system – Processes unbounded streams of data – Can be used with multiple programming languages – Scalable, fault-tolerant, and guarantees that data will be processed • Use Cases – Real-time analytics, online machine learning – Continuous computation – Distributed RPC – ETL • Concepts – Topology – Spouts – Bolts
  • 17. Concepts • Storm Cluster – Master node (Nimbus) • Distributes code • Assigns tasks to machines • Monitors for failures – Worker nodes (Supervisor) • Start/stop worker processes • Each worker process executes a subset of a topology – ZooKeeper • Coordinates between Nimbus and the Supervisors • Nimbus and the Supervisors are completely stateless • State is maintained in ZooKeeper or on local disk
  • 18. Details • Stream – Unbounded sequence of tuples • Spout (you write the logic) – Source of a stream; emits tuples • Bolt (you write the logic) – Processes streams and emits new tuples • Topology – DAG of spouts and bolts – A topology is submitted to a Storm cluster to run – Each node runs in parallel, and the degree of parallelism is configurable
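The spout → bolt → bolt chain can be modelled in plain Java to show how tuples flow through a topology. This is a single-threaded sketch with a hypothetical `Bolt` interface. Storm's real API (for example `TopologyBuilder`, `BaseRichSpout`, `BaseBasicBolt`) adds parallelism, acking and distribution, none of which is modelled here.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;

// Single-threaded model of a tiny topology: sentence spout -> split bolt -> count bolt.
class TopologySketch {

    // A bolt receives one tuple and may emit zero or more tuples downstream.
    interface Bolt {
        void execute(String tuple, Consumer<String> emit);
    }

    // Splits a sentence tuple into word tuples.
    static final Bolt SPLIT = (sentence, emit) -> {
        for (String word : sentence.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                emit.accept(word);
            }
        }
    };

    // Feeds every tuple from the spout through the split bolt into a terminal count bolt.
    static Map<String, Integer> run(Iterator<String> spout) {
        Map<String, Integer> counts = new TreeMap<>();
        Bolt count = (word, emit) -> counts.merge(word, 1, Integer::sum); // emits nothing
        while (spout.hasNext()) {               // a real stream is unbounded; this one ends
            String sentence = spout.next();       // the spout emits a tuple
            SPLIT.execute(sentence, word -> count.execute(word, t -> { }));
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sentences = List.of("storm processes streams", "streams of tuples");
        System.out.println(run(sentences.iterator()));
        // {of=1, processes=1, storm=1, streams=2, tuples=1}
    }
}
```

Each bolt only sees tuples and an emit callback, which mirrors how spouts and bolts in a DAG are wired together without knowing about each other's internals.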
  • 19. Stream Groupings • A stream grouping tells a topology how to send tuples between two components • Because tasks execute in parallel, the grouping controls which task each tuple is sent to
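Two commonly used groupings are shuffle grouping, which distributes tuples randomly so each task gets roughly an equal share, and fields grouping, which sends every tuple with the same value of a chosen field to the same task. The sketch below computes the target task index for each. It illustrates the routing rule only and is not Storm's internal implementation; the class name and the sample hashtags are hypothetical.

```java
import java.util.Random;

class GroupingSketch {

    // Fields grouping: routing depends only on the grouping field's value,
    // so every tuple with the same value (e.g. the same hashtag) reaches the same task.
    static int fieldsGrouping(String fieldValue, int numTasks) {
        return Math.floorMod(fieldValue.hashCode(), numTasks);
    }

    // Shuffle grouping: the target task is chosen at random, balancing load
    // but giving no guarantee about where a particular value ends up.
    static int shuffleGrouping(Random random, int numTasks) {
        return random.nextInt(numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 3;
        Random random = new Random(7);
        for (String tag : new String[] {"#hadoop", "#storm", "#hadoop"}) {
            System.out.printf("%s -> fields: task %d, shuffle: task %d%n",
                    tag, fieldsGrouping(tag, numTasks), shuffleGrouping(random, numTasks));
        }
    }
}
```

For a trending-topic count, fields grouping on the topic is what lets each counting task keep a complete count for the topics it owns; with shuffle grouping the counts for one topic would be split across tasks.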
  • 20. Why Use Twitter as Data Source
  • 21. Demo - Twitter TopN Trending Topic • Method 1: Flume with an interceptor • Method 2: Storm with a custom Twitter spout • Method 3: Flume + Kafka + Storm
  • 22. Demo - Twitter TopN Trending Topic • Use the Flume Twitter source to ingest data and publish events to a Kafka topic • Use Kafka as the messaging backbone • Use Storm as a real-time event processing system to calculate the TopN trending topics • Use Redis to store the TopN result • Use Node.js/jQuery for visualization
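The core of the TopN calculation is counting topics and keeping the N most frequent. The Java sketch below shows only that step on an in-memory list. In the demo this work runs inside Storm bolts fed from Kafka, with the result stored in Redis, and none of that is modelled here. Treating hashtags as the topics, the sample data, and the alphabetical tie-break are assumptions.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.stream.Collectors;

class TopNSketch {

    // Counts each topic, then keeps the n most frequent using a size-bounded min-heap.
    static List<String> topN(List<String> topics, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (String t : topics) {
            counts.merge(t, 1, Integer::sum);
        }

        // Orders entries from least to most trending:
        // lower count first; on equal counts, the alphabetically later topic first.
        Comparator<Map.Entry<String, Integer>> weakestFirst =
                Map.Entry.<String, Integer>comparingByValue()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey().reversed());

        PriorityQueue<Map.Entry<String, Integer>> heap = new PriorityQueue<>(weakestFirst);
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            heap.offer(e);
            if (heap.size() > n) {
                heap.poll(); // drop the weakest entry so the heap never holds more than n
            }
        }

        // Return from most to least trending.
        return heap.stream()
                .sorted(weakestFirst.reversed())
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> tags = List.of("#storm", "#kafka", "#storm", "#hadoop", "#kafka", "#storm", "#flume");
        System.out.println(topN(tags, 2)); // [#storm, #kafka]
    }
}
```

The bounded heap keeps memory proportional to N rather than to the number of distinct topics in the final ranking step, which matters when the stream contains many rare hashtags.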
  • 30. Flume Agent — Source
  • 31. Flume Agent — Channel
  • 34. Submit Topology to Storm Production Cluster
  • 35. Submit Topology to Test Cluster
  • 39. Questions? 
 Vijay Mandava: vijay@insemble.com Lan Jiang: lan@insemble.com / @Lan_Jiang