Page 1 © Hortonworks Inc. 2014
Real Time Processing with Hadoop
Hortonworks. We do Hadoop.
Topics – Hortonworks and Apache Storm
•  Hortonworks and Enterprise Hadoop
•  Stream Processing with Apache Storm – Introduction
•  Apache Storm – Key Constructs
•  Apache Storm – Data Flows and Architecture
•  Apache Storm Use Cases and Real World Walkthrough
•  Apache Storm Demo and How To
Hortonworks and Enterprise Hadoop
Our Mission:
Power your Modern Data Architecture
with HDP and Enterprise Apache Hadoop
Who we are
June 2011: Original 24 architects, developers, operators of Hadoop from Yahoo!
June 2014: An enterprise software company with 420+ Employees
Our model
Innovate and deliver Apache Hadoop as a complete enterprise data platform
completely in the open, backed by a world class support organization
Key Partners
…and hundreds more
HDP delivers a comprehensive data management platform
[Diagram: HDP 2.1, Hortonworks Data Platform]
GOVERNANCE & INTEGRATION: Data Workflow, Lifecycle & Governance (Falcon, Sqoop, Flume, NFS, WebHDFS)
DATA ACCESS (Batch, Interactive & Real-Time): Script (Pig), SQL (Hive, HCatalog), NoSQL (HBase, Accumulo), Stream (Storm), Search (Solr), In-Memory (Spark), Others (ISV Engines); Tez is labelled twice in the diagram
DATA MANAGEMENT: YARN: Data Operating System, on HDFS (Hadoop Distributed File System) across nodes 1 to N
SECURITY: Authentication, Authorization, Accounting, Data Protection; Storage: HDFS, Resources: YARN, Access: Hive, …, Pipeline: Falcon, Cluster: Knox
OPERATIONS: Provision, Manage & Monitor (Ambari, Zookeeper); Scheduling (Oozie)
Deployment Choice: Linux, Windows, On-Premise, Cloud
YARN is the architectural
center of HDP
•  Enables batch, interactive
and real-time workloads
•  Single SQL engine for both batch
and interactive
•  Enable existing ISV apps to plug
directly into Hadoop via YARN
Provides comprehensive
enterprise capabilities
•  Governance
•  Security
•  Operations
The widest range of
deployment options
•  Linux & Windows
•  On premise & cloud
Choice Of Tool For Stream Processing
Storm
§  Event by Event processing
§  Suited for sub second real time processing
–  Fraud detection
Spark Streaming
§  Micro-Batch processing
§  Suited for near real time processing
–  Trending Facebook likes
[Diagram: Hadoop 2.x as a multi-use data platform (Batch, Interactive, Online, Streaming, …). Tez (interactive), Spark (micro batch) and Apache Storm (streaming) run on YARN (cluster resource management) over HDFS2 (redundant, reliable storage).]
Stream Processing with Apache Storm
Why stream processing IN Hadoop?
What is the need?
§  Exponential rise in real-time data
§  Ability to process real-time data opens new
business opportunities
Why Now?
§  Economics of Open source software &
commodity hardware
§  YARN allows multiple computing paradigms to
co-exist in the data lake
[Diagram: Hadoop 2.x as a multi-use data platform (Batch, Interactive, Online, Streaming, …). Tez (interactive), Spark (micro batch) and Apache Storm (streaming) run on YARN (cluster resource management) over HDFS2 (redundant, reliable storage).]
Stream processing has emerged as a key use case
Storm Use Cases – Business View
Monitor real-time data to prevent or optimize:
Industry | Prevent | Optimize
Finance | Securities fraud; compliance violations | Order routing; pricing
Telco | Security breaches; network outages | Bandwidth allocation; customer service
Retail | Shrinkage; stock outs | Offers; pricing
Manufacturing | Device failures | Supply chain
Transportation | Driver & fleet issues | Routes; pricing
Web | Application failures; operational issues | Site content
Sentiment | Clickstream | Machine/Sensor | Logs | Geo-location
Stream processing very different from batch
Factors | Streaming | Batch
Data: Freshness | Real-time (usually < 15 min) | Historical, usually more than 15 min old
Data: Location | Primarily memory (moved to disk after processing) | Primarily on disk, moved to memory for processing
Processing: Speed | Sub-second to a few seconds | A few minutes to hours
Processing: Frequency | Always running | Sporadic to periodic
Clients: Who? | Automated systems only | Human & automated systems
Clients: Type | Primarily operational systems | Primarily analytical applications
Key Requirements of a Streaming Solution
Data Ingest
•  Extremely high ingest rates: millions of events/second
Processing
•  Ability to easily plug in different processing frameworks
•  Guaranteed processing: at-least-once processing semantics
Persistence
•  Ability to persist data to multiple relational and non-relational data stores
Operations
•  Security, HA, fault tolerance & management support
Apache Storm: A Leading Platform for Stream Processing
Open source, real-time event stream processing platform that provides fixed,
continuous, & low latency processing for very high frequency streaming data
Highly scalable
•  Horizontally scalable like Hadoop
•  E.g.: a 10 node cluster can process 1M tuples per second
Fault-tolerant
•  Automatically reassigns tasks on failed nodes
Guarantees processing
•  Supports at-least-once & exactly-once processing semantics
Language agnostic
•  Processing logic can be defined in any language
Apache project
•  Brand, governance & a large active community
Storm Use Cases – IT operations view
Continuous processing
•  Continuously ingest high rate messages, process them and update data stores
High speed data aggregation
•  Aggregate multiple data streams that emit data at extremely high rates into one central data store
Data filtering
•  Filter out unwanted data on the fly before it is persisted to a data store
Distributed RPC response time reduction
•  Extremely resource (CPU, memory or I/O) intensive processing that would take a long time on a single machine can be parallelized with Storm to reduce response times to seconds
Apache Storm
Key Constructs
Key Constructs in Apache Storm
Basic Concepts of a Topology
Tuples and Streams, Spouts, Bolts
Fields Grouping
Architecture and Topology Submission
Parallelism
Processing Guarantee
Basic Concepts
Spouts: Generate streams.
Tuple: The most fundamental data structure; a named list of values that can be of any data type
Streams: Unbounded sequences of tuples
Bolts: Contain data processing, persistence and alerting logic. Can also emit tuples for downstream bolts
Tuple Tree: The first spout tuple and all the tuples emitted by the bolts that processed it
Topology: Group of spouts and bolts wired together into a workflow
Tuples and Streams
What is a Tuple?
§  The fundamental data structure in Storm: a named list of values that can be of any data type.
What is a Stream?
§  An unbounded sequence of tuples.
§  The core abstraction in Storm; streams are what you “process” in Storm.
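The two definitions above can be illustrated with a small sketch in plain Python. It does not use the Storm API; the field names (`driver_id`, `event`, `seq`) are hypothetical and a generator stands in for an unbounded stream.

```python
from itertools import count, islice
from typing import Dict, Iterator


def truck_event_stream() -> Iterator[Dict[str, object]]:
    """Yield an unbounded sequence of tuples (hypothetical truck events)."""
    for seq in count(1):
        # Each tuple is a named list of values; the values may be of any type.
        yield {"driver_id": seq % 3, "event": "normal", "seq": seq}


# A stream never ends, so a consumer takes tuples as they arrive.
first_three = list(islice(truck_event_stream(), 3))
```

Here `first_three` holds only the first three tuples; the generator itself would keep producing values for as long as something consumes it.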
Spouts
What is a Spout?
§  The source of streams in a topology: it generates the tuples that bolts process
– E.g.: JMS, Twitter, Log, Kafka Spout
§  Can spin up multiple instances of a Spout and dynamically adjust as needed
Bolts
What is a Bolt?
§  Processes any number of input streams and produces output streams
§  Common processing in bolts includes functions, aggregations, joins, reads/writes to data stores and alerting logic
§  Can spin up multiple instances of a Bolt and dynamically adjust as needed
Examples of Bolts:
1.  HBaseBolt: persists the stream in HBase
2.  HDFSBolt: persists the stream into HDFS as Avro files using Flume
3.  MonitoringBolt: reads from HBase and creates alerts via email and messaging queues if given thresholds are exceeded.
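How spouts and bolts wire together can be sketched in plain Python. This is an illustration of the concepts only, not Storm code: the function names, the `driver,event` input format and the "speeding" filter are all assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator

Event = Dict[str, str]


def event_spout(raw_lines: Iterable[str]) -> Iterator[Event]:
    """Spout: turns an external source into a stream of tuples."""
    for line in raw_lines:
        driver, event = line.split(",")
        yield {"driver": driver, "event": event}


def filter_bolt(stream: Iterable[Event], wanted: str = "speeding") -> Iterator[Event]:
    """Bolt: consumes a stream and emits only matching tuples downstream."""
    for tup in stream:
        if tup["event"] == wanted:
            yield tup


def count_bolt(stream: Iterable[Event]) -> Counter:
    """Terminal bolt: aggregates the tuples it receives per driver."""
    counts: Counter = Counter()
    for tup in stream:
        counts[tup["driver"]] += 1
    return counts


def run_topology(raw_lines: Iterable[str]) -> Counter:
    """Topology: a spout and two bolts wired together into a workflow."""
    return count_bolt(filter_bolt(event_spout(raw_lines)))
```

In a real cluster each component would run as many parallel instances rather than as one chained generator, which is why routing between instances matters.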
Fields Grouping
Question: If multiple instances of a single bolt can be created, how do you control which instance an emitted tuple goes to?
Answer: Stream groupings (called “field grouping” in these slides)
What is a grouping?
§  Provides various ways to control tuple routing to bolt instances.
§  Several grouping types exist, including shuffle, fields and global.
Example of fields grouping in the use case
§  Events for a given driver/truck must be sent to the same monitoring bolt instance.
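The driver/truck requirement can be sketched as hashing the grouping field and taking the result modulo the number of bolt instances. This is a plain-Python illustration under assumptions: `zlib.crc32` is a stand-in hash chosen because it is stable across runs, and the driver ids and instance count are hypothetical.

```python
import zlib


def fields_grouping_target(field_value: str, num_instances: int) -> int:
    """Pick the bolt instance index for a tuple from its grouping field."""
    # crc32 is deterministic across runs, unlike Python's built-in hash() for str.
    return zlib.crc32(field_value.encode("utf-8")) % num_instances


events = [
    ("driver-7", "speeding"),
    ("driver-9", "lane-departure"),
    ("driver-7", "braking"),
]
# Every event for the same driver maps to the same instance index.
routes = [(driver, fields_grouping_target(driver, 4)) for driver, _ in events]
```

Because the target depends only on the field value, per-driver state such as a running count can live in a single bolt instance.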
Different Grouping Types
Grouping type | What it does | When to use
Shuffle grouping | Sends each tuple to a randomly chosen bolt instance so that load is spread evenly | Atomic operations, e.g. math operations
Fields grouping | Sends tuples to a bolt instance based on one or more fields in the tuple | Segmentation of the incoming stream; counting tuples of a certain type
All grouping | Sends a copy of each tuple to all instances of the receiving bolt | Sending a signal to all bolts, such as clear cache or refresh state; sending a ticker tuple to signal bolts to save state
Custom grouping | Implement your own grouping so tuples are routed based on custom logic | Maximum flexibility to change processing sequence, logic etc. based on factors like data types, load, seasonality
Direct grouping | Source decides which bolt instance will receive the tuple | Depends
Global grouping | Sends tuples generated by all instances of the source to a single target instance (specifically, the task with the lowest ID) | Global counts
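Three of the routing rules in the table can be sketched as a single function that returns the receiving task ids. This is plain Python rather than Storm's grouping classes; the function name and the string labels are assumptions for illustration.

```python
import random
from typing import List


def target_tasks(grouping: str, task_ids: List[int], rng: random.Random) -> List[int]:
    """Return the task ids that receive one tuple under the given grouping."""
    if grouping == "shuffle":
        # One randomly chosen task, so load spreads across instances.
        return [rng.choice(task_ids)]
    if grouping == "all":
        # Every task gets its own copy of the tuple.
        return list(task_ids)
    if grouping == "global":
        # A single target: the task with the lowest id.
        return [min(task_ids)]
    raise ValueError(f"unsupported grouping: {grouping}")
```

For example, `target_tasks("global", [5, 3, 4], random.Random())` always selects task 3, while the "all" grouping returns all three ids.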
Data Processing Guarantees in Storm
Processing guarantee | How is it achieved? | Applicable use cases
At least once | Replay tuples on failure | Unordered idempotent operations; sub-second latency processing
Exactly once | Trident | Need ordered processing; global counts; context-aware processing; causality based; latency not important
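Why at-least-once processing pairs with idempotent operations can be sketched with a replay loop in plain Python. The failure is simulated and all names here are hypothetical; Storm's actual ack and replay mechanism is not shown.

```python
def process_at_least_once(tuples, handler, max_attempts=3):
    """Replay each tuple until the handler succeeds (at-least-once)."""
    for tup in tuples:
        for attempt in range(1, max_attempts + 1):
            try:
                handler(tup)
                break  # handler finished: treat the tuple as acknowledged
            except RuntimeError:
                if attempt == max_attempts:
                    raise


seen_ids = set()              # idempotent: adding the same id twice changes nothing
stats = {"naive_count": 0}    # not idempotent: a replayed tuple is counted twice
failed_once = set()


def flaky_handler(tup):
    seen_ids.add(tup["id"])
    stats["naive_count"] += 1
    # Simulate one failure for tuple 2 after the side effects but before the ack.
    if tup["id"] == 2 and tup["id"] not in failed_once:
        failed_once.add(tup["id"])
        raise RuntimeError("simulated failure before ack")


process_at_least_once([{"id": 1}, {"id": 2}, {"id": 3}], flaky_handler)
```

After the run, `seen_ids` contains exactly the three ids, while the naive counter has counted the replayed tuple twice. That gap is why the table reserves global counts for exactly-once processing.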
Architecture
Nimbus (management server)
•  Similar to job tracker
•  Distributes code around cluster
•  Assigns tasks
•  Handles failures
Supervisor (slave nodes)
•  Similar to task tracker
•  Run bolts and spouts as ‘tasks’
Zookeeper
•  Cluster co-ordination
•  Nimbus HA
•  Stores cluster metrics
•  Consumption related metadata for Trident topologies
Parallelism in a Storm Topology
Supervisor
§  Listens for work assigned to its machine and starts and stops worker processes as necessary, based on what Nimbus has assigned to it.
Worker (JVM Process)
§  A JVM process that executes a subset of a topology
§  Belongs to a specific topology and may run one or more executors (threads) for one or more components (spouts or bolts) of this topology.
Executor (Thread in JVM Process)
§  A thread that is spawned by a worker process and runs within the worker’s JVM
§  Runs one or more tasks for the same component (spout or bolt)
Task
§  Performs the actual data processing and runs within its parent executor’s thread of execution
Storm Components and Topology Submission
[Diagram: the storm-event-processor topology is submitted to Nimbus (YARN App Master node), which works with three Zookeeper nodes and five Supervisors (slave nodes). The supervisors run five Kafka Spout, three HDFS Bolt, two HBase Bolt and two Monitor Bolt instances.]
Relationship Between Supervisors, Workers, Executors & Tasks
Source: http://www.michael-noll.com/blog/2012/10/16/understanding-the-parallelism-of-a-storm-topology/
Each supervisor machine in Storm has specific predefined ports, and each worker process is assigned to one of them.
Parallelism Example
Example Configuration in Use Case:
§  Supervisors = 4
§  Workers = 8
§  spout.count = 5
§  bolt.count = 16
How many worker processes per supervisor?
§  8/4 = 2
How many total executors (threads)?
§  16 + 5 = 21
How many executors (threads) per worker?
§  21/8 ≈ 2.6, so each worker runs 2 to 3
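The arithmetic above can be checked with a few lines of Python. The sketch assumes that `spout.count` and `bolt.count` are executor counts and that executors are spread as evenly as possible across workers; the function and key names are hypothetical.

```python
import math


def parallelism_summary(supervisors, workers, spout_executors, bolt_executors):
    """Reproduce the slide's parallelism arithmetic for one topology."""
    total_executors = spout_executors + bolt_executors
    return {
        "workers_per_supervisor": workers / supervisors,
        "total_executors": total_executors,
        # With an even spread, each worker gets the floor or the ceiling.
        "min_executors_per_worker": total_executors // workers,
        "max_executors_per_worker": math.ceil(total_executors / workers),
    }


summary = parallelism_summary(supervisors=4, workers=8,
                              spout_executors=5, bolt_executors=16)
```

With these inputs the summary gives 2 workers per supervisor, 21 executors in total, and between 2 and 3 executors per worker.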
Apache Storm
Data Flows and Architecture
Flexible Architecture with Apache Storm
[Diagram: Storm at the center, connected to the following systems]
Kafka or JMS
- Pump data into Storm
- Send notifications from Storm
Any application development platform
- Simplify development of Storm topologies
HDFS
- Data lake
Any RDBMS
- Provide reference data for Storm topologies
In-memory caching platforms
- Temporary data storage
Any NoSQL database
- Real-time views for operational dashboards
Any search platform
- Search interface for analysts & dashboards
An Apache Storm Solution Architecture Example
[Diagram: HDP 2.x Data Lake. Apache Kafka delivers real-time data feeds. Inside the data lake, Real Time Stream Processing (Storm), Online Data Processing (HBase, Accumulo) and Search (Solr) run on YARN and HDFS.]
HDP Data Flow and Architecture
[Diagram components: Input Feed, Storm, Output Feed, HBase & Solr, HDFS, Hive, UI, Query UI. The build slides annotate the flow in four steps:]
1.  Source input stream
2.  Ingest and process stream (normalization/micro ETL) in real time using Storm
3.  Publish to output feed (available via RESTful API); push to HBase & Solr and sink to HDFS/HCatalog (meta model) for ad-hoc SQL
4.  Data available for search and query (RESTful API calls an option as well)
Apache Storm – Code Walkthrough
Automated Stock Insight
Monitor Tweet Volume For Early Detection of Stock Movement
– Monitor Twitter stream for S&P 500 companies to identify & act on unexpected increase in tweet volume
Solution components
§  Ingest:
– Listen for Twitter streams related to S&P 500 companies
§  Processing
– Monitor tweets for unexpected volume
– Volume thresholds managed in HBase
§  Persistence
– HBase & HDFS
§  Refine
– Update threshold values based on historical analysis of tweet volumes
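The processing step can be sketched as a per-window volume check in plain Python. This is an illustration only: a dictionary stands in for the thresholds kept in HBase, the tickers and threshold values are made up, and each tweet is assumed to be already tagged with one ticker.

```python
from collections import Counter
from typing import Dict, Iterable, List


def tickers_over_threshold(window_tweets: Iterable[str],
                           thresholds: Dict[str, int]) -> List[str]:
    """Return tickers whose tweet count in this window exceeds their threshold."""
    volume = Counter(window_tweets)
    return sorted(
        ticker for ticker, n in volume.items()
        if ticker in thresholds and n > thresholds[ticker]
    )


# Hypothetical thresholds; the slides keep these in HBase and refine them
# from historical analysis of tweet volumes.
thresholds = {"AAPL": 2, "IBM": 5}
alerts = tickers_over_threshold(["AAPL", "IBM", "AAPL", "AAPL", "XYZ"], thresholds)
```

Tickers without a stored threshold are ignored here; a real topology would have to decide how to treat them.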
High Level Architecture
[Diagram: truck event processing on HDP]
Truck Streaming Data: T(1), T(2), … T(N)
High Speed Ingestion: Messaging Grid (WMQ, ActiveMQ, Kafka) with a truck events topic
Stream Processing with Storm: Kafka Spout, HBase Bolt, Monitoring Bolt, HDFS Bolt
Distributed Storage (HDFS): Write to HDFS
Real-time Serving with HBase: Write to HBase (driver dangerous events, driver dangerous events count)
Interactive Query (Tez): Perform ad hoc queries on driver/truck events and other related data sources
Batch Analytics (MR2): Do batch analysis/models & update HBase with the right thresholds for alerts (Update Alert Thresholds)
HDP Provides a Single Data Platform
[Diagram: the same truck event architecture as on the previous slide (interactive query with Tez, stream processing with Storm, batch analytics with MR2, real-time serving with HBase), now framed as the HDP Data Lake on YARN.]
YARN enables 4 different apps/workloads on a single cluster
Q&A

Weitere ähnliche Inhalte

Was ist angesagt?

Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Hortonworks
 
Accelerate Big Data Application Development with Cascading and HDP, Hortonwor...
Accelerate Big Data Application Development with Cascading and HDP, Hortonwor...Accelerate Big Data Application Development with Cascading and HDP, Hortonwor...
Accelerate Big Data Application Development with Cascading and HDP, Hortonwor...Hortonworks
 
Introduction to the Hortonworks YARN Ready Program
Introduction to the Hortonworks YARN Ready ProgramIntroduction to the Hortonworks YARN Ready Program
Introduction to the Hortonworks YARN Ready ProgramHortonworks
 
Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014Hortonworks
 
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache HiveDiscover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache HiveHortonworks
 
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopRescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopHortonworks
 
Hortonworks Yarn Code Walk Through January 2014
Hortonworks Yarn Code Walk Through January 2014Hortonworks Yarn Code Walk Through January 2014
Hortonworks Yarn Code Walk Through January 2014Hortonworks
 
Discover.hdp2.2.h base.final[2]
Discover.hdp2.2.h base.final[2]Discover.hdp2.2.h base.final[2]
Discover.hdp2.2.h base.final[2]Hortonworks
 
Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks
 
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataCombine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataHortonworks
 
Hortonworks and Platfora in Financial Services - Webinar
Hortonworks and Platfora in Financial Services - WebinarHortonworks and Platfora in Financial Services - Webinar
Hortonworks and Platfora in Financial Services - WebinarHortonworks
 
Hortonworks and HP Vertica Webinar
Hortonworks and HP Vertica WebinarHortonworks and HP Vertica Webinar
Hortonworks and HP Vertica WebinarHortonworks
 
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...Hortonworks
 
Hadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop SummitHadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop SummitDataWorks Summit
 
Stinger.Next by Alan Gates of Hortonworks
Stinger.Next by Alan Gates of HortonworksStinger.Next by Alan Gates of Hortonworks
Stinger.Next by Alan Gates of HortonworksData Con LA
 
Red Hat in Financial Services - Presentation at Hortonworks Booth - Strata 2014
Red Hat in Financial Services - Presentation at Hortonworks Booth - Strata 2014Red Hat in Financial Services - Presentation at Hortonworks Booth - Strata 2014
Red Hat in Financial Services - Presentation at Hortonworks Booth - Strata 2014Hortonworks
 
Delivering Apache Hadoop for the Modern Data Architecture
Delivering Apache Hadoop for the Modern Data Architecture Delivering Apache Hadoop for the Modern Data Architecture
Delivering Apache Hadoop for the Modern Data Architecture Hortonworks
 
Enrich a 360-degree Customer View with Splunk and Apache Hadoop
Enrich a 360-degree Customer View with Splunk and Apache HadoopEnrich a 360-degree Customer View with Splunk and Apache Hadoop
Enrich a 360-degree Customer View with Splunk and Apache HadoopHortonworks
 
State of the Union with Shaun Connolly
State of the Union with Shaun ConnollyState of the Union with Shaun Connolly
State of the Union with Shaun ConnollyHortonworks
 
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.nextDiscover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.nextHortonworks
 

Was ist angesagt? (20)

Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
 
Accelerate Big Data Application Development with Cascading and HDP, Hortonwor...
Accelerate Big Data Application Development with Cascading and HDP, Hortonwor...Accelerate Big Data Application Development with Cascading and HDP, Hortonwor...
Accelerate Big Data Application Development with Cascading and HDP, Hortonwor...
 
Introduction to the Hortonworks YARN Ready Program
Introduction to the Hortonworks YARN Ready ProgramIntroduction to the Hortonworks YARN Ready Program
Introduction to the Hortonworks YARN Ready Program
 
Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014
 
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache HiveDiscover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
 
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopRescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
 
Hortonworks Yarn Code Walk Through January 2014
Hortonworks Yarn Code Walk Through January 2014Hortonworks Yarn Code Walk Through January 2014
Hortonworks Yarn Code Walk Through January 2014
 
Discover.hdp2.2.h base.final[2]
Discover.hdp2.2.h base.final[2]Discover.hdp2.2.h base.final[2]
Discover.hdp2.2.h base.final[2]
 
Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?
 
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataCombine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
 
Hortonworks and Platfora in Financial Services - Webinar
Hortonworks and Platfora in Financial Services - WebinarHortonworks and Platfora in Financial Services - Webinar
Hortonworks and Platfora in Financial Services - Webinar
 
Hortonworks and HP Vertica Webinar
Hortonworks and HP Vertica WebinarHortonworks and HP Vertica Webinar
Hortonworks and HP Vertica Webinar
 
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
 
Hadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop SummitHadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop Summit
 
Stinger.Next by Alan Gates of Hortonworks
Stinger.Next by Alan Gates of HortonworksStinger.Next by Alan Gates of Hortonworks
Stinger.Next by Alan Gates of Hortonworks
 
Red Hat in Financial Services - Presentation at Hortonworks Booth - Strata 2014
Red Hat in Financial Services - Presentation at Hortonworks Booth - Strata 2014Red Hat in Financial Services - Presentation at Hortonworks Booth - Strata 2014
Red Hat in Financial Services - Presentation at Hortonworks Booth - Strata 2014
 
Delivering Apache Hadoop for the Modern Data Architecture
Delivering Apache Hadoop for the Modern Data Architecture Delivering Apache Hadoop for the Modern Data Architecture
Delivering Apache Hadoop for the Modern Data Architecture
 
Enrich a 360-degree Customer View with Splunk and Apache Hadoop
Enrich a 360-degree Customer View with Splunk and Apache HadoopEnrich a 360-degree Customer View with Splunk and Apache Hadoop
Enrich a 360-degree Customer View with Splunk and Apache Hadoop
 
State of the Union with Shaun Connolly
State of the Union with Shaun ConnollyState of the Union with Shaun Connolly
State of the Union with Shaun Connolly
 
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.nextDiscover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
 

Andere mochten auch

Apache storm vs. Spark Streaming
Apache storm vs. Spark StreamingApache storm vs. Spark Streaming
Apache storm vs. Spark StreamingP. Taylor Goetz
 
Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix Hortonworks
 
Hortonworks tech workshop in-memory processing with spark
Hortonworks tech workshop   in-memory processing with sparkHortonworks tech workshop   in-memory processing with spark
Hortonworks tech workshop in-memory processing with sparkHortonworks
 
Microsoft and Hortonworks Delivers the Modern Data Architecture for Big Data
Microsoft and Hortonworks Delivers the Modern Data Architecture for Big DataMicrosoft and Hortonworks Delivers the Modern Data Architecture for Big Data
Microsoft and Hortonworks Delivers the Modern Data Architecture for Big DataHortonworks
 
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...Hortonworks
 
JBoss OneDayTalk 2013: "NoSQL Integration with Apache Camel - MongoDB, CouchD...
JBoss OneDayTalk 2013: "NoSQL Integration with Apache Camel - MongoDB, CouchD...JBoss OneDayTalk 2013: "NoSQL Integration with Apache Camel - MongoDB, CouchD...
JBoss OneDayTalk 2013: "NoSQL Integration with Apache Camel - MongoDB, CouchD...Kai Wähner
 
Neo4j and the Panama Papers - FooCafe June 2016
Neo4j and the Panama Papers - FooCafe June 2016Neo4j and the Panama Papers - FooCafe June 2016
Neo4j and the Panama Papers - FooCafe June 2016Craig Taverner
 
ACADGILD:: HADOOP LESSON - File formats in apache hive
ACADGILD:: HADOOP LESSON - File formats in apache hiveACADGILD:: HADOOP LESSON - File formats in apache hive
ACADGILD:: HADOOP LESSON - File formats in apache hivePadma shree. T
 
2013 year of real-time hadoop
2013 year of real-time hadoop2013 year of real-time hadoop
2013 year of real-time hadoopGeoff Hendrey
 
Hortonworks Technical Workshop: Interactive Query with Apache Hive
Hortonworks Technical Workshop: Interactive Query with Apache Hive Hortonworks Technical Workshop: Interactive Query with Apache Hive
Hortonworks Technical Workshop: Interactive Query with Apache Hive Hortonworks
 
Hortonworks Big Data Career Paths and Training
Hortonworks Big Data Career Paths and Training Hortonworks Big Data Career Paths and Training
Hortonworks Big Data Career Paths and Training Aengus Rooney
 
Complex Analytics with NoSQL Data Store in Real Time
Complex Analytics with NoSQL Data Store in Real TimeComplex Analytics with NoSQL Data Store in Real Time
Complex Analytics with NoSQL Data Store in Real TimeNati Shalom
 
Real Time Interactive Queries IN HADOOP: Big Data Warehousing Meetup
Real Time Interactive Queries IN HADOOP: Big Data Warehousing MeetupReal Time Interactive Queries IN HADOOP: Big Data Warehousing Meetup
Real Time Interactive Queries IN HADOOP: Big Data Warehousing MeetupCaserta
 
Apache Storm and twitter Streaming API integration
Apache Storm and twitter Streaming API integrationApache Storm and twitter Streaming API integration
Apache Storm and twitter Streaming API integrationUday Vakalapudi
 
Operationalized Analytics in the Enterprise
Operationalized Analytics in the EnterpriseOperationalized Analytics in the Enterprise
Operationalized Analytics in the EnterpriseRon Bodkin
 
AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)
AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)
AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)Kay Lerch
 
Processing large-scale graphs with Google(TM) Pregel
Processing large-scale graphs with Google(TM) PregelProcessing large-scale graphs with Google(TM) Pregel
Processing large-scale graphs with Google(TM) PregelArangoDB Database
 
HDFS NameNode High Availability
HDFS NameNode High AvailabilityHDFS NameNode High Availability
HDFS NameNode High AvailabilityDataWorks Summit
 
Handling Billions of Edges in a Graph Database
Handling Billions of Edges in a Graph DatabaseHandling Billions of Edges in a Graph Database
Handling Billions of Edges in a Graph DatabaseArangoDB Database
 
Real-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to StreamingReal-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to StreamingDatabricks
 

Hortonworks Technical Workshop: Real Time Monitoring with Apache Hadoop

  • 1. Page 1 © Hortonworks Inc. 2014 Real Time Processing with Hadoop Hortonworks. We do Hadoop.
  • 2. Page 2 © Hortonworks Inc. 2014 Topics – Hortonworks and Apache Storm •  Hortonworks and Enterprise Hadoop •  Stream Processing with Apache Storm – Introduction •  Apache Storm – Key Constructs •  Apache Storm – Data Flows and Architecture •  Apache Storm Usage Cases and Real World Walkthrough •  Apache Storm Demo and How To
  • 3. Hortonworks and Enterprise Hadoop Hortonworks. We do Hadoop.
  • 4. Our Mission: Power your Modern Data Architecture with HDP and Enterprise Apache Hadoop
      Who we are
        June 2011: Original 24 architects, developers, operators of Hadoop from Yahoo!
        June 2014: An enterprise software company with 420+ employees
      Our model
        Innovate and deliver Apache Hadoop as a complete enterprise data platform, completely in the open, backed by a world-class support organization
      Key Partners
        …and hundreds more
  • 5. HDP delivers a comprehensive data management platform
      [Diagram: HDP 2.1, Hortonworks Data Platform]
        Governance & Integration: Data Workflow, Lifecycle & Governance (Falcon, Sqoop, Flume, NFS, WebHDFS)
        Data Access (batch, interactive & real-time): Script (Pig), SQL (Hive, HCatalog), NoSQL (HBase, Accumulo), Stream (Storm), Search (Solr), In-Memory (Spark), Others (ISV Engines), Tez
        Data Management: YARN (Data Operating System), HDFS (Hadoop Distributed File System)
        Security: Authentication, Authorization, Accounting, Data Protection (Storage: HDFS, Resources: YARN, Access: Hive, …, Pipeline: Falcon, Cluster: Knox)
        Operations: Provision, Manage & Monitor (Ambari, Zookeeper), Scheduling (Oozie)
        Deployment Choice: Linux, Windows, On-Premise, Cloud
      YARN is the architectural center of HDP
        •  Enables batch, interactive and real-time workloads
        •  Single SQL engine for both batch and interactive
        •  Enables existing ISV apps to plug directly into Hadoop via YARN
      Provides comprehensive enterprise capabilities
        •  Governance
        •  Security
        •  Operations
      The widest range of deployment options
        •  Linux & Windows
        •  On premise & cloud
  • 6. Choice of Tool for Stream Processing
      Storm
        §  Event-by-event processing
        §  Suited for sub-second real-time processing
            –  Fraud detection
      Spark Streaming
        §  Micro-batch processing
        §  Suited for near-real-time processing
            –  Trending Facebook likes
      [Diagram: Hadoop 2.x as a multi-use data platform (batch, interactive, online, streaming, …): Tez (interactive), Spark (micro batch) and Apache Storm (streaming) on YARN (cluster resource management) over HDFS2 (redundant, reliable storage)]
  • 7. Stream Processing with Apache Storm
  • 8. Why stream processing in Hadoop?
      Stream processing has emerged as a key use case
      What is the need?
        §  Exponential rise in real-time data
        §  Ability to process real-time data opens new business opportunities
      Why now?
        §  Economics of open source software & commodity hardware
        §  YARN allows multiple computing paradigms to co-exist in the data lake
      [Diagram: Hadoop 2.x as a multi-use data platform (batch, interactive, online, streaming, …): Tez (interactive), Spark (micro batch) and Apache Storm (streaming) on YARN (cluster resource management) over HDFS2 (redundant, reliable storage)]
  • 9. Storm Use Cases – Business View
      Monitor real-time data to..
        Industry | Prevent | Optimize
        Finance | Securities fraud, compliance violations | Order routing, pricing
        Telco | Security breaches, network outages | Bandwidth allocation, customer service
        Retail | Shrinkage, stock outs | Offers, pricing
        Manufacturing | Device failures | Supply chain
        Transportation | Driver & fleet issues | Routes, pricing
        Web | Application failures, operational issues | Site content
      [Icons: Sentiment, Clickstream, Machine/Sensor, Logs, Geo-location]
  • 10. Stream processing is very different from batch
        Factor | Streaming | Batch
        Data: freshness | Real-time (usually < 15 min) | Historical, usually more than 15 min old
        Data: location | Primarily in memory (moved to disk after processing) | Primarily on disk (moved to memory for processing)
        Processing: speed | Sub-second to a few seconds | A few minutes to hours
        Processing: frequency | Always running | Sporadic to periodic
        Clients: who? | Automated systems only | Human & automated systems
        Clients: type | Primarily operational systems | Primarily analytical applications
  • 11. Key Requirements of a Streaming Solution
      Data Ingest
        •  Extremely high ingest rates – millions of events/second
      Processing
        •  Ability to easily plug in different processing frameworks
        •  Guaranteed processing – at-least-once processing semantics
      Persistence
        •  Ability to persist data to multiple relational and non-relational data stores
      Operations
        •  Security, HA, fault tolerance & management support
  • 12. Apache Storm Leading for Stream Processing
      Open source, real-time event stream processing platform that provides fixed, continuous & low-latency processing for very high-frequency streaming data
      Highly scalable
        •  Horizontally scalable like Hadoop
        •  E.g.: a 10-node cluster can process 1M tuples per second
      Fault-tolerant
        •  Automatically reassigns tasks on failed nodes
      Guarantees processing
        •  Supports at-least-once & exactly-once processing semantics
      Language agnostic
        •  Processing logic can be defined in any language
      Apache project
        •  Brand, governance & a large active community
  • 13. Storm Use Cases – IT Operations View
      Continuous processing
        •  Continuously ingest high-rate messages, process them and update data stores
      High speed data aggregation
        •  Aggregate multiple data streams that emit data at extremely high rates into one central data store
      Data filtering
        •  Filter out unwanted data on the fly before it is persisted to a data store
      Distributed RPC response time reduction
        •  Extremely resource-intensive (CPU, memory or I/O) processing that would take a long time on a single machine can be parallelized with Storm to reduce response times to seconds
  • 14. Apache Storm Key Constructs Hortonworks. We do Hadoop.
  • 15. Key Constructs in Apache Storm
      •  Basic Concepts of a Topology
      •  Tuples and Streams, Spouts, Bolts
      •  Fields Grouping
      •  Architecture and Topology Submission
      •  Parallelism
      •  Processing Guarantee
  • 16. Basic Concepts
      Tuple: The most fundamental data structure; a named list of values that can be of any data type
      Streams: Groups of tuples
      Spouts: Generate streams
      Bolts: Contain data processing, persistence and alerting logic. Can also emit tuples for downstream bolts
      Tuple Tree: The first spout tuple and all the tuples that were emitted by the bolts that processed it
      Topology: A group of spouts and bolts wired together into a workflow
  • 17. Tuples and Streams
      What is a Tuple?
        §  The fundamental data structure in Storm: a named list of values that can be of any data type
      What is a Stream?
        §  An unbounded sequence of tuples
        §  The core abstraction in Storm; streams are what you “process” in Storm
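The two definitions on this slide can be sketched in a few lines of plain Python. This is an illustration only, not Storm's API: the `StormTuple` alias, the `truck_event_stream` generator and the field names are hypothetical. A tuple is modeled as a dict of named values, and a stream as a generator that never terminates.

```python
from itertools import count, islice
from typing import Any, Dict, Iterator

# A "tuple" in the Storm sense: a named list of values of any type.
# Modeled here as a plain dict (hypothetical stand-in, not Storm's Tuple class).
StormTuple = Dict[str, Any]


def truck_event_stream() -> Iterator[StormTuple]:
    """Yield an unbounded sequence of tuples, i.e. a stream."""
    for seq in count(start=1):
        yield {
            "driver_id": seq % 3,  # three made-up drivers: 0, 1, 2
            "event_type": "speeding" if seq % 5 == 0 else "normal",
            "seq": seq,
        }


# A stream never ends, so a consumer takes only as many tuples as it needs.
first_five = list(islice(truck_event_stream(), 5))
```

Because the generator is lazy, nothing is produced until a consumer asks for the next tuple, which loosely mirrors how a spout is polled for data.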
  • 18. Spouts
      What is a Spout?
        §  Generates streams, i.e. acts as a source of streams
            –  E.g.: JMS, Twitter, Log, Kafka Spout
        §  Can spin up multiple instances of a Spout and dynamically adjust as needed
  • 19. Bolts
      What is a Bolt?
        §  Processes any number of input streams and produces output streams
        §  Common processing in bolts includes functions, aggregations, joins, reads/writes to data stores and alerting logic
        §  Can spin up multiple instances of a Bolt and dynamically adjust as needed
      Examples of Bolts:
        1.  HBaseBolt: persists the stream in HBase
        2.  HDFSBolt: persists the stream into HDFS as Avro files using Flume
        3.  MonitoringBolt: reads from HBase and creates alerts via email and messaging queues if given thresholds are exceeded
  • 20. Fields Grouping
      Question: If multiple instances of a single bolt can be created, how do you control which instance an emitted tuple goes to?
      Answer: A stream grouping, such as fields grouping
      What is a stream grouping?
        §  Provides various ways to control tuple routing to bolts
        §  Many groupings exist, including shuffle, fields, global..
      Example of fields grouping in the use case
        §  Events for a given driver/truck must be sent to the same monitoring bolt instance
  • 21. Different Grouping Types
        Grouping type | What it does | When to use
        Shuffle grouping | Sends tuples to bolt instances in a random round-robin sequence | Atomic operations, e.g. math operations
        Fields grouping | Sends tuples to a bolt instance based on one or more fields in the tuple | Segmentation of the incoming stream; counting tuples of a certain type
        All grouping | Sends a single copy of each tuple to all instances of a receiving bolt | Sending a signal to all bolts, such as clear cache or refresh state; sending a ticker tuple to signal bolts to save state
        Custom grouping | Implement your own grouping so tuples are routed based on custom logic | Maximum flexibility to change processing sequence, logic etc. based on factors like data types, load, seasonality
        Direct grouping | Source decides which bolt will receive the tuple | Depends
        Global grouping | Sends tuples generated by all instances of the source to a single target instance (specifically, the task with the lowest ID) | Global counts..
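To make the routing rules in this table concrete, here is a minimal sketch in plain Python. It does not use Storm's API: the function names are hypothetical, tasks are simply numbered from 0, shuffle is simplified to strict round-robin (the slide describes it as random but even), and `zlib.crc32` is an assumed stand-in for whatever hash Storm applies to the grouping fields.

```python
import itertools
import zlib
from typing import Any, Callable, Dict, List

StormTuple = Dict[str, Any]
Router = Callable[[StormTuple], List[int]]  # returns the target task indexes


def shuffle_grouping(num_tasks: int) -> Router:
    """Spread tuples evenly over tasks (simplified here to round-robin)."""
    cycle = itertools.cycle(range(num_tasks))
    return lambda tup: [next(cycle)]


def fields_grouping(num_tasks: int, fields: List[str]) -> Router:
    """Send tuples with equal values in `fields` to the same task."""
    def route(tup: StormTuple) -> List[int]:
        key = "|".join(str(tup[f]) for f in fields)
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        return [zlib.crc32(key.encode("utf-8")) % num_tasks]
    return route


def all_grouping(num_tasks: int) -> Router:
    """Send a copy of every tuple to every task."""
    return lambda tup: list(range(num_tasks))


def global_grouping(num_tasks: int) -> Router:
    """Send every tuple to the single task with the lowest id."""
    return lambda tup: [0]


# Events for the same driver always land on the same monitoring task.
by_driver = fields_grouping(4, ["driver_id"])
task_a = by_driver({"driver_id": 7, "speed": 80})
task_b = by_driver({"driver_id": 7, "speed": 95})
```

In this sketch `task_a` and `task_b` are equal, which is the property the truck monitoring use case relies on: per-driver state can live in one bolt instance.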
  • 22. Data Processing Guarantees in Storm
        Processing guarantee | How is it achieved? | Applicable use cases
        At least once | Replay tuples on failure | Unordered idempotent operations; sub-second latency processing
        Exactly once | Trident | Need ordered processing; global counts; context-aware processing; causality based; latency not important
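The "replay tuples on failure" row, and why it pairs with idempotent operations, can be illustrated with a small simulation. This is a sketch under stated assumptions, not Storm's acking mechanism: `ReplayingSource` and `run` are hypothetical names, and the failure is injected deliberately after the write so that the replay causes a duplicate write.

```python
from collections import deque
from typing import Dict, Optional, Set, Tuple


class ReplayingSource:
    """Hypothetical spout stand-in: remembers each message until it is acked."""

    def __init__(self, messages: Dict[int, str]) -> None:
        self.queue = deque(messages.items())
        self.pending: Dict[int, str] = {}

    def next_message(self) -> Optional[Tuple[int, str]]:
        if not self.queue:
            return None
        msg_id, payload = self.queue.popleft()
        self.pending[msg_id] = payload
        return msg_id, payload

    def ack(self, msg_id: int) -> None:
        self.pending.pop(msg_id, None)

    def fail(self, msg_id: int) -> None:
        # Put the message back on the queue so it is processed again.
        self.queue.append((msg_id, self.pending.pop(msg_id)))


def run(source: ReplayingSource, flaky_ids: Set[int]) -> Tuple[Dict[int, str], int]:
    """Process until the source is drained; ids in flaky_ids fail once."""
    stored: Dict[int, str] = {}  # sink keyed by message id, so writes are idempotent
    failed_once: Set[int] = set()
    attempts = 0
    while True:
        item = source.next_message()
        if item is None:
            break
        msg_id, payload = item
        attempts += 1
        stored[msg_id] = payload.upper()  # rewriting the same key is harmless
        if msg_id in flaky_ids and msg_id not in failed_once:
            failed_once.add(msg_id)
            source.fail(msg_id)  # simulated failure after the write
        else:
            source.ack(msg_id)
    return stored, attempts


source = ReplayingSource({1: "a", 2: "b", 3: "c"})
stored, attempts = run(source, flaky_ids={2})
```

Message 2 is processed twice, so there are four attempts for three messages, yet the sink ends with exactly one entry per message. A non-idempotent operation such as incrementing a counter would have double-counted, which is the kind of case the "exactly once" row addresses.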
  • 23. Architecture
      Nimbus (management server)
        •  Similar to job tracker
        •  Distributes code around the cluster
        •  Assigns tasks
        •  Handles failures
      Supervisor (slave nodes)
        •  Similar to task tracker
        •  Run bolts and spouts as ‘tasks’
      Zookeeper
        •  Cluster co-ordination
        •  Nimbus HA
        •  Stores cluster metrics
        •  Consumption-related metadata for Trident topologies
  • 24. Parallelism in a Storm Topology
      Supervisor
        §  Listens for work assigned to its machine and starts and stops worker processes as necessary, based on what Nimbus has assigned to it
      Worker (JVM process)
        §  A JVM process that executes a subset of a topology
        §  Belongs to a specific topology and may run one or more executors (threads) for one or more components (spouts or bolts) of this topology
      Executor (thread in a JVM process)
        §  A thread that is spawned by a worker process and runs within the worker’s JVM
        §  Runs one or more tasks for the same component (spout or bolt)
      Task
        §  Performs the actual data processing and runs within its parent executor’s thread of execution
  • 25. Storm Components and Topology Submission
      [Diagram: the storm-event-processor topology is submitted to Nimbus (YARN App Master node); three Zookeeper nodes; five Supervisor (slave) nodes running Kafka Spout (x5), HDFS Bolt (x3), HBase Bolt (x2) and Monitor Bolt (x2)]
      Nimbus (management server)
        •  Similar to job tracker
        •  Distributes code around the cluster
        •  Assigns tasks
        •  Handles failures
      Supervisor (worker nodes)
        •  Similar to task tracker
        •  Run bolts and spouts as ‘tasks’
      Zookeeper
        •  Cluster co-ordination
        •  Nimbus HA
        •  Stores cluster metrics
  • 26. Relationship Between Supervisors, Workers, Executors & Tasks
      Each supervisor machine in Storm has specific predefined ports, and each worker process is assigned to one of them
      Source: http://www.michael-noll.com/blog/2012/10/16/understanding-the-parallelism-of-a-storm-topology/
  • 27. Parallelism Example
      Example configuration in the use case:
        §  Supervisors = 4
        §  Workers = 8
        §  spout.count = 5
        §  bolt.count = 16
      How many worker processes per supervisor?
        §  8/4 = 2
      How many total executors (threads)?
        §  16 + 5 = 21
      How many executors (threads) per worker?
        §  21/8 = 2 to 3
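The arithmetic on this slide can be checked with a short calculation. The variable names below are illustrative, and the sketch assumes executors are spread as evenly as possible across workers; like the slide, it counts only spout and bolt executors.

```python
supervisors = 4
workers = 8
spout_executors = 5    # spout.count on the slide
bolt_executors = 16    # bolt.count on the slide

# Worker processes (JVMs) hosted by each supervisor node.
workers_per_supervisor = workers // supervisors

# Executors are threads, one per configured spout or bolt instance here.
total_executors = spout_executors + bolt_executors

# Spread executors as evenly as possible: some workers get one extra.
base, extra = divmod(total_executors, workers)
executors_per_worker = [base + 1] * extra + [base] * (workers - extra)

print(workers_per_supervisor, total_executors, executors_per_worker)
```

With these numbers, five workers run three executors and three workers run two, which is where the slide's "2 to 3" comes from.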
  • 28. Apache Storm Data Flows and Architecture Hortonworks. We do Hadoop.
  • 29. Flexible Architecture with Apache Storm
      [Diagram: Storm at the center, connected to]
        Kafka or JMS: pump data into Storm; send notifications from Storm
        Any application development platform: simplify development of Storm topologies
        HDFS: data lake
        Any RDBMS: provide reference data for Storm topologies
        In-memory caching platforms: temporary data storage
        Any NoSQL database: real-time views for operational dashboards
        Any search platform: search interface for analysts & dashboards
  • 30. An Apache Storm Solution Architecture Example
      [Diagram: real-time data feeds arrive through Apache Kafka into the HDP 2.x Data Lake, which contains Real Time Stream Processing (Storm), Online Data Processing (HBase, Accumulo) and Search (Solr) on YARN and HDFS]
  • 31. HDP Data Flow and Architecture
      [Diagram: Input Feed, Storm UI, HDFS, Hive, HBase & Solr, Query UI, Output Feed]
  • 32. HDP Data Flow and Architecture
      [Same diagram] Source Input Stream
  • 33. HDP Data Flow and Architecture
      [Same diagram] Ingest and Process Stream (Normalization/Micro ETL) in Real Time Using Storm
  • 34. HDP Data Flow and Architecture
      [Same diagram] Publish to Output Feed (Available via RESTful API); Push to HBase & Solr and Sink to HDFS/HCatalog (Meta Model) for Ad-Hoc SQL
  • 35. HDP Data Flow and Architecture
      [Same diagram] Data Available for Search and Query (RESTful API Calls an Option as Well)
  • 36. Apache Storm – Code Walkthrough Hortonworks. We do Hadoop.
  • 37. Automated Stock Insight
      Monitor tweet volume for early detection of stock movement
        –  Monitor the Twitter stream for S&P 500 companies to identify & act on unexpected increases in tweet volume
      Solution components
        §  Ingest
            –  Listen for Twitter streams related to S&P 500 companies
        §  Processing
            –  Monitor tweets for unexpected volume
            –  Volume thresholds managed in HBase
        §  Persistence
            –  HBase & HDFS
        §  Refine
            –  Update threshold values based on historical analysis of tweet volumes
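The processing step on this slide, comparing tweet volume against stored thresholds, might look roughly like the following in plain Python. Everything here is an assumption for illustration: the `find_volume_alerts` function, the one-window batch of ticker symbols and the in-memory `thresholds` dict standing in for the HBase table are not taken from the workshop code.

```python
from collections import Counter
from typing import Dict, Iterable, List


def find_volume_alerts(tickers_in_window: Iterable[str],
                       thresholds: Dict[str, int]) -> List[str]:
    """Return tickers whose tweet count in one window exceeds their threshold."""
    counts = Counter(tickers_in_window)
    return sorted(
        ticker for ticker, n in counts.items()
        if ticker in thresholds and n > thresholds[ticker]
    )


# One made-up time window: each entry is the ticker a tweet mentioned.
window = ["AAPL", "AAPL", "MSFT", "AAPL", "GE"]

# Stand-in for the per-ticker thresholds that the slide keeps in HBase.
thresholds = {"AAPL": 2, "MSFT": 5, "GE": 1}

alerts = find_volume_alerts(window, thresholds)
print(alerts)
```

The "Refine" step on the slide would then correspond to periodically rewriting the values in `thresholds` from historical volumes, outside the streaming path.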
  • 38. High Level Architecture
      [Diagram]
        Truck Streaming Data: T(1), T(2), T(N)
        High Speed Ingestion: Messaging Grid (WMQ, ActiveMQ, Kafka) with a truck events TOPIC
        Stream Processing with Storm: Kafka Spout, HBase Bolt, Monitoring Bolt, HDFS Bolt
        Distributed Storage (HDFS): Write to HDFS
        Real-time Serving with HBase: driver dangerous events, driver dangerous events count; Write to HBase
        Batch Analytics (MR2): Do batch analysis/models & update HBase with the right thresholds for alerts (Update Alert Thresholds)
        Interactive Query (TEZ): Perform ad hoc queries on driver/truck events and other related data sources
  • 39. HDP Provides a Single Data Platform
      [Same diagram as slide 38, inside the HDP Data Lake] YARN enables 4 different apps/workloads on a single cluster
  • 40. HDP Provides a Single Data Platform
      [Same diagram and caption as slide 39]
  • 41. Q&A