Diese Präsentation wurde erfolgreich gemeldet.
Wir verwenden Ihre LinkedIn Profilangaben und Informationen zu Ihren Aktivitäten, um Anzeigen zu personalisieren und Ihnen relevantere Inhalte anzuzeigen. Sie können Ihre Anzeigeneinstellungen jederzeit ändern.
Page1 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Data Distribution Patterns with Apache NiFi
March 2016
Bryan Ben...
Page2 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Overview
• Frequently Asked Question:
 “How do I distribute dat...
Page3 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Pushing to Listeners
NCM
Node 1
ListenHTTP
Node 2
ListenHTTP
Loa...
Page4 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Pulling on All Nodes
NCM
Node 1
GetKafka
Node 2
GetKafka
Kafka T...
Page5 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Pulling With List & Fetch Operations
NCM
Node 1 (Primary)
ListHD...
Page6 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Site-To-Site Push
NCM
Node 1
Input Port
Node 2
Input Port
Standa...
Page7 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Site-To-Site Pull
NCM
Node 1
RPG
Node 2
RPG
Standalone NiFi
Outp...
Page8 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Summary
Several mechanism for distributing data across a cluster...
Nächste SlideShare
Wird geladen in …5
×

Data Distribution Patterns with Apache NiFi

5.348 Aufrufe

Veröffentlicht am

An overview of how to distribute data across a NiFi cluster. Includes pushing to listening processors, pulling on each node, pulling on primary node with redistribution, and site-to-site.

Veröffentlicht in: Software

Data Distribution Patterns with Apache NiFi

  1. 1. Page1 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Data Distribution Patterns with Apache NiFi March 2016 Bryan Bende – Member of Technical Staff
  2. 2. Page2 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Overview • Frequently Asked Question:  “How do I distribute data across my NiFi cluster?” • Answer:  “It depends…” • Typically based on whether the data is being pushed or pulled
  3. 3. Page3 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Pushing to Listeners NCM Node 1 ListenHTTP Node 2 ListenHTTP Load Balancer Data Producer Data Producer Data Producer • Processors listening for incoming data on each node in the cluster • Load balancer sitting in front of the cluster pointing at listeners • Data producers push data to load balancer which distributes data across the cluster • Same approach works for ListenSyslog, ListenUDP, and HandleHttpRequest
  4. 4. Page4 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Pulling on All Nodes NCM Node 1 GetKafka Node 2 GetKafka Kafka Topic • Works when data source ensures each request gets a unique piece of data • Kafka sees each GetKafka processor as the same client • Each GetKafka processor gets different data
  5. 5. Page5 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Pulling With List & Fetch Operations NCM Node 1 (Primary) ListHDFS HDFS FetchHDFS RPG Input Port Node 2 ListHDFS FetchHDFS RPG Input Port • Perform “list” operation on primary node • Send results to Remote Process Group pointing back to same cluster • Redistributes results to all nodes to perform “fetch” in parallel • Same approach for ListFile + FetchFile and ListSFTP + FetchSFTP
  6. 6. Page6 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Site-To-Site Push NCM Node 1 Input Port Node 2 Input Port Standalone NiFi RPG • Site-To-Site makes a direct connection between NiFi instances • Either side can be a cluster or standalone instance • In a push scenario, the source connects a Remote Process Group to an Input Port on the destination • If pushing to a cluster, Site-To-Site takes care of load balancing across the nodes in the cluster
  7. 7. Page7 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Site-To-Site Pull NCM Node 1 RPG Node 2 RPG Standalone NiFi Output Port • In a pull scenario, the destination connects a Remote Process Group to an Output Port on the source • Each node will pull different data from the source • If the source was a cluster, each node would pull from each node in the source cluster
  8. 8. Page8 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Summary Several mechanism for distributing data across a cluster… • Push to Listeners with a load balancer in front • Pull from a source that provides a queue • Pull with a List + Fetch approach • Push with Site-to-Site • Pull with Site-to-Site Contact Info • bbende@hortonworks.com • Twitter - @bbende

×