
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Together

41,482 views


Apache NiFi, Storm and Kafka augment each other in modern enterprise architectures. NiFi provides a code-free way to get many different formats and protocols in and out of Kafka, and complements Kafka with full audit trails and interactive command and control. Storm complements NiFi with the capability to handle complex event processing.



Join us to learn how Apache NiFi, Storm and Kafka can augment each other to create a new data plane that connects multiple systems within your enterprise with ease, speed and increased productivity.

https://www.brighttalk.com/webcast/9573/224063

Published in: Technology


  1. Harnessing Data-in-Motion with Hortonworks DataFlow: Apache NiFi, Kafka and Storm Better Together
     Bryan Bende, Sr. Software Engineer
     Haimo Liu, Product Manager
     © Hortonworks Inc. 2011 – 2016. All Rights Reserved
  2. Agenda
     • Introduction to Hortonworks Data Flow
     • Introduction to Apache projects
     • Better together
     • Best Practices
     • Demo
  3. Connected Data Platforms
  4. HDF 2.0 – Data in Motion Platform
     [Diagram labels: Stream Processing, Flow Management, Enterprise Services; At the edge, Security, Visualization, On premises, In the cloud; Registries/Catalogs, Governance (Security/Compliance), Operations]
  5. HDF 2.0 – Data in Motion Platform
     [Diagram labels: Flow Management; Flow Management + Stream Processing; Data in Motion; Data at Rest; IoT Data Sources; AWS, Azure, Google Cloud, Hadoop; NiFi, Kafka, Storm, Others…; MiNiFi; Enterprise Services: Ambari, Ranger, Other services]
  6. Introduction to Apache Projects
  7. What is Apache NiFi?
     • Created to address the challenges of global enterprise dataflow
     • Key features:
       – Visual Command and Control
       – Data Lineage (Provenance)
       – Data Prioritization
       – Data Buffering/Back-Pressure
       – Control Latency vs. Throughput
       – Secure Control Plane / Data Plane
       – Scale Out Clustering
       – Extensibility
  8. Apache NiFi
     What is Apache NiFi used for?
     • Reliable and secure transfer of data between systems
     • Delivery of data from sources to analytic platforms
     • Enrichment and preparation of data:
       – Conversion between formats
       – Extraction/Parsing
       – Routing decisions
     What is Apache NiFi NOT used for?
     • Distributed Computation
     • Complex Event Processing
     • Complex Rolling Window Operations
  9. NiFi Terminology
     FlowFile
     • Unit of data moving through the system
     • Content + Attributes (key/value pairs)
     Processor
     • Performs the work, can access FlowFiles
     Connection
     • Links between processors
     • Queues that can be dynamically prioritized
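
The three terms on the NiFi Terminology slide can be pictured with a small in-memory model. This is only a sketch and not NiFi's API: the `FlowFile`, `Connection` and `uppercase_processor` names, the `priority` attribute, and the rule that lower numbers leave the queue first are all assumptions made for illustration.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Unit of data: content bytes plus key/value attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)


class Connection:
    """Queue between two processors, ordered by a pluggable prioritizer."""

    def __init__(self, prioritizer):
        self._prioritizer = prioritizer    # FlowFile -> sortable key, lowest first
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps arrival order

    def put(self, flow_file):
        key = self._prioritizer(flow_file)
        heapq.heappush(self._heap, (key, next(self._counter), flow_file))

    def take(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


def uppercase_processor(flow_file):
    """Stand-in processor: transforms the content and adds an attribute."""
    attributes = dict(flow_file.attributes, transformed="true")
    return FlowFile(flow_file.content.upper(), attributes)


# Hypothetical prioritizer: read a numeric "priority" attribute, default 5.
queue = Connection(prioritizer=lambda ff: int(ff.attributes.get("priority", "5")))
queue.put(FlowFile(b"routine reading", {"priority": "5"}))
queue.put(FlowFile(b"engine alert", {"priority": "1"}))

first = uppercase_processor(queue.take())
print(first.content, first.attributes)
```

The alert is taken first even though it was queued second, which is the effect a prioritized connection is meant to have.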
  10. What is Apache Kafka?
     • Distributed streaming platform that allows publishing and subscribing to streams of records
     • Streams of records are organized into categories called topics
     • Topics can be partitioned and/or replicated
     • Records consist of a key, value, and timestamp
     http://kafka.apache.org/intro
     [Diagram: producers write to the Kafka cluster; consumers read from it]
  11. Kafka: Anatomy of a Topic
     [Diagram: Partition 0, Partition 1 and Partition 2, each an ordered sequence of offsets starting at 0 and running from old to new; writes are appended at the new end]
     • Partitioning allows topics to scale beyond a single machine/node
     • Topics can also be replicated, for high availability
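
The topic anatomy above can be sketched as a list of append-only logs, one per partition, where a record's offset is simply its position in that partition. This is a toy model under that assumption and not the Kafka client API; the `Topic` class and the `truck_events` topic name are hypothetical.

```python
class Topic:
    """Toy model of a partitioned topic: one append-only list per partition."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, record):
        """Write a record at the 'new' end and return its offset."""
        log = self.partitions[partition]
        log.append(record)
        return len(log) - 1

    def read(self, partition, offset, max_records=10):
        """Read records from one partition, starting at the given offset."""
        return self.partitions[partition][offset:offset + max_records]


topic = Topic("truck_events", num_partitions=3)

# Spread seven records over the three partitions.
offsets = [topic.append(i % 3, f"event-{i}") for i in range(7)]

print(offsets)                  # offsets restart at 0 in every partition
print(topic.read(0, offset=1))  # records in partition 0 from offset 1 onward
```

Offsets are meaningful only within a partition, which is why ordering is guaranteed per partition rather than across the whole topic.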
  12. NiFi and Kafka Are Complementary
     NiFi: provides a dataflow solution
     • Centralized management, from edge to core
     • Great traceability, event-level data provenance starting when data is born
     • Interactive command and control – real-time operational visibility
     • Dataflow management, including prioritization, back pressure, and edge intelligence
     • Visual representation of global dataflow
     Kafka: provides a durable stream store
     • Low latency
     • Distributed data durability
     • Decentralized management of producers & consumers
  13. What is Apache Storm?
     • Distributed, low-latency, fault-tolerant stream processing platform
     • Provides processing guarantees
     • Key concepts include:
       – Tuples
       – Streams
       – Spouts
       – Bolts
       – Topology
  14. Storm - Tuples and Streams
     • What is a Tuple?
       – Fundamental data structure in Storm
       – Named list of values that can be of any data type
     • What is a Stream?
       – An unbounded sequence of tuples
       – The core abstraction in Storm; streams are what you “process” in Storm
  15. Storm - Spouts
     • What is a Spout?
       – Source of data
       – E.g.: JMS, Twitter, Log, Kafka Spout
       – Can spin up multiple instances of a Spout and dynamically adjust as needed
  16. Storm - Bolts
     • What is a Bolt?
       – Processes any number of input streams and produces output streams
       – Common processing in bolts: functions, aggregations, joins, reads/writes to data stores, alerting logic
       – Can spin up multiple instances of a Bolt and dynamically adjust as needed
  17. Storm - Topology
     • What is a Topology?
       – A network of spouts and bolts wired together into a workflow
     [Diagram: Truck-Event-Processor Topology, with streams connecting a Kafka Spout to an HBase Bolt, Monitoring Bolt, HDFS Bolt and WebSocket Bolt]
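
The Storm concepts from the last few slides can be pictured with plain Python generators: a spout yields tuples, each bolt consumes one stream and yields another, and wiring them together gives a topology. This is a single-process illustration, not the Storm API; a finite list stands in for an unbounded stream, and the function names, field names and speed limit are hypothetical.

```python
def speed_spout(readings):
    """Spout: source of data, emitting tuples as named values."""
    for driver, speed in readings:
        yield {"driver": driver, "speed": speed}


def speeding_filter_bolt(stream, limit):
    """Bolt: consumes one stream and emits only tuples above the limit."""
    for tup in stream:
        if tup["speed"] > limit:
            yield tup


def alert_bolt(stream):
    """Bolt: turns each remaining tuple into an alert tuple."""
    for tup in stream:
        yield {"driver": tup["driver"], "alert": f"speeding at {tup['speed']}"}


# A finite list stands in for an unbounded source.
readings = [("A", 62), ("B", 88), ("A", 91)]

# Topology: spout -> filter bolt -> alert bolt.
topology = alert_bolt(speeding_filter_bolt(speed_spout(readings), limit=80))
alerts = list(topology)
print(alerts)
```

In Storm itself each spout and bolt can run as many parallel instances across a cluster; the sketch only shows how the pieces are wired.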
  18. NiFi and Storm Are Complementary
     NiFi: simple event processing
     • Manages flow of data between producers and consumers across the enterprise
     • Data enrichment, splitting, aggregation, format conversion, schema translation…
     • Scale out to handle gigabytes per second, or scale down to a Raspberry Pi handling tens of thousands of events per second
     Storm: complex and distributed processing
     • Complex processing from multiple streams (JOIN operations)
     • Analyzing data across time windows (rolling window aggregation, standard deviation, etc.)
     • Scale out to thousands of nodes if needed
  19. Better Together
  20. Key Integration Points
     • NiFi - Kafka
       – NiFi Kafka Producer
       – NiFi Kafka Consumer
     • Storm - Kafka
       – Storm Kafka Consumer
       – Storm Kafka Producer
  21. Key Integration Points – NiFi & Kafka
     [Diagram labels: MiNiFi (x3), NiFi, Kafka, Consumer 1, Consumer 2, Consumer N]
     • Producer Processors
       – PutKafka (0.8 Kafka Client)
       – PublishKafka (0.9 Kafka Client)
       – PublishKafka_0_10 (0.10 Kafka Client)
  22. Key Integration Points – NiFi & Kafka
     [Diagram labels: Producer 1, Producer 2, Producer N, Kafka, NiFi, Destination 1, Destination 2, Destination 3]
     • Consumer Processors
       – GetKafka (0.8 Kafka Client)
       – ConsumeKafka (0.9 Kafka Client)
       – ConsumeKafka_0_10 (0.10 Kafka Client)
  23. Key Integration Points – Storm & Kafka
     • storm-kafka module
       – KafkaSpout (Core & Trident) & KafkaBolt
       – Compatible with Kafka 0.8 and 0.9 client
       – Kafka client declared by topology developer
     • storm-kafka-client module
       – KafkaSpout & KafkaSpoutTuplesBuilder
       – Compatible with Kafka 0.9 and 0.10 client
       – Kafka client declared by topology developer
     [Diagram labels: Kafka, Storm, Incoming Topic, Results Topic, KafkaSpout, KafkaBolt]
  24. Better Together
     [Diagram labels: MiNiFi, NiFi, PublishKafka, Kafka (Incoming Topic, Results Topic), Storm, ConsumeKafka, Destinations]
     • MiNiFi – Collection, filtering, and prioritization at the edge
     • NiFi – Central data flow management, routing, enriching, and transformation
     • Kafka – Central messaging bus for subscription by downstream consumers
     • Storm – Streaming analytics focused on complex event processing
  25. Best Practices
  26. NiFi PublishKafka
     [Diagram: Apache NiFi Node 1 and Node 2 each run PublishKafka; Apache Kafka holds Topic 1 with Partition 1 and Partition 2; a marker denotes a concurrent task]
     • Each NiFi node runs an instance of PublishKafka
     • Each instance has one or more concurrent tasks (threads)
     • Each concurrent task is an independent producer that sends data round-robin to the partitions of a topic
  27. NiFi ConsumeKafka – Nodes = Partitions
     [Diagram: NiFi Node 1 and Node 2 each run ConsumeKafka (consumer group 1); Topic 1 has Partition 1 and Partition 2]
     • Each NiFi node runs an instance of ConsumeKafka
     • Each instance has one or more concurrent tasks (threads)
     • Each concurrent task is a consumer assigned to a single partition
     • Kafka Client ensures a given partition can only have one consumer/thread in a consumer group
  28. NiFi ConsumeKafka – Nodes > Partitions
     [Diagram: NiFi Node 1, Node 2 and Node 3 each run ConsumeKafka (consumer group 1); Topic 1 has Partition 1 and Partition 2]
     • Remember… each partition can only have one consumer from the same group
     • When there are more NiFi nodes than partitions, some nodes won’t consume anything
  29. NiFi ConsumeKafka – Nodes < Partitions
     [Diagram: NiFi Node 1 and Node 2 each run ConsumeKafka (consumer group 1); Topic 1 has Partitions 1 to 4]
     • When there are fewer NiFi nodes/tasks than partitions, multiple partitions will be assigned to each node/task
  30. NiFi ConsumeKafka – Tasks = Partitions
     [Diagram: NiFi Node 1 and Node 2 each run ConsumeKafka (consumer group 1) with multiple concurrent tasks; Topic 1 has Partitions 1 to 4]
     • When there are fewer NiFi nodes than partitions, we can increase the concurrent tasks on each node
     • Kafka Client will automatically rebalance partition assignment
     • Improves throughput
  31. NiFi ConsumeKafka – Tasks > Partitions
     [Diagram: NiFi Node 1 and Node 2 each run ConsumeKafka (consumer group 1) with multiple concurrent tasks; Topic 1 has Partition 1 and Partition 2]
     • Increasing concurrent tasks only makes sense when the number of partitions is greater than the number of nodes
     • Otherwise we end up with some tasks not consuming anything
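
The five ConsumeKafka scenarios above come down to one rule: within a consumer group, each partition has exactly one owner. The sketch below simulates that rule with a simple round-robin hand-out. It is an assumption-laden illustration, not Kafka's actual assignor (which may pick different owners), and `assign_partitions` and `idle_consumers` are hypothetical helper names; only the counts per scenario are the point.

```python
def assign_partitions(num_partitions, num_nodes, tasks_per_node):
    """Give each partition exactly one consumer (node, task) in the group."""
    consumers = [
        (node, task)
        for node in range(1, num_nodes + 1)
        for task in range(1, tasks_per_node + 1)
    ]
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(1, num_partitions + 1):
        owner = consumers[(partition - 1) % len(consumers)]
        assignment[owner].append(partition)
    return assignment


def idle_consumers(assignment):
    """Consumers that were assigned no partition and so consume nothing."""
    return [consumer for consumer, parts in assignment.items() if not parts]


scenarios = {
    "nodes = partitions": assign_partitions(2, num_nodes=2, tasks_per_node=1),
    "nodes > partitions": assign_partitions(2, num_nodes=3, tasks_per_node=1),
    "nodes < partitions": assign_partitions(4, num_nodes=2, tasks_per_node=1),
    "tasks = partitions": assign_partitions(4, num_nodes=2, tasks_per_node=2),
    "tasks > partitions": assign_partitions(2, num_nodes=2, tasks_per_node=2),
}

for name, assignment in scenarios.items():
    print(name, assignment, "idle:", idle_consumers(assignment))
```

Consumers are keyed as `(node, task)`. The two scenarios with more consumers than partitions are the ones that report idle consumers, which matches the advice to avoid more tasks than partitions.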
  32. Kafka Processors & Batching Messages
     • PublishKafka - ‘Message Demarcator’
       – If not specified, flow file content is sent as a single message
       – If specified, flow file content is separated into multiple messages based on the demarcator
       – Ex: Sending 1 million messages to Kafka – significantly better performance with 1 flow file containing 1 million demarcated messages vs. 1 million flow files with a single message
     • ConsumeKafka - ‘Message Demarcator’
       – If not specified, a flow file is produced for each message consumed
       – If specified, multiple messages are written to a single flow file separated by the demarcator
       – Maximum # of messages written to a single flow file equals ‘Max Poll Records’
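
The demarcator behaviour described on this slide can be sketched with two small functions. These are hypothetical helpers that mimic the described splitting and joining on byte strings, not NiFi code; dropping empty fragments after a trailing demarcator and chunking a message list by a `max_poll_records` argument are simplifying assumptions.

```python
def publish_messages(flow_file_content, demarcator=None):
    """PublishKafka side: one flow file becomes one or many messages."""
    if demarcator is None:
        return [flow_file_content]  # whole content sent as a single message
    # Assumption: empty fragments (e.g. after a trailing demarcator) are dropped.
    return [part for part in flow_file_content.split(demarcator) if part]


def consume_to_flow_files(messages, demarcator=None, max_poll_records=500):
    """ConsumeKafka side: messages become one flow file each, or batches."""
    if demarcator is None:
        return list(messages)       # one flow file per message
    flow_files = []
    for start in range(0, len(messages), max_poll_records):
        batch = messages[start:start + max_poll_records]
        flow_files.append(demarcator.join(batch))
    return flow_files


content = b"speed=70\nspeed=72\nspeed=68\n"

single = publish_messages(content)
split = publish_messages(content, demarcator=b"\n")
batched = consume_to_flow_files(split, demarcator=b"\n", max_poll_records=2)
unbatched = consume_to_flow_files(split)

print(len(single), len(split), batched, len(unbatched))
```

With the demarcator set, three messages travel in one flow file on the publish side, and the consume side writes at most `max_poll_records` messages into each flow file.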
  33. Best Practice Summary
     • PublishKafka
       – Each concurrent task is an independent producer
       – Scale number of concurrent tasks according to data flow
     • ConsumeKafka
       – Kafka client assigns one thread per partition within a consumer group
       – Create optimal alignment between # of partitions and # of consumer tasks
       – Avoid having more tasks than partitions
     • Batching
       – Message Demarcator property on PublishKafka and ConsumeKafka
       – Can achieve significantly better performance
  34. Demo!
  35. Summary of the Demo Scenario
     [Diagram labels: Truck Sensors, MiNiFi, NiFi, PublishKafka, Kafka (Speed Events, Average Speed), Storm, ConsumeKafka, Dashboard (Windowed Avg. Speed)]
     • MiNiFi – Collects data from truck sensors
     • NiFi – Filters/enriches truck data, delivers to Kafka, consumes results
     • Kafka – Central messaging bus that Storm consumes from and publishes to
     • Storm – Computes average speed over a time window per driver & route
  36. Demo – Data Generator
     • Geo Event
       2016-11-07 10:34:52.922|truck_geo_event|73|10|George Vetticaden|1390372503|Saint Louis to Tulsa|Normal|38.14|-91.3|1|
     • Speed Event
       2016-11-07 10:34:52.922|truck_speed_event|73|10|George Vetticaden|1390372503|Saint Louis to Tulsa|70|
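
The per-driver, per-route windowed average that Storm computes in the demo can be approximated in a few lines. This is a sketch, not the demo's topology: it assumes the pipe-delimited layout shown on the Data Generator slide, treats fields 4, 6 and 7 as driver name, route name and speed (an interpretation, since the slide does not label the columns), and uses a fixed one-minute tumbling window. The second speed line is made up for the example.

```python
from collections import defaultdict
from datetime import datetime


def parse_speed_event(line):
    """Parse one pipe-delimited line; return None unless it is a speed event."""
    fields = line.rstrip("|").split("|")
    if fields[1] != "truck_speed_event":
        return None
    return {
        "time": datetime.strptime(fields[0], "%Y-%m-%d %H:%M:%S.%f"),
        "driver": fields[4],       # assumed: driver name
        "route": fields[6],        # assumed: route name
        "speed": int(fields[7]),   # assumed: speed reading
    }


def windowed_average_speed(events):
    """Average speed per (driver, route, one-minute window start)."""
    buckets = defaultdict(list)
    for event in events:
        window_start = event["time"].replace(second=0, microsecond=0)
        buckets[(event["driver"], event["route"], window_start)].append(event["speed"])
    return {key: sum(speeds) / len(speeds) for key, speeds in buckets.items()}


lines = [
    "2016-11-07 10:34:52.922|truck_geo_event|73|10|George Vetticaden|1390372503|Saint Louis to Tulsa|Normal|38.14|-91.3|1|",
    "2016-11-07 10:34:52.922|truck_speed_event|73|10|George Vetticaden|1390372503|Saint Louis to Tulsa|70|",
    # Hypothetical second reading, added only to make the average non-trivial.
    "2016-11-07 10:34:58.100|truck_speed_event|73|10|George Vetticaden|1390372503|Saint Louis to Tulsa|80|",
]

events = [event for event in map(parse_speed_event, lines) if event is not None]
averages = windowed_average_speed(events)
print(averages)
```

The geo event is skipped by the parser, and the two speed readings fall into the same one-minute window, so a single average is produced for that driver and route.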
  37. Demo – MiNiFi
     Processors:
     - name: TailFile
       class: org.apache.nifi.processors.standard.TailFile
       ...
       Properties:
         File Location: Local
         File to Tail: /tmp/truck-sensor-data/truck-1.txt
         ...
     Connections:
     - name: TailFile/success/2042214b-0158-1000-353d-654ef72c7307
       source name: TailFile
       ...
     Remote Processing Groups:
     - name: http://localhost:9090/nifi
       url: http://localhost:9090/nifi
       ...
       Input Ports:
       - id: 2042214b-0158-1000-353d-654ef72c7307
         name: Truck Events
         ...
  38. Demo - NiFi
  39. Demo - Storm
  40. Demo - Dashboard
  41. Questions?
     Hortonworks Community Connection: Data Ingestion and Streaming
     https://community.hortonworks.com/
  42. HDF Kafka Processor Compatibility

     Kerberized interaction w/ Kafka       GetKafka        PutKafka
     Kafka broker 0.8 (HDP 2.3.2)          Supported       Supported
     Kafka broker 0.9 (HDP 2.3.4 +)        Supported       Supported
     Kafka broker 0.8 (Apache)             N/A             N/A
     Kafka broker 0.9 (Apache)             Not Supported   Not Supported

     Non-Kerberized interaction w/ Kafka   GetKafka        PutKafka
     Kafka broker 0.8 (HDP 2.3.2)          Supported       Supported
     Kafka broker 0.9 (HDP 2.3.4 +)        Supported       Supported
     Kafka broker 0.8 (Apache)             Supported       Supported
     Kafka broker 0.9 (Apache)             Supported       Supported

     SSL interaction w/ Kafka              GetKafka        PutKafka
     Kafka broker 0.8 (HDP 2.3.2)          N/A             N/A
     Kafka broker 0.9 (HDP 2.3.4 +)        Not Supported   Not Supported
     Kafka broker 0.8 (Apache)             N/A             N/A
     Kafka broker 0.9 (Apache)             Not Supported   Not Supported
  43. HDF Kafka Processor Compatibility

     Kerberized interaction w/ Kafka       ConsumeKafka (2 sets)   PublishKafka (2 sets)
     Kafka broker 0.8 (HDP 2.3.2)          Not Supported           Not Supported
     Kafka broker 0.9/0.10 (HDP 2.3.4 +)   Supported               Supported
     Kafka broker 0.8 (Apache)             N/A                     N/A
     Kafka broker 0.9/0.10 (Apache)        Supported               Supported

     Non-Kerberized interaction w/ Kafka   ConsumeKafka (2 sets)   PublishKafka (2 sets)
     Kafka broker 0.8 (HDP 2.3.2)          Not Supported           Not Supported
     Kafka broker 0.9/0.10 (HDP 2.3.4 +)   Supported               Supported
     Kafka broker 0.8 (Apache)             Not Supported           Not Supported
     Kafka broker 0.9/0.10 (Apache)        Supported               Supported

     SSL interaction w/ Kafka              ConsumeKafka (2 sets)   PublishKafka (2 sets)
     Kafka broker 0.8 (HDP 2.3.2)          N/A                     N/A
     Kafka broker 0.9/0.10 (HDP 2.3.4 +)   Supported               Supported
     Kafka broker 0.8 (Apache)             N/A                     N/A
     Kafka broker 0.9/0.10 (Apache)        Supported               Supported
