SlideShare ist ein Scribd-Unternehmen logo
1 von 36
Apache NiFi in the
Hadoop Ecosystem
Bryan Bende – Member of Technical Staff
Hadoop Summit 2016 Dublin
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Outline
• What is Apache NiFi?
• Hadoop Ecosystem Integrations
• Use-Case/Architecture Discussion
• Future Work
3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
About Me
• Member of Technical Staff on Hortonworks DataFlow Team
• Apache NiFi Committer & PMC Member
• Integrations for HBase, Solr, Syslog, Stream Processing
• Twitter: @bbende / Blog: bryanbende.com
4 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
What is Apache NiFi?
5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Simplistic View of Enterprise Data Flow
Data Flow
Process and Analyze
Data
Acquire Data
Store Data
6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Different organizations/business units across different geographic locations…
Realistic View of Enterprise Data Flow
7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Interacting with different business partners and customers
Realistic View of Enterprise Data Flow
8 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Apache NiFi
• Created to address the challenges of global enterprise dataflow
• Key features:
– Visual Command and Control
– Data Lineage (Provenance)
– Data Prioritization
– Data Buffering/Back-Pressure
– Control Latency vs. Throughput
– Secure Control Plane / Data Plane
– Scale Out Clustering
– Extensibility
9 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Apache NiFi
What is Apache NiFi used for?
• Reliable and secure transfer of data between systems
• Delivery of data from sources to analytic platforms
• Enrichment and preparation of data:
– Conversion between formats
– Extraction/Parsing
– Routing decisions
What is Apache NiFi NOT used for?
• Distributed Computation
• Complex Event Processing
• Joins / Complex Rolling Window Operations
10 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Hadoop Ecosystem Integrations
11 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
HDFS Ingest
MergeContent
• Merges into appropriately sized files for HDFS
• Based on size, number of messages, and time
UpdateAttribute
• Sets the HDFS directory and filename
• Use expression language to dynamically bin by date:
/data/${now():format('yyyy/MM/dd/HH')}/
PutHDFS
• Writes FlowFile content to HDFS
• Supports Conflict Resolution Strategy and Kerberos
authentication
12 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
HDFS Retrieval
ListHDFS
• Periodically perform listing on HDFS directory
• Produces FlowFile per HDFS file
• Flow only contains HDFS path & filename
FetchHDFS
• Retrieves a file from HDFS
• Use incoming FlowFiles to dynamically fetch:
HDFS Filename: ${path}/${filename}
13 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
HDFS Retrieval in a Cluster
NCM
Node 1 (Primary)
ListHDFS
HDFS
FetchHDFS
RPG
Input Port
Node 2
ListHDFS
FetchHDFS
RPG
Input Port
• Perform “list” operation on
primary node
• Send results to Remote
Process Group pointing back
to same cluster
• Redistributes results to all
nodes to perform “fetch” in
parallel
• Same approach for ListFile +
FetchFile and ListSFTP +
FetchSFTP
14 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
HBase Integration
• ControllerService wrapping HBase Client
• Implementation provided for HBase 1.1.2 Client
• Other implementations could be built as an extension
15 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
HBase Ingest – Single Cell
• Table, Row Id, Col Family, and Col Qualifier
provided in processor, or dynamically from
attributes
• FlowFile content becomes the cell value
• Batch Size to specify maximum number of cells
for a single ‘put’ operation
16 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
HBase Ingest – Full Row
• Table and Column Family provided in
processor, or dynamically from attributes
• Row ID can be a field in JSON, or a FlowFile
attribute
• JSON Field/Values become Column Qualifiers
and Values
• Complex Field Strategy
– Fail
– Warn
– Ignore
– Text
17 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Kafka Integration
PutKafka
• Provide Broker and Topic Name
• Publishes FlowFile content as one or more messages
• Ability to send large delimited content, split into
messages by NiFi
GetKafka
• Provide ZK Connection String and Topic Name
• Produces a FlowFile for each message consumed
18 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Stream Processing Integration
• Stream processing systems
generally pull data, then push
results
• NiFi Site-To-Site pushes and pulls
between NiFi instances
• The Site-To-Site Client can be used
from a stream processing platform
https://github.com/apache/nifi/tree/master/nifi
-commons/nifi-site-to-site-client
Stream Processing
Site-To-Site Client
NiFi
Output Port
Stream Processing
Site-To-Site Client
NiFi
Input Port
Pull
Push
19 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Site-to-Site Client Overview
• Push to Input Port, or Pull from Output Port
• Communicate with NiFi clusters, or standalone instances
• Handles load balancing and reliable delivery
• Secure connections using certificates (optional)
SiteToSiteClientConfig clientConfig =
new SiteToSiteClient.Builder()
.url(“http://localhost:8080/nifi”)
.portName(”My Port”)
.buildConfig();
20 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Site-To-Site Client Pulling
SiteToSiteClient client = ...
Transaction transaction =
client.createTransaction(TransferDirection.RECEIVE);
DataPacket dataPacket = transaction.receive();
while (dataPacket != null) {
...
}
transaction.confirm();
transaction.complete();
21 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Site-To-Site Client Pushing
SiteToSiteClient client = ...
Transaction transaction =
client.createTransaction(TransferDirection.SEND);
NiFiDataPacket data = ...
transaction.send(data.getContent(), data.getAttributes());
transaction.confirm();
transaction.complete();
22 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Current Stream Processing Integrations
Spark Streaming - NiFi Spark Receiver
• https://github.com/apache/nifi/tree/master/nifi-external/nifi-spark-receiver
Storm – NiFi Spout
• https://github.com/apache/nifi/tree/master/nifi-external/nifi-storm-spout
Flink – NiFi Source & Sink
• https://github.com/apache/flink/tree/master/flink-streaming-connectors/flink-connector-nifi
Apex - NiFi Input Operators & Output Operators
• https://github.com/apache/incubator-apex-
malhar/tree/master/contrib/src/main/java/com/datatorrent/contrib/nifi
23 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Other Relevant Integrations
• GetSolr, PutSolrContentStream
• FetchElasticSearch, PutElasticSearch
• GetMongo, PutMongo
• QueryCassandra, PutCassandraQL
• GetCouchbaseKey, PutCouchbaseKey
• QueryDatabaseTable, ExecuteSQL, PutSQL
• GetSplunk, PutSplunk
• And more! https://nifi.apache.org/docs.html
24 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Use-Case/Architecture Discussion
25 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Drive Data to Core for Analysis
NiFi
Stream
Processing
NiFi
NiFi
• Drive data from sources to central data center for analysis
• Tiered collection approach at various locations, think regional data centers
Edge
Edge
Core
Batch
Analytics
26 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Dynamically Adjusting Data Flows
• Push analytic results back to core NiFi
• Push results back to edge locations/devices to change behavior
NiFi
NiFi
NiFi
Edge
Edge
Core
Batch
Analytics
Stream
Processing
27 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
1. Logs filtered by level and sent from Edge -> Core
2. Stream processing produces new filters based on rate & sends back to core
3. Edge polls core for new filter levels & updates filtering
Example: Dynamic Log Collection
Core NiFi
Streaming
Processing
Edge NiFi
Logs Logs
New Filters
Logs Output Log Input Log Output
Result Input Store Result
Service Fetch ResultPoll Service
Filter
New Filters
New
Filters
Poll
Analytic
28 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Dynamic Log Collection – Edge NiFi
29 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Dynamic Log Collection – Core NiFi
30 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Dynamic Log Collection Summary
NiFi
NiFi
NiFi
Edge
Edge
Core
Stream
Processing
Logs
Logs
Logs
New FiltersNew Filters
New Filters
31 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Future Work
32 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
The Future – Ecosystem Integrations
• Ambari
– Support a fully managed NiFi Cluster through Ambari
– Monitoring, management, upgrades, etc.
• Ranger
– Ability to delegate authorization decisions to Ranger
– Manage authorization controls through Ranger
• Atlas
– Track lineage from the source to destination
– Apply tags to data as its acquired
33 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
The Future – Apache NiFi
• HA Control Plane
– Zero Master cluster, Web UI accessible from any node
– Auto-Election of “Cluster Coordinator” and “Primary Node” through ZooKeeper
• HA Data Plane
– Ability to replicate data across nodes in a cluster
• Multi-Tenancy
– Restrict Access to portions of a flow
– Allow people/groups with in an organization to only access their portions of the flow
• Extension Registry
– Create a central repository of NARs and Templates
– Move most NARs out of Apache NiFi distribution, ship with a minimal set
34 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
The Future – Apache NiFi
• Variable Registry
– Define environment specific variables through the UI, reference through EL
– Make templates more portable across environments/instances
• Redesign of User Interface
– Modernize look & feel, improve usability, support multi-tenancy
• Continued Development of Integration Points
– New processors added continuously!
• MiNiFi
– Complimentary data collection agent to NiFi’s current approach
– Small, lightweight, centrally managed agent that integrates with NiFi for follow-on dataflow
management
35 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Thanks!
• Questions?
• Contact Info:
– Email: bbende@hortonworks.com
– Twitter: @bbende
36 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Thank you

Weitere ähnliche Inhalte

Was ist angesagt?

Terminology, value-sets, codesystems by Lloyd McKenzie
Terminology, value-sets, codesystems by Lloyd McKenzieTerminology, value-sets, codesystems by Lloyd McKenzie
Terminology, value-sets, codesystems by Lloyd McKenzieFHIR Developer Days
 
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...Hortonworks
 
Reference design for v mware nsx
Reference design for v mware nsxReference design for v mware nsx
Reference design for v mware nsxsolarisyougood
 
Running Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration OptionsRunning Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration OptionsTimothy Spann
 
File Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & ParquetFile Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & ParquetOwen O'Malley
 
Best practices and lessons learnt from Running Apache NiFi at Renault
Best practices and lessons learnt from Running Apache NiFi at RenaultBest practices and lessons learnt from Running Apache NiFi at Renault
Best practices and lessons learnt from Running Apache NiFi at RenaultDataWorks Summit
 
CoC23_ Looking at the New Features of Apache NiFi
CoC23_ Looking at the New Features of Apache NiFiCoC23_ Looking at the New Features of Apache NiFi
CoC23_ Looking at the New Features of Apache NiFiTimothy Spann
 
Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Sp...
Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Sp...Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Sp...
Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Sp...Timothy Spann
 
The Apache Spark File Format Ecosystem
The Apache Spark File Format EcosystemThe Apache Spark File Format Ecosystem
The Apache Spark File Format EcosystemDatabricks
 
FIWARE: Managing Context Information at large scale
FIWARE: Managing Context Information at large scaleFIWARE: Managing Context Information at large scale
FIWARE: Managing Context Information at large scaleFermin Galan
 
Distributed Tracing for Kafka with OpenTelemetry with Daniel Kim | Kafka Summ...
Distributed Tracing for Kafka with OpenTelemetry with Daniel Kim | Kafka Summ...Distributed Tracing for Kafka with OpenTelemetry with Daniel Kim | Kafka Summ...
Distributed Tracing for Kafka with OpenTelemetry with Daniel Kim | Kafka Summ...HostedbyConfluent
 
Ozone and HDFS’s evolution
Ozone and HDFS’s evolutionOzone and HDFS’s evolution
Ozone and HDFS’s evolutionDataWorks Summit
 
Adopting OpenTelemetry
Adopting OpenTelemetryAdopting OpenTelemetry
Adopting OpenTelemetryVincent Behar
 
Hadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayHadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayDataWorks Summit
 

Was ist angesagt? (20)

Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
 
Apache Nifi Crash Course
Apache Nifi Crash CourseApache Nifi Crash Course
Apache Nifi Crash Course
 
Terminology, value-sets, codesystems by Lloyd McKenzie
Terminology, value-sets, codesystems by Lloyd McKenzieTerminology, value-sets, codesystems by Lloyd McKenzie
Terminology, value-sets, codesystems by Lloyd McKenzie
 
Apache NiFi Crash Course Intro
Apache NiFi Crash Course IntroApache NiFi Crash Course Intro
Apache NiFi Crash Course Intro
 
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
 
Reference design for v mware nsx
Reference design for v mware nsxReference design for v mware nsx
Reference design for v mware nsx
 
Running Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration OptionsRunning Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration Options
 
File Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & ParquetFile Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & Parquet
 
Best practices and lessons learnt from Running Apache NiFi at Renault
Best practices and lessons learnt from Running Apache NiFi at RenaultBest practices and lessons learnt from Running Apache NiFi at Renault
Best practices and lessons learnt from Running Apache NiFi at Renault
 
CoC23_ Looking at the New Features of Apache NiFi
CoC23_ Looking at the New Features of Apache NiFiCoC23_ Looking at the New Features of Apache NiFi
CoC23_ Looking at the New Features of Apache NiFi
 
Ceph as software define storage
Ceph as software define storageCeph as software define storage
Ceph as software define storage
 
Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Sp...
Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Sp...Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Sp...
Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Sp...
 
The Apache Spark File Format Ecosystem
The Apache Spark File Format EcosystemThe Apache Spark File Format Ecosystem
The Apache Spark File Format Ecosystem
 
FIWARE: Managing Context Information at large scale
FIWARE: Managing Context Information at large scaleFIWARE: Managing Context Information at large scale
FIWARE: Managing Context Information at large scale
 
Distributed Tracing for Kafka with OpenTelemetry with Daniel Kim | Kafka Summ...
Distributed Tracing for Kafka with OpenTelemetry with Daniel Kim | Kafka Summ...Distributed Tracing for Kafka with OpenTelemetry with Daniel Kim | Kafka Summ...
Distributed Tracing for Kafka with OpenTelemetry with Daniel Kim | Kafka Summ...
 
Ozone and HDFS’s evolution
Ozone and HDFS’s evolutionOzone and HDFS’s evolution
Ozone and HDFS’s evolution
 
Adopting OpenTelemetry
Adopting OpenTelemetryAdopting OpenTelemetry
Adopting OpenTelemetry
 
Hadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayHadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox Gateway
 
Apache Flink Deep Dive
Apache Flink Deep DiveApache Flink Deep Dive
Apache Flink Deep Dive
 
Integrating Apache Spark and NiFi for Data Lakes
Integrating Apache Spark and NiFi for Data LakesIntegrating Apache Spark and NiFi for Data Lakes
Integrating Apache Spark and NiFi for Data Lakes
 

Andere mochten auch

Apache NiFi- MiNiFi meetup Slides
Apache NiFi- MiNiFi meetup SlidesApache NiFi- MiNiFi meetup Slides
Apache NiFi- MiNiFi meetup SlidesIsheeta Sanghi
 
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big DataHortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big DataMats Johansson
 
Taking DataFlow Management to the Edge with Apache NiFi/MiNiFi
Taking DataFlow Management to the Edge with Apache NiFi/MiNiFiTaking DataFlow Management to the Edge with Apache NiFi/MiNiFi
Taking DataFlow Management to the Edge with Apache NiFi/MiNiFiBryan Bende
 
Real-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiReal-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiManish Gupta
 
Apache NiFi Toronto Meetup
Apache NiFi Toronto MeetupApache NiFi Toronto Meetup
Apache NiFi Toronto MeetupHortonworks
 
Beyond Messaging Enterprise Dataflow powered by Apache NiFi
Beyond Messaging Enterprise Dataflow powered by Apache NiFiBeyond Messaging Enterprise Dataflow powered by Apache NiFi
Beyond Messaging Enterprise Dataflow powered by Apache NiFiIsheeta Sanghi
 
HDF Powered by Apache NiFi Introduction
HDF Powered by Apache NiFi IntroductionHDF Powered by Apache NiFi Introduction
HDF Powered by Apache NiFi IntroductionMilind Pandit
 
Talendビッグデータインテグレーション製品ご紹介
Talendビッグデータインテグレーション製品ご紹介Talendビッグデータインテグレーション製品ご紹介
Talendビッグデータインテグレーション製品ご紹介Talend KK
 
Discover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
Discover HDP2.1: Apache Storm for Stream Data Processing in HadoopDiscover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
Discover HDP2.1: Apache Storm for Stream Data Processing in HadoopHortonworks
 
Introduction to Apache NiFi - Seattle Scalability Meetup
Introduction to Apache NiFi - Seattle Scalability MeetupIntroduction to Apache NiFi - Seattle Scalability Meetup
Introduction to Apache NiFi - Seattle Scalability MeetupSaptak Sen
 
NJ Hadoop Meetup - Apache NiFi Deep Dive
NJ Hadoop Meetup - Apache NiFi Deep DiveNJ Hadoop Meetup - Apache NiFi Deep Dive
NJ Hadoop Meetup - Apache NiFi Deep DiveBryan Bende
 
Introduction to Spark Streaming
Introduction to Spark StreamingIntroduction to Spark Streaming
Introduction to Spark Streamingdatamantra
 
Design a Dataflow in 7 minutes with Apache NiFi/HDF
Design a Dataflow in 7 minutes with Apache NiFi/HDFDesign a Dataflow in 7 minutes with Apache NiFi/HDF
Design a Dataflow in 7 minutes with Apache NiFi/HDFHortonworks
 
HDF: Hortonworks DataFlow: Technical Workshop
HDF: Hortonworks DataFlow: Technical WorkshopHDF: Hortonworks DataFlow: Technical Workshop
HDF: Hortonworks DataFlow: Technical WorkshopHortonworks
 
Reference architecture for Internet of Things
Reference architecture for Internet of ThingsReference architecture for Internet of Things
Reference architecture for Internet of ThingsSujee Maniyam
 

Andere mochten auch (20)

Apache NiFi- MiNiFi meetup Slides
Apache NiFi- MiNiFi meetup SlidesApache NiFi- MiNiFi meetup Slides
Apache NiFi- MiNiFi meetup Slides
 
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big DataHortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data
 
Taking DataFlow Management to the Edge with Apache NiFi/MiNiFi
Taking DataFlow Management to the Edge with Apache NiFi/MiNiFiTaking DataFlow Management to the Edge with Apache NiFi/MiNiFi
Taking DataFlow Management to the Edge with Apache NiFi/MiNiFi
 
Real-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiReal-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFi
 
Aliran sesat
Aliran sesatAliran sesat
Aliran sesat
 
Apache NiFi Toronto Meetup
Apache NiFi Toronto MeetupApache NiFi Toronto Meetup
Apache NiFi Toronto Meetup
 
Beyond Messaging Enterprise Dataflow powered by Apache NiFi
Beyond Messaging Enterprise Dataflow powered by Apache NiFiBeyond Messaging Enterprise Dataflow powered by Apache NiFi
Beyond Messaging Enterprise Dataflow powered by Apache NiFi
 
Hbase trabalho final
Hbase trabalho finalHbase trabalho final
Hbase trabalho final
 
HDF Powered by Apache NiFi Introduction
HDF Powered by Apache NiFi IntroductionHDF Powered by Apache NiFi Introduction
HDF Powered by Apache NiFi Introduction
 
Talendビッグデータインテグレーション製品ご紹介
Talendビッグデータインテグレーション製品ご紹介Talendビッグデータインテグレーション製品ご紹介
Talendビッグデータインテグレーション製品ご紹介
 
Spark streaming
Spark streamingSpark streaming
Spark streaming
 
Discover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
Discover HDP2.1: Apache Storm for Stream Data Processing in HadoopDiscover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
Discover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
 
Nifi workshop
Nifi workshopNifi workshop
Nifi workshop
 
Introduction to Apache NiFi - Seattle Scalability Meetup
Introduction to Apache NiFi - Seattle Scalability MeetupIntroduction to Apache NiFi - Seattle Scalability Meetup
Introduction to Apache NiFi - Seattle Scalability Meetup
 
NJ Hadoop Meetup - Apache NiFi Deep Dive
NJ Hadoop Meetup - Apache NiFi Deep DiveNJ Hadoop Meetup - Apache NiFi Deep Dive
NJ Hadoop Meetup - Apache NiFi Deep Dive
 
Introduction to Spark Streaming
Introduction to Spark StreamingIntroduction to Spark Streaming
Introduction to Spark Streaming
 
Apache NiFi 1.0 in Nutshell
Apache NiFi 1.0 in NutshellApache NiFi 1.0 in Nutshell
Apache NiFi 1.0 in Nutshell
 
Design a Dataflow in 7 minutes with Apache NiFi/HDF
Design a Dataflow in 7 minutes with Apache NiFi/HDFDesign a Dataflow in 7 minutes with Apache NiFi/HDF
Design a Dataflow in 7 minutes with Apache NiFi/HDF
 
HDF: Hortonworks DataFlow: Technical Workshop
HDF: Hortonworks DataFlow: Technical WorkshopHDF: Hortonworks DataFlow: Technical Workshop
HDF: Hortonworks DataFlow: Technical Workshop
 
Reference architecture for Internet of Things
Reference architecture for Internet of ThingsReference architecture for Internet of Things
Reference architecture for Internet of Things
 

Ähnlich wie Apache NiFi in the Hadoop Ecosystem

State of the Apache NiFi Ecosystem & Community
State of the Apache NiFi Ecosystem & CommunityState of the Apache NiFi Ecosystem & Community
State of the Apache NiFi Ecosystem & CommunityAccumulo Summit
 
Future of Data New Jersey - HDF 3.0 Deep Dive
Future of Data New Jersey - HDF 3.0 Deep DiveFuture of Data New Jersey - HDF 3.0 Deep Dive
Future of Data New Jersey - HDF 3.0 Deep DiveAldrin Piri
 
Mission to NARs with Apache NiFi
Mission to NARs with Apache NiFiMission to NARs with Apache NiFi
Mission to NARs with Apache NiFiHortonworks
 
Data Governance in Apache Falcon - Hadoop Summit Brussels 2015
Data Governance in Apache Falcon - Hadoop Summit Brussels 2015 Data Governance in Apache Falcon - Hadoop Summit Brussels 2015
Data Governance in Apache Falcon - Hadoop Summit Brussels 2015 Seetharam Venkatesh
 
Driving Enterprise Data Governance for Big Data Systems through Apache Falcon
Driving Enterprise Data Governance for Big Data Systems through Apache FalconDriving Enterprise Data Governance for Big Data Systems through Apache Falcon
Driving Enterprise Data Governance for Big Data Systems through Apache FalconDataWorks Summit
 
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...Data Con LA
 
Dataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFiDataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFiDataWorks Summit
 
The Avant-garde of Apache NiFi
The Avant-garde of Apache NiFiThe Avant-garde of Apache NiFi
The Avant-garde of Apache NiFiJoe Percivall
 
Data Con LA 2018 - Streaming and IoT by Pat Alwell
Data Con LA 2018 - Streaming and IoT by Pat AlwellData Con LA 2018 - Streaming and IoT by Pat Alwell
Data Con LA 2018 - Streaming and IoT by Pat AlwellData Con LA
 
Connecting the Drops with Apache NiFi & Apache MiNiFi
Connecting the Drops with Apache NiFi & Apache MiNiFiConnecting the Drops with Apache NiFi & Apache MiNiFi
Connecting the Drops with Apache NiFi & Apache MiNiFiDataWorks Summit
 
Intro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache ZeppelinIntro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache ZeppelinAlex Zeltov
 
Integrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache FlinkIntegrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache FlinkIsheeta Sanghi
 
Integrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache FlinkIntegrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache FlinkIsheeta Sanghi
 
Integrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache FlinkIntegrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache FlinkHortonworks
 
Integrating NiFi and Flink
Integrating NiFi and FlinkIntegrating NiFi and Flink
Integrating NiFi and FlinkBryan Bende
 
Integrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache FlinkIntegrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache FlinkIsheeta Sanghi
 
Curb your insecurity with HDP - Tips for a Secure Cluster
Curb your insecurity with HDP - Tips for a Secure ClusterCurb your insecurity with HDP - Tips for a Secure Cluster
Curb your insecurity with HDP - Tips for a Secure Clusterahortonworks
 
Hadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in ProductionHadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in ProductionDataWorks Summit/Hadoop Summit
 
Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...
Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...
Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...huguk
 

Ähnlich wie Apache NiFi in the Hadoop Ecosystem (20)

State of the Apache NiFi Ecosystem & Community
State of the Apache NiFi Ecosystem & CommunityState of the Apache NiFi Ecosystem & Community
State of the Apache NiFi Ecosystem & Community
 
Future of Data New Jersey - HDF 3.0 Deep Dive
Future of Data New Jersey - HDF 3.0 Deep DiveFuture of Data New Jersey - HDF 3.0 Deep Dive
Future of Data New Jersey - HDF 3.0 Deep Dive
 
Mission to NARs with Apache NiFi
Mission to NARs with Apache NiFiMission to NARs with Apache NiFi
Mission to NARs with Apache NiFi
 
Data Governance in Apache Falcon - Hadoop Summit Brussels 2015
Data Governance in Apache Falcon - Hadoop Summit Brussels 2015 Data Governance in Apache Falcon - Hadoop Summit Brussels 2015
Data Governance in Apache Falcon - Hadoop Summit Brussels 2015
 
Driving Enterprise Data Governance for Big Data Systems through Apache Falcon
Driving Enterprise Data Governance for Big Data Systems through Apache FalconDriving Enterprise Data Governance for Big Data Systems through Apache Falcon
Driving Enterprise Data Governance for Big Data Systems through Apache Falcon
 
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
 
Dataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFiDataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFi
 
The Avant-garde of Apache NiFi
The Avant-garde of Apache NiFiThe Avant-garde of Apache NiFi
The Avant-garde of Apache NiFi
 
The Avant-garde of Apache NiFi
The Avant-garde of Apache NiFiThe Avant-garde of Apache NiFi
The Avant-garde of Apache NiFi
 
Data Con LA 2018 - Streaming and IoT by Pat Alwell
Data Con LA 2018 - Streaming and IoT by Pat AlwellData Con LA 2018 - Streaming and IoT by Pat Alwell
Data Con LA 2018 - Streaming and IoT by Pat Alwell
 
Connecting the Drops with Apache NiFi & Apache MiNiFi
Connecting the Drops with Apache NiFi & Apache MiNiFiConnecting the Drops with Apache NiFi & Apache MiNiFi
Connecting the Drops with Apache NiFi & Apache MiNiFi
 
Intro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache ZeppelinIntro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache Zeppelin
 
Integrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache FlinkIntegrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache Flink
 
Integrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache FlinkIntegrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache Flink
 
Integrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache FlinkIntegrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache Flink
 
Integrating NiFi and Flink
Integrating NiFi and FlinkIntegrating NiFi and Flink
Integrating NiFi and Flink
 
Integrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache FlinkIntegrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache Flink
 
Curb your insecurity with HDP - Tips for a Secure Cluster
Curb your insecurity with HDP - Tips for a Secure ClusterCurb your insecurity with HDP - Tips for a Secure Cluster
Curb your insecurity with HDP - Tips for a Secure Cluster
 
Hadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in ProductionHadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in Production
 
Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...
Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...
Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...
 

Mehr von Bryan Bende

Apache NiFi SDLC Improvements
Apache NiFi SDLC ImprovementsApache NiFi SDLC Improvements
Apache NiFi SDLC ImprovementsBryan Bende
 
Apache NiFi Meetup - Introduction to NiFi Registry
Apache NiFi Meetup - Introduction to NiFi RegistryApache NiFi Meetup - Introduction to NiFi Registry
Apache NiFi Meetup - Introduction to NiFi RegistryBryan Bende
 
Devnexus 2018 - Let Your Data Flow with Apache NiFi
Devnexus 2018 - Let Your Data Flow with Apache NiFiDevnexus 2018 - Let Your Data Flow with Apache NiFi
Devnexus 2018 - Let Your Data Flow with Apache NiFiBryan Bende
 
You Can't Search Without Data
You Can't Search Without DataYou Can't Search Without Data
You Can't Search Without DataBryan Bende
 
Apache NiFi Record Processing
Apache NiFi Record ProcessingApache NiFi Record Processing
Apache NiFi Record ProcessingBryan Bende
 
Integrating NiFi and Apex
Integrating NiFi and ApexIntegrating NiFi and Apex
Integrating NiFi and ApexBryan Bende
 
Building Data Pipelines for Solr with Apache NiFi
Building Data Pipelines for Solr with Apache NiFiBuilding Data Pipelines for Solr with Apache NiFi
Building Data Pipelines for Solr with Apache NiFiBryan Bende
 
Document Similarity with Cloud Computing
Document Similarity with Cloud ComputingDocument Similarity with Cloud Computing
Document Similarity with Cloud ComputingBryan Bende
 
Real-Time Inverted Search NYC ASLUG Oct 2014
Real-Time Inverted Search NYC ASLUG Oct 2014Real-Time Inverted Search NYC ASLUG Oct 2014
Real-Time Inverted Search NYC ASLUG Oct 2014Bryan Bende
 

Mehr von Bryan Bende (9)

Apache NiFi SDLC Improvements
Apache NiFi SDLC ImprovementsApache NiFi SDLC Improvements
Apache NiFi SDLC Improvements
 
Apache NiFi Meetup - Introduction to NiFi Registry
Apache NiFi Meetup - Introduction to NiFi RegistryApache NiFi Meetup - Introduction to NiFi Registry
Apache NiFi Meetup - Introduction to NiFi Registry
 
Devnexus 2018 - Let Your Data Flow with Apache NiFi
Devnexus 2018 - Let Your Data Flow with Apache NiFiDevnexus 2018 - Let Your Data Flow with Apache NiFi
Devnexus 2018 - Let Your Data Flow with Apache NiFi
 
You Can't Search Without Data
You Can't Search Without DataYou Can't Search Without Data
You Can't Search Without Data
 
Apache NiFi Record Processing
Apache NiFi Record ProcessingApache NiFi Record Processing
Apache NiFi Record Processing
 
Integrating NiFi and Apex
Integrating NiFi and ApexIntegrating NiFi and Apex
Integrating NiFi and Apex
 
Building Data Pipelines for Solr with Apache NiFi
Building Data Pipelines for Solr with Apache NiFiBuilding Data Pipelines for Solr with Apache NiFi
Building Data Pipelines for Solr with Apache NiFi
 
Document Similarity with Cloud Computing
Document Similarity with Cloud ComputingDocument Similarity with Cloud Computing
Document Similarity with Cloud Computing
 
Real-Time Inverted Search NYC ASLUG Oct 2014
Real-Time Inverted Search NYC ASLUG Oct 2014Real-Time Inverted Search NYC ASLUG Oct 2014
Real-Time Inverted Search NYC ASLUG Oct 2014
 

Kürzlich hochgeladen

Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 

Kürzlich hochgeladen (20)

Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 

Apache NiFi in the Hadoop Ecosystem

  • 1. Apache NiFi in the Hadoop Ecosystem Bryan Bende – Member of Technical Staff Hadoop Summit 2016 Dublin
  • 2. 2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Outline • What is Apache NiFi? • Hadoop Ecosystem Integrations • Use-Case/Architecture Discussion • Future Work
  • 3. 3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved About Me • Member of Technical Staff on Hortonworks DataFlow Team • Apache NiFi Committer & PMC Member • Integrations for HBase, Solr, Syslog, Stream Processing • Twitter: @bbende / Blog: bryanbende.com
  • 4. 4 © Hortonworks Inc. 2011 – 2016. All Rights Reserved What is Apache NiFi?
  • 5. 5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Simplistic View of Enterprise Data Flow Data Flow Process and Analyze Data Acquire Data Store Data
  • 6. 6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Different organizations/business units across different geographic locations… Realistic View of Enterprise Data Flow
  • 7. 7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Interacting with different business partners and customers Realistic View of Enterprise Data Flow
  • 8. 8 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Apache NiFi • Created to address the challenges of global enterprise dataflow • Key features: – Visual Command and Control – Data Lineage (Provenance) – Data Prioritization – Data Buffering/Back-Pressure – Control Latency vs. Throughput – Secure Control Plane / Data Plane – Scale Out Clustering – Extensibility
  • 9. 9 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Apache NiFi What is Apache NiFi used for? • Reliable and secure transfer of data between systems • Delivery of data from sources to analytic platforms • Enrichment and preparation of data: – Conversion between formats – Extraction/Parsing – Routing decisions What is Apache NiFi NOT used for? • Distributed Computation • Complex Event Processing • Joins / Complex Rolling Window Operations
  • 10. 10 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Hadoop Ecosystem Integrations
  • 11. 11 © Hortonworks Inc. 2011 – 2016. All Rights Reserved HDFS Ingest MergeContent • Merges into appropriately sized files for HDFS • Based on size, number of messages, and time UpdateAttribute • Sets the HDFS directory and filename • Use expression language to dynamically bin by date: /data/${now():format('yyyy/MM/dd/HH')}/ PutHDFS • Writes FlowFile content to HDFS • Supports Conflict Resolution Strategy and Kerberos authentication
  • 12. 12 © Hortonworks Inc. 2011 – 2016. All Rights Reserved HDFS Retrieval ListHDFS • Periodically perform listing on HDFS directory • Produces FlowFile per HDFS file • Flow only contains HDFS path & filename FetchHDFS • Retrieves a file from HDFS • Use incoming FlowFiles to dynamically fetch: HDFS Filename: ${path}/${filename}
  • 13. 13 © Hortonworks Inc. 2011 – 2016. All Rights Reserved HDFS Retrieval in a Cluster NCM Node 1 (Primary) ListHDFS HDFS FetchHDFS RPG Input Port Node 2 ListHDFS FetchHDFS RPG Input Port • Perform “list” operation on primary node • Send results to Remote Process Group pointing back to same cluster • Redistributes results to all nodes to perform “fetch” in parallel • Same approach for ListFile + FetchFile and ListSFTP + FetchSFTP
  • 14. 14 © Hortonworks Inc. 2011 – 2016. All Rights Reserved HBase Integration • ControllerService wrapping HBase Client • Implementation provided for HBase 1.1.2 Client • Other implementations could be built as an extension
  • 15. 15 © Hortonworks Inc. 2011 – 2016. All Rights Reserved HBase Ingest – Single Cell • Table, Row Id, Col Family, and Col Qualifier provided in processor, or dynamically from attributes • FlowFile content becomes the cell value • Batch Size to specify maximum number of cells for a single ‘put’ operation
  • 16. 16 © Hortonworks Inc. 2011 – 2016. All Rights Reserved HBase Ingest – Full Row • Table and Column Family provided in processor, or dynamically from attributes • Row ID can be a field in JSON, or a FlowFile attribute • JSON Field/Values become Column Qualifiers and Values • Complex Field Strategy – Fail – Warn – Ignore – Text
  • 17. 17 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Kafka Integration PutKafka • Provide Broker and Topic Name • Publishes FlowFile content as one or more messages • Ability to send large delimited content, split into messages by NiFi GetKafka • Provide ZK Connection String and Topic Name • Produces a FlowFile for each message consumed
  • 18. 18 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Stream Processing Integration • Stream processing systems generally pull data, then push results • NiFi Site-To-Site pushes and pulls between NiFi instances • The Site-To-Site Client can be used from a stream processing platform https://github.com/apache/nifi/tree/master/nifi -commons/nifi-site-to-site-client Stream Processing Site-To-Site Client NiFi Output Port Stream Processing Site-To-Site Client NiFi Input Port Pull Push
  • 19. 19 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Site-to-Site Client Overview • Push to Input Port, or Pull from Output Port • Communicate with NiFi clusters, or standalone instances • Handles load balancing and reliable delivery • Secure connections using certificates (optional) SiteToSiteClientConfig clientConfig = new SiteToSiteClient.Builder() .url(“http://localhost:8080/nifi”) .portName(”My Port”) .buildConfig();
  • 20. 20 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Site-To-Site Client Pulling SiteToSiteClient client = ... Transaction transaction = client.createTransaction(TransferDirection.RECEIVE); DataPacket dataPacket = transaction.receive(); while (dataPacket != null) { ... } transaction.confirm(); transaction.complete();
  • 21. 21 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Site-To-Site Client Pushing SiteToSiteClient client = ... Transaction transaction = client.createTransaction(TransferDirection.SEND); NiFiDataPacket data = ... transaction.send(data.getContent(), data.getAttributes()); transaction.confirm(); transaction.complete();
  • 22. 22 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Current Stream Processing Integrations Spark Streaming - NiFi Spark Receiver • https://github.com/apache/nifi/tree/master/nifi-external/nifi-spark-receiver Storm – NiFi Spout • https://github.com/apache/nifi/tree/master/nifi-external/nifi-storm-spout Flink – NiFi Source & Sink • https://github.com/apache/flink/tree/master/flink-streaming-connectors/flink-connector-nifi Apex - NiFi Input Operators & Output Operators • https://github.com/apache/incubator-apex- malhar/tree/master/contrib/src/main/java/com/datatorrent/contrib/nifi
  • 23. 23 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Other Relevant Integrations • GetSolr, PutSolrContentStream • FetchElasticSearch, PutElasticSearch • GetMongo, PutMongo • QueryCassandra, PutCassandraQL • GetCouchbaseKey, PutCouchbaseKey • QueryDatabaseTable, ExecuteSQL, PutSQL • GetSplunk, PutSplunk • And more! https://nifi.apache.org/docs.html
  • 24. 24 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Use-Case/Architecture Discussion
  • 25. 25 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Drive Data to Core for Analysis NiFi Stream Processing NiFi NiFi • Drive data from sources to central data center for analysis • Tiered collection approach at various locations, think regional data centers Edge Edge Core Batch Analytics
  • 26. 26 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Dynamically Adjusting Data Flows • Push analytic results back to core NiFi • Push results back to edge locations/devices to change behavior NiFi NiFi NiFi Edge Edge Core Batch Analytics Stream Processing
  • 27. 27 © Hortonworks Inc. 2011 – 2016. All Rights Reserved 1. Logs filtered by level and sent from Edge -> Core 2. Stream processing produces new filters based on rate & sends back to core 3. Edge polls core for new filter levels & updates filtering Example: Dynamic Log Collection Core NiFi Streaming Processing Edge NiFi Logs Logs New Filters Logs Output Log Input Log Output Result Input Store Result Service Fetch ResultPoll Service Filter New Filters New Filters Poll Analytic
  • 28. 28 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Dynamic Log Collection – Edge NiFi
  • 29. 29 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Dynamic Log Collection – Core NiFi
  • 30. 30 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Dynamic Log Collection Summary NiFi NiFi NiFi Edge Edge Core Stream Processing Logs Logs Logs New FiltersNew Filters New Filters
  • 31. 31 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Future Work
  • 32. 32 © Hortonworks Inc. 2011 – 2016. All Rights Reserved The Future – Ecosystem Integrations • Ambari – Support a fully managed NiFi Cluster through Ambari – Monitoring, management, upgrades, etc. • Ranger – Ability to delegate authorization decisions to Ranger – Manage authorization controls through Ranger • Atlas – Track lineage from the source to destination – Apply tags to data as its acquired
  • 33. 33 © Hortonworks Inc. 2011 – 2016. All Rights Reserved The Future – Apache NiFi • HA Control Plane – Zero Master cluster, Web UI accessible from any node – Auto-Election of “Cluster Coordinator” and “Primary Node” through ZooKeeper • HA Data Plane – Ability to replicate data across nodes in a cluster • Multi-Tenancy – Restrict Access to portions of a flow – Allow people/groups with in an organization to only access their portions of the flow • Extension Registry – Create a central repository of NARs and Templates – Move most NARs out of Apache NiFi distribution, ship with a minimal set
  • 34. 34 © Hortonworks Inc. 2011 – 2016. All Rights Reserved The Future – Apache NiFi • Variable Registry – Define environment specific variables through the UI, reference through EL – Make templates more portable across environments/instances • Redesign of User Interface – Modernize look & feel, improve usability, support multi-tenancy • Continued Development of Integration Points – New processors added continuously! • MiNiFi – Complimentary data collection agent to NiFi’s current approach – Small, lightweight, centrally managed agent that integrates with NiFi for follow-on dataflow management
  • 35. 35 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Thanks! • Questions? • Contact Info: – Email: bbende@hortonworks.com – Twitter: @bbende
  • 36. 36 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Thank you