SlideShare ist ein Scribd-Unternehmen logo
1 von 22
Downloaden Sie, um offline zu lesen
Introduction to Apache NiFi
Timothy Spann
Field Engineer
tspann@cloudera.com
@PaasDev
2Š Cloudera, Inc. All rights reserved.
FLOW MANAGEMENT POWERED BY APACHE NIFI
• Ingestion: connectors to read/write
data from/to several data sources
• Transformation:
• Format conversion
• Compression/decompression,
Merge, Split, encryption, etc..
• Data enrichment
• Attribute, content, rules, etc…
• Routing
• Priority, dynamic/static, based on
content or metadata, etc…
• Parsing
3Š Cloudera, Inc. All rights reserved.
APACHE NIFI HIGH LEVEL CAPABILITIES
• Web-based user interface
• Design, control, feedback & monitoring
• Highly configurable
• Loss tolerant vs guaranteed delivery
• Low latency vs high throughput
• Dynamic prioritization
• Flow can be modified at runtime
• Back pressure
• Data provenance
• Track dataflow from beginning to end
• Designed for extension
• Build your own processors
• Secure
• SSL, SSH, HTTPS, etc.
• Guaranteed delivery
• Data buffering
- Backpressure
- Pressure release
• Prioritized queuing
• Flow specific QoS
- Latency vs. throughput
- Loss tolerance
• Data provenance
• Supports push and
pull models
• Hundreds of processors
• Visual command and
control
• Flow templates
• Pluggable/multi-role
security
• Designed for extension
• Clustering
• Version Control
Why Apache NiFi?
5Š Cloudera, Inc. All rights reserved.
285+ PROCESSORS FOR DEEPER ECOSYSTEM INTEGRATION
Hash
Extract
Merge
Duplicate
Scan
GeoEnrich
Replace
ConvertSplit
Translate
Route Content
Route Context
Route Text
Control Rate
Distribute Load
Generate Table Fetch
Jolt Transform JSON
Prioritized Delivery
Encrypt
Tail
Evaluate
Execute
All Apache project logos are trademarks of the ASF and the respective
projects.
Fetch
HTTP
Syslog
Email
HTML
Image
HL7
FTP
UDP
XML
SFTP
AMQP
WebSocket
6Š Cloudera, Inc. All rights reserved.
Apache NiFi 1.9 Features
Key New Features
• Apache Kafka 2.0 support
• Apache Hive 3.1.0 support
• Connection load balancing
• MQTT Performance improvements
Updated to 1.9.0
FLOW FILES ARE LIKE HTTP DATA
HTTP Data FlowFile
HTTP/1.1 200 OK
Date: Sun, 10 Oct 2010 23:26:07 GMT
Server: Apache/2.2.8 (CentOS) OpenSSL/0.9.8g
Last-Modified: Sun, 26 Sep 2010 22:04:35 GMT
ETag: "45b6-834-49130cc1182c0"
Accept-Ranges: bytes
Content-Length: 13
Connection: close
Content-Type: text/html
Hello world!
Standard FlowFile Attributes
Key: 'entryDate’ Value: 'Fri Jun 17 17:15:04 EDT
2016'
Key: 'lineageStartDate’ Value: 'Fri Jun 17 17:15:04 EDT
2016'
Key: 'fileSize’ Value: '23609'
FlowFile Attribute Map Content
Key: 'filename’ Value: '15650246997242'
Key: 'path’Value: './’
Binary Content *
Header
Content
SQL BASED ROUTING WITH NiFi’s QueryRecord Processor
• QueryRecord Processor-
Executes a SQL statement
against records and writes the
results to the flow file content.
• CSVReader: Looking up schema
from SR, it will converts CSV
Records into ProcessRecords
• SQ execution via Apache Calcite:
execute congured SQL against
the ProcessRecords for routing
• CSVRecordSetWriter: Converts
the result of the query from
Process records into CSV for the
for the flow file content
Why should you care?
Do routing(routing geo and speed streams) using standard SQL as opposed to complex regular
expressions.
INFRASTRUCTURE AND FLOW MONITORING
Custom Apache NiFi Processors for Open Source Computer
Vision
TEXT PROCESSING
https://community.hortonworks.com/articles/178510/integration-apache-opennlp-184-into-apache-nifi-15.html
• Dates
• Locations
• Names
• Money
• Organizations
https://community.hortonworks.com/articles/52415/processing-social-media-feeds-in-stream-with-apach.html
Sentiment AnalysisNamed Entity Extraction
https://community.hortonworks.com/articles/76935/using-sentiment-analysis-and-nlp-tools-with-hdp-25.html
https://community.hortonworks.com/articles/163776/parsing-any-document-with-apache-nifi-15-with-apac.html
https://community.hortonworks.com/articles/81222/adding-stanford-corenlp-to-big-data-pipelines-apac.html
https://community.hortonworks.com/articles/81270/adding-stanford-corenlp-to-big-data-pipelines-apac-1.html
TENSORFLOW PROCESSOR IN JAVA
https://community.hortonworks.com/content/kbentry/116803/building-a-custom-processor-in-apache-nifi-12-for.html
https://github.com/tspannhw/nifi-tensorflow-processor
INGEST S3 FILES
INGEST RDBMS TABLES
https://community.hortonworks.com/articles/64122/incrementally-streaming-rdbms-data-to-your-hadoop.html
ETL LOOKUPS WITH HBASE
NiFi Positioning
Apache
NiFi / MiNiFi
ETL
(Informatica, etc.)
Enterprise
Service Bus
(Fuse, Mule, etc.)
Messaging
Bus
(Kafka, MQ, etc.)
Processing
Framework
(Storm, Spark,
etc.)
Apache NiFi / Processing Frameworks
NiFi
Simple event processing
• Primarily feed data into processing
frameworks, can process data, with a focus
on simple event processing
• Operate on a single piece of data, or in
correlation with an enrichment dataset
(enrichment, parsing, splitting, and
transformations)
• Can scale out, but scale up better to take
full advantage of hardware resources, run
concurrent processing tasks/threads
(processing terabytes of data per day on a
single node)
⚠ Not another distributed processing
framework, but to feed data into those
Processing Frameworks (Flink, Kafka
Streams, Storm, Spark, etc.)
Complex and distributed processing
• Complex processing from multiple streams
(JOIN operations)
• Analyzing data across time windows (rolling
window aggregation, standard deviation, etc.)
• Scale out to thousands of nodes if needed
⚠ Not designed to collect data or manage data
flow
Apache NiFi / Messaging Bus Services
NiFi
Provide dataflow solution
• Centralized management, from edge to core
• Great traceability, event level data provenance
starting when data is born
• Interactive command and control – real time
operational visibility
• Dataflow management, including prioritization,
back pressure, and edge intelligence
• Visual representation of global dataflow
⚠ Not a messaging bus, flow maintenance
needed when you have frequent consumer side
updates
Messaging Bus (Kafka, JMS, etc.)
Provide messaging bus service
• Low latency
• Great data durability
• Decentralized management (producers &
consumers)
• Low broker maintenance for dynamic consumer
side updates
⚠ Not designed to solve dataflow problems
(prioritization, edge intelligence, etc.)
⚠ Traceability limited to in/out of topics, no lineage
⚠ Lack of global view of components/connectivities
Apache NiFi / Integration, or Ingestion, Frameworks
NiFi
End user facing dataflow management
tool
• Out of the box solution for dataflow
management
• Interactive command and control in the core,
design and deploy on the edge
• Flexible failure handling at each point of the
flow
• Visual representation of global dataflow and
connectivities
• Native cross data center communication
• Data provenance for traceability
⚠ Not a library to be embedded in other
applications
Integration framework (Spring Integration,
Camel, etc), ingestion framework (Flume,
etc)
Developer facing integration tool with a
focus on data ingestion
• A set of tools to orchestrate workflow
• A fixed design and deploy pattern
• Leverage messaging bus across disconnected
networks
⚠ Developer facing, custom coding needed to
optimize
⚠ Pre-built failure handling, lack of flexibility
⚠ No holistic view of global dataflow
⚠ No built-in data traceability
Apache NiFi / ETL Tools
NiFi
NOT schema dependent
• Dataflow management for both structured and
unstructured data, powered by separation of
metadata and payload
• Schema is not required, but you can have
schema
• Minimum modeling effort, just enough to
manage dataflows
• Do the plumbing job, maximize developers’
brainpower for creative work
⚠ Not designed to do heavy lifting transformation
work for DB tables (JOIN datasets, etc.). You
can create custom processors to do that, but
long way to go to catch up with existing ETL
tools from user experience perspective (GUI for
data wrangling, cleansing, etc.)
ETL (Informatica, etc.)
Schema dependent
• Tailored for Databases/WH
• ETL operations based on schema/data
modeling
• Highly efficient, optimized performance
⚠ Must pre-prepare your data, time consuming to
build data modeling, and maintain schemas
⚠ Not geared towards handling unstructured data,
PDF, Audio, Video, etc.
⚠ Not designed to solve dataflow problems
NiFi and Kafka Are Complementary
NiFi
Provide dataflow solution
• Centralized management, from edge to core
• Great traceability, event level data provenance
starting when data is born
• Interactive command and control – real time
operational visibility
• Dataflow management, including prioritization,
back pressure, and edge intelligence
• Visual representation of global dataflow
Kafka
Provide durable stream store
• Low latency
• Distributed data durability
• Decentralized management of producers &
consumers
+
⚠ Requires adding/removing processors
according to consumer-side updates
⚠ Not optimized to manage dataflows
(prioritization, enrichment, protocols, formats,
event level authorizations, objects with
various sizes, etc.)
THANK YOU

Weitere ähnliche Inhalte

Was ist angesagt?

BYOP: Custom Processor Development with Apache NiFi
BYOP: Custom Processor Development with Apache NiFiBYOP: Custom Processor Development with Apache NiFi
BYOP: Custom Processor Development with Apache NiFi
DataWorks Summit
 
Apache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingApache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data Processing
DataWorks Summit
 

Was ist angesagt? (20)

Nifi
NifiNifi
Nifi
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
 
Introduction to Apache NiFi 1.11.4
Introduction to Apache NiFi 1.11.4Introduction to Apache NiFi 1.11.4
Introduction to Apache NiFi 1.11.4
 
Nifi workshop
Nifi workshopNifi workshop
Nifi workshop
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
 
Apache NiFi Crash Course Intro
Apache NiFi Crash Course IntroApache NiFi Crash Course Intro
Apache NiFi Crash Course Intro
 
Real-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiReal-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFi
 
BYOP: Custom Processor Development with Apache NiFi
BYOP: Custom Processor Development with Apache NiFiBYOP: Custom Processor Development with Apache NiFi
BYOP: Custom Processor Development with Apache NiFi
 
NiFi Best Practices for the Enterprise
NiFi Best Practices for the EnterpriseNiFi Best Practices for the Enterprise
NiFi Best Practices for the Enterprise
 
Getting Started with Confluent Schema Registry
Getting Started with Confluent Schema RegistryGetting Started with Confluent Schema Registry
Getting Started with Confluent Schema Registry
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
 
Presto Summit 2018 - 09 - Netflix Iceberg
Presto Summit 2018  - 09 - Netflix IcebergPresto Summit 2018  - 09 - Netflix Iceberg
Presto Summit 2018 - 09 - Netflix Iceberg
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
Producer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaProducer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache Kafka
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive Queries
 
Apache NiFi User Guide
Apache NiFi User GuideApache NiFi User Guide
Apache NiFi User Guide
 
Apache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingApache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data Processing
 
Securing Hadoop with Apache Ranger
Securing Hadoop with Apache RangerSecuring Hadoop with Apache Ranger
Securing Hadoop with Apache Ranger
 

Ähnlich wie Introduction to Apache NiFi dws19 DWS - DC 2019

Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinar
Kognitio
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
DataWorks Summit
 

Ähnlich wie Introduction to Apache NiFi dws19 DWS - DC 2019 (20)

Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
 
Stream processing on mobile networks
Stream processing on mobile networksStream processing on mobile networks
Stream processing on mobile networks
 
Building real time data-driven products
Building real time data-driven productsBuilding real time data-driven products
Building real time data-driven products
 
Modern MySQL Monitoring and Dashboards.
Modern MySQL Monitoring and Dashboards.Modern MySQL Monitoring and Dashboards.
Modern MySQL Monitoring and Dashboards.
 
Navigating SAP’s Integration Options (Mastering SAP Technologies 2013)
Navigating SAP’s Integration Options (Mastering SAP Technologies 2013)Navigating SAP’s Integration Options (Mastering SAP Technologies 2013)
Navigating SAP’s Integration Options (Mastering SAP Technologies 2013)
 
Music city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lakeMusic city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lake
 
Otimizaçþes de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizaçþes de Projetos de Big Data, Dw e AI no Microsoft AzureOtimizaçþes de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizaçþes de Projetos de Big Data, Dw e AI no Microsoft Azure
 
Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinar
 
Data Pipelines with Python - NWA TechFest 2017
Data Pipelines with Python - NWA TechFest 2017Data Pipelines with Python - NWA TechFest 2017
Data Pipelines with Python - NWA TechFest 2017
 
Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinar
 
Integração de Dados com Apache NIFI - Marco Garcia Cetax
Integração de Dados com Apache NIFI - Marco Garcia CetaxIntegração de Dados com Apache NIFI - Marco Garcia Cetax
Integração de Dados com Apache NIFI - Marco Garcia Cetax
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
 
Data Vault Automation at the Bijenkorf
Data Vault Automation at the BijenkorfData Vault Automation at the Bijenkorf
Data Vault Automation at the Bijenkorf
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
 
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
 
Building a High Performance Analytics Platform
Building a High Performance Analytics PlatformBuilding a High Performance Analytics Platform
Building a High Performance Analytics Platform
 
Data Stream Processing for Beginners with Kafka and CDC
Data Stream Processing for Beginners with Kafka and CDCData Stream Processing for Beginners with Kafka and CDC
Data Stream Processing for Beginners with Kafka and CDC
 
Mobile gotcha
Mobile gotchaMobile gotcha
Mobile gotcha
 
Pascal benois performance_troubleshooting-spsbe18
Pascal benois performance_troubleshooting-spsbe18Pascal benois performance_troubleshooting-spsbe18
Pascal benois performance_troubleshooting-spsbe18
 
Adding Support for Networking and Web Technologies to an Embedded System
Adding Support for Networking and Web Technologies to an Embedded SystemAdding Support for Networking and Web Technologies to an Embedded System
Adding Support for Networking and Web Technologies to an Embedded System
 

Mehr von Timothy Spann

Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Timothy Spann
 
28March2024-Codeless-Generative-AI-Pipelines
28March2024-Codeless-Generative-AI-Pipelines28March2024-Codeless-Generative-AI-Pipelines
28March2024-Codeless-Generative-AI-Pipelines
Timothy Spann
 
TCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI PipelinesTCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI Pipelines
Timothy Spann
 
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
Timothy Spann
 
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
Timothy Spann
 
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkDBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
Timothy Spann
 
OSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
OSACon 2023_ Unlocking Financial Data with Real-Time PipelinesOSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
OSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
Timothy Spann
 
JConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkJConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and Flink
Timothy Spann
 

Mehr von Timothy Spann (20)

DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
2024 XTREMEJ_ Building Real-time Pipelines with FLaNK_ A Case Study with Tra...
2024 XTREMEJ_  Building Real-time Pipelines with FLaNK_ A Case Study with Tra...2024 XTREMEJ_  Building Real-time Pipelines with FLaNK_ A Case Study with Tra...
2024 XTREMEJ_ Building Real-time Pipelines with FLaNK_ A Case Study with Tra...
 
28March2024-Codeless-Generative-AI-Pipelines
28March2024-Codeless-Generative-AI-Pipelines28March2024-Codeless-Generative-AI-Pipelines
28March2024-Codeless-Generative-AI-Pipelines
 
TCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI PipelinesTCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI Pipelines
 
2024 Build Generative AI for Non-Profits
2024 Build Generative AI for Non-Profits2024 Build Generative AI for Non-Profits
2024 Build Generative AI for Non-Profits
 
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
 
Conf42-Python-Building Apache NiFi 2.0 Python Processors
Conf42-Python-Building Apache NiFi 2.0 Python ProcessorsConf42-Python-Building Apache NiFi 2.0 Python Processors
Conf42-Python-Building Apache NiFi 2.0 Python Processors
 
Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...
Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...
Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...
 
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
 
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkDBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
 
NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...
NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...
NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...
 
OSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
OSACon 2023_ Unlocking Financial Data with Real-Time PipelinesOSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
OSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
 
Building Real-Time Travel Alerts
Building Real-Time Travel AlertsBuilding Real-Time Travel Alerts
Building Real-Time Travel Alerts
 
JConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkJConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and Flink
 
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
 
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines DemoEvolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
 

KĂźrzlich hochgeladen

Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
amitlee9823
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
karishmasinghjnh
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
only4webmaster01
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
amitlee9823
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
amitlee9823
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
amitlee9823
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
amitlee9823
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 

KĂźrzlich hochgeladen (20)

Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning Approach
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 

Introduction to Apache NiFi dws19 DWS - DC 2019

  • 1. Introduction to Apache NiFi Timothy Spann Field Engineer tspann@cloudera.com @PaasDev
  • 2. 2Š Cloudera, Inc. All rights reserved. FLOW MANAGEMENT POWERED BY APACHE NIFI • Ingestion: connectors to read/write data from/to several data sources • Transformation: • Format conversion • Compression/decompression, Merge, Split, encryption, etc.. • Data enrichment • Attribute, content, rules, etc… • Routing • Priority, dynamic/static, based on content or metadata, etc… • Parsing
  • 3. 3Š Cloudera, Inc. All rights reserved. APACHE NIFI HIGH LEVEL CAPABILITIES • Web-based user interface • Design, control, feedback & monitoring • Highly congurable • Loss tolerant vs guaranteed delivery • Low latency vs high throughput • Dynamic prioritization • Flow can be modied at runtime • Back pressure • Data provenance • Track dataflow from beginning to end • Designed for extension • Build your own processors • Secure • SSL, SSH, HTTPS, etc.
  • 4. • Guaranteed delivery • Data buffering - Backpressure - Pressure release • Prioritized queuing • Flow specic QoS - Latency vs. throughput - Loss tolerance • Data provenance • Supports push and pull models • Hundreds of processors • Visual command and control • Flow templates • Pluggable/multi-role security • Designed for extension • Clustering • Version Control Why Apache NiFi?
  • 5. 5Š Cloudera, Inc. All rights reserved. 285+ PROCESSORS FOR DEEPER ECOSYSTEM INTEGRATION Hash Extract Merge Duplicate Scan GeoEnrich Replace ConvertSplit Translate Route Content Route Context Route Text Control Rate Distribute Load Generate Table Fetch Jolt Transform JSON Prioritized Delivery Encrypt Tail Evaluate Execute All Apache project logos are trademarks of the ASF and the respective projects. Fetch HTTP Syslog Email HTML Image HL7 FTP UDP XML SFTP AMQP WebSocket
  • 6. 6Š Cloudera, Inc. All rights reserved. Apache NiFi 1.9 Features Key New Features • Apache Kafka 2.0 support • Apache Hive 3.1.0 support • Connection load balancing • MQTT Performance improvements Updated to 1.9.0
  • 7. FLOW FILES ARE LIKE HTTP DATA HTTP Data FlowFile HTTP/1.1 200 OK Date: Sun, 10 Oct 2010 23:26:07 GMT Server: Apache/2.2.8 (CentOS) OpenSSL/0.9.8g Last-Modified: Sun, 26 Sep 2010 22:04:35 GMT ETag: "45b6-834-49130cc1182c0" Accept-Ranges: bytes Content-Length: 13 Connection: close Content-Type: text/html Hello world! Standard FlowFile Attributes Key: 'entryDate’ Value: 'Fri Jun 17 17:15:04 EDT 2016' Key: 'lineageStartDate’ Value: 'Fri Jun 17 17:15:04 EDT 2016' Key: 'fileSize’ Value: '23609' FlowFile Attribute Map Content Key: 'filename’ Value: '15650246997242' Key: 'path’Value: './’ Binary Content * Header Content
  • 8. SQL BASED ROUTING WITH NiFi’s QueryRecord Processor • QueryRecord Processor- Executes a SQL statement against records and writes the results to the flow le content. • CSVReader: Looking up schema from SR, it will converts CSV Records into ProcessRecords • SQ execution via Apache Calcite: execute congured SQL against the ProcessRecords for routing • CSVRecordSetWriter: Converts the result of the query from Process records into CSV for the for the flow le content Why should you care? Do routing(routing geo and speed streams) using standard SQL as opposed to complex regular expressions.
  • 10. Custom Apache NiFi Processors for Open Source Computer Vision
  • 11. TEXT PROCESSING https://community.hortonworks.com/articles/178510/integration-apache-opennlp-184-into-apache-nifi-15.html • Dates • Locations • Names • Money • Organizations https://community.hortonworks.com/articles/52415/processing-social-media-feeds-in-stream-with-apach.html Sentiment AnalysisNamed Entity Extraction https://community.hortonworks.com/articles/76935/using-sentiment-analysis-and-nlp-tools-with-hdp-25.html https://community.hortonworks.com/articles/163776/parsing-any-document-with-apache-nifi-15-with-apac.html https://community.hortonworks.com/articles/81222/adding-stanford-corenlp-to-big-data-pipelines-apac.html https://community.hortonworks.com/articles/81270/adding-stanford-corenlp-to-big-data-pipelines-apac-1.html
  • 12. TENSORFLOW PROCESSOR IN JAVA https://community.hortonworks.com/content/kbentry/116803/building-a-custom-processor-in-apache-nifi-12-for.html https://github.com/tspannhw/nifi-tensorflow-processor
  • 16. NiFi Positioning Apache NiFi / MiNiFi ETL (Informatica, etc.) Enterprise Service Bus (Fuse, Mule, etc.) Messaging Bus (Kafka, MQ, etc.) Processing Framework (Storm, Spark, etc.)
  • 17. Apache NiFi / Processing Frameworks NiFi Simple event processing • Primarily feed data into processing frameworks, can process data, with a focus on simple event processing • Operate on a single piece of data, or in correlation with an enrichment dataset (enrichment, parsing, splitting, and transformations) • Can scale out, but scale up better to take full advantage of hardware resources, run concurrent processing tasks/threads (processing terabytes of data per day on a single node) ⚠ Not another distributed processing framework, but to feed data into those Processing Frameworks (Flink, Kafka Streams, Storm, Spark, etc.) Complex and distributed processing • Complex processing from multiple streams (JOIN operations) • Analyzing data across time windows (rolling window aggregation, standard deviation, etc.) • Scale out to thousands of nodes if needed ⚠ Not designed to collect data or manage data flow
  • 18. Apache NiFi / Messaging Bus Services NiFi Provide dataflow solution • Centralized management, from edge to core • Great traceability, event level data provenance starting when data is born • Interactive command and control – real time operational visibility • Dataflow management, including prioritization, back pressure, and edge intelligence • Visual representation of global dataflow ⚠ Not a messaging bus, flow maintenance needed when you have frequent consumer side updates Messaging Bus (Kafka, JMS, etc.) Provide messaging bus service • Low latency • Great data durability • Decentralized management (producers & consumers) • Low broker maintenance for dynamic consumer side updates ⚠ Not designed to solve dataflow problems (prioritization, edge intelligence, etc.) ⚠ Traceability limited to in/out of topics, no lineage ⚠ Lack of global view of components/connectivities
  • 19. Apache NiFi / Integration, or Ingestion, Frameworks NiFi End user facing dataflow management tool • Out of the box solution for dataflow management • Interactive command and control in the core, design and deploy on the edge • Flexible failure handling at each point of the flow • Visual representation of global dataflow and connectivities • Native cross data center communication • Data provenance for traceability ⚠ Not a library to be embedded in other applications Integration framework (Spring Integration, Camel, etc), ingestion framework (Flume, etc) Developer facing integration tool with a focus on data ingestion • A set of tools to orchestrate workflow • A xed design and deploy pattern • Leverage messaging bus across disconnected networks ⚠ Developer facing, custom coding needed to optimize ⚠ Pre-built failure handling, lack of flexibility ⚠ No holistic view of global dataflow ⚠ No built-in data traceability
  • 20. Apache NiFi / ETL Tools NiFi NOT schema dependent • Dataflow management for both structured and unstructured data, powered by separation of metadata and payload • Schema is not required, but you can have schema • Minimum modeling effort, just enough to manage dataflows • Do the plumbing job, maximize developers’ brainpower for creative work ⚠ Not designed to do heavy lifting transformation work for DB tables (JOIN datasets, etc.). You can create custom processors to do that, but long way to go to catch up with existing ETL tools from user experience perspective (GUI for data wrangling, cleansing, etc.) ETL (Informatica, etc.) Schema dependent • Tailored for Databases/WH • ETL operations based on schema/data modeling • Highly ecient, optimized performance ⚠ Must pre-prepare your data, time consuming to build data modeling, and maintain schemas ⚠ Not geared towards handling unstructured data, PDF, Audio, Video, etc. ⚠ Not designed to solve dataflow problems
  • 21. NiFi and Kafka Are Complementary NiFi Provide dataflow solution • Centralized management, from edge to core • Great traceability, event level data provenance starting when data is born • Interactive command and control – real time operational visibility • Dataflow management, including prioritization, back pressure, and edge intelligence • Visual representation of global dataflow Kafka Provide durable stream store • Low latency • Distributed data durability • Decentralized management of producers & consumers + ⚠ Requires adding/removing processors according to consumer-side updates ⚠ Not optimized to manage dataflows (prioritization, enrichment, protocols, formats, event level authorizations, objects with various sizes, etc.)