From Zero to Data Flow in Hours with Apache NiFi
1.
Copyright © 2016,
Schlumberger, All rights reserved.
From Zero to Data Flow in Hours with Apache NiFi
Hadoop Summit – San Jose 2016
Chris Herrera, Schlumberger
2.
Agenda
• Why is composable data flow important to the drilling industry?
• Current state of the system
• The breaking point that led to the new system
• An unexpected workflow in testing
• How we are using it today
• What's next
3.
Legal Notices
This presentation is for informational purposes only. STATEMENTS AND OPINIONS EXPRESSED IN THIS PRESENTATION ARE THOSE OF THE PRESENTER AND DO NOT REFLECT THE OPINIONS OF SCHLUMBERGER. SCHLUMBERGER AND THE PRESENTER HEREBY DISCLAIM ANY REPRESENTATIONS AND/OR WARRANTIES EXPRESS OR IMPLIED. SCHLUMBERGER AND THE PRESENTER HEREBY DISCLAIM ANY RESPONSIBILITY FOR THE CONTENT, ACCURACY, AND/OR COMPLETENESS OF THE INFORMATION IN THIS PRESENTATION. This presentation, and any recordings or reproductions in various media formats, including, without limitation, print, audio, and video, is the copyrighted work of Schlumberger, and Schlumberger hereby retains all intellectual property and/or proprietary rights related thereto. Schlumberger and the Schlumberger logo are trademarks of Schlumberger in the U.S. and/or other countries. Other names and brands referenced in this presentation are the trademarks of their respective owners, and any references thereto are not endorsements or approvals. Copyright © 2016, Schlumberger, All rights reserved.
4.
Introduction
Chris Herrera, Schlumberger
• 2 years managing product development and innovation teams working on real-time data ingestion and delivery
• 5 years of experience in the Hadoop ecosystem
• 11 years of experience with various aspects of the oilfield (operational and technical)
5.
The Major Components of a Drilling Project
Rig · Wireline · Measurement/Logging While Drilling · Mud Logging · Fluids · Completions · Cementing
• Several contractors are brought in to develop and complete the well
• A project can involve a single company, but most of the time it involves many
• Each brings its own system, often without a central repository of data
• The site may have decent cellular connectivity, or be deep in the middle of a jungle with only 128 kbps of high-latency bandwidth
6.
Where Does This Data Need to Go?
• RT Server
• Operational Support
• Client Monitoring
• Processing and Print Centers
7.
Workflow of Data During and Post Operations
[Diagram: stages Sales and Job Planning, Acquisition, Data Server, Processing Center, Hosting, Data Delivery, and KPI & Reporting, with classification and labelling, quality control, processing, and conversion steps; roles shown: Sales, Field Engineer, Data Processor, Customer Manager, Client]
8.
What Does This Mean in a Data Sense?
Input: DLIS; LAS 1.2, 2.0, 3.0; WITS Level 0, Level 1, Level 2; CSV; Profibus; Modbus
Output: CSV; PDS; LAS 1.2, 2.0, 3.0; DLIS; RT Server
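With this many input formats, a flow first has to decide which parser each file goes to. A minimal Python sketch of content-based format detection, assuming text payloads only (DLIS is binary and is not handled here). The function name `sniff_format` and its heuristics are illustrative assumptions, not the presenter's implementation: LAS is recognized by a leading `~V` section and its `VERS.` line, WITS Level 0 by its `&&`/`!!` frame delimiters, and CSV by a consistent comma count.

```python
def sniff_format(payload: str) -> str:
    """Return a coarse format label for a text payload (hypothetical heuristics)."""
    # Ignore blank lines and "#" comment lines, which LAS allows.
    lines = [ln.strip() for ln in payload.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("#")]
    if not lines:
        return "unknown"

    # LAS: the first section is "~V..." and carries a "VERS." line, e.g.
    # "VERS.   2.0 : CWLS LOG ASCII STANDARD - VERSION 2.0"
    if lines[0].upper().startswith("~V"):
        for ln in lines[1:]:
            if ln.startswith("~"):
                break  # left the version section without finding VERS.
            if ln.upper().startswith("VERS."):
                version = ln[len("VERS."):].split(":", 1)[0].strip()
                return f"las-{version}"
        return "las-unknown"

    # WITS Level 0: frames start with "&&" and end with "!!".
    if lines[0] == "&&" and lines[-1] == "!!":
        return "wits0"

    # CSV: the first few lines share the same, non-zero comma count.
    comma_counts = {ln.count(",") for ln in lines[:5]}
    if len(comma_counts) == 1 and next(iter(comma_counts)) > 0:
        return "csv"
    return "unknown"


print(sniff_format("~VERSION INFORMATION\nVERS.   2.0 : CWLS LOG ASCII STANDARD - VERSION 2.0\n~WELL\n"))  # las-2.0
```

In NiFi, one way to use a label like this would be to write it into a FlowFile attribute and branch on it with the RouteOnAttribute processor; the detection step itself would need a custom or scripted processor. That wiring is an assumption, not the flow shown in the talk.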
9.
What Does This Mean in a Volume Sense?
• ~9,000 users/month
• ~10 files/minute
• ~480 data queries/second
• ~3,050 wells/month
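Taking the approximate rates above at face value and assuming they are sustained around the clock (an assumption; real load is likely bursty), the daily totals work out as:

```latex
\begin{aligned}
10\ \tfrac{\text{files}}{\text{min}} \times 1440\ \tfrac{\text{min}}{\text{day}} &= 14{,}400\ \tfrac{\text{files}}{\text{day}} \\
480\ \tfrac{\text{queries}}{\text{s}} \times 86{,}400\ \tfrac{\text{s}}{\text{day}} &= 41{,}472{,}000 \approx 4.1 \times 10^{7}\ \tfrac{\text{queries}}{\text{day}}
\end{aligned}
```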
10.
A Quick(ish) Note on the Importance of Data Provenance
[Diagram: context, fidelity, and time, moving from acquisition (field) to interpretation (office)]
• Fidelity needs to be retained throughout the flow.
11.
Typical Data Problems
Concerns:
• What is the time zone of the data we are receiving? One day it is UTC...
• "Ahh, I see you did not implement that part of the standard..."
• Wait, why are you sending data at five times the sampling rate of the sensor?
• I did not get the memo that you were changing your data model today...
• Governmental and client data residency concerns
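Two of these problems can be checked mechanically on arrival. A minimal Python sketch, assuming each feed is a list of parsed `datetime` values and that the sensor's nominal sample interval is known from configuration; the function names are hypothetical. It lists timestamps that carry no UTC offset (their zone must be guessed) and estimates how much faster than the sensor the data is arriving.

```python
from datetime import datetime, timedelta, timezone
from statistics import median


def naive_timestamps(timestamps: list[datetime]) -> list[datetime]:
    """Timestamps with no UTC offset: their time zone has to be guessed or configured."""
    return [t for t in timestamps if t.utcoffset() is None]


def oversampling_factor(timestamps: list[datetime], nominal_interval: timedelta) -> float:
    """Nominal sensor interval divided by the median observed interval.

    ~1.0 means the feed matches the sensor; ~5.0 means points arrive five times
    faster than the sensor samples (duplicated or interpolated values).
    Expects all timestamps to be aware (or all naive); Python cannot compare the two kinds.
    """
    if len(timestamps) < 2:
        raise ValueError("need at least two timestamps")
    ordered = sorted(timestamps)
    gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    observed = median(gaps)
    if observed <= 0:
        return float("inf")  # most points share a timestamp
    return nominal_interval.total_seconds() / observed


start = datetime(2016, 6, 28, tzinfo=timezone.utc)
feed = [start + timedelta(milliseconds=200 * i) for i in range(11)]
print(oversampling_factor(feed, timedelta(seconds=1)))  # 5.0
```

The median is used rather than the mean so that a single dropout gap does not hide a feed that is otherwise oversampled.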
12.
Current Solution…
• 100+ man-years of effort over 14 years
• ~2,000,000+ lines of code
• Extreme barrier to entry for workflow changes
• Very little understanding of what happened to the data
Input: DLIS; LAS 1.2, 2.0, 3.0; WITS Level 0, Level 1, Level 2; CSV; Profibus; Modbus
Output: CSV; PDS; LAS 1.2, 2.0, 3.0; DLIS; RT Server
13.
We Needed a Simpler, Maintainable Solution…
14.
The Original Plan…
Components: RabbitMQ, DLIS Parser, ETP Endpoint, LAS Parser, Data Writer, {} DB, Event Publisher, Node.js
What about:
• Data cleansing
• Routing
• The ability to debug what has gone wrong
• TIME (an estimated 6 man-months)
15.
How Does NiFi Fit into the Equation?
• Knowing where data came from is crucial to real-time decision making, and that information is often missing
• The ability to visualize the data flow at a granular level aids troubleshooting and operational understanding
• With many processors already available, the barrier to entry for creating a data flow is low
16.
Enter NiFi…
Stages: Processor Creation, Data Flow Creation, Play…
Processors created: ETP; WITSML 1.3.1.1 / 1.4.1.1; LAS 1.2 / 2.0
Effort: 10 man-hours; 1 man-day
17.
Prototype Setup (2 man-days)
Flow: Data Source → Processor Input → Data Cleansing → Data Enrichment → Data Storage ({ } Repo, via Put Data)
• Data enrichment (Process Group: Get Update): append well name, client name, run name, and pass name
• Data cleansing (Process Group): fix time zone, remove absent indexes
• Routing
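The cleansing and enrichment steps above can be approximated outside NiFi to show what each one does to a record. A minimal Python sketch, assuming each record is a dict with an ISO-8601 `time` string and a numeric index such as `depth`; the function names, field names, per-source time zone, and the -999.25 null value are illustrative assumptions, not details from the talk. In the prototype these steps were NiFi process groups, not a single function.

```python
from datetime import datetime, timedelta, timezone

# Assumption: -999.25 is the customary LAS NULL value; real files declare
# their own NULL in the ~Well section, so pass it in when you know it.
LAS_NULL = -999.25


def fix_time_zone(ts: str, source_tz: timezone) -> str:
    """Parse an ISO-8601 timestamp and re-emit it in UTC.

    Naive timestamps are assumed to be in source_tz, which must be configured per source.
    """
    dt = datetime.fromisoformat(ts)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=source_tz)
    return dt.astimezone(timezone.utc).isoformat()


def clean_and_enrich(records: list[dict], index_key: str, source_tz: timezone,
                     context: dict, null_value: float = LAS_NULL) -> list[dict]:
    """Drop records with an absent index, normalize "time" to UTC, append context."""
    out = []
    for rec in records:
        idx = rec.get(index_key)
        if idx is None or idx == null_value:
            continue  # "Remove absent indexes"
        fixed = dict(rec)  # copy so the caller's records are untouched
        if "time" in fixed:
            fixed["time"] = fix_time_zone(fixed["time"], source_tz)  # "Fix time zone"
        fixed.update(context)  # "Append well/client/run/pass name"
        out.append(fixed)
    return out


cst = timezone(timedelta(hours=-6))  # hypothetical source offset
rows = [{"depth": 1000.0, "time": "2016-06-28T10:00:00"},
        {"depth": -999.25, "time": "2016-06-28T10:00:01"}]
ctx = {"well_name": "WELL-A", "client_name": "CLIENT-X", "run_name": "RUN-1", "pass_name": "PASS-1"}
print(clean_and_enrich(rows, "depth", cst, ctx))  # keeps 1 record; its time becomes '2016-06-28T16:00:00+00:00'
```

Keeping the source time zone as explicit configuration, rather than guessing, makes the "one day it is UTC" problem visible at the point where the assumption is made.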
18.
What About Testing!
19.
Testing Landscape Today
2.2 TB of test data:
• 22 applications
• 14 different data formats
• Data of questionable quality
• Stored on a file share
Effort:
• 0.5 man-effort per sprint on maintenance
• 2 weeks to perform a full test
20.
Step 1: Data Set Curation – Creating the Reference Set
2.2 TB of test data (LAS 1.2, 2.0, 3.0; WITS Level 0, Level 1, Level 2; CSV) → clean test data set in 6 hours
21.
Step 2: Immediate Test Harness (Docker)
Clean test data set
• Step 1: Get the data
• Step 2: docker pull xxx.xxx.xxx.xxx:xxxx/flowTest
• Step 3: Add a Put processor
• Step 4: Start the data flow
From: 2 weeks to set up a test, to:
22.
Step 3: Immediate Live Data Testing (Docker)
Components: Production RT System, Processor Input, Anonymize Data, Testing Process Group
• Significantly cuts down the time needed to test an application against real data
• Especially useful in brownfield applications
• Brings a level of confidence to the project that would otherwise be missing
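Anonymizing live production data before it reaches a test flow could look like the following minimal Python sketch. It swaps identifying fields for keyed HMAC-SHA256 tokens; the field names and the 16-character truncation are assumptions for illustration, and the talk does not say which anonymization technique was used.

```python
import hashlib
import hmac

# Hypothetical names of fields that identify a well or a client.
SENSITIVE_FIELDS = ("well_name", "client_name")


def pseudonymize(value: str, key: bytes) -> str:
    """Deterministic, keyed token for a value (HMAC-SHA256, truncated to 16 hex chars)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def anonymize_record(record: dict, key: bytes, fields: tuple = SENSITIVE_FIELDS) -> dict:
    """Return a copy of record with the sensitive fields replaced by tokens."""
    out = dict(record)
    for name in fields:
        if out.get(name) is not None:
            out[name] = pseudonymize(str(out[name]), key)
    return out


print(anonymize_record({"well_name": "Alpha-1", "depth": 1200.0}, b"keep-this-key-secret"))
```

A keyed hash keeps a given well mapped to the same token across records, so test data stays joinable, whereas a plain unkeyed hash of a short, guessable name could be reversed by hashing candidate names. The key should stay out of the test environment.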
23.
Next Steps
24.
Use Cases to Be Explored for MiNiFi: Rig Data Ingestion with Provenance
(RT Server)
• Understanding the chain of custody from sensor to user
• Tracking the provenance of the data as it traverses the system
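NiFi already records provenance events for each FlowFile in its provenance repository; the chain-of-custody use case extends that idea back to the sensor. As an illustration only, here is a minimal Python sketch of a hash-chained custody log. Hash chaining is a technique swapped in for this example, not how NiFi or MiNiFi store provenance, and the hop names are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class CustodyEvent:
    hop: str             # where the data was seen, e.g. "sensor" (hypothetical)
    content_sha256: str  # hash of the payload as seen at this hop
    prev_hash: str       # digest of the previous event ("" for the first)

    def digest(self) -> str:
        body = json.dumps([self.hop, self.content_sha256, self.prev_hash])
        return hashlib.sha256(body.encode("utf-8")).hexdigest()


def record_hop(chain: list[CustodyEvent], hop: str, payload: bytes) -> list[CustodyEvent]:
    """Append an event that links to the previous one and fingerprints the payload."""
    prev = chain[-1].digest() if chain else ""
    return chain + [CustodyEvent(hop, hashlib.sha256(payload).hexdigest(), prev)]


def verify(chain: list[CustodyEvent]) -> bool:
    """True if every event points at the digest of the event before it."""
    prev = ""
    for event in chain:
        if event.prev_hash != prev:
            return False
        prev = event.digest()
    return True


chain = record_hop([], "sensor", b"frame")
chain = record_hop(chain, "rig-gateway", b"frame")
chain = record_hop(chain, "rt-server", b"frame")
print(verify(chain))  # True
```

`verify` only detects edits to events that a later event points at; the newest event's digest has to be anchored somewhere trusted (for example, at the receiving server) to protect the tail of the chain. Comparing `content_sha256` across hops shows whether the payload changed in transit.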
25.
Thank You! Questions?
Editor's Notes
• Different arrival times
• Different data streams
• Exchanging data amongst themselves
• Unknown quality