SlideShare ist ein Scribd-Unternehmen logo
1 von 22
Building the High Speed Cyber Security
Data Pipeline Using Apache NiFi
Praveen Kanumarlapudi
Cyber Security
60% of Small
Businesses Fold Within
6 Months of a Cyber
Attack.
How to make it
success ?
Global Security Key Stake Holders
Security Operations Center Data Scientists Data Analysts Executives
An information security
operations center
("ISOC" or "SOC") is a
facility where
enterprise information
systems (websites,
applications, databases,
data centers and
servers, networks,
desktops and other
endpoints) are
monitored, assessed,
and defended.
Technology : SIEM
Security data scientists
have the skills to
understand complex
algorithms and build
advanced models for
threat and anomaly
detection and applying
these concepts to real
security data sets in
single or clustered
environments.
Technology : Python, R,
Big Data, Spark/Scala or
MATLAB…
Map and trace the data
from system to system
for solving a given
business or incident
problem.
Design and create data
reports using various
reporting tools that
help business executive
to make better
decisions.
Implements new
metrics for business
(KPIs)
Technology : SQL, SIEM,
Big Data, Reporting
tools
CSO’s,
CISO’s
Cyber Security ‘BIG data’ challenges
• Speed , Volume and Variety
 Data Ingestion
 Cleansing
 Transformation
• data reliance
 Executives – KPI Metrics
 Data scientists
 SOC
 Data Analysts
• Real-Time context
A couple of years Ago !
Network logs
Web logs
AD Logs
Infrastructure
logs
Application
Logs
Threat Intel
3rd Party RG
RDBMS
unstructured(semi)structured
Syslog
servers
SIEM APP
Sqoop
PySpark
SIEM Tool
Data Source Ingestion Integration Delivery
Flume
UBA Tools
SOCDataScienceKPI/Reporting
Challenges
• Complexity of Architecture
• Debugging
• Data Source Dependencies
• Lack of Centralized logging
• Multiple Data Copies
• Stress on Network
• Transformations with respect to destination
Solution Framework
 Single Data entry point – avoids network traffic and
duplicate data flowing around
 Transformations according destination – reduces the
reliance on source
 Should be capable of handling different formats and
different sources
Ingest Clean/Route
Transform for
1
Transform for
2
Route to 1
Route to 2
Archive
Deployment Models
Data Sources
Challenges
 Good architectural understanding of all
systems
 Good amount of coding effort
 Long development hours
 Maintenance overheads
 Maintain the sync between the systems
 Provenance
• Guaranteed delivery
• Processors that supports multiple
formats
• Ease to develop the flows and
deploy in minutes
• Open Source and rich community
The Data Gateway
Network logs
Web logs
AD Logs
Infrastructure
logs
Application
Logs
Threat Intel
3rd Party RG
RDBMS
unstructured(semi)structured
Data Source Data Gateway Delivery
SOCDataScienceKPI/Reporting
SOC
Sample Flow
To Azure
To Splunk
Grafana dashboard – Last 7 days
Yesterday
In the middle of the day
Metrics
 100+ production flows
 ~ 20 Billion events
 1000+ Transformations
Next ?
 MiNiFi
 Stateless NiFi
 Registry
 SAM
 Real-Time Model training
 CI/CD, NiFi API’s
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi

Weitere ähnliche Inhalte

Was ist angesagt?

SOC, Amore Mio! | Security Webinar
SOC, Amore Mio! | Security WebinarSOC, Amore Mio! | Security Webinar
SOC, Amore Mio! | Security WebinarSplunk
 
Security Automation & Orchestration
Security Automation & OrchestrationSecurity Automation & Orchestration
Security Automation & OrchestrationSplunk
 
Splunk Phantom SOAR Roundtable
Splunk Phantom SOAR RoundtableSplunk Phantom SOAR Roundtable
Splunk Phantom SOAR RoundtableSplunk
 
SentinelOne - NOAH19 Tel Aviv
SentinelOne - NOAH19 Tel AvivSentinelOne - NOAH19 Tel Aviv
SentinelOne - NOAH19 Tel AvivNOAH Advisors
 
Security operation center
Security operation centerSecurity operation center
Security operation centerMuthuKumaran267
 
Building a Next-Generation Security Operations Center (SOC)
Building a Next-Generation Security Operations Center (SOC)Building a Next-Generation Security Operations Center (SOC)
Building a Next-Generation Security Operations Center (SOC)Sqrrl
 
For Public_ Cybersecurity_ Frameworks, Fundamentals, and Foundations 2023.pdf
For Public_ Cybersecurity_ Frameworks, Fundamentals, and Foundations 2023.pdfFor Public_ Cybersecurity_ Frameworks, Fundamentals, and Foundations 2023.pdf
For Public_ Cybersecurity_ Frameworks, Fundamentals, and Foundations 2023.pdfJustinBrown267905
 
PaloAlto Enterprise Security Solution
PaloAlto Enterprise Security SolutionPaloAlto Enterprise Security Solution
PaloAlto Enterprise Security SolutionPrime Infoserv
 
The journey to ICS - Extended
The journey to ICS - Extended The journey to ICS - Extended
The journey to ICS - Extended Larry Vandenaweele
 
Cyber Threat Intelligence
Cyber Threat IntelligenceCyber Threat Intelligence
Cyber Threat IntelligenceZaiffiEhsan
 
Splunk Security Session - .conf Go Köln
Splunk Security Session - .conf Go KölnSplunk Security Session - .conf Go Köln
Splunk Security Session - .conf Go KölnSplunk
 
Threat Hunting
Threat HuntingThreat Hunting
Threat HuntingSplunk
 
Cyber Security roadmap.pptx
Cyber Security roadmap.pptxCyber Security roadmap.pptx
Cyber Security roadmap.pptxSandeepK707540
 
Cyber Threat Intelligence.pptx
Cyber Threat Intelligence.pptxCyber Threat Intelligence.pptx
Cyber Threat Intelligence.pptxAbimbolaFisher1
 
SEIM-Microsoft Sentinel.pptx
SEIM-Microsoft Sentinel.pptxSEIM-Microsoft Sentinel.pptx
SEIM-Microsoft Sentinel.pptxAmrMousa51
 
Splunk for Enterprise Security and User Behavior Analytics
 Splunk for Enterprise Security and User Behavior Analytics Splunk for Enterprise Security and User Behavior Analytics
Splunk for Enterprise Security and User Behavior AnalyticsSplunk
 

Was ist angesagt? (20)

SOC, Amore Mio! | Security Webinar
SOC, Amore Mio! | Security WebinarSOC, Amore Mio! | Security Webinar
SOC, Amore Mio! | Security Webinar
 
Security Automation & Orchestration
Security Automation & OrchestrationSecurity Automation & Orchestration
Security Automation & Orchestration
 
Splunk Phantom SOAR Roundtable
Splunk Phantom SOAR RoundtableSplunk Phantom SOAR Roundtable
Splunk Phantom SOAR Roundtable
 
SentinelOne - NOAH19 Tel Aviv
SentinelOne - NOAH19 Tel AvivSentinelOne - NOAH19 Tel Aviv
SentinelOne - NOAH19 Tel Aviv
 
SIEM Architecture
SIEM ArchitectureSIEM Architecture
SIEM Architecture
 
Security operation center
Security operation centerSecurity operation center
Security operation center
 
Building a Next-Generation Security Operations Center (SOC)
Building a Next-Generation Security Operations Center (SOC)Building a Next-Generation Security Operations Center (SOC)
Building a Next-Generation Security Operations Center (SOC)
 
For Public_ Cybersecurity_ Frameworks, Fundamentals, and Foundations 2023.pdf
For Public_ Cybersecurity_ Frameworks, Fundamentals, and Foundations 2023.pdfFor Public_ Cybersecurity_ Frameworks, Fundamentals, and Foundations 2023.pdf
For Public_ Cybersecurity_ Frameworks, Fundamentals, and Foundations 2023.pdf
 
PaloAlto Enterprise Security Solution
PaloAlto Enterprise Security SolutionPaloAlto Enterprise Security Solution
PaloAlto Enterprise Security Solution
 
The journey to ICS - Extended
The journey to ICS - Extended The journey to ICS - Extended
The journey to ICS - Extended
 
Cyber Threat Intelligence
Cyber Threat IntelligenceCyber Threat Intelligence
Cyber Threat Intelligence
 
Security Information and Event Management
Security Information and Event ManagementSecurity Information and Event Management
Security Information and Event Management
 
SOAR and SIEM.pptx
SOAR and SIEM.pptxSOAR and SIEM.pptx
SOAR and SIEM.pptx
 
Splunk Security Session - .conf Go Köln
Splunk Security Session - .conf Go KölnSplunk Security Session - .conf Go Köln
Splunk Security Session - .conf Go Köln
 
Threat Hunting
Threat HuntingThreat Hunting
Threat Hunting
 
Cyber Security roadmap.pptx
Cyber Security roadmap.pptxCyber Security roadmap.pptx
Cyber Security roadmap.pptx
 
SIEM in NIST Cyber Security Framework
SIEM in NIST Cyber Security FrameworkSIEM in NIST Cyber Security Framework
SIEM in NIST Cyber Security Framework
 
Cyber Threat Intelligence.pptx
Cyber Threat Intelligence.pptxCyber Threat Intelligence.pptx
Cyber Threat Intelligence.pptx
 
SEIM-Microsoft Sentinel.pptx
SEIM-Microsoft Sentinel.pptxSEIM-Microsoft Sentinel.pptx
SEIM-Microsoft Sentinel.pptx
 
Splunk for Enterprise Security and User Behavior Analytics
 Splunk for Enterprise Security and User Behavior Analytics Splunk for Enterprise Security and User Behavior Analytics
Splunk for Enterprise Security and User Behavior Analytics
 

Ähnlich wie Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi

(ENT211) Migrating the US Government to the Cloud | AWS re:Invent 2014
(ENT211) Migrating the US Government to the Cloud | AWS re:Invent 2014(ENT211) Migrating the US Government to the Cloud | AWS re:Invent 2014
(ENT211) Migrating the US Government to the Cloud | AWS re:Invent 2014Amazon Web Services
 
On the Application of AI for Failure Management: Problems, Solutions and Algo...
On the Application of AI for Failure Management: Problems, Solutions and Algo...On the Application of AI for Failure Management: Problems, Solutions and Algo...
On the Application of AI for Failure Management: Problems, Solutions and Algo...Jorge Cardoso
 
Python + MPP Database = Large Scale AI/ML Projects in Production Faster
Python + MPP Database = Large Scale AI/ML Projects in Production FasterPython + MPP Database = Large Scale AI/ML Projects in Production Faster
Python + MPP Database = Large Scale AI/ML Projects in Production FasterPaige_Roberts
 
Distributed Data Processing for Real-time Applications
Distributed Data Processing for Real-time ApplicationsDistributed Data Processing for Real-time Applications
Distributed Data Processing for Real-time ApplicationsScyllaDB
 
Modern Data Management for Federal Modernization
Modern Data Management for Federal ModernizationModern Data Management for Federal Modernization
Modern Data Management for Federal ModernizationDenodo
 
Real-time Visibility at Scale with Sumo Logic
Real-time Visibility at Scale with Sumo LogicReal-time Visibility at Scale with Sumo Logic
Real-time Visibility at Scale with Sumo LogicAmazon Web Services
 
Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Sri Ambati
 
Preparing for the Cybersecurity Renaissance
Preparing for the Cybersecurity RenaissancePreparing for the Cybersecurity Renaissance
Preparing for the Cybersecurity RenaissanceCloudera, Inc.
 
Operating a secure big data platform in a multi-cloud environment
Operating a secure big data platform in a multi-cloud environmentOperating a secure big data platform in a multi-cloud environment
Operating a secure big data platform in a multi-cloud environmentDataWorks Summit
 
Security Delivery Platform: Best practices
Security Delivery Platform: Best practicesSecurity Delivery Platform: Best practices
Security Delivery Platform: Best practicesMihajlo Prerad
 
Get Started with Cloudera’s Cyber Solution
Get Started with Cloudera’s Cyber SolutionGet Started with Cloudera’s Cyber Solution
Get Started with Cloudera’s Cyber SolutionCloudera, Inc.
 
Splunk in the Cisco Unified Computing System (UCS)
Splunk in the Cisco Unified Computing System (UCS) Splunk in the Cisco Unified Computing System (UCS)
Splunk in the Cisco Unified Computing System (UCS) Splunk
 
Data Virtualization: Introduction and Business Value (UK)
Data Virtualization: Introduction and Business Value (UK)Data Virtualization: Introduction and Business Value (UK)
Data Virtualization: Introduction and Business Value (UK)Denodo
 
Fueling AI & Machine Learning: Legacy Data as a Competitive Advantage
Fueling AI & Machine Learning: Legacy Data as a Competitive AdvantageFueling AI & Machine Learning: Legacy Data as a Competitive Advantage
Fueling AI & Machine Learning: Legacy Data as a Competitive AdvantagePrecisely
 
2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics2022 Trends in Enterprise Analytics
2022 Trends in Enterprise AnalyticsDATAVERSITY
 
Old Dogs, New Tricks: Big Data from and for Mainframe IT
Old Dogs, New Tricks: Big Data from and for Mainframe ITOld Dogs, New Tricks: Big Data from and for Mainframe IT
Old Dogs, New Tricks: Big Data from and for Mainframe ITPrecisely
 
Webinar: Improve Splunk Analytics and Automate Processes with SnapLogic
Webinar: Improve Splunk Analytics and Automate Processes with SnapLogicWebinar: Improve Splunk Analytics and Automate Processes with SnapLogic
Webinar: Improve Splunk Analytics and Automate Processes with SnapLogicSnapLogic
 

Ähnlich wie Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi (20)

(ENT211) Migrating the US Government to the Cloud | AWS re:Invent 2014
(ENT211) Migrating the US Government to the Cloud | AWS re:Invent 2014(ENT211) Migrating the US Government to the Cloud | AWS re:Invent 2014
(ENT211) Migrating the US Government to the Cloud | AWS re:Invent 2014
 
On the Application of AI for Failure Management: Problems, Solutions and Algo...
On the Application of AI for Failure Management: Problems, Solutions and Algo...On the Application of AI for Failure Management: Problems, Solutions and Algo...
On the Application of AI for Failure Management: Problems, Solutions and Algo...
 
Python + MPP Database = Large Scale AI/ML Projects in Production Faster
Python + MPP Database = Large Scale AI/ML Projects in Production FasterPython + MPP Database = Large Scale AI/ML Projects in Production Faster
Python + MPP Database = Large Scale AI/ML Projects in Production Faster
 
Distributed Data Processing for Real-time Applications
Distributed Data Processing for Real-time ApplicationsDistributed Data Processing for Real-time Applications
Distributed Data Processing for Real-time Applications
 
inmation Presentation_2017
inmation Presentation_2017inmation Presentation_2017
inmation Presentation_2017
 
Modern Data Management for Federal Modernization
Modern Data Management for Federal ModernizationModern Data Management for Federal Modernization
Modern Data Management for Federal Modernization
 
Real-time Visibility at Scale with Sumo Logic
Real-time Visibility at Scale with Sumo LogicReal-time Visibility at Scale with Sumo Logic
Real-time Visibility at Scale with Sumo Logic
 
Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...
 
Preparing for the Cybersecurity Renaissance
Preparing for the Cybersecurity RenaissancePreparing for the Cybersecurity Renaissance
Preparing for the Cybersecurity Renaissance
 
Operating a secure big data platform in a multi-cloud environment
Operating a secure big data platform in a multi-cloud environmentOperating a secure big data platform in a multi-cloud environment
Operating a secure big data platform in a multi-cloud environment
 
Security Delivery Platform: Best practices
Security Delivery Platform: Best practicesSecurity Delivery Platform: Best practices
Security Delivery Platform: Best practices
 
inmation Presentation
inmation Presentationinmation Presentation
inmation Presentation
 
Get Started with Cloudera’s Cyber Solution
Get Started with Cloudera’s Cyber SolutionGet Started with Cloudera’s Cyber Solution
Get Started with Cloudera’s Cyber Solution
 
Splunk in the Cisco Unified Computing System (UCS)
Splunk in the Cisco Unified Computing System (UCS) Splunk in the Cisco Unified Computing System (UCS)
Splunk in the Cisco Unified Computing System (UCS)
 
Data Virtualization: Introduction and Business Value (UK)
Data Virtualization: Introduction and Business Value (UK)Data Virtualization: Introduction and Business Value (UK)
Data Virtualization: Introduction and Business Value (UK)
 
Fueling AI & Machine Learning: Legacy Data as a Competitive Advantage
Fueling AI & Machine Learning: Legacy Data as a Competitive AdvantageFueling AI & Machine Learning: Legacy Data as a Competitive Advantage
Fueling AI & Machine Learning: Legacy Data as a Competitive Advantage
 
2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics
 
Old Dogs, New Tricks: Big Data from and for Mainframe IT
Old Dogs, New Tricks: Big Data from and for Mainframe ITOld Dogs, New Tricks: Big Data from and for Mainframe IT
Old Dogs, New Tricks: Big Data from and for Mainframe IT
 
Webinar: Improve Splunk Analytics and Automate Processes with SnapLogic
Webinar: Improve Splunk Analytics and Automate Processes with SnapLogicWebinar: Improve Splunk Analytics and Automate Processes with SnapLogic
Webinar: Improve Splunk Analytics and Automate Processes with SnapLogic
 
Shikha fdp 62_14july2017
Shikha fdp 62_14july2017Shikha fdp 62_14july2017
Shikha fdp 62_14july2017
 

Mehr von DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...DataWorks Summit
 

Mehr von DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
 

Kürzlich hochgeladen

Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 

Kürzlich hochgeladen (20)

Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 

Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi

  • 1. Building the High Speed Cyber Security Data Pipeline Using Apache NiFi Praveen Kanumarlapudi
  • 3. 60% of Small Businesses Fold Within 6 Months of a Cyber Attack.
  • 4. How to make it success ?
  • 5. Global Security Key Stake Holders Security Operations Center Data Scientists Data Analysts Executives An information security operations center ("ISOC" or "SOC") is a facility where enterprise information systems (websites, applications, databases, data centers and servers, networks, desktops and other endpoints) are monitored, assessed, and defended. Technology : SIEM Security data scientists have the skills to understand complex algorithms and build advanced models for threat and anomaly detection and applying these concepts to real security data sets in single or clustered environments. Technology : Python, R, Big Data, Spark/Scala or MATLAB… Map and trace the data from system to system for solving a given business or incident problem. Design and create data reports using various reporting tools that help business executive to make better decisions. Implements new metrics for business (KPIs) Technology : SQL, SIEM, Big Data, Reporting tools CSO’s, CISO’s
  • 6. Cyber Security ‘BIG data’ challenges • Speed , Volume and Variety  Data Ingestion  Cleansing  Transformation • data reliance  Executives – KPI Metrics  Data scientists  SOC  Data Analysts • Real-Time context
  • 7. A couple of years Ago ! Network logs Web logs AD Logs Infrastructure logs Application Logs Threat Intel 3rd Party RG RDBMS unstructured(semi)structured Syslog servers SIEM APP Sqoop PySpark SIEM Tool Data Source Ingestion Integration Delivery Flume UBA Tools SOCDataScienceKPI/Reporting
  • 8. Challenges • Complexity of Architecture • Debugging • Data Source Dependencies • Lack of Centralized logging • Multiple Data Copies • Stress on Network • Transformations with respect to destination
  • 9. Solution Framework  Single Data entry point – avoids network traffic and duplicate data flowing around  Transformations according destination – reduces the reliance on source  Should be capable of handling different formats and different sources Ingest Clean/Route Transform for 1 Transform for 2 Route to 1 Route to 2 Archive
  • 11. Challenges  Good architectural understanding of all systems  Good amount of coding effort  Long development hours  Maintenance overheads  Maintain the sync between the systems  Provenance
  • 12. • Guaranteed delivery • Processors that supports multiple formats • Ease to develop the flows and deploy in minutes • Open Source and rich community
  • 13. The Data Gateway Network logs Web logs AD Logs Infrastructure logs Application Logs Threat Intel 3rd Party RG RDBMS unstructured(semi)structured Data Source Data Gateway Delivery SOCDataScienceKPI/Reporting SOC
  • 17. Grafana dashboard – Last 7 days
  • 19. In the middle of the day
  • 20. Metrics  100+ production flows  ~ 20 Billion events  1000+ Transformations
  • 21. Next ?  MiNiFi  Stateless NiFi  Registry  SAM  Real-Time Model training  CI/CD, NiFi API’s

Hinweis der Redaktion

  1. 60 Percent of Small Businesses went out of business in just 6 months after cyber attacks