Model as a Service
Modern Streaming Data Science with Apache Metron
Casey Stella
@casey_stella
2017
Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
Introduction
Hi, I’m Casey Stella!
Apache Metron
• Apache Metron is a streaming cybersecurity analytics solution. Roughly speaking:
◦ We ingest network data from many different sources.
◦ That data is then normalized and enriched
◦ Potential threats will be identified in the enriched data as it streams by and the
likelihood of the threat will be ranked
◦ The enriched data will be indexed for further analysis in a realtime index and in a batch
index
• Better enrichment ⇒ more context ⇒ better insights
Enrichment ⇒ Data Science
Threat identification turns out to be hard, and the key to the solution is enabling
better threat identification.
• Fixed rules are too coarse and multiply aggressively.
• Statistical baselining can adapt better than rules in some circumstances, but has
scale challenges as well
• Machine learning can adapt better, but can be hard to scale at high velocity
The ideal solution is to enable the mixing of rules, statistical baselining and machine
learning.
Characteristics of a Solution
• There’s a lot of data and it comes at you fast
◦ Build atop scalable infrastructure: Storm + Hadoop
• Data Scientists rarely standardize on one technology or library for all problems
◦ Bring your own Language and Library
◦ Model interaction can be done via exposing models as REST endpoints
• Data Science can be computationally expensive
◦ If we assume models are state-free, they look a lot like traditional microservices and can
be scaled by replication
• Integration of new models and retirement of old models must be seamless
◦ Registry and autodiscovery through ZooKeeper
• Many instances of many models will be running at once
◦ Share resources fairly by using YARN for deployment and model lifecycle management
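The registry and replication ideas above can be sketched in a few lines. This is only an illustration: the `ModelRegistry` class and its method names are hypothetical, and a plain in-memory dictionary stands in for the ZooKeeper-backed registry the slides describe. It shows how several replicas of a stateless model could be registered under one name and handed out in turn.

```python
import itertools
from collections import defaultdict


class ModelRegistry:
    """In-memory stand-in for a ZooKeeper-backed registry (illustrative only)."""

    def __init__(self):
        self._endpoints = defaultdict(list)  # model name -> list of replica URLs
        self._cursors = {}                   # model name -> round-robin iterator

    def register(self, name, url):
        """Add a replica and restart the round-robin over the new replica set."""
        self._endpoints[name].append(url)
        self._cursors[name] = itertools.cycle(list(self._endpoints[name]))

    def unregister(self, name, url):
        """Retire a replica; remaining replicas keep serving."""
        self._endpoints[name].remove(url)
        self._cursors[name] = itertools.cycle(list(self._endpoints[name]))

    def get_endpoint(self, name):
        """Return the next replica URL for `name`, or None if none are registered."""
        if not self._endpoints.get(name):
            return None
        return next(self._cursors[name])
```

Because the models are assumed to hold no state, any replica can answer any request, which is what makes adding or retiring instances transparent to callers.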
Requirements of a Model Creator
We want to ask as little of data scientists and model creators as possible:
• Handle training your model wherever and however you like
• Make your model stateless
• Provide a REST interface serving up your model
From there, MaaS will handle deployment, management and model auto-discovery
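As a rough sketch of what "stateless" and "a REST interface" could look like for a model creator, the following uses only the Python standard library. Everything named here is an assumption for illustration: the `/apply` path, the `host` parameter, the JSON response shape, and the placeholder length rule in `score`, which merely stands in for a trained model.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs


def score(features):
    """Hypothetical stateless model: the result depends only on the input."""
    host = features.get("host", "")
    # Placeholder rule standing in for a trained model's prediction.
    return {"is_malicious": "dga" if len(host) > 20 else "legit"}


class ModelHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        parsed = urlparse(self.path)
        if parsed.path != "/apply":  # assumed endpoint path
            self.send_error(404)
            return
        # Flatten ?host=example.com into {"host": "example.com"}.
        params = {k: v[0] for k, v in parse_qs(parsed.query).items()}
        body = json.dumps(score(params)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=0):
    """Bind the server on localhost; port 0 asks the OS for a free port."""
    return HTTPServer(("127.0.0.1", port), ModelHandler)
```

Calling `serve(8080).serve_forever()` would start the endpoint locally. Since `score` keeps nothing between requests, several copies of this process can run side by side without coordination.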
Model as a Service: Stellar
Now that we have a deployment and autodiscovery solution, we need to interact with the
models. We think of Stellar as being like Excel functions that we can run on streaming data:
• Compose a rich set of built-in functions with user-defined functions
• Provide simple primitives around the functions: boolean operations, conditionals,
numerical computation.
Model interaction is enabled as a set of Stellar functions that discover and interact with
model endpoints.
Model as a Service: Architecture
Interlude: Domain Generating Algorithms
Domain Generating Algorithms are used by botnets to communicate with compromised
computers in an evasive way. Traffic to a fixed command and control host would be
noticed and firewall rules could be modified to cut communication channels cleanly.
Instead, the botnet command and control hosts must move around to evade detection.
Typically this involves periodically generating a synthetic domain in a repeatable way
and having the compromised computers attempt to connect to a few candidate synthetic
domains daily with some moderate hope of guessing the right one. This traffic is small
enough to be lost in the shuffle of a large organization, making the evasion effective.
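To make "generating a synthetic domain in a repeatable way" concrete, here is a deliberately simple toy, not taken from any real botnet. The seed string, the SHA-256 construction, and the reserved `.example` suffix are all assumptions chosen for illustration. The point is only that two parties sharing a seed and a calendar date derive the same candidate list without communicating.

```python
import hashlib
from datetime import date


def candidate_domains(seed, day, count=3, length=12):
    """Derive `count` pseudo-random labels from a shared seed and a date."""
    domains = []
    for i in range(count):
        material = f"{seed}:{day.isoformat()}:{i}".encode("utf-8")
        digest = hashlib.sha256(material).hexdigest()
        # Map each hex digit (0-15) to a letter a-p so the label looks like a hostname.
        label = "".join(chr(ord("a") + int(c, 16)) for c in digest[:length])
        domains.append(label + ".example")
    return domains
```

Labels produced this way look like random letter strings rather than words, which is the kind of signal a classifier for synthetic domains can try to pick up.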
Demo
• One solution for detecting synthetic domains is to construct a classifier that must be
run against every domain going across the network.
• At the BSides DFW Conference in 2013, ClickSecurity presented a model written in
Python for detecting synthetic domains.
• We can expose this model as a REST endpoint, deploy via MaaS and interact with
the model via Stellar
dga_model_endpoint := MAAS_GET_ENDPOINT('dga')
dga_result_map := MAAS_MODEL_APPLY(dga_model_endpoint, { 'host' : domain_without_subdomains })
dga_result := MAP_GET('is_malicious', dga_result_map)
is_dga := dga_result != null && dga_result == 'dga'
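For readers less familiar with Stellar, the same logic can be mirrored in Python. This is a sketch under assumptions, not how Stellar is implemented: `model_apply`, `http_get_json`, and the `/apply` path with query-string arguments are hypothetical names chosen to parallel the statements above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def http_get_json(url):
    """Fetch a URL and decode the JSON body into a dict."""
    with urlopen(url, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def model_apply(endpoint_url, args, fetch=http_get_json):
    """Send args as query parameters; return the result map, or None if the call fails."""
    try:
        return fetch(f"{endpoint_url}/apply?{urlencode(args)}")
    except OSError:  # covers connection errors and urllib's URLError
        return None


def is_dga(result_map):
    """Null-safe check mirroring: dga_result != null && dga_result == 'dga'."""
    dga_result = result_map.get("is_malicious") if result_map else None
    return dga_result is not None and dga_result == "dga"
```

The `fetch` parameter exists so the HTTP call can be swapped for a stub, for example `is_dga(model_apply(url, {"host": "example.com"}, fetch=lambda u: {"is_malicious": "legit"}))`.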
Next Steps
Going forward, we’d like to add a few things to make this an even more complete
solution
• A UI for model deployment
• Model Monitoring w/ dashboards
• Alternative endpoint formats (e.g. gRPC)
• Automatic conversion of common model output formats to an endpoint
• A searchable model registry
Questions
Thanks for your attention! Don’t forget to come to the cybersecurity Birds of a Feather
session Thursday.
• Find me at http://caseystella.com
• Twitter handle: @casey_stella
• Email address: cstella@hortonworks.com
Casey Stella@casey_stella (Hortonworks) Model as a Service 2017

Weitere ähnliche Inhalte

Was ist angesagt?

Enabling Modern Application Architecture using Data.gov open government data
Enabling Modern Application Architecture using Data.gov open government dataEnabling Modern Application Architecture using Data.gov open government data
Enabling Modern Application Architecture using Data.gov open government dataDataWorks Summit
 
Spark in the Enterprise - 2 Years Later by Alan Saldich
Spark in the Enterprise - 2 Years Later by Alan SaldichSpark in the Enterprise - 2 Years Later by Alan Saldich
Spark in the Enterprise - 2 Years Later by Alan SaldichSpark Summit
 
Insights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesInsights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesDataWorks Summit
 
Scaling Data Science on Big Data
Scaling Data Science on Big DataScaling Data Science on Big Data
Scaling Data Science on Big DataDataWorks Summit
 
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...DataWorks Summit/Hadoop Summit
 
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...DataWorks Summit/Hadoop Summit
 
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...DataWorks Summit/Hadoop Summit
 
Data Science at Scale by Sarah Guido
Data Science at Scale by Sarah GuidoData Science at Scale by Sarah Guido
Data Science at Scale by Sarah GuidoSpark Summit
 
Using Hadoop to build a Data Quality Service for both real-time and batch data
Using Hadoop to build a Data Quality Service for both real-time and batch dataUsing Hadoop to build a Data Quality Service for both real-time and batch data
Using Hadoop to build a Data Quality Service for both real-time and batch dataDataWorks Summit/Hadoop Summit
 
MapR-DB Elasticsearch Integration
MapR-DB Elasticsearch IntegrationMapR-DB Elasticsearch Integration
MapR-DB Elasticsearch IntegrationMapR Technologies
 
Building an ETL pipeline for Elasticsearch using Spark
Building an ETL pipeline for Elasticsearch using SparkBuilding an ETL pipeline for Elasticsearch using Spark
Building an ETL pipeline for Elasticsearch using SparkItai Yaffe
 
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...DataWorks Summit
 
How to Use Innovative Data Handling and Processing Techniques to Drive Alpha ...
How to Use Innovative Data Handling and Processing Techniques to Drive Alpha ...How to Use Innovative Data Handling and Processing Techniques to Drive Alpha ...
How to Use Innovative Data Handling and Processing Techniques to Drive Alpha ...DataWorks Summit
 
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop WarehouseData Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop WarehouseDataWorks Summit
 
Data & analytics challenges in a microservice architecture
Data & analytics challenges in a microservice architectureData & analytics challenges in a microservice architecture
Data & analytics challenges in a microservice architectureNiels Naglé
 
Insights into Real World Data Management Challenges
Insights into Real World Data Management ChallengesInsights into Real World Data Management Challenges
Insights into Real World Data Management ChallengesDataWorks Summit
 
Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...DataWorks Summit
 
Spark with Azure HDInsight - Tampa Bay Data Science - Adnan Masood, PhD
Spark with Azure HDInsight  - Tampa Bay Data Science - Adnan Masood, PhDSpark with Azure HDInsight  - Tampa Bay Data Science - Adnan Masood, PhD
Spark with Azure HDInsight - Tampa Bay Data Science - Adnan Masood, PhDAdnan Masood
 

Was ist angesagt? (20)

Enabling Modern Application Architecture using Data.gov open government data
Enabling Modern Application Architecture using Data.gov open government dataEnabling Modern Application Architecture using Data.gov open government data
Enabling Modern Application Architecture using Data.gov open government data
 
Spark in the Enterprise - 2 Years Later by Alan Saldich
Spark in the Enterprise - 2 Years Later by Alan SaldichSpark in the Enterprise - 2 Years Later by Alan Saldich
Spark in the Enterprise - 2 Years Later by Alan Saldich
 
Insights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesInsights into Real-world Data Management Challenges
Insights into Real-world Data Management Challenges
 
Scaling Data Science on Big Data
Scaling Data Science on Big DataScaling Data Science on Big Data
Scaling Data Science on Big Data
 
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
 
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
 
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...
 
Data Science at Scale by Sarah Guido
Data Science at Scale by Sarah GuidoData Science at Scale by Sarah Guido
Data Science at Scale by Sarah Guido
 
The EDW Ecosystem
The EDW EcosystemThe EDW Ecosystem
The EDW Ecosystem
 
Using Hadoop to build a Data Quality Service for both real-time and batch data
Using Hadoop to build a Data Quality Service for both real-time and batch dataUsing Hadoop to build a Data Quality Service for both real-time and batch data
Using Hadoop to build a Data Quality Service for both real-time and batch data
 
MapR-DB Elasticsearch Integration
MapR-DB Elasticsearch IntegrationMapR-DB Elasticsearch Integration
MapR-DB Elasticsearch Integration
 
Building an ETL pipeline for Elasticsearch using Spark
Building an ETL pipeline for Elasticsearch using SparkBuilding an ETL pipeline for Elasticsearch using Spark
Building an ETL pipeline for Elasticsearch using Spark
 
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
 
How to Use Innovative Data Handling and Processing Techniques to Drive Alpha ...
How to Use Innovative Data Handling and Processing Techniques to Drive Alpha ...How to Use Innovative Data Handling and Processing Techniques to Drive Alpha ...
How to Use Innovative Data Handling and Processing Techniques to Drive Alpha ...
 
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop WarehouseData Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse
 
Data & analytics challenges in a microservice architecture
Data & analytics challenges in a microservice architectureData & analytics challenges in a microservice architecture
Data & analytics challenges in a microservice architecture
 
Insights into Real World Data Management Challenges
Insights into Real World Data Management ChallengesInsights into Real World Data Management Challenges
Insights into Real World Data Management Challenges
 
Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...
 
Shaping a Digital Vision
Shaping a Digital VisionShaping a Digital Vision
Shaping a Digital Vision
 
Spark with Azure HDInsight - Tampa Bay Data Science - Adnan Masood, PhD
Spark with Azure HDInsight  - Tampa Bay Data Science - Adnan Masood, PhDSpark with Azure HDInsight  - Tampa Bay Data Science - Adnan Masood, PhD
Spark with Azure HDInsight - Tampa Bay Data Science - Adnan Masood, PhD
 

Andere mochten auch

Andere mochten auch (12)

Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
 
File Format Benchmark - Avro, JSON, ORC and Parquet
File Format Benchmark - Avro, JSON, ORC and ParquetFile Format Benchmark - Avro, JSON, ORC and Parquet
File Format Benchmark - Avro, JSON, ORC and Parquet
 
Best Practices for Enterprise User Management in Hadoop Environment
Best Practices for Enterprise User Management in Hadoop EnvironmentBest Practices for Enterprise User Management in Hadoop Environment
Best Practices for Enterprise User Management in Hadoop Environment
 
Big Data in Azure
Big Data in AzureBig Data in Azure
Big Data in Azure
 
Solving Cyber at Scale
Solving Cyber at ScaleSolving Cyber at Scale
Solving Cyber at Scale
 
Automatic Detection, Classification and Authorization of Sensitive Personal D...
Automatic Detection, Classification and Authorization of Sensitive Personal D...Automatic Detection, Classification and Authorization of Sensitive Personal D...
Automatic Detection, Classification and Authorization of Sensitive Personal D...
 
Dancing Elephants - Efficiently Working with Object Stories from Apache Spark...
Dancing Elephants - Efficiently Working with Object Stories from Apache Spark...Dancing Elephants - Efficiently Working with Object Stories from Apache Spark...
Dancing Elephants - Efficiently Working with Object Stories from Apache Spark...
 
Running Services on YARN
Running Services on YARNRunning Services on YARN
Running Services on YARN
 
Apache Metron: Community Driven Cyber Security
Apache Metron: Community Driven Cyber Security Apache Metron: Community Driven Cyber Security
Apache Metron: Community Driven Cyber Security
 
Bringing it All Together: Apache Metron (Incubating) as a Case Study of a Mod...
Bringing it All Together: Apache Metron (Incubating) as a Case Study of a Mod...Bringing it All Together: Apache Metron (Incubating) as a Case Study of a Mod...
Bringing it All Together: Apache Metron (Incubating) as a Case Study of a Mod...
 
Hadoop 3 in a Nutshell
Hadoop 3 in a NutshellHadoop 3 in a Nutshell
Hadoop 3 in a Nutshell
 
Apache Kafka Best Practices
Apache Kafka Best PracticesApache Kafka Best Practices
Apache Kafka Best Practices
 

Ähnlich wie MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron

Cassandra Summit 2014: Social Media Security Company Nexgate Relies on Cassan...
Cassandra Summit 2014: Social Media Security Company Nexgate Relies on Cassan...Cassandra Summit 2014: Social Media Security Company Nexgate Relies on Cassan...
Cassandra Summit 2014: Social Media Security Company Nexgate Relies on Cassan...DataStax Academy
 
Spring + QueryDSL + MongoDB Presentation
Spring + QueryDSL + MongoDB PresentationSpring + QueryDSL + MongoDB Presentation
Spring + QueryDSL + MongoDB PresentationRanga Vadlamudi
 
2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics2022 Trends in Enterprise Analytics
2022 Trends in Enterprise AnalyticsDATAVERSITY
 
Sharing a Startup’s Big Data Lessons
Sharing a Startup’s Big Data LessonsSharing a Startup’s Big Data Lessons
Sharing a Startup’s Big Data LessonsGeorge Stathis
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database RoundtableEric Kavanagh
 
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...Spark Summit
 
Incorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic ArchitectureIncorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic ArchitectureCaserta
 
How to Survive as a Data Architect in a Polyglot Database World
How to Survive as a Data Architect in a Polyglot Database WorldHow to Survive as a Data Architect in a Polyglot Database World
How to Survive as a Data Architect in a Polyglot Database WorldKaren Lopez
 
SRV318_Research at PNNL Powered by AWS
SRV318_Research at PNNL Powered by AWSSRV318_Research at PNNL Powered by AWS
SRV318_Research at PNNL Powered by AWSAmazon Web Services
 
Research at PNNL: Powered by AWS - SRV318 - re:Invent 2017
Research at PNNL: Powered by AWS - SRV318 - re:Invent 2017Research at PNNL: Powered by AWS - SRV318 - re:Invent 2017
Research at PNNL: Powered by AWS - SRV318 - re:Invent 2017Amazon Web Services
 
Social Security Company Nexgate's Success Relies on Apache Cassandra
Social Security Company Nexgate's Success Relies on Apache CassandraSocial Security Company Nexgate's Success Relies on Apache Cassandra
Social Security Company Nexgate's Success Relies on Apache CassandraDataStax Academy
 
Options for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current MarketOptions for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current MarketDremio Corporation
 
Practical Machine Learning in Information Security
Practical Machine Learning in Information SecurityPractical Machine Learning in Information Security
Practical Machine Learning in Information SecuritySven Krasser
 
Big Data Paris : Hadoop and NoSQL
Big Data Paris : Hadoop and NoSQLBig Data Paris : Hadoop and NoSQL
Big Data Paris : Hadoop and NoSQLTugdual Grall
 
Breed data scientists_ A Presentation.pptx
Breed data scientists_ A Presentation.pptxBreed data scientists_ A Presentation.pptx
Breed data scientists_ A Presentation.pptxGautamPopli1
 
2017 bio it world
2017 bio it world2017 bio it world
2017 bio it worldChris Dwan
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 

MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron

  • 1. Model as a Service Modern Streaming Data Science with Apache Metron Casey Stella @casey_stella 2017 Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
  • 2. Introduction: Hi, I’m Casey Stella!
  • 8. Apache Metron
      • Apache Metron is a streaming cybersecurity analytics solution. Roughly speaking:
          ◦ We ingest network data from many different sources.
          ◦ That data is then normalized and enriched.
          ◦ Potential threats will be identified in the enriched data as it streams by, and the likelihood of the threat will be ranked.
          ◦ The enriched data will be indexed for further analysis in a realtime index and in a batch index.
      • Better enrichment ⇒ more context ⇒ better insights
  • 13. Enrichment ⇒ Data Science
      Threat identification is a hard thing, it turns out. The key to the solution is enabling better threat identification.
      • Fixed rules are too coarse and multiply aggressively.
      • Statistical baselining can adapt better than rules in some circumstances, but has scale challenges as well.
      • Machine learning can adapt better, but can be hard to scale at high velocity.
      The ideal solution is to enable the mixing of rules, statistical baselining and machine learning.
  • 24. Characteristics of a Solution
      • There’s a lot of data and it comes at you fast
          ◦ Build atop scalable infrastructure: Storm + Hadoop
      • Data scientists rarely standardize on one technology or library for all problems
          ◦ Bring your own language and library
          ◦ Model interaction can be done by exposing models as REST endpoints
      • Data science can be computationally expensive
          ◦ If we assume models are state-free, they look a lot like traditional microservices and can be scaled by replication
      • Integration of new models and retirement of old models must be seamless
          ◦ Registry and autodiscovery through ZooKeeper
      • Many instances of many models will be running at once
          ◦ Share resources fairly by using YARN for deployment and model lifecycle management
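
The registry and replication ideas on this slide can be sketched in a few lines of Python. This is only an illustration: the `ModelRegistry` class, its method names, and the round-robin policy are assumptions made for the example, and a plain in-memory dictionary stands in for ZooKeeper. It shows why stateless models scale by replication: any registered replica of a given model version can serve any request.

```python
from collections import defaultdict


class ModelRegistry:
    """In-memory stand-in for a ZooKeeper-backed registry (hypothetical, for illustration)."""

    def __init__(self):
        self._urls = defaultdict(list)   # (name, version) -> list of replica URLs
        self._next = defaultdict(int)    # (name, version) -> round-robin counter

    def register(self, name, version, url):
        """Announce a new replica of a model; duplicates are ignored."""
        urls = self._urls[(name, version)]
        if url not in urls:
            urls.append(url)

    def unregister(self, name, version, url):
        """Retire a replica; callers simply stop seeing it."""
        urls = self._urls.get((name, version), [])
        if url in urls:
            urls.remove(url)

    def pick(self, name, version):
        """Return one live replica URL, rotating through replicas round-robin."""
        key = (name, version)
        urls = self._urls.get(key, [])
        if not urls:
            raise LookupError(f"no live endpoint for {name}:{version}")
        url = urls[self._next[key] % len(urls)]
        self._next[key] += 1
        return url
```

With two replicas registered for the same model version, successive `pick` calls alternate between them, and retiring one replica leaves the other serving all requests without any change on the caller's side.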
  • 29. Requirements of a Model Creator
    We want to ask as little of data scientists and model creators as possible:
    • Handle training your model wherever and however you like
    • Make your model stateless
    • Provide a REST interface serving up your model
    From there, MaaS will handle deployment, management, and model autodiscovery.
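To make the last two requirements concrete, here is a minimal sketch of a stateless model behind a REST interface, using only the Python standard library. Several things are assumptions rather than anything stated in the talk: the `/apply` path, the `host` query parameter, the JSON response shape, and the toy vowel-ratio rule in `classify`, which merely stands in for a trained model.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def classify(host: str) -> dict:
    """Toy stand-in for a trained model (assumption, not a real classifier).

    The result depends only on the input, so the function is stateless.
    """
    letters = [c for c in host.lower() if c.isalpha()]
    vowels = sum(c in "aeiou" for c in letters)
    ratio = vowels / len(letters) if letters else 0.0
    return {"is_malicious": "dga" if ratio < 0.2 else "legit"}


class ModelHandler(BaseHTTPRequestHandler):
    """Serves GET /apply?host=<domain> and replies with a JSON object."""

    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/apply":
            self.send_error(404)
            return
        host = parse_qs(url.query).get("host", [""])[0]
        body = json.dumps(classify(host)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence per-request logging to keep the example quiet.
        pass


def make_server(port: int = 0) -> HTTPServer:
    """Bind to localhost; port 0 lets the operating system pick a free port."""
    return HTTPServer(("127.0.0.1", port), ModelHandler)
```

Calling `make_server(8080).serve_forever()` would start the service locally. Nothing is kept between requests, so any number of copies could run side by side.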
  • 30. Model as a Service: Stellar
    Now that we have a deployment and autodiscovery solution, we need to interact with the models. We think of Stellar as something like Excel functions that we can run on streaming data:
    • Compose a rich set of built-in functions with user-defined functions
    • Provide simple primitives around the functions: boolean operations, conditionals, numerical computation
    Model interaction is enabled as a set of Stellar functions that discover and interact with model endpoints.
  • 31. Model as a Service: Architecture
  • 32. Interlude: Domain Generating Algorithms
    Domain Generating Algorithms are used by botnets to communicate with compromised computers in an evasive way. Traffic to a fixed command-and-control host would be noticed, and firewall rules could then be modified to cut the communication channel cleanly. Instead, the botnet's command-and-control hosts must move around to evade detection. Typically this involves periodically generating a synthetic domain in a repeatable way and having the compromised computers attempt to connect to a few candidate synthetic domains daily, with some moderate hope of guessing the right one. This traffic is small enough to be lost in the shuffle of a large organization, which makes the evasion effective.
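The phrase "generating a synthetic domain in a repeatable way" can be illustrated with a short Python sketch. It is a simplified teaching example, not the algorithm of any real botnet: the function name, the shared `seed`, the use of SHA-256, and the 12-letter labels are all assumptions chosen for clarity.

```python
import hashlib
from datetime import date
from typing import List


def candidate_domains(seed: str, day: date, count: int = 5, tld: str = ".com") -> List[str]:
    """Derive `count` pseudo-random domain names from a shared seed and a date.

    Anyone who knows the seed can recompute the same list for the same day,
    which is what makes the scheme repeatable.
    """
    domains = []
    for i in range(count):
        material = f"{seed}:{day.isoformat()}:{i}".encode("utf-8")
        digest = hashlib.sha256(material).hexdigest()
        # Map the first 12 hex digits (0-15) onto the letters 'a'-'p'.
        label = "".join(chr(ord("a") + int(ch, 16)) for ch in digest[:12])
        domains.append(label + tld)
    return domains
```

The output looks like random letter strings, which is the property a classifier run against every domain can try to exploit.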
  • 33. Demo
    • One solution for detecting synthetic domains is to construct a classifier that must be run against every domain going across the network.
    • At the BSides DFW Conference in 2013, ClickSecurity presented a model written in Python for detecting synthetic domains.
    • We can expose this model as a REST endpoint, deploy it via MaaS, and interact with the model via Stellar.
  • 34.
    dga_model_endpoint := MAAS_GET_ENDPOINT('dga')
    dga_result_map := MAAS_MODEL_APPLY(dga_model_endpoint, {'host' : domain_without_subdomains})
    dga_result := MAP_GET('is_malicious', dga_result_map)
    is_dga := dga_result != null && dga_result == 'dga'
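For readers unfamiliar with Stellar, the four statements on this slide can be read as the following Python. This is a loose analogue written for illustration, not how Metron implements these functions: the `Transport` callable and `fake_transport` are hypothetical stand-ins for the HTTP call that `MAAS_MODEL_APPLY` would make, and the endpoint is passed in rather than discovered.

```python
from typing import Callable, Dict, Optional

# A transport takes an endpoint URL and the model arguments and returns the
# model's response as a map, or None when the call fails (assumption).
Transport = Callable[[str, Dict[str, str]], Optional[Dict[str, str]]]


def is_dga(domain: str, endpoint: Optional[str], apply_model: Transport) -> bool:
    """Mirror of the slide: apply the model, read 'is_malicious', compare to 'dga'."""
    if endpoint is None:
        # No endpoint was discovered, so there is nothing to ask.
        return False
    result_map = apply_model(endpoint, {"host": domain}) or {}
    result = result_map.get("is_malicious")
    # Null-safe comparison, as in: dga_result != null && dga_result == 'dga'
    return result is not None and result == "dga"


def fake_transport(endpoint: str, args: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Hypothetical model stub: flags hosts that start with 'xq'."""
    label = "dga" if args["host"].startswith("xq") else "legit"
    return {"is_malicious": label}
```

The null check matters because a missing endpoint or a failed call should leave `is_dga` false rather than raise an error in the middle of a stream.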
  • 40. Next Steps
    Going forward, we’d like to add a few things to make this an even more complete solution:
    • A UI for model deployment
    • Model monitoring with dashboards
    • Alternative endpoint formats (e.g. gRPC)
    • Automatic conversion of common model output formats to an endpoint
    • A searchable model registry
  • 41. Questions
    Thanks for your attention! Don’t forget to come to the cybersecurity Birds of a Feather session on Thursday.
    • Find me at http://caseystella.com
    • Twitter handle: @casey_stella
    • Email address: cstella@hortonworks.com