Model as a Service
Modern Streaming Data Science with Apache Metron
Casey Stella
@casey_stella
2017
Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
Introduction
Hi, I’m Casey Stella!
Apache Metron
• Apache Metron is a streaming cybersecurity analytics solution. Roughly speaking:
◦ We ingest network data from many different sources.
◦ That data is then normalized and enriched
◦ Potential threats will be identified in the enriched data as it streams by and the
likelihood of the threat will be ranked
◦ The enriched data will be indexed for further analysis in a realtime index and in a batch
index
• Better enrichment ⇒ more context ⇒ better insights
Enrichment ⇒ Data Science
Threat identification turns out to be hard, and enabling better threat identification
is the key to the solution.
• Fixed rules are too coarse and multiply aggressively.
• Statistical baselining can adapt better than rules in some circumstances, but has
scale challenges as well
• Machine learning can adapt better, but can be hard to scale at high velocity
The ideal solution is to enable the mixing of rules, statistical baselining and machine
learning.
Characteristics of a Solution
• There’s a lot of data and it comes at you fast
◦ Build atop scalable infrastructure: Storm + Hadoop
• Data Scientists rarely standardize on one technology or library for all problems
◦ Bring your own Language and Library
◦ Model interaction can be done via exposing models as REST endpoints
• Data Science can be computationally expensive
◦ If we assume models are state-free, they look a lot like traditional microservices and can
be scaled by replication
• Integration of new models and retirement of old models must be seamless
◦ Registry and autodiscovery through ZooKeeper
• Many instances of many models will be running at once
◦ Share resources fairly by using YARN for deployment and model lifecycle management
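The registry and replication ideas above can be sketched in a few lines. This is only an in-memory illustration and does not use ZooKeeper or YARN. The class name `ModelRegistry` and its methods are hypothetical, chosen for this example: each model name maps to a list of replica URLs, and lookups rotate through the replicas so that stateless copies of a model share the load.

```python
class ModelRegistry:
    """In-memory stand-in for a model registry (hypothetical, not the Metron API)."""

    def __init__(self):
        self._endpoints = {}  # model name -> list of replica URLs
        self._next = {}       # model name -> round-robin counter

    def register(self, name, url):
        """Announce a new replica of a model."""
        self._endpoints.setdefault(name, []).append(url)
        self._next.setdefault(name, 0)

    def retire(self, name, url):
        """Withdraw a replica; later lookups no longer return it."""
        replicas = self._endpoints.get(name, [])
        if url in replicas:
            replicas.remove(url)

    def get_endpoint(self, name):
        """Return the next replica in round-robin order, or None if there is none."""
        replicas = self._endpoints.get(name, [])
        if not replicas:
            return None
        index = self._next[name] % len(replicas)
        self._next[name] += 1
        return replicas[index]
```

Because the models are assumed to be stateless, any replica can answer any request, which is why a simple rotation is enough here.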
Requirements of a Model Creator
We want to ask as little of data scientists and model creators as possible:
• Handle training your model wherever and however you like
• Make your model stateless
• Provide a REST interface serving up your model
From there, MaaS (Model as a Service) will handle deployment, management, and model autodiscovery
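As a rough sketch of what "stateless model behind a REST interface" could look like, the example below uses only the Python standard library. The `/apply` path, the `host` query parameter, the `predict` rule, and the `serve` helper are all assumptions made for illustration; they are not taken from the slides, and a real model would be trained elsewhere and loaded at startup.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def predict(features):
    """Stand-in model: a pure function of its input, so it keeps no state."""
    host = features.get("host", "")
    # Hypothetical rule for illustration only; not a real classifier.
    label = "dga" if len(host) > 20 else "legit"
    return {"is_malicious": label}


class ModelHandler(BaseHTTPRequestHandler):
    """Serves GET /apply?host=... and answers with a JSON document."""

    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/apply":
            self.send_error(404)
            return
        # parse_qs returns a list per parameter; keep the first value.
        features = {key: values[0] for key, values in parse_qs(url.query).items()}
        body = json.dumps(predict(features)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=0):
    """Bind to a port (0 lets the OS choose) and return the server object."""
    return HTTPServer(("127.0.0.1", port), ModelHandler)
```

Calling `serve(8080).serve_forever()` would start the endpoint locally. Since `predict` depends only on its arguments, several copies of this process can run side by side without coordinating.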
Model as a Service: Stellar
Now that we have a deployment and autodiscovery solution, we need to interact with the
models. We think of Stellar as being like Excel functions that we can run on streaming data:
• Compose a rich set of built-in functions with user defined functions
• Provide simple primitives around the functions: boolean operations, conditionals,
numerical computation.
Model interaction is enabled as a set of Stellar functions that discover and interact with
model endpoints.
Model as a Service: Architecture
Interlude: Domain Generating Algorithms
Domain Generating Algorithms are used by botnets to communicate with compromised
computers in an evasive way. Traffic to a fixed command and control host would be
noticed and firewall rules could be modified to cut communication channels cleanly.
Instead, the botnet command and control hosts must move around to evade detection.
Typically this involves periodically generating a synthetic domain in a repeatable way
and having the compromised computers attempt to connect to a few candidate synthetic
domains daily with some moderate hope of guessing the right one. This traffic is small
enough to be lost in the shuffle of a large organization, making the evasion effective.
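To make "generating a synthetic domain in a repeatable way" concrete, here is a toy sketch. It does not reproduce any real botnet's algorithm. The function name, the shared `seed`, and the reserved `.example` suffix are assumptions for illustration. The point is only that both sides can derive the same short list of candidates from a shared secret and the current date.

```python
import hashlib
from datetime import date


def candidate_domains(seed, day, count=3, tld=".example"):
    """Derive `count` pseudo-random domain names from a seed and a date."""
    domains = []
    for i in range(count):
        # The same seed, date, and index always hash to the same label.
        material = f"{seed}:{day.isoformat()}:{i}".encode("utf-8")
        digest = hashlib.sha256(material).hexdigest()
        domains.append(digest[:12] + tld)
    return domains
```

Calling `candidate_domains("shared-secret", date(2017, 1, 1))` on two different machines yields identical lists, while the next day's list is different, which is what lets the rendezvous point move without any direct communication.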
Demo
• One solution for detecting synthetic domains is to construct a classifier that must be
run against every domain going across the network.
• At the BSides DFW Conference in 2013, ClickSecurity presented a model written in
Python for detecting synthetic domains.
• We can expose this model as a REST endpoint, deploy via MaaS and interact with
the model via Stellar
dga_model_endpoint := MAAS_GET_ENDPOINT('dga')
dga_result_map := MAAS_MODEL_APPLY( dga_model_endpoint
                                  , { 'host' : domain_without_subdomains }
                                  )
dga_result := MAP_GET('is_malicious', dga_result_map)
is_dga := dga_result != null && dga_result == 'dga'
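The slides do not show the classifier that sits behind the `dga` endpoint. As a rough illustration of the kind of signal such a model might use, the sketch below scores a domain label by its character entropy and returns a map with the same `is_malicious` key that the Stellar statements read. The threshold of 3.5 bits and the single-feature rule are assumptions for this example; this is not the ClickSecurity model.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def classify(host, threshold=3.5):
    """Label a host 'dga' when its first label looks unusually random."""
    label = host.split(".")[0]
    verdict = "dga" if shannon_entropy(label) > threshold else "legit"
    return {"is_malicious": verdict}
```

A single threshold like this would misfire on many legitimate names, so it is best read as a feature extractor that a trained classifier could combine with other features.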
Next Steps
Going forward, we’d like to add a few things to make this an even more complete
solution
• A UI for model deployment
• Model Monitoring w/ dashboards
• Alternative endpoint formats (e.g. gRPC)
• Automatic conversion of common model output formats to an endpoint
• A searchable model registry
Questions
Thanks for your attention! Don’t forget to come to the cybersecurity Birds of a Feather
session Thursday.
• Find me at http://caseystella.com
• Twitter handle: @casey_stella
• Email address: cstella@hortonworks.com
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Mehr von DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Kürzlich hochgeladen

Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 

Kürzlich hochgeladen (20)

Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 

MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (Incubating)

  • 1. Model as a Service: Modern Streaming Data Science with Apache Metron
      Casey Stella (@casey_stella, Hortonworks), 2017
  • 2. Introduction
      Hi, I’m Casey Stella!
  • 8. Apache Metron
      • Apache Metron is a streaming cybersecurity analytics solution. Roughly speaking:
          ◦ We ingest network data from many different sources.
          ◦ That data is then normalized and enriched.
          ◦ Potential threats will be identified in the enriched data as it streams by and the likelihood of the threat will be ranked.
          ◦ The enriched data will be indexed for further analysis in a realtime index and in a batch index.
      • Better enrichment ⇒ more context ⇒ better insights
  • 13. Enrichment ⇒ Data Science
      Threat identification is hard, it turns out. The key to the solution is enabling better threat identification.
      • Fixed rules are too coarse and multiply aggressively.
      • Statistical baselining can adapt better than rules in some circumstances, but has scale challenges as well.
      • Machine learning can adapt better, but can be hard to scale at high velocity.
      The ideal solution is to enable the mixing of rules, statistical baselining and machine learning.
  • 24. Characteristics of a Solution
      • There’s a lot of data and it comes at you fast
          ◦ Build atop scalable infrastructure: Storm + Hadoop
      • Data scientists rarely standardize on one technology or library for all problems
          ◦ Bring your own language and library
          ◦ Model interaction can be done by exposing models as REST endpoints
      • Data science can be computationally expensive
          ◦ If we assume models are state-free, they look a lot like traditional microservices and can be scaled by replication
      • Integration of new models and retirement of old models must be seamless
          ◦ Registry and autodiscovery through ZooKeeper
      • Many instances of many models will be running at once
          ◦ Share resources fairly by using YARN for deployment and model lifecycle management
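The registry and autodiscovery idea above can be sketched in a few lines. This is only an illustration under loud assumptions: the slides say the real registry lives in ZooKeeper, whereas the `ModelRegistry` class below is a hypothetical in-memory stand-in, and the round-robin choice among replicas is one possible policy, not necessarily the one MaaS uses.

```python
import itertools
from collections import defaultdict


class ModelRegistry:
    """Hypothetical in-memory stand-in for a ZooKeeper-backed model registry."""

    def __init__(self):
        self._endpoints = defaultdict(list)  # model name -> list of replica URLs
        self._cursors = {}                   # model name -> round-robin iterator

    def register(self, name, url):
        """Add a replica; a new model becomes discoverable immediately."""
        self._endpoints[name].append(url)
        self._cursors[name] = itertools.cycle(list(self._endpoints[name]))

    def retire(self, name, url):
        """Remove a replica; callers simply stop being handed its URL."""
        self._endpoints[name].remove(url)
        remaining = list(self._endpoints[name])
        self._cursors[name] = itertools.cycle(remaining) if remaining else None

    def discover(self, name):
        """Return the next replica URL for a model, or None if none is registered."""
        cursor = self._cursors.get(name)
        return next(cursor) if cursor else None
```

Because the models are assumed stateless, any replica can answer any request, which is what makes this kind of interchangeable lookup workable.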
  • 29. Requirements of a Model Creator
      We want to ask as little of data scientists and model creators as possible:
      • Handle training your model wherever and however you like
      • Make your model stateless
      • Provide a REST interface serving up your model
      From there, MaaS will handle deployment, management and model auto-discovery.
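As a rough idea of what "stateless model behind a REST interface" can look like, here is a minimal sketch using only the Python standard library. Everything specific in it is an assumption made for illustration: the `/apply` path, the `host` query parameter, the JSON response shape, and above all the placeholder `score` function, which is not a real model.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def score(host):
    """Placeholder model: a pure function of its input, so it keeps no state."""
    label = "dga" if len(host) > 20 else "legit"
    return {"is_malicious": label}


class ModelHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        parsed = urlparse(self.path)
        if parsed.path != "/apply":  # hypothetical route name
            self.send_error(404)
            return
        host = parse_qs(parsed.query).get("host", [""])[0]
        body = json.dumps(score(host)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet


def make_server(port=0):
    """Bind to localhost; port 0 asks the OS for any free port."""
    return HTTPServer(("127.0.0.1", port), ModelHandler)
```

Calling `make_server().serve_forever()` would start serving. Since `score` depends only on its argument, several copies of this process can run side by side and be swapped freely.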
  • 30. Model as a Service: Stellar
      Now that we have a deployment and autodiscovery solution, we need to interact with the models. We think of Stellar as something like Excel functions that we can run on streaming data:
      • Compose a rich set of built-in functions with user-defined functions
      • Provide simple primitives around the functions: boolean operations, conditionals, numerical computation
      Model interaction is enabled as a set of Stellar functions that discover and interact with model endpoints.
  • 31. Model as a Service: Architecture
  • 32. Interlude: Domain Generating Algorithms
    Domain Generating Algorithms are used by botnets to communicate with compromised computers in an evasive way. Traffic to a fixed command and control host would be noticed, and firewall rules could be modified to cut communication channels cleanly. Instead, the botnet command and control hosts must move around to evade detection. Typically this involves periodically generating a synthetic domain in a repeatable way and having the compromised computers attempt to connect to a few candidate synthetic domains daily, with some moderate hope of guessing the right one. This traffic is small enough to be lost in the shuffle of a large organization, making the evasion effective.
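To make "generating a synthetic domain in a repeatable way" concrete, here is a toy generator. It is not taken from any real botnet: the `candidate_domains` function, the shared seed, the SHA-256 construction, and the 12-letter label are all assumptions chosen to show how two parties with the same seed and date can derive the same candidates without communicating.

```python
import hashlib
from datetime import date


def candidate_domains(seed, day, count=5, tld=".com"):
    """Derive `count` pseudo-random domains from a shared seed and a date.

    Illustrative only: anyone holding the same seed computes the same
    list for the same day, which is what makes the scheme repeatable.
    """
    domains = []
    for i in range(count):
        material = f"{seed}:{day.isoformat()}:{i}".encode("utf-8")
        digest = hashlib.sha256(material).hexdigest()
        # Map the first 12 hex digits onto the letters 'a'..'p'.
        label = "".join(chr(ord("a") + int(c, 16)) for c in digest[:12])
        domains.append(label + tld)
    return domains


# Both sides would run the same call with the same arguments.
todays_candidates = candidate_domains("shared-secret", date(2017, 6, 1))
```

The resulting labels are short random-looking strings, which is the property a classifier can try to pick up on.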
  • 33. Demo
    • One solution for detecting synthetic domains is to construct a classifier that must be run against every domain going across the network.
    • At the BSides DFW Conference in 2013, ClickSecurity presented a model written in Python for detecting synthetic domains.
    • We can expose this model as a REST endpoint, deploy it via MaaS and interact with the model via Stellar.
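As a rough illustration of what a synthetic-domain classifier might look at, the sketch below computes two simple features of a domain name. It is not the ClickSecurity model: the `domain_features` function and the choice of label length and character entropy as features are assumptions, and it expects a domain with subdomains already removed.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def domain_features(domain):
    """Hypothetical feature vector for a domain without subdomains.

    Only the first label (the part before the first dot) is scored.
    """
    label = domain.split(".")[0].lower()
    return {"length": len(label), "entropy": shannon_entropy(label)}


features = domain_features("google.com")
```

Features like these would then be fed to whatever classifier the data scientist prefers; the scoring step itself is deliberately left out.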
  • 34.
    dga_model_endpoint := MAAS_GET_ENDPOINT('dga')
    dga_result_map := MAAS_MODEL_APPLY( dga_model_endpoint
                                      , { 'host' : domain_without_subdomains }
                                      )
    dga_result := MAP_GET('is_malicious', dga_result_map)
    is_dga := dga_result != null && dga_result == 'dga'
  • 40. Next Steps
    Going forward, we’d like to add a few things to make this an even more complete solution:
    • A UI for model deployment
    • Model monitoring with dashboards
    • Alternative endpoint formats (e.g. gRPC)
    • Automatic conversion of common model output formats to an endpoint
    • A searchable model registry
  • 41. Questions
    Thanks for your attention! Don’t forget to come to the cybersecurity Birds of a Feather session Thursday.
    • Find me at http://caseystella.com
    • Twitter handle: @casey_stella
    • Email address: cstella@hortonworks.com