SlideShare ist ein Scribd-Unternehmen logo
1 von 50
Downloaden Sie, um offline zu lesen
Prasanth Kothuri, CERN
Daniel Lanza Garcia, CERN
Real-time Detection Of Anomalies In
The Database Infrastructure Using
Apache Spark
#EUde13
Outline
• About CERN
• Apache Spark at CERN
• Data Lake Challenges
• Detecting Anomalous Connections
• Real-time Intelligent Monitoring
2#EUde13
About CERN
3#EUde13
• CERN - European Laboratory for Particle Physics
• Founded in 1954 by 12 Countries for fundamental
physics research in the post-war Europe
• Today 21 member states + world-wide
collaborations
• About ~1000 MCHF yearly budget
• 2’300 CERN personnel
• 10’000 users from 110 countries
Fundamental Research
4#EUde13
• What is 95% of the Universe made
of?
• Why do particles have mass?
• Why is there no antimatter left in
the Universe?
• What was the Universe like just
after the “Big Bang”
Who Changed My Plan? 5
The Large Hadron Collider (LHC)
Largest machine in the world
27km, 6000+ superconducting magnets
Emptiest place in the solar system
High vacuum inside the magnets
Hottest spot in the galaxy
During Lead ion collisions create temperatures 100 000x hotter than the heart of the sun;
Fastest racetrack on Earth
Protons circulate 11245 times/s (99.9999991% the speed of light)
Apache Spark @ CERN
• Spark is a popular compute engine for Analytics
+ ETL
– Deployed on four production Hadoop/YARN clusters
• Aggregated capacity (2017): ~1500 physical cores, 11 PB
– Adoption is growing. Key projects involving Spark:
• Analytics for accelerator controls and logging
• Monitoring IT Infrastructure, uses Spark + Spark streaming
• Analytics on aggregated logs
• Explorations on the use of Spark for high energy physics
6#EUdev2
Detection Of Anomalies In
Database Infrastructure
7#EUde13
CERN Database Infrastructure
8#EUde13
• Over 100 Oracle databases, most of them RAC
• Running Oracle 11.2.0.4 and 12.1.0.2
• 950 TB of data files for production DBs in total (without counting
replicas), NAS as storage
• Oracle In-Memory in production since July 2015
• Examples of critical production DBs:
• Quench Protection System: 150’000 changes/s
• LHC logging database: 492 TB, growing 180 TB/year
• But also DB as a Service (single instances)
• 310 MySQL, 70 PostgreSQL, 9 Oracle, 5 InfluxDB
Data Lake
9#EUde13
• Objective: Build central repository for database audit,
performance metrics and logs with the goal of real time
analytics and offline analytics
• Use cases
• Detection of Anomalies
• Troubleshooting
• Capacity Planning
Diverse Data
10#EUde13
• Tables
• Audit
• Performance Metrics
• Alert
• Log files
• Listener
Data pipeline Architecture
11#EUde13
alertlog
Audit
AWR
Kafka
500 events/minute
500 events/minute
500 events/minute
70.000 events/minute
70.000 events/minute
HDFS ACLs
Per-index ACLS
Aggregations and Dashboards
12#EUde13
Real-time analysis
13#EUde13
• Attempt to detect anomalous connections using Machine
Learning techniques
• First approach to use kNN classifier to classify connection
as normal or anomalous
• Anomaly detection suffer from high false positive rate, the
challenge is to choose features that best characterize
user connections pattern
Feature Selection
14#EUde13
• Data Preparation
• Feature Extraction
• Client_host
• Client_user
• Client_program
• Service_name
• Hour_of_the_day
• Day_of_the_week
{
"oracle_sid" : "XXXX_YYYY",
"listener_name": "listener_scan1",
"database_type" : "oracle",
"producer" : "oracle",
"source_type": "listener",
"hostname" : "XXXXX.cern.ch",
"flume_agent_version": "0.1.6-7.el6",
"type" : "dbconnection",
"event_timestamp" : "2017-09-27T04:45:27+0200",
"CONNECT_DATA_SERVER": "DEDICATED",
"CONNECT_DATA_SERVICE_NAME": “XXXr.cern.ch",
"client_program": "python",
"client_host" : "pxxxxj2.cern.ch",
"client_user": "merge",
"client_protocol" : "tcp",
"client_ip": "137.100.100.167",
"client_port": 38432,
"type" : "establish",
"service_name" : "XXXX.cern.ch",
"return_code": 0
}
Detecting anomalous connections job
• Machine learning: kNN
15#EUde13
Threshold
Detecting anomalous connections job
• Machine learning: kNN
16#EUde13
Threshold
Detecting anomalous connections job
• Machine learning: kNN
17#EUde13
Threshold
Detecting anomalous connections job
• Machine learning: kNN
18#EUde13
Threshold
Detecting anomalous connections job
• Machine learning: kNN
19#EUde13
Threshold
Detecting anomalous connections job
• Machine learning: kNN
– Making it scalable, each user should behave similarly
20#EUde13
Detecting anomalous connections job
• Distance Function:
21#EUde13
• Spark Streaming Job processing – 10 million / day
• 6 executors, each with 4 cores and 8GB of memory
Metrics monitor job
• A general purpose metrics monitor
22#EUde13
New metrics by
aggregation and
math equations
Metrics
source
Notifications
sink
(optional)
Metrics monitor job: data flow
23#EUde13
Monitor 1
Filter
(optional)
Pre-analysis
(optional)
Analysis
Notificator
Notificator
Monitor 2
Filter
(optional)
Pre-analysis
(optional)
Analysis
Notificator
Notificator
Metrics
source
Analysis
results
sink
(optional)
Monitor N
Filter
(optional)
Pre-analysis
(optional)
Analysis
Notificator 1
Notificator 2
Notificator N
Metrics
source
Notifications
sink
(optional)
Metrics monitor job: modularity
24#EUde13
Monitor
Filter
(optional)
Pre-analysis
(optional)
Analysis
Notificator
Metrics
source
Analysis
results
sink
(optional)
Notificator
• Easy implementation of new components
– Extending corresponding abstract class
– Stateful operations: a store is available
Metrics
source
Properties
source
• Which metrics should process this monitor?
– CPU Usage of all databases, Cache Hit % of MySQL
databases, all machines in PROD, …
• Filter metrics by attributes
• Accepts exact value or regular expressions
• If not applied, all metrics are processed
25#EUde13
Monitor
Filter
(optional)
Pre-analysis
(optional)
Analysis
Notificator
Notificator
• Massage metric value
– Average, difference, filter, …
• Internal store for stateful pre-analysis
• If applied, result will be the analyzed value
26#EUde13
Monitor
Filter
(optional)
Pre-analysis
(optional)
Analysis
Notificator
Notificator
• Type: average value of a period
27#EUde13
Monitor
Filter
(optional)
Pre-analysis
(optional)
Analysis
Notificator
Notificator
• Type: weighted average value of a period
28#EUde13
Monitor
Filter
(optional)
Pre-analysis
(optional)
Analysis
Notificator
Notificator
• Type: difference with previous value
29#EUde13
Monitor
Filter
(optional)
Pre-analysis
(optional)
Analysis
Notificator
Notificator
• Type: difference with previous value
30#EUde13
Monitor
Filter
(optional)
Pre-analysis
(optional)
Analysis
Notificator
Notificator
• Is the metric behaving as it should?
• Determine the status of each incoming metric
– OK, WARNING, ERROR, EXCEPTION
• Results can be sent to external storage
• Internal store for stateful analysis
31#EUde13
Monitor
Filter
(optional)
Pre-analysis
(optional)
Analysis
Notificator
Notificator
• Type: thresholds fixed manually
32#EUde13
Monitor
Filter
(optional)
Pre-analysis
(optional)
Analysis
Notificator
Notificator
• Type: learning from recent behaviour (average and variance)
– Errors when metric does not behave as it used to
33#EUde13
Monitor
Filter
(optional)
Pre-analysis
(optional)
Analysis
Notificator
Notificator
• Type: learning from recent behaviour (average and variance)
– Easily affected by outliers
34#EUde13
Monitor
Filter
(optional)
Pre-analysis
(optional)
Analysis
Notificator
Notificator
• Type: learning from recent behaviour (percentiles)
35#EUde13
Monitor
Filter
(optional)
Pre-analysis
(optional)
Analysis
Notificator
Notificator
• Type: learning a season (hour, day, week)
36#EUde13
Monitor
Filter
(optional)
Pre-analysis
(optional)
Analysis
Notificator
Notificator
• Type: learning a season (hour, day, week)
37#EUde13
Monitor
Filter
(optional)
Pre-analysis
(optional)
Analysis
Notificator
Notificator
• Type: learning a season (hour, day, week)
38#EUde13
Monitor
Filter
(optional)
Pre-analysis
(optional)
Analysis
Notificator
Notificator
• Type: learning a season (hour, day, week)
39#EUde13
Monitor
Filter
(optional)
Pre-analysis
(optional)
Analysis
Notificator
Notificator
• Should we inform anyone?
– Metric has been in ERROR for 20 minutes…
• Determines when a notification is raised
• Based on metric statuses
• Notifications are consumed by Notifications sink
40#EUde13
Monitor
Filter
(optional)
Pre-analysis
(optional)
Analysis
Notificator
Notificator
• Type: constant status for certain period
41#EUde13
Monitor
Filter
(optional)
Pre-analysis
(optional)
Analysis
Notificator
Notificator
• Type: percentage of the period in certain statuses
42#EUde13
Monitor
Filter
(optional)
Pre-analysis
(optional)
Analysis
Notificator
Notificator
Metrics monitor job: configuration
43#EUde13
checkpoint.dir = <path_to_store_stateful_data> (default: /tmp/)
spark.batch.time = <seconds> (default: 30)
metrics.source.kafka-prod.type = kafka
metrics.source.kafka-prod.consumer.bootstrap.servers = server:port,server:port
metrics.source.kafka-prod.topics = db-logging-platform
metrics.source.kafka-prod.parser.attributes = INSTANCE_NAME METRIC_NAME
metrics.source.kafka-prod.parser.value.attribute = VALUE
metrics.source.kafka-prod.parser.timestamp.attribute = END_TIME
results.sink.type = elastic
results.sink.index = itdb_db-metric-results/log
notifications.sink.type = elastic
notifications.sink.index = itdb_db-metric-notifications/log
# Monitors
monitor.<monitor-id-1>.<confs>...
monitor.<monitor-id-2>.<confs>...
monitor.<monitor-id-n>.<confs>...
Properties
source
Metrics monitor job: monitor configuration
44#EUde13
# Monitor CPU of all instances in production
monitor.CPUUsage.filter.attribute.INSTANCE_NAME = regex:prod-.*
monitor.CPUUsage.filter.attribute.METRIC_NAME = CPU Usage Per Sec
monitor.CPUUsage.missing.max-period = 3m
monitor.CPUUsage.pre-analysis.type = weighted-average
monitor.CPUUsage.pre-analysis.period = 5m
monitor.CPUUsage.analysis.type = seasonal
monitor.CPUUsage.analysis.season = hour
monitor.CPUUsage.analysis.learning.ratio = 0.2
monitor.CPUUsage.notificator.error-perc.type = percentage
monitor.CPUUsage.notificator.error-perc.statuses = ERROR
monitor.CPUUsage.notificator.error-perc.period = 10m
monitor.CPUUsage.notificator.warn-perc.type = percentage
monitor.CPUUsage.notificator.warn-perc.statuses = WARNING
monitor.CPUUsage.notificator.warn-perc.period = 20m
• Configuration can be updated while running
• Add/remove monitor or change any parameter
Filter
Pre-
anal
ysis
Anal
ysis
Notificator
Notificator
Metrics monitor job
• We are monitoring:
– Around 30K metrics per minute (day season analysis)
– Processing time = 30 seconds/minute
– On 3 executors (YARN)
• Repo, user’s and developer’s manual, contributions,
issues:
– https://github.com/cerndb/metrics-monitor
45#EUde13
Tips and Suggestions
46#EUde13
Java 8:
47#EUde13
Java 8:
Extend RDDs and Streams + Optional + lambdas + +
java.streaming.* + reference methods =
48#EUde13
Not using Spark check-pointing feature
49#EUde13
• Pro: develop/update the job without loosing stateful data
• Cons: implement save/load
• Note: for stateful operations checkpoint needs to be enable
– But when restarting, data is removed
Conclusion
50#EUde13
• Building the data lake opens up for several kinds of analysis
• Implementation of anomalous connections helps us to identify
potential intrusions to the database infrastructure
• Intelligent monitoring tool learns and adapts the threshold
• Intelligent monitoring tool is open source and available for
everyone
• Apache Spark Streaming helps us to perform machine
learning at scale

Weitere ähnliche Inhalte

Was ist angesagt?

The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...
The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...
The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...Databricks
 
Introduction to DataFusion An Embeddable Query Engine Written in Rust
Introduction to DataFusion  An Embeddable Query Engine Written in RustIntroduction to DataFusion  An Embeddable Query Engine Written in Rust
Introduction to DataFusion An Embeddable Query Engine Written in RustAndrew Lamb
 
Performance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsPerformance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsDatabricks
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudNoritaka Sekiyama
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkFlink Forward
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Databricks
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Databricks
 
Cassandra Troubleshooting 3.0
Cassandra Troubleshooting 3.0Cassandra Troubleshooting 3.0
Cassandra Troubleshooting 3.0J.B. Langston
 
How to Build a Scylla Database Cluster that Fits Your Needs
How to Build a Scylla Database Cluster that Fits Your NeedsHow to Build a Scylla Database Cluster that Fits Your Needs
How to Build a Scylla Database Cluster that Fits Your NeedsScyllaDB
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internalsKostas Tzoumas
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesDatabricks
 
Programming in Spark using PySpark
Programming in Spark using PySpark      Programming in Spark using PySpark
Programming in Spark using PySpark Mostafa
 
Sizing Your Scylla Cluster
Sizing Your Scylla ClusterSizing Your Scylla Cluster
Sizing Your Scylla ClusterScyllaDB
 
Hyperspace for Delta Lake
Hyperspace for Delta LakeHyperspace for Delta Lake
Hyperspace for Delta LakeDatabricks
 
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...Yael Garten
 
Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Cloudera, Inc.
 
Some Iceberg Basics for Beginners (CDP).pdf
Some Iceberg Basics for Beginners (CDP).pdfSome Iceberg Basics for Beginners (CDP).pdf
Some Iceberg Basics for Beginners (CDP).pdfMichael Kogan
 
Delta Lake: Optimizing Merge
Delta Lake: Optimizing MergeDelta Lake: Optimizing Merge
Delta Lake: Optimizing MergeDatabricks
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Databricks
 

Was ist angesagt? (20)

The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...
The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...
The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...
 
Introduction to DataFusion An Embeddable Query Engine Written in Rust
Introduction to DataFusion  An Embeddable Query Engine Written in RustIntroduction to DataFusion  An Embeddable Query Engine Written in Rust
Introduction to DataFusion An Embeddable Query Engine Written in Rust
 
Performance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsPerformance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark Metrics
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
 
Cassandra Troubleshooting 3.0
Cassandra Troubleshooting 3.0Cassandra Troubleshooting 3.0
Cassandra Troubleshooting 3.0
 
How to Build a Scylla Database Cluster that Fits Your Needs
How to Build a Scylla Database Cluster that Fits Your NeedsHow to Build a Scylla Database Cluster that Fits Your Needs
How to Build a Scylla Database Cluster that Fits Your Needs
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internals
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
Programming in Spark using PySpark
Programming in Spark using PySpark      Programming in Spark using PySpark
Programming in Spark using PySpark
 
Sizing Your Scylla Cluster
Sizing Your Scylla ClusterSizing Your Scylla Cluster
Sizing Your Scylla Cluster
 
Hyperspace for Delta Lake
Hyperspace for Delta LakeHyperspace for Delta Lake
Hyperspace for Delta Lake
 
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
 
Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0
 
Some Iceberg Basics for Beginners (CDP).pdf
Some Iceberg Basics for Beginners (CDP).pdfSome Iceberg Basics for Beginners (CDP).pdf
Some Iceberg Basics for Beginners (CDP).pdf
 
Delta Lake: Optimizing Merge
Delta Lake: Optimizing MergeDelta Lake: Optimizing Merge
Delta Lake: Optimizing Merge
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
 
Spark SQL
Spark SQLSpark SQL
Spark SQL
 

Ähnlich wie Real-Time Detection of Anomalies in the Database Infrastructure using Apache Spark with Daniel Lanza and Prasanth Kothuri

How HPC and large-scale data analytics are transforming experimental science
How HPC and large-scale data analytics are transforming experimental scienceHow HPC and large-scale data analytics are transforming experimental science
How HPC and large-scale data analytics are transforming experimental scienceinside-BigData.com
 
Combining out - of - band monitoring with AI and big data for datacenter aut...
Combining out - of - band monitoring with AI and big data  for datacenter aut...Combining out - of - band monitoring with AI and big data  for datacenter aut...
Combining out - of - band monitoring with AI and big data for datacenter aut...Ganesan Narayanasamy
 
Combining remote sensing earth observations and in situ networks: detection o...
Combining remote sensing earth observations and in situ networks: detection o...Combining remote sensing earth observations and in situ networks: detection o...
Combining remote sensing earth observations and in situ networks: detection o...Integrated Carbon Observation System (ICOS)
 
SAND: A Fault-Tolerant Streaming Architecture for Network Traffic Analytics
SAND: A Fault-Tolerant Streaming Architecture for Network Traffic AnalyticsSAND: A Fault-Tolerant Streaming Architecture for Network Traffic Analytics
SAND: A Fault-Tolerant Streaming Architecture for Network Traffic AnalyticsQin Liu
 
ET-NavSwarm 2016 URC Poster
ET-NavSwarm 2016 URC PosterET-NavSwarm 2016 URC Poster
ET-NavSwarm 2016 URC PosterJoshua Proulx
 
Prepare for the Worst: Reliable Data Protection with Oracle RMAN and Oracle D...
Prepare for the Worst: Reliable Data Protection with Oracle RMAN and Oracle D...Prepare for the Worst: Reliable Data Protection with Oracle RMAN and Oracle D...
Prepare for the Worst: Reliable Data Protection with Oracle RMAN and Oracle D...Szymon Skorupinski
 
The Search for Gravitational Waves
The Search for Gravitational WavesThe Search for Gravitational Waves
The Search for Gravitational Wavesinside-BigData.com
 
Sensor Organism project presentation
Sensor Organism project presentationSensor Organism project presentation
Sensor Organism project presentationNaums Mogers
 
Machine Learning Algorithms for Anomaly Detection in Particles Accelerators T...
Machine Learning Algorithms for Anomaly Detection in Particles Accelerators T...Machine Learning Algorithms for Anomaly Detection in Particles Accelerators T...
Machine Learning Algorithms for Anomaly Detection in Particles Accelerators T...Deep Learning Italia
 
Intro to Apache Apex - Next Gen Native Hadoop Platform - Hackac
Intro to Apache Apex - Next Gen Native Hadoop Platform - HackacIntro to Apache Apex - Next Gen Native Hadoop Platform - Hackac
Intro to Apache Apex - Next Gen Native Hadoop Platform - HackacApache Apex
 
Python for High Throughput Science by Mark Basham
Python for High Throughput Science by Mark BashamPython for High Throughput Science by Mark Basham
Python for High Throughput Science by Mark BashamPyData
 
The Next CERN Accelerator Logging Service—A Road to Big Data with Jakub Wozni...
The Next CERN Accelerator Logging Service—A Road to Big Data with Jakub Wozni...The Next CERN Accelerator Logging Service—A Road to Big Data with Jakub Wozni...
The Next CERN Accelerator Logging Service—A Road to Big Data with Jakub Wozni...Spark Summit
 
Big Fast Data in High-Energy Particle Physics
Big Fast Data in High-Energy Particle PhysicsBig Fast Data in High-Energy Particle Physics
Big Fast Data in High-Energy Particle PhysicsAndrew Lowe
 
OSMC 2012 | Monitoring at CERN by Christophe Haen
OSMC 2012 | Monitoring at CERN by Christophe HaenOSMC 2012 | Monitoring at CERN by Christophe Haen
OSMC 2012 | Monitoring at CERN by Christophe HaenNETWAYS
 

Ähnlich wie Real-Time Detection of Anomalies in the Database Infrastructure using Apache Spark with Daniel Lanza and Prasanth Kothuri (20)

How HPC and large-scale data analytics are transforming experimental science
How HPC and large-scale data analytics are transforming experimental scienceHow HPC and large-scale data analytics are transforming experimental science
How HPC and large-scale data analytics are transforming experimental science
 
DA-JPL-final
DA-JPL-finalDA-JPL-final
DA-JPL-final
 
Combining out - of - band monitoring with AI and big data for datacenter aut...
Combining out - of - band monitoring with AI and big data  for datacenter aut...Combining out - of - band monitoring with AI and big data  for datacenter aut...
Combining out - of - band monitoring with AI and big data for datacenter aut...
 
ROBOTICS - Introduction to Robotics Microcontroller
ROBOTICS -  Introduction to Robotics MicrocontrollerROBOTICS -  Introduction to Robotics Microcontroller
ROBOTICS - Introduction to Robotics Microcontroller
 
Combining remote sensing earth observations and in situ networks: detection o...
Combining remote sensing earth observations and in situ networks: detection o...Combining remote sensing earth observations and in situ networks: detection o...
Combining remote sensing earth observations and in situ networks: detection o...
 
SAND: A Fault-Tolerant Streaming Architecture for Network Traffic Analytics
SAND: A Fault-Tolerant Streaming Architecture for Network Traffic AnalyticsSAND: A Fault-Tolerant Streaming Architecture for Network Traffic Analytics
SAND: A Fault-Tolerant Streaming Architecture for Network Traffic Analytics
 
ET-NavSwarm 2016 URC Poster
ET-NavSwarm 2016 URC PosterET-NavSwarm 2016 URC Poster
ET-NavSwarm 2016 URC Poster
 
Prepare for the Worst: Reliable Data Protection with Oracle RMAN and Oracle D...
Prepare for the Worst: Reliable Data Protection with Oracle RMAN and Oracle D...Prepare for the Worst: Reliable Data Protection with Oracle RMAN and Oracle D...
Prepare for the Worst: Reliable Data Protection with Oracle RMAN and Oracle D...
 
The Search for Gravitational Waves
The Search for Gravitational WavesThe Search for Gravitational Waves
The Search for Gravitational Waves
 
Super Computers
Super ComputersSuper Computers
Super Computers
 
Sensor Organism project presentation
Sensor Organism project presentationSensor Organism project presentation
Sensor Organism project presentation
 
Machine Learning Algorithms for Anomaly Detection in Particles Accelerators T...
Machine Learning Algorithms for Anomaly Detection in Particles Accelerators T...Machine Learning Algorithms for Anomaly Detection in Particles Accelerators T...
Machine Learning Algorithms for Anomaly Detection in Particles Accelerators T...
 
OOW-IMC-final
OOW-IMC-finalOOW-IMC-final
OOW-IMC-final
 
Intro to Apache Apex - Next Gen Native Hadoop Platform - Hackac
Intro to Apache Apex - Next Gen Native Hadoop Platform - HackacIntro to Apache Apex - Next Gen Native Hadoop Platform - Hackac
Intro to Apache Apex - Next Gen Native Hadoop Platform - Hackac
 
supercomputer
supercomputersupercomputer
supercomputer
 
Python for High Throughput Science by Mark Basham
Python for High Throughput Science by Mark BashamPython for High Throughput Science by Mark Basham
Python for High Throughput Science by Mark Basham
 
The Next CERN Accelerator Logging Service—A Road to Big Data with Jakub Wozni...
The Next CERN Accelerator Logging Service—A Road to Big Data with Jakub Wozni...The Next CERN Accelerator Logging Service—A Road to Big Data with Jakub Wozni...
The Next CERN Accelerator Logging Service—A Road to Big Data with Jakub Wozni...
 
Supercomputer @ manarat university by reza
Supercomputer  @ manarat university by rezaSupercomputer  @ manarat university by reza
Supercomputer @ manarat university by reza
 
Big Fast Data in High-Energy Particle Physics
Big Fast Data in High-Energy Particle PhysicsBig Fast Data in High-Energy Particle Physics
Big Fast Data in High-Energy Particle Physics
 
OSMC 2012 | Monitoring at CERN by Christophe Haen
OSMC 2012 | Monitoring at CERN by Christophe HaenOSMC 2012 | Monitoring at CERN by Christophe Haen
OSMC 2012 | Monitoring at CERN by Christophe Haen
 

Mehr von Spark Summit

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang Spark Summit
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...Spark Summit
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang WuSpark Summit
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya RaghavendraSpark Summit
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...Spark Summit
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...Spark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingSpark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingSpark Summit
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...Spark Summit
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakSpark Summit
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimSpark Summit
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraSpark Summit
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Spark Summit
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...Spark Summit
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spark Summit
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovSpark Summit
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Spark Summit
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkSpark Summit
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Spark Summit
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...Spark Summit
 

Mehr von Spark Summit (20)

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
 

Kürzlich hochgeladen

April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...amitlee9823
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...amitlee9823
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...amitlee9823
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...amitlee9823
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...amitlee9823
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...amitlee9823
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...amitlee9823
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 

Kürzlich hochgeladen (20)

CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 

Real-Time Detection of Anomalies in the Database Infrastructure using Apache Spark with Daniel Lanza and Prasanth Kothuri

  • 1. Prasanth Kothuri, CERN Daniel Lanza Garcia, CERN Real-time Detection Of Anomalies In The Database Infrastructure Using Apache Spark #EUde13
  • 2. Outline • About CERN • Apache Spark at CERN • Data Lake Challenges • Detecting Anomalous Connections • Real-time Intelligent Monitoring 2#EUde13
  • 3. About CERN 3#EUde13 • CERN - European Laboratory for Particle Physics • Founded in 1954 by 12 Countries for fundamental physics research in the post-war Europe • Today 21 member states + world-wide collaborations • About ~1000 MCHF yearly budget • 2’300 CERN personnel • 10’000 users from 110 countries
  • 4. Fundamental Research 4#EUde13 • What is 95% of the Universe made of? • Why do particles have mass? • Why is there no antimatter left in the Universe? • What was the Universe like just after the “Big Bang”
  • 5. Who Changed My Plan? 5 The Large Hadron Collider (LHC) Largest machine in the world 27km, 6000+ superconducting magnets Emptiest place in the solar system High vacuum inside the magnets Hottest spot in the galaxy During Lead ion collisions create temperatures 100 000x hotter than the heart of the sun; Fastest racetrack on Earth Protons circulate 11245 times/s (99.9999991% the speed of light)
  • 6. Apache Spark @ CERN • Spark is a popular compute engine for Analytics + ETL – Deployed on four production Hadoop/YARN clusters • Aggregated capacity (2017): ~1500 physical cores, 11 PB – Adoption is growing. Key projects involving Spark: • Analytics for accelerator controls and logging • Monitoring IT Infrastructure, uses Spark + Spark streaming • Analytics on aggregated logs • Explorations on the use of Spark for high energy physics 6#EUdev2
  • 7. Detection Of Anomalies In Database Infrastructure 7#EUde13
  • 8. CERN Database Infrastructure 8#EUde13 • Over 100 Oracle databases, most of them RAC • Running Oracle 11.2.0.4 and 12.1.0.2 • 950 TB of data files for production DBs in total (without counting replicas), NAS as storage • Oracle In-Memory in production since July 2015 • Examples of critical production DBs: • Quench Protection System: 150’000 changes/s • LHC logging database: 492 TB, growing 180 TB/year • But also DB as a Service (single instances) • 310 MySQL, 70 PostgreSQL, 9 Oracle, 5 InfluxDB
  • 9. Data Lake 9#EUde13 • Objective: Build central repository for database audit, performance metrics and logs with the goal of real time analytics and offline analytics • Use cases • Detection of Anomalies • Troubleshooting • Capacity Planning
  • 10. Diverse Data 10#EUde13 • Tables • Audit • Performance Metrics • Alert • Log files • Listener
  • 11. Data pipeline Architecture 11#EUde13 alertlog Audit AWR Kafka 500 events/minute 500 events/minute 500 events/minute 70.000 events/minute 70.000 events/minute HDFS ACLs Per-index ACLS
  • 13. Real-time analysis 13#EUde13 • Attempt to detect anomalous connections using Machine Learning techniques • First approach to use kNN classifier to classify connection as normal or anomalous • Anomaly detection suffer from high false positive rate, the challenge is to choose features that best characterize user connections pattern
  • 14. Feature Selection 14#EUde13 • Data Preparation • Feature Extraction • Client_host • Client_user • Client_program • Service_name • Hour_of_the_day • Day_of_the_week { "oracle_sid" : "XXXX_YYYY", "listener_name": "listener_scan1", "database_type" : "oracle", "producer" : "oracle", "source_type": "listener", "hostname" : "XXXXX.cern.ch", "flume_agent_version": "0.1.6-7.el6", "type" : "dbconnection", "event_timestamp" : "2017-09-27T04:45:27+0200", "CONNECT_DATA_SERVER": "DEDICATED", "CONNECT_DATA_SERVICE_NAME": “XXXr.cern.ch", "client_program": "python", "client_host" : "pxxxxj2.cern.ch", "client_user": "merge", "client_protocol" : "tcp", "client_ip": "137.100.100.167", "client_port": 38432, "type" : "establish", "service_name" : "XXXX.cern.ch", "return_code": 0 }
  • 15. Detecting anomalous connections job • Machine learning: kNN 15#EUde13 Threshold
  • 16. Detecting anomalous connections job • Machine learning: kNN 16#EUde13 Threshold
  • 17. Detecting anomalous connections job • Machine learning: kNN 17#EUde13 Threshold
  • 18. Detecting anomalous connections job • Machine learning: kNN 18#EUde13 Threshold
  • 19. Detecting anomalous connections job • Machine learning: kNN 19#EUde13 Threshold
  • 20. Detecting anomalous connections job • Machine learning: kNN – Making it scalable, each user should behave similarly 20#EUde13
  • 21. Detecting anomalous connections job • Distance Function: 21#EUde13 • Spark Streaming Job processing – 10 million / day • 6 executors, each with 4 cores and 8GB of memory
  • 22. Metrics monitor job • A general purpose metrics monitor 22#EUde13
  • 23. New metrics by aggregation and math equations Metrics source Notifications sink (optional) Metrics monitor job: data flow 23#EUde13 Monitor 1 Filter (optional) Pre-analysis (optional) Analysis Notificator Notificator Monitor 2 Filter (optional) Pre-analysis (optional) Analysis Notificator Notificator Metrics source Analysis results sink (optional) Monitor N Filter (optional) Pre-analysis (optional) Analysis Notificator 1 Notificator 2 Notificator N Metrics source
  • 24. Notifications sink (optional) Metrics monitor job: modularity 24#EUde13 Monitor Filter (optional) Pre-analysis (optional) Analysis Notificator Metrics source Analysis results sink (optional) Notificator • Easy implementation of new components – Extending corresponding abstract class – Stateful operations: a store is available Metrics source Properties source
  • 25. • Which metrics should process this monitor? – CPU Usage of all databases, Cache Hit % of MySQL databases, all machines in PROD, … • Filter metrics by attributes • Accepts exact value or regular expressions • If not applied, all metrics are processed 25#EUde13 Monitor Filter (optional) Pre-analysis (optional) Analysis Notificator Notificator
  • 26. • Massage metric value – Average, difference, filter, … • Internal store for stateful pre-analysis • If applied, result will be the analyzed value 26#EUde13 Monitor Filter (optional) Pre-analysis (optional) Analysis Notificator Notificator
  • 27. • Type: average value of a period 27#EUde13 Monitor Filter (optional) Pre-analysis (optional) Analysis Notificator Notificator
  • 28. • Type: weighted average value of a period 28#EUde13 Monitor Filter (optional) Pre-analysis (optional) Analysis Notificator Notificator
  • 29. • Type: difference with previous value 29#EUde13 Monitor Filter (optional) Pre-analysis (optional) Analysis Notificator Notificator
  • 30. • Type: difference with previous value 30#EUde13 Monitor Filter (optional) Pre-analysis (optional) Analysis Notificator Notificator
  • 31. • Is the metric behaving as it should? • Determine the status of each incoming metric – OK, WARNING, ERROR, EXCEPTION • Results can be sent to external storage • Internal store for stateful analysis 31#EUde13 Monitor Filter (optional) Pre-analysis (optional) Analysis Notificator Notificator
  • 32. • Type: thresholds fixed manually 32#EUde13 Monitor Filter (optional) Pre-analysis (optional) Analysis Notificator Notificator
  • 33. • Type: learning from recent behaviour (average and variance) – Errors when metric does not behave as it used to 33#EUde13 Monitor Filter (optional) Pre-analysis (optional) Analysis Notificator Notificator
  • 34. • Type: learning from recent behaviour (average and variance) – Easily affected by outliers 34#EUde13 Monitor Filter (optional) Pre-analysis (optional) Analysis Notificator Notificator
  • 35. • Type: learning from recent behaviour (percentiles) 35#EUde13 Monitor Filter (optional) Pre-analysis (optional) Analysis Notificator Notificator
  • 36. • Type: learning a season (hour, day, week) 36#EUde13 Monitor Filter (optional) Pre-analysis (optional) Analysis Notificator Notificator
  • 37. • Type: learning a season (hour, day, week) 37#EUde13 Monitor Filter (optional) Pre-analysis (optional) Analysis Notificator Notificator
  • 38. • Type: learning a season (hour, day, week) 38#EUde13 Monitor Filter (optional) Pre-analysis (optional) Analysis Notificator Notificator
  • 39. • Type: learning a season (hour, day, week) 39#EUde13 Monitor Filter (optional) Pre-analysis (optional) Analysis Notificator Notificator
  • 40. • Should we inform anyone? – Metric has been in ERROR for 20 minutes… • Determines when a notification is raised • Based on metric statuses • Notifications are consumed by Notifications sink 40#EUde13 Monitor Filter (optional) Pre-analysis (optional) Analysis Notificator Notificator
  • 41. • Type: constant status for certain period 41#EUde13 Monitor Filter (optional) Pre-analysis (optional) Analysis Notificator Notificator
  • 42. • Type: percentage of the period in certain statuses 42#EUde13 Monitor Filter (optional) Pre-analysis (optional) Analysis Notificator Notificator
  • 43. Metrics monitor job: configuration 43#EUde13 checkpoint.dir = <path_to_store_stateful_data> (default: /tmp/) spark.batch.time = <seconds> (default: 30) metrics.source.kafka-prod.type = kafka metrics.source.kafka-prod.consumer.bootstrap.servers = server:port,server:port metrics.source.kafka-prod.topics = db-logging-platform metrics.source.kafka-prod.parser.attributes = INSTANCE_NAME METRIC_NAME metrics.source.kafka-prod.parser.value.attribute = VALUE metrics.source.kafka-prod.parser.timestamp.attribute = END_TIME results.sink.type = elastic results.sink.index = itdb_db-metric-results/log notifications.sink.type = elastic notifications.sink.index = itdb_db-metric-notifications/log # Monitors monitor.<monitor-id-1>.<confs>... monitor.<monitor-id-2>.<confs>... monitor.<monitor-id-n>.<confs>... Properties source
  • 44. Metrics monitor job: monitor configuration 44#EUde13 # Monitor CPU of all instances in production monitor.CPUUsage.filter.attribute.INSTANCE_NAME = regex:prod-.* monitor.CPUUsage.filter.attribute.METRIC_NAME = CPU Usage Per Sec monitor.CPUUsage.missing.max-period = 3m monitor.CPUUsage.pre-analysis.type = weighted-average monitor.CPUUsage.pre-analysis.period = 5m monitor.CPUUsage.analysis.type = seasonal monitor.CPUUsage.analysis.season = hour monitor.CPUUsage.analysis.learning.ratio = 0.2 monitor.CPUUsage.notificator.error-perc.type = percentage monitor.CPUUsage.notificator.error-perc.statuses = ERROR monitor.CPUUsage.notificator.error-perc.period = 10m monitor.CPUUsage.notificator.warn-perc.type = percentage monitor.CPUUsage.notificator.warn-perc.statuses = WARNING monitor.CPUUsage.notificator.warn-perc.period = 20m • Configuration can be updated while running • Add/remove monitor or change any parameter Filter Pre- anal ysis Anal ysis Notificator Notificator
  • 45. Metrics monitor job • We are monitoring: – Around 30K metrics per minute (day season analysis) – Processing time = 30 seconds/minute – On 3 executors (YARN) • Repo, user’s and developer’s manual, contributions, issues: – https://github.com/cerndb/metrics-monitor 45#EUde13
  • 48. Java 8: Extend RDDs and Streams + Optional + lambdas + + java.streaming.* + reference methods = 48#EUde13
  • 49. Not using Spark check-pointing feature 49#EUde13 • Pro: develop/update the job without loosing stateful data • Cons: implement save/load • Note: for stateful operations checkpoint needs to be enable – But when restarting, data is removed
  • 50. Conclusion 50#EUde13 • Building the data lake opens up for several kinds of analysis • Implementation of anomalous connections helps us to identify potential intrusions to the database infrastructure • Intelligent monitoring tool learns and adapts the threshold • Intelligent monitoring tool is open source and available for everyone • Apache Spark Streaming helps us to perform machine learning at scale