Vulnerabilities such as Spectre and Meltdown continue to plague many production servers based on Intel CPUs. Our solution uses software-based monitoring of hardware performance counters and streams that data to Apache Spark clusters for threat detection. We leverage Spark's support for support vector machine (SVM) inference. Our machine learning models are trained offline by a data scientist in a Jupyter notebook environment. Once new models are validated, they can be easily deployed to the Spark cluster from the notebook. We have standardized model export and import on ONNX, the open file format for machine learning models. In our presentation, we will demo the full pipeline, from model training to deployment. We will discuss the challenges of deploying ML-based cyber-threat detection at scale with Apache Spark. For example, we found that gaps in detection can occur when Spark models are updated. We will describe a novel data ingestion architecture, based on Apache Kafka, that we developed to address this issue.
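The abstract does not say how the Kafka-based ingestion closes detection gaps during model updates. The sketch below shows one common pattern, offset-based resumption, under explicit assumptions; it is not the authors' architecture. A Python list stands in for a Kafka partition, `committed` plays the role of a consumer's committed offset, and `LinearSVM`, `Detector`, the weights, and the two hardware-counter features are all hypothetical. Because the offset advances only after records are scored and is left untouched when the model is swapped, records that arrive during a deployment are scored by the new model rather than dropped.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class LinearSVM:
    """Hypothetical linear SVM: predicts 1 (suspicious) when w . x + b > 0."""
    weights: Tuple[float, ...]
    bias: float
    version: str

    def predict(self, features: Sequence[float]) -> int:
        score = sum(w * x for w, x in zip(self.weights, features)) + self.bias
        return 1 if score > 0 else 0


class Detector:
    """Hypothetical scorer that reads an append-only log by offset.

    A Python list stands in for a Kafka partition; `committed` plays the role
    of the consumer's committed offset (the next record still to be scored).
    """

    def __init__(self, model: LinearSVM) -> None:
        self.model = model
        self.committed = 0
        self.results: List[Tuple[int, str, int]] = []

    def poll(self, log: List[Sequence[float]], max_records: int = 2) -> None:
        end = min(len(log), self.committed + max_records)
        for offset in range(self.committed, end):
            label = self.model.predict(log[offset])
            self.results.append((offset, self.model.version, label))
        # Advance the offset only after the whole batch has been scored.
        self.committed = end

    def swap_model(self, new_model: LinearSVM) -> None:
        # The offset is left untouched, so records that arrived while the
        # new model was being deployed are scored by the next poll, not skipped.
        self.model = new_model


# Hypothetical features per sampling window: [LLC-miss rate, branch-mispredict rate].
log: List[Sequence[float]] = [[0.1, 0.02], [0.9, 0.30], [0.2, 0.01]]
det = Detector(LinearSVM((2.0, 5.0), -1.5, "v1"))
det.poll(log)                                   # scores offsets 0-1 with v1
log += [[0.8, 0.25], [0.05, 0.01]]              # new records arrive during the update
det.swap_model(LinearSVM((1.5, 4.0), -1.0, "v2"))
det.poll(log, max_records=10)                   # resumes at offset 2 with v2
for offset, version, label in det.results:
    print(offset, version, label)
```

Every offset from 0 to 4 is scored exactly once, with v1 before the swap and v2 after it. In a real deployment the list would be a Kafka topic, the model might be Spark MLlib's `LinearSVC` or an ONNX-exported model, and offsets would be committed only after processing (for example, with `enable.auto.commit` disabled on the consumer); how the presented system actually does this is not described here.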
George Williams, GSI Technology
Scaling ML-Based Cyber-Threat Detection for Production Systems
#UnifiedAnalytics #SparkAISummit
Agenda
● Cybersecurity Trends
● ML + Cybersecurity + Production Systems
● GSI Technology
● Architecture
● Code
Director of Data Science,
GSI Technology