Mike Sconzo
@sooshie
R&D at Click Security
Focused on data analysis for security use cases
Interested in machine learning/statistical analysis
NetWitness
ERCOT
Sandia National Labs
● Introduction
● How to use basic log information to detect
different attack types
○ Drive-by
○ SQL Injection
● Closing
● Python
○ IPython
○ pandas
○ numpy
○ matplotlib
○ scikit-learn
● Bro
● Google
● sqlmap
● JBroFuzz
● sqlparse
● Gather data
● Clean up data
● Explore data
● Select/create features (numeric only)*
● Run machine learning algorithm*
● Analyze results
*optional
Is it possible to find clients being
exploited by various exploit kits by just
looking at traffic patterns?
● Gather data
● Clean up data
● Explore data
● Analyze results
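The gather, clean up and explore steps above might look roughly like the sketch below. It uses only the Python standard library rather than pandas so it runs anywhere. The sample rows, the `mime_type` column name, and the `RISKY_MIME_TYPES` set are illustrative assumptions, not data or logic taken from the talk.

```python
from collections import Counter

# Made-up rows in Bro's tab-separated log layout; hosts, URIs and the
# column selection are illustrative assumptions, not data from the talk.
SAMPLE_HTTP_LOG = "\n".join([
    "#separator \\x09",
    "\t".join(["#fields", "ts", "id.orig_h", "host", "uri", "mime_type"]),
    "\t".join(["1.0", "10.0.0.5", "news.example", "/index.html", "text/html"]),
    "\t".join(["1.4", "10.0.0.5", "ads.example", "/a.jar", "application/java-archive"]),
    "\t".join(["1.9", "10.0.0.5", "ads.example", "/p.exe", "application/x-dosexec"]),
    "\t".join(["2.0", "10.0.0.9", "news.example", "/index.html", "text/html"]),
])

# Assumption: MIME types worth a second look when a browsing client receives them.
RISKY_MIME_TYPES = {
    "application/java-archive",
    "application/x-dosexec",
    "application/pdf",
}


def read_bro_log(text):
    """Gather: parse a Bro log into dicts keyed by the #fields header."""
    fields, rows = [], []
    for line in text.splitlines():
        if line.startswith("#fields"):
            fields = line.split("\t")[1:]
        elif line and not line.startswith("#"):
            # Clean up: skip blank lines and the remaining '#' metadata lines.
            rows.append(dict(zip(fields, line.split("\t"))))
    return rows


def mime_counts_by_client(rows):
    """Explore: count the response MIME types each client received."""
    counts = {}
    for row in rows:
        counts.setdefault(row["id.orig_h"], Counter())[row["mime_type"]] += 1
    return counts


def clients_with_risky_downloads(rows):
    """Analyze: list clients that received at least one risky MIME type."""
    counts = mime_counts_by_client(rows)
    return sorted(c for c, mimes in counts.items() if RISKY_MIME_TYPES & set(mimes))


rows = read_bro_log(SAMPLE_HTTP_LOG)
print(clients_with_risky_downloads(rows))
```

A flagged client is only a starting point for a closer look; plenty of legitimate traffic includes PDFs and executables.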
● 21GB of Network Traffic
● 7600 Samples
● 687627 Files
● 807537 HTTP Requests
*MHR (Team Cymru's Malware Hash Registry) will be used as our ground truth
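As a rough illustration of using hash lookups as ground truth, the sketch below labels extracted files against a lookup table and reports the hit rate per MIME type. The `known_bad` dictionary is a hypothetical stand-in for lookup results collected beforehand; the hashes, detection percentages, and field names are placeholders, and no real lookup service is queried.

```python
from collections import Counter


def label_with_ground_truth(files, known_bad):
    """Add a boolean 'malicious' flag to each file record.

    `known_bad` maps a file hash to a detection percentage; a hash that is
    absent from the table is treated as not known to be malicious.
    """
    return [dict(f, malicious=f["md5"] in known_bad) for f in files]


def hit_rate_by_mime(labeled):
    """Fraction of files flagged malicious, grouped by MIME type."""
    totals, hits = Counter(), Counter()
    for f in labeled:
        totals[f["mime_type"]] += 1
        hits[f["mime_type"]] += f["malicious"]  # True counts as 1
    return {mime: hits[mime] / totals[mime] for mime in totals}


# Placeholder records; real hashes would come from the extracted files.
files = [
    {"md5": "aaa", "mime_type": "text/html"},
    {"md5": "bbb", "mime_type": "application/java-archive"},
    {"md5": "ccc", "mime_type": "text/html"},
]
known_bad = {"bbb": 45}  # hypothetical lookup result: hash -> detection %

print(hit_rate_by_mime(label_with_ground_truth(files, known_bad)))
```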
Is it possible to use supervised learning
(classification) to detect strings that are
likely SQL Injection?
● Gather data
● Explore data
● Clean up data
● Transform data
● Select/create features (numeric only)
● Run machine learning algorithm
● Analyze results
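To make the feature and classification steps concrete, here is a minimal sketch. The slides do not say which features or algorithm were used, so this is a stand-in: two hand-picked numeric features and a nearest-centroid classifier written with the standard library in place of a scikit-learn model. The training strings are invented for illustration.

```python
import math


def features(s):
    """Turn a string into numeric features: punctuation and whitespace ratios."""
    n = max(len(s), 1)
    punct = sum(1 for c in s if not c.isalnum() and not c.isspace())
    spaces = sum(1 for c in s if c.isspace())
    return [punct / n, spaces / n]


def fit_centroids(samples, labels):
    """Average the feature vectors of each class (nearest-centroid training)."""
    sums, counts = {}, {}
    for s, y in zip(samples, labels):
        vec = features(s)
        acc = sums.setdefault(y, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(centroids, s):
    """Return the label whose centroid is closest to the string's features."""
    vec = features(s)
    return min(centroids, key=lambda y: math.dist(centroids[y], vec))


# Invented training data: 0 = legitimate input, 1 = SQL injection.
samples = [
    "john.smith", "blue widgets", "order 1234",
    "1' OR '1'='1", "' UNION SELECT null,null--", "admin'--",
]
labels = [0, 0, 0, 1, 1, 1]

centroids = fit_centroids(samples, labels)
print(predict(centroids, "x' OR 'a'='a"), predict(centroids, "red shoes"))
```

Six training strings are far too few for a usable model; the point is only the shape of the steps: strings in, numbers out, a model fit on labeled examples, then predictions to analyze.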
*Transform the data into a form that
might give better insight than a
signature
● Strings are great, but patterns might be better
● Extract patterns from the strings
● N-Grams!!!
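One way to read the bullets above: replace each token with a coarse type, then count n-grams over the type sequence so that different strings with the same shape look alike. The sketch below does this with a small regular expression. That tokenizer is a simplified assumption standing in for a real SQL parser such as sqlparse, and the token classes and keyword list are made up for illustration.

```python
import re
from collections import Counter

# Crude token classes (an assumption, much simpler than a real SQL parser).
TOKEN_RE = re.compile(
    r"(?P<KEYWORD>\b(?:select|union|from|where|or|and)\b)"
    r"|(?P<NUMBER>\d+)"
    r"|(?P<NAME>[A-Za-z_]\w*)"
    r"|(?P<QUOTE>['\"])"
    r"|(?P<OP>[=<>!]+)"
    r"|(?P<PUNCT>[^\w\s])",
    re.IGNORECASE,
)


def token_types(s):
    """Map a string to the sequence of token classes it contains."""
    return [m.lastgroup for m in TOKEN_RE.finditer(s)]


def ngrams(seq, n):
    """Return every run of n consecutive items as a tuple."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def ngram_counts(s, n=2):
    """Count n-grams over the token-type sequence of a string."""
    return Counter(ngrams(token_types(s), n))


print(token_types("1' OR '1'='1"))
print(ngram_counts("1' OR '1'='1").most_common(2))
```

Because the n-grams are built from token types rather than raw characters, `2' or '2'='2` yields exactly the same counts as `1' OR '1'='1`, which is the kind of pattern a fixed signature would miss.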
● It’s possible to make quality decisions/find interesting activity using
data
● The more data you have, the more accurate your predictions can
be
● Gathering (the right) data for the use case is important
● Cleaning the data takes a lot of effort, but it’s necessary
● Unfortunately, none of this is a silver bullet, but it can help point you
in the right direction(s)
● None of this is magic, you can do it too!
http://clicksecurity.github.io/data_hacking/

Identifying Web Attacks Via Data Analysis
