Identifying Web Attacks Via Data Analysis


This presentation looks at detecting SQL injection with machine learning and at profiling web traffic to find misbehaving hosts. The goal is to move beyond "Top N" style analysis and use multiple features to guide us toward interesting traffic. These techniques work with many log types, from web server logs to proxy logs.

Mike Sconzo
@sooshie
R&D at Click Security
Focused on data analysis for security use cases
Interested in machine learning/statistical analysis
NetWitness
ERCOT
Sandia National Labs

● Introduction
● How to use basic log information to detect different attack types
○ Drive-by
○ SQL Injection
● Closing

● Python
○ IPython
○ pandas
○ numpy
○ matplotlib
○ scikit-learn
● Bro
● Google
● sqlmap
● JBroFuzz
● sqlparse

● Gather data
● Clean up data
● Explore data
● Select/create features (numeric only)*
● Run machine learning algorithm*
● Analyze results
*optional
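As a rough illustration of the "Gather data" and "Clean up data" steps, the sketch below reads a Bro-style tab-separated log (such as http.log) into Python dicts. It assumes the default tab separator and a `#fields` header line; the sample log is hypothetical, and mapping Bro's `-` (unset) and `(empty)` markers to `None` is a cleanup choice, not a Bro rule. It uses only the standard library; with the deck's stack, pandas would hold the same records in a DataFrame.

```python
import io
from typing import Dict, Iterable, Iterator, List, Optional

# Tiny hypothetical http.log excerpt; real Bro logs have many more columns.
SAMPLE_HTTP_LOG = (
    "#separator \\x09\n"
    "#fields\tts\tid.orig_h\thost\turi\n"
    "#types\ttime\taddr\tstring\tstring\n"
    "1400000000.000000\t10.0.0.5\texample.com\t/index.html\n"
    "1400000001.000000\t10.0.0.6\t-\t/x.jar\n"
    "#close\t2014-05-01-00-00-00\n"
)


def parse_bro_log(lines: Iterable[str]) -> Iterator[Dict[str, Optional[str]]]:
    """Yield one dict per data line of a tab-separated Bro-style log."""
    fields: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            # Only the #fields header is needed to name the columns.
            if line.startswith("#fields"):
                fields = line.split("\t")[1:]
            continue
        values = line.split("\t")
        # "-" marks an unset field and "(empty)" an empty container;
        # both become None here (our cleanup choice).
        yield {
            name: (None if value in ("-", "(empty)") else value)
            for name, value in zip(fields, values)
        }


records = list(parse_bro_log(io.StringIO(SAMPLE_HTTP_LOG)))
```

Passing an open file object instead of `io.StringIO` works the same way, since the parser only iterates over lines.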

Is it possible to find clients being exploited by various exploit kits just by looking at traffic patterns?
● Gather data
● Clean up data
● Explore data
● Analyze results
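One way to go beyond a "Top N" view is to build a small profile per client from several features and flag clients that look unusual on any of them. The sketch below is a hypothetical example, not the talk's method: it assumes Bro http.log field names (`id.orig_h`, `host`, `resp_mime_types`), an illustrative list of "suspicious" MIME types, and a simple z-score test with an arbitrary threshold of 2.0.

```python
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical list: payload types exploit kits commonly delivered.
SUSPICIOUS_MIME = {
    "application/x-java-archive",
    "application/x-shockwave-flash",
    "application/pdf",
    "application/x-dosexec",
}


def profile_clients(records: Iterable[dict]) -> Dict[str, Dict[str, float]]:
    """Aggregate HTTP records into one numeric feature vector per client IP."""
    requests = defaultdict(int)
    hosts = defaultdict(set)
    suspicious = defaultdict(int)
    for r in records:
        client = r["id.orig_h"]
        requests[client] += 1
        if r.get("host"):
            hosts[client].add(r["host"])
        mimes = (r.get("resp_mime_types") or "").split(",")
        if any(m in SUSPICIOUS_MIME for m in mimes):
            suspicious[client] += 1
    return {
        c: {
            "requests": float(requests[c]),
            "distinct_hosts": float(len(hosts[c])),
            "suspicious_ratio": suspicious[c] / requests[c],
        }
        for c in requests
    }


def zscore_outliers(profiles: Dict[str, Dict[str, float]],
                    threshold: float = 2.0) -> Dict[str, List[str]]:
    """Return {client: [features whose |z-score| exceeds threshold]}."""
    flagged: Dict[str, List[str]] = {}
    if not profiles:
        return flagged
    features = sorted(next(iter(profiles.values())))
    for feat in features:
        values = [p[feat] for p in profiles.values()]
        mu, sigma = statistics.mean(values), statistics.pstdev(values)
        if sigma == 0:
            continue  # no variation across clients, so no signal
        for client, p in profiles.items():
            if abs((p[feat] - mu) / sigma) > threshold:
                flagged.setdefault(client, []).append(feat)
    return flagged
```

Features with no variation are skipped because a z-score is undefined when the standard deviation is zero. With very few clients a z-score cannot grow large, so this kind of test only becomes useful on a reasonably sized population.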

● 21GB of network traffic
● 7,600 samples
● 687,627 files
● 807,537 HTTP requests
*MHR will be used as our ground truth
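Establishing ground truth boils down to hashing each extracted file and checking the hash against a known-malware source. The deck uses MHR for this. The sketch below substitutes a local set of known-bad MD5 hashes so it runs offline; that set and the `label_files` helper are hypothetical stand-ins, not the MHR lookup itself.

```python
import hashlib
from typing import Dict, Set


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of a file's contents."""
    return hashlib.md5(data).hexdigest()


def label_files(files: Dict[str, bytes], known_bad: Set[str]) -> Dict[str, int]:
    """Label each file 1 if its MD5 appears in known_bad, else 0."""
    return {name: int(md5_hex(data) in known_bad) for name, data in files.items()}
```

The resulting labels can then be joined back to the HTTP requests (and clients) that downloaded each file.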

Is it possible to use supervised learning (classification) to detect strings that are likely SQL injection?
● Gather data
● Explore data
● Clean up data
● Transform data
● Select/create features (numeric only)
● Run machine learning algorithm
● Analyze results
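Because the feature step is "numeric only", each string must become a vector of numbers before a classifier can use it. The three features below (length, Shannon entropy, and the fraction of punctuation/symbol characters) are illustrative assumptions, not necessarily the features used in the talk.

```python
import math
from collections import Counter
from typing import List


def shannon_entropy(s: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def string_features(s: str) -> List[float]:
    """Map a raw string to [length, entropy, symbol fraction]."""
    n = len(s)
    symbols = sum(1 for ch in s if not ch.isalnum() and not ch.isspace())
    return [float(n), shannon_entropy(s), symbols / n if n else 0.0]
```

Vectors built this way can be stacked into the 2-D numeric array that scikit-learn estimators expect.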

*Transform the data into a form that might give better insight than a signature
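The reason to transform the data is that many injection strings differ only in their literal values: `admin' --` and `guest' --` share the same structure. Mapping each string to a sequence of token classes exposes that structure. The deck lists sqlparse for tokenizing; the sketch below uses a much-simplified regex tokenizer with a hypothetical keyword list instead, so it does not rely on sqlparse's API.

```python
import re
from typing import List

# Hypothetical, deliberately small keyword list.
SQL_KEYWORDS = {
    "select", "union", "from", "where", "and", "or", "insert",
    "update", "delete", "drop", "order", "by", "sleep", "null",
}

# Alternatives are tried in order, so comments win over the "-" and "/" operators.
TOKEN_RE = re.compile(
    r"""
    (?P<comment>--[^\n]*|/\*.*?\*/|\#[^\n]*)
  | (?P<string>'(?:[^']|'')*'|"(?:[^"]|"")*")
  | (?P<number>\d+(?:\.\d+)?)
  | (?P<word>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<operator><=|>=|<>|!=|[=<>+\-*/%|&])
  | (?P<punct>[(),;.])
  | (?P<space>\s+)
  | (?P<other>.)
    """,
    re.VERBOSE | re.DOTALL,
)


def to_token_classes(s: str) -> List[str]:
    """Replace each token with its class name, dropping whitespace."""
    classes = []
    for m in TOKEN_RE.finditer(s):
        kind = m.lastgroup
        if kind == "space":
            continue
        if kind == "word":
            kind = "keyword" if m.group().lower() in SQL_KEYWORDS else "name"
        classes.append(kind)
    return classes
```

An unbalanced quote, common in injection payloads, falls through to the `other` class, so `admin' --` and `guest' --` both become `["name", "other", "comment"]`.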

● Strings are great, but patterns might be better
● Extract patterns from the strings
● N-Grams!!!
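Once strings are token-class sequences, n-grams of those classes can serve as the patterns. The sketch below counts n-grams in labeled malicious and legitimate samples and scores a new sequence with an add-one-smoothed log-likelihood ratio (a naive-Bayes-style score). This scoring rule is an illustrative stand-in; the deck's tool list includes scikit-learn, and a score like this could equally be fed to a scikit-learn classifier as one numeric feature.

```python
import math
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int = 3) -> List[Tuple[str, ...]]:
    """All contiguous length-n windows of the token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def train_counts(samples: Iterable[Sequence[str]], n: int = 3) -> Counter:
    """Count n-grams across a set of labeled token sequences."""
    counts: Counter = Counter()
    for tokens in samples:
        counts.update(ngrams(tokens, n))
    return counts


def ngram_score(tokens: Sequence[str], bad: Counter, good: Counter, n: int = 3) -> float:
    """Sum of log(P(g|bad) / P(g|good)); positive leans malicious."""
    v = len(set(bad) | set(good)) + 1  # +1 reserves mass for unseen n-grams
    bad_total, good_total = sum(bad.values()), sum(good.values())
    score = 0.0
    for g in ngrams(tokens, n):
        p_bad = (bad[g] + 1) / (bad_total + v)
        p_good = (good[g] + 1) / (good_total + v)
        score += math.log(p_bad / p_good)
    return score
```

Trigrams are the default only as an example; shorter sequences than `n` simply produce no n-grams and a score of 0.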

● It’s possible to make quality decisions and find interesting activity using data
● The more data you have, the more accurate your predictions can be
● Gathering (the right) data for the use case is important
● Cleaning the data takes a lot of effort, but it’s necessary
● Unfortunately none of this is a silver bullet, but it can help point you in the right direction(s)
● None of this is magic, you can do it too!

http://clicksecurity.github.io/data_hacking/
