The document proposes a modular architecture for analyzing HTTP payloads using multiple classifiers to detect anomalies and intrusions. It trains ensembles of hidden Markov models on different lines of HTTP payloads like the request line, host, and user agent. The HMM outputs are then used as features for a one-class classifier to classify the full payload. The approach is evaluated on real traffic datasets and shown to outperform similar systems with high detection rates while being computationally efficient.
HTTP Payload Analysis Using Multiple Classifiers
1. A modular architecture for the analysis of HTTP payloads based on Multiple Classifiers
Davide Ariu (davide.ariu@diee.unica.it), Giorgio Giacinto (giacinto@diee.unica.it)
University of Cagliari, Department of Electrical and Electronic Engineering
Naples, 17 June 2011
This research was sponsored by the Autonomous Region of Sardinia through a grant financed with the "Sardinia PO FSE 2007-2013" funds and provided according to the L.R. 7/2007.
Pattern Recognition and Applications Group
http://prag.diee.unica.it
2. Outline
• Motivations
• The proposed system
• Experimental Setup and Results
• Conclusions
3. The objective
Design of an anomaly-based Intrusion Detection System for the protection of Web Servers and Applications.
The HTTP traffic toward the web servers is inspected by a multiple classifier system.
4. Why Web Applications?
5. Why Anomaly Detection?
8. A legitimate Payload...
Request Line:
GET /pra/ita/home.php HTTP/1.1
Request Headers:
Host: prag.diee.unica.it
Accept: text/*, text/html
User-Agent: Mozilla/4.0
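The split between Request Line and Request Headers can be sketched in a few lines of Python. This is only an illustration: the function name `split_payload` and the lower-casing of header names are assumptions, not part of the presented system.

```python
def split_payload(payload: str):
    """Split a raw HTTP request into its Request-Line and a dict of headers."""
    # The header section ends at the first blank line; a body, if any, is ignored.
    head = payload.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    request_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:  # skip malformed lines that contain no colon
            headers[name.strip().lower()] = value.strip()
    return request_line, headers


payload = (
    "GET /pra/ita/home.php HTTP/1.1\r\n"
    "Host: prag.diee.unica.it\r\n"
    "Accept: text/*, text/html\r\n"
    "User-Agent: Mozilla/4.0\r\n\r\n"
)
request_line, headers = split_payload(payload)
```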
9. ...and some attacks
• Long Request Buffer Overflow
HEAD / aaaaaaa…aaaaaaaaaaaa
• URL Decoding Error
GET /d/winnt/sys32/cmd.exe?/c+dir HTTP/1.0
Host: www
Connection: close
10. Why Payload Analysis?
• Detection of Web-based attacks can be based on the
– Analysis of the Request-Line
• Allows detecting only attacks that exploit input-validation flaws,
e.g. Spectrogram [Song,2009], HMM-Web [Corona,2009]
– Analysis of the HTTP Payload
• Takes into account the whole HTTP request, and thus it can (in principle) detect any kind of attack
11. State of the Art - Payload Analysis
• PAYL [Wang,2004]
– n-grams to represent byte statistics
• McPAD [Perdisci,2009]
– Ensemble of one-class SVMs trained on ν-grams
• Spectrogram [Song,2009]
– Ensemble of Markov Chains to analyze the Request-Line
• HMMPayl [Ariu,2011]
– Ensemble of HMMs to analyze sequences of bytes from the whole payload
None of the above techniques represents the structure of the payload
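As a point of reference, the n-gram byte statistics mentioned above can be computed as relative frequencies of byte n-grams. The sketch below shows only this representation, under the assumption of a sliding window of step 1; it is not the detection model of any of the cited systems, and `ngram_frequencies` is a hypothetical helper name.

```python
from collections import Counter


def ngram_frequencies(payload: bytes, n: int = 1):
    """Relative frequency of each byte n-gram found by sliding a window over the payload."""
    grams = [payload[i:i + n] for i in range(len(payload) - n + 1)]
    total = len(grams)
    return {gram: count / total for gram, count in Counter(grams).items()}


unigrams = ngram_frequencies(b"aab", n=1)
bigrams = ngram_frequencies(b"aab", n=2)
```

Note that such a representation treats the payload as a flat byte string, which is the limitation the slide points out.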
12. The proposed system
Basic Idea
• We propose to take into account the structure of HTTP payloads
– For each line of the payload, an ensemble of HMMs is used to model the sequences of bytes.
– The final decision is obtained by using the HMM outputs as features. The payload is thus classified by a one-class classifier trained on the outputs of the HMM ensembles.
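A minimal sketch of how one payload could be turned into a feature vector is given below. Several parts are assumptions made for illustration: the HMM parameters are random placeholders instead of models trained on legitimate traffic, the ensemble outputs are combined by averaging, and the score is the per-byte log-likelihood computed with the forward algorithm. The slides do not state which combination rule or score normalization is actually used.

```python
import math
import random


def random_hmm(n_states, n_symbols=256, seed=0):
    """Discrete HMM (pi, A, B) with random parameters, standing in for a trained model."""
    rng = random.Random(seed)

    def dist(n):
        weights = [rng.random() + 1e-3 for _ in range(n)]
        total = sum(weights)
        return [w / total for w in weights]

    pi = dist(n_states)                             # initial state probabilities
    A = [dist(n_states) for _ in range(n_states)]   # state transition probabilities
    B = [dist(n_symbols) for _ in range(n_states)]  # per-state byte emission probabilities
    return pi, A, B


def log_likelihood(hmm, seq):
    """Scaled forward algorithm: log P(seq | hmm) for a non-empty byte sequence."""
    pi, A, B = hmm
    n = len(pi)
    alpha = [pi[i] * B[i][seq[0]] for i in range(n)]
    loglik = 0.0
    for sym in seq[1:]:
        scale = sum(alpha)
        loglik += math.log(scale)          # accumulate the log of each scaling factor
        alpha = [a / scale for a in alpha]
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][sym] for j in range(n)]
    return loglik + math.log(sum(alpha))


def ensemble_score(ensemble, line: bytes):
    """Average per-byte log-likelihood over the ensemble (averaging is an assumption)."""
    scores = [log_likelihood(hmm, line) / len(line) for hmm in ensemble]
    return sum(scores) / len(scores)


FIELDS = ["request-line", "host", "user-agent"]
# One ensemble of three HMMs per payload line; seeds only make the sketch repeatable.
ensembles = {
    field: [random_hmm(3, seed=10 * i + k) for k in range(3)]
    for i, field in enumerate(FIELDS)
}
payload_lines = {
    "request-line": b"GET /pra/ita/home.php HTTP/1.1",
    "host": b"prag.diee.unica.it",
    "user-agent": b"Mozilla/4.0",
}
features = [ensemble_score(ensembles[f], payload_lines[f]) for f in FIELDS]
```

The resulting `features` list has one entry per payload line and is what a one-class classifier would receive as input.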
13. The proposed system
A scheme
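[Figure: inside the IDS, the HTTP payload is split into its lines (Request-Line, Host, User-Agent, Accept-Encoding, Accept-Language) and each line is analyzed by a dedicated HMM Ensemble. The ensemble outputs (0.62, 0.53, 0.34, 0.49 and -1 in the example) are passed to a One-Class Classifier, which produces an Output Score or a Class-Label.]
Example HTTP payload shown in the figure:
GET /pra/index.php HTTP/1.1
Host: prag.diee.unica.it
User-Agent: Mozilla/5.0
Accept-Encoding: gzip, deflate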
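The last stage of the scheme can be sketched with a very simple one-class classifier. The example assumes an independent Gaussian per feature, fitted on legitimate traffic only, and a threshold chosen so that every training sample is accepted. The class name `GaussOneClass`, the feature values, and the thresholding rule are all illustrative assumptions, not the configuration used in the experiments.

```python
import math


class GaussOneClass:
    """One-class classifier: independent Gaussian per feature, fitted on legitimate data."""

    def fit(self, X):
        n, d = len(X), len(X[0])
        self.mean = [sum(row[j] for row in X) / n for j in range(d)]
        # A small floor keeps every variance strictly positive.
        self.var = [
            max(sum((row[j] - self.mean[j]) ** 2 for row in X) / n, 1e-6)
            for j in range(d)
        ]
        return self

    def score(self, x):
        """Log-density of x under the fitted model; lower means more anomalous."""
        return sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, self.mean, self.var)
        )

    def predict(self, x, threshold):
        return "legitimate" if self.score(x) >= threshold else "attack"


# Made-up HMM ensemble outputs for four legitimate payloads (three features each).
train = [
    [0.60, 0.50, 0.35],
    [0.62, 0.53, 0.34],
    [0.58, 0.55, 0.33],
    [0.65, 0.49, 0.36],
]
clf = GaussOneClass().fit(train)
threshold = min(clf.score(x) for x in train)  # accept all training samples

label_close = clf.predict([0.61, 0.52, 0.34], threshold)
label_far = clf.predict([0.10, 0.05, 0.02], threshold)
```

Returning `score` corresponds to the "Output Score" in the scheme, while `predict` corresponds to the "Class-Label".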
14. Missing Features
• Each request typically does not contain all the headers
– Training phase: the value of the feature related to a missing header has been set to the average value
– Testing phase: the value of the feature related to a missing header has been set to -1
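The two rules can be sketched as follows. The sketch assumes that a missing header is represented by `None`, that "average value" means the mean of that feature over the training requests in which the header is present, and that every header occurs at least once in the training set. The function names are hypothetical.

```python
MISSING_AT_TEST = -1.0


def fill_training(X):
    """Replace None with the per-feature mean of the observed training values."""
    d = len(X[0])
    means = []
    for j in range(d):
        seen = [row[j] for row in X if row[j] is not None]
        means.append(sum(seen) / len(seen))
    filled = [[means[j] if v is None else v for j, v in enumerate(row)] for row in X]
    return filled, means


def fill_test(x):
    """Replace None with -1 so that a missing header stays visible to the classifier."""
    return [MISSING_AT_TEST if v is None else v for v in x]


train = [
    [0.50, 0.25, None],
    [0.75, None, 0.50],
    [0.25, 0.75, 1.00],
]
filled, means = fill_training(train)
test_vector = fill_test([0.50, None, 0.25])
```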
15. Experimental Setup - 1
• 2 datasets of real legitimate traffic
– DIEE, collected at the University of Cagliari
– GT, collected at Georgia Tech
16. Experimental Setup - 2
• 3 datasets of real attacks
– Generic, 66 attacks
– Shell-code, 11 attacks
– XSS-SQL Injection, 38 attacks
• Training: 1 day of traffic
• Test: the remaining traffic plus attacks
– K-fold CV
17. Experimental Setup - 3
• 4 one-class classification algorithms with default parameter settings
– Gauss – Gaussian distribution
– MoG – Mixture of Gaussians
– Parzen – Parzen density estimator
– SVM – SVM with RBF kernel
• Performance evaluated using the partial AUC
– Computed in the false-positive (FP) rate range [0, 0.1]
– Normalized by dividing by 0.1
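The normalized partial AUC can be computed as sketched below. The sketch assumes that a higher score means "more anomalous", builds the ROC curve by sweeping a threshold over all observed scores, and integrates with the trapezoidal rule, clipping the last segment at the FP limit by linear interpolation. With this normalization a perfect detector scores 1.0.

```python
def partial_auc(legit_scores, attack_scores, max_fpr=0.1):
    """Area under the ROC curve for FP rate in [0, max_fpr], divided by max_fpr."""
    thresholds = sorted(set(legit_scores) | set(attack_scores), reverse=True)
    points = [(0.0, 0.0)]  # (false-positive rate, detection rate)
    for t in thresholds:
        fpr = sum(s >= t for s in legit_scores) / len(legit_scores)
        tpr = sum(s >= t for s in attack_scores) / len(attack_scores)
        points.append((fpr, tpr))

    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:  # clip the segment that crosses max_fpr
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2
    return area / max_fpr


# Illustrative anomaly scores, not taken from the experiments.
perfect = partial_auc([0.1, 0.2, 0.3], [0.8, 0.9])
worst = partial_auc([0.8, 0.9], [0.1, 0.2])
```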
18. Experimental Results
Partial AUC – DIEE Dataset
19. Experimental Results
Multiple HMM – DIEE Dataset – Shellcode Attacks
20. Experimental Results
Partial AUC – GT Dataset
21. Experimental Results
Comparison with similar IDS
22. Computational Cost
23. Conclusions
• We proposed an anomaly-based IDS for the protection of Web Servers and Web Applications
• We exploited the Multiple Classifier System (MCS) paradigm
– To analyze the structure of the HTTP payload
– By combining the outputs through a one-class classifier
• Compared to similar systems, our proposal
– Provides high performance in attack detection
– Is fast