Noorbehbahani data preprocessing for anomaly based network intrusion

Comprehensively reviewing the features derived from network traffic, and the related data preprocessing techniques, which have been used in anomaly-based NIDS since 1999.

Grouping anomaly-based NIDS by the types of network traffic features used for detection. The aim is to show where the majority of research has been focused. The groups show a trend from using packet header features exclusively to using more payload features.


  1. Data preprocessing for anomaly based network intrusion detection: A review. By F. Noorbehbahani, Fall 2013
  2. Data preprocessing
     - Dataset creation: involves identifying representative network traffic for training and testing. These datasets should be labeled, indicating whether each connection is normal or anomalous.
     - Feature construction: creates additional features with better discriminative ability than the initial feature set. This can bring significant improvement to machine learning algorithms. Features can be constructed manually, or by using data mining methods such as sequence analysis, association mining, and frequent-episode mining.
     - Reduction: commonly used to decrease the dimensionality of the dataset by discarding any redundant or irrelevant features (feature selection, FS).
  3. Paper main contributions
     - Comprehensively reviewing the features derived from network traffic, and the related data preprocessing techniques, which have been used in anomaly-based NIDS since 1999.
     - Grouping anomaly-based NIDS by the types of network traffic features used for detection. The aim is to show where the majority of research has been focused. The groups show a trend from using packet header features exclusively to using more payload features.
  4. Anomaly-based features (taxonomy diagram)
     - Packet header: basic; single connection; multiple connection
     - Protocol based: specification based; parser based; AP keyboard based
     - KDD Cup 99
     - Payload based: N-gram analysis of requests to servers; analysis of requests to web apps; general payload pattern matching; analysis of web content to clients
  5. Packet header anomaly detection
     - Minimizes data preprocessing requirements.
     - Suitable for real-time use on high-bandwidth links.
     - Summarizing a series of network packet headers into a single flow record, such as NetFlow, further reduces resource requirements.
     - Packet header approaches also have the advantage of remaining valid when traffic payloads are encrypted, such as with SSL sessions.
  6. - Data preprocessing to extract packet headers is straightforward.
     - Many software programs and libraries already exist to process network traffic, e.g. libpcap, tcpdump, tshark, tcptrace, Softflowd, NetFlow, and IPFIX implementations.
     - The complex part of the data preprocessing is using appropriate feature construction to derive more discriminative features (e.g. time-based statistical measures) from this basic traffic information.
  7. Packet header basic features
     - Only three papers use the basic features extracted directly from individual packet headers without further feature construction.
     - PHAD:
       - Aims to detect attacks against the TCP/IP stack, IDS evasion techniques, imperfect attack code, and anomalous traffic from victim machines.
       - Learns normal ranges for each packet header field at the data link (Ethernet), network (IP), and transport/control (TCP, UDP, ICMP) layers.
       - The result is 33 packet header fields used as basic features. The possible numeric range of each packet header field is very large, so clustering is used to reduce this space.
       - It is a univariate approach, which cannot model dependencies between features.
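The idea of learning a normal range per header field can be sketched in a few lines of Python. This is a deliberately simplified illustration, not PHAD's actual model or scoring: it keeps a single (min, max) interval per field instead of clustered ranges, and the field names (`ip_ttl`, `tcp_dport`, `ip_len`) and sample values are made up for the example.

```python
def learn_ranges(packets):
    """Learn the (min, max) observed for every header field in attack-free traffic."""
    ranges = {}
    for pkt in packets:
        for field, value in pkt.items():
            lo, hi = ranges.get(field, (value, value))
            ranges[field] = (min(lo, value), max(hi, value))
    return ranges


def anomalous_fields(pkt, ranges):
    """Return the fields whose value was never seen or lies outside the learned range.

    Each field is checked on its own (univariate), so combinations of
    individually normal values are never flagged.
    """
    flagged = []
    for field, value in pkt.items():
        if field not in ranges:
            flagged.append(field)
            continue
        lo, hi = ranges[field]
        if not lo <= value <= hi:
            flagged.append(field)
    return flagged


# Hypothetical training packets, already parsed into field/value pairs.
training = [
    {"ip_ttl": 64, "tcp_dport": 80, "ip_len": 60},
    {"ip_ttl": 128, "tcp_dport": 443, "ip_len": 1500},
]
model = learn_ranges(training)
suspect = {"ip_ttl": 255, "tcp_dport": 80, "ip_len": 40}
print(anomalous_fields(suspect, model))  # ['ip_ttl', 'ip_len']
```

Because every field is judged independently, the sketch shares the limitation noted on the slide: it cannot model dependencies between features.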
  8. Packet header basic features
     - SPADE: one of the first attempts to use an anomaly method for portscan detection.
     - Rather than being checked field by field, the basic features are used to build a normal traffic distribution model for the monitored network.
     - Traffic distributions are maintained in real time by tracking joint probability measurements, e.g. P(source address, destination address, destination port), or by using a Bayes network.
     - During detection, packets are compared to the probability distribution to calculate an anomaly score.
     - By retaining these unusual packets, it is possible to look for portscans over a much wider time window.
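A joint-probability anomaly score of this kind can be sketched as follows. This is an illustration of the general idea, not SPADE's implementation: the class name `JointModel`, the crude add-one smoothing, and the use of a negative log probability as the score are all assumptions of the sketch.

```python
import math
from collections import Counter


class JointModel:
    """Tracks how often each (source, destination, destination port) tuple is seen."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def update(self, src, dst, dport):
        self.counts[(src, dst, dport)] += 1
        self.total += 1

    def score(self, src, dst, dport):
        # Crude smoothing (assumption) so unseen tuples get a finite score.
        p = (self.counts[(src, dst, dport)] + 1) / (self.total + 1)
        # Rare tuples have low probability and therefore a high score.
        return -math.log2(p)


model = JointModel()
for _ in range(99):
    model.update("10.0.0.7", "10.0.0.1", 80)   # routine web traffic
model.update("10.0.0.7", "10.0.0.1", 6667)     # seen only once

common = model.score("10.0.0.7", "10.0.0.1", 80)
rare = model.score("10.0.0.7", "10.0.0.1", 6667)
unseen = model.score("10.0.0.9", "10.0.0.1", 23)
```

Packets whose score exceeds a chosen threshold would be retained for later correlation, which is what allows portscans to be spotted over a longer time window.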
  9. Packet header basic features
     - Attacks against wireless networks have also been detected using packet headers, in this case from the MAC layer frame header.
     - The approach requires tapping the local wireless network.
     - Guennoun et al. (2008) perform preprocessing to extract all the frame headers, convert any continuous features to categorical ones, and derive new features.
     - A wrapper approach is then used to find the best set of features. It uses a forward search algorithm which starts with the single most relevant feature, tests it with a k-means classifier, and then iteratively adds the next most relevant feature to the set. It was found that the top eight ranked features produced a classifier with the best accuracy.
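The forward search used by the wrapper approach can be sketched generically. The `evaluate` callback stands in for training and testing the classifier (k-means in the reviewed work) on a feature subset; the feature names and accuracy figures below are invented purely to make the example runnable.

```python
def forward_select(ranked_features, evaluate):
    """Grow the feature set one ranked feature at a time and keep the best prefix.

    ranked_features: feature names, most relevant first.
    evaluate: callable taking a list of features and returning an accuracy.
    """
    best_subset, best_score = [], float("-inf")
    current = []
    for feature in ranked_features:
        current.append(feature)
        score = evaluate(current)
        if score > best_score:
            best_subset, best_score = list(current), score
    return best_subset, best_score


# Hypothetical accuracies by subset size; a real wrapper would train a classifier here.
toy_accuracy = {1: 0.70, 2: 0.80, 3: 0.90, 4: 0.85}
subset, accuracy = forward_select(
    ["f1", "f2", "f3", "f4"],
    lambda features: toy_accuracy[len(features)],
)
print(subset, accuracy)  # ['f1', 'f2', 'f3'] 0.9
```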
  10. Packet header basic features
  11. Single connection derived features
     - These approaches use complete network flows as data instances rather than individual packet data.
     - Analyzing flows provides more context than analyzing individual packets in isolation.
     - Flows are unidirectional sequences of packets sharing a common key, such as the same source address and port, and destination address and port.
     - A flow is complete after a timeout period, or for TCP when end-of-session flags (e.g. FIN or RST) are seen.
     - A convenient way of obtaining flow information is to use NetFlow records.
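Flow assembly as described above can be sketched as follows. This is a minimal illustration, not how NetFlow exporters work internally: packets are assumed to be pre-parsed dictionaries sorted by timestamp, TCP flags are assumed to be given as a string of letters such as "S" or "FA", and the 30 second idle timeout is an arbitrary value.

```python
def assemble_flows(packets, idle_timeout=30.0):
    """Group time-ordered packets into unidirectional flows.

    A flow ends after `idle_timeout` seconds of silence, or when a TCP
    packet carrying FIN ("F") or RST ("R") is seen.
    """
    active = {}
    completed = []
    for pkt in packets:
        # The key is direction-specific, so replies form a separate flow.
        key = (pkt["src"], pkt["sport"], pkt["dst"], pkt["dport"])
        flow = active.get(key)
        if flow is not None and pkt["ts"] - flow[-1]["ts"] > idle_timeout:
            completed.append(active.pop(key))
            flow = None
        if flow is None:
            flow = active[key] = []
        flow.append(pkt)
        if set(pkt.get("flags", "")) & {"F", "R"}:
            completed.append(active.pop(key))
    completed.extend(active.values())  # flush flows still open at the end
    return completed


def make_packet(ts, flags):
    """Hypothetical helper: one packet of a fixed client-to-server conversation."""
    return {"ts": ts, "src": "10.0.0.1", "sport": 40000,
            "dst": "10.0.0.2", "dport": 80, "flags": flags}


trace = [make_packet(0.0, "S"), make_packet(0.1, "A"), make_packet(0.2, "FA"),
         make_packet(100.0, "S"), make_packet(200.0, "A")]
flows = assemble_flows(trace)
print([len(f) for f in flows])  # [3, 1, 1]
```

The first flow is closed by the FIN, the second by the idle timeout, and the third is flushed when the trace ends.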
  12. SCD features
     - Having a router generate NetFlow data saves the NIDS from doing its own data preprocessing tasks such as parsing of IP headers, maintaining packet counts, and stream (flow) reassembly.
     - Alternatively, NetFlow records can be produced on a computer host using software such as softflowd.
     - NetFlow records also significantly reduce the storage requirements compared to full packet capture.
     - NetFlow information is only based on packet headers, so the transport payload is ignored.
  13. SCD features
     - The most common and important SCD features are time-based statistical measures, obtained by monitoring basic features over the duration of the flow.
     - Examples:
       - counts of packets and bytes in the flow (as per NetFlow records),
       - the average inter-packet arrival time,
       - the mean packet length.
     - These features are useful for fingerprinting sessions, detecting unusual data flows, or finding other anomalies within a single session.
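The example SCD features listed above are simple to compute once a flow has been assembled. The sketch below assumes each packet is a dictionary with a timestamp `ts` (seconds) and a `length` (bytes); those field names and the sample values are illustrative.

```python
def scd_features(flow):
    """Compute basic time-based statistics for one non-empty, time-ordered flow."""
    times = [p["ts"] for p in flow]
    sizes = [p["length"] for p in flow]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "packets": len(flow),
        "bytes": sum(sizes),
        "duration": times[-1] - times[0],
        "mean_interarrival": sum(gaps) / len(gaps) if gaps else 0.0,
        "mean_packet_length": sum(sizes) / len(sizes),
    }


flow = [
    {"ts": 0.0, "length": 60},
    {"ts": 0.5, "length": 1500},
    {"ts": 2.0, "length": 40},
]
features = scd_features(flow)
# packets=3, bytes=1600, duration=2.0, mean_interarrival=1.0
```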
  14. SCD features
     - ANDSOM:
       - Data preprocessing first segments the dataset by service type (TCP or UDP) and the application protocol (HTTP or SMTP).
       - For each data segment a different model is created. In this case self-organizing maps (SOM) are used.
       - The calculated SCD features are quad, start time, end time, whether the session had a valid start (2 SYN packets), whether the connection was closed properly (FINs) or improperly (RST), number of queries per second, average size of questions, average size of answers, question answer idle time, answer question idle time, and the duration of the connection.
       - These features provide a fingerprint for the session. During the detection phase the data instances were compared to the appropriate SOM model to detect anomalies in that service.
       - Testing successfully found an injected BIND attack and an HTTP tunnel, both of which are detectable within a single flow.
  15. SCD features
     - Yamada et al.:
       - Use SCD features to find attacks against webservers when the traffic is encrypted by SSL or TLS.
       - Only use information from the unencrypted protocol headers for detection.
       - The features used are the HTTP request and response sizes, calculated across each continuous activity of each user.
       - Since using size features alone would produce many false positives, frequency analysis is also performed to eliminate alerts common to the webserver.
       - Statistically rare alerts are flagged as anomalies.
  16. SCD features
     - Anomaly detection using only TCP flags as SCD features:
       - TCP flags are extracted from packets within each TCP session, and each flag combination is quantized as a symbol.
       - A separate model is produced for each of the observed protocols: SSH, HTTP, and FTP.
       - During the detection phase, network traffic is evaluated against the appropriate model for anomaly detection.
       - The approach was found to detect scans initiated by nmap, and SSH and HTTP misuse.
       - While this approach detects attacks which modify TCP characteristics, it is not likely to detect payload-based attacks.
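Quantizing each flag combination as a symbol can be as simple as using the flag bits themselves as an integer. The sketch below shows only this preprocessing step, not the sequence model built on top of it; representing flags as letter strings is an assumption of the example, while the bit values follow the TCP header layout.

```python
# Bit positions of the six classic TCP flags in the TCP header.
FLAG_BITS = {"F": 0x01, "S": 0x02, "R": 0x04, "P": 0x08, "A": 0x10, "U": 0x20}


def flag_symbol(flags):
    """Map a flag combination such as "SA" to a single integer symbol."""
    symbol = 0
    for flag in flags:
        symbol |= FLAG_BITS[flag]
    return symbol


def session_symbols(flag_sequence):
    """Turn the per-packet flag combinations of one TCP session into a symbol sequence."""
    return [flag_symbol(flags) for flags in flag_sequence]


# A short, well-behaved session: SYN, SYN-ACK, ACK, PSH-ACK, FIN-ACK.
symbols = session_symbols(["S", "SA", "A", "PA", "FA"])
print(symbols)  # [2, 18, 16, 24, 17]
```

The resulting symbol sequences would then be fed to one model per protocol, as described on the slide.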
  17. SCD features
     - SCD features have been used to detect connections which pass through multiple stepping stones (Yang and Huang, 2007).
     - SCD features are also used by Early and Brodley (2006). Their aim is to automatically detect which application protocol (e.g. SSH, telnet, SMTP, or HTTP) is being used without using the destination port as a guide.
  18. SCD features
     - SCD features are useful for finding anomalous behavior within a single session, such as an unexpected protocol, unusual data sizes, unusual packet timing, or unusual TCP flag sequences.
     - Particular detection capabilities include backdoors, HTTP tunnels, stepping stones, BIND attacks, and command and control channels.
     - However, by themselves they cannot be used to find activity spanning multiple flows, such as DoS attacks or network probes. For that, MCD features are required.
  19. Multiple connection derived features
     - MCD features are constructed by monitoring base features over multiple flows or connections.
     - They enable detection of anomalies which manifest themselves as unusual patterns of traffic, such as network probes and DoS attacks.
  20. MCD features
     - Domain knowledge is used to choose a window of data to consider.
     - The time windows range from 5 s to 24 h, with shorter time windows detecting bursty attacks, and long time windows more likely to detect slow and stealthy attacks.
     - Connection-based windows are also used, such as analyzing the most recent 100 connections.
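Both window types can be sketched with a pair of queues. The example below derives one volume-style feature, the number of earlier connections to the same destination, over a time window and over a connection-count window. The feature choice, the field names, and the 5 s and 100-connection defaults are illustrative assumptions.

```python
from collections import deque


def mcd_counts(connections, time_window=5.0, conn_window=100):
    """For each time-ordered connection, count earlier connections to the same
    destination within the last `time_window` seconds and within the last
    `conn_window` connections."""
    recent_by_time = deque()
    recent_by_count = deque(maxlen=conn_window)  # oldest entries drop off automatically
    features = []
    for conn in connections:
        # Expire connections that have fallen out of the time window.
        while recent_by_time and conn["ts"] - recent_by_time[0]["ts"] > time_window:
            recent_by_time.popleft()
        features.append({
            "same_dst_time": sum(1 for c in recent_by_time if c["dst"] == conn["dst"]),
            "same_dst_conn": sum(1 for c in recent_by_count if c["dst"] == conn["dst"]),
        })
        recent_by_time.append(conn)
        recent_by_count.append(conn)
    return features


connections = [
    {"ts": 0.0, "dst": "10.0.0.5"},
    {"ts": 1.0, "dst": "10.0.0.5"},
    {"ts": 1.5, "dst": "10.0.0.9"},
    {"ts": 2.0, "dst": "10.0.0.5"},
    {"ts": 10.0, "dst": "10.0.0.5"},
]
result = mcd_counts(connections)
```

In this toy trace the last connection arrives after the time window has emptied, so its time-based count is 0 while its connection-based count is still 3. This is why a connection-based window can catch slow probes that a short time window misses.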
  21. KDD Cup 99
     - The dataset has known limitations.
     - Its advantages are being publicly available, labeled, and preprocessed ready for machine learning.
     - Each network connection was processed into a labeled vector of 41 features, constructed using data mining techniques and expert domain knowledge when creating a machine learning misuse-based NIDS.
  22. KDD 99 data preprocessing produced:
     - 9 basic and SCD header features for each connection (similar to NetFlow).
     - 9 time-based MCD header features constructed over a 2 s window.
     - 10 host-based MCD header features constructed over a 100 connection window to detect slow probes.
     - 13 content-based features constructed from the traffic payloads using domain knowledge. Data mining algorithms could not be used since the payloads were unprocessed and therefore unstructured. These features were designed specifically to detect U2R and R2L attacks.
  23. MCD features
  24. Content anomaly detection
     - Many remote attacks on computers place the exploit code inside the payload of network packets. Hence these attacks are not directly detectable by packet header approaches.
     - Payload attacks are more computationally expensive to detect because they require deeper searches into network sessions.
  25. Content anomaly detection
     - The SANS "Top Cyber Security Risks" 2009 report lists the top two cyber risks as client-side software which remains unpatched, and vulnerable Internet-facing websites.
     - The first risk can be exploited using malicious content destined for a client, while the second can be exploited using crafted content in requests to servers.
     - In these cases, the bytes containing the exploit code are carried within network packet payloads beyond the TCP/IP headers, such as within downloaded files.
  26. N-gram analysis of requests to servers
     - PAYL uses 1-grams and unsupervised learning to build a byte-frequency distribution model of network traffic payloads.
     - A 1-gram is simply a single byte with a value in the range 0 to 255. The result of preprocessing a packet payload this way is a feature vector containing the relative frequency count of each of the 256 possible 1-grams (bytes) in the payload.
     - The model also includes the average frequency, as well as the variance and standard deviation, as other features.
     - Separate models of normal traffic are created for each combination of destination port and length of the flow.
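The 1-gram preprocessing and the per-byte mean and standard deviation can be sketched as follows. This is an illustration rather than PAYL's code: a single model is trained instead of one per port and length, and the distance function (absolute deviation scaled by standard deviation plus a small smoothing term `alpha`) is an assumption of the sketch.

```python
import statistics


def byte_frequencies(payload):
    """Return the relative frequency of each of the 256 possible byte values."""
    counts = [0] * 256
    for byte in payload:
        counts[byte] += 1
    total = len(payload)
    return [c / total for c in counts] if total else [0.0] * 256


def train(payloads):
    """Per-byte mean and standard deviation of the frequencies over normal payloads."""
    vectors = [byte_frequencies(p) for p in payloads]
    columns = list(zip(*vectors))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return means, stds


def distance(payload, model, alpha=0.001):
    """Deviation of a payload from the model; alpha avoids division by zero."""
    means, stds = model
    freqs = byte_frequencies(payload)
    return sum(abs(f - m) / (s + alpha) for f, m, s in zip(freqs, means, stds))


# Toy training data standing in for normal traffic to one port.
normal = [b"GET /index.html HTTP/1.1", b"GET /about.html HTTP/1.1"]
model = train(normal)
benign_score = distance(b"GET /index.html HTTP/1.1", model)
attack_score = distance(bytes([0x90]) * 24, model)  # NOP-sled-like payload
```

A payload made of repeated 0x90 bytes scores far higher than a request resembling the training data, because its byte distribution has almost nothing in common with the model.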
  27. - PAYL was designed to detect zero-day worms, since flows with worm payloads can produce an unusual byte-frequency distribution.
     - Testing was performed on all attacks in the DARPA 1999 dataset using individual packets as data units (connection data units were also attempted).
     - The overall detection rate was close to 60% at a false positive rate of less than 1%.
     - The authors point to a large non-overlap between PAYL and PHAD, with one modeling header data and the other modeling payloads. The two approaches could complement each other.
  28. N-gram analysis of requests to servers
     - ANAGRAM also builds on PAYL, but uses a mixture of high-order N-grams with N > 1.
     - This reduces its susceptibility to mimicry attacks, since higher order N-grams are harder to emulate in padded bytes.
     - By contrast, PAYL can be easily evaded if normal byte frequencies are known to an attacker, since malicious payloads can be padded with bytes to match them.
     - ANAGRAM uses supervised learning to model normal traffic by storing N-grams of normal packets into one bloom filter.
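Storing the N-grams of normal traffic in a Bloom filter, and then scoring a payload by how many of its N-grams are missing from the filter, can be sketched like this. The filter size, the number of hash functions, the choice of N = 5, and the "fraction unseen" score are assumptions of the sketch, not parameters taken from ANAGRAM.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, a small chance of false positives."""

    def __init__(self, size_bits=1 << 16, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive several bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def ngrams(payload, n):
    """All overlapping n-byte substrings of the payload."""
    return [payload[i:i + n] for i in range(len(payload) - n + 1)]


def train(payloads, n=5):
    bloom = BloomFilter()
    for payload in payloads:
        for gram in ngrams(payload, n):
            bloom.add(gram)
    return bloom


def unseen_fraction(payload, bloom, n=5):
    """Fraction of the payload's n-grams never observed in training."""
    grams = ngrams(payload, n)
    if not grams:
        return 0.0
    return sum(gram not in bloom for gram in grams) / len(grams)


bloom = train([b"GET /index.html HTTP/1.1"])
normal_score = unseen_fraction(b"GET /index.html HTTP/1.1", bloom)
attack_score = unseen_fraction(bytes([0x90]) * 20 + b"/bin/sh", bloom)
```

Padding an attack with bytes that match normal single-byte frequencies does not help here, because the padded payload still has to reproduce whole 5-byte sequences seen in training.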
  29. N-gram analysis of requests to servers
     - Similarly, McPAD creates 2v-grams and uses a sliding window to cover all sets of 2 bytes, v positions apart, in network traffic payloads.
     - Since each byte can have values in the range 0 to 255, and n = 2, the feature space is 256^2 = 65,536. By varying v, different feature spaces are constructed, each handled by a different classifier.
     - The dimensionality of the feature space is then reduced using a clustering algorithm.
     - Multiple one-class SVMs are used for classification, and a meta-classifier combines these outputs into a final classification prediction. The results of testing McPAD showed it could detect shellcode attacks in HTTP requests.
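The 2v-gram extraction step can be sketched as a sliding window over byte pairs. The sketch covers only the feature extraction, not the clustering or the SVM ensemble. It assumes the convention that v = 0 means adjacent bytes (so the second byte sits v + 1 positions after the first), and it stores frequencies sparsely in a dictionary rather than as a dense 65,536-element vector.

```python
from collections import Counter


def two_v_gram_frequencies(payload, v):
    """Relative frequency of each (byte, byte) pair separated by v intermediate bytes."""
    step = v + 1  # assumption: v = 0 corresponds to ordinary adjacent 2-grams
    pairs = Counter(
        (payload[i], payload[i + step]) for i in range(len(payload) - step)
    )
    total = sum(pairs.values())
    return {pair: count / total for pair, count in pairs.items()} if total else {}


payload = b"abcd"
adjacent = two_v_gram_frequencies(payload, v=0)  # (a,b), (b,c), (c,d)
skipping = two_v_gram_frequencies(payload, v=1)  # (a,c), (b,d)
```

Each value of v yields a different view of the same payload, which is what allows a separate classifier to be trained per feature space.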
  30. Analysis of requests to web applications
     - Organizations may require additional monitoring of critical applications.
     - One method is to create an application-specific anomaly detector, such as for web applications.
     - One example is an anomaly-based SQL injection detector, which was host based and relied on the interception of SQL statements between the web application and the database.
  31. Analysis of web content to clients
     - Common network architectures ensure client hosts (workstations) within an organization are not directly exposed to the Internet at the network layer. This protects the client hosts from external threats such as probes, DoS, network worms, and other attacks against open ports (services).
     - However, many other threats are faced by these clients, particularly when they are exposed to untrusted code or data.
  32. Conclusion and feature set recommendation
     - This review has identified the various feature sets used by anomaly-based NIDS.
     - When designing a NIDS, the choice of network traffic features is largely driven by the detection requirements.
     - If broad anomaly detection is desired, then separate anomaly detectors should be built for each of the feature sets.
     - For more targeted anomaly detection, a single feature set can be used.
  33. Conclusion and feature set recommendation
     - Packet header features have the advantages of being fast, with relatively low computation and memory overheads, and of avoiding some of the privacy and legal concerns regarding network data analysis.
     - Basic features can be used to:
       - flag single packets which are anomalous with respect to a normal training model (e.g. PHAD),
       - or act as a filtering mechanism so only unusual packets are fed to downstream algorithms (e.g. SPADE).
     - Individual packets cannot be used to identify unusual trends or patterns over time.
  34. Conclusion and feature set recommendation
     - To identify anomalous patterns across multiple packets, but within a single connection, SCD header features are used.
     - For example, if all connections to port 80 on the local network are expected to be HTTP traffic, but the timing of packets within a monitored port 80 connection does not match an HTTP profile, then an anomaly can be raised.
  35. Conclusion and feature set recommendation
     - MCD features are generally derived over a time window of connections.
     - Most MCD features are volume-based, such as the count of connections to a particular destination IP address and port in a given time window.
     - MCD features can be easily used to detect unusual traffic volumes associated with DoS attacks or scanning behavior, but at the cost of overlooking individual anomalous packets (since these will not meet the volume-based threshold).
  36. Conclusion and feature set recommendation
     - Packet header feature limitations:
       - Packet header approaches cannot be used to directly detect attacks aimed at applications, since the attack bytes are embedded in the packet body.
       - Many of today's exploits are directed at applications rather than network services.
       - Examples include buffer overflow attacks against web servers, web application exploits, and attacks targeting web clients such as drive-by downloads.
  37. Conclusion and feature set recommendation
     - NIDS must use payload-based features extracted from packet bodies to detect these types of attacks, since the packet headers can remain completely normal.
     - Payload analysis is more computationally expensive than header analysis. This is due to requiring deeper packet inspection and dealing with a variety of payload types (HTML, XML, PDF, JPG, etc.), transfer encodings (gzip, Base64), and obfuscation techniques.
     - The advantage of payload analysis is having access to all bytes transferred between network devices.
     - This allows a rich set of payload-based features to be constructed for anomaly detection.
  38. Conclusion and feature set recommendation
     - Due to the complexity of payload analysis, many techniques focus on small subsets of the payload, e.g. the HTTP request, or only the JavaScript sections of downloaded web content.
     - The anomaly-based techniques do not try to match signatures of known malware. However, they can apply heuristics such as pattern matching for the presence of shellcode, or highlighting suspiciously long strings which may indicate a buffer overflow attempt.
     - The reviewed payload-based approaches derive features from either the payload of a single connection or a user application session, and compare the features to a normal model.
     - In effect these are SCD payload-based features. Extending this approach to multiple connections to produce MCD payload-based features could allow different types of anomalies to stand out, e.g. an unusually large number of HTTP redirects in a network could indicate a widespread infection attempt.
  39. List of features
