SlideShare ist ein Scribd-Unternehmen logo
1 von 15
Downloaden Sie, um offline zu lesen
ISSN: 2278 – 1323
                                                                 International Journal of Advanced Research in Computer Engineering & Technology
                                                                                                                     Volume 1, Issue 4, June 2012




                            A FAST POSITIVE APPROACH OF P-DPL IN THE PACKET INSPECTION
                                                1
                                                 N.Kannaiya Raja, 2K.Arulanandam, 3M.Balaji


                                                                           (system calls, network access and files, and memory
     Abstract-The signature extraction process is based on a modifications) [1]–[3]. In static detection method is based on
comparison with a common function repository. By eliminatin information explicitly extracted or implicitly from the
functions appearing in the common function repository from the executable source code. The main processing of static detection
signature candidate list, P-DPL can minimize the risk of false-positive method is in providing rapid categorized. Since antivirus
detection errors. To minimize false-positive rates for P-DPL proposes vendors are handling every day an overcome amount of suspect
intelligent candidate selection using entropy score to generate files for inspection [4], fast detection is essential. Static method
signatures. Evaluation of P-DPL was conducted under various analysis solutions are mainly implemented using two methods:
conditions. The findings suggest that the proposed method can be used signature-based and heuristic-based. Signature based methods
for automatically generating signatures that are both specific, sensitive.
In this paper we propose a new automatic mechanism, termed P-DPL
                                                                           trust on the finding the unique strings in the source code [4].
for extracting signatures from malware files and unwanted mapping The algorithmic methods are based on procedure, which are
files. Signatures generated by P-DPL are comprised of multiple byte- either determined by expert staff or by machine that specify a
strings, which can be used by high-speed, network-based, malware malicious [5], [6]. As a case in point, Zhang et al. [7] in the
filtering devices. In order to minimize the risk of false positives (i.e., random forest data-mining algorithm to detect misuse and
detection of a malware signature in benign executable files), P-DPL abnormal network intrusions. The time period of time from the
employs a method for sanitizing executable file from chunks of code release           of an       unknown      malware until security
that originate from the underlying standard development platforms and software/hardware vendors update their client with the proper
replicated in various instances of begins and malicious programs malware signature is extremely critical. At this time, the
developed by these platforms. In this method we have developed a
new innovative form to find malicious data in the packet. We believe
                                                                           malware is undetectable by most signature-based solutions and
that P-DPL Another direction we intend to examine is the use of a is usually termed a zero attack. This malware can easily spread
malware function library (MFL) in the signature generation process in and corrupt all machines, it is extremely essential to detect it as
order to further strengthen the signatures and minimize the risk of false soon as possible. So that signature-based solutions generate a
positives. In addition, regular expressions defined by two or more suitable signature for block all threats. Defend organizations
distinct signatures can be used in order to further minimize the risk of by prevent all type of malware. Carry through deep packet
false positives.                                                           check all signatures for detecting and removing attacks such as
                                                                           malware spreads worms, denial-of-service, or distant
Key words—Packet-Deployment payload (P-DPL), Automatic                     exploitation of vulnerabilities. Monitor network for prevent
signature generation (ASG), malware, malware filtering.
                                                                           performance. Devices analysis the content of the packets. The
                           I.    INTRODUCTION                              process of generating unique signatures for malware filtering
                                                                           devices. Different methods are used for automatic signature
      n communication system are highly hyper sensitized to generation have been proposed in domain. The techniques
    I  various types of attack. A parliamentary way of processing focusing on malware, worm, and where the signature is
these attacks is by means of malicious software, such as worms, extracted that the after the malware is executed in the course of
viruses, and Trojan horses. When it is spread, it can cause launching the attack. Different methods processed to extract
severe problems to all users, companies, and governments. signatures from full-fledged malware executables that may
Now the development in high-speed Internet connections gives contain a significant portion of code emanating from
a higher level for creating and rapidly spread the new malware. development tools and platforms. In this research we find the
Several techniques for detecting and deleting malware have problems, and evaluate an automatic signature generation
been proposed. They are two types one is static and another one technique for P-DPL.
dynamic. In dynamic detection method is based on information
collected from the operating system at execution of the program

1
  N.Kannaiya Raja, M.E., (P.hd) .,A.P/CSE Dept.
  Arulmigu Meenakshi Amman College of Engg,
  Thiruvannamalai Dt, near Kanchipuram
  Kanniya13@hotmail.co.in
2
   Dr. K.Arulanandam, Prof &
  Head, CSE Department
  Ganadipathy Tulsi’s Jain Engineering College, Vellore
  sakthsivamkva@gmail.com
3
   M. Balaji, M.E.,
  Arulmigu Meenakshi Amman College of
  Engg,Thiruvannamalai Dt, near Kanchipuram.                          Fig 1 P-DPL creation and signature generating processes
  mbalaji23@gmail.com                                                 .
                                                                              P-DPL is created for multiple-string, signatures that
                                                                      can be used in intrusion detection systems for filtering
                                                                      malware. To improve its imprecision, P-DPL process and
ISSN: 2278 – 1323
                                                               International Journal of Advanced Research in Computer Engineering & Technology
                                                                                                                   Volume 1, Issue 4, June 2012




complete and structured method, which are extracts the                ordered sets of multiple tokens that match multiple variants of
malware’s unique code from other segments of common and               multiform worms. Honeycomb overlays parts of the flows in
usually benign code, such as library files. When the sensitivity      the traffic and uses a longest common substring (LCS)
process ends, the remaining codes are the malicious code. Since       algorithm to spot similarities in packet payloads. Subsequently
the process is go on by generating a unique signature from the        designed a double-honey pot system and introduced the
malicious code, which can be used for removing the malware.           position-aware distribution signatures (PADS) that are
The main objective of this research is creating signatures from       computed from polymorphic worm samples and are composed
malware, spyware, Trojan horses, worms and viruses. The main          of a byte frequency distribution instead of a fixed value for
hypothesis is that in a superior step, suspected files are            each position in the signature ―string.‖ Tang et al. [11]
classified as benign or malicious by a human expert or by an          use sequence alignment techniques, drawn from bio informatics,
automated detection tool. This processing allows us to focus on       to derive simplified regular aspect exploit-based signatures.
the signature-generation process, but it also in the quality of the   Exploit-based signatures can be generated quickly to detect
signatures on the accuracy of the of mistrustful files. P-DPL         zero-attack exploits of uncovered vulnerabilities. However, low
was used as the automatic signature generation (ASG) module           damage on multiform malware. And also the signatures created
of the eDare (early detection, alert, and response) framework         by the above techniques that are extracted and tested for short,
[8] eDare is aimed at mitigating the spread of both known and         worm, malware, the fact is that the malware, for example
unknown malware in computer networks. eDare operates by               viruses and Trojan horses, can be as large executable files, it
first monitoring network traffic and filtering out known              consist of full-fledged applications. These files usually contain
malware using high-speed filtering devices that are                   a significant portion of different code segments that are spread
continuously updated with signatures generated by P-DPL               by the software development platform spawning the malware.
.Next; unknown files are extracted from the remaining traffic         For this case the large malware files, selecting a signature that
and examined using various machine-learning and temporal              will be both sensitive and specific. Another limitation of these
reasoning methods in order to classify the files as malicious or      techniques is that they focus on detecting malware after it has
benign. P-DPL is implemented in the last step to extract              been unleashed and try to generate a signature from the traffic it
signatures from newly detected malicious files. When eDare            creates at the time attack is being processed. A payload-based
identifies a new threat, P-DPL automatically produces a               signature finding the malware code. In this paper falls into the
signature, and then, the filtering devices that are stationed on      payload-based signature concept. Payload-based signature
the network infrastructure are automatically updated. This            generation methods are presented in [4]. At present a two-step
process is very fast, and also faster than when human                 statistical method for automatically extracting well, ―the best‖
intervention, it is effective against zero attacks. The P-DPL         signatures from the code of a malware. First of all programs
technique and a set of research that were performed on a              on detached machines are intentionally affected with the virus.
collection of malicious and benign executable. We were work           The affected portion of the program are analyze with one
is in finding the length and selection of a signature among           another to found that regions of the virus are constant from one
several candidates.                                                   instance to another. These regions are considered as signature
                                                                      candidates. The second phase estimates the probability that
                       II.    Related Works                           each of all candidate signatures will match a randomly chosen.
                                                                      The candidate with the lowest estimated false-positive is
          Since the signature must be general enough to capture       selected as a signature. The Hancock system [4] was proposed
as instances of the malware, Thus far sufficiently specific to        for automatically extracting signatures for antivirus software.
avoid over lapping with the content of normal traffic in order to     Based on several heuristics, the Hancock system generates a set
minimize false positives. The malware signatures can be               of signature candidates, selecting the candidates that are not
classified as vulnerability-based, exploit-based and payload-         likely to be found in benign code. Our approach, Hancock
based [9]. A vulnerability-based signature describes the              relies on modeling benign code in order to minimize false-
properties of a certain bug in the system that can be maliciously     alarm risks. The Auto-Sign signature generator modeled both
exploited by the malware. Vulnerability-based signatures do           benign and malicious code using byte 3-grams representation in
not process to detect each every malicious code exploiting the        order to select good signature candidates. Next, the signature
vulnerability; it is very effective when dealing with multiform       candidates are ranked according to three different measures in
malware. Even though, a vulnerability-based signature can be          order to select the best signature. Although, the Hancock
generated only when the vulnerability is find. An exploit-based       system and Auto-Sign differ from our approach, which is
signature describes a sequence of commands triggered by the           semantic aware in the sense that it does not rely on arbitrary
malware, which process exploits vulnerability in the system.          byte code sequences, but the code representing internal
Exploit-based methods include Autograph, P-DPL sensor Net             functions of the software. In addition, the methods presented in
spy .which focus on analyzing similarities in packet payloads         [4] and focus on generating signatures for antivirus software,
belonging to network. These systems first identify abnormal           the limitation of signature length is not necessarily considered.
traffic originating from distrustful IP addresses, and then,          Other solutions have been proposed for protecting systems and
generate a signature by identifying most frequently occurring         preventing an attack beforehand rather than detecting the attack
byte sequences. The Nemean architecture first clusters similar        after it has been launched. This can be done by generating
sessions, and uses machine-learning techniques to generate            signatures based on sequences of instructions that represent
semantic-aware signatures for each cluster. Polygraph expands         malicious or benign behavior. These sequences can be extracted
the notion of single substring signatures to be joined, and to        either by statically analyzing the program after disassembly or
ISSN: 2278 – 1323
                                                              International Journal of Advanced Research in Computer Engineering & Technology
                                                                                                                  Volume 1, Issue 4, June 2012


by monitoring the program during execution. For example,             resilience of signatures to polymorphic malware variants.
protecting a system from buffer overflow attacks can be              Another common method for detecting polymorphic malware is
achieved by: 1) creating signatures for legitimate instruction       to incorporate semantics awareness into signatures. For
blocks and matching instruction sequences of monitored               example, Christodorescu et al. proposed static semantics-aware
programs with the signature repository 2) using obfuscation of       malware detection. They applied a matching algorithm on the
pointers in such a way that a malicious application that tries to    disassembled binaries to find the instruction sequences that
exploit a buffer overflow vulnerability will not be able to          match the manually generated templates of malicious
create valid pointers or 3) by applying array and pointer            behaviors, e.g., decryption loop. A framework for automatic
boundary checking. As opposed to such methods, our goal in           generation of intrusion signatures from honey net packet traces.
this research is to generate signatures for high-speed traffic       Nemean applied clustering techniques on connections and
filtering devices that do not rely on installation or modification   sessions to create protocol-semantic-aware signatures, thereby
of end points and that will protect the end points at the network    reducing the possibility of false alarms.
level. In summation, each of the aforementioned techniques                     Another loosely related area is the automatic
suffers from at least one critical limitation. Some rely on small    generation of attack signatures, vulnerability signatures and
and coherent malware files, but such files may not constitute        software patches. TaintCheck [12] and Vigilante [13] applied
the general case. Other techniques rely on observing malware         taint analysis to track the propagation of network inputs to data
behavior, but such malware cannot always be fully monitored.         used in attacks, e.g., jump addresses, format strings and system
Other methods search for packet similar, not assure true low         call arguments, which are used to create signatures for the
false positive. Our method disregards the malware size               attacks. Other heuristic-based roaches [14] have also been
assumption. In addition, it does not require activating the          proposed to exploit properties of specific exploits (e.g., buffer
malware.                                                             overflow) and create attack signatures. Generalizing from these
          Modern anti-virus software typically employ a variety      approaches, Brumley et al. proposed a systematic method that
of methods to detect malware programs, such as signature-            used a formal model to reason about vulnerability signatures
based scanning, heuristic-based detection, and behavioral            and quantify the signature qualities. An alternative approach to
detection [10]. Although less proactive, signature-based             preventing malware from exploiting vunerabilities is to apply
malware scanning is still the most prevalent approach to             data patches in the firewalls to filter malicious traffic. To
identify malware because of its efficiency and low false             automatically generate data patches. Which leveraged the
positive rate. Traditionally, the malware signatures are created     knowledge of data format of malicious attacks to generate
manually, which is both slow and error-prone. As a result,           potential attack instances and then created signatures from the
efficient generation of malware signatures has become a major        instances that successfully exploit the vulnerabilities?
challenge for anti-virus companies to handle the exponential                   Hancock differs from previous work by focusing on
growth of unique malware files. To solve this problem, several       automatically generating high-coverage string signatures with
automatic signature generation approaches have been proposed.        extremely low false positives. Our research was based loosely
          Most previous work focused on creating signatures          on the virus signature extraction, which was commercially used
that are used by Network Intrusion Detection Systems (NIDS)          by IBM. They used a 5-gram Markov chain model of good
to detect network worms. Singh et al. Proposed EarlyBird [11],       software to estimate the probability that a given byte sequence
which used packet content prevalence and address dispersion to       would show up in good software. They tested hand-generated
automatically generate worm signatures from the invariant            signatures and found that it was quite easy to set a model
portions of worm payloads. Autograph exploited a similar idea        probability threshold with a zero false positive rate and a
to create worm signatures by dividing each suspicious network        modest false negative rate (the fraction of rejected signatures
flow into blocks terminated by some breakmark and then               that would not be found in goodware) of 48%. They also
analyzing the prevalence of each content block. The suspicious       generated signatures from assembly code (as Hancock does),
flows are selected by a port-scanning flow classifier to reduce      rather than data, and identified candidate signatures by running
false positives. Kreibich and Crowcroft developed Honeycomb,         the malware in a test environment. Hancock does not do this, as
a system that uses honeypots to gather inherently suspicious         dynamic analysis is very slow in large-scale applications.
traffic and generates signatured by applying the longest             Symantec acquired this technology from IBM in the mid-90s
common sub string (LCS) algorithm to search for similarities in      and found that it led to many false positives. The Symantec
the packet payloads. One potential drawback of signatures            engineers believed that it worked well for IBM because IBM’s
generated from previous approaches is that they are all              anti-virus technology was used mainly in corporate
continuous strings and may fail to match polymorphic worm            environments, making it much easier for IBM to collect a
payloads. Polygraph instead searched for invariant content in        representative set of goodware. By contrast, signatures
the network flows and created signatures consisting of multiple      generated by Hancock are mainly for home users, who have a
disjoint content sub strings. Polygraph also utilized a naive        much broader set of goodware. The model’s training set cannot
Bayes classifier to allow the probabilistic matching and             possibly contain, or even represent, all of this goodware. This
classification, and thus provided better proactive detection         poses a significant challenge for Hancock in avoiding FP-prone
capabilities. A system that used a model-based algorithm to          signatures.
analyze the invariant contents of polymorphic worms and
analytically prove the attack-resilience of generated signatures.               III.     Payload Based Anomaly Detection
PDAS (Position-Aware Distribution Signatures) took advantage
of a statistical anomaly-based approach to improve the                   A. Overview of the P-DPL Sensor:
ISSN: 2278 – 1323
                                                              International Journal of Advanced Research in Computer Engineering & Technology
                                                                                                                  Volume 1, Issue 4, June 2012


           The P-DPL sensor is based on the principle that zero-     To compare the similarity between test data at detection time
day attacks are delivered in packets whose data is unusual and       and the trained models computed during the training period, P-
distinct from all prior ―normal content‖ flowing to or from the      DPL uses simplified Mahalanobis distance [18]. Mahalanobis
victim site. We assume that the packet content is available to       distance. To compare the similarity between test data at time
the sensor for modeling3. We compute a normal profile of a           and the trained models computed during the training period, P-
site’s unique content flow, and use this information to detect       DPL uses simplified Mahalanobis distance [18]. Mahalanobis
anomalous data. A ―profile‖ is a model or a set of models that       distance weights each variable, the mean frequency of a 1-
represent the set of data seen during training. Since we are         gram, by its standard deviation and covariance. The distance
profiling content data flows, the method must be general to          values produced by the models are then subjected to a threshold
work across all sites and all services, and it must be efficient     test. If the distance of a test datum is greater than the threshold,
and accurate.Our initial design of P-DPL uses a ―language            P-DPL issues an alert for the packet. There is a distinct
independent‖ methodology, the statistical distribution of n-         threshold setting for each centroid computed automatically by
grams [15] extracted from network packet datagrams. This             P-DPL during a calibration step.
methodology requires no parsing, no interpretation and no
emulation of the content.
           An n-gram is the sequence of n adjacent byte values in
a packet payload. A sliding window with width n is passed over
the whole payload one byte at a time and the frequency of each
n-gram is computed. This frequency count distribution
represents a statistical centroid or model of the content flow.
The normalized average frequency and the variance of each
gram are computed. The first implementation of P-DPL uses
the byte value distribution when n=1. The statistical means and
variances of the 1- grams are stored in two 256-element
vectors. However, we condition a distinct model on the port (or
service) and on packet length, producing a set of statistical
centroids that in total provides a fine-grained, compact and
effective model of a site’s actual content flow. Full details of
this method and its effectiveness are described in [18].
           The first packet of CRII illustrates the 1-gram data
representation implemented in P-DPL. Figure 1 shows a portion        Fig. 3. CRII payload distribution (top plot) and its rank
of the CRII packet, and its computed byte value distribution         order distribution (bottom plot)
along with the rank ordered distribution is displayed in Figure
2, from which we extract a Z-string. The Z-string is a the string
                                                                     To calibrate the sensor, a sample of test data is measured
of distinct bytes whose frequency in the data is ordered from
                                                                     against the centroids and an initial threshold setting is chosen.
most frequent to least, serving as representative of the entire
                                                                     A subsequent round of testing of new data updates the
distribution, ignoring those byte values that do not appear in the
                                                                     threshold settings to calibrate the sensor to the operating
data. The rank ordered distribution appears similar to the Zipf
                                                                     environment. Once this step converges, P-DPL is ready to enter
distribution, and hence the name Z-string. The Z-string
                                                                     detection mode. Although the very initial results of testing P-
representation provides a privacy-preserving summary of
                                                                     DPL looked quite promising, we devised several improvements
payload that may be exchanged between domains without
                                                                     to the modeling technique to reduce the percentage of false
revealing the true content. Z-strings are not used for detection,
                                                                     positives.
but rather for message exchange and cross domain correlation
of alerts.
                                                                          B. New P-DPL Features Multiple Centroids
                                                                               P-DPL is a fully automatic, ―hands-free‖ online
          GET./default.ida?XXXXXXXXXXX x                             anomaly detection sensor. It trains models and determines when
          XXXXXXXXXXXXXXXXXXXXXX                                     they are stable; it is self-calibrating, automatically observes
          XXXXXXXXXXXXXXXXXXXXXX                                     itself, and updates its models as warranted. The most important
          XXXXXXXXXXXXXXXXXXXXXX                                     new feature implemented in P-DPL over our prior work is the
          XXXXXXXXXXXXXXXXXXXXXX                                     use of multiple centroids, and ingress/egress correlation. In the
          XXXXXXXXXXXXXXXXXXXXXX                                     first implementation, P-DPL computes one centroid per length
                                                                     bin, followed by a stage of clustering similar centroids across
          XXXXXXXXXXXXXXXXXXXXXX                                     neighboring bins. We previously computed a model Mij for
          XXXXXX%u9090%u6858%ucbd3%u7                                each specific observed packet payload length i of each port j. In
          801%u9090%u6858%ucbd3%u7801%u                              this newer version, we compute a set of models Mkij , k≥1.
          9090%u6858%ucbd3%u7801%u9090%                              Hence, within each length bin, multiple models are computed
          u9090%u8190%u00c3%u0003%u8b000                             prior to a final clustering stage. The clustering is now executed
          %u531b%u53ff%u0078%u0000%u0 u0                             across centroids within a length bin, and then memory
                                                                     requirements for models while representing normal content
       Fig. 2. A portion of the first packet of CodeRed II           flow more accurately and revealing anomalous data with
ISSN: 2278 – 1323
                                                              International Journal of Advanced Research in Computer Engineering & Technology
                                                                                                                  Volume 1, Issue 4, June 2012


greater clarity. Since there might be different types of payload     the byte distributions among the sites for the same length bin.
sent to the same service, e.g., pure text, .pdf, or .jpg, we used    This is confirmed by the values of Manhattan distances
an incremental online clustering algorithm to create multiple        computed between the distributions, with results displayed in
centroids to model the traffic with finer granularity. This          Table 1.
modeling idea can be extended to include centroids for different               The content traffic among the sites is quite different.
media that may be transmitted in packet flows. Different file        For example, the EX dataset is more complex containing file
and media types follow their own characteristic 1-gram               uploads of different media types (pdf, jpg, ppt, etc. ) and
distribution; including models for standard file types can help      webmail traffic; the W dataset contain less of this type of traffic
reduce false positives. The multi-centroid strategy requires a       while W1 is the simplest, containing almost no file uploads.
different test methodology. During testing,an alert will be          Hence, each of the site-specific payload models is diverse,
generated by P- DPL if a test packet matches none of the             increasing the likelihood that a worm payload will be detected
centroids within its length bin. The multicentroid technique         by at least one of these sites. To avoid detection, the worm
produces more accurate payload models and separates the              exploit would have to be padded in such a way that its content
anomalous payloads in a more precise manner.                         description would appear to be normal concurrently for all of
                                                                     these sites.
    C. Data Diversity across Sites

          A crucial issue we study is whether or not payload
models are truly distinct across multiple sites. This is an
important question in a collaborative security context. We have
claimed that the monoculture problem applies not only to
common services and applications, but also to security
technologies. Hence, if a site is blind to a zero-day attack this
implies that many other sites are blind to the same attack.
Researchers are considering solutions to the monoculture
problem      by     various    techniques     that
             ―diversify‖ implementations. We conjecture that
the content data flow among different sites is already diverse
even when running the exact same services. In our previous
work we have shown that byte distributions differ for each port
and length. We also conjecture that it should be different for
each host. For example, each web server contains different
                                                                                              ASCII characters 0-255
URLs, implements different functionality like web email or
media uploads, and the population of service requests and            Fig. 4. Example byte distribution for Payload length 249 of
responses sent to and from each site may differ, producing a         port 80 for the three sites EX, W, W1, in order from top to three
diverse set of content Profiles across all collaborating hosts       sites EX, W, W1 bottom
and sites. Hence, each host or site’s profile will be
substantially different from all others. A
zero-day attack that may appear as normal data at one site, will                                        EX-data              Length 1380
likely not appear as normal data at other sites since the normal
profiles are different. We test whether or not this conjecture is
true by several experiments.
          One of the most difficult aspects of doing research in
this area is the lack of real- world datasets available to
researchers that have full packet content for formal scientific
study4. Privacy policies typically prevent sites from sharing
their content data. However, we were able to use data from
three sources, and show the distribution or each. The first one is
an external commercial organization that wishes to remain
anonymous, which we call EX. The others are the two web                                                  W1-data             Length 1380

servers     of    the    CS     Department       of     Columbia,
www.cs.columbia.edu and www1.cs.columbia.edu.
We call these two data sets W and W1, respectively. The
following plots show the profiles of the traffic content flow of
each site. The plots display the payload distributions for                                     ASCII characters 0-255
different packet payload lengths i.e. 249 bytes and 1380 bytes,
spanning the whole range of possible payload lengths in order
to give a general view of the diversity of the data coming from      Fig. 5. Example byte distribution for payload length of 1380 of
the three sites. Each byte distribution corresponds to the first     port 80 for the
centroid that is built for the respective payload lengths. We
observe from the above plots that there is a visible difference in   Table 1. The Manhattan distance between the byte distributions
ISSN: 2278 – 1323
                                                             International Journal of Advanced Research in Computer Engineering & Technology
                                                                                                                 Volume 1, Issue 4, June 2012


of the profiles computed for the three sites, for three length    first created a clean set of packets free of any known worms
bins.                                                             still flowing on the Internet as background radiation. We then
                                                                  inserted the same set of worm traffic into the cleaned test set
                                                                  using tcpslice. Thus, we created ground truth in order to
                     249 bytes      940 bytes     1380 bytes      compute the accuracy and false positive rates.
                                                                            The worm set includes CodeRed, CodeRed II,
   MD(EX, W)          0.4841          0.6723         0.2533       WebDAV, and a worm that exploits the IIS Windows media
   MD(EX,W1)          0.3710          0.8120         0.4962       service, the nsiislog.dll buffer overflow vulnerability (MS03-
                                                                  022). These worm samples were collected from real traffic as
   MD(W,W1)           0.3689          0.5972         0.6116       they appeared in the wild, from both our own dataset and from
                                                                  a third-party. Because P-DPL only considers the packet
                                                                  payload, the worm set is inserted at random places in the test
Mimicry attacks are possible if the attacker has access to the data. The ROC plots in Figure 5 show the result of the detection
same information as the victim. In the case of application rate versus false positive rate over varying threshold settings of
payloads, attackers (including worms) would not know the the P-DPL sensor.
distribution of the normal flow to their intended victim. The
attacker would need to sniff each site for a long period of time
and analyze the traffic in the same fashion as the detector
described herein, and would also then need to figure out how to                                  All worms reliably detected
pad their poison payload to mimic the normal model. This is a
daunting task for the attacker who would have to be clever
indeed to guess the exact distribution as well as the threshold
logic to deliver attack data that would go unnoticed.
Additionally, any attempt to do this via probing, crawling or
other means is very likely to be detected.
          Besides mimicry attack, clever worm writers may
figure a way to launch 'training attacks’ against anomaly
detectors such as P-DPL. In this case, the worm may send a
stream of content with increasing diversity to its next victim
site in order to train the content sensor to produce models
where its exploit no longer would appear anomalous. This as
well is a daunting task for the worm. The worm would be
fortunate indeed to launch its training attack when the sensor is                           False Positive Rate(%)
in training mode and that a stream of diverse data would go Fig. 6 ROC of P-DPL detecting incoming worms, false positive
unnoticed while the sensor is in detection mode. Furthermore, rate restricted to less than 0.5%
the worm would have to be extremely lucky that each of the
content examples it sends to train the sensor would produce a               The detection rate and false positive are both based on
"non-error" response from the intended victim. Indeed, P-DPL the number of packets. The test set contains 40 worm packets
ignores content that does not produce a normal service although there are only 4 actual worms in our zoo. The plots
response. These two evasion techniques, mimicry and training show the results for each data set, where each graphed line is
attack, is part of our ongoing research on anomaly detection, the detection rate of the sensor where all 4 worms were
and a formal treatment of the range of "counter-evasion" detected. (This means more than half of each the worm’s
strategies we are developing is beyond the scope of this paper.   packets were detected as anomalous content.) From the plot we
                                                                  can see that although the three sites are quite different in
     D. Worm Detection Evaluation                                 payload distribution, P-DPL can successfully detect all the
                                                                  worms at a very low false positive rate. To provide a concrete
     In this section, we provide experimental evidence of the example we measured the average false alerts per hour for these
effectiveness of P-DPL to detect incoming worms. In our three sites. For 0.1% false positive rate, the EX dataset has 5.8
previous RAID paper [18], we showed P-DPL’s accuracy for alerts per hour, W1 has 6 alerts per hour and W has 8 alerts per
the DARPA99 dataset, which contains a lot of artifacts that hour.
make the data too regular [16]. Here we report how P-DPL                    We manually checked the packets that were deemed
performs over the three real-world datasets using known worms     false positives. Indeed, most of these are actually quite
available for our research. Since all three datasets were anomalous containing very odd abnormal payload. For
captured from real traffic, there is no ground truth, and example, in the EX dataset, there are weird file uploads, in one
measuring accuracy was not immediately possible. We thus case a whole packet containing nothing but a repetition of a
needed to create test sets with ground truth, and we applied character with byte value E7 as part of a word file. Other
Snort for this purpose.                                           packets included unusual HTTP Get requests, with the referrer
          Each dataset was split into two distinct chrono field padded with many ―Y‖ characters via product providing
logically-ordered portions, one for training and the other for anonymization.
testing, following the 80%-20% rule. For each test dataset, we              We note that some worms might fragment their
ISSN: 2278 – 1323
                                                               International Journal of Advanced Research in Computer Engineering & Technology
                                                                                                                   Volume 1, Issue 4, June 2012


content into a series of tiny packets to evade detection. For this   comparison is performed against the packet contents and a
problem, P-DPL buffers and concatenates very small packets of        string similarity score is computed. If the score is higher than
a session prior to testing.                                          some threshold, we treat this as possible worm propagation and
          We also tested the detection rate of the W32.Blaster       block or delay this outgoing traffic. This is different from the
worm (MS03-026) on TCP port 135 port using real RPC traffic          common quarantining or containment approaches which block
inside Columbia’s CS department. Despite being much more             all the traffic to or from some machine. P-DPL will only block
regular compared to HTTP traffic, the worm packets in each           traffic whose content is deemed very suspicious, while all other
case were easily detected with zero false positives. Although at     traffic may precede unabated maintaining critical services.
first blush, 5-8 alerts per hour may seem too high, a key                      There are many possible metrics which can apply to
contribution of this paper is a method to correlate multiple         decide the similarity of two strings. The several approaches we
alerts to extract from the stream of alerts true worm events.        have considered, tested and evaluated include:
                                                                          1. String equality (SE)
       IV.     Worm Propagation Detection and signature                        This is the most intuitive approach. We decide that
                     Generation by Correlation                                 propagation has started only if the egress payload is
                                                                               exactly the same as the ingress suspect packet. This
           In the previous section, we described the results using             metric is very strict and good at reducing false
P-DPL to detect anomalous packet content. We extended the                      positives, but too sensitive to any tiny change in the
detection strategy to model both inbound and outbound traffic                  packet payload. If the worm changes a single byte or
from a protected host, computing models of content flows for                   just changes its packet fragmentation, the anomalous
ingress and egress packets. The strategy thus implies that                     packet correlation will miss the propagation attempt.
within a protected LAN, some infected internal host will begin                 (The same is true when comparing thumbprints of
a propagation sending outbound anomalous packets. When this                    content.)
occurs for any host in the LAN, we wish to inoculate all other            2. Longest common substring (LCS)
hosts by generating and distributing worm packet signatures to                 The next metric we considered is the LCS approach.
other hosts for content filtering.                                             LCS is less exact than SE, but avoids the
           We leverage the fact that self-propagating worms will               fragmentation problem and other small payload
start attacking other machines automatically by replicating                    manipulations. The longer the LCS that is computed
itself, or at least the exploit portion of its content, shortly after a        between two packets, the greater the confidence that
host is infected. (Polymorphic worms may randomly pad their                    the suspect anomalous ingress/egress packets are more
content, but the exploit should remain intact.) Thus if we detect              similar. The main shortcoming of this approach is its
these anomalous egress packets to port i that are very similar to              computation overhead compared to string equality,
those anomalous ingress traffic to port i, there is a high                     although it can also be implemented in linear time.
probability that a worm that exploits the service at port i has           3. Longest common subsequence (LCSeq):
started its propagation. Note that these are the very first packets            This is similar to LCS, but the longest common
of the propagation, unlike the other approaches which have to                  subsequence need not be contiguous. LCSeq has the
wait until the host has already shown substantial amounts of                   advantage of being able to detect polymorphic worms,
unusual scanning and probing behavior. Thus, the worm may                      but it may introduce more false positives. For each
be stopped at its very first propagation attempt from the first                pair of strings that are compared, we compute a
victim even if the worm attempts to be slow and stealthy to                    similarity score, the higher the score, the more similar
avoid detection by probe detectors. We describe the                            the strings are to each other. For SE, the score is 0 or
ingress/egress correlation strategy in the following section. We               1, where 1 means equality. For both LCS and LCSeq,
note, however, that the same strategy can be applied to ingress                we use the percentage of the LCS or LCSeq length out
packets flowing from arbitrary (external) sources to internal                  of the total length of the candidate strings. Let’s say
target IP's. Hence, ingress/ingress anomalous packet correlation               string s1 has length L1, and string s2 has length L2,
may be viewed as a special case of this strategy.                              and their LCS/LCSeq has length C. We compute the
           Careful treatment of port-forwarding protocols and                  similarity score as 2*C/( L1+ L2). This normalizes the
services, such as P2P and NTP (Port 123) is required to apply                  score in the range of [0...1], where 1 means the strings
this correlation strategy, otherwise normal port forwarding may                are exactly equal.
be misinterpreted as worm propagations. Our work in this area
involves two strategies, truncation of packets (focusing on                      Since we may have to check each outgoing packet (to
control data) and modeling of the content of media. This work port i) against possibly many suspect strings inbound to port i,
is beyond the scope of this paper due to space limitations, and we need to concern ourselves with the computational costs and
will be addressed in a future paper.                                    storage required for such a strategy. On a real server machine,
                                                                        e.g., a web server, there are large numbers of incoming requests
     A. Ingress and Egress Traffic Correlation                          but very few, if any, outgoing requests to port 80 from the
                                                                        server (to other servers). So any outgoing request is already
           When P-DPL detects some incoming anomalous quite suspicious, and we should compare each of them against
traffic to port i, it generates an alert and places the packet the suspects. If the host machine is used as both a server and a
content on a buffer list of ―suspects‖. Any outbound traffic to client simultaneously, then both incoming and outgoing
port i that is deemed anomalous is compared to the buffer. The requests may occur frequently. This is mitigated somewhat by
ISSN: 2278 – 1323
                                                            International Journal of Advanced Research in Computer Engineering & Technology
                                                                                                                Volume 1, Issue 4, June 2012


the fact that we check only packets deemed anomalous, not                     Different fragmentation for CR and CRII
every possible packet flowing to and from a machine. We apply
the same modeling technique to the outgoing traffic and only
compare the egress traffic we already labeled as anomalous.
                                                                   Incoming                               Outgoing
    B. Automatic Worm Signature Generation                         1448, 1448, 1143                       4, 13, 362, 91, 1460,
                                                                                                          1460, 649
          There is another very important benefit that accrues
from the ingress/egress packet content correlation and string                                             4, 375, 1460, 1460, 740
similarity comparison: automatic worm signature generation.                                               4, 13, 453, 1460, 1460,
The computation of the similarity score produces the matching                                             649
sub-string or subsequence which represents the common part of
the ingress and egress malicious traffic. This common              Code Red II (total 3818 bytes)
subsequence serves as a signature content-filter. Ideally, a       Incoming                               Outgoing
worm signature should match worms and only worms. Since
the traffic being compared is already judged as anomalous, and     1448, 1448, 922                        1460, 1460, 898
has exhibited propagation behavior quite different from normal
behavior – and the similar malicious payload is being sent to                To evaluate the accuracy of worm propagation
the same service at other hosts, these common parts are very       detection, we appended the propagation trace at the very end of
possibly core exploit strings and hence can represent the worm     one full day’s network data from each of the three sites. When
signature. By using LCSeq, we may capture even polymorphic         we collected the trace from our attack network, we not only
worms since the core exploit usually remains the same within       captured the incoming port 80 requests, but also all the
each worm instance even though it may be reordered within the      outgoing traffic directed to port 80. We checked each dataset
packet datagram. Thus, by correlating the ingress and egress       manually, and found there is a small number of outgoing
malicious payload, we are able to detect the very initial worm     packets for the servers that produced the datasets W and W1, as
propagation, and compute its signature immediately. Further, if    we expected, and not a single one for the EX dataset. Hence,
we distribute these strings to collaborating sites, they too can   any egress packets to port 80 would be obviously anomalous
leverage the added benefit of corroborating suspects they may      without having to inspect their content. For this experiment, we
have detected, and they may choose to employ content filters,      captured all suspect incoming anomalous payloads in an
preventing them from being exploited by a new and zero-day         unlimited sized buffer for comparison across all of the available
worm.                                                              data in our test sets. We also purposely lowered P-DPL’s
                                                                   threshold setting (after calibration) in order to generate a very
                       V.     Evaluations                          high number of suspects in order to test the accuracy of the
                                                                   string comparison and packet correlation strategies. In other
          In this section, we evaluate the performance of          words, we increased the noise (increasing the number of false
ingress/egress correlation and the quality of the automatically    positives) in order to determine how well the correlation can
generated signatures. Since none of the machines were attacked     still separate out the important signal in the traffic (the actual
by worms during our data collection time at the three sites, we    worm content).
launched real worms to un-patched Windows 2000 machines in                   The result of this experiment is displayed in the
a controlled environment. For testing purposes, the packet         following table for the different similarity metrics. The number
traces of the worm propagation were merged into the three          in the parenthesis is the threshold used for the similarity score.
sites’ packet flows as if the worm infection actually happened     For an outgoing packet, P-DPL checks the suspect buffer and
at each site. Since P-DPL only uses payload, the source and        returns the highest similarity score. If the score is higher than
target IP addresses of the merged content are irrelevant.          the threshold, we judge there is a worm propagation. False
Without a complete collection of worms, and with limited           alerts suggest that an alert was mistakenly generated for a
capability to attack machines, we only tested CodeRed and          normal outgoing packet. The reason why SE does not work
CodeRed II out of the executable worms we collected. After         here is obvious: worm fragmentation blinds the method from
launching these in our test environment and capturing the          seeing the worm’s entire matching content. The other two
packet flow trace, we noticed interesting behavior: after          metrics worked perfectly, detecting all the worm propagations
infection, these two worms propagate with packets fragmented       with zero false alerts.
differently than the ones that initially infected the host. In
particular,   CodeRed can        separate ―GET.‖            and Results of correlation for different metrics
―/default.ida?‖
and ―NNN...N‖ into different packets to avoid detection by
many signature-based IDSes. The following table shows the                            Detect propagate        False alerts
length sequences of different packet fragmentation for
CodeRed and CodeRed II.                                            SE                        Yes                 No
                                                                   LCS(0.5)                  Yes                 No
                                                                       LCSeq(0.5)               Yes                     No
ISSN: 2278 – 1323
                                                                  International Journal of Advanced Research in Computer Engineering & Technology
                                                                                                                      Volume 1, Issue 4, June 2012


To use some other traffic to simulate the outgoing traffic of the      time, cross-site collaboration and exchange of suspect packet
servers. For EX data, we used the outgoing port 80 traffic of          payloads might provide a solution. We discuss this in the next
other clients in that enterprise as if it originated from the EX       section.
server itself. For the W1 and W datasets, we used the outgoing               VI.     Anomalous Payload Collaboration among Sites
port 80 traffic from the CS department. Then we repeated the                     Most current attack detection systems are constrained
previous experiments to detect The worm propagation with the           to a single ingress point within an enterprise without sharing
injected outgoing traffic on each server. The result remains the       any information with other sites. There are ongoing efforts that
same - using the same thresholds as before, we can successfully        share suspicious source IP address [5, 10], but to our
detect all the worm propagations without any false alerts.             knowledge no such effort exists to share content information
                                                                       across sites in real time until now. Here we focus on evaluating
          As we mentioned earlier, the worm signature is a             the detection accuracy of using collaboration among sites,
natural byproduct of the ingress/egress correlation. When we           assuming       a     scaleable,    privacy-preserving      secured
identified a possible worm propagation, the LCS or LCseq can           communication infrastructure is available. (We have
be used as the worm signature. Figure 6 displays the actual            implemented a prototype in Worminator [17].)
content signatures computed for the CR II propagations                           Recall that, in section3.4, we described experiments
detected by P-DPL in a style suitable for deployment in Snort.         measuring the diversity of the models computed at multiple
Note the signature contains some of the ystem calls used to            sites. As we saw, the different sites tested have different normal
infect a host, which is one of the reasons the false positive rate     payload models. This implies from a statistical perspective that
is so low for these detailed signatures.                               they should also have different false positive alerts. Any
                                                                       ―common or highly similar anomalous payloads‖
                                                                       detected among two or more sites logically would be
    |d0|$@|0 ff|5|d0|$@|0|h|d0| @|0|j|1|j|0|U|ff||d0| @| U|f           caused by a common worm exploit targeting many sites. Cross-
    5|d8|$@|0 e8 19 0 0 0 c3 ff|%`0@|0 ff|%d0@|0 ff|%`
                                                                       site or cross- domain sharing may thus reduce the false positive
    ff|%h0@|0 ff|%p0@|0 ff|%t0@|0 ff|%x0@|0 ff|%| ff|
                                                                       problem at each site, and may more accurately identify worm
    0@|fc fc fc fc fc fc fc fc fc fc fc fc fc fc LORER.EXE
                                                                       outbreaks in the earliest stages of an infection.
    fc fc fc fc fc 0 0 0 0 0 0 0 0 0 0 0 0 0 0 00 0 0 0|EXP
                                                                                 To test this idea, we used the traffic from the three
    |0 0 0 0 0 0 0 0|SOFTWAREMicrosoftWindows NT
                                                                       sites. There are two goals we seek to achieve in this
    CurrentVersionWinlogon|0 0 0 0 0|SFCDisable|0 0
                                                                       experiment. One is to test whether different sites can help
    9 9 9d ffff ff ff|SYSTEMCurrentControlSetService
                                                                       confirm with each other that a worm is spreading and attacking
    sW3SVCParametersVirtual Roots|0 0 0 00 0 0|/Scr
                                                                       the Internet. The other is to test whether false alerts can be
    ipts|0 0 0 0|/MSADC| 0 00 0|/C|0 0 0|/D|0 00|c:,,21
                                                                       reduced, or even eliminated at each site when content alerts are
    7|0 0 0 0 00 0|d:,,217|fc fc f cfc f cfc fc fc fc fc fc fc
                                                                       correlated.
                                                                                 In this experiment, we used the following simple
Fig. 7.The initial portion of the P-DPL generated signature for        correlation rule: if two alerts from distinct sites are similar, the
CodeRed II.                                                            two alerts are considered true worm attacks; otherwise they are
                                                                       ignored. Each site’s content alerts act as confirmatory evidence
          We replicated the above experiments in order to test if      of a new worm outbreak, even after two such initial alerts are
any normal packet is blocked when we filter the real traffic           generated. This is very strict, aiming for the optimal solution to
against all the worm signatures generated. For our experiments         the worm problem.
we used the datasets from all the three sites, which have had the                This is a key observation. The optimal result we seek
CRII attacks cleaned beforehand, and in all cases no normal            is that for any payload alerts generated from the same worm
packet was blocked.In these experiments, we used an unlimited          launched at two ore more sites, those payloads should be
buffer for the incoming suspect payloads. The buffer size              similar to each other, but not for normal data from either site
essentially stores packets for some period of time that is             that was a false positive. That is to say, if a site generates a
dependent upon the traffic rate, and the number of anomalous           false positive alert about normal traffic it has seen, it will not
packet alerts that are generated from that traffic. That amount is     produce suspect payloads that any other site will deem to be
indeterminate a priori, and is specific to both the environment        worm propagation. Since we conjectured that each site’s
being sniffed and the quality of the models computed by P-DPL          content models are diverse and highly distinct, even the false
for that environment. Since CR and CR II launch their                  positives each site may generate will not match the false
propagations immediately after infecting their victim hosts, a         positives of other sites; only worms (i.e., true positives) will be
buffer holding only the most recent 5 or 10 suspects is enough         commonly matched as anomalous data among multiple sites.
to detect their propagation. But for slow-propagating or stealthy                To make the experiment more convincing, we no
worms which might start propagating after an arbitrarily long          longer test the same worm traffic against each site as in the
hibernation period, the question is how many suspects should           previous section, since the sensor will obviously generate the
we save in the suspect buffer? If the ingress anomalous                exact same payload alert at all the sites. Instead, we use
payloads have been removed from the suspect buffer before              multiple variants of CodeRed and CodeRed II, which were
such a worm starts propagating, P-DPL can no longer detect it          extracted from real traffic. To make the evaluation strict, we
by correlation. Theoretically, the larger the buffer the better,       tested different packet payloads for the same worm, and all the
but there is tradeoff in memory usage and computation time.            variant packet fragments it generates. We purposely lowered
But for those worms that may hibernate for a long period of            the P-DPL threshold to generate many more false positives
ISSN: 2278 – 1323
                                                              International Journal of Advanced Research in Computer Engineering & Technology
                                                                                                                  Volume 1, Issue 4, June 2012


from each site than it otherwise would produce. As in the case       computed for false P-DPL alerts. The x-axis shows the
described above the cross-site correlation uses the same metrics     similarity score, defined within the range [0...1], and the y-axis
(SE, LCS and LCSeq) to judge whether two payload alerts are          is the number of pairs of alerts within the same score range.
―similar‖. However, another problem that we need to consider         The similarity scores for the worm alerts are shown separately
when we exchange information between sites is privacy. It may        as dots on the x-axis. The worm alerts include those for CR and
be the case that a site is unwilling to allow packet content to be   CR II and their variant fragments. Note that all of the scores
revealed to some external collaborating site. a false positive       calculated between worm alerts are much higher than those of
may reveal true content.                                             the ―false‖ P-DPL alerts and thus they would be correctly
          A packet payload could be presented by its 1-gram          detected as true worms among collaborating sites. The alerts
frequency distribution (see Figure 2). This representation           that scored too low would not have sufficient corroboration to
already aggregates the actual content byte values in a form          deem them as true worms.
making it nearly impossible (but not totally impossible) to
reconstruct the actual payload. (Since byte value distributions
do not contain sequential information, the actual content is hard
                                                                                                                 CRII against CRII
to recover. 2-gram distributions simplify the problem making it                                                  CR against CR
more likely to recover the content since adjacent byte values                                                    Other Alerts
are represented. 3-grams nearly make the problem trivial to
recover the actual content in many cases.) However, we note
that the 1-gram frequency distribution reordered into the rank
ordered frequency distribution produces a distribution that
appears quite similar to the exponential decreasing Zipf-like
distribution. The rank ordering of the resultant distinct byte
values is a string that we call the ―Z-string‖ (as discussed in
Section 3.1).
          One cannot recover the actual content from the Z-
String. Rather, only an aggregated representation of the byte
value frequencies is revealed, without the actual frequency
information. This representation may convey sufficient
                                                                                         Similarity Scores of Zstr Metric
information to correlate suspect payloads, without revealing the
actual payload itself. Hence, false positive content alerts would
not reveal true content, and privacy policies would be
                                                                                                                 CRII against CRII
maintained among sites.                                                                                          CR against CR
          In this cross-domain correlation experiment we                                                         CRII against CR
propose two more metrics which don’t require exchanging raw                                                      Other Alerts
payloads, but instead only the 1-gram distributions, and the
privacy-preserving Z-string representation of the payload:

    A. Manhattan distance (MD)
       Manhattan distance requires exchange of the byte
       distribution of the packet, which has 256 float
       numbers. Two payloads are similar if they have a
       small Manhattan distance. The maximum possible MD
       is 2. So we define the similarity score as (MD)/2, to
       normalize the score range to the same range of the
       other metrics described above.

    B. LCS of Z-string (Zstr)
       While maintaining maximal privacy preservation, we                       Similarity Scores of LCSeq Metric
       perform the LCS on the Z-string of two alerts. The
       similarity score is the same as the one for LCS, but Fig. 8. Similarity scores of Zstr and LCSeq metrics for
       here the score evaluates the similarity of two Z-strings, collaboration
       not the raw payload strings.

Figure 7 presents the results achieved by sharing P-DPL alerts       The above two plots show the similarity scores using Zstr and
among the three sites using CR and CR II and their variant           LCSeq metrics. LCS produced a similar result to LCSeq. String
packet fragments. The results are shown in terms of the              equality and Manhattan distance metrics did not perform well
similarity scores computed by each of the metrics. Each plot is      in distinguishing true alerts from false ones, so their plots are
composed of two different representations: one for false alerts      not shown here. The other two metrics presented in Figure 5
(histogram) and the other for worm alerts (dots on the x-axis).      give particularly good results. The worms and their variant
The bars in the plots are histograms for the similarity scores       packet fragments have much higher similarity scores than all
ISSN: 2278 – 1323
                                                              International Journal of Advanced Research in Computer Engineering & Technology
                                                                                                                  Volume 1, Issue 4, June 2012


the other alerts generated at each distinct site. This provides      only consider one single host, a stealthy worm can hibernate for
some evidence that this approach may work very well in               a long period of time until a record of its appearance as an
practice and provide reliable information that a new zero-day        anomaly is no longer stored in the buffer of suspect packets.
attack is ongoing at different sites.                                However, in the context of collaborating sites, the suspect
          Note too that each site can contribute to false positive   anomaly can be corroborated by some other site that may also
reduction since the scores of the suspects are relatively low in     have a record of it in their buffer, as a remote site may have a
comparison to the true worms.                                        larger buffer or may have received the worm at a different time.
          Furthermore, the Zstr metric shows the best separation     The distributed sites essentially serve as a remote long-term
here, and with the added advantage of preserving the privacy of      store of information, extending the local buffer memory
the exchanged content. These two metrics can also be applied         available at one site. Further, this strategy concurrently
to the ingress/egress traffic correlation, especially for            generates content filtering signatures. Any two sites that
polymorphic worms that might re-order their content.                 correlate and validate suspects as being true worms both have
          There are two interesting observations from this data.     available the actual packet content from which to generate a
The circle in the LCSeq plot represents the similarity score         signature, even if only Z-strings are exchanged between those
when exchanging the alerts among the sites that P-DPL                sites.
generated for CR and CR II. LCSeq is the only metric that gave                          VII.     Evaluation Results
a relatively higher score that is worth noticing, while all the
others provide less compelling scores. When we looked back at                 As the first step in evaluating P-DPL, we compared
the tcpdump of CR and CR II, both of them contained the              the two functions’ extraction methods (i.e., SM and IDA) and
string:                                                              the common-function-filtering capability of the CFLs that were
          ―GET./default.ida?........u9090%u6858%ucbd3%u780           generated using the two methods. Fig. 8 shows the percentage
1%u9090%u6858%ucbd3%‖while CR has a string of repeated               of candidate and common function corpora among all functions
―N‖, and CR II has string of repeated ―X‖ padding                    extracted from the malware set by the SM and IDA-Pro
their content. Since subsequences do not need to be adjacent in      extraction methods for each CFL size (500, 1000, 2000, 4000,
the LCSseq metric, LCSeq ignored the repetitions of                  and 8000 files). It is evident from the diagram that SM is
the unmatched ―N‖ and X substrings and successfully picked           capable of extracting more functions from the malware files:
out the other common substrings. LCS also had a higher-than-         IDA extracted 57 929 functions, while SM extracted 249 158.
average score here, but not as good as LCSeq. This example           Function size was limited to a minimum of 16 B and a
suggests that polymorphic worms attempting to mask                   maximum of 256 B.6 The figure also shows that SM is capable
themselves by changing their padding may be detectable by            of trimming a larger portion of functions, which appear in the
cross-site collaboration under the LCSeq metric.                     CFL and that the portion of remaining functions becomes
          Another observation is that the LCSeq and LCS results      smaller along with the increase in the size of the CFL.
display several packet content alerts with high similarity scores.            The first observation, regarding SM’s extraction
These were false alerts generated by the correlation among the       superiority, is consistent with IDA-Pro detecting fewer
sites. The scores were measured at about 0.4 to 0.5. Although        functions from the training set. This is probably due to the fact
they are still much smaller than the worm scores, they are           that IDA-Pro is more rigid software and cannot deal effectively
already outliers since they exceeded the score threshold used in     with code obfuscation, which is a prominent technique,
this experiment. We inspected the content of these packets, and      employed by hackers [30]. However, the SM method, by
discovered that they included long padded strings attempting to      nature, works on high recall (extracting as many functions as
hide the HTTP headers. Some proxies try to hide the query            possible), and low precision (many of the extracted function
identity by replacing some headers with meaningless characters       might not be really functions); thus it extracts more functions.
– in our case, consisting of a string of ―Y‖s. Such                  The second observation, the filtering capability of the two
payloads                                                             methods, can be explained straightforwardly by the fact that as
were correlated as true alerts while using LCSeq/LCS as              the size of the CFL grows, the likelihood increases that a
metrics, although they are not worms. However, these                 function extracted from the malware set will appear in the CFL.
anomalies did not appear when we used the Zstr metric, since                  For some malware files, it might be the case that all of
the long string of ―Y’s‖ used in padding the HTTP header             the extracted functions were identified as common functions,
only influences one position in the Z-string, but has no impact      and therefore, were filtered out by the CFL. In such cases, the
on the remainder of the Z- string.                                   method cannot generate a signature for the malware. Fig. 9
These results suggest that cross-sites collaboration can greatly     depicts the percentage of malware that was left without
help identify the early appearance of new zero-day worms             candidates. The figure shows that IDA missed more malware.
while reducing the false positive rates of the constituent P-DPL     The reason for that is that 1) it extracts fewer functions and 2)
anomaly detectors. The similarity score between worms and            IDA only detects functions that are being called from other
their variants are much higher than those between ―true‖ false       functions using standard protocols. This may not be the case in
positives (normal data incorrectly deemed anomalies), and can        malware that wishes to camouflage its existence. As expected,
be readily separated with high accuracy.                             for both methods, increasing the CFL also increases the missed
          When several sites on the Internet detect similar          malware, but also the gap between IDA and SM narrows. Figs
anomalous payloads directed at them, they can confirm and            10–13 depict the detection rate of candidate signatures and
validate with each other with high confidence that an attack is      signatures selected for the malware set in the control file set.
underway. As we mentioned earlier, this strategy can also solve      This rate serves as a measure of the false-positive detection rate
the limited buffer size problem described in Section 4.3. If we
ISSN: 2278 – 1323
                                                              International Journal of Advanced Research in Computer Engineering & Technology
                                                                                                                  Volume 1, Issue 4, June 2012


of malware in benign files.We checked the false-positive rate of
signature candidates in the control set files. Fig. 10 depicts the
percentage of signature candidate detected in the control set as
a function of the candidate’s length in bytes and CFL size. We
expected that the false-positive rate would drop for longer
signature candidates and for larger CFL. The length of a
signature candidate affects the probability of finding the same
byte sequence in an arbitrary file. Indeed, regardless of function
and signature extraction techniques, (SM/IDA), short
candidates caused most of the hits. Consequently, based on the
diagram, we recommend using function candidates if their
length is above 112 B in order to ensure a lower false-positive
rate. An exception is shown with the SM with CFL size 500
MB and SM with CFL size 1000 MB, where we see a high
false-positive rate for candidates that are 160–176 B. Using Fig. 9.Percentage/number of common functions versus
CFL of 2000 MB and more eliminates the problematic candidate functions extracted from malware set for IDA and
candidates. Additionally, we can see that using a larger CFL SM for several CFL dataset sizes.
contributes considerably to lowering the false-positive rate in
both function extraction methods.
         In Fig. 11, we compare the false-positive rate of the
two function extraction methods: SM and IDA-Pro for different
CFL sizes. With both extraction methods, the false-positive rate
is reduced when a large CFL is used.
         Compared to IDA, the SM method achieves a lower
false-positive rate when using CFL with a size greater than
1000 files.

          Next, we compare the mean false-positive rate
(averaged over both SM and IDA function extraction methods)
when using candidates with/without a 16-B offset (see Fig. 12)
and when randomly choosing a signature or by using the
entropy score see Fig. 13). As expected, adding a 16-B offset to
the candidate functions and using the entropy score to choose a
signature helps in reducing the percentage of signatures
detected in the control set files. This was consistent for all
tested CFL sizes. The entropy selection method favors large        Fig 10. Malware without signature candidates—the percentage
(signatures; approximately 80% of the signatures that the          of malware that were left without candidates (i.e., all extracted
entropy-based method selected were larger than 112 most            functions were filtered by the functions in the CFL), and thus,
significant improvement when the CFL size is increased. This       the method cannot extract signatures.
is shown by the detection rate declining from 2.7% to 0% for a
CFL greater than 2000 MB. Note that detection rate in this                 The worse signature-generation method is IDA when
context B. Selecting a candidate randomly shows that only 50%      not using an offset of 16 B. However, this method, when
are 112 B and more. This observation complies with the results     combined with CFL containing 8000
presented in Fig. 9 regarding the recommended size of
candidates.
          Since the entropy method evidently showed better
results than Rand, we continued to investigate the signature-
generating methods using only the entropy method.

          The goal of the next and final experiment was to show
how IDA and SM methods are affected by using offset along
with the signature candidate for different CFL sizes. It also
provides the most significant improvement when the CFL size
is increased. This is shown by the detection rate declining from
2.7% to 0% for a CFL greater than 2000 MB. Note that
detection rate in this context relates to the false positives or,
undesirable detection on a malware signature in benign files;
thus, a lower detection rate is better.                           Fig. 11. False-positive rate of candidates in the control set files
                                                                  as a function of the candidate size in bytes.
ISSN: 2278 – 1323
                                                           International Journal of Advanced Research in Computer Engineering & Technology
                                                                                                               Volume 1, Issue 4, June 2012


files, still manages to have a low false positive rate (FPR) of
                                                                Fig. 13.Comparing the mean false-positive rate (averaged over
0.4%. Finally, we tested the signatures generated by P-DPL for
                                                                SM and IDA function extraction methods) with/without adding
false negatives using a DefensePro intrusion detection
                                                                offset bytes to function candidates.
appliance. False negatives in the context of P-DPL mean that a
signature generated for a malware file was not identified in an




                                                                Fig. 14. Comparing the mean false-positive rate (averaged over
                                                                SM and IDA function extraction methods) when using random
                                                                (Rand) signature selection and entropy-based selection. The
Fig. 12. Comparing the false-positive rate for the two function entropy-based heuristic performs better than when using the
extraction methods: SM and IDA-Pro (IDA) as a function of the random selection method, for all CFL sizes.
CFL size.
                                                                       Entropy-based selection. The entropy-based heuristic
instance of the same malware (e.g., as a result of a long with performs better than when using the random selection method,
signature split over multiple packets). Therefore, false for all CFL sizes.Different fragmentation from given graft chart
negatives depend on the detection engine. The malware over SM and IDA function extraction methods.
detection capability of DefensePro is based on IP packet
inspection and does not reconstruct the files. We uploaded the
signatures extracted by P-DPL to the DefensePro signature
database and configured the device to reset any session for
which a packet was identified with a malware signature. We
transmitted all malware (for which P-DPL successfully
generated a signature) via the DefensePro (at the maximal
speed that we could load the link) and executed several tests in
which DefensePro successfully removed all malware.
         Additionally, we measured the time required by P-
DPL to generate a signature. The extraction time of a signature
as a function of the file size range from 100 - 900. A linear
increase in signature extraction time as a function of the file
size.




                                                                                        VIII.     Conclusion

                                                                          In this paper we propose a new automatic mechanism,
                                                                as P-DPL for extracting signatures from malware files.
                                                                Signatures generated by P-DPL are comprised of multiple byte-
                                                                strings, which are used by high speed network and malware
                                                                filtering devices. To minimize the risk of false positives P-DPL
                                                                employs a method for creating extracting files from originate
                                                                from the underlying standard development platforms and
                                                                malicious programs developed by these platforms.
ISSN: 2278 – 1323
                                                              International Journal of Advanced Research in Computer Engineering & Technology
                                                                                                                  Volume 1, Issue 4, June 2012


          We tested our method in a network-security laboratory          based network intrusion detection systems,‖ IEEE Trans.
on various configurations in terms IDA-Pro, SM the IDA-Pro,              Syst. Man, Cybern.—Part C, vol. 38, no. 5, pp. 649–659,
SM are fast technique for extracting functions from assembly             Sep. 2008.
files. However SM for new compilers done by manually this
makes high prone to errors. In order to overcome this                [8] A. Shabtai, D. Potashnik, Y. Fledel, R. Moskovitch, and Y.
limitation, we are developed P-DPL and support viability of the          Elovici, ―Monitoring, analysis and filtering system for
general approach proposed by this research which suggested               purifying network traffic of known          and unknown
that general code identified as the functions in the program can         malicious content,‖ Secur. Commun. Netw. [Online].
be discarded. Realizing P-DPL in generating signatures for high          DOI:
speed network appliances for system methodology for building             10.1002/sec.229.
common repositories. The global variety of development
platforms are facilitated by the Internet, ensuring the external     [9] Y. Tang, B. Xiao, and X. Lu, ―Using a bioinformatics
validity of this study relies substantially on reaching a critical       approach to generate accurate exploit-based signatures for
mass of malware files. P-DPL is also helpful for creating and            polymorphic worms,‖ Comput.Secur., vol. 28, pp.
indentify allergy attacks against any signature is automatically         827–
created this type of attack is mainly relevant based and                 842, 2009.
realistic that use machine-learning algorithms In learning-base
algorithm can make the automated signature-generation method         [10] Jacob, G., Debar, H., Filiol, E.: Behavioral detection of
to consider malicious data.                                              malware: from a survey towards an established taxonomy.
          In order to cope with these fully obfuscated malware           Journal in Computer Virology 4(3) (2008)
on the packet easily on the high-speed deep packet inspection
devices. We believe that P-DPL should be implemented for             [11] Singh, S., Estan, C., Varghese, G., Savage, S.: Automated
filtering most of the malware by using high-speed malware                worm fingerprinting. In: OSDI’04: Proceedings of the 6th
filtering devices. We plan for detecting and extracting the              conference on Symposium on Operating Systems Design
binary code as well as selecting the best signature out of the           & Implementation, Berkeley, CA, USA, USENIX
collection of candidates using probability and variance method.          Association (2004) 4–4
In regular expressions defined by two or more distinct
signatures can be used in order to minimize the risk of              [12] Newsome, J., Song, D.: Dynamic taint analysis for
malwares.                                                                automatic detection, analysis, and signature generation of
                                                                         exploits on commodity software. In: Proceedings of the
                          References                                     Network and Distributed System Security Symposium
                                                                         (NDSS 2005). (2005)
[1] S. B. Cho, ―Incorporating soft computing techniques into
    a probabilistic intrusion detection system,‖ IEEE Trans. [13] Costa, M., Crowcroft, J., Castro, M., Rowstron, A., Zhou,
    Syst., Man, Cybern.—Part C, vol. 32, no. 2, pp. 154–160,       L., Zhang, L., Barham, P.: Vigilante: end-to-end
    May 2002.                                                      containment of internet worms. In: SOSP ’05: Proceedings
                                                                   of the twentieth ACM symposium on Operating systems
[2]                                        ssel, and P. Laskov,    principles, New York, NY, USA, ACM (2005) 133–147
    ―Learning and u classification of malware behavior,‖
    in Proc. Conf. Detect. Intrusions Malware Vulnerability [14 ] Xu, J., Ning, P., Kil, C., Zhai, Y., Bookholt, C.: Automatic
    Assessment, Springer Press, 2008, pp. 108–125.                 diagnosis and response to memory corruption
                                                                   vulnerabilities. In: CCS ’05: Proceedings of the 12th ACM
[3] M. Bailey, J. Oberheide, J. Andersen, Z. M. Mao, F.            conference on Computer and communications security,
    Jahanian, and J. Nazario, ―Automated classification and        New York, NY, USA, ACM (2005) 223–234
    analysis of internet malware,‖ in Proc. 12th Int. Symp.
    Recent Adv. Intrusion Detect., Springer Press, 2007, pp. [15] M. Damashek. Gauging similarity with n-grams: language
    178–197.                                                       independent categorization of text. Science, 267(5199)
                                                                   :843--848, 1995
[4] K. Griffin, S. Schneider, X. Hu, and T. Chiueh, ―Automatic
    generation of string signatures for malware detection,‖ in [16] R. Lippmann, et al. The 1999 DARPA Off-Line Intrusion
    Proc. 12th Int. Symp. Recent Adv. Intrusion Detect.,           Detection Evaluation, Computer Networks 34(4) 579-595,
    Springer Press, 2009, pp. 101–120.                             2000.

[5] G. Jacob, H. Debar, and E. Filiol, ―Behavioral detection of [17] M. Locasto, J. Parekh, S. Stolfo, A. Keromytis, T. Malkin
     malware: From         a survey towards an established           and V. Misra. Collaborative Distributed Intrusion
     taxonomy,‖ J. Comput. Virol. vol. 4, pp. 251–266, 2008.         Detection, Columbia University Tech Report CUCS-012-
                                                                     04, 2004.
[6] D. Gryaznov, ―Scanners of the year 2000: Heuristics,‖ in
     Proc. 5th Int. Virus Bull., 1999, pp. 225–234.             [18] K. Wang and S. Stolfo. Anomalous payload-based network
 [7] J. Zhang, M. Zulkernine, and A. Haque, ―Random-forests-         intrusion detection, in Proceedings of Recent Advance in
                                                                     Intrusion Detection (RAID), Sept. 2004.
ISSN: 2278 – 1323
                                                          International Journal of Advanced Research in Computer Engineering & Technology
                                                                                                              Volume 1, Issue 4, June 2012


AUTHOR DETAILS
                       1
                         N.Kannaiya Raja received MCA degree from Alagappa
                      University and ME degree in Computer Science and
                      Engineering from Anna University Chennai in 2007 and
                      he is pursing PhD degree in Manonmaniam Sundranar
                      University from 2008 and joined assistant professor
                      in various engineering collages in Tamil Nadu affiliated
                      to Anna University and has eight years teaching
                      experience his research work in deep packet inspection.
                      He has been
session chair in major conference and workshops in computer vision on
algorithm, network, mobile communication, image processing papers and
pattern reorganization. His current primary areas of research are
packet inspection and network. He is interested to conduct guest lecturer in
various engineering in Tamil Nadu.


                    2
                     Dr.K.Arulanandam received Ph.D. doctorate degree in
                    2010 from Vinayaka Missions University. He has twelve
                    years teaching experience in various engineering colleges
                    in Tamil Nadu which are affiliated to Anna University and
                    his     research   experience         network,     mobile
                    communication networks, image processing papers and
                    algorithm papers. Currently            working          in
                    Ganadipathy       Tulasi’s      Jain Engineering College
                    Vellore.


                   3
                    M.Balaji received B.Tech degree in      Information
                   Technology from Anna University Chennai in 2008
                   and now pursuing ME degree in Computer Science
                   and Engineering in Arulmigu Meenakshi Amman
                   College of Engineering affiliated to Anna University
                   Chennai.

Weitere ähnliche Inhalte

Was ist angesagt?

Malware Analysis and Prediction System
Malware Analysis and Prediction SystemMalware Analysis and Prediction System
Malware Analysis and Prediction SystemAzri Hafiz
 
Optimal remote access trojans detection based on network behavior
Optimal remote access trojans detection based on network behaviorOptimal remote access trojans detection based on network behavior
Optimal remote access trojans detection based on network behaviorIJECEIAES
 
Integrated Feature Extraction Approach Towards Detection of Polymorphic Malwa...
Integrated Feature Extraction Approach Towards Detection of Polymorphic Malwa...Integrated Feature Extraction Approach Towards Detection of Polymorphic Malwa...
Integrated Feature Extraction Approach Towards Detection of Polymorphic Malwa...CSCJournals
 
Formal Modeling and Verification of Trusted OLSR Protocol Using I-SPIN Model...
Formal Modeling and Verification of Trusted OLSR Protocol  Using I-SPIN Model...Formal Modeling and Verification of Trusted OLSR Protocol  Using I-SPIN Model...
Formal Modeling and Verification of Trusted OLSR Protocol Using I-SPIN Model...IOSR Journals
 
A STATIC MALWARE DETECTION SYSTEM USING DATA MINING METHODS
A STATIC MALWARE DETECTION SYSTEM USING DATA MINING METHODSA STATIC MALWARE DETECTION SYSTEM USING DATA MINING METHODS
A STATIC MALWARE DETECTION SYSTEM USING DATA MINING METHODSijaia
 
Anomaly detection final
Anomaly detection finalAnomaly detection final
Anomaly detection finalAkshay Bansal
 
Intrusion Alert Correlation
Intrusion Alert CorrelationIntrusion Alert Correlation
Intrusion Alert Correlationamiable_indian
 
A malware detection method for health sensor data based on machine learning
A malware detection method for health sensor data based on machine learningA malware detection method for health sensor data based on machine learning
A malware detection method for health sensor data based on machine learningjaigera
 
Optimised malware detection in digital forensics
Optimised malware detection in digital forensicsOptimised malware detection in digital forensics
Optimised malware detection in digital forensicsIJNSA Journal
 
A Comparison Study of Open Source Penetration Testing Tools
A Comparison Study of Open Source Penetration Testing ToolsA Comparison Study of Open Source Penetration Testing Tools
A Comparison Study of Open Source Penetration Testing Toolsijtsrd
 
Metamorphic Malware Analysis and Detection
Metamorphic Malware Analysis and DetectionMetamorphic Malware Analysis and Detection
Metamorphic Malware Analysis and DetectionGrijesh Chauhan
 
When Cyber Security Meets Machine Learning
When Cyber Security Meets Machine LearningWhen Cyber Security Meets Machine Learning
When Cyber Security Meets Machine LearningLior Rokach
 
A NOVEL HEADER MATCHING ALGORITHM FOR INTRUSION DETECTION SYSTEMS
A NOVEL HEADER MATCHING ALGORITHM FOR INTRUSION DETECTION SYSTEMSA NOVEL HEADER MATCHING ALGORITHM FOR INTRUSION DETECTION SYSTEMS
A NOVEL HEADER MATCHING ALGORITHM FOR INTRUSION DETECTION SYSTEMSIJNSA Journal
 
A SURVEY ON MALWARE DETECTION AND ANALYSIS TOOLS
A SURVEY ON MALWARE DETECTION AND ANALYSIS TOOLSA SURVEY ON MALWARE DETECTION AND ANALYSIS TOOLS
A SURVEY ON MALWARE DETECTION AND ANALYSIS TOOLSIJNSA Journal
 
IEEE BE-BTECH NS2 PROJECT@ DREAMWEB TECHNO SOLUTION
IEEE BE-BTECH NS2 PROJECT@ DREAMWEB TECHNO SOLUTIONIEEE BE-BTECH NS2 PROJECT@ DREAMWEB TECHNO SOLUTION
IEEE BE-BTECH NS2 PROJECT@ DREAMWEB TECHNO SOLUTIONranjith kumar
 
Workshop on Setting up Malware Lab
Workshop on Setting up Malware LabWorkshop on Setting up Malware Lab
Workshop on Setting up Malware LabCharles Lim
 
1.[1 9]a genetic algorithm based elucidation for improving intrusion detectio...
1.[1 9]a genetic algorithm based elucidation for improving intrusion detectio...1.[1 9]a genetic algorithm based elucidation for improving intrusion detectio...
1.[1 9]a genetic algorithm based elucidation for improving intrusion detectio...Alexander Decker
 

Was ist angesagt? (20)

Malware Analysis and Prediction System
Malware Analysis and Prediction SystemMalware Analysis and Prediction System
Malware Analysis and Prediction System
 
Optimal remote access trojans detection based on network behavior
Optimal remote access trojans detection based on network behaviorOptimal remote access trojans detection based on network behavior
Optimal remote access trojans detection based on network behavior
 
Integrated Feature Extraction Approach Towards Detection of Polymorphic Malwa...
Integrated Feature Extraction Approach Towards Detection of Polymorphic Malwa...Integrated Feature Extraction Approach Towards Detection of Polymorphic Malwa...
Integrated Feature Extraction Approach Towards Detection of Polymorphic Malwa...
 
Formal Modeling and Verification of Trusted OLSR Protocol Using I-SPIN Model...
Formal Modeling and Verification of Trusted OLSR Protocol  Using I-SPIN Model...Formal Modeling and Verification of Trusted OLSR Protocol  Using I-SPIN Model...
Formal Modeling and Verification of Trusted OLSR Protocol Using I-SPIN Model...
 
A STATIC MALWARE DETECTION SYSTEM USING DATA MINING METHODS
A STATIC MALWARE DETECTION SYSTEM USING DATA MINING METHODSA STATIC MALWARE DETECTION SYSTEM USING DATA MINING METHODS
A STATIC MALWARE DETECTION SYSTEM USING DATA MINING METHODS
 
Anomaly detection final
Anomaly detection finalAnomaly detection final
Anomaly detection final
 
Intrusion Alert Correlation
Intrusion Alert CorrelationIntrusion Alert Correlation
Intrusion Alert Correlation
 
A malware detection method for health sensor data based on machine learning
A malware detection method for health sensor data based on machine learningA malware detection method for health sensor data based on machine learning
A malware detection method for health sensor data based on machine learning
 
Malware1
Malware1Malware1
Malware1
 
Antimalware
AntimalwareAntimalware
Antimalware
 
Optimised malware detection in digital forensics
Optimised malware detection in digital forensicsOptimised malware detection in digital forensics
Optimised malware detection in digital forensics
 
A Comparison Study of Open Source Penetration Testing Tools
A Comparison Study of Open Source Penetration Testing ToolsA Comparison Study of Open Source Penetration Testing Tools
A Comparison Study of Open Source Penetration Testing Tools
 
Metamorphic Malware Analysis and Detection
Metamorphic Malware Analysis and DetectionMetamorphic Malware Analysis and Detection
Metamorphic Malware Analysis and Detection
 
When Cyber Security Meets Machine Learning
When Cyber Security Meets Machine LearningWhen Cyber Security Meets Machine Learning
When Cyber Security Meets Machine Learning
 
A NOVEL HEADER MATCHING ALGORITHM FOR INTRUSION DETECTION SYSTEMS
A NOVEL HEADER MATCHING ALGORITHM FOR INTRUSION DETECTION SYSTEMSA NOVEL HEADER MATCHING ALGORITHM FOR INTRUSION DETECTION SYSTEMS
A NOVEL HEADER MATCHING ALGORITHM FOR INTRUSION DETECTION SYSTEMS
 
Kg2417521755
Kg2417521755Kg2417521755
Kg2417521755
 
A SURVEY ON MALWARE DETECTION AND ANALYSIS TOOLS
A SURVEY ON MALWARE DETECTION AND ANALYSIS TOOLSA SURVEY ON MALWARE DETECTION AND ANALYSIS TOOLS
A SURVEY ON MALWARE DETECTION AND ANALYSIS TOOLS
 
IEEE BE-BTECH NS2 PROJECT@ DREAMWEB TECHNO SOLUTION
IEEE BE-BTECH NS2 PROJECT@ DREAMWEB TECHNO SOLUTIONIEEE BE-BTECH NS2 PROJECT@ DREAMWEB TECHNO SOLUTION
IEEE BE-BTECH NS2 PROJECT@ DREAMWEB TECHNO SOLUTION
 
Workshop on Setting up Malware Lab
Workshop on Setting up Malware LabWorkshop on Setting up Malware Lab
Workshop on Setting up Malware Lab
 
1.[1 9]a genetic algorithm based elucidation for improving intrusion detectio...
1.[1 9]a genetic algorithm based elucidation for improving intrusion detectio...1.[1 9]a genetic algorithm based elucidation for improving intrusion detectio...
1.[1 9]a genetic algorithm based elucidation for improving intrusion detectio...
 

Andere mochten auch

Andere mochten auch (7)

61 66
61 6661 66
61 66
 
525 529
525 529525 529
525 529
 
536 541
536 541536 541
536 541
 
Jucecha
JucechaJucecha
Jucecha
 
Green Hectares Rural Tech Workshop – Learning Skype
Green Hectares Rural Tech Workshop – Learning SkypeGreen Hectares Rural Tech Workshop – Learning Skype
Green Hectares Rural Tech Workshop – Learning Skype
 
Sacos Plasticos
Sacos PlasticosSacos Plasticos
Sacos Plasticos
 
95 101
95 10195 101
95 101
 

Ähnlich wie 714 728

Malware analysis and detection using reverse Engineering, Available at: www....
Malware analysis and detection using reverse Engineering,  Available at: www....Malware analysis and detection using reverse Engineering,  Available at: www....
Malware analysis and detection using reverse Engineering, Available at: www....Research Publish Journals (Publisher)
 
Pre-filters in-transit malware packets detection in the network
Pre-filters in-transit malware packets detection in the networkPre-filters in-transit malware packets detection in the network
Pre-filters in-transit malware packets detection in the networkTELKOMNIKA JOURNAL
 
IRJET - Netreconner: An Innovative Method to Intrusion Detection using Regula...
IRJET - Netreconner: An Innovative Method to Intrusion Detection using Regula...IRJET - Netreconner: An Innovative Method to Intrusion Detection using Regula...
IRJET - Netreconner: An Innovative Method to Intrusion Detection using Regula...IRJET Journal
 
Survey on classification techniques for intrusion detection
Survey on classification techniques for intrusion detectionSurvey on classification techniques for intrusion detection
Survey on classification techniques for intrusion detectioncsandit
 
DYNAMIC IDP SIGNATURE PROCESSING BY FAST ELIMINATION USING DFA
DYNAMIC IDP SIGNATURE PROCESSING BY FAST ELIMINATION USING DFADYNAMIC IDP SIGNATURE PROCESSING BY FAST ELIMINATION USING DFA
DYNAMIC IDP SIGNATURE PROCESSING BY FAST ELIMINATION USING DFAIJNSA Journal
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...ijceronline
 
Ceis 9 padeep kumar_final_paper
Ceis 9 padeep kumar_final_paperCeis 9 padeep kumar_final_paper
Ceis 9 padeep kumar_final_paperAlexander Decker
 
Intrusion Detection Systems By Anamoly-Based Using Neural Network
Intrusion Detection Systems By Anamoly-Based Using Neural NetworkIntrusion Detection Systems By Anamoly-Based Using Neural Network
Intrusion Detection Systems By Anamoly-Based Using Neural NetworkIOSR Journals
 
MACHINE LEARNING APPROACH TO LEARN AND DETECT MALWARE IN ANDROID
MACHINE LEARNING APPROACH TO LEARN AND DETECT MALWARE IN ANDROIDMACHINE LEARNING APPROACH TO LEARN AND DETECT MALWARE IN ANDROID
MACHINE LEARNING APPROACH TO LEARN AND DETECT MALWARE IN ANDROIDIRJET Journal
 
Optimised Malware Detection in Digital Forensics
Optimised Malware Detection in Digital Forensics Optimised Malware Detection in Digital Forensics
Optimised Malware Detection in Digital Forensics IJNSA Journal
 
Survey on Malware Detection Techniques
Survey on Malware Detection TechniquesSurvey on Malware Detection Techniques
Survey on Malware Detection TechniquesEditor IJMTER
 
Yolinda chiramba Survey Paper
Yolinda chiramba Survey PaperYolinda chiramba Survey Paper
Yolinda chiramba Survey PaperYolinda Chiramba
 
IRJET- Zombie - Venomous File: Analysis using Legitimate Signature for Securi...
IRJET- Zombie - Venomous File: Analysis using Legitimate Signature for Securi...IRJET- Zombie - Venomous File: Analysis using Legitimate Signature for Securi...
IRJET- Zombie - Venomous File: Analysis using Legitimate Signature for Securi...IRJET Journal
 
rpaper
rpaperrpaper
rpaperimu409
 
Improving Intrusion Detection with Deep Packet Inspection and Regular Express...
Improving Intrusion Detection with Deep Packet Inspection and Regular Express...Improving Intrusion Detection with Deep Packet Inspection and Regular Express...
Improving Intrusion Detection with Deep Packet Inspection and Regular Express...IJCSIS Research Publications
 
DB-OLS: An Approach for IDS1
DB-OLS: An Approach for IDS1DB-OLS: An Approach for IDS1
DB-OLS: An Approach for IDS1IJITE
 

Ähnlich wie 714 728 (20)

Malware analysis and detection using reverse Engineering, Available at: www....
Malware analysis and detection using reverse Engineering,  Available at: www....Malware analysis and detection using reverse Engineering,  Available at: www....
Malware analysis and detection using reverse Engineering, Available at: www....
 
Pre-filters in-transit malware packets detection in the network
Pre-filters in-transit malware packets detection in the networkPre-filters in-transit malware packets detection in the network
Pre-filters in-transit malware packets detection in the network
 
IRJET - Netreconner: An Innovative Method to Intrusion Detection using Regula...
IRJET - Netreconner: An Innovative Method to Intrusion Detection using Regula...IRJET - Netreconner: An Innovative Method to Intrusion Detection using Regula...
IRJET - Netreconner: An Innovative Method to Intrusion Detection using Regula...
 
[IJET-V1I6P6] Authors: Ms. Neeta D. Birajdar, Mr. Madhav N. Dhuppe, Ms. Trupt...
[IJET-V1I6P6] Authors: Ms. Neeta D. Birajdar, Mr. Madhav N. Dhuppe, Ms. Trupt...[IJET-V1I6P6] Authors: Ms. Neeta D. Birajdar, Mr. Madhav N. Dhuppe, Ms. Trupt...
[IJET-V1I6P6] Authors: Ms. Neeta D. Birajdar, Mr. Madhav N. Dhuppe, Ms. Trupt...
 
Survey on classification techniques for intrusion detection
Survey on classification techniques for intrusion detectionSurvey on classification techniques for intrusion detection
Survey on classification techniques for intrusion detection
 
DYNAMIC IDP SIGNATURE PROCESSING BY FAST ELIMINATION USING DFA
DYNAMIC IDP SIGNATURE PROCESSING BY FAST ELIMINATION USING DFADYNAMIC IDP SIGNATURE PROCESSING BY FAST ELIMINATION USING DFA
DYNAMIC IDP SIGNATURE PROCESSING BY FAST ELIMINATION USING DFA
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
 
Ceis 9 padeep kumar_final_paper
Ceis 9 padeep kumar_final_paperCeis 9 padeep kumar_final_paper
Ceis 9 padeep kumar_final_paper
 
Intrusion Detection Systems By Anamoly-Based Using Neural Network
Intrusion Detection Systems By Anamoly-Based Using Neural NetworkIntrusion Detection Systems By Anamoly-Based Using Neural Network
Intrusion Detection Systems By Anamoly-Based Using Neural Network
 
Kx3419591964
Kx3419591964Kx3419591964
Kx3419591964
 
MACHINE LEARNING APPROACH TO LEARN AND DETECT MALWARE IN ANDROID
MACHINE LEARNING APPROACH TO LEARN AND DETECT MALWARE IN ANDROIDMACHINE LEARNING APPROACH TO LEARN AND DETECT MALWARE IN ANDROID
MACHINE LEARNING APPROACH TO LEARN AND DETECT MALWARE IN ANDROID
 
504 508
504 508504 508
504 508
 
Optimised Malware Detection in Digital Forensics
Optimised Malware Detection in Digital Forensics Optimised Malware Detection in Digital Forensics
Optimised Malware Detection in Digital Forensics
 
Survey on Malware Detection Techniques
Survey on Malware Detection TechniquesSurvey on Malware Detection Techniques
Survey on Malware Detection Techniques
 
Internet of Things (IoT)
Internet of Things (IoT)Internet of Things (IoT)
Internet of Things (IoT)
 
Yolinda chiramba Survey Paper
Yolinda chiramba Survey PaperYolinda chiramba Survey Paper
Yolinda chiramba Survey Paper
 
IRJET- Zombie - Venomous File: Analysis using Legitimate Signature for Securi...
IRJET- Zombie - Venomous File: Analysis using Legitimate Signature for Securi...IRJET- Zombie - Venomous File: Analysis using Legitimate Signature for Securi...
IRJET- Zombie - Venomous File: Analysis using Legitimate Signature for Securi...
 
rpaper
rpaperrpaper
rpaper
 
Improving Intrusion Detection with Deep Packet Inspection and Regular Express...
Improving Intrusion Detection with Deep Packet Inspection and Regular Express...Improving Intrusion Detection with Deep Packet Inspection and Regular Express...
Improving Intrusion Detection with Deep Packet Inspection and Regular Express...
 
DB-OLS: An Approach for IDS1
DB-OLS: An Approach for IDS1DB-OLS: An Approach for IDS1
DB-OLS: An Approach for IDS1
 

Mehr von Editor IJARCET

Electrically small antennas: The art of miniaturization
Electrically small antennas: The art of miniaturizationElectrically small antennas: The art of miniaturization
Electrically small antennas: The art of miniaturizationEditor IJARCET
 
Volume 2-issue-6-2205-2207
Volume 2-issue-6-2205-2207Volume 2-issue-6-2205-2207
Volume 2-issue-6-2205-2207Editor IJARCET
 
Volume 2-issue-6-2195-2199
Volume 2-issue-6-2195-2199Volume 2-issue-6-2195-2199
Volume 2-issue-6-2195-2199Editor IJARCET
 
Volume 2-issue-6-2200-2204
Volume 2-issue-6-2200-2204Volume 2-issue-6-2200-2204
Volume 2-issue-6-2200-2204Editor IJARCET
 
Volume 2-issue-6-2190-2194
Volume 2-issue-6-2190-2194Volume 2-issue-6-2190-2194
Volume 2-issue-6-2190-2194Editor IJARCET
 
Volume 2-issue-6-2186-2189
Volume 2-issue-6-2186-2189Volume 2-issue-6-2186-2189
Volume 2-issue-6-2186-2189Editor IJARCET
 
Volume 2-issue-6-2177-2185
Volume 2-issue-6-2177-2185Volume 2-issue-6-2177-2185
Volume 2-issue-6-2177-2185Editor IJARCET
 
Volume 2-issue-6-2173-2176
Volume 2-issue-6-2173-2176Volume 2-issue-6-2173-2176
Volume 2-issue-6-2173-2176Editor IJARCET
 
Volume 2-issue-6-2165-2172
Volume 2-issue-6-2165-2172Volume 2-issue-6-2165-2172
Volume 2-issue-6-2165-2172Editor IJARCET
 
Volume 2-issue-6-2159-2164
Volume 2-issue-6-2159-2164Volume 2-issue-6-2159-2164
Volume 2-issue-6-2159-2164Editor IJARCET
 
Volume 2-issue-6-2155-2158
Volume 2-issue-6-2155-2158Volume 2-issue-6-2155-2158
Volume 2-issue-6-2155-2158Editor IJARCET
 
Volume 2-issue-6-2148-2154
Volume 2-issue-6-2148-2154Volume 2-issue-6-2148-2154
Volume 2-issue-6-2148-2154Editor IJARCET
 
Volume 2-issue-6-2143-2147
Volume 2-issue-6-2143-2147Volume 2-issue-6-2143-2147
Volume 2-issue-6-2143-2147Editor IJARCET
 
Volume 2-issue-6-2119-2124
Volume 2-issue-6-2119-2124Volume 2-issue-6-2119-2124
Volume 2-issue-6-2119-2124Editor IJARCET
 
Volume 2-issue-6-2139-2142
Volume 2-issue-6-2139-2142Volume 2-issue-6-2139-2142
Volume 2-issue-6-2139-2142Editor IJARCET
 
Volume 2-issue-6-2130-2138
Volume 2-issue-6-2130-2138Volume 2-issue-6-2130-2138
Volume 2-issue-6-2130-2138Editor IJARCET
 
Volume 2-issue-6-2125-2129
Volume 2-issue-6-2125-2129Volume 2-issue-6-2125-2129
Volume 2-issue-6-2125-2129Editor IJARCET
 
Volume 2-issue-6-2114-2118
Volume 2-issue-6-2114-2118Volume 2-issue-6-2114-2118
Volume 2-issue-6-2114-2118Editor IJARCET
 
Volume 2-issue-6-2108-2113
Volume 2-issue-6-2108-2113Volume 2-issue-6-2108-2113
Volume 2-issue-6-2108-2113Editor IJARCET
 
Volume 2-issue-6-2102-2107
Volume 2-issue-6-2102-2107Volume 2-issue-6-2102-2107
Volume 2-issue-6-2102-2107Editor IJARCET
 

Mehr von Editor IJARCET (20)

Electrically small antennas: The art of miniaturization
Electrically small antennas: The art of miniaturizationElectrically small antennas: The art of miniaturization
Electrically small antennas: The art of miniaturization
 
Volume 2-issue-6-2205-2207
Volume 2-issue-6-2205-2207Volume 2-issue-6-2205-2207
Volume 2-issue-6-2205-2207
 
Volume 2-issue-6-2195-2199
Volume 2-issue-6-2195-2199Volume 2-issue-6-2195-2199
Volume 2-issue-6-2195-2199
 
Volume 2-issue-6-2200-2204
Volume 2-issue-6-2200-2204Volume 2-issue-6-2200-2204
Volume 2-issue-6-2200-2204
 
Volume 2-issue-6-2190-2194
Volume 2-issue-6-2190-2194Volume 2-issue-6-2190-2194
Volume 2-issue-6-2190-2194
 
Volume 2-issue-6-2186-2189
Volume 2-issue-6-2186-2189Volume 2-issue-6-2186-2189
Volume 2-issue-6-2186-2189
 
Volume 2-issue-6-2177-2185
Volume 2-issue-6-2177-2185Volume 2-issue-6-2177-2185
Volume 2-issue-6-2177-2185
 
Volume 2-issue-6-2173-2176
Volume 2-issue-6-2173-2176Volume 2-issue-6-2173-2176
Volume 2-issue-6-2173-2176
 
Volume 2-issue-6-2165-2172
Volume 2-issue-6-2165-2172Volume 2-issue-6-2165-2172
Volume 2-issue-6-2165-2172
 
Volume 2-issue-6-2159-2164
Volume 2-issue-6-2159-2164Volume 2-issue-6-2159-2164
Volume 2-issue-6-2159-2164
 
Volume 2-issue-6-2155-2158
Volume 2-issue-6-2155-2158Volume 2-issue-6-2155-2158
Volume 2-issue-6-2155-2158
 
Volume 2-issue-6-2148-2154
Volume 2-issue-6-2148-2154Volume 2-issue-6-2148-2154
Volume 2-issue-6-2148-2154
 
Volume 2-issue-6-2143-2147
Volume 2-issue-6-2143-2147Volume 2-issue-6-2143-2147
Volume 2-issue-6-2143-2147
 
Volume 2-issue-6-2119-2124
Volume 2-issue-6-2119-2124Volume 2-issue-6-2119-2124
Volume 2-issue-6-2119-2124
 
Volume 2-issue-6-2139-2142
Volume 2-issue-6-2139-2142Volume 2-issue-6-2139-2142
Volume 2-issue-6-2139-2142
 
Volume 2-issue-6-2130-2138
Volume 2-issue-6-2130-2138Volume 2-issue-6-2130-2138
Volume 2-issue-6-2130-2138
 
Volume 2-issue-6-2125-2129
Volume 2-issue-6-2125-2129Volume 2-issue-6-2125-2129
Volume 2-issue-6-2125-2129
 
Volume 2-issue-6-2114-2118
Volume 2-issue-6-2114-2118Volume 2-issue-6-2114-2118
Volume 2-issue-6-2114-2118
 
Volume 2-issue-6-2108-2113
Volume 2-issue-6-2108-2113Volume 2-issue-6-2108-2113
Volume 2-issue-6-2108-2113
 
Volume 2-issue-6-2102-2107
Volume 2-issue-6-2102-2107Volume 2-issue-6-2102-2107
Volume 2-issue-6-2102-2107
 

Kürzlich hochgeladen

Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 

Kürzlich hochgeladen (20)

Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 

714 728

  • 1. ISSN: 2278 – 1323 International Journal of Advanced Research in Computer Engineering & Technology Volume 1, Issue 4, June 2012 A FAST POSITIVE APPROACH OF P-DPL IN THE PACKET INSPECTION 1 N.Kannaiya Raja, 2K.Arulanandam, 3M.Balaji (system calls, network access and files, and memory Abstract-The signature extraction process is based on a modifications) [1]–[3]. In static detection method is based on comparison with a common function repository. By eliminatin information explicitly extracted or implicitly from the functions appearing in the common function repository from the executable source code. The main processing of static detection signature candidate list, P-DPL can minimize the risk of false-positive method is in providing rapid categorized. Since antivirus detection errors. To minimize false-positive rates for P-DPL proposes vendors are handling every day an overcome amount of suspect intelligent candidate selection using entropy score to generate files for inspection [4], fast detection is essential. Static method signatures. Evaluation of P-DPL was conducted under various analysis solutions are mainly implemented using two methods: conditions. The findings suggest that the proposed method can be used signature-based and heuristic-based. Signature based methods for automatically generating signatures that are both specific, sensitive. In this paper we propose a new automatic mechanism, termed P-DPL trust on the finding the unique strings in the source code [4]. for extracting signatures from malware files and unwanted mapping The algorithmic methods are based on procedure, which are files. Signatures generated by P-DPL are comprised of multiple byte- either determined by expert staff or by machine that specify a strings, which can be used by high-speed, network-based, malware malicious [5], [6]. As a case in point, Zhang et al. [7] in the filtering devices. In order to minimize the risk of false positives (i.e., random forest data-mining algorithm to detect misuse and detection of a malware signature in benign executable files), P-DPL abnormal network intrusions. The time period of time from the employs a method for sanitizing executable file from chunks of code release of an unknown malware until security that originate from the underlying standard development platforms and software/hardware vendors update their client with the proper replicated in various instances of begins and malicious programs malware signature is extremely critical. At this time, the developed by these platforms. In this method we have developed a new innovative form to find malicious data in the packet. We believe malware is undetectable by most signature-based solutions and that P-DPL Another direction we intend to examine is the use of a is usually termed a zero attack. This malware can easily spread malware function library (MFL) in the signature generation process in and corrupt all machines, it is extremely essential to detect it as order to further strengthen the signatures and minimize the risk of false soon as possible. So that signature-based solutions generate a positives. In addition, regular expressions defined by two or more suitable signature for block all threats. Defend organizations distinct signatures can be used in order to further minimize the risk of by prevent all type of malware. Carry through deep packet false positives. check all signatures for detecting and removing attacks such as malware spreads worms, denial-of-service, or distant Key words—Packet-Deployment payload (P-DPL), Automatic exploitation of vulnerabilities. Monitor network for prevent signature generation (ASG), malware, malware filtering. performance. Devices analysis the content of the packets. The I. INTRODUCTION process of generating unique signatures for malware filtering devices. Different methods are used for automatic signature n communication system are highly hyper sensitized to generation have been proposed in domain. The techniques I various types of attack. A parliamentary way of processing focusing on malware, worm, and where the signature is these attacks is by means of malicious software, such as worms, extracted that the after the malware is executed in the course of viruses, and Trojan horses. When it is spread, it can cause launching the attack. Different methods processed to extract severe problems to all users, companies, and governments. signatures from full-fledged malware executables that may Now the development in high-speed Internet connections gives contain a significant portion of code emanating from a higher level for creating and rapidly spread the new malware. development tools and platforms. In this research we find the Several techniques for detecting and deleting malware have problems, and evaluate an automatic signature generation been proposed. They are two types one is static and another one technique for P-DPL. dynamic. In dynamic detection method is based on information collected from the operating system at execution of the program 1 N.Kannaiya Raja, M.E., (P.hd) .,A.P/CSE Dept. Arulmigu Meenakshi Amman College of Engg, Thiruvannamalai Dt, near Kanchipuram Kanniya13@hotmail.co.in 2 Dr. K.Arulanandam, Prof & Head, CSE Department Ganadipathy Tulsi’s Jain Engineering College, Vellore sakthsivamkva@gmail.com 3 M. Balaji, M.E., Arulmigu Meenakshi Amman College of Engg,Thiruvannamalai Dt, near Kanchipuram. Fig 1 P-DPL creation and signature generating processes mbalaji23@gmail.com . P-DPL is created for multiple-string, signatures that can be used in intrusion detection systems for filtering malware. To improve its imprecision, P-DPL process and
  • 2. ISSN: 2278 – 1323 International Journal of Advanced Research in Computer Engineering & Technology Volume 1, Issue 4, June 2012 complete and structured method, which are extracts the ordered sets of multiple tokens that match multiple variants of malware’s unique code from other segments of common and multiform worms. Honeycomb overlays parts of the flows in usually benign code, such as library files. When the sensitivity the traffic and uses a longest common substring (LCS) process ends, the remaining codes are the malicious code. Since algorithm to spot similarities in packet payloads. Subsequently the process is go on by generating a unique signature from the designed a double-honey pot system and introduced the malicious code, which can be used for removing the malware. position-aware distribution signatures (PADS) that are The main objective of this research is creating signatures from computed from polymorphic worm samples and are composed malware, spyware, Trojan horses, worms and viruses. The main of a byte frequency distribution instead of a fixed value for hypothesis is that in a superior step, suspected files are each position in the signature ―string.‖ Tang et al. [11] classified as benign or malicious by a human expert or by an use sequence alignment techniques, drawn from bio informatics, automated detection tool. This processing allows us to focus on to derive simplified regular aspect exploit-based signatures. the signature-generation process, but it also in the quality of the Exploit-based signatures can be generated quickly to detect signatures on the accuracy of the of mistrustful files. P-DPL zero-attack exploits of uncovered vulnerabilities. However, low was used as the automatic signature generation (ASG) module damage on multiform malware. And also the signatures created of the eDare (early detection, alert, and response) framework by the above techniques that are extracted and tested for short, [8] eDare is aimed at mitigating the spread of both known and worm, malware, the fact is that the malware, for example unknown malware in computer networks. eDare operates by viruses and Trojan horses, can be as large executable files, it first monitoring network traffic and filtering out known consist of full-fledged applications. These files usually contain malware using high-speed filtering devices that are a significant portion of different code segments that are spread continuously updated with signatures generated by P-DPL by the software development platform spawning the malware. .Next; unknown files are extracted from the remaining traffic For this case the large malware files, selecting a signature that and examined using various machine-learning and temporal will be both sensitive and specific. Another limitation of these reasoning methods in order to classify the files as malicious or techniques is that they focus on detecting malware after it has benign. P-DPL is implemented in the last step to extract been unleashed and try to generate a signature from the traffic it signatures from newly detected malicious files. When eDare creates at the time attack is being processed. A payload-based identifies a new threat, P-DPL automatically produces a signature finding the malware code. In this paper falls into the signature, and then, the filtering devices that are stationed on payload-based signature concept. Payload-based signature the network infrastructure are automatically updated. This generation methods are presented in [4]. At present a two-step process is very fast, and also faster than when human statistical method for automatically extracting well, ―the best‖ intervention, it is effective against zero attacks. The P-DPL signatures from the code of a malware. First of all programs technique and a set of research that were performed on a on detached machines are intentionally affected with the virus. collection of malicious and benign executable. We were work The affected portion of the program are analyze with one is in finding the length and selection of a signature among another to found that regions of the virus are constant from one several candidates. instance to another. These regions are considered as signature candidates. The second phase estimates the probability that II. Related Works each of all candidate signatures will match a randomly chosen. The candidate with the lowest estimated false-positive is Since the signature must be general enough to capture selected as a signature. The Hancock system [4] was proposed as instances of the malware, Thus far sufficiently specific to for automatically extracting signatures for antivirus software. avoid over lapping with the content of normal traffic in order to Based on several heuristics, the Hancock system generates a set minimize false positives. The malware signatures can be of signature candidates, selecting the candidates that are not classified as vulnerability-based, exploit-based and payload- likely to be found in benign code. Our approach, Hancock based [9]. A vulnerability-based signature describes the relies on modeling benign code in order to minimize false- properties of a certain bug in the system that can be maliciously alarm risks. The Auto-Sign signature generator modeled both exploited by the malware. Vulnerability-based signatures do benign and malicious code using byte 3-grams representation in not process to detect each every malicious code exploiting the order to select good signature candidates. Next, the signature vulnerability; it is very effective when dealing with multiform candidates are ranked according to three different measures in malware. Even though, a vulnerability-based signature can be order to select the best signature. Although, the Hancock generated only when the vulnerability is find. An exploit-based system and Auto-Sign differ from our approach, which is signature describes a sequence of commands triggered by the semantic aware in the sense that it does not rely on arbitrary malware, which process exploits vulnerability in the system. byte code sequences, but the code representing internal Exploit-based methods include Autograph, P-DPL sensor Net functions of the software. In addition, the methods presented in spy .which focus on analyzing similarities in packet payloads [4] and focus on generating signatures for antivirus software, belonging to network. These systems first identify abnormal the limitation of signature length is not necessarily considered. traffic originating from distrustful IP addresses, and then, Other solutions have been proposed for protecting systems and generate a signature by identifying most frequently occurring preventing an attack beforehand rather than detecting the attack byte sequences. The Nemean architecture first clusters similar after it has been launched. This can be done by generating sessions, and uses machine-learning techniques to generate signatures based on sequences of instructions that represent semantic-aware signatures for each cluster. Polygraph expands malicious or benign behavior. These sequences can be extracted the notion of single substring signatures to be joined, and to either by statically analyzing the program after disassembly or
  • 3. ISSN: 2278 – 1323 International Journal of Advanced Research in Computer Engineering & Technology Volume 1, Issue 4, June 2012 by monitoring the program during execution. For example, resilience of signatures to polymorphic malware variants. protecting a system from buffer overflow attacks can be Another common method for detecting polymorphic malware is achieved by: 1) creating signatures for legitimate instruction to incorporate semantics awareness into signatures. For blocks and matching instruction sequences of monitored example, Christodorescu et al. proposed static semantics-aware programs with the signature repository 2) using obfuscation of malware detection. They applied a matching algorithm on the pointers in such a way that a malicious application that tries to disassembled binaries to find the instruction sequences that exploit a buffer overflow vulnerability will not be able to match the manually generated templates of malicious create valid pointers or 3) by applying array and pointer behaviors, e.g., decryption loop. A framework for automatic boundary checking. As opposed to such methods, our goal in generation of intrusion signatures from honey net packet traces. this research is to generate signatures for high-speed traffic Nemean applied clustering techniques on connections and filtering devices that do not rely on installation or modification sessions to create protocol-semantic-aware signatures, thereby of end points and that will protect the end points at the network reducing the possibility of false alarms. level. In summation, each of the aforementioned techniques Another loosely related area is the automatic suffers from at least one critical limitation. Some rely on small generation of attack signatures, vulnerability signatures and and coherent malware files, but such files may not constitute software patches. TaintCheck [12] and Vigilante [13] applied the general case. Other techniques rely on observing malware taint analysis to track the propagation of network inputs to data behavior, but such malware cannot always be fully monitored. used in attacks, e.g., jump addresses, format strings and system Other methods search for packet similar, not assure true low call arguments, which are used to create signatures for the false positive. Our method disregards the malware size attacks. Other heuristic-based roaches [14] have also been assumption. In addition, it does not require activating the proposed to exploit properties of specific exploits (e.g., buffer malware. overflow) and create attack signatures. Generalizing from these Modern anti-virus software typically employ a variety approaches, Brumley et al. proposed a systematic method that of methods to detect malware programs, such as signature- used a formal model to reason about vulnerability signatures based scanning, heuristic-based detection, and behavioral and quantify the signature qualities. An alternative approach to detection [10]. Although less proactive, signature-based preventing malware from exploiting vunerabilities is to apply malware scanning is still the most prevalent approach to data patches in the firewalls to filter malicious traffic. To identify malware because of its efficiency and low false automatically generate data patches. Which leveraged the positive rate. Traditionally, the malware signatures are created knowledge of data format of malicious attacks to generate manually, which is both slow and error-prone. As a result, potential attack instances and then created signatures from the efficient generation of malware signatures has become a major instances that successfully exploit the vulnerabilities? challenge for anti-virus companies to handle the exponential Hancock differs from previous work by focusing on growth of unique malware files. To solve this problem, several automatically generating high-coverage string signatures with automatic signature generation approaches have been proposed. extremely low false positives. Our research was based loosely Most previous work focused on creating signatures on the virus signature extraction, which was commercially used that are used by Network Intrusion Detection Systems (NIDS) by IBM. They used a 5-gram Markov chain model of good to detect network worms. Singh et al. Proposed EarlyBird [11], software to estimate the probability that a given byte sequence which used packet content prevalence and address dispersion to would show up in good software. They tested hand-generated automatically generate worm signatures from the invariant signatures and found that it was quite easy to set a model portions of worm payloads. Autograph exploited a similar idea probability threshold with a zero false positive rate and a to create worm signatures by dividing each suspicious network modest false negative rate (the fraction of rejected signatures flow into blocks terminated by some breakmark and then that would not be found in goodware) of 48%. They also analyzing the prevalence of each content block. The suspicious generated signatures from assembly code (as Hancock does), flows are selected by a port-scanning flow classifier to reduce rather than data, and identified candidate signatures by running false positives. Kreibich and Crowcroft developed Honeycomb, the malware in a test environment. Hancock does not do this, as a system that uses honeypots to gather inherently suspicious dynamic analysis is very slow in large-scale applications. traffic and generates signatured by applying the longest Symantec acquired this technology from IBM in the mid-90s common sub string (LCS) algorithm to search for similarities in and found that it led to many false positives. The Symantec the packet payloads. One potential drawback of signatures engineers believed that it worked well for IBM because IBM’s generated from previous approaches is that they are all anti-virus technology was used mainly in corporate continuous strings and may fail to match polymorphic worm environments, making it much easier for IBM to collect a payloads. Polygraph instead searched for invariant content in representative set of goodware. By contrast, signatures the network flows and created signatures consisting of multiple generated by Hancock are mainly for home users, who have a disjoint content sub strings. Polygraph also utilized a naive much broader set of goodware. The model’s training set cannot Bayes classifier to allow the probabilistic matching and possibly contain, or even represent, all of this goodware. This classification, and thus provided better proactive detection poses a significant challenge for Hancock in avoiding FP-prone capabilities. A system that used a model-based algorithm to signatures. analyze the invariant contents of polymorphic worms and analytically prove the attack-resilience of generated signatures. III. Payload Based Anomaly Detection PDAS (Position-Aware Distribution Signatures) took advantage of a statistical anomaly-based approach to improve the A. Overview of the P-DPL Sensor:
  • 4. ISSN: 2278 – 1323 International Journal of Advanced Research in Computer Engineering & Technology Volume 1, Issue 4, June 2012 The P-DPL sensor is based on the principle that zero- To compare the similarity between test data at detection time day attacks are delivered in packets whose data is unusual and and the trained models computed during the training period, P- distinct from all prior ―normal content‖ flowing to or from the DPL uses simplified Mahalanobis distance [18]. Mahalanobis victim site. We assume that the packet content is available to distance. To compare the similarity between test data at time the sensor for modeling3. We compute a normal profile of a and the trained models computed during the training period, P- site’s unique content flow, and use this information to detect DPL uses simplified Mahalanobis distance [18]. Mahalanobis anomalous data. A ―profile‖ is a model or a set of models that distance weights each variable, the mean frequency of a 1- represent the set of data seen during training. Since we are gram, by its standard deviation and covariance. The distance profiling content data flows, the method must be general to values produced by the models are then subjected to a threshold work across all sites and all services, and it must be efficient test. If the distance of a test datum is greater than the threshold, and accurate.Our initial design of P-DPL uses a ―language P-DPL issues an alert for the packet. There is a distinct independent‖ methodology, the statistical distribution of n- threshold setting for each centroid computed automatically by grams [15] extracted from network packet datagrams. This P-DPL during a calibration step. methodology requires no parsing, no interpretation and no emulation of the content. An n-gram is the sequence of n adjacent byte values in a packet payload. A sliding window with width n is passed over the whole payload one byte at a time and the frequency of each n-gram is computed. This frequency count distribution represents a statistical centroid or model of the content flow. The normalized average frequency and the variance of each gram are computed. The first implementation of P-DPL uses the byte value distribution when n=1. The statistical means and variances of the 1- grams are stored in two 256-element vectors. However, we condition a distinct model on the port (or service) and on packet length, producing a set of statistical centroids that in total provides a fine-grained, compact and effective model of a site’s actual content flow. Full details of this method and its effectiveness are described in [18]. The first packet of CRII illustrates the 1-gram data representation implemented in P-DPL. Figure 1 shows a portion Fig. 3. CRII payload distribution (top plot) and its rank of the CRII packet, and its computed byte value distribution order distribution (bottom plot) along with the rank ordered distribution is displayed in Figure 2, from which we extract a Z-string. The Z-string is a the string To calibrate the sensor, a sample of test data is measured of distinct bytes whose frequency in the data is ordered from against the centroids and an initial threshold setting is chosen. most frequent to least, serving as representative of the entire A subsequent round of testing of new data updates the distribution, ignoring those byte values that do not appear in the threshold settings to calibrate the sensor to the operating data. The rank ordered distribution appears similar to the Zipf environment. Once this step converges, P-DPL is ready to enter distribution, and hence the name Z-string. The Z-string detection mode. Although the very initial results of testing P- representation provides a privacy-preserving summary of DPL looked quite promising, we devised several improvements payload that may be exchanged between domains without to the modeling technique to reduce the percentage of false revealing the true content. Z-strings are not used for detection, positives. but rather for message exchange and cross domain correlation of alerts. B. New P-DPL Features Multiple Centroids P-DPL is a fully automatic, ―hands-free‖ online GET./default.ida?XXXXXXXXXXX x anomaly detection sensor. It trains models and determines when XXXXXXXXXXXXXXXXXXXXXX they are stable; it is self-calibrating, automatically observes XXXXXXXXXXXXXXXXXXXXXX itself, and updates its models as warranted. The most important XXXXXXXXXXXXXXXXXXXXXX new feature implemented in P-DPL over our prior work is the XXXXXXXXXXXXXXXXXXXXXX use of multiple centroids, and ingress/egress correlation. In the XXXXXXXXXXXXXXXXXXXXXX first implementation, P-DPL computes one centroid per length bin, followed by a stage of clustering similar centroids across XXXXXXXXXXXXXXXXXXXXXX neighboring bins. We previously computed a model Mij for XXXXXX%u9090%u6858%ucbd3%u7 each specific observed packet payload length i of each port j. In 801%u9090%u6858%ucbd3%u7801%u this newer version, we compute a set of models Mkij , k≥1. 9090%u6858%ucbd3%u7801%u9090% Hence, within each length bin, multiple models are computed u9090%u8190%u00c3%u0003%u8b000 prior to a final clustering stage. The clustering is now executed %u531b%u53ff%u0078%u0000%u0 u0 across centroids within a length bin, and then memory requirements for models while representing normal content Fig. 2. A portion of the first packet of CodeRed II flow more accurately and revealing anomalous data with
  • 5. ISSN: 2278 – 1323 International Journal of Advanced Research in Computer Engineering & Technology Volume 1, Issue 4, June 2012 greater clarity. Since there might be different types of payload the byte distributions among the sites for the same length bin. sent to the same service, e.g., pure text, .pdf, or .jpg, we used This is confirmed by the values of Manhattan distances an incremental online clustering algorithm to create multiple computed between the distributions, with results displayed in centroids to model the traffic with finer granularity. This Table 1. modeling idea can be extended to include centroids for different The content traffic among the sites is quite different. media that may be transmitted in packet flows. Different file For example, the EX dataset is more complex containing file and media types follow their own characteristic 1-gram uploads of different media types (pdf, jpg, ppt, etc. ) and distribution; including models for standard file types can help webmail traffic; the W dataset contain less of this type of traffic reduce false positives. The multi-centroid strategy requires a while W1 is the simplest, containing almost no file uploads. different test methodology. During testing,an alert will be Hence, each of the site-specific payload models is diverse, generated by P- DPL if a test packet matches none of the increasing the likelihood that a worm payload will be detected centroids within its length bin. The multicentroid technique by at least one of these sites. To avoid detection, the worm produces more accurate payload models and separates the exploit would have to be padded in such a way that its content anomalous payloads in a more precise manner. description would appear to be normal concurrently for all of these sites. C. Data Diversity across Sites A crucial issue we study is whether or not payload models are truly distinct across multiple sites. This is an important question in a collaborative security context. We have claimed that the monoculture problem applies not only to common services and applications, but also to security technologies. Hence, if a site is blind to a zero-day attack this implies that many other sites are blind to the same attack. Researchers are considering solutions to the monoculture problem by various techniques that ―diversify‖ implementations. We conjecture that the content data flow among different sites is already diverse even when running the exact same services. In our previous work we have shown that byte distributions differ for each port and length. We also conjecture that it should be different for each host. For example, each web server contains different ASCII characters 0-255 URLs, implements different functionality like web email or media uploads, and the population of service requests and Fig. 4. Example byte distribution for Payload length 249 of responses sent to and from each site may differ, producing a port 80 for the three sites EX, W, W1, in order from top to three diverse set of content Profiles across all collaborating hosts sites EX, W, W1 bottom and sites. Hence, each host or site’s profile will be substantially different from all others. A zero-day attack that may appear as normal data at one site, will EX-data Length 1380 likely not appear as normal data at other sites since the normal profiles are different. We test whether or not this conjecture is true by several experiments. One of the most difficult aspects of doing research in this area is the lack of real- world datasets available to researchers that have full packet content for formal scientific study4. Privacy policies typically prevent sites from sharing their content data. However, we were able to use data from three sources, and show the distribution or each. The first one is an external commercial organization that wishes to remain anonymous, which we call EX. The others are the two web W1-data Length 1380 servers of the CS Department of Columbia, www.cs.columbia.edu and www1.cs.columbia.edu. We call these two data sets W and W1, respectively. The following plots show the profiles of the traffic content flow of each site. The plots display the payload distributions for ASCII characters 0-255 different packet payload lengths i.e. 249 bytes and 1380 bytes, spanning the whole range of possible payload lengths in order to give a general view of the diversity of the data coming from Fig. 5. Example byte distribution for payload length of 1380 of the three sites. Each byte distribution corresponds to the first port 80 for the centroid that is built for the respective payload lengths. We observe from the above plots that there is a visible difference in Table 1. The Manhattan distance between the byte distributions
  • 6. ISSN: 2278 – 1323 International Journal of Advanced Research in Computer Engineering & Technology Volume 1, Issue 4, June 2012 of the profiles computed for the three sites, for three length first created a clean set of packets free of any known worms bins. still flowing on the Internet as background radiation. We then inserted the same set of worm traffic into the cleaned test set using tcpslice. Thus, we created ground truth in order to 249 bytes 940 bytes 1380 bytes compute the accuracy and false positive rates. The worm set includes CodeRed, CodeRed II, MD(EX, W) 0.4841 0.6723 0.2533 WebDAV, and a worm that exploits the IIS Windows media MD(EX,W1) 0.3710 0.8120 0.4962 service, the nsiislog.dll buffer overflow vulnerability (MS03- 022). These worm samples were collected from real traffic as MD(W,W1) 0.3689 0.5972 0.6116 they appeared in the wild, from both our own dataset and from a third-party. Because P-DPL only considers the packet payload, the worm set is inserted at random places in the test Mimicry attacks are possible if the attacker has access to the data. The ROC plots in Figure 5 show the result of the detection same information as the victim. In the case of application rate versus false positive rate over varying threshold settings of payloads, attackers (including worms) would not know the the P-DPL sensor. distribution of the normal flow to their intended victim. The attacker would need to sniff each site for a long period of time and analyze the traffic in the same fashion as the detector described herein, and would also then need to figure out how to All worms reliably detected pad their poison payload to mimic the normal model. This is a daunting task for the attacker who would have to be clever indeed to guess the exact distribution as well as the threshold logic to deliver attack data that would go unnoticed. Additionally, any attempt to do this via probing, crawling or other means is very likely to be detected. Besides mimicry attack, clever worm writers may figure a way to launch 'training attacks’ against anomaly detectors such as P-DPL. In this case, the worm may send a stream of content with increasing diversity to its next victim site in order to train the content sensor to produce models where its exploit no longer would appear anomalous. This as well is a daunting task for the worm. The worm would be fortunate indeed to launch its training attack when the sensor is False Positive Rate(%) in training mode and that a stream of diverse data would go Fig. 6 ROC of P-DPL detecting incoming worms, false positive unnoticed while the sensor is in detection mode. Furthermore, rate restricted to less than 0.5% the worm would have to be extremely lucky that each of the content examples it sends to train the sensor would produce a The detection rate and false positive are both based on "non-error" response from the intended victim. Indeed, P-DPL the number of packets. The test set contains 40 worm packets ignores content that does not produce a normal service although there are only 4 actual worms in our zoo. The plots response. These two evasion techniques, mimicry and training show the results for each data set, where each graphed line is attack, is part of our ongoing research on anomaly detection, the detection rate of the sensor where all 4 worms were and a formal treatment of the range of "counter-evasion" detected. (This means more than half of each the worm’s strategies we are developing is beyond the scope of this paper. packets were detected as anomalous content.) From the plot we can see that although the three sites are quite different in D. Worm Detection Evaluation payload distribution, P-DPL can successfully detect all the worms at a very low false positive rate. To provide a concrete In this section, we provide experimental evidence of the example we measured the average false alerts per hour for these effectiveness of P-DPL to detect incoming worms. In our three sites. For 0.1% false positive rate, the EX dataset has 5.8 previous RAID paper [18], we showed P-DPL’s accuracy for alerts per hour, W1 has 6 alerts per hour and W has 8 alerts per the DARPA99 dataset, which contains a lot of artifacts that hour. make the data too regular [16]. Here we report how P-DPL We manually checked the packets that were deemed performs over the three real-world datasets using known worms false positives. Indeed, most of these are actually quite available for our research. Since all three datasets were anomalous containing very odd abnormal payload. For captured from real traffic, there is no ground truth, and example, in the EX dataset, there are weird file uploads, in one measuring accuracy was not immediately possible. We thus case a whole packet containing nothing but a repetition of a needed to create test sets with ground truth, and we applied character with byte value E7 as part of a word file. Other Snort for this purpose. packets included unusual HTTP Get requests, with the referrer Each dataset was split into two distinct chrono field padded with many ―Y‖ characters via product providing logically-ordered portions, one for training and the other for anonymization. testing, following the 80%-20% rule. For each test dataset, we We note that some worms might fragment their
  • 7. ISSN: 2278 – 1323 International Journal of Advanced Research in Computer Engineering & Technology Volume 1, Issue 4, June 2012 content into a series of tiny packets to evade detection. For this comparison is performed against the packet contents and a problem, P-DPL buffers and concatenates very small packets of string similarity score is computed. If the score is higher than a session prior to testing. some threshold, we treat this as possible worm propagation and We also tested the detection rate of the W32.Blaster block or delay this outgoing traffic. This is different from the worm (MS03-026) on TCP port 135 port using real RPC traffic common quarantining or containment approaches which block inside Columbia’s CS department. Despite being much more all the traffic to or from some machine. P-DPL will only block regular compared to HTTP traffic, the worm packets in each traffic whose content is deemed very suspicious, while all other case were easily detected with zero false positives. Although at traffic may precede unabated maintaining critical services. first blush, 5-8 alerts per hour may seem too high, a key There are many possible metrics which can apply to contribution of this paper is a method to correlate multiple decide the similarity of two strings. The several approaches we alerts to extract from the stream of alerts true worm events. have considered, tested and evaluated include: 1. String equality (SE) IV. Worm Propagation Detection and signature This is the most intuitive approach. We decide that Generation by Correlation propagation has started only if the egress payload is exactly the same as the ingress suspect packet. This In the previous section, we described the results using metric is very strict and good at reducing false P-DPL to detect anomalous packet content. We extended the positives, but too sensitive to any tiny change in the detection strategy to model both inbound and outbound traffic packet payload. If the worm changes a single byte or from a protected host, computing models of content flows for just changes its packet fragmentation, the anomalous ingress and egress packets. The strategy thus implies that packet correlation will miss the propagation attempt. within a protected LAN, some infected internal host will begin (The same is true when comparing thumbprints of a propagation sending outbound anomalous packets. When this content.) occurs for any host in the LAN, we wish to inoculate all other 2. Longest common substring (LCS) hosts by generating and distributing worm packet signatures to The next metric we considered is the LCS approach. other hosts for content filtering. LCS is less exact than SE, but avoids the We leverage the fact that self-propagating worms will fragmentation problem and other small payload start attacking other machines automatically by replicating manipulations. The longer the LCS that is computed itself, or at least the exploit portion of its content, shortly after a between two packets, the greater the confidence that host is infected. (Polymorphic worms may randomly pad their the suspect anomalous ingress/egress packets are more content, but the exploit should remain intact.) Thus if we detect similar. The main shortcoming of this approach is its these anomalous egress packets to port i that are very similar to computation overhead compared to string equality, those anomalous ingress traffic to port i, there is a high although it can also be implemented in linear time. probability that a worm that exploits the service at port i has 3. Longest common subsequence (LCSeq): started its propagation. Note that these are the very first packets This is similar to LCS, but the longest common of the propagation, unlike the other approaches which have to subsequence need not be contiguous. LCSeq has the wait until the host has already shown substantial amounts of advantage of being able to detect polymorphic worms, unusual scanning and probing behavior. Thus, the worm may but it may introduce more false positives. For each be stopped at its very first propagation attempt from the first pair of strings that are compared, we compute a victim even if the worm attempts to be slow and stealthy to similarity score, the higher the score, the more similar avoid detection by probe detectors. We describe the the strings are to each other. For SE, the score is 0 or ingress/egress correlation strategy in the following section. We 1, where 1 means equality. For both LCS and LCSeq, note, however, that the same strategy can be applied to ingress we use the percentage of the LCS or LCSeq length out packets flowing from arbitrary (external) sources to internal of the total length of the candidate strings. Let’s say target IP's. Hence, ingress/ingress anomalous packet correlation string s1 has length L1, and string s2 has length L2, may be viewed as a special case of this strategy. and their LCS/LCSeq has length C. We compute the Careful treatment of port-forwarding protocols and similarity score as 2*C/( L1+ L2). This normalizes the services, such as P2P and NTP (Port 123) is required to apply score in the range of [0...1], where 1 means the strings this correlation strategy, otherwise normal port forwarding may are exactly equal. be misinterpreted as worm propagations. Our work in this area involves two strategies, truncation of packets (focusing on Since we may have to check each outgoing packet (to control data) and modeling of the content of media. This work port i) against possibly many suspect strings inbound to port i, is beyond the scope of this paper due to space limitations, and we need to concern ourselves with the computational costs and will be addressed in a future paper. storage required for such a strategy. On a real server machine, e.g., a web server, there are large numbers of incoming requests A. Ingress and Egress Traffic Correlation but very few, if any, outgoing requests to port 80 from the server (to other servers). So any outgoing request is already When P-DPL detects some incoming anomalous quite suspicious, and we should compare each of them against traffic to port i, it generates an alert and places the packet the suspects. If the host machine is used as both a server and a content on a buffer list of ―suspects‖. Any outbound traffic to client simultaneously, then both incoming and outgoing port i that is deemed anomalous is compared to the buffer. The requests may occur frequently. This is mitigated somewhat by
  • 8. ISSN: 2278 – 1323 International Journal of Advanced Research in Computer Engineering & Technology Volume 1, Issue 4, June 2012 the fact that we check only packets deemed anomalous, not Different fragmentation for CR and CRII every possible packet flowing to and from a machine. We apply the same modeling technique to the outgoing traffic and only compare the egress traffic we already labeled as anomalous. Incoming Outgoing B. Automatic Worm Signature Generation 1448, 1448, 1143 4, 13, 362, 91, 1460, 1460, 649 There is another very important benefit that accrues from the ingress/egress packet content correlation and string 4, 375, 1460, 1460, 740 similarity comparison: automatic worm signature generation. 4, 13, 453, 1460, 1460, The computation of the similarity score produces the matching 649 sub-string or subsequence which represents the common part of the ingress and egress malicious traffic. This common Code Red II (total 3818 bytes) subsequence serves as a signature content-filter. Ideally, a Incoming Outgoing worm signature should match worms and only worms. Since the traffic being compared is already judged as anomalous, and 1448, 1448, 922 1460, 1460, 898 has exhibited propagation behavior quite different from normal behavior – and the similar malicious payload is being sent to To evaluate the accuracy of worm propagation the same service at other hosts, these common parts are very detection, we appended the propagation trace at the very end of possibly core exploit strings and hence can represent the worm one full day’s network data from each of the three sites. When signature. By using LCSeq, we may capture even polymorphic we collected the trace from our attack network, we not only worms since the core exploit usually remains the same within captured the incoming port 80 requests, but also all the each worm instance even though it may be reordered within the outgoing traffic directed to port 80. We checked each dataset packet datagram. Thus, by correlating the ingress and egress manually, and found there is a small number of outgoing malicious payload, we are able to detect the very initial worm packets for the servers that produced the datasets W and W1, as propagation, and compute its signature immediately. Further, if we expected, and not a single one for the EX dataset. Hence, we distribute these strings to collaborating sites, they too can any egress packets to port 80 would be obviously anomalous leverage the added benefit of corroborating suspects they may without having to inspect their content. For this experiment, we have detected, and they may choose to employ content filters, captured all suspect incoming anomalous payloads in an preventing them from being exploited by a new and zero-day unlimited sized buffer for comparison across all of the available worm. data in our test sets. We also purposely lowered P-DPL’s threshold setting (after calibration) in order to generate a very V. Evaluations high number of suspects in order to test the accuracy of the string comparison and packet correlation strategies. In other In this section, we evaluate the performance of words, we increased the noise (increasing the number of false ingress/egress correlation and the quality of the automatically positives) in order to determine how well the correlation can generated signatures. Since none of the machines were attacked still separate out the important signal in the traffic (the actual by worms during our data collection time at the three sites, we worm content). launched real worms to un-patched Windows 2000 machines in The result of this experiment is displayed in the a controlled environment. For testing purposes, the packet following table for the different similarity metrics. The number traces of the worm propagation were merged into the three in the parenthesis is the threshold used for the similarity score. sites’ packet flows as if the worm infection actually happened For an outgoing packet, P-DPL checks the suspect buffer and at each site. Since P-DPL only uses payload, the source and returns the highest similarity score. If the score is higher than target IP addresses of the merged content are irrelevant. the threshold, we judge there is a worm propagation. False Without a complete collection of worms, and with limited alerts suggest that an alert was mistakenly generated for a capability to attack machines, we only tested CodeRed and normal outgoing packet. The reason why SE does not work CodeRed II out of the executable worms we collected. After here is obvious: worm fragmentation blinds the method from launching these in our test environment and capturing the seeing the worm’s entire matching content. The other two packet flow trace, we noticed interesting behavior: after metrics worked perfectly, detecting all the worm propagations infection, these two worms propagate with packets fragmented with zero false alerts. differently than the ones that initially infected the host. In particular, CodeRed can separate ―GET.‖ and Results of correlation for different metrics ―/default.ida?‖ and ―NNN...N‖ into different packets to avoid detection by many signature-based IDSes. The following table shows the Detect propagate False alerts length sequences of different packet fragmentation for CodeRed and CodeRed II. SE Yes No LCS(0.5) Yes No LCSeq(0.5) Yes No
  • 9. ISSN: 2278 – 1323 International Journal of Advanced Research in Computer Engineering & Technology Volume 1, Issue 4, June 2012 To use some other traffic to simulate the outgoing traffic of the time, cross-site collaboration and exchange of suspect packet servers. For EX data, we used the outgoing port 80 traffic of payloads might provide a solution. We discuss this in the next other clients in that enterprise as if it originated from the EX section. server itself. For the W1 and W datasets, we used the outgoing VI. Anomalous Payload Collaboration among Sites port 80 traffic from the CS department. Then we repeated the Most current attack detection systems are constrained previous experiments to detect The worm propagation with the to a single ingress point within an enterprise without sharing injected outgoing traffic on each server. The result remains the any information with other sites. There are ongoing efforts that same - using the same thresholds as before, we can successfully share suspicious source IP address [5, 10], but to our detect all the worm propagations without any false alerts. knowledge no such effort exists to share content information across sites in real time until now. Here we focus on evaluating As we mentioned earlier, the worm signature is a the detection accuracy of using collaboration among sites, natural byproduct of the ingress/egress correlation. When we assuming a scaleable, privacy-preserving secured identified a possible worm propagation, the LCS or LCseq can communication infrastructure is available. (We have be used as the worm signature. Figure 6 displays the actual implemented a prototype in Worminator [17].) content signatures computed for the CR II propagations Recall that, in section3.4, we described experiments detected by P-DPL in a style suitable for deployment in Snort. measuring the diversity of the models computed at multiple Note the signature contains some of the ystem calls used to sites. As we saw, the different sites tested have different normal infect a host, which is one of the reasons the false positive rate payload models. This implies from a statistical perspective that is so low for these detailed signatures. they should also have different false positive alerts. Any ―common or highly similar anomalous payloads‖ detected among two or more sites logically would be |d0|$@|0 ff|5|d0|$@|0|h|d0| @|0|j|1|j|0|U|ff||d0| @| U|f caused by a common worm exploit targeting many sites. Cross- 5|d8|$@|0 e8 19 0 0 0 c3 ff|%`0@|0 ff|%d0@|0 ff|%` site or cross- domain sharing may thus reduce the false positive ff|%h0@|0 ff|%p0@|0 ff|%t0@|0 ff|%x0@|0 ff|%| ff| problem at each site, and may more accurately identify worm 0@|fc fc fc fc fc fc fc fc fc fc fc fc fc fc LORER.EXE outbreaks in the earliest stages of an infection. fc fc fc fc fc 0 0 0 0 0 0 0 0 0 0 0 0 0 0 00 0 0 0|EXP To test this idea, we used the traffic from the three |0 0 0 0 0 0 0 0|SOFTWAREMicrosoftWindows NT sites. There are two goals we seek to achieve in this CurrentVersionWinlogon|0 0 0 0 0|SFCDisable|0 0 experiment. One is to test whether different sites can help 9 9 9d ffff ff ff|SYSTEMCurrentControlSetService confirm with each other that a worm is spreading and attacking sW3SVCParametersVirtual Roots|0 0 0 00 0 0|/Scr the Internet. The other is to test whether false alerts can be ipts|0 0 0 0|/MSADC| 0 00 0|/C|0 0 0|/D|0 00|c:,,21 reduced, or even eliminated at each site when content alerts are 7|0 0 0 0 00 0|d:,,217|fc fc f cfc f cfc fc fc fc fc fc fc correlated. In this experiment, we used the following simple Fig. 7.The initial portion of the P-DPL generated signature for correlation rule: if two alerts from distinct sites are similar, the CodeRed II. two alerts are considered true worm attacks; otherwise they are ignored. Each site’s content alerts act as confirmatory evidence We replicated the above experiments in order to test if of a new worm outbreak, even after two such initial alerts are any normal packet is blocked when we filter the real traffic generated. This is very strict, aiming for the optimal solution to against all the worm signatures generated. For our experiments the worm problem. we used the datasets from all the three sites, which have had the This is a key observation. The optimal result we seek CRII attacks cleaned beforehand, and in all cases no normal is that for any payload alerts generated from the same worm packet was blocked.In these experiments, we used an unlimited launched at two ore more sites, those payloads should be buffer for the incoming suspect payloads. The buffer size similar to each other, but not for normal data from either site essentially stores packets for some period of time that is that was a false positive. That is to say, if a site generates a dependent upon the traffic rate, and the number of anomalous false positive alert about normal traffic it has seen, it will not packet alerts that are generated from that traffic. That amount is produce suspect payloads that any other site will deem to be indeterminate a priori, and is specific to both the environment worm propagation. Since we conjectured that each site’s being sniffed and the quality of the models computed by P-DPL content models are diverse and highly distinct, even the false for that environment. Since CR and CR II launch their positives each site may generate will not match the false propagations immediately after infecting their victim hosts, a positives of other sites; only worms (i.e., true positives) will be buffer holding only the most recent 5 or 10 suspects is enough commonly matched as anomalous data among multiple sites. to detect their propagation. But for slow-propagating or stealthy To make the experiment more convincing, we no worms which might start propagating after an arbitrarily long longer test the same worm traffic against each site as in the hibernation period, the question is how many suspects should previous section, since the sensor will obviously generate the we save in the suspect buffer? If the ingress anomalous exact same payload alert at all the sites. Instead, we use payloads have been removed from the suspect buffer before multiple variants of CodeRed and CodeRed II, which were such a worm starts propagating, P-DPL can no longer detect it extracted from real traffic. To make the evaluation strict, we by correlation. Theoretically, the larger the buffer the better, tested different packet payloads for the same worm, and all the but there is tradeoff in memory usage and computation time. variant packet fragments it generates. We purposely lowered But for those worms that may hibernate for a long period of the P-DPL threshold to generate many more false positives
  • 10. ISSN: 2278 – 1323 International Journal of Advanced Research in Computer Engineering & Technology Volume 1, Issue 4, June 2012 from each site than it otherwise would produce. As in the case computed for false P-DPL alerts. The x-axis shows the described above the cross-site correlation uses the same metrics similarity score, defined within the range [0...1], and the y-axis (SE, LCS and LCSeq) to judge whether two payload alerts are is the number of pairs of alerts within the same score range. ―similar‖. However, another problem that we need to consider The similarity scores for the worm alerts are shown separately when we exchange information between sites is privacy. It may as dots on the x-axis. The worm alerts include those for CR and be the case that a site is unwilling to allow packet content to be CR II and their variant fragments. Note that all of the scores revealed to some external collaborating site. a false positive calculated between worm alerts are much higher than those of may reveal true content. the ―false‖ P-DPL alerts and thus they would be correctly A packet payload could be presented by its 1-gram detected as true worms among collaborating sites. The alerts frequency distribution (see Figure 2). This representation that scored too low would not have sufficient corroboration to already aggregates the actual content byte values in a form deem them as true worms. making it nearly impossible (but not totally impossible) to reconstruct the actual payload. (Since byte value distributions do not contain sequential information, the actual content is hard CRII against CRII to recover. 2-gram distributions simplify the problem making it CR against CR more likely to recover the content since adjacent byte values Other Alerts are represented. 3-grams nearly make the problem trivial to recover the actual content in many cases.) However, we note that the 1-gram frequency distribution reordered into the rank ordered frequency distribution produces a distribution that appears quite similar to the exponential decreasing Zipf-like distribution. The rank ordering of the resultant distinct byte values is a string that we call the ―Z-string‖ (as discussed in Section 3.1). One cannot recover the actual content from the Z- String. Rather, only an aggregated representation of the byte value frequencies is revealed, without the actual frequency information. This representation may convey sufficient Similarity Scores of Zstr Metric information to correlate suspect payloads, without revealing the actual payload itself. Hence, false positive content alerts would not reveal true content, and privacy policies would be CRII against CRII maintained among sites. CR against CR In this cross-domain correlation experiment we CRII against CR propose two more metrics which don’t require exchanging raw Other Alerts payloads, but instead only the 1-gram distributions, and the privacy-preserving Z-string representation of the payload: A. Manhattan distance (MD) Manhattan distance requires exchange of the byte distribution of the packet, which has 256 float numbers. Two payloads are similar if they have a small Manhattan distance. The maximum possible MD is 2. So we define the similarity score as (MD)/2, to normalize the score range to the same range of the other metrics described above. B. LCS of Z-string (Zstr) While maintaining maximal privacy preservation, we Similarity Scores of LCSeq Metric perform the LCS on the Z-string of two alerts. The similarity score is the same as the one for LCS, but Fig. 8. Similarity scores of Zstr and LCSeq metrics for here the score evaluates the similarity of two Z-strings, collaboration not the raw payload strings. Figure 7 presents the results achieved by sharing P-DPL alerts The above two plots show the similarity scores using Zstr and among the three sites using CR and CR II and their variant LCSeq metrics. LCS produced a similar result to LCSeq. String packet fragments. The results are shown in terms of the equality and Manhattan distance metrics did not perform well similarity scores computed by each of the metrics. Each plot is in distinguishing true alerts from false ones, so their plots are composed of two different representations: one for false alerts not shown here. The other two metrics presented in Figure 5 (histogram) and the other for worm alerts (dots on the x-axis). give particularly good results. The worms and their variant The bars in the plots are histograms for the similarity scores packet fragments have much higher similarity scores than all
  • 11. ISSN: 2278 – 1323 International Journal of Advanced Research in Computer Engineering & Technology Volume 1, Issue 4, June 2012 the other alerts generated at each distinct site. This provides only consider one single host, a stealthy worm can hibernate for some evidence that this approach may work very well in a long period of time until a record of its appearance as an practice and provide reliable information that a new zero-day anomaly is no longer stored in the buffer of suspect packets. attack is ongoing at different sites. However, in the context of collaborating sites, the suspect Note too that each site can contribute to false positive anomaly can be corroborated by some other site that may also reduction since the scores of the suspects are relatively low in have a record of it in their buffer, as a remote site may have a comparison to the true worms. larger buffer or may have received the worm at a different time. Furthermore, the Zstr metric shows the best separation The distributed sites essentially serve as a remote long-term here, and with the added advantage of preserving the privacy of store of information, extending the local buffer memory the exchanged content. These two metrics can also be applied available at one site. Further, this strategy concurrently to the ingress/egress traffic correlation, especially for generates content filtering signatures. Any two sites that polymorphic worms that might re-order their content. correlate and validate suspects as being true worms both have There are two interesting observations from this data. available the actual packet content from which to generate a The circle in the LCSeq plot represents the similarity score signature, even if only Z-strings are exchanged between those when exchanging the alerts among the sites that P-DPL sites. generated for CR and CR II. LCSeq is the only metric that gave VII. Evaluation Results a relatively higher score that is worth noticing, while all the others provide less compelling scores. When we looked back at As the first step in evaluating P-DPL, we compared the tcpdump of CR and CR II, both of them contained the the two functions’ extraction methods (i.e., SM and IDA) and string: the common-function-filtering capability of the CFLs that were ―GET./default.ida?........u9090%u6858%ucbd3%u780 generated using the two methods. Fig. 8 shows the percentage 1%u9090%u6858%ucbd3%‖while CR has a string of repeated of candidate and common function corpora among all functions ―N‖, and CR II has string of repeated ―X‖ padding extracted from the malware set by the SM and IDA-Pro their content. Since subsequences do not need to be adjacent in extraction methods for each CFL size (500, 1000, 2000, 4000, the LCSseq metric, LCSeq ignored the repetitions of and 8000 files). It is evident from the diagram that SM is the unmatched ―N‖ and X substrings and successfully picked capable of extracting more functions from the malware files: out the other common substrings. LCS also had a higher-than- IDA extracted 57 929 functions, while SM extracted 249 158. average score here, but not as good as LCSeq. This example Function size was limited to a minimum of 16 B and a suggests that polymorphic worms attempting to mask maximum of 256 B.6 The figure also shows that SM is capable themselves by changing their padding may be detectable by of trimming a larger portion of functions, which appear in the cross-site collaboration under the LCSeq metric. CFL and that the portion of remaining functions becomes Another observation is that the LCSeq and LCS results smaller along with the increase in the size of the CFL. display several packet content alerts with high similarity scores. The first observation, regarding SM’s extraction These were false alerts generated by the correlation among the superiority, is consistent with IDA-Pro detecting fewer sites. The scores were measured at about 0.4 to 0.5. Although functions from the training set. This is probably due to the fact they are still much smaller than the worm scores, they are that IDA-Pro is more rigid software and cannot deal effectively already outliers since they exceeded the score threshold used in with code obfuscation, which is a prominent technique, this experiment. We inspected the content of these packets, and employed by hackers [30]. However, the SM method, by discovered that they included long padded strings attempting to nature, works on high recall (extracting as many functions as hide the HTTP headers. Some proxies try to hide the query possible), and low precision (many of the extracted function identity by replacing some headers with meaningless characters might not be really functions); thus it extracts more functions. – in our case, consisting of a string of ―Y‖s. Such The second observation, the filtering capability of the two payloads methods, can be explained straightforwardly by the fact that as were correlated as true alerts while using LCSeq/LCS as the size of the CFL grows, the likelihood increases that a metrics, although they are not worms. However, these function extracted from the malware set will appear in the CFL. anomalies did not appear when we used the Zstr metric, since For some malware files, it might be the case that all of the long string of ―Y’s‖ used in padding the HTTP header the extracted functions were identified as common functions, only influences one position in the Z-string, but has no impact and therefore, were filtered out by the CFL. In such cases, the on the remainder of the Z- string. method cannot generate a signature for the malware. Fig. 9 These results suggest that cross-sites collaboration can greatly depicts the percentage of malware that was left without help identify the early appearance of new zero-day worms candidates. The figure shows that IDA missed more malware. while reducing the false positive rates of the constituent P-DPL The reason for that is that 1) it extracts fewer functions and 2) anomaly detectors. The similarity score between worms and IDA only detects functions that are being called from other their variants are much higher than those between ―true‖ false functions using standard protocols. This may not be the case in positives (normal data incorrectly deemed anomalies), and can malware that wishes to camouflage its existence. As expected, be readily separated with high accuracy. for both methods, increasing the CFL also increases the missed When several sites on the Internet detect similar malware, but also the gap between IDA and SM narrows. Figs anomalous payloads directed at them, they can confirm and 10–13 depict the detection rate of candidate signatures and validate with each other with high confidence that an attack is signatures selected for the malware set in the control file set. underway. As we mentioned earlier, this strategy can also solve This rate serves as a measure of the false-positive detection rate the limited buffer size problem described in Section 4.3. If we
  • 12. ISSN: 2278 – 1323 International Journal of Advanced Research in Computer Engineering & Technology Volume 1, Issue 4, June 2012 of malware in benign files.We checked the false-positive rate of signature candidates in the control set files. Fig. 10 depicts the percentage of signature candidate detected in the control set as a function of the candidate’s length in bytes and CFL size. We expected that the false-positive rate would drop for longer signature candidates and for larger CFL. The length of a signature candidate affects the probability of finding the same byte sequence in an arbitrary file. Indeed, regardless of function and signature extraction techniques, (SM/IDA), short candidates caused most of the hits. Consequently, based on the diagram, we recommend using function candidates if their length is above 112 B in order to ensure a lower false-positive rate. An exception is shown with the SM with CFL size 500 MB and SM with CFL size 1000 MB, where we see a high false-positive rate for candidates that are 160–176 B. Using Fig. 9.Percentage/number of common functions versus CFL of 2000 MB and more eliminates the problematic candidate functions extracted from malware set for IDA and candidates. Additionally, we can see that using a larger CFL SM for several CFL dataset sizes. contributes considerably to lowering the false-positive rate in both function extraction methods. In Fig. 11, we compare the false-positive rate of the two function extraction methods: SM and IDA-Pro for different CFL sizes. With both extraction methods, the false-positive rate is reduced when a large CFL is used. Compared to IDA, the SM method achieves a lower false-positive rate when using CFL with a size greater than 1000 files. Next, we compare the mean false-positive rate (averaged over both SM and IDA function extraction methods) when using candidates with/without a 16-B offset (see Fig. 12) and when randomly choosing a signature or by using the entropy score see Fig. 13). As expected, adding a 16-B offset to the candidate functions and using the entropy score to choose a signature helps in reducing the percentage of signatures detected in the control set files. This was consistent for all tested CFL sizes. The entropy selection method favors large Fig 10. Malware without signature candidates—the percentage (signatures; approximately 80% of the signatures that the of malware that were left without candidates (i.e., all extracted entropy-based method selected were larger than 112 most functions were filtered by the functions in the CFL), and thus, significant improvement when the CFL size is increased. This the method cannot extract signatures. is shown by the detection rate declining from 2.7% to 0% for a CFL greater than 2000 MB. Note that detection rate in this The worse signature-generation method is IDA when context B. Selecting a candidate randomly shows that only 50% not using an offset of 16 B. However, this method, when are 112 B and more. This observation complies with the results combined with CFL containing 8000 presented in Fig. 9 regarding the recommended size of candidates. Since the entropy method evidently showed better results than Rand, we continued to investigate the signature- generating methods using only the entropy method. The goal of the next and final experiment was to show how IDA and SM methods are affected by using offset along with the signature candidate for different CFL sizes. It also provides the most significant improvement when the CFL size is increased. This is shown by the detection rate declining from 2.7% to 0% for a CFL greater than 2000 MB. Note that detection rate in this context relates to the false positives or, undesirable detection on a malware signature in benign files; thus, a lower detection rate is better. Fig. 11. False-positive rate of candidates in the control set files as a function of the candidate size in bytes.
  • 13. ISSN: 2278 – 1323 International Journal of Advanced Research in Computer Engineering & Technology Volume 1, Issue 4, June 2012 files, still manages to have a low false positive rate (FPR) of Fig. 13.Comparing the mean false-positive rate (averaged over 0.4%. Finally, we tested the signatures generated by P-DPL for SM and IDA function extraction methods) with/without adding false negatives using a DefensePro intrusion detection offset bytes to function candidates. appliance. False negatives in the context of P-DPL mean that a signature generated for a malware file was not identified in an Fig. 14. Comparing the mean false-positive rate (averaged over SM and IDA function extraction methods) when using random (Rand) signature selection and entropy-based selection. The Fig. 12. Comparing the false-positive rate for the two function entropy-based heuristic performs better than when using the extraction methods: SM and IDA-Pro (IDA) as a function of the random selection method, for all CFL sizes. CFL size. Entropy-based selection. The entropy-based heuristic instance of the same malware (e.g., as a result of a long with performs better than when using the random selection method, signature split over multiple packets). Therefore, false for all CFL sizes.Different fragmentation from given graft chart negatives depend on the detection engine. The malware over SM and IDA function extraction methods. detection capability of DefensePro is based on IP packet inspection and does not reconstruct the files. We uploaded the signatures extracted by P-DPL to the DefensePro signature database and configured the device to reset any session for which a packet was identified with a malware signature. We transmitted all malware (for which P-DPL successfully generated a signature) via the DefensePro (at the maximal speed that we could load the link) and executed several tests in which DefensePro successfully removed all malware. Additionally, we measured the time required by P- DPL to generate a signature. The extraction time of a signature as a function of the file size range from 100 - 900. A linear increase in signature extraction time as a function of the file size. VIII. Conclusion In this paper we propose a new automatic mechanism, as P-DPL for extracting signatures from malware files. Signatures generated by P-DPL are comprised of multiple byte- strings, which are used by high speed network and malware filtering devices. To minimize the risk of false positives P-DPL employs a method for creating extracting files from originate from the underlying standard development platforms and malicious programs developed by these platforms.
  • 14. ISSN: 2278 – 1323 International Journal of Advanced Research in Computer Engineering & Technology Volume 1, Issue 4, June 2012 We tested our method in a network-security laboratory based network intrusion detection systems,‖ IEEE Trans. on various configurations in terms IDA-Pro, SM the IDA-Pro, Syst. Man, Cybern.—Part C, vol. 38, no. 5, pp. 649–659, SM are fast technique for extracting functions from assembly Sep. 2008. files. However SM for new compilers done by manually this makes high prone to errors. In order to overcome this [8] A. Shabtai, D. Potashnik, Y. Fledel, R. Moskovitch, and Y. limitation, we are developed P-DPL and support viability of the Elovici, ―Monitoring, analysis and filtering system for general approach proposed by this research which suggested purifying network traffic of known and unknown that general code identified as the functions in the program can malicious content,‖ Secur. Commun. Netw. [Online]. be discarded. Realizing P-DPL in generating signatures for high DOI: speed network appliances for system methodology for building 10.1002/sec.229. common repositories. The global variety of development platforms are facilitated by the Internet, ensuring the external [9] Y. Tang, B. Xiao, and X. Lu, ―Using a bioinformatics validity of this study relies substantially on reaching a critical approach to generate accurate exploit-based signatures for mass of malware files. P-DPL is also helpful for creating and polymorphic worms,‖ Comput.Secur., vol. 28, pp. indentify allergy attacks against any signature is automatically 827– created this type of attack is mainly relevant based and 842, 2009. realistic that use machine-learning algorithms In learning-base algorithm can make the automated signature-generation method [10] Jacob, G., Debar, H., Filiol, E.: Behavioral detection of to consider malicious data. malware: from a survey towards an established taxonomy. In order to cope with these fully obfuscated malware Journal in Computer Virology 4(3) (2008) on the packet easily on the high-speed deep packet inspection devices. We believe that P-DPL should be implemented for [11] Singh, S., Estan, C., Varghese, G., Savage, S.: Automated filtering most of the malware by using high-speed malware worm fingerprinting. In: OSDI’04: Proceedings of the 6th filtering devices. We plan for detecting and extracting the conference on Symposium on Operating Systems Design binary code as well as selecting the best signature out of the & Implementation, Berkeley, CA, USA, USENIX collection of candidates using probability and variance method. Association (2004) 4–4 In regular expressions defined by two or more distinct signatures can be used in order to minimize the risk of [12] Newsome, J., Song, D.: Dynamic taint analysis for malwares. automatic detection, analysis, and signature generation of exploits on commodity software. In: Proceedings of the References Network and Distributed System Security Symposium (NDSS 2005). (2005) [1] S. B. Cho, ―Incorporating soft computing techniques into a probabilistic intrusion detection system,‖ IEEE Trans. [13] Costa, M., Crowcroft, J., Castro, M., Rowstron, A., Zhou, Syst., Man, Cybern.—Part C, vol. 32, no. 2, pp. 154–160, L., Zhang, L., Barham, P.: Vigilante: end-to-end May 2002. containment of internet worms. In: SOSP ’05: Proceedings of the twentieth ACM symposium on Operating systems [2] ssel, and P. Laskov, principles, New York, NY, USA, ACM (2005) 133–147 ―Learning and u classification of malware behavior,‖ in Proc. Conf. Detect. Intrusions Malware Vulnerability [14 ] Xu, J., Ning, P., Kil, C., Zhai, Y., Bookholt, C.: Automatic Assessment, Springer Press, 2008, pp. 108–125. diagnosis and response to memory corruption vulnerabilities. In: CCS ’05: Proceedings of the 12th ACM [3] M. Bailey, J. Oberheide, J. Andersen, Z. M. Mao, F. conference on Computer and communications security, Jahanian, and J. Nazario, ―Automated classification and New York, NY, USA, ACM (2005) 223–234 analysis of internet malware,‖ in Proc. 12th Int. Symp. Recent Adv. Intrusion Detect., Springer Press, 2007, pp. [15] M. Damashek. Gauging similarity with n-grams: language 178–197. independent categorization of text. Science, 267(5199) :843--848, 1995 [4] K. Griffin, S. Schneider, X. Hu, and T. Chiueh, ―Automatic generation of string signatures for malware detection,‖ in [16] R. Lippmann, et al. The 1999 DARPA Off-Line Intrusion Proc. 12th Int. Symp. Recent Adv. Intrusion Detect., Detection Evaluation, Computer Networks 34(4) 579-595, Springer Press, 2009, pp. 101–120. 2000. [5] G. Jacob, H. Debar, and E. Filiol, ―Behavioral detection of [17] M. Locasto, J. Parekh, S. Stolfo, A. Keromytis, T. Malkin malware: From a survey towards an established and V. Misra. Collaborative Distributed Intrusion taxonomy,‖ J. Comput. Virol. vol. 4, pp. 251–266, 2008. Detection, Columbia University Tech Report CUCS-012- 04, 2004. [6] D. Gryaznov, ―Scanners of the year 2000: Heuristics,‖ in Proc. 5th Int. Virus Bull., 1999, pp. 225–234. [18] K. Wang and S. Stolfo. Anomalous payload-based network [7] J. Zhang, M. Zulkernine, and A. Haque, ―Random-forests- intrusion detection, in Proceedings of Recent Advance in Intrusion Detection (RAID), Sept. 2004.
  • 15. ISSN: 2278 – 1323 International Journal of Advanced Research in Computer Engineering & Technology Volume 1, Issue 4, June 2012 AUTHOR DETAILS 1 N.Kannaiya Raja received MCA degree from Alagappa University and ME degree in Computer Science and Engineering from Anna University Chennai in 2007 and he is pursing PhD degree in Manonmaniam Sundranar University from 2008 and joined assistant professor in various engineering collages in Tamil Nadu affiliated to Anna University and has eight years teaching experience his research work in deep packet inspection. He has been session chair in major conference and workshops in computer vision on algorithm, network, mobile communication, image processing papers and pattern reorganization. His current primary areas of research are packet inspection and network. He is interested to conduct guest lecturer in various engineering in Tamil Nadu. 2 Dr.K.Arulanandam received Ph.D. doctorate degree in 2010 from Vinayaka Missions University. He has twelve years teaching experience in various engineering colleges in Tamil Nadu which are affiliated to Anna University and his research experience network, mobile communication networks, image processing papers and algorithm papers. Currently working in Ganadipathy Tulasi’s Jain Engineering College Vellore. 3 M.Balaji received B.Tech degree in Information Technology from Anna University Chennai in 2008 and now pursuing ME degree in Computer Science and Engineering in Arulmigu Meenakshi Amman College of Engineering affiliated to Anna University Chennai.