1. ISSN: 2278 – 1323
International Journal of Advanced Research in Computer Engineering & Technology
Volume 1, Issue 4, June 2012
A FAST POSITIVE APPROACH OF P-DPL IN THE PACKET INSPECTION
N. Kannaiya Raja (1), K. Arulanandam (2), M. Balaji (3)
Abstract-The signature extraction process is based on a comparison with a common function repository. By eliminating functions that appear in the common function repository from the signature candidate list, P-DPL can minimize the risk of false-positive detection errors. To further reduce false-positive rates, P-DPL proposes intelligent candidate selection that uses an entropy score to generate signatures. P-DPL was evaluated under various conditions. The findings suggest that the proposed method can be used for automatically generating signatures that are both specific and sensitive. In this paper we propose a new automatic mechanism, termed P-DPL, for extracting signatures from malware files and unwanted mapping files. Signatures generated by P-DPL consist of multiple byte-strings, which can be used by high-speed, network-based malware filtering devices. In order to minimize the risk of false positives (i.e., detection of a malware signature in benign executable files), P-DPL employs a method for sanitizing executable files of chunks of code that originate from the underlying standard development platforms and are replicated in various instances of benign and malicious programs developed with these platforms. With this method we have developed a new way to find malicious data in the packet. Another direction we intend to examine is the use of a malware function library (MFL) in the signature generation process in order to further strengthen the signatures and minimize the risk of false positives. In addition, regular expressions defined by two or more distinct signatures can be used in order to further minimize the risk of false positives.

Key words-Packet-Deployment payload (P-DPL), Automatic signature generation (ASG), malware, malware filtering.

I. INTRODUCTION
Communication systems are highly susceptible to various types of attack. A common way of carrying out these attacks is by means of malicious software, such as worms, viruses, and Trojan horses. When it spreads, it can cause severe problems for users, companies, and governments. The development of high-speed Internet connections makes it easier to create new malware and spread it rapidly. Several techniques for detecting and removing malware have been proposed. They fall into two types: static and dynamic. Dynamic detection is based on information collected from the operating system during execution of the program (system calls, network access, and file and memory modifications) [1]–[3]. Static detection is based on information extracted explicitly or implicitly from the executable or its source code. The main advantage of static detection is rapid classification. Since antivirus vendors handle an overwhelming number of suspect files for inspection every day [4], fast detection is essential. Static analysis solutions are mainly implemented using two methods: signature-based and heuristic-based. Signature-based methods rely on finding unique strings in the code [4]. Heuristic-based methods rely on rules, determined either by expert staff or by machine, that specify malicious behavior [5], [6]. As a case in point, Zhang et al. [7] applied the random forest data-mining algorithm to detect misuse and anomalous network intrusions.
The period from the release of an unknown malware until security software/hardware vendors update their clients with the proper malware signature is extremely critical. During this time, the malware is undetectable by most signature-based solutions and is usually termed a zero-day attack. Because such malware can easily spread and corrupt machines, it is essential to detect it as soon as possible so that signature-based solutions can generate a suitable signature to block the threat. Organizations defend themselves by blocking all types of malware. Malware filtering devices perform deep packet inspection against all signatures to detect and remove attacks such as worms, denial-of-service, or remote exploitation of vulnerabilities, and they monitor the network without degrading its performance. These devices analyze the content of the packets. This paper concerns the process of generating unique signatures for malware filtering devices.
Various methods for automatic signature generation have been proposed in this domain. Most techniques focus on worms, where the signature is extracted after the malware has been executed, in the course of launching the attack. Different methods are used to extract signatures from full-fledged malware executables, which may contain a significant portion of code emanating from development tools and platforms. In this research we identify these problems and evaluate an automatic signature generation technique, P-DPL.
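As an illustration of the candidate-selection idea stated in the abstract (drop candidates that appear in the common function repository, then rank the rest by an entropy score), the following is a minimal sketch, not the authors' implementation. The names `shannon_entropy` and `select_candidates` are hypothetical, functions are matched by exact byte equality, and preferring higher-entropy candidates is an assumption made here for illustration.

```python
import math
from collections import Counter
from typing import Iterable, List, Set


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 for empty input)."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def select_candidates(functions: Iterable[bytes],
                      common_repository: Set[bytes],
                      top_k: int = 3) -> List[bytes]:
    """Remove functions found in the common repository, then rank by entropy.

    Assumption: a higher entropy score marks a better signature candidate.
    """
    unique = [f for f in functions if f not in common_repository]
    return sorted(unique, key=shannon_entropy, reverse=True)[:top_k]


if __name__ == "__main__":
    common = {b"\x55\x8b\xec\x5d\xc3"}            # hypothetical library function
    funcs = [b"\x55\x8b\xec\x5d\xc3", b"\x00" * 16, bytes(range(16))]
    for cand in select_candidates(funcs, common, top_k=2):
        print(cand.hex(), round(shannon_entropy(cand), 2))
```

A real repository would more likely be keyed by a hash or a normalized form of each function rather than by raw bytes; exact matching is used here only to keep the sketch short.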
(1) N. Kannaiya Raja, M.E., (Ph.D.), A.P/CSE Dept., Arulmigu Meenakshi Amman College of Engg, Thiruvannamalai Dt, near Kanchipuram. Kanniya13@hotmail.co.in
(2) Dr. K. Arulanandam, Prof & Head, CSE Department, Ganadipathy Tulsi’s Jain Engineering College, Vellore. sakthsivamkva@gmail.com
(3) M. Balaji, M.E., Arulmigu Meenakshi Amman College of Engg, Thiruvannamalai Dt, near Kanchipuram. mbalaji23@gmail.com

Fig 1 P-DPL creation and signature generating processes
P-DPL creates multiple-string signatures that can be used in intrusion detection systems for filtering malware. To improve its precision, P-DPL employs a
complete and structured method, which extracts the malware’s unique code from other segments of common and usually benign code, such as library files. When the sanitization process ends, the remaining code is the malicious code. The process continues by generating a unique signature from the malicious code, which can be used for removing the malware. The main objective of this research is creating signatures from malware, spyware, Trojan horses, worms and viruses. The main assumption is that, in a preceding step, suspected files are classified as benign or malicious by a human expert or by an automated detection tool. This assumption allows us to focus on the signature-generation process, but it also makes the quality of the signatures depend on the accuracy of the classification of suspicious files. P-DPL was used as the automatic signature generation (ASG) module of the eDare (early detection, alert, and response) framework [8]. eDare is aimed at mitigating the spread of both known and unknown malware in computer networks. eDare operates by first monitoring network traffic and filtering out known malware using high-speed filtering devices that are continuously updated with signatures generated by P-DPL. Next, unknown files are extracted from the remaining traffic and examined using various machine-learning and temporal reasoning methods in order to classify the files as malicious or benign. P-DPL is applied in the last step to extract signatures from newly detected malicious files. When eDare identifies a new threat, P-DPL automatically produces a signature, and the filtering devices stationed on the network infrastructure are then automatically updated. Because this process is fast, and faster than one that requires human intervention, it is effective against zero-day attacks. This paper presents the P-DPL technique and a set of experiments that were performed on a collection of malicious and benign executables. Our work focuses on determining the length of a signature and selecting a signature among several candidates.

II. Related Works
A signature must be general enough to capture as many instances of the malware as possible, yet sufficiently specific to avoid overlapping with the content of normal traffic, in order to minimize false positives. Malware signatures can be classified as vulnerability-based, exploit-based and payload-based [9]. A vulnerability-based signature describes the properties of a certain bug in the system that can be maliciously exploited by the malware. Vulnerability-based signatures do not attempt to detect each and every piece of malicious code exploiting the vulnerability, so they are very effective when dealing with polymorphic malware. However, a vulnerability-based signature can be generated only once the vulnerability has been found. An exploit-based signature describes a sequence of commands triggered by the malware, which exploits a vulnerability in the system. Exploit-based methods include Autograph, the P-DPL sensor and Netspy, which focus on analyzing similarities in packet payloads on the network. These systems first identify abnormal traffic originating from suspicious IP addresses and then generate a signature by identifying the most frequently occurring byte sequences. The Nemean architecture first clusters similar sessions and then uses machine-learning techniques to generate semantic-aware signatures for each cluster. Polygraph expands the notion of single-substring signatures to conjunctions of tokens and to ordered sets of multiple tokens that match multiple variants of polymorphic worms. Honeycomb overlays parts of the flows in the traffic and uses a longest common substring (LCS) algorithm to spot similarities in packet payloads. A subsequent design used a double-honeypot system and introduced position-aware distribution signatures (PADS), which are computed from polymorphic worm samples and are composed of a byte frequency distribution, instead of a fixed value, for each position in the signature "string." Tang et al. [11] use sequence alignment techniques, drawn from bioinformatics, to derive simplified regular-expression exploit-based signatures.
Exploit-based signatures can be generated quickly to detect zero-day exploits of undisclosed vulnerabilities. However, they are less effective against polymorphic malware. In addition, the signatures created by the above techniques were extracted and tested for short worm malware, whereas malware such as viruses and Trojan horses can be large executable files consisting of full-fledged applications. These files usually contain a significant portion of code segments that are inserted by the software development platform spawning the malware. For such large malware files, selecting a signature that will be both sensitive and specific is challenging. Another limitation of these techniques is that they focus on detecting malware after it has been unleashed and try to generate a signature from the traffic it creates while the attack is in progress.
A payload-based signature identifies the malware code itself. This paper falls into the payload-based signature category. Payload-based signature generation methods are presented in [4]. One is a two-step statistical method for automatically extracting the "best" signatures from the code of a malware. First, programs on isolated machines are intentionally infected with the virus. The infected portions of the programs are compared with one another to find the regions of the virus that are constant from one instance to another. These regions are considered signature candidates. The second phase estimates the probability that each candidate signature will match a randomly chosen piece of benign code. The candidate with the lowest estimated false-positive probability is selected as the signature. The Hancock system [4] was proposed for automatically extracting signatures for antivirus software. Based on several heuristics, the Hancock system generates a set of signature candidates, selecting the candidates that are not likely to be found in benign code. Like our approach, Hancock relies on modeling benign code in order to minimize false-alarm risks. The Auto-Sign signature generator modeled both benign and malicious code using a byte 3-gram representation in order to select good signature candidates. Next, the signature candidates are ranked according to three different measures in order to select the best signature. However, the Hancock system and Auto-Sign differ from our approach, which is semantic-aware in the sense that it does not rely on arbitrary byte-code sequences but on the code representing internal functions of the software. In addition, the methods presented in [4] focus on generating signatures for antivirus software, so the limitation of signature length is not necessarily considered.
Other solutions have been proposed for protecting systems and preventing an attack beforehand rather than detecting the attack after it has been launched. This can be done by generating signatures based on sequences of instructions that represent malicious or benign behavior. These sequences can be extracted either by statically analyzing the program after disassembly or
by monitoring the program during execution. For example, protecting a system from buffer overflow attacks can be achieved by: 1) creating signatures for legitimate instruction blocks and matching instruction sequences of monitored programs with the signature repository; 2) obfuscating pointers in such a way that a malicious application that tries to exploit a buffer overflow vulnerability will not be able to create valid pointers; or 3) applying array and pointer boundary checking. As opposed to such methods, our goal in this research is to generate signatures for high-speed traffic filtering devices that do not rely on installation or modification of end points and that will protect the end points at the network level. In summary, each of the aforementioned techniques suffers from at least one critical limitation. Some rely on small and coherent malware files, but such files may not constitute the general case. Other techniques rely on observing malware behavior, but such malware cannot always be fully monitored. Other methods search for similarities among packets, which does not guarantee a low false-positive rate. Our method makes no assumption about malware size. In addition, it does not require activating the malware.
Modern anti-virus software typically employs a variety of methods to detect malware programs, such as signature-based scanning, heuristic-based detection, and behavioral detection [10]. Although less proactive, signature-based malware scanning is still the most prevalent approach to identify malware because of its efficiency and low false positive rate. Traditionally, malware signatures are created manually, which is both slow and error-prone. As a result, efficient generation of malware signatures has become a major challenge for anti-virus companies that must handle the exponential growth of unique malware files. To solve this problem, several automatic signature generation approaches have been proposed.
Most previous work focused on creating signatures that are used by Network Intrusion Detection Systems (NIDS) to detect network worms. Singh et al. proposed EarlyBird [11], which used packet content prevalence and address dispersion to automatically generate worm signatures from the invariant portions of worm payloads. Autograph exploited a similar idea to create worm signatures by dividing each suspicious network flow into blocks terminated by some breakmark and then analyzing the prevalence of each content block. The suspicious flows are selected by a port-scanning flow classifier to reduce false positives. Kreibich and Crowcroft developed Honeycomb, a system that uses honeypots to gather inherently suspicious traffic and generates signatures by applying the longest common substring (LCS) algorithm to search for similarities in the packet payloads. One potential drawback of signatures generated by these approaches is that they are all continuous strings and may fail to match polymorphic worm payloads. Polygraph instead searched for invariant content in the network flows and created signatures consisting of multiple disjoint content substrings. Polygraph also utilized a naive Bayes classifier to allow probabilistic matching and classification, and thus provided better proactive detection capabilities. Another system used a model-based algorithm to analyze the invariant contents of polymorphic worms and analytically prove the attack-resilience of generated signatures. PADS (Position-Aware Distribution Signatures) took advantage of a statistical anomaly-based approach to improve the resilience of signatures to polymorphic malware variants. Another common method for detecting polymorphic malware is to incorporate semantics awareness into signatures. For example, Christodorescu et al. proposed static semantics-aware malware detection. They applied a matching algorithm on the disassembled binaries to find the instruction sequences that match the manually generated templates of malicious behaviors, e.g., a decryption loop. Nemean is a framework for automatic generation of intrusion signatures from honeynet packet traces. It applied clustering techniques on connections and sessions to create protocol-semantic-aware signatures, thereby reducing the possibility of false alarms.
Another loosely related area is the automatic generation of attack signatures, vulnerability signatures and software patches. TaintCheck [12] and Vigilante [13] applied taint analysis to track the propagation of network inputs to data used in attacks, e.g., jump addresses, format strings and system call arguments, which are used to create signatures for the attacks. Other heuristic-based approaches [14] have also been proposed to exploit properties of specific exploits (e.g., buffer overflow) and create attack signatures. Generalizing from these approaches, Brumley et al. proposed a systematic method that used a formal model to reason about vulnerability signatures and quantify the signature qualities. An alternative approach to preventing malware from exploiting vulnerabilities is to apply data patches in the firewalls to filter malicious traffic. To generate data patches automatically, one approach leveraged knowledge of the data format of malicious attacks to generate potential attack instances and then created signatures from the instances that successfully exploit the vulnerabilities.
Hancock differs from previous work by focusing on automatically generating high-coverage string signatures with extremely low false positives. Our research was based loosely on the virus signature extraction work that was commercially used by IBM. They used a 5-gram Markov chain model of good software to estimate the probability that a given byte sequence would show up in good software. They tested hand-generated signatures and found that it was quite easy to set a model probability threshold with a zero false positive rate and a modest false negative rate (the fraction of rejected signatures that would not be found in goodware) of 48%. They also generated signatures from assembly code (as Hancock does), rather than data, and identified candidate signatures by running the malware in a test environment. Hancock does not do this, as dynamic analysis is very slow in large-scale applications. Symantec acquired this technology from IBM in the mid-90s and found that it led to many false positives. The Symantec engineers believed that it worked well for IBM because IBM’s anti-virus technology was used mainly in corporate environments, making it much easier for IBM to collect a representative set of goodware. By contrast, signatures generated by Hancock are mainly for home users, who have a much broader set of goodware. The model’s training set cannot possibly contain, or even represent, all of this goodware. This poses a significant challenge for Hancock in avoiding FP-prone signatures.

III. Payload Based Anomaly Detection

A. Overview of the P-DPL Sensor:
The P-DPL sensor is based on the principle that zero-day attacks are delivered in packets whose data is unusual and distinct from all prior "normal content" flowing to or from the victim site. We assume that the packet content is available to the sensor for modeling. We compute a normal profile of a site’s unique content flow, and use this information to detect anomalous data. A "profile" is a model or a set of models that represent the set of data seen during training. Since we are profiling content data flows, the method must be general enough to work across all sites and all services, and it must be efficient and accurate. Our initial design of P-DPL uses a "language independent" methodology, the statistical distribution of n-grams [15] extracted from network packet datagrams. This methodology requires no parsing, no interpretation and no emulation of the content.
An n-gram is the sequence of n adjacent byte values in a packet payload. A sliding window with width n is passed over the whole payload one byte at a time and the frequency of each n-gram is computed. This frequency count distribution represents a statistical centroid or model of the content flow. The normalized average frequency and the variance of each gram are computed. The first implementation of P-DPL uses the byte value distribution when n=1. The statistical means and variances of the 1-grams are stored in two 256-element vectors. However, we condition a distinct model on the port (or service) and on packet length, producing a set of statistical centroids that in total provides a fine-grained, compact and effective model of a site’s actual content flow. Full details of this method and its effectiveness are described in [18].
To compare the similarity between test data at detection time and the trained models computed during the training period, P-DPL uses simplified Mahalanobis distance [18]. Mahalanobis distance weights each variable, the mean frequency of a 1-gram, by its standard deviation and covariance. The distance values produced by the models are then subjected to a threshold test. If the distance of a test datum is greater than the threshold, P-DPL issues an alert for the packet. There is a distinct threshold setting for each centroid, computed automatically by P-DPL during a calibration step.
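The 1-gram model and threshold test described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the sensor's actual code: the function names are hypothetical, a single centroid is trained (no conditioning on port or payload length), and the distance uses the commonly cited simplified form, the sum over byte values of |x_i - mean_i| / (std_i + alpha), where the smoothing constant `alpha` is an assumption that does not appear in this text.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def byte_distribution(payload: bytes) -> List[float]:
    """Relative frequency of each byte value (1-gram) in one payload."""
    counts = [0] * 256
    for b in payload:
        counts[b] += 1
    total = len(payload) or 1  # avoid division by zero for an empty payload
    return [c / total for c in counts]


def train_centroid(payloads: Sequence[bytes]) -> Tuple[List[float], List[float]]:
    """Per-byte mean and standard deviation: two 256-element vectors.

    Requires at least one training payload.
    """
    dists = [byte_distribution(p) for p in payloads]
    means = [mean(d[i] for d in dists) for i in range(256)]
    stds = [pstdev([d[i] for d in dists]) for i in range(256)]
    return means, stds


def simplified_mahalanobis(payload: bytes, means: List[float],
                           stds: List[float], alpha: float = 0.001) -> float:
    """Simplified distance; alpha (assumed) keeps the denominator non-zero."""
    x = byte_distribution(payload)
    return sum(abs(x[i] - means[i]) / (stds[i] + alpha) for i in range(256))


def is_anomalous(payload: bytes, means: List[float], stds: List[float],
                 threshold: float) -> bool:
    """Threshold test: alert when the distance exceeds the centroid's threshold."""
    return simplified_mahalanobis(payload, means, stds) > threshold


if __name__ == "__main__":
    training = [b"GET /index.html HTTP/1.0", b"GET /about.html HTTP/1.0"]
    means, stds = train_centroid(training)
    print(simplified_mahalanobis(b"GET /news.html HTTP/1.0", means, stds))
    print(simplified_mahalanobis(b"\x90" * 24, means, stds))
```

In this sketch a payload made of byte values never seen in training gets a very large distance, because the corresponding standard deviations are zero and only `alpha` remains in the denominator.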
The first packet of CodeRed II (CRII) illustrates the 1-gram data representation implemented in P-DPL. Figure 2 shows a portion of the CRII packet, and its computed byte value distribution along with the rank ordered distribution is displayed in Figure 3, from which we extract a Z-string. The Z-string is the string of distinct bytes whose frequency in the data is ordered from most frequent to least, serving as representative of the entire distribution, ignoring those byte values that do not appear in the data. The rank ordered distribution appears similar to the Zipf distribution, and hence the name Z-string. The Z-string representation provides a privacy-preserving summary of payload that may be exchanged between domains without revealing the true content. Z-strings are not used for detection, but rather for message exchange and cross domain correlation of alerts.

GET./default.ida?XXXXXXXXXXX x
XXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXX
XXXXXX%u9090%u6858%ucbd3%u7
801%u9090%u6858%ucbd3%u7801%u
9090%u6858%ucbd3%u7801%u9090%
u9090%u8190%u00c3%u0003%u8b000
%u531b%u53ff%u0078%u0000%u0 u0
Fig. 2. A portion of the first packet of CodeRed II

Fig. 3. CRII payload distribution (top plot) and its rank order distribution (bottom plot)

To calibrate the sensor, a sample of test data is measured against the centroids and an initial threshold setting is chosen. A subsequent round of testing of new data updates the threshold settings to calibrate the sensor to the operating environment. Once this step converges, P-DPL is ready to enter detection mode. Although the very initial results of testing P-DPL looked quite promising, we devised several improvements to the modeling technique to reduce the percentage of false positives.

B. New P-DPL Features: Multiple Centroids
P-DPL is a fully automatic, "hands-free" online anomaly detection sensor. It trains models and determines when they are stable; it is self-calibrating, automatically observes itself, and updates its models as warranted. The most important new features implemented in P-DPL over our prior work are the use of multiple centroids and ingress/egress correlation. In the first implementation, P-DPL computes one centroid per length bin, followed by a stage of clustering similar centroids across neighboring bins. We previously computed a model M_ij for each specific observed packet payload length i of each port j. In this newer version, we compute a set of models M^k_ij, k ≥ 1. Hence, within each length bin, multiple models are computed prior to a final clustering stage. The clustering is now executed across centroids within a length bin and then across neighboring bins, which reduces the memory requirements for models while representing normal content flow more accurately and revealing anomalous data with
greater clarity. Since there might be different types of payload sent to the same service, e.g., pure text, .pdf, or .jpg, we used an incremental online clustering algorithm to create multiple centroids to model the traffic with finer granularity. This modeling idea can be extended to include centroids for different media that may be transmitted in packet flows. Different file and media types follow their own characteristic 1-gram distribution; including models for standard file types can help reduce false positives. The multi-centroid strategy requires a different test methodology. During testing, an alert will be generated by P-DPL if a test packet matches none of the centroids within its length bin. The multi-centroid technique produces more accurate payload models and separates the anomalous payloads in a more precise manner.

C. Data Diversity across Sites
A crucial issue we study is whether or not payload models are truly distinct across multiple sites. This is an important question in a collaborative security context. We have claimed that the monoculture problem applies not only to common services and applications, but also to security technologies. Hence, if a site is blind to a zero-day attack, this implies that many other sites are blind to the same attack. Researchers are considering solutions to the monoculture problem by various techniques that "diversify" implementations. We conjecture that the content data flow among different sites is already diverse even when running the exact same services. In our previous work we have shown that byte distributions differ for each port and length. We also conjecture that it should be different for each host. For example, each web server contains different URLs, implements different functionality like web email or media uploads, and the population of service requests and responses sent to and from each site may differ, producing a diverse set of content profiles across all collaborating hosts and sites. Hence, each host or site’s profile will be substantially different from all others. A zero-day attack that may appear as normal data at one site will likely not appear as normal data at other sites, since the normal profiles are different. We test whether or not this conjecture is true by several experiments.
One of the most difficult aspects of doing research in this area is the lack of real-world datasets available to researchers that have full packet content for formal scientific study. Privacy policies typically prevent sites from sharing their content data. However, we were able to use data from three sources, and show the distribution for each. The first one is an external commercial organization that wishes to remain anonymous, which we call EX. The others are the two web servers of the CS Department of Columbia, www.cs.columbia.edu and www1.cs.columbia.edu. We call these two data sets W and W1, respectively. The following plots show the profiles of the traffic content flow of each site. The plots display the payload distributions for different packet payload lengths, i.e., 249 bytes and 1380 bytes, spanning the whole range of possible payload lengths in order to give a general view of the diversity of the data coming from the three sites. Each byte distribution corresponds to the first centroid that is built for the respective payload lengths.

Fig. 4. Example byte distribution for payload length 249 of port 80 for the three sites EX, W, W1, in order from top to bottom (x-axis: ASCII characters 0-255)

Fig. 5. Example byte distribution for payload length of 1380 of port 80 for the three sites (recovered panel titles: EX-data Length 1380, W1-data Length 1380; x-axis: ASCII characters 0-255)

We observe from the above plots that there is a visible difference in the byte distributions among the sites for the same length bin. This is confirmed by the values of Manhattan distances computed between the distributions, with results displayed in Table 1.
The content traffic among the sites is quite different. For example, the EX dataset is more complex, containing file uploads of different media types (pdf, jpg, ppt, etc.) and webmail traffic; the W dataset contains less of this type of traffic, while W1 is the simplest, containing almost no file uploads. Hence, each of the site-specific payload models is diverse, increasing the likelihood that a worm payload will be detected by at least one of these sites. To avoid detection, the worm exploit would have to be padded in such a way that its content description would appear to be normal concurrently for all of these sites.

Table 1. The Manhattan distance between the byte distributions
of the profiles computed for the three sites, for three length bins.

              249 bytes   940 bytes   1380 bytes
MD(EX, W)     0.4841      0.6723      0.2533
MD(EX, W1)    0.3710      0.8120      0.4962
MD(W, W1)     0.3689      0.5972      0.6116

Mimicry attacks are possible if the attacker has access to the same information as the victim. In the case of application payloads, attackers (including worms) would not know the distribution of the normal flow to their intended victim. The attacker would need to sniff each site for a long period of time and analyze the traffic in the same fashion as the detector described herein, and would also then need to figure out how to pad their poison payload to mimic the normal model. This is a daunting task for the attacker, who would have to be clever indeed to guess the exact distribution as well as the threshold logic to deliver attack data that would go unnoticed. Additionally, any attempt to do this via probing, crawling or other means is very likely to be detected.
Besides mimicry attacks, clever worm writers may figure out a way to launch "training attacks" against anomaly detectors such as P-DPL. In this case, the worm may send a stream of content with increasing diversity to its next victim site in order to train the content sensor to produce models where its exploit no longer would appear anomalous. This as well is a daunting task for the worm. The worm would be fortunate indeed to launch its training attack when the sensor is in training mode and to have a stream of diverse data go unnoticed while the sensor is in detection mode. Furthermore, the worm would have to be extremely lucky that each of the content examples it sends to train the sensor would produce a "non-error" response from the intended victim. Indeed, P-DPL ignores content that does not produce a normal service response. These two evasion techniques, mimicry and training attack, are part of our ongoing research on anomaly detection, and a formal treatment of the range of "counter-evasion" strategies we are developing is beyond the scope of this paper.

D. Worm Detection Evaluation
We first created a clean set of packets free of any known worms still flowing on the Internet as background radiation. We then inserted the same set of worm traffic into the cleaned test set using tcpslice. Thus, we created ground truth in order to compute the accuracy and false positive rates.
The worm set includes CodeRed, CodeRed II, WebDAV, and a worm that exploits the IIS Windows media service, the nsiislog.dll buffer overflow vulnerability (MS03-022). These worm samples were collected from real traffic as they appeared in the wild, from both our own dataset and from a third party. Because P-DPL only considers the packet payload, the worm set is inserted at random places in the test data. The ROC plots in Figure 6 show the result of the detection rate versus false positive rate over varying threshold settings of the P-DPL sensor.

Fig. 6 ROC of P-DPL detecting incoming worms, false positive rate restricted to less than 0.5% (x-axis: False Positive Rate (%); plot annotation: All worms reliably detected)

The detection rate and false positive rate are both based on the number of packets. The test set contains 40 worm packets although there are only 4 actual worms in our zoo. The plots show the results for each data set, where each graphed line is the detection rate of the sensor where all 4 worms were detected. (This means more than half of each worm’s packets were detected as anomalous content.) From the plot we can see that although the three sites are quite different in payload distribution, P-DPL can successfully detect all the worms at a very low false positive rate. To provide a concrete
In this section, we provide experimental evidence of the example we measured the average false alerts per hour for these
effectiveness of P-DPL to detect incoming worms. In our three sites. For 0.1% false positive rate, the EX dataset has 5.8
previous RAID paper [18], we showed P-DPL’s accuracy for alerts per hour, W1 has 6 alerts per hour and W has 8 alerts per
the DARPA99 dataset, which contains a lot of artifacts that hour.
make the data too regular [16]. Here we report how P-DPL We manually checked the packets that were deemed
performs over the three real-world datasets using known worms false positives. Indeed, most of these are actually quite
available for our research. Since all three datasets were anomalous containing very odd abnormal payload. For
captured from real traffic, there is no ground truth, and example, in the EX dataset, there are weird file uploads, in one
measuring accuracy was not immediately possible. We thus case a whole packet containing nothing but a repetition of a
needed to create test sets with ground truth, and we applied character with byte value E7 as part of a word file. Other
Snort for this purpose. packets included unusual HTTP Get requests, with the referrer
Each dataset was split into two distinct chrono field padded with many ―Y‖ characters via product providing
logically-ordered portions, one for training and the other for anonymization.
testing, following the 80%-20% rule. For each test dataset, we We note that some worms might fragment their
content into a series of tiny packets to evade detection. For this problem, P-DPL buffers and concatenates very small packets of a session prior to testing.

We also tested the detection rate of the W32.Blaster worm (MS03-026) on TCP port 135 using real RPC traffic inside Columbia's CS department. Despite the traffic being much more regular than HTTP traffic, the worm packets in each case were easily detected with zero false positives. Although at first blush 5-8 alerts per hour may seem too high, a key contribution of this paper is a method to correlate multiple alerts to extract true worm events from the stream of alerts.

IV. Worm Propagation Detection and Signature Generation by Correlation

In the previous section, we described the results using P-DPL to detect anomalous packet content. We extended the detection strategy to model both inbound and outbound traffic from a protected host, computing models of content flows for ingress and egress packets. The strategy thus implies that within a protected LAN, some infected internal host will begin a propagation, sending outbound anomalous packets. When this occurs for any host in the LAN, we wish to inoculate all other hosts by generating and distributing worm packet signatures to other hosts for content filtering.

We leverage the fact that self-propagating worms will start attacking other machines automatically by replicating themselves, or at least the exploit portion of their content, shortly after a host is infected. (Polymorphic worms may randomly pad their content, but the exploit should remain intact.) Thus if we detect anomalous egress packets to port i that are very similar to the anomalous ingress traffic to port i, there is a high probability that a worm that exploits the service at port i has started its propagation. Note that these are the very first packets of the propagation, unlike other approaches which have to wait until the host has already shown substantial amounts of unusual scanning and probing behavior. Thus, the worm may be stopped at its very first propagation attempt from the first victim even if the worm attempts to be slow and stealthy to avoid detection by probe detectors. We describe the ingress/egress correlation strategy in the following section. We note, however, that the same strategy can be applied to ingress packets flowing from arbitrary (external) sources to internal target IPs. Hence, ingress/ingress anomalous packet correlation may be viewed as a special case of this strategy.

Careful treatment of port-forwarding protocols and services, such as P2P and NTP (port 123), is required to apply this correlation strategy; otherwise normal port forwarding may be misinterpreted as worm propagation. Our work in this area involves two strategies, truncation of packets (focusing on control data) and modeling of the content of media. This work is beyond the scope of this paper due to space limitations, and will be addressed in a future paper.

A. Ingress and Egress Traffic Correlation

When P-DPL detects some incoming anomalous traffic to port i, it generates an alert and places the packet content on a buffer list of "suspects". Any outbound traffic to port i that is deemed anomalous is compared to the buffer. The comparison is performed against the packet contents and a string similarity score is computed. If the score is higher than some threshold, we treat this as possible worm propagation and block or delay this outgoing traffic. This is different from the common quarantining or containment approaches which block all the traffic to or from some machine. P-DPL will only block traffic whose content is deemed very suspicious, while all other traffic may proceed unabated, maintaining critical services.

There are many possible metrics which can be applied to decide the similarity of two strings. The approaches we have considered, tested and evaluated include:

1. String equality (SE)
   This is the most intuitive approach. We decide that propagation has started only if the egress payload is exactly the same as the ingress suspect packet. This metric is very strict and good at reducing false positives, but too sensitive to any tiny change in the packet payload. If the worm changes a single byte or just changes its packet fragmentation, the anomalous packet correlation will miss the propagation attempt. (The same is true when comparing thumbprints of content.)
2. Longest common substring (LCS)
   The next metric we considered is the LCS approach. LCS is less exact than SE, but avoids the fragmentation problem and other small payload manipulations. The longer the LCS that is computed between two packets, the greater the confidence that the suspect anomalous ingress/egress packets are similar. The main shortcoming of this approach is its computation overhead compared to string equality, although it can also be implemented in linear time.
3. Longest common subsequence (LCSeq)
   This is similar to LCS, but the longest common subsequence need not be contiguous. LCSeq has the advantage of being able to detect polymorphic worms, but it may introduce more false positives.

For each pair of strings that are compared, we compute a similarity score: the higher the score, the more similar the strings are to each other. For SE, the score is 0 or 1, where 1 means equality. For both LCS and LCSeq, we use the percentage of the LCS or LCSeq length out of the total length of the candidate strings. Suppose string s1 has length L1, string s2 has length L2, and their LCS/LCSeq has length C. We compute the similarity score as 2*C/(L1 + L2). This normalizes the score to the range [0, 1], where 1 means the strings are exactly equal.

Since we may have to check each outgoing packet (to port i) against possibly many suspect strings inbound to port i, we need to concern ourselves with the computational costs and storage required for such a strategy. On a real server machine, e.g., a web server, there are large numbers of incoming requests but very few, if any, outgoing requests to port 80 from the server (to other servers). So any outgoing request is already quite suspicious, and we should compare each of them against the suspects. If the host machine is used as both a server and a client simultaneously, then both incoming and outgoing requests may occur frequently. This is mitigated somewhat by the fact that we check only packets deemed anomalous, not every possible packet flowing to and from a machine. We apply the same modeling technique to the outgoing traffic and only compare the egress traffic we already labeled as anomalous.
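The three similarity scores used for ingress/egress correlation (SE, LCS and LCSeq, with the normalization 2*C/(L1 + L2)) can be illustrated with a minimal Python sketch. The function names are hypothetical, and the quadratic dynamic-programming routines are chosen here only for brevity; they are not claimed to be the implementation used in P-DPL. Returning 0.0 for two empty payloads is likewise an assumption.

```python
def lcs_substring_len(a: bytes, b: bytes) -> int:
    """Length of the longest common *contiguous* substring (O(len(a)*len(b)) DP)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1      # extend the run ending at (i-1, j-1)
                best = max(best, cur[j])
        prev = cur
    return best


def lcseq_len(a: bytes, b: bytes) -> int:
    """Length of the longest common subsequence (need not be contiguous)."""
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
            else:
                cur[j] = max(prev[j], cur[j - 1])
        prev = cur
    return prev[len(b)]


def similarity(a: bytes, b: bytes, metric: str) -> float:
    """Similarity score in [0, 1]; `metric` is "SE", "LCS" or "LCSeq"."""
    if metric == "SE":
        return 1.0 if a == b else 0.0
    total = len(a) + len(b)
    if total == 0:
        return 0.0                            # assumption: empty payloads never match
    c = lcs_substring_len(a, b) if metric == "LCS" else lcseq_len(a, b)
    return 2 * c / total                      # 2*C/(L1 + L2)


# Toy payloads that differ in a single byte.
s1, s2 = b"abcdef", b"abcxef"
scores = {m: similarity(s1, s2, m) for m in ("SE", "LCS", "LCSeq")}
```

For these toy strings, SE scores 0 because one byte differs, LCS finds the 3-byte run "abc" and LCSeq finds the 5-byte subsequence "abcef", which reflects the ordering of strictness described in the list of metrics.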
B. Automatic Worm Signature Generation

There is another very important benefit that accrues from the ingress/egress packet content correlation and string similarity comparison: automatic worm signature generation. The computation of the similarity score produces the matching substring or subsequence which represents the common part of the ingress and egress malicious traffic. This common subsequence serves as a signature content filter. Ideally, a worm signature should match worms and only worms. Since the traffic being compared is already judged as anomalous and has exhibited propagation behavior quite different from normal behavior, and since the similar malicious payload is being sent to the same service at other hosts, these common parts are very possibly core exploit strings and hence can represent the worm signature. By using LCSeq, we may capture even polymorphic worms, since the core exploit usually remains the same within each worm instance even though it may be reordered within the packet datagram. Thus, by correlating the ingress and egress malicious payload, we are able to detect the very initial worm propagation and compute its signature immediately. Further, if we distribute these strings to collaborating sites, they too can leverage the added benefit of corroborating suspects they may have detected, and they may choose to employ content filters, preventing them from being exploited by a new zero-day worm.

V. Evaluations

In this section, we evaluate the performance of ingress/egress correlation and the quality of the automatically generated signatures. Since none of the machines were attacked by worms during our data collection time at the three sites, we launched real worms against unpatched Windows 2000 machines in a controlled environment. For testing purposes, the packet traces of the worm propagation were merged into the three sites' packet flows as if the worm infection actually happened at each site. Since P-DPL only uses payload, the source and target IP addresses of the merged content are irrelevant. Without a complete collection of worms, and with limited capability to attack machines, we only tested CodeRed and CodeRed II out of the executable worms we collected. After launching these in our test environment and capturing the packet flow trace, we noticed interesting behavior: after infection, these two worms propagate with packets fragmented differently than the ones that initially infected the host. In particular, CodeRed can separate "GET.", "/default.ida?" and "NNN...N" into different packets to avoid detection by many signature-based IDSes. The following table shows the length sequences of different packet fragmentation for CodeRed and CodeRed II.

Different fragmentation for CR and CRII

Code Red (total 4039 bytes)
  Incoming            Outgoing
  1448, 1448, 1143    4, 13, 362, 91, 1460, 1460, 649
                      4, 375, 1460, 1460, 740
                      4, 13, 453, 1460, 1460, 649

Code Red II (total 3818 bytes)
  Incoming            Outgoing
  1448, 1448, 922     1460, 1460, 898

To evaluate the accuracy of worm propagation detection, we appended the propagation trace at the very end of one full day's network data from each of the three sites. When we collected the trace from our attack network, we captured not only the incoming port 80 requests, but also all the outgoing traffic directed to port 80. We checked each dataset manually, and found there is a small number of outgoing packets for the servers that produced the datasets W and W1, as we expected, and not a single one for the EX dataset. Hence, any egress packets to port 80 would be obviously anomalous without having to inspect their content. For this experiment, we captured all suspect incoming anomalous payloads in an unlimited-size buffer for comparison across all of the available data in our test sets. We also purposely lowered P-DPL's threshold setting (after calibration) in order to generate a very high number of suspects and so test the accuracy of the string comparison and packet correlation strategies. In other words, we increased the noise (increasing the number of false positives) in order to determine how well the correlation can still separate out the important signal in the traffic (the actual worm content).

The result of this experiment is displayed in the following table for the different similarity metrics. The number in parentheses is the threshold used for the similarity score. For an outgoing packet, P-DPL checks the suspect buffer and returns the highest similarity score. If the score is higher than the threshold, we judge there is a worm propagation. A false alert means that an alert was mistakenly generated for a normal outgoing packet. The reason why SE does not work here is obvious: worm fragmentation blinds the method from seeing the worm's entire matching content. The other two metrics worked perfectly, detecting all the worm propagations with zero false alerts.

Results of correlation for different metrics

              Detect propagate   False alerts
SE            No                 No
LCS(0.5)      Yes                No
LCSeq(0.5)    Yes                No
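The suspect-buffer check described in this section, and the reason re-fragmentation defeats string equality but not LCS, can be sketched in Python as follows. This is an illustration under stated assumptions, not the P-DPL implementation: the payloads are short made-up strings rather than real CodeRed packets, `correlate` and `longest_common_substring` are hypothetical names, and the standard-library `difflib.SequenceMatcher` is swapped in to find the longest common substring. Only the 0.5 threshold and the 2*C/(L1 + L2) score come from the text.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def longest_common_substring(a: bytes, b: bytes) -> bytes:
    """Longest contiguous block shared by a and b (autojunk off so no bytes are ignored)."""
    m = SequenceMatcher(None, a, b, autojunk=False).find_longest_match(0, len(a), 0, len(b))
    return a[m.a:m.a + m.size]


def correlate(egress: bytes, suspects: List[bytes],
              threshold: float = 0.5) -> Tuple[float, Optional[bytes]]:
    """Compare one anomalous egress payload against the buffered ingress suspects.

    Returns the highest LCS similarity score and, if it exceeds the threshold,
    the matching substring as a candidate signature (otherwise None).
    """
    best_score, best_common = 0.0, b""
    for suspect in suspects:
        common = longest_common_substring(suspect, egress)
        total = len(suspect) + len(egress)
        score = 2 * len(common) / total if total else 0.0   # 2*C/(L1 + L2)
        if score > best_score:
            best_score, best_common = score, common
    return best_score, (best_common if best_score > threshold else None)


# Made-up payloads: the outgoing copy is fragmented differently, so the
# leading "GET " travelled in an earlier packet and is missing here.
ingress = b"GET /default.ida?" + b"N" * 20 + b"%u9090%u6858"
egress = b"/default.ida?" + b"N" * 20 + b"%u9090%u6858"

worm_score, worm_signature = correlate(egress, [b"unrelated", ingress])
normal_score, normal_signature = correlate(b"GET /index.html HTTP/1.0", [ingress])
```

In this toy case the two payloads are not byte-for-byte equal, so an SE comparison would miss the propagation, while the LCS score is well above 0.5 and the common substring is returned as a candidate signature. The ordinary request shares only a short prefix with the suspect and stays below the threshold.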
We then used other traffic to simulate the outgoing traffic of the servers. For EX data, we used the outgoing port 80 traffic of other clients in that enterprise as if it originated from the EX server itself. For the W1 and W datasets, we used the outgoing port 80 traffic from the CS department. Then we repeated the previous experiments to detect the worm propagation with the injected outgoing traffic on each server. The result remains the same: using the same thresholds as before, we can successfully detect all the worm propagations without any false alerts.

As we mentioned earlier, the worm signature is a natural byproduct of the ingress/egress correlation. When we identify a possible worm propagation, the LCS or LCSeq can be used as the worm signature. Figure 7 displays the actual content signatures computed for the CR II propagations detected by P-DPL in a style suitable for deployment in Snort. Note the signature contains some of the system calls used to infect a host, which is one of the reasons the false positive rate is so low for these detailed signatures.

|d0|$@|0 ff|5|d0|$@|0|h|d0| @|0|j|1|j|0|U|ff||d0| @| U|f
5|d8|$@|0 e8 19 0 0 0 c3 ff|%`0@|0 ff|%d0@|0 ff|%`
ff|%h0@|0 ff|%p0@|0 ff|%t0@|0 ff|%x0@|0 ff|%| ff|
0@|fc fc fc fc fc fc fc fc fc fc fc fc fc fc LORER.EXE
fc fc fc fc fc 0 0 0 0 0 0 0 0 0 0 0 0 0 0 00 0 0 0|EXP
|0 0 0 0 0 0 0 0|SOFTWAREMicrosoftWindows NT
CurrentVersionWinlogon|0 0 0 0 0|SFCDisable|0 0
9 9 9d ffff ff ff|SYSTEMCurrentControlSetService
sW3SVCParametersVirtual Roots|0 0 0 00 0 0|/Scr
ipts|0 0 0 0|/MSADC| 0 00 0|/C|0 0 0|/D|0 00|c:,,21
7|0 0 0 0 00 0|d:,,217|fc fc f cfc f cfc fc fc fc fc fc fc

Fig. 7. The initial portion of the P-DPL generated signature for CodeRed II.

We replicated the above experiments in order to test if any normal packet is blocked when we filter the real traffic against all the worm signatures generated. For our experiments we used the datasets from all three sites, which have had the CRII attacks cleaned beforehand, and in all cases no normal packet was blocked.

In these experiments, we used an unlimited buffer for the incoming suspect payloads. The buffer essentially stores packets for some period of time that depends on the traffic rate and the number of anomalous packet alerts generated from that traffic. That amount is indeterminate a priori, and is specific to both the environment being sniffed and the quality of the models computed by P-DPL for that environment. Since CR and CR II launch their propagations immediately after infecting their victim hosts, a buffer holding only the most recent 5 or 10 suspects is enough to detect their propagation. But for slow-propagating or stealthy worms which might start propagating after an arbitrarily long hibernation period, the question is how many suspects we should save in the suspect buffer. If the ingress anomalous payloads have been removed from the suspect buffer before such a worm starts propagating, P-DPL can no longer detect it by correlation. Theoretically, the larger the buffer the better, but there is a tradeoff in memory usage and computation time. But for those worms that may hibernate for a long period of time, cross-site collaboration and exchange of suspect packet payloads might provide a solution. We discuss this in the next section.

VI. Anomalous Payload Collaboration among Sites

Most current attack detection systems are constrained to a single ingress point within an enterprise without sharing any information with other sites. There are ongoing efforts that share suspicious source IP addresses [5, 10], but to our knowledge no such effort exists to share content information across sites in real time until now. Here we focus on evaluating the detection accuracy of using collaboration among sites, assuming a scalable, privacy-preserving, secured communication infrastructure is available. (We have implemented a prototype in Worminator [17].)

Recall that, in Section 3.4, we described experiments measuring the diversity of the models computed at multiple sites. As we saw, the different sites tested have different normal payload models. This implies from a statistical perspective that they should also have different false positive alerts. Any "common or highly similar anomalous payloads" detected among two or more sites would logically be caused by a common worm exploit targeting many sites. Cross-site or cross-domain sharing may thus reduce the false positive problem at each site, and may more accurately identify worm outbreaks in the earliest stages of an infection.

To test this idea, we used the traffic from the three sites. There are two goals we seek to achieve in this experiment. One is to test whether different sites can help confirm with each other that a worm is spreading and attacking the Internet. The other is to test whether false alerts can be reduced, or even eliminated, at each site when content alerts are correlated.

In this experiment, we used the following simple correlation rule: if two alerts from distinct sites are similar, the two alerts are considered true worm attacks; otherwise they are ignored. Each site's content alerts act as confirmatory evidence of a new worm outbreak, even after two such initial alerts are generated. This is very strict, aiming for the optimal solution to the worm problem.

This is a key observation. The optimal result we seek is that for any payload alerts generated from the same worm launched at two or more sites, those payloads should be similar to each other, but not for normal data from either site that was a false positive. That is to say, if a site generates a false positive alert about normal traffic it has seen, it will not produce suspect payloads that any other site will deem to be worm propagation. Since we conjectured that each site's content models are diverse and highly distinct, even the false positives each site may generate will not match the false positives of other sites; only worms (i.e., true positives) will be commonly matched as anomalous data among multiple sites.

To make the experiment more convincing, we no longer test the same worm traffic against each site as in the previous section, since the sensor will obviously generate the exact same payload alert at all the sites. Instead, we use multiple variants of CodeRed and CodeRed II, which were extracted from real traffic. To make the evaluation strict, we tested different packet payloads for the same worm, and all the variant packet fragments it generates. We purposely lowered the P-DPL threshold to generate many more false positives
from each site than it otherwise would produce. As in the case described above, the cross-site correlation uses the same metrics (SE, LCS and LCSeq) to judge whether two payload alerts are "similar". However, another problem that we need to consider when we exchange information between sites is privacy. It may be the case that a site is unwilling to allow packet content to be revealed to some external collaborating site. A false positive may reveal true content.

A packet payload could be represented by its 1-gram frequency distribution (see Figure 2). This representation already aggregates the actual content byte values in a form making it nearly impossible (but not totally impossible) to reconstruct the actual payload. (Since byte value distributions do not contain sequential information, the actual content is hard to recover. 2-gram distributions simplify the problem, making it more likely to recover the content since adjacent byte values are represented. 3-grams nearly make the problem trivial to recover the actual content in many cases.) However, we note that the 1-gram frequency distribution reordered into the rank-ordered frequency distribution produces a distribution that appears quite similar to the exponentially decreasing Zipf-like distribution. The rank ordering of the resultant distinct byte values is a string that we call the "Z-string" (as discussed in Section 3.1).

One cannot recover the actual content from the Z-string. Rather, only an aggregated representation of the byte value frequencies is revealed, without the actual frequency information. This representation may convey sufficient information to correlate suspect payloads, without revealing the actual payload itself. Hence, false positive content alerts would not reveal true content, and privacy policies would be maintained among sites.

In this cross-domain correlation experiment we propose two more metrics which don't require exchanging raw payloads, but instead only the 1-gram distributions and the privacy-preserving Z-string representation of the payload:

A. Manhattan distance (MD)
   Manhattan distance requires exchange of the byte distribution of the packet, which has 256 floating-point numbers. Two payloads are similar if they have a small Manhattan distance. The maximum possible MD is 2, so we define the similarity score as (2 - MD)/2, to normalize the score to the same range as the other metrics described above.
B. LCS of Z-string (Zstr)
   While maintaining maximal privacy preservation, we perform the LCS on the Z-strings of two alerts. The similarity score is the same as the one for LCS, but here the score evaluates the similarity of two Z-strings, not the raw payload strings.

Figure 8 presents the results achieved by sharing P-DPL alerts among the three sites using CR and CR II and their variant packet fragments. The results are shown in terms of the similarity scores computed by each of the metrics. Each plot is composed of two different representations: one for false alerts (histogram) and the other for worm alerts (dots on the x-axis). The bars in the plots are histograms for the similarity scores computed for false P-DPL alerts. The x-axis shows the similarity score, defined within the range [0, 1], and the y-axis is the number of pairs of alerts within the same score range. The similarity scores for the worm alerts are shown separately as dots on the x-axis. The worm alerts include those for CR and CR II and their variant fragments. Note that all of the scores calculated between worm alerts are much higher than those of the "false" P-DPL alerts, and thus they would be correctly detected as true worms among collaborating sites. The alerts that scored too low would not have sufficient corroboration to deem them true worms.

[Plot 1. X-axis: Similarity Scores of Zstr Metric. Legend: CRII against CRII; CR against CR; Other Alerts.]
[Plot 2. X-axis: Similarity Scores of LCSeq Metric. Legend: CRII against CRII; CR against CR; CRII against CR; Other Alerts.]
Fig. 8. Similarity scores of Zstr and LCSeq metrics for collaboration.

The above two plots show the similarity scores using the Zstr and LCSeq metrics. LCS produced a similar result to LCSeq. The string equality and Manhattan distance metrics did not perform well in distinguishing true alerts from false ones, so their plots are not shown here. The other two metrics presented in Figure 8 give particularly good results. The worms and their variant packet fragments have much higher similarity scores than all
the other alerts generated at each distinct site. This provides some evidence that this approach may work very well in practice and provide reliable information that a new zero-day attack is ongoing at different sites.

Note too that each site can contribute to false positive reduction since the scores of the suspects are relatively low in comparison to the true worms.

Furthermore, the Zstr metric shows the best separation here, with the added advantage of preserving the privacy of the exchanged content. These two metrics can also be applied to the ingress/egress traffic correlation, especially for polymorphic worms that might re-order their content.

There are two interesting observations from this data. The circle in the LCSeq plot represents the similarity score when exchanging the alerts among the sites that P-DPL generated for CR and CR II. LCSeq is the only metric that gave a relatively higher score that is worth noticing, while all the others provide less compelling scores. When we looked back at the tcpdump of CR and CR II, both of them contained the string "GET./default.ida?........u9090%u6858%ucbd3%u7801%u9090%u6858%ucbd3%", while CR has a string of repeated "N" and CR II has a string of repeated "X" padding their content. Since subsequences do not need to be adjacent in the LCSeq metric, LCSeq ignored the repetitions of the unmatched "N" and "X" substrings and successfully picked out the other common substrings. LCS also had a higher-than-average score here, but not as good as LCSeq. This example suggests that polymorphic worms attempting to mask themselves by changing their padding may be detectable by cross-site collaboration under the LCSeq metric.

Another observation is that the LCSeq and LCS results display several packet content alerts with high similarity scores. These were false alerts generated by the correlation among the sites. The scores were measured at about 0.4 to 0.5. Although they are still much smaller than the worm scores, they are already outliers since they exceeded the score threshold used in this experiment. We inspected the content of these packets, and discovered that they included long padded strings attempting to hide the HTTP headers. Some proxies try to hide the query identity by replacing some headers with meaningless characters (in our case, a string of "Y"s). Such payloads were correlated as true alerts while using LCSeq/LCS as metrics, although they are not worms. However, these anomalies did not appear when we used the Zstr metric, since the long string of "Y"s used in padding the HTTP header only influences one position in the Z-string, but has no impact on the remainder of the Z-string.

These results suggest that cross-site collaboration can greatly help identify the early appearance of new zero-day worms while reducing the false positive rates of the constituent P-DPL anomaly detectors. The similarity scores between worms and their variants are much higher than those between "true" false positives (normal data incorrectly deemed anomalies), and can be readily separated with high accuracy.

When several sites on the Internet detect similar anomalous payloads directed at them, they can confirm and validate with each other with high confidence that an attack is underway. As we mentioned earlier, this strategy can also solve the limited buffer size problem described in Section 4.3. If we only consider one single host, a stealthy worm can hibernate for a long period of time until a record of its appearance as an anomaly is no longer stored in the buffer of suspect packets. However, in the context of collaborating sites, the suspect anomaly can be corroborated by some other site that may also have a record of it in its buffer, as a remote site may have a larger buffer or may have received the worm at a different time. The distributed sites essentially serve as a remote long-term store of information, extending the local buffer memory available at one site. Further, this strategy concurrently generates content filtering signatures. Any two sites that correlate and validate suspects as being true worms both have available the actual packet content from which to generate a signature, even if only Z-strings are exchanged between those sites.

VII. Evaluation Results

As the first step in evaluating P-DPL, we compared the two function extraction methods (i.e., SM and IDA) and the common-function-filtering capability of the CFLs that were generated using the two methods. Fig. 8 shows the percentage of candidate and common function corpora among all functions extracted from the malware set by the SM and IDA-Pro extraction methods for each CFL size (500, 1000, 2000, 4000, and 8000 files). It is evident from the diagram that SM is capable of extracting more functions from the malware files: IDA extracted 57 929 functions, while SM extracted 249 158. Function size was limited to a minimum of 16 B and a maximum of 256 B. The figure also shows that SM is capable of trimming a larger portion of functions which appear in the CFL, and that the portion of remaining functions becomes smaller as the size of the CFL increases.

The first observation, regarding SM's extraction superiority, is consistent with IDA-Pro detecting fewer functions from the training set. This is probably due to the fact that IDA-Pro is more rigid software and cannot deal effectively with code obfuscation, which is a prominent technique employed by hackers [30]. However, the SM method, by nature, works on high recall (extracting as many functions as possible) and low precision (many of the extracted functions might not really be functions); thus it extracts more functions. The second observation, the filtering capability of the two methods, can be explained straightforwardly by the fact that as the size of the CFL grows, the likelihood increases that a function extracted from the malware set will appear in the CFL.

For some malware files, it might be the case that all of the extracted functions were identified as common functions, and therefore were filtered out by the CFL. In such cases, the method cannot generate a signature for the malware. Fig. 9 depicts the percentage of malware that was left without candidates. The figure shows that IDA missed more malware. The reason for that is that 1) it extracts fewer functions and 2) IDA only detects functions that are being called from other functions using standard protocols. This may not be the case in malware that wishes to camouflage its existence. As expected, for both methods, increasing the CFL size also increases the amount of missed malware, but the gap between IDA and SM narrows. Figs 10–13 depict the detection rate of candidate signatures and signatures selected for the malware set in the control file set. This rate serves as a measure of the false-positive detection rate
of malware in benign files. We checked the false-positive rate of
signature candidates in the control set files. Fig. 10 depicts the
percentage of signature candidates detected in the control set as
a function of the candidate's length in bytes and the CFL size. We
expected that the false-positive rate would drop for longer
signature candidates and for larger CFLs. The length of a
signature candidate affects the probability of finding the same
byte sequence in an arbitrary file. Indeed, regardless of the
function and signature extraction technique (SM/IDA), short
candidates caused most of the hits. Consequently, based on the
diagram, we recommend using function candidates only if their
length is above 112 B in order to ensure a lower false-positive
rate. An exception is shown with SM with CFL size 500 MB and
SM with CFL size 1000 MB, where we see a high false-positive
rate for candidates that are 160–176 B. Using a CFL of 2000 MB
and more eliminates the problematic candidates. Additionally,
we can see that using a larger CFL contributes considerably to
lowering the false-positive rate in both function extraction
methods.
Fig. 9. Percentage/number of common functions versus candidate
functions extracted from malware set for IDA and SM for several
CFL dataset sizes.
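The relation between candidate length and false positives can be illustrated with a minimal sketch. It assumes that a candidate counts as a false positive when its exact byte string occurs in at least one control file, and it groups candidates into 16-B length buckets. The function names, the bucket width, and the sample data are illustrative assumptions, not part of P-DPL.

```python
from collections import defaultdict

MIN_CANDIDATE_LEN = 112  # length threshold recommended in the text, in bytes


def false_positive_rate_by_length(candidates, control_files, bucket=16):
    """Map each length bucket to the fraction of its candidates found in a control file."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for cand in candidates:
        start = (len(cand) // bucket) * bucket  # lower bound of the length bucket
        totals[start] += 1
        # Assumption: a hit is an exact byte-string match anywhere in a benign file.
        if any(cand in benign for benign in control_files):
            hits[start] += 1
    return {start: hits[start] / totals[start] for start in sorted(totals)}


def keep_long_candidates(candidates, min_len=MIN_CANDIDATE_LEN):
    """Keep only candidates whose length is above the threshold."""
    return [c for c in candidates if len(c) > min_len]


if __name__ == "__main__":
    control = [b"\x90" * 100 + b"xyz"]                      # toy benign file
    cands = [b"\x90" * 8, b"AB" * 10, bytes(range(120))]     # toy candidates
    print(false_positive_rate_by_length(cands, control))
    print(len(keep_long_candidates(cands)))
```

In this toy data only the short 8-B candidate is found in the control file, which mirrors the observation that short candidates cause most of the hits.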
In Fig. 11, we compare the false-positive rate of the
two function extraction methods: SM and IDA-Pro for different
CFL sizes. With both extraction methods, the false-positive rate
is reduced when a large CFL is used.
Compared to IDA, the SM method achieves a lower
false-positive rate when using CFL with a size greater than
1000 files.
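The role of the CFL in this comparison can be sketched as follows. The sketch assumes that a function is treated as common when its exact bytes appear in the CFL; the actual matching used by P-DPL may differ. The names filter_with_cfl and without_candidates, and the sample bytes, are hypothetical.

```python
def filter_with_cfl(functions_by_malware, cfl):
    """Drop every extracted function that also appears in the common function library (CFL)."""
    remaining = {}
    for name, functions in functions_by_malware.items():
        # Assumption: membership is an exact byte-for-byte match against the CFL.
        remaining[name] = [f for f in functions if f not in cfl]
    return remaining


def without_candidates(remaining):
    """Return the malware files for which no signature candidate survived the filter."""
    return sorted(name for name, functions in remaining.items() if not functions)


if __name__ == "__main__":
    extracted = {
        "a.exe": [b"\x55\x8b\xec", b"\xde\xad"],
        "b.exe": [b"\x55\x8b\xec"],
    }
    small_cfl = set()
    large_cfl = {b"\x55\x8b\xec"}  # a larger CFL covers more common functions
    print(without_candidates(filter_with_cfl(extracted, small_cfl)))
    print(without_candidates(filter_with_cfl(extracted, large_cfl)))
```

A larger CFL removes more candidates that could also match benign files, which lowers the false-positive rate, but it can also leave some malware files with no candidate at all.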
Next, we compare the mean false-positive rate
(averaged over both SM and IDA function extraction methods)
when using candidates with/without a 16-B offset (see Fig. 12)
and when choosing a signature randomly or by using the
entropy score (see Fig. 13). As expected, adding a 16-B offset to
the candidate functions and using the entropy score to choose a
signature helps in reducing the percentage of signatures
detected in the control set files. This was consistent for all
tested CFL sizes. The entropy selection method favors large
signatures; approximately 80% of the signatures that the
entropy-based method selected were larger than 112 B. Selecting
a candidate randomly shows that only 50% are 112 B and more.
This observation complies with the results presented in Fig. 9
regarding the recommended size of candidates.
Since the entropy method evidently showed better results than
Rand, we continued to investigate the signature-generating
methods using only the entropy method.
The goal of the next and final experiment was to show how IDA
and SM methods are affected by using offset along with the
signature candidate for different CFL sizes. It also provides
the most significant improvement when the CFL size is increased.
This is shown by the detection rate declining from 2.7% to 0%
for a CFL greater than 2000 MB. Note that detection rate in this
context relates to the false positives, or undesirable detection
of a malware signature in benign files; thus, a lower detection
rate is better.
Fig. 10. Malware without signature candidates: the percentage of
malware that were left without candidates (i.e., all extracted
functions were filtered by the functions in the CFL), and thus,
the method cannot extract signatures.
Fig. 11. False-positive rate of candidates in the control set
files as a function of the candidate size in bytes.
Fig. 12. Comparing the false-positive rate for the two function
extraction methods: SM and IDA-Pro (IDA) as a function of the
CFL size.
Fig. 13. Comparing the mean false-positive rate (averaged over
SM and IDA function extraction methods) with/without adding
offset bytes to function candidates.
Fig. 14. Comparing the mean false-positive rate (averaged over
SM and IDA function extraction methods) when using random
(Rand) signature selection and entropy-based selection. The
entropy-based heuristic performs better than when using the
random selection method, for all CFL sizes.
Different fragmentation from given graft chart over SM and IDA
function extraction methods.
The worst signature-generation method is IDA when not using an
offset of 16 B. However, this method, when combined with a CFL
containing 8000 files, still manages to have a low
false-positive rate (FPR) of 0.4%. Finally, we tested the
signatures generated by P-DPL for false negatives using a
DefensePro intrusion detection appliance. False negatives in the
context of P-DPL mean that a signature generated for a malware
file was not identified in an instance of the same malware
(e.g., as a result of a long signature split over multiple
packets). Therefore, false negatives depend on the detection
engine. The malware
detection capability of DefensePro is based on IP packet
inspection and does not reconstruct the files. We uploaded the
signatures extracted by P-DPL to the DefensePro signature
database and configured the device to reset any session for
which a packet was identified with a malware signature. We
transmitted all malware (for which P-DPL successfully
generated a signature) via the DefensePro (at the maximal
speed that we could load the link) and executed several tests in
which DefensePro successfully removed all malware.
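The dependence of false negatives on the detection engine can be illustrated with a minimal sketch of packet-level matching. It assumes a fixed payload size per packet and plain substring matching; it does not model DefensePro itself, and the function names and sample payload are hypothetical.

```python
def split_into_packets(data, payload_size):
    """Cut a byte stream into consecutive packet payloads of at most payload_size bytes."""
    return [data[i:i + payload_size] for i in range(0, len(data), payload_size)]


def per_packet_match(packets, signature):
    """Packet-level inspection: the signature must lie entirely inside one packet."""
    return any(signature in packet for packet in packets)


def stream_match(packets, signature):
    """Inspection after reassembly: the signature may span packet boundaries."""
    return signature in b"".join(packets)


if __name__ == "__main__":
    payload = b"A" * 10 + b"SIGNATURE" + b"B" * 10
    packets = split_into_packets(payload, 12)   # the signature straddles two packets
    print(per_packet_match(packets, b"SIGNATURE"))
    print(stream_match(packets, b"SIGNATURE"))
```

With a 12-B payload the signature is split across the first two packets, so packet-level matching misses it while matching on the reassembled stream finds it. This is the kind of false negative described above for long signatures.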
Additionally, we measured the time required by P-DPL to
generate a signature. The extraction time of a signature was
examined as a function of the file size, over the range
100 - 900. The results show a linear increase in signature
extraction time as a function of the file size.
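The entropy-based signature selection evaluated in this section can be sketched as follows. The sketch assumes that the entropy score is the Shannon entropy of the byte-value distribution of a candidate and that the candidate with the highest score is chosen. The exact score used by P-DPL is not given here, so byte_entropy and select_signature are illustrative names only.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy (bits per byte) of the byte-value distribution of data."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def select_signature(candidates):
    """Pick the candidate with the highest entropy score (assumed selection rule)."""
    return max(candidates, key=byte_entropy)


if __name__ == "__main__":
    padding = b"\x00" * 16          # repetitive bytes, low entropy
    varied = bytes(range(16))       # sixteen distinct byte values, higher entropy
    print(byte_entropy(padding), byte_entropy(varied))
    print(select_signature([padding, varied]) == varied)
```

Repetitive byte strings such as padding receive a low score and are unlikely to be selected, which is one plausible reason why an entropy-based choice yields fewer hits in benign files than a random choice.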
VIII. Conclusion
In this paper we propose a new automatic mechanism, termed
P-DPL, for extracting signatures from malware files.
Signatures generated by P-DPL are comprised of multiple
byte-strings, which can be used by high-speed, network-based
malware filtering devices. To minimize the risk of false
positives, P-DPL employs a method for filtering out code that
originates from the underlying standard development platforms
and is therefore shared by the malicious programs developed
with these platforms.
We tested our method in a network-security laboratory on various
configurations of the two function extraction methods, IDA-Pro
and SM. Both IDA-Pro and SM are fast techniques for extracting
functions from assembly files. However, adapting SM to new
compilers is done manually, which makes it highly prone to
errors. In order to overcome this limitation, we developed
P-DPL, and the evaluation supports the viability of the general
approach proposed by this research, which suggests that general
code identified as common functions in the program can be
discarded. Realizing P-DPL for generating signatures for
high-speed network appliances requires a systematic methodology
for building common function repositories. Because the global
variety of development platforms is facilitated by the Internet,
the external validity of this study relies substantially on
reaching a critical mass of malware files. P-DPL is also helpful
for identifying allergy attacks against automatically created
signatures. This type of attack is mainly relevant, and
realistic, for methods that use machine-learning algorithms: a
learning-based algorithm can be manipulated so that the
automated signature-generation method treats benign data as
malicious.
Coping with fully obfuscated malware at the packet level on
high-speed deep packet inspection devices is not easy. We
believe that P-DPL should be implemented for filtering most of
the malware by using high-speed malware filtering devices. In
future work, we plan to address detecting and extracting the
binary code, as well as selecting the best signature out of the
collection of candidates using probability and variance methods.
In addition, regular expressions defined by two or more distinct
signatures can be used in order to minimize the risk of malware.

References
[1] S. B. Cho, "Incorporating soft computing techniques into a probabilistic intrusion detection system," IEEE Trans. Syst., Man, Cybern., Part C, vol. 32, no. 2, pp. 154–160, May 2002.
[2] ssel, and P. Laskov, "Learning and classification of malware behavior," in Proc. Conf. Detect. Intrusions Malware Vulnerability Assessment, Springer Press, 2008, pp. 108–125.
[3] M. Bailey, J. Oberheide, J. Andersen, Z. M. Mao, F. Jahanian, and J. Nazario, "Automated classification and analysis of internet malware," in Proc. 12th Int. Symp. Recent Adv. Intrusion Detect., Springer Press, 2007, pp. 178–197.
[4] K. Griffin, S. Schneider, X. Hu, and T. Chiueh, "Automatic generation of string signatures for malware detection," in Proc. 12th Int. Symp. Recent Adv. Intrusion Detect., Springer Press, 2009, pp. 101–120.
[5] G. Jacob, H. Debar, and E. Filiol, "Behavioral detection of malware: From a survey towards an established taxonomy," J. Comput. Virol., vol. 4, pp. 251–266, 2008.
[6] D. Gryaznov, "Scanners of the year 2000: Heuristics," in Proc. 5th Int. Virus Bull., 1999, pp. 225–234.
[7] J. Zhang, M. Zulkernine, and A. Haque, "Random-forests-based network intrusion detection systems," IEEE Trans. Syst., Man, Cybern., Part C, vol. 38, no. 5, pp. 649–659, Sep. 2008.
[8] A. Shabtai, D. Potashnik, Y. Fledel, R. Moskovitch, and Y. Elovici, "Monitoring, analysis and filtering system for purifying network traffic of known and unknown malicious content," Secur. Commun. Netw. [Online]. DOI: 10.1002/sec.229.
[9] Y. Tang, B. Xiao, and X. Lu, "Using a bioinformatics approach to generate accurate exploit-based signatures for polymorphic worms," Comput. Secur., vol. 28, pp. 827–842, 2009.
[10] G. Jacob, H. Debar, and E. Filiol, "Behavioral detection of malware: From a survey towards an established taxonomy," J. Comput. Virol., vol. 4, no. 3, 2008.
[11] S. Singh, C. Estan, G. Varghese, and S. Savage, "Automated worm fingerprinting," in Proc. 6th Symp. Operating Systems Design & Implementation (OSDI'04), Berkeley, CA, USA, USENIX Association, 2004, pp. 4–4.
[12] J. Newsome and D. Song, "Dynamic taint analysis for automatic detection, analysis, and signature generation of exploits on commodity software," in Proc. Network and Distributed System Security Symp. (NDSS 2005), 2005.
[13] M. Costa, J. Crowcroft, M. Castro, A. Rowstron, L. Zhou, L. Zhang, and P. Barham, "Vigilante: End-to-end containment of internet worms," in Proc. 20th ACM Symp. Operating Systems Principles (SOSP '05), New York, NY, USA, ACM, 2005, pp. 133–147.
[14] J. Xu, P. Ning, C. Kil, Y. Zhai, and C. Bookholt, "Automatic diagnosis and response to memory corruption vulnerabilities," in Proc. 12th ACM Conf. Computer and Communications Security (CCS '05), New York, NY, USA, ACM, 2005, pp. 223–234.
[15] M. Damashek, "Gauging similarity with n-grams: Language independent categorization of text," Science, vol. 267, no. 5199, pp. 843–848, 1995.
[16] R. Lippmann et al., "The 1999 DARPA off-line intrusion detection evaluation," Comput. Netw., vol. 34, no. 4, pp. 579–595, 2000.
[17] M. Locasto, J. Parekh, S. Stolfo, A. Keromytis, T. Malkin, and V. Misra, "Collaborative distributed intrusion detection," Columbia University Tech Report CUCS-012-04, 2004.
[18] K. Wang and S. Stolfo, "Anomalous payload-based network intrusion detection," in Proc. Recent Advance in Intrusion Detection (RAID), Sept. 2004.
AUTHOR DETAILS
1
N.Kannaiya Raja received the MCA degree from Alagappa
University and the ME degree in Computer Science and
Engineering from Anna University, Chennai, in 2007. He has been
pursuing a PhD degree at Manonmaniam Sundaranar University
since 2008. He has worked as an assistant professor in various
engineering colleges in Tamil Nadu affiliated to Anna
University and has eight years of teaching experience. His
research work is in deep packet inspection. He has been session
chair in major conferences and workshops in computer vision,
covering algorithms, networks, mobile communication, image
processing, and pattern recognition. His current primary areas
of research are packet inspection and networks. He is
interested in giving guest lectures at various engineering
colleges in Tamil Nadu.
2
Dr.K.Arulanandam received the Ph.D. degree in 2010 from
Vinayaka Missions University. He has twelve years of teaching
experience in various engineering colleges in Tamil Nadu
affiliated to Anna University. His research experience covers
networks, mobile communication networks, image processing, and
algorithms. He is currently working at Ganadipathy Tulasi’s
Jain Engineering College, Vellore.
3
M.Balaji received the B.Tech degree in Information
Technology from Anna University, Chennai, in 2008 and is
now pursuing the ME degree in Computer Science and
Engineering at Arulmigu Meenakshi Amman College of
Engineering, affiliated to Anna University, Chennai.