Enabling and Supporting the Debugging of Field Failures (Job Talk)
Jazz: A Tool for Demand-Driven Structural Testing

Jonathan Misurda¹, Jim Clause¹, Juliya Reed¹, Bruce R. Childers¹, and Mary Lou Soffa²

¹ University of Pittsburgh, Pittsburgh PA 15260, USA,
{jmisurda,clausej,juliya,childers}@cs.pitt.edu
² University of Virginia, Charlottesville VA 22904, USA,
soffa@cs.virginia.edu
Abstract. Software testing to produce reliable and robust software has
become vitally important. Testing is a process by which quality can be
assured through the collection of information about software. While test-
ing can improve software quality, current tools typically are inflexible
and have high overheads, making it a challenge to test large projects.
We describe a new scalable and flexible tool, called Jazz, that uses a
demand-driven structural testing approach. Jazz has a low overhead of
only 17.6% for branch testing.
1 Introduction
In the last several years, the importance of producing high quality and robust
software has become paramount. Testing is an important process to support
quality assurance by gathering information about the software being developed
or modified. It is, in general, extremely labor and resource intensive, accounting
for 50-60% of the total cost of software development [1]. The increased emphasis
on software quality and robustness mandates improved testing methodologies.
To test software, a number of techniques can be applied. One class of tech-
niques is structural testing, which checks that a given coverage criterion is sat-
isfied. For example, branch testing checks that a certain percentage of branches
are executed. Other structural tests include def-use testing in which pairs of
variable definitions and uses are checked for coverage and node testing in which
nodes in a program’s control flow graph are checked.
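The coverage criteria just described can be made concrete with a small sketch (hypothetical code, not taken from Jazz or any other tool discussed here): branch testing simply records, for each conditional, which outcomes have executed, and coverage is the fraction of outcomes seen.

```python
# Minimal sketch of branch-coverage bookkeeping (illustrative only).
# Each conditional branch has an id; we record which outcomes ran.

class BranchCoverage:
    def __init__(self):
        self.seen = {}  # branch id -> set of outcomes observed

    def record(self, branch_id, outcome):
        """Payload executed whenever a branch is taken."""
        self.seen.setdefault(branch_id, set()).add(outcome)

    def coverage(self, total_branches):
        """Fraction of branch outcomes covered (two per branch)."""
        covered = sum(len(v) for v in self.seen.values())
        return covered / (2 * total_branches)

cov = BranchCoverage()

def abs_val(x):
    if x < 0:                 # branch 0
        cov.record(0, True)
        return -x
    cov.record(0, False)
    return x

abs_val(-5)
abs_val(3)
print(cov.coverage(total_branches=1))  # both outcomes covered -> 1.0
```

Def-use and node testing fit the same probe/payload shape; only what is recorded at each probe changes.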
Unfortunately, structural testing is often hindered by the lack of scalable
and flexible tools. Current tools are not scalable in terms of both time and
memory, limiting the number and scope of the tests that can be applied to large
programs. These tools often modify the software binary to insert instrumentation
for testing. In this case, the tested version of the application is not the same
version that is shipped to customers and errors may remain. Testing tools are
usually inflexible and only implement certain types of testing. For example, many
tools implement branch testing, but do not implement node or def-use testing.
In this paper, we describe a new tool for structural testing, called Jazz, that
addresses these problems. Jazz uses a novel demand-driven technique to apply
ABSTRACT
Producing reliable and robust software has become one
of the most important software development concerns in
recent years. Testing is a process by which software
quality can be assured through the collection of infor-
mation. While testing can improve software reliability,
current tools typically are inflexible and have high over-
heads, making it challenging to test large software
projects. In this paper, we describe a new scalable and
flexible framework for testing programs with a novel
demand-driven approach based on execution paths to
implement test coverage. This technique uses dynamic
instrumentation on the binary code that can be inserted
and removed on-the-fly to keep performance and mem-
ory overheads low. We describe and evaluate implemen-
tations of the framework for branch, node and def-use
testing of Java programs. Experimental results for
branch testing show that our approach achieves, on average, a
1.6x speedup over static instrumentation and also uses
less memory.
Categories and Subject Descriptors
D.2.5. [Software Engineering]: Testing and Debug-
ging—Testing tools; D.3.3. [Programming Lan-
guages]: Language Constructs and Features—Program
instrumentation, run-time environments
General Terms
Experimentation, Measurement, Verification
Keywords
Testing, Code Coverage, Structural Testing, Demand-
Driven Instrumentation, Java Programming Language
1. INTRODUCTION
In the last several years, the importance of produc-
ing high quality and robust software has become para-
mount [15]. Testing is an important process to support
quality assurance by gathering information about the
behavior of the software being developed or modified. It
is, in general, extremely labor and resource intensive,
accounting for 50-60% of the total cost of software
development [17]. Given the importance of testing, it is
imperative that there are appropriate testing tools and
frameworks. In order to adequately test software, a
number of different testing techniques must be per-
formed. One class of testing techniques used extensively
is structural testing, in which properties of the software
code are used to ensure a certain code coverage. Structural
testing techniques include branch testing, node testing,
path testing, and def-use testing [6,7,8,17,19].
Typically, a testing tool targets one type of struc-
tural test, and the software unit is the program, file or
particular methods. In order to apply various structural
testing techniques, different tools must be used. If a tool
for a particular type of structural testing is not available,
the tester would need to either implement it or not use
that testing technique. The tester would also be con-
strained by the region of code to be tested, as deter-
mined by the tool implementor. For example, it may not
be possible for the tester to focus on a particular region
of code, such as a series of loops, complicated condi-
tionals, or particular variables if def-use testing is
desired. The user may want to have higher coverage on
frequently executed regions of code. Users may also want
to define their own testing criteria: for example, requiring
that every branch inside a loop be covered ten times rather
than once.
In structural testing, instrumentation is placed at
certain code points (probes). Whenever such a program
point is reached, code that performs the function for the
test (payload) is executed. The probes in def-use testing
are dictated by the definitions and uses of variables and
the payload is to mark that a definition or use in a def-
use pair has been covered. Thus for each type of struc-
tural testing, there is a testing “plan”. A test plan is a
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies
are not made or distributed for profit or commercial advantage and
that copies bear this notice and the full citation on the first page. To
copy otherwise, or republish, to post on servers or to redistribute to
lists, requires prior specific permission and/or a fee.
ICSE'05, May 15-21, 2005, St. Louis, Missouri, USA.
Copyright 2005 ACM 1-58113-963-2/05/0005...$5.00.
Demand-Driven Structural Testing with Dynamic
Instrumentation
Jonathan Misurda†, James A. Clause†, Juliya L. Reed†, Bruce R. Childers†, and
Mary Lou Soffa‡
†Department of Computer Science
University of Pittsburgh
Pittsburgh, Pennsylvania 15260
{jmisurda, clausej, juliya, childers}@cs.pitt.edu
‡Department of Computer Science
University of Virginia
Charlottesville, Virginia 22904
soffa@cs.virginia.edu
A Technique for Enabling and Supporting Debugging of Field Failures
James Clause and Alessandro Orso
College of Computing
Georgia Institute of Technology
{clause, orso}@cc.gatech.edu
Abstract
It is difficult to fully assess the quality of software in-
house, outside the actual time and context in which it will
execute after deployment. As a result, it is common for
software to manifest field failures, failures that occur on
user machines due to untested behavior. Field failures are
typically difficult to recreate and investigate on developer
platforms, and existing techniques based on crash report-
ing provide only limited support for this task. In this pa-
per, we present a technique for recording, reproducing, and
minimizing failing executions that enables and supports in-
house debugging of field failures. We also present a tool
that implements our technique and an empirical study that
evaluates the technique on a widely used e-mail client.
1. Introduction
Quality-assurance activities, such as software testing and
analysis, are notoriously difficult, expensive, and time-
consuming. As a result, software products are often re-
leased with faults or missing functionality. In fact, real-
world examples of field failures experienced by users be-
cause of untested behaviors (e.g., due to unforeseen usages)
are countless. When field failures occur, it is im-
portant for developers to be able to recreate and investigate
them in-house. This pressing need is demonstrated by the
emergence of several crash-reporting systems, such as Mi-
crosoft’s error reporting systems [13] and Apple’s Crash
Reporter [1]. Although these techniques represent a first
important step in addressing the limitations of purely in-
house approaches to quality assurance, they work on lim-
ited data (typically, a snapshot of the execution state) and
can at best identify correlations between a crash report and
data on other known failures.
In this paper, we present a novel technique for reproduc-
ing and investigating field failures that addresses the limita-
tions of existing approaches. Our technique works in three
phases, intuitively illustrated by the scenario in Figure 1. In
the recording phase, while users run the software, the tech-
nique intercepts and logs the interactions between applica-
tion and environment and records portions of the environ-
ment that are relevant to these interactions. If the execution
terminates with a failure, the produced execution recording
is stored for later investigation. In the minimization phase,
using free cycles on the user machines, the technique re-
plays the recorded failing executions with the goal of au-
tomatically eliminating parts of the executions that are not
relevant to the failure. In the replay and debugging phase,
developers can use the technique to replay the minimized
failing executions and investigate the cause of the failures
(e.g., within a debugger). Being able to replay and debug
real field failures can give developers unprecedented insight
into the behavior of their software after deployment and op-
portunities to improve the quality of their software in ways
that were not possible before.
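The minimization phase described above can be sketched as a simple greedy loop (an assumed interface; ADDA's actual minimization algorithm is more sophisticated): tentatively drop recorded events one at a time, replay, and keep the shorter recording whenever the failure still manifests.

```python
# Sketch of failing-execution minimization (illustrative only).
# `fails(events)` stands for replaying a recording and observing
# whether the failure still occurs.

def minimize(events, fails):
    """Return a shorter event sequence that still makes `fails` True."""
    current = list(events)
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if fails(candidate):       # replay without event i still fails
                current = candidate
                changed = True
                break
    return current

# Hypothetical failure: a crash whenever 'open' and 'send' both occur.
fails = lambda ev: 'open' in ev and 'send' in ev
recording = ['open', 'read', 'edit', 'send', 'close']
print(minimize(recording, fails))  # -> ['open', 'send']
```

Because replays run on free cycles on the user's machine, even this kind of repeated-replay search is affordable in the scenario the paper describes.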
To evaluate our technique, we implemented it in a proto-
type tool, called ADDA (Automated Debugging of Deployed
Applications), and used the tool to perform an empirical
study. The study was performed on PINE [19], a widely-
used e-mail client, and involved the investigation of failures
caused by two real faults in PINE. The results of the study
are promising. Our technique was able to (1) record all ex-
ecutions of PINE (and two other subjects) with a low time
and space overhead, (2) completely replay all recorded exe-
cutions, and (3) perform automated minimization of failing
executions and obtain shorter executions that manifested the
same failures as the original executions. Moreover, we were
able to replay the minimized executions within a debugger,
which shows that they could have actually been used to in-
vestigate the failures.
The contributions of this paper are:
• A novel technique for recording and later replaying exe-
cutions of deployed programs.
• An approach for minimizing failing executions and gen-
erating shorter executions that fail for the same reasons.
• A prototype tool that implements our technique.
• An empirical study that shows the feasibility and effec-
tiveness of the approach.
29th International Conference on Software Engineering (ICSE'07)
0-7695-2828-7/07 $20.00 © 2007
Dytan: A Generic Dynamic Taint Analysis Framework
James Clause, Wanchun Li, and Alessandro Orso
College of Computing
Georgia Institute of Technology
{clause|wli7|orso}@cc.gatech.edu
ABSTRACT
Dynamic taint analysis is gaining momentum. Techniques based
on dynamic tainting have been successfully used in the context of
application security, and now their use is also being explored in dif-
ferent areas, such as program understanding, software testing, and
debugging. Unfortunately, most existing approaches for dynamic
tainting are defined in an ad-hoc manner, which makes it difficult
to extend them, experiment with them, and adapt them to new con-
texts. Moreover, most existing approaches are focused on data-flow
based tainting only and do not consider tainting due to control flow,
which limits their applicability outside the security domain. To
address these limitations and foster experimentation with dynamic
tainting techniques, we defined and developed a general framework
for dynamic tainting that (1) is highly flexible and customizable, (2)
allows for performing both data-flow and control-flow based taint-
ing conservatively, and (3) does not rely on any customized run-
time system. We also present DYTAN, an implementation of our
framework that works on x86 executables, and a set of preliminary
studies that show how DYTAN can be used to implement different
tainting-based approaches with limited effort. In the studies, we
also show that DYTAN can be used on real software, by using FIRE-
FOX as one of our subjects, and illustrate how the specific char-
acteristics of the tainting approach used can affect efficiency and
accuracy of the taint analysis, which further justifies the use of our
framework to experiment with different variants of an approach.
Categories and Subject Descriptors: D.2.5 [Software Engineer-
ing]: Testing and Debugging;
General Terms: Experimentation, Security
Keywords: Dynamic tainting, information flow, general framework
1. INTRODUCTION
Dynamic taint analysis (also known as dynamic information flow
analysis) consists, intuitively, in marking and tracking certain data
in a program at run-time. This type of dynamic analysis is be-
coming increasingly popular. In the context of application secu-
rity, dynamic-tainting approaches have been successfully used to
prevent a wide range of attacks, including buffer overruns (e.g., [8,
17]), format string attacks (e.g., [17, 21]), SQL and command in-
jections (e.g., [7, 19]), and cross-site scripting (e.g., [18]). More
recently, researchers have started to investigate the use of tainting-
based approaches in domains other than security, such as program
understanding, software testing, and debugging (e.g., [11, 13]).
ISSTA’07, July 9–12, 2007, London, England, United Kingdom.
Copyright 2007 ACM 978-1-59593-734-6/07/0007 ...$5.00.
Unfortunately, most existing techniques and tools for dynamic
taint analysis are defined in an ad-hoc manner, to target a specific
problem or a small class of problems. It would be difficult to ex-
tend or adapt such techniques and tools so that they can be used in
other contexts. In particular, most existing approaches are focused
on data-flow based tainting only, and do not consider tainting due
to the control flow within an application, which limits their general
applicability. Also, most existing techniques support either a sin-
gle taint marking or a small, fixed number of markings, which is
problematic in applications such as debugging. Finally, almost no
existing technique handles the propagation of taint markings in a
truly conservative way, which may be appropriate for the specific
applications considered, but is problematic in general. Because de-
veloping support for dynamic taint analysis is not only time con-
suming, but also fairly complex, this lack of flexibility and gener-
ality of existing tools and techniques is especially limiting for this
type of dynamic analysis.
To address these limitations and foster experimentation with dy-
namic tainting techniques, in this paper we present a framework for
dynamic taint analysis. We designed the framework to be general
and flexible, so that it allows for implementing different kinds of
techniques based on dynamic taint analysis with little effort. Users
can leverage the framework to quickly develop prototypes for their
techniques, experiment with them, and investigate trade-offs of dif-
ferent alternatives. For a simple example, the framework could be
used to investigate the cost effectiveness of considering different
types of taint propagation for an application.
Our framework has several advantages over existing approaches.
First, it is highly flexible and customizable. It allows for easily
specifying which program data should be tainted and how, how taint
markings should be propagated at run-time, and where and how
taint markings should be checked. Second, it allows for performing
either data-flow based tainting alone or combined data-flow
and control-flow based tainting. Third,
from a more practical standpoint, it works on binaries, does not
need access to source code, and does not rely on any customized
hardware or operating system, which makes it broadly applicable.
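The kind of configurability described above can be sketched with a toy tainting interpreter (a hypothetical API; DYTAN itself works on x86 binaries, not Python). Users specify taint sources, whether control-flow propagation is enabled, and where marks are checked.

```python
# Sketch of a configurable taint framework (illustrative only).

class TaintVM:
    def __init__(self, propagate_control_flow=False):
        self.taint = {}          # variable name -> set of taint marks
        self.pc_taint = set()    # marks of the current control context
        self.propagate_control_flow = propagate_control_flow

    def source(self, var, mark):
        """Introduce a taint mark at a user-specified source."""
        self.taint[var] = {mark}

    def assign(self, dst, *srcs):
        """dst = f(srcs): union the source marks (data flow), plus the
        control-context marks if control-flow propagation is enabled."""
        marks = set().union(*(self.taint.get(s, set()) for s in srcs))
        if self.propagate_control_flow:
            marks |= self.pc_taint
        self.taint[dst] = marks

    def check(self, var):
        """Query taint marks at a user-specified check point."""
        return self.taint.get(var, set())

vm = TaintVM(propagate_control_flow=True)
vm.source('user_input', 'U')
vm.pc_taint = vm.check('user_input')  # branch on tainted data
vm.assign('flag')                     # assignment inside that branch
print(vm.check('flag'))               # -> {'U'}, via control flow only
```

Running the same program with `propagate_control_flow=False` would leave `flag` untainted, which is exactly the kind of trade-off the framework is meant to let users explore.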
We also present DYTAN, an implementation of our framework
that works on x86 binaries, and a set of preliminary studies per-
formed using DYTAN. In the first set of studies, we report on our
experience in using DYTAN to implement two tainting-based ap-
proaches presented in the literature. Although preliminary, our ex-
perience shows that we were able to implement these approaches
completely and with little effort. The second set of studies illus-
trates how the specific characteristics of a tainting approach can
affect efficiency and accuracy of the taint analysis. In particular, we
investigate how ignoring control-flow related propagation and over-
looking some data-flow aspects can lead to unsafety. These results
further justify the usefulness of experimenting with different varia-
tions of dynamic taint analysis and assessing their tradeoffs, which
can be done with limited effort using our framework. The second
set of studies also shows the practical applicability of DYTAN, by
successfully running it on the FIREFOX web browser.
Effective Memory Protection Using Dynamic Tainting
James Clause, Ioannis Doudalis, Alessandro Orso, and Milos Prvulovic
College of Computing
Georgia Institute of Technology
{clause|idoud|orso|milos}@cc.gatech.edu
ABSTRACT
Programs written in languages that provide direct access to memory
through pointers often contain memory-related faults, which may
cause non-deterministic failures and even security vulnerabilities.
In this paper, we present a new technique based on dynamic taint-
ing for protecting programs from illegal memory accesses. When
memory is allocated, at runtime, our technique taints both the mem-
ory and the corresponding pointer using the same taint mark. Taint
marks are then suitably propagated while the program executes and
are checked every time a memory address m is accessed through a
pointer p; if the taint marks associated with m and p differ, the ex-
ecution is stopped and the illegal access is reported. To allow for a
low-overhead, hardware-assisted implementation of the approach,
we make several key technical and engineering decisions in the
definition of our technique. In particular, we use a configurable,
low number of reusable taint marks instead of a unique mark for
each area of memory allocated, which reduces the overhead of the
approach without limiting its flexibility and ability to target most
memory-related faults and attacks known to date. We also define
the technique at the binary level, which lets us handle the (very)
common case of applications that use third-party libraries whose
source code is unavailable. To investigate the effectiveness and
practicality of our approach, we implemented it for heap-allocated
memory and performed a preliminary empirical study on a set of
programs. Our results show that (1) our technique can identify a
large class of memory-related faults, even when using only two
unique taint marks, and (2) a hardware-assisted implementation of
the technique could achieve overhead in the single digits.
Categories and Subject Descriptors: D.2.5 [Software Engineering]: Test-
ing and Debugging; C.0 [General]: Hardware/Software Interfaces;
General Terms: Performance, Security
Keywords: Illegal memory accesses, dynamic tainting, hardware support
1. INTRODUCTION
Memory-related faults are a serious problem for languages that
allow direct memory access through pointers. An important class
of memory-related faults are what we call illegal memory accesses.
ASE’07, November 5–9, 2007, Atlanta, Georgia, USA.
Copyright 2007 ACM 978-1-59593-882-4/07/0011 ...$5.00.
In languages such as C and C++, when memory allocation is re-
quested, a currently-free area of memory m of the specified size
is reserved. After m has been allocated, its initial address can be
assigned to a pointer p, either immediately (e.g., in the case of
heap allocated memory) or at a later time (e.g., when retrieving
and storing the address of a local variable). From that point on,
the only legal accesses to m through a pointer are accesses per-
formed through p or through other pointers derived from p. (In
Section 3, we clearly define what it means to derive a pointer from
another pointer.) All other accesses to m are Illegal Memory Ac-
cesses (IMAs), that is, accesses where a pointer is used to access
memory outside the bounds of the memory area with which it was
originally associated.
IMAs are especially relevant for several reasons. First, they are
caused by typical programming errors, such as array-out-of-bounds
accesses and NULL pointer dereferences, and are thus widespread
and common. Second, they often result in non-deterministic fail-
ures that are hard to identify and diagnose; the specific effects of an
IMA depend on several factors, such as memory layout, that may
vary between executions. Finally, many security concerns such as
viruses, worms, and rootkits use IMAs as their injection vectors.
In this paper, we present a new dynamic technique for protecting
programs against IMAs that is effective against most known types
of illegal accesses. The basic idea behind the technique is to use
dynamic tainting (or dynamic information flow) [8] to keep track
of which memory areas can be accessed through which pointers,
as follows. At runtime, our technique taints both allocated mem-
ory and pointers using taint marks. Dynamic taint propagation, to-
gether with a suitable handling of memory-allocation and deallo-
cation operations, ensures that taint marks are appropriately prop-
agated during execution. Every time the program accesses some
memory through a pointer, our technique checks whether the ac-
cess is legal by comparing the taint mark associated with the mem-
ory and the taint mark associated with the pointer used to access it.
If the marks match, the access is considered legitimate. Otherwise,
the execution is stopped and an IMA is reported.
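The allocate-taint-check cycle just described can be sketched as follows (illustrative bookkeeping only; the actual technique operates at the binary level with hardware assistance). Note the small, reusable pool of marks, mirroring the design decision discussed below.

```python
# Sketch of taint-based memory protection (illustrative only).

class TaintedHeap:
    def __init__(self, num_marks=2):
        self.num_marks = num_marks   # small, reusable pool of taint marks
        self.next_mark = 0
        self.mem_taint = {}          # address -> mark
        self.ptr_taint = {}          # pointer name -> mark

    def malloc(self, ptr, base, size):
        """Taint the allocated area and its pointer with the same mark."""
        mark = self.next_mark % self.num_marks   # reuse marks cyclically
        self.next_mark += 1
        for addr in range(base, base + size):
            self.mem_taint[addr] = mark
        self.ptr_taint[ptr] = mark

    def access(self, ptr, addr):
        """Legal only if the pointer's mark matches the memory's mark."""
        if self.mem_taint.get(addr) != self.ptr_taint.get(ptr):
            raise RuntimeError(f"IMA: {ptr} -> {addr:#x}")

heap = TaintedHeap()
heap.malloc('p', 0x1000, 8)
heap.malloc('q', 0x2000, 8)
heap.access('p', 0x1004)        # in bounds: legitimate
try:
    heap.access('p', 0x2000)    # p used to touch q's memory -> IMA
except RuntimeError as e:
    print(e)
```

With only two marks, adjacent allocations still get distinct marks, which is why out-of-bounds accesses into a neighboring area are caught even with a tiny mark pool.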
In defining our approach, our final goal is the development of a
low-overhead, hardware-assisted tool that is practical and can be
used on deployed software. A hardware-assisted tool is a tool that
leverages the benefits of both hardware and software. Typically,
some performance critical aspects are moved to the hardware to
achieve maximum efficiency, while software is used to perform op-
erations that would be too complex to implement in hardware.
There are two main characteristics of our approach that were de-
fined to help achieve our goal of a hardware-assisted implementa-
tion. The first characteristic is that our technique only uses a small,
configurable number of reusable taint marks instead of a unique
mark for each area of memory allocated. Using a low number of
Penumbra: Automatically Identifying Failure-Relevant
Inputs Using Dynamic Tainting
James Clause
College of Computing
Georgia Institute of Technology
clause@cc.gatech.edu
Alessandro Orso
College of Computing
Georgia Institute of Technology
orso@cc.gatech.edu
ABSTRACT
Most existing automated debugging techniques focus on re-
ducing the amount of code to be inspected and tend to ig-
nore an important component of software failures: the in-
puts that cause the failure to manifest. In this paper, we
present a new technique based on dynamic tainting for au-
tomatically identifying subsets of a program’s inputs that
are relevant to a failure. The technique (1) marks program
inputs when they enter the application, (2) tracks them as
they propagate during execution, and (3) identifies, for an
observed failure, the subset of inputs that are potentially
relevant for debugging that failure. To investigate feasibil-
ity and usefulness of our technique, we created a prototype
tool, penumbra, and used it to evaluate our technique on
several failures in real programs. Our results are promising,
as they show that penumbra can point developers to inputs
that are actually relevant for investigating a failure and can
be more practical than existing alternative approaches.
Categories and Subject Descriptors
D.2.5 [Software Engineering]: Testing and Debugging
General Terms
Algorithms, Experimentation, Reliability
Keywords
Failure-relevant inputs, automated debugging, dynamic in-
formation flow, dynamic tainting
1. INTRODUCTION
Debugging is known to be a labor-intensive, time-consum-
ing task that can be responsible for a large portion of soft-
ware development and maintenance costs [21,23]. Common
characteristics of modern software, such as increased con-
figurability, larger code bases, and increased input sizes, in-
troduce new challenges for debugging and exacerbate exist-
ing problems. In response, researchers have proposed many
ISSTA’09, July 19–23, 2009, Chicago, Illinois, USA.
Copyright 2009 ACM 978-1-60558-338-9/09/07 ...$5.00.
semi- and fully-automated techniques that attempt to re-
duce the cost of debugging (e.g., [8,9,11–13,18,24,25,27]).
The majority of these techniques are code-centric in that
they focus exclusively on one aspect of debugging—trying
to identify the faulty statements responsible for a failure.
Although code-centric approaches can work well in some
cases (e.g., for isolated faults that involve a single state-
ment), they are often inadequate for more complex faults [4].
Faults of omission, for instance, where part of a specification
has not been implemented, are notoriously problematic for
debugging techniques that attempt to identify potentially
faulty statements. The usefulness of code-centric techniques
is also limited in the case of long-running programs and pro-
grams that process large amounts of information; failures in
these types of programs are typically difficult to understand
without considering the data involved in such failures.
To debug failures more effectively, it is necessary to provide
developers with not only a relevant subset of state-
ments, but also a relevant subset of inputs. There are only
a few existing techniques that attempt to identify relevant
inputs [3, 17, 25], with delta debugging [25] being the best
known of these. Although delta debugging has been shown
to be an effective technique for automatic debugging, it also
has several drawbacks that may limit its usefulness in prac-
tice. In particular, it requires (1) multiple executions of the
program being debugged, which can involve a long running
time, and (2) complex oracles and setup, which can result
in a large amount of manual effort [2].
In this paper, we present a novel debugging technique that
addresses many of the limitations of existing approaches.
Our technique can complement code-centric debugging tech-
niques because it focuses on identifying program inputs that
are likely to be relevant for a given failure. It also overcomes
some of the drawbacks of delta debugging because it needs
a single execution to identify failure-relevant inputs and
requires minimal manual effort.
Given an observable faulty behavior and a set of failure-
inducing inputs (i.e., a set of inputs that cause such behav-
ior), our technique automatically identifies failure-relevant
inputs (i.e., a subset of failure-inducing inputs that are ac-
tually relevant for investigating the faulty behavior). Our
approach is based on dynamic tainting. Intuitively, the tech-
nique works by tracking the flow of inputs along data and
control dependences at runtime. When a point of failure
is reached, the tracked information is used to identify and
present to developers the failure-relevant inputs. At this
point, developers can use the identified inputs to investigate
the failure at hand.
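The input-tracking idea above can be sketched in a few lines (hypothetical names; the actual technique tracks data and control dependences at the binary level). Each input gets its own mark, values carry the marks of the inputs they were derived from, and the marks on the failing value identify the failure-relevant inputs.

```python
# Sketch of failure-relevant input identification (illustrative only).

class TaintedValue:
    def __init__(self, value, marks=frozenset()):
        self.value = value
        self.marks = frozenset(marks)

    def __add__(self, other):
        # Propagation: the result carries the union of both operands' marks.
        return TaintedValue(self.value + other.value,
                            self.marks | other.marks)

# (1) Mark each input with a unique id as it enters the application.
inputs = [TaintedValue(v, {i}) for i, v in enumerate([3, 0, 7])]

# (2) Track marks as the computation runs; only inputs 0 and 2 are used.
total = inputs[0] + inputs[2]

# (3) At the point of failure, report the marks on the failing value.
if total.value == 10:               # hypothetical failure condition
    print(sorted(total.marks))      # -> [0, 2]: the relevant inputs
```

Input 1 never reaches the failing value, so it is excluded from the report, which is precisely the reduction the technique aims to give developers.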
LEAKPOINT: Pinpointing the Causes of Memory Leaks
James Clause
College of Computing
Georgia Institute of Technology
clause@cc.gatech.edu
Alessandro Orso
College of Computing
Georgia Institute of Technology
orso@cc.gatech.edu
ABSTRACT
Most existing leak detection techniques for C and C++ applications
only detect the existence of memory leaks. They do not provide
any help for fixing the underlying memory management errors. In
this paper, we present a new technique that not only detects leaks,
but also points developers to the locations where the underlying
errors may be fixed. Our technique tracks pointers to dynamically-
allocated areas of memory and, for each memory area, records sev-
eral pieces of relevant information. This information is used to
identify the locations in an execution where memory leaks occur.
To investigate our technique’s feasibility and usefulness, we devel-
oped a prototype tool called LEAKPOINT and used it to perform
an empirical evaluation. The results of this evaluation show that
LEAKPOINT detects at least as many leaks as existing tools, reports
zero false positives, and, most importantly, can be effective at help-
ing developers fix the underlying memory management errors.
Categories and Subject Descriptors
D.2.5 [Software Engineering]: Testing and Debugging
General Terms
Performance, Reliability
Keywords
Leak detection, Dynamic tainting
1. INTRODUCTION
Memory leaks are a type of unintended memory consumption
that can adversely impact the performance and correctness of an
application. In programs written in languages such as C and C++,
memory is allocated using allocation functions, such as malloc
and new. Allocation functions reserve a currently free area of
memory m and return a pointer p that points to m’s starting ad-
dress. Typically, the program stores and then uses p, or another
This work was supported in part by NSF awards CCF-0725202
and CCF-0541080 to Georgia Tech.
ICSE ’10, May 2-8 2010, Cape Town, South Africa
Copyright 2010 ACM 978-1-60558-719-6/10/05 ...$10.00.
pointer derived from p, to interact with m. When m is no longer
needed, the program should pass p to a deallocation function (e.g.,
free or delete) to deallocate m. A leak occurs if, due to a
memory management error, m is not deallocated at the appropri-
ate time. There are two types of memory leaks: lost memory and
forgotten memory. Lost memory refers to the situation where m be-
comes unreachable (i.e., the program overwrites or loses p and all
pointers derived from p) without first being deallocated. Forgotten
memory refers to the situation where m remains reachable but is
not deallocated or accessed in the rest of the execution.
Memory leaks are relevant for several reasons. First, they are dif-
ficult to detect. Unlike many other types of failures, memory leaks
do not immediately produce an easily visible symptom (e.g., a crash
or the output of a wrong value); typically, leaks remain unobserved
until they consume a large portion of the memory available to a sys-
tem. Second, leaks have the potential to impact not only the appli-
cation that leaks memory, but also every other application running
on the system; because the overall amount of memory is limited,
as the memory usage of a leaking program increases, less memory
is available to other running applications. Consequently, the per-
formance and correctness of every running application can be im-
pacted by a program that leaks memory. Third, leaks are common,
even in mature applications. For example, in the first half of 2009,
over 100 leaks in the Firefox web-browser were reported [18].
Because of the serious consequences and common occurrence of
memory leaks, researchers have created many static and dynamic
techniques for detecting them (e.g., [1,2,4,7–14,16,17,20–23,25,
27,28]). The adoption of static techniques has been limited by sev-
eral factors, including the lack of scalable, precise heap modeling.
Dynamic techniques are therefore more widely used in practice. In
general, dynamic techniques provide one main piece of informa-
tion: the location in an execution where a leaked area of memory is
allocated. This location is supposed to serve as a starting point for
investigating the leak. However, in many situations, this informa-
tion does not provide any insight on where or how to fix the mem-
ory management error that causes the leak: the allocation location
and the location of the memory management error are typically in
completely different parts of the application’s code.
To address this limitation of existing approaches, we propose
a new memory leak detection technique. Our technique provides
the same information as existing techniques but also identifies the
locations in an execution where leaks occur. In the case of lost
memory, the location is defined as the point in an execution where
the last pointer to an unallocated memory area is lost or overwritten.
In the case of forgotten memory, the location is defined as the last
point in an execution where a pointer to a leaked area of memory
was used (e.g., when it is dereferenced to read or write memory,
passed as a function argument, returned from a function, or used as
Camouflage: Automated Sanitization of Field Data
James Clause
College of Computing
Georgia Institute of Technology
clause@cc.gatech.edu
Alessandro Orso
College of Computing
Georgia Institute of Technology
orso@cc.gatech.edu
ABSTRACT
Privacy and security concerns have adversely affected the
usefulness of many types of techniques that leverage infor-
mation gathered from deployed applications. To address this
issue, we present a new approach for automatically sanitiz-
ing failure-inducing inputs. Given an input I that causes
a failure f, our technique can generate a sanitized input I′
that is different from I but still causes f. I′ can then be sent
to the developers to help them debug f, without revealing
the possibly sensitive information contained in I. We im-
plemented our approach in a prototype tool, camouflage,
and performed an empirical evaluation. In the evaluation,
we applied camouflage to a large set of failure-inducing
inputs for several real applications. The results of the eval-
uation are promising; they show that camouflage is both
practical and effective at generating sanitized inputs. In par-
ticular, for the inputs that we considered, I and I′ shared
no sensitive information.
1. INTRODUCTION
Investigating techniques that capture data from deployed
applications to support in-house software engineering tasks
is an increasingly active and successful area of research (e.g.,
[1,3–5,13,14,17,21,22,26,27,29]). However, privacy and se-
curity concerns have prevented widespread adoption of many
of these techniques and, because they rely on user partici-
pation, have ultimately limited their usefulness. Many of
the earlier proposed techniques attempt to sidestep these
concerns by collecting only limited amounts of information
(e.g., stack traces and register dumps [1, 3, 5] or sampled
branch profiles [26,27]) and providing a privacy policy that
specifies how the information will be used (e.g., [2,8]). Be-
cause the types of information collected by these techniques
are unlikely to be sensitive, users are more willing to trust
developers. Moreover, because only a small amount of infor-
mation is collected, it is feasible for users to manually inspect
and sanitize such information before it is sent to developers.
Unfortunately, recent research has shown that the effec-
tiveness of these techniques increases when they can lever-
age large amounts of detailed information (e.g., complete
execution recordings [4, 14] or path profiles [13, 24]). Since
more detailed information is bound to contain sensitive data,
users will most likely be unwilling to let developers collect
such information. In addition, collecting large amounts of
information would make it infeasible for users to sanitize
the collected information by hand. To address this prob-
lem, some of these techniques suggest using an input mini-
mization approach (e.g., [6, 7, 35]) to reduce the number of
failure-inducing inputs and, hopefully, eliminate some sensi-
tive information. Input-minimization techniques, however,
were not designed to specifically reduce sensitive inputs, so
they can only eliminate sensitive data by chance. In or-
der for techniques that leverage captured field information
to become widely adopted and achieve their full potential,
new approaches for addressing privacy and security concerns
must be developed.
In this paper, we present a novel technique that addresses
privacy and security concerns by sanitizing information cap-
tured from deployed applications. Our technique is designed
to be used in conjunction with an execution capture/replay
technique (e.g., [4, 14]). Given an execution recording that
contains a captured failure-inducing input I = ⟨i1, i2, . . . , in⟩
and terminates with a failure f, our technique replays the
execution recording and leverages a specialized version of
symbolic execution to automatically produce I′, a sanitized
version of I, such that I′ (1) still causes f and (2) reveals as
little information about I as possible. A modified execution
recording where I′ replaces I can then be constructed and
sent to the developers, who can use it to debug f.
It is, in general, impossible to construct I′ such that it
does not reveal any information about I while still caus-
ing the same failure f. Typically, the execution of f would
depend on the fact that some elements of I have specific
values (e.g., i1 must be 0 for the failing path to be taken).
However, this fact does not prevent the technique from be-
ing useful in practice. In our evaluation, we found that the
information revealed by the sanitized inputs was not sensi-
tive and tended to be structural in nature (e.g., a specific
portion of the input must be surrounded by double quotes).
Conversely, the parts of the inputs that were more likely to
be sensitive (e.g., values contained inside the double quotes)
were not revealed (see Section 4).
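The distinction between structural and sensitive parts of an input can be made concrete with a small hypothetical example. Here the failing path depends only on the input being surrounded by double quotes, a structural property, so the characters between the quotes (the potentially sensitive part) can be replaced freely in the sanitized input:

```c
#include <string.h>

/* Hypothetical failing check: the failure is triggered whenever
 * the input is surrounded by double quotes. Only this structural
 * property is revealed by a sanitized input; the content between
 * the quotes is not constrained and can be rewritten. */
int triggers_failure(const char *input) {
    size_t n = strlen(input);
    return n >= 2 && input[0] == '"' && input[n - 1] == '"';
}
```

Both the original input `"alice@example.com"` and a sanitized input such as `"xxxxxxxxxxxxxxxxx"` take the failing path, but the latter reveals nothing about the sensitive content.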
To evaluate the effectiveness of our technique, we imple-
mented it in a prototype tool, called camouflage, and car-
ried out an empirical evaluation of 170 failure-inducing in-
CC 05 ICSE 05 ICSE 07 ISSTA 07 ASE 07 ISSTA 09 ICSE 10 Tech Rept
RESEARCH OVERVIEW
- 3. Efficient instrumentation
Jazz: A Tool for Demand-Driven Structural
Testing
Jonathan Misurda1, Jim Clause1, Juliya Reed1, Bruce R. Childers1, and Mary Lou Soffa2
1 University of Pittsburgh, Pittsburgh PA 15260, USA,
{jmisurda,clausej,juliya,childers}@cs.pitt.edu
2 University of Virginia, Charlottesville VA 22904, USA,
soffa@cs.virginia.edu
Abstract. Software testing to produce reliable and robust software has
become vitally important. Testing is a process by which quality can be
assured through the collection of information about software. While test-
ing can improve software quality, current tools typically are inflexible
and have high overheads, making it a challenge to test large projects.
We describe a new scalable and flexible tool, called Jazz, that uses a
demand-driven structural testing approach. Jazz has a low overhead of
only 17.6% for branch testing.
1 Introduction
In the last several years, the importance of producing high quality and robust
software has become paramount. Testing is an important process to support
quality assurance by gathering information about the software being developed
or modified. It is, in general, extremely labor and resource intensive, accounting
for 50-60% of the total cost of software development [1]. The increased emphasis
on software quality and robustness mandates improved testing methodologies.
To test software, a number of techniques can be applied. One class of tech-
niques is structural testing, which checks that a given coverage criterion is sat-
isfied. For example, branch testing checks that a certain percentage of branches
are executed. Other structural tests include def-use testing in which pairs of
variable definitions and uses are checked for coverage and node testing in which
nodes in a program’s control flow graph are checked.
Unfortunately, structural testing is often hindered by the lack of scalable
and flexible tools. Current tools are not scalable in terms of both time and
memory, limiting the number and scope of the tests that can be applied to large
programs. These tools often modify the software binary to insert instrumentation
for testing. In this case, the tested version of the application is not the same
version that is shipped to customers and errors may remain. Testing tools are
usually inflexible and only implement certain types of testing. For example, many
tools implement branch testing, but do not implement node or def-use testing.
In this paper, we describe a new tool for structural testing, called Jazz, that
addresses these problems. Jazz uses a novel demand-driven technique to apply
ABSTRACT
Producing reliable and robust software has become one
of the most important software development concerns in
recent years. Testing is a process by which software
quality can be assured through the collection of infor-
mation. While testing can improve software reliability,
current tools typically are inflexible and have high over-
heads, making it challenging to test large software
projects. In this paper, we describe a new scalable and
flexible framework for testing programs with a novel
demand-driven approach based on execution paths to
implement test coverage. This technique uses dynamic
instrumentation on the binary code that can be inserted
and removed on-the-fly to keep performance and mem-
ory overheads low. We describe and evaluate implemen-
tations of the framework for branch, node and def-use
testing of Java programs. Experimental results for
branch testing show that our approach has, on average, a
1.6x speedup over static instrumentation and also uses
less memory.
Categories and Subject Descriptors
D.2.5. [Software Engineering]: Testing and Debug-
ging—Testing tools; D.3.3. [Programming Lan-
guages]: Language Constructs and Features—Program
instrumentation, run-time environments
General Terms
Experimentation, Measurement, Verification
Keywords
Testing, Code Coverage, Structural Testing, Demand-
Driven Instrumentation, Java Programming Language
1. INTRODUCTION
In the last several years, the importance of produc-
ing high quality and robust software has become para-
mount [15]. Testing is an important process to support
quality assurance by gathering information about the
behavior of the software being developed or modified. It
is, in general, extremely labor and resource intensive,
accounting for 50-60% of the total cost of software
development [17]. Given the importance of testing, it is
imperative that there are appropriate testing tools and
frameworks. In order to adequately test software, a
number of different testing techniques must be per-
formed. One class of testing techniques used extensively
is structural testing in which properties of the software
code are used to ensure a certain code coverage. Struc-
tural testing techniques include branch testing, node
testing, path testing, and def-use testing [6,7,8,17,19].
Typically, a testing tool targets one type of struc-
tural test, and the software unit is the program, file or
particular methods. In order to apply various structural
testing techniques, different tools must be used. If a tool
for a particular type of structural testing is not available,
the tester would need to either implement it or not use
that testing technique. The tester would also be con-
strained by the region of code to be tested, as deter-
mined by the tool implementor. For example, it may not
be possible for the tester to focus on a particular region
of code, such as a series of loops, complicated condi-
tionals, or particular variables if def-use testing is
desired. The user may want to have higher coverage on
frequently executed regions of code. Users may want to
define their own way of testing. For example, all
branches should be covered 10 times rather than once in
all loops.
In structural testing, instrumentation is placed at
certain code points (probes). Whenever such a program
point is reached, code that performs the function for the
test (payload) is executed. The probes in def-use testing
are dictated by the definitions and uses of variables and
the payload is to mark that a definition or use in a def-
use pair has been covered. Thus for each type of struc-
tural testing, there is a testing “plan”. A test plan is a
ICSE'05, May 15-21, 2005, St. Louis, Missouri, USA.
Copyright 2005 ACM 1-58113-963-2/05/0005...$5.00.
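The probe/payload model described above, combined with the demand-driven idea of removing instrumentation on-the-fly, can be sketched in miniature (names and table layout are hypothetical, not the actual Jazz implementation):

```c
#include <stddef.h>

/* Each instrumented branch has a probe slot holding a payload
 * function; NULL means the probe has been removed. */
#define NUM_BRANCHES 4

static int covered[NUM_BRANCHES];
typedef void (*payload_fn)(int branch_id);

/* Payload for branch testing: mark the branch as covered. */
static void mark_covered(int branch_id) {
    covered[branch_id] = 1;
}

static payload_fn probes[NUM_BRANCHES] = {
    mark_covered, mark_covered, mark_covered, mark_covered
};

/* Called whenever an instrumented branch is reached. Once the
 * payload has run, the probe removes itself so that subsequent
 * executions of the branch pay no instrumentation overhead. */
void probe_hit(int branch_id) {
    if (probes[branch_id] != NULL) {
        probes[branch_id](branch_id);
        probes[branch_id] = NULL;   /* demand-driven removal */
    }
}
```

This captures why the demand-driven approach keeps overhead low: after the coverage information for a probe point has been collected, the probe no longer executes.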
Demand-Driven Structural Testing with Dynamic
Instrumentation
Jonathan Misurda†, James A. Clause†, Juliya L. Reed†, Bruce R. Childers†, and
Mary Lou Soffa‡
†Department of Computer Science
University of Pittsburgh
Pittsburgh, Pennsylvania 15260
{jmisurda, clausej, juliya, childers}@cs.pitt.edu
‡Department of Computer Science
University of Virginia
Charlottesville, Virginia 22904
soffa@cs.virginia.edu
A Technique for Enabling and Supporting Debugging of Field Failures
James Clause and Alessandro Orso
College of Computing
Georgia Institute of Technology
{clause, orso}@cc.gatech.edu
Abstract
It is difficult to fully assess the quality of software in-
house, outside the actual time and context in which it will
execute after deployment. As a result, it is common for
software to manifest field failures, failures that occur on
user machines due to untested behavior. Field failures are
typically difficult to recreate and investigate on developer
platforms, and existing techniques based on crash report-
ing provide only limited support for this task. In this pa-
per, we present a technique for recording, reproducing, and
minimizing failing executions that enables and supports in-
house debugging of field failures. We also present a tool
that implements our technique and an empirical study that
evaluates the technique on a widely used e-mail client.
1. Introduction
Quality-assurance activities, such as software testing and
analysis, are notoriously difficult, expensive, and time-
consuming. As a result, software products are often re-
leased with faults or missing functionality. In fact, real-
world examples of field failures experienced by users be-
cause of untested behaviors (e.g., due to unforeseen us-
ages), are countless. When field failures occur, it is im-
portant for developers to be able to recreate and investigate
them in-house. This pressing need is demonstrated by the
emergence of several crash-reporting systems, such as Mi-
crosoft’s error reporting systems [13] and Apple’s Crash
Reporter [1]. Although these techniques represent a first
important step in addressing the limitations of purely in-
house approaches to quality assurance, they work on lim-
ited data (typically, a snapshot of the execution state) and
can at best identify correlations between a crash report and
data on other known failures.
In this paper, we present a novel technique for reproduc-
ing and investigating field failures that addresses the limita-
tions of existing approaches. Our technique works in three
phases, intuitively illustrated by the scenario in Figure 1. In
the recording phase, while users run the software, the tech-
nique intercepts and logs the interactions between applica-
tion and environment and records portions of the environ-
ment that are relevant to these interactions. If the execution
terminates with a failure, the produced execution recording
is stored for later investigation. In the minimization phase,
using free cycles on the user machines, the technique re-
plays the recorded failing executions with the goal of au-
tomatically eliminating parts of the executions that are not
relevant to the failure. In the replay and debugging phase,
developers can use the technique to replay the minimized
failing executions and investigate the cause of the failures
(e.g., within a debugger). Being able to replay and debug
real field failures can give developers unprecedented insight
into the behavior of their software after deployment and op-
portunities to improve the quality of their software in ways
that were not possible before.
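The recording phase intercepts and logs interactions between the application and its environment so that replay can reproduce them exactly. A minimal sketch of that log-based idea (names hypothetical; the actual technique records file, console, and network interactions):

```c
#include <stddef.h>

/* During recording, each environment interaction (here reduced to
 * a single integer "read") is logged; during replay, the same
 * values are served back from the log so the replayed execution
 * sees exactly the environment the user saw. */
enum mode { RECORD, REPLAY };

#define LOG_CAP 128
static int log_buf[LOG_CAP];
static size_t log_len, replay_pos;

/* env_read stands in for any environment interaction (file read,
 * system-call result, user input, ...). */
int env_read(enum mode m, int live_value) {
    if (m == RECORD) {
        log_buf[log_len++] = live_value;  /* intercept and log */
        return live_value;
    }
    return log_buf[replay_pos++];         /* replay from the log */
}
```

Minimization can then be seen as deleting entries from such a log and checking whether the replayed execution still fails.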
To evaluate our technique, we implemented it in a proto-
type tool, called ADDA (Automated Debugging of Deployed
Applications), and used the tool to perform an empirical
study. The study was performed on PINE [19], a widely-
used e-mail client, and involved the investigation of failures
caused by two real faults in PINE. The results of the study
are promising. Our technique was able to (1) record all ex-
ecutions of PINE (and two other subjects) with a low time
and space overhead, (2) completely replay all recorded exe-
cutions, and (3) perform automated minimization of failing
executions and obtain shorter executions that manifested the
same failures as the original executions. Moreover, we were
able to replay the minimized executions within a debugger,
which shows that they could have actually been used to in-
vestigate the failures.
The contributions of this paper are:
• A novel technique for recording and later replaying exe-
cutions of deployed programs.
• An approach for minimizing failing executions and gen-
erating shorter executions that fail for the same reasons.
• A prototype tool that implements our technique.
• An empirical study that shows the feasibility and effec-
tiveness of the approach.
29th International Conference on Software Engineering (ICSE'07)
0-7695-2828-7/07 $20.00 © 2007
Dytan: A Generic Dynamic Taint Analysis Framework
James Clause, Wanchun Li, and Alessandro Orso
College of Computing
Georgia Institute of Technology
{clause|wli7|orso}@cc.gatech.edu
ABSTRACT
Dynamic taint analysis is gaining momentum. Techniques based
on dynamic tainting have been successfully used in the context of
application security, and now their use is also being explored in dif-
ferent areas, such as program understanding, software testing, and
debugging. Unfortunately, most existing approaches for dynamic
tainting are defined in an ad-hoc manner, which makes it difficult
to extend them, experiment with them, and adapt them to new con-
texts. Moreover, most existing approaches are focused on data-flow
based tainting only and do not consider tainting due to control flow,
which limits their applicability outside the security domain. To
address these limitations and foster experimentation with dynamic
tainting techniques, we defined and developed a general framework
for dynamic tainting that (1) is highly flexible and customizable, (2)
allows for performing both data-flow and control-flow based taint-
ing conservatively, and (3) does not rely on any customized run-
time system. We also present DYTAN, an implementation of our
framework that works on x86 executables, and a set of preliminary
studies that show how DYTAN can be used to implement different
tainting-based approaches with limited effort. In the studies, we
also show that DYTAN can be used on real software, by using FIRE-
FOX as one of our subjects, and illustrate how the specific char-
acteristics of the tainting approach used can affect efficiency and
accuracy of the taint analysis, which further justifies the use of our
framework to experiment with different variants of an approach.
Categories and Subject Descriptors: D.2.5 [Software Engineer-
ing]: Testing and Debugging;
General Terms: Experimentation, Security
Keywords: Dynamic tainting, information flow, general framework
1. INTRODUCTION
Dynamic taint analysis (also known as dynamic information flow
analysis) consists, intuitively, in marking and tracking certain data
in a program at run-time. This type of dynamic analysis is be-
coming increasingly popular. In the context of application secu-
rity, dynamic-tainting approaches have been successfully used to
prevent a wide range of attacks, including buffer overruns (e.g., [8,
17]), format string attacks (e.g., [17, 21]), SQL and command in-
jections (e.g., [7, 19]), and cross-site scripting (e.g., [18]). More
recently, researchers have started to investigate the use of tainting-
based approaches in domains other than security, such as program
understanding, software testing, and debugging (e.g., [11, 13]).
ISSTA’07, July 9–12, 2007, London, England, United Kingdom.
Copyright 2007 ACM 978-1-59593-734-6/07/0007 ...$5.00.
Unfortunately, most existing techniques and tools for dynamic
taint analysis are defined in an ad-hoc manner, to target a specific
problem or a small class of problems. It would be difficult to ex-
tend or adapt such techniques and tools so that they can be used in
other contexts. In particular, most existing approaches are focused
on data-flow based tainting only, and do not consider tainting due
to the control flow within an application, which limits their general
applicability. Also, most existing techniques support either a sin-
gle taint marking or a small, fixed number of markings, which is
problematic in applications such as debugging. Finally, almost no
existing technique handles the propagation of taint markings in a
truly conservative way, which may be appropriate for the specific
applications considered, but is problematic in general. Because de-
veloping support for dynamic taint analysis is not only time con-
suming, but also fairly complex, this lack of flexibility and gener-
ality of existing tools and techniques is especially limiting for this
type of dynamic analysis.
To address these limitations and foster experimentation with dy-
namic tainting techniques, in this paper we present a framework for
dynamic taint analysis. We designed the framework to be general
and flexible, so that it allows for implementing different kinds of
techniques based on dynamic taint analysis with little effort. Users
can leverage the framework to quickly develop prototypes for their
techniques, experiment with them, and investigate trade-offs of dif-
ferent alternatives. For a simple example, the framework could be
used to investigate the cost effectiveness of considering different
types of taint propagation for an application.
Our framework has several advantages over existing approaches.
First, it is highly flexible and customizable. It allows for easily
specifying which program data should be tainted and how, how taint
markings should be propagated at run-time, and where and how
taint markings should be checked. Second, it allows for performing
either data-flow based tainting alone or combined data-flow and
control-flow based tainting. Third,
from a more practical standpoint, it works on binaries, does not
need access to source code, and does not rely on any customized
hardware or operating system, which makes it broadly applicable.
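As an illustration of the data-flow propagation that such a framework lets users configure (a shadow-value sketch, not DYTAN's actual x86-level implementation), taint marks attached at sources are unioned whenever values are combined and inspected at sinks:

```c
/* Shadow-value sketch of data-flow taint propagation: every value
 * carries a bitset of taint marks, and the result of an operation
 * is tainted with the union of its operands' marks. */
typedef struct {
    int value;
    unsigned taint;   /* bitset of taint marks */
} tainted_int;

/* Source: attach a taint mark where data enters the program. */
tainted_int taint_source(int value, unsigned mark) {
    tainted_int t = { value, mark };
    return t;
}

/* Data-flow propagation rule for addition. */
tainted_int tainted_add(tainted_int a, tainted_int b) {
    tainted_int r = { a.value + b.value, a.taint | b.taint };
    return r;
}

/* Sink: check whether any taint mark reaches this point. */
int sink_is_tainted(tainted_int v) {
    return v.taint != 0;
}
```

A framework in this style lets users vary what counts as a source, how marks propagate (e.g., whether control-flow dependences also propagate them), and what is checked at sinks.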
We also present DYTAN, an implementation of our framework
that works on x86 binaries, and a set of preliminary studies per-
formed using DYTAN. In the first set of studies, we report on our
experience in using DYTAN to implement two tainting-based ap-
proaches presented in the literature. Although preliminary, our ex-
perience shows that we were able to implement these approaches
completely and with little effort. The second set of studies illus-
trates how the specific characteristics of a tainting approach can
affect efficiency and accuracy of the taint analysis. In particular, we
investigate how ignoring control-flow related propagation and over-
looking some data-flow aspects can lead to unsafety. These results
further justify the usefulness of experimenting with different varia-
tions of dynamic taint analysis and assessing their tradeoffs, which
can be done with limited effort using our framework. The second
set of studies also shows the practical applicability of DYTAN, by
successfully running it on the FIREFOX web browser.
Effective Memory Protection Using Dynamic Tainting
James Clause, Ioannis Doudalis, Alessandro Orso, and Milos Prvulovic
College of Computing
Georgia Institute of Technology
{clause|idoud|orso|milos}@cc.gatech.edu
ABSTRACT
Programs written in languages that provide direct access to memory
through pointers often contain memory-related faults, which may
cause non-deterministic failures and even security vulnerabilities.
In this paper, we present a new technique based on dynamic taint-
ing for protecting programs from illegal memory accesses. When
memory is allocated, at runtime, our technique taints both the mem-
ory and the corresponding pointer using the same taint mark. Taint
marks are then suitably propagated while the program executes and
are checked every time a memory address m is accessed through a
pointer p; if the taint marks associated with m and p differ, the ex-
ecution is stopped and the illegal access is reported. To allow for a
low-overhead, hardware-assisted implementation of the approach,
we make several key technical and engineering decisions in the
definition of our technique. In particular, we use a configurable,
low number of reusable taint marks instead of a unique mark for
each area of memory allocated, which reduces the overhead of the
approach without limiting its flexibility and ability to target most
memory-related faults and attacks known to date. We also define
the technique at the binary level, which lets us handle the (very)
common case of applications that use third-party libraries whose
source code is unavailable. To investigate the effectiveness and
practicality of our approach, we implemented it for heap-allocated
memory and performed a preliminary empirical study on a set of
programs. Our results show that (1) our technique can identify a
large class of memory-related faults, even when using only two
unique taint marks, and (2) a hardware-assisted implementation of
the technique could achieve overhead in the single digits.
Categories and Subject Descriptors: D.2.5 [Software Engineering]: Test-
ing and Debugging; C.0 [General]: Hardware/Software Interfaces;
General Terms: Performance, Security
Keywords: Illegal memory accesses, dynamic tainting, hardware support
1. INTRODUCTION
Memory-related faults are a serious problem for languages that
allow direct memory access through pointers. An important class
of memory-related faults are what we call illegal memory accesses.
ASE’07, November 5–9, 2007, Atlanta, Georgia, USA.
Copyright 2007 ACM 978-1-59593-882-4/07/0011 ...$5.00.
In languages such as C and C++, when memory allocation is re-
quested, a currently-free area of memory m of the specified size
is reserved. After m has been allocated, its initial address can be
assigned to a pointer p, either immediately (e.g., in the case of
heap allocated memory) or at a later time (e.g., when retrieving
and storing the address of a local variable). From that point on,
the only legal accesses to m through a pointer are accesses per-
formed through p or through other pointers derived from p. (In
Section 3, we clearly define what it means to derive a pointer from
another pointer.) All other accesses to m are Illegal Memory Ac-
cesses (IMAs), that is, accesses where a pointer is used to access
memory outside the bounds of the memory area with which it was
originally associated.
IMAs are especially relevant for several reasons. First, they are
caused by typical programming errors, such as array-out-of-bounds
accesses and NULL pointer dereferences, and are thus widespread
and common. Second, they often result in non-deterministic fail-
ures that are hard to identify and diagnose; the specific effects of an
IMA depend on several factors, such as memory layout, that may
vary between executions. Finally, many security concerns such as
viruses, worms, and rootkits use IMAs as their injection vectors.
In this paper, we present a new dynamic technique for protecting
programs against IMAs that is effective against most known types
of illegal accesses. The basic idea behind the technique is to use
dynamic tainting (or dynamic information flow) [8] to keep track
of which memory areas can be accessed through which pointers,
as follows. At runtime, our technique taints both allocated mem-
ory and pointers using taint marks. Dynamic taint propagation, to-
gether with a suitable handling of memory-allocation and deallo-
cation operations, ensures that taint marks are appropriately prop-
agated during execution. Every time the program accesses some
memory through a pointer, our technique checks whether the ac-
cess is legal by comparing the taint mark associated with the mem-
ory and the taint mark associated with the pointer used to access it.
If the marks match, the access is considered legitimate. Otherwise,
the execution is stopped and an IMA is reported.
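The bookkeeping described above can be sketched in a short Python model. This is an illustrative sketch only, with hypothetical names: the actual technique instruments binaries with hardware assistance, and it reuses a small, configurable set of taint marks, whereas this sketch simply assigns a fresh mark to each allocation for clarity.

```python
# Illustrative model of taint-based IMA detection.
# Hypothetical API; the real technique works on binaries with
# hardware support and reuses a small set of taint marks.

class IMAError(Exception):
    pass

class TaintedMemory:
    def __init__(self):
        self.mem_taint = {}   # address -> taint mark
        self.ptr_taint = {}   # pointer name -> taint mark
        self.next_mark = 1

    def allocate(self, base, size, ptr):
        # On allocation, taint the memory area and the returned
        # pointer with the same mark (fresh here, for simplicity).
        mark = self.next_mark
        self.next_mark += 1
        for addr in range(base, base + size):
            self.mem_taint[addr] = mark
        self.ptr_taint[ptr] = mark

    def derive(self, src_ptr, dst_ptr):
        # Pointer assignment/arithmetic propagates the source's mark.
        self.ptr_taint[dst_ptr] = self.ptr_taint[src_ptr]

    def access(self, ptr, addr):
        # An access is legal only if the pointer's mark matches the
        # mark of the memory being accessed.
        if self.ptr_taint.get(ptr) != self.mem_taint.get(addr):
            raise IMAError(f"illegal access to {addr:#x} via {ptr}")

mm = TaintedMemory()
mm.allocate(0x1000, 16, "p")
mm.access("p", 0x100F)      # in bounds: legal
mm.derive("p", "q")
mm.access("q", 0x1008)      # derived pointer: legal
try:
    mm.access("p", 0x1010)  # one past the end: IMA detected
except IMAError as e:
    print("detected:", e)
```

An out-of-bounds access (here, one byte past the allocated area) lands on memory whose mark does not match the pointer's mark, so it is reported instead of silently corrupting state.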
In defining our approach, our final goal is the development of a
low-overhead, hardware-assisted tool that is practical and can be
used on deployed software. A hardware-assisted tool is a tool that
leverages the benefits of both hardware and software. Typically,
some performance critical aspects are moved to the hardware to
achieve maximum efficiency, while software is used to perform op-
erations that would be too complex to implement in hardware.
There are two main characteristics of our approach that were de-
fined to help achieve our goal of a hardware-assisted implementa-
tion. The first characteristic is that our technique only uses a small,
configurable number of reusable taint marks instead of a unique
mark for each area of memory allocated. Using a low number of
Penumbra: Automatically Identifying Failure-Relevant
Inputs Using Dynamic Tainting
James Clause
College of Computing
Georgia Institute of Technology
clause@cc.gatech.edu
Alessandro Orso
College of Computing
Georgia Institute of Technology
orso@cc.gatech.edu
ABSTRACT
Most existing automated debugging techniques focus on re-
ducing the amount of code to be inspected and tend to ig-
nore an important component of software failures: the in-
puts that cause the failure to manifest. In this paper, we
present a new technique based on dynamic tainting for au-
tomatically identifying subsets of a program’s inputs that
are relevant to a failure. The technique (1) marks program
inputs when they enter the application, (2) tracks them as
they propagate during execution, and (3) identifies, for an
observed failure, the subset of inputs that are potentially
relevant for debugging that failure. To investigate feasibil-
ity and usefulness of our technique, we created a prototype
tool, penumbra, and used it to evaluate our technique on
several failures in real programs. Our results are promising,
as they show that penumbra can point developers to inputs
that are actually relevant for investigating a failure and can
be more practical than existing alternative approaches.
Categories and Subject Descriptors
D.2.5 [Software Engineering]: Testing and Debugging
General Terms
Algorithms, Experimentation, Reliability
Keywords
Failure-relevant inputs, automated debugging, dynamic in-
formation flow, dynamic tainting
1. INTRODUCTION
Debugging is known to be a labor-intensive, time-consum-
ing task that can be responsible for a large portion of soft-
ware development and maintenance costs [21,23]. Common
characteristics of modern software, such as increased con-
figurability, larger code bases, and increased input sizes, in-
troduce new challenges for debugging and exacerbate exist-
ing problems. In response, researchers have proposed many
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
ISSTA’09, July 19–23, 2009, Chicago, Illinois, USA.
Copyright 2009 ACM 978-1-60558-338-9/09/07 ...$5.00.
semi- and fully-automated techniques that attempt to re-
duce the cost of debugging (e.g., [8,9,11–13,18,24,25,27]).
The majority of these techniques are code-centric in that
they focus exclusively on one aspect of debugging—trying
to identify the faulty statements responsible for a failure.
Although code-centric approaches can work well in some
cases (e.g., for isolated faults that involve a single state-
ment), they are often inadequate for more complex faults [4].
Faults of omission, for instance, where part of a specification
has not been implemented, are notoriously problematic for
debugging techniques that attempt to identify potentially
faulty statements. The usefulness of code-centric techniques
is also limited in the case of long-running programs and pro-
grams that process large amounts of information; failures in
these types of programs are typically difficult to understand
without considering the data involved in such failures.
To debug failures more effectively, it is necessary to pro-
vide developers with not only a relevant subset of state-
ments, but also a relevant subset of inputs. There are only
a few existing techniques that attempt to identify relevant
inputs [3, 17, 25], with delta debugging [25] being the best
known of these. Although delta debugging has been shown
to be an effective technique for automatic debugging, it also
has several drawbacks that may limit its usefulness in prac-
tice. In particular, it requires (1) multiple executions of the
program being debugged, which can involve a long running
time, and (2) complex oracles and setup, which can result
in a large amount of manual effort [2].
In this paper, we present a novel debugging technique that
addresses many of the limitations of existing approaches.
Our technique can complement code-centric debugging tech-
niques because it focuses on identifying program inputs that
are likely to be relevant for a given failure. It also overcomes
some of the drawbacks of delta debugging because it needs
a single execution to identify failure-relevant inputs and re-
quires minimal manual effort.
Given an observable faulty behavior and a set of failure-
inducing inputs (i.e., a set of inputs that cause such behav-
ior), our technique automatically identifies failure-relevant
inputs (i.e., a subset of failure-inducing inputs that are ac-
tually relevant for investigating the faulty behavior). Our
approach is based on dynamic tainting. Intuitively, the tech-
nique works by tracking the flow of inputs along data and
control dependences at runtime. When a point of failure
is reached, the tracked information is used to identify and
present to developers the failure-relevant inputs. At this
point, developers can use the identified inputs to investigate
the failure at hand.
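The three steps above can be sketched in Python. This is a minimal model with hypothetical names, not Penumbra's implementation: the real tool instruments binaries and also tracks control dependences, while this sketch only propagates taint along data dependences.

```python
# Minimal model of identifying failure-relevant inputs via tainting.
# Hypothetical API; the real tool instruments binaries and also
# tracks control dependences.

class TaintedValue:
    """A value paired with the set of input positions it depends on."""
    def __init__(self, value, taints=frozenset()):
        self.value = value
        self.taints = frozenset(taints)

    def __add__(self, other):
        # Data dependences union the taint sets of the operands.
        return TaintedValue(self.value + other.value,
                            self.taints | other.taints)

def read_inputs(raw):
    # Step (1): mark each input with its position as it enters.
    return [TaintedValue(b, {i}) for i, b in enumerate(raw)]

def buggy_sum(inputs):
    # Step (2): taint marks propagate during execution.
    total = TaintedValue(0)
    for x in inputs[:2]:   # bug: only the first two inputs are summed
        total = total + x
    return total

inputs = read_inputs([3, 4, 100, 5])
result = buggy_sum(inputs)
if result.value != 112:    # point of failure reached
    # Step (3): report the inputs relevant to the failing value.
    print("failure-relevant input positions:", sorted(result.taints))
```

At the point of failure, the taint set on the wrong value names exactly the input positions (0 and 1) a developer needs to examine, rather than the entire input.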
LEAKPOINT: Pinpointing the Causes of Memory Leaks
James Clause
College of Computing
Georgia Institute of Technology
clause@cc.gatech.edu
Alessandro Orso
College of Computing
Georgia Institute of Technology
orso@cc.gatech.edu
ABSTRACT
Most existing leak detection techniques for C and C++ applications
only detect the existence of memory leaks. They do not provide
any help for fixing the underlying memory management errors. In
this paper, we present a new technique that not only detects leaks,
but also points developers to the locations where the underlying
errors may be fixed. Our technique tracks pointers to dynamically-
allocated areas of memory and, for each memory area, records sev-
eral pieces of relevant information. This information is used to
identify the locations in an execution where memory leaks occur.
To investigate our technique’s feasibility and usefulness, we devel-
oped a prototype tool called LEAKPOINT and used it to perform
an empirical evaluation. The results of this evaluation show that
LEAKPOINT detects at least as many leaks as existing tools, reports
zero false positives, and, most importantly, can be effective at help-
ing developers fix the underlying memory management errors.
Categories and Subject Descriptors
D.2.5 [Software Engineering]: Testing and Debugging
General Terms
Performance, Reliability
Keywords
Leak detection, Dynamic tainting
1. INTRODUCTION
Memory leaks are a type of unintended memory consumption
that can adversely impact the performance and correctness of an
application. In programs written in languages such as C and C++,
memory is allocated using allocation functions, such as malloc
and new. Allocation functions reserve a currently free area of
memory m and return a pointer p that points to m’s starting ad-
dress. Typically, the program stores and then uses p, or another
This work was supported in part by NSF awards CCF-0725202
and CCF-0541080 to Georgia Tech.
ICSE ’10, May 2-8 2010, Cape Town, South Africa
Copyright 2010 ACM 978-1-60558-719-6/10/05 ...$10.00.
pointer derived from p, to interact with m. When m is no longer
needed, the program should pass p to a deallocation function (e.g.,
free or delete) to deallocate m. A leak occurs if, due to a
memory management error, m is not deallocated at the appropri-
ate time. There are two types of memory leaks: lost memory and
forgotten memory. Lost memory refers to the situation where m be-
comes unreachable (i.e., the program overwrites or loses p and all
pointers derived from p) without first being deallocated. Forgotten
memory refers to the situation where m remains reachable but is
not deallocated or accessed in the rest of the execution.
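The two leak categories can be made concrete with a small Python model of the pointer bookkeeping. The names and event hooks are hypothetical: the actual technique tracks pointers in C/C++ executions via dynamic tainting, while this sketch only simulates the events it reacts to.

```python
# Toy model of the two leak categories and their leak locations.
# Hypothetical event hooks; the real analysis tracks pointers in
# C/C++ binaries via dynamic tainting.

class Allocation:
    def __init__(self, alloc_site):
        self.alloc_site = alloc_site  # where the area was allocated
        self.pointers = set()         # pointers currently targeting it
        self.last_use = alloc_site    # most recent location that used one
        self.freed = False

allocations = {}
reports = []

def on_malloc(area, ptr, site):
    a = Allocation(site)
    a.pointers.add(ptr)
    allocations[area] = a

def on_use(area, ptr, site):
    allocations[area].last_use = site

def on_ptr_lost(area, ptr, site):
    # Losing/overwriting the last pointer: "lost memory". The useful
    # leak location is this point, not the allocation site.
    a = allocations[area]
    a.pointers.discard(ptr)
    if not a.pointers and not a.freed:
        reports.append(("lost", area, site))

def on_exit():
    # Still reachable but never freed: "forgotten memory". The leak
    # location is the last point where a pointer to the area was used.
    for area, a in allocations.items():
        if a.pointers and not a.freed:
            reports.append(("forgotten", area, a.last_use))

on_malloc("m1", "p", "parse:10")
on_ptr_lost("m1", "p", "parse:25")  # m1 becomes lost memory here
on_malloc("m2", "q", "main:30")
on_use("m2", "q", "main:42")
on_exit()                           # m2 is forgotten memory
```

Note how each report points at the place where the leak happens (the overwrite at `parse:25`, the last use at `main:42`) rather than at the allocation sites, which is the extra information this kind of technique aims to provide.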
Memory leaks are relevant for several reasons. First, they are dif-
ficult to detect. Unlike many other types of failures, memory leaks
do not immediately produce an easily visible symptom (e.g., a crash
or the output of a wrong value); typically, leaks remain unobserved
until they consume a large portion of the memory available to a sys-
tem. Second, leaks have the potential to impact not only the appli-
cation that leaks memory, but also every other application running
on the system; because the overall amount of memory is limited,
as the memory usage of a leaking program increases, less memory
is available to other running applications. Consequently, the per-
formance and correctness of every running application can be im-
pacted by a program that leaks memory. Third, leaks are common,
even in mature applications. For example, in the first half of 2009,
over 100 leaks in the Firefox web-browser were reported [18].
Because of the serious consequences and common occurrence of
memory leaks, researchers have created many static and dynamic
techniques for detecting them (e.g., [1,2,4,7–14,16,17,20–23,25,
27,28]). The adoption of static techniques has been limited by sev-
eral factors, including the lack of scalable, precise heap modeling.
Dynamic techniques are therefore more widely used in practice. In
general, dynamic techniques provide one main piece of informa-
tion: the location in an execution where a leaked area of memory is
allocated. This location is supposed to serve as a starting point for
investigating the leak. However, in many situations, this informa-
tion does not provide any insight on where or how to fix the mem-
ory management error that causes the leak: the allocation location
and the location of the memory management error are typically in
completely different parts of the application’s code.
To address this limitation of existing approaches, we propose
a new memory leak detection technique. Our technique provides
the same information as existing techniques but also identifies the
locations in an execution where leaks occur. In the case of lost
memory, the location is defined as the point in an execution where
the last pointer to a still-allocated memory area is lost or overwritten.
In the case of forgotten memory, the location is defined as the last
point in an execution where a pointer to a leaked area of memory
was used (e.g., when it is dereferenced to read or write memory,
passed as a function argument, returned from a function, or used as
Camouflage: Automated Sanitization of Field Data
James Clause
College of Computing
Georgia Institute of Technology
clause@cc.gatech.edu
Alessandro Orso
College of Computing
Georgia Institute of Technology
orso@cc.gatech.edu
ABSTRACT
Privacy and security concerns have adversely affected the
usefulness of many types of techniques that leverage infor-
mation gathered from deployed applications. To address this
issue, we present a new approach for automatically sanitiz-
ing failure-inducing inputs. Given an input I that causes
a failure f, our technique can generate a sanitized input I′
that is different from I but still causes f. I′ can then be sent
to the developers to help them debug f, without revealing
the possibly sensitive information contained in I. We im-
plemented our approach in a prototype tool, camouflage,
and performed an empirical evaluation. In the evaluation,
we applied camouflage to a large set of failure-inducing
inputs for several real applications. The results of the eval-
uation are promising; they show that camouflage is both
practical and effective at generating sanitized inputs. In par-
ticular, for the inputs that we considered, I and I′ shared
no sensitive information.
1. INTRODUCTION
Investigating techniques that capture data from deployed
applications to support in-house software engineering tasks
is an increasingly active and successful area of research (e.g.,
[1,3–5,13,14,17,21,22,26,27,29]). However, privacy and se-
curity concerns have prevented widespread adoption of many
of these techniques and, because they rely on user partici-
pation, have ultimately limited their usefulness. Many of
the earlier proposed techniques attempt to sidestep these
concerns by collecting only limited amounts of information
(e.g., stack traces and register dumps [1, 3, 5] or sampled
branch profiles [26,27]) and providing a privacy policy that
specifies how the information will be used (e.g., [2,8]). Be-
cause the types of information collected by these techniques
are unlikely to be sensitive, users are more willing to trust
developers. Moreover, because only a small amount of infor-
mation is collected, it is feasible for users to manually inspect
and sanitize such information before it is sent to developers.
Unfortunately, recent research has shown that the effec-
tiveness of these techniques increases when they can lever-
age large amounts of detailed information (e.g., complete
execution recordings [4, 14] or path profiles [13, 24]). Since
more detailed information is bound to contain sensitive data,
users will most likely be unwilling to let developers collect
such information. In addition, collecting large amounts of
information would make it infeasible for users to sanitize
the collected information by hand. To address this prob-
lem, some of these techniques suggest using an input mini-
mization approach (e.g., [6, 7, 35]) to reduce the number of
failure-inducing inputs and, hopefully, eliminate some sensi-
tive information. Input-minimization techniques, however,
were not designed to specifically reduce sensitive inputs, so
they can only eliminate sensitive data by chance. In or-
der for techniques that leverage captured field information
to become widely adopted and achieve their full potential,
new approaches for addressing privacy and security concerns
must be developed.
In this paper, we present a novel technique that addresses
privacy and security concerns by sanitizing information cap-
tured from deployed applications. Our technique is designed
to be used in conjunction with an execution capture/replay
technique (e.g., [4, 14]). Given an execution recording that
contains a captured failure-inducing input I = ⟨i1, i2, . . . , in⟩
and terminates with a failure f, our technique replays the
execution recording and leverages a specialized version of
symbolic execution to automatically produce I′, a sanitized
version of I, such that I′ (1) still causes f and (2) reveals as
little information about I as possible. A modified execution
recording where I′ replaces I can then be constructed and
sent to the developers, who can use it to debug f.
It is, in general, impossible to construct I′ such that it
does not reveal any information about I while still caus-
ing the same failure f. Typically, the execution of f would
depend on the fact that some elements of I have specific
values (e.g., i1 must be 0 for the failing path to be taken).
However, this fact does not prevent the technique from be-
ing useful in practice. In our evaluation, we found that the
information revealed by the sanitized inputs was not sensi-
tive and tended to be structural in nature (e.g., a specific
portion of the input must be surrounded by double quotes).
Conversely, the parts of the inputs that were more likely to
be sensitive (e.g., values contained inside the double quotes)
were not revealed (see Section 4).
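The core idea can be illustrated with a small Python sketch. The constraint representation here is hypothetical and greatly simplified: the real technique derives the path condition via symbolic execution of the recorded failing run, whereas this sketch takes the constrained byte positions as given.

```python
# Sketch of constraint-preserving input sanitization.
# Hypothetical constraint format; the real technique derives the
# path condition by symbolically executing the recorded failing run.
import random

def sanitize(original, constraints):
    """Build I' from I: keep only the bytes the failing path
    constrains, and replace every other byte with a fresh one."""
    out = []
    for i, _ in enumerate(original):
        if i in constraints:
            out.append(constraints[i])  # structural byte, must be kept
        else:
            out.append(random.choice("abcdefghijklmnopqrstuvwxyz"))
    return "".join(out)

# Suppose the failing path only requires a double-quoted field:
original = '"secret"'
constraints = {0: '"', 7: '"'}          # positions fixed by the path
sanitized = sanitize(original, constraints)
# sanitized keeps the quote structure but replaces the inner bytes
```

The sanitized input preserves the structural parts the failure depends on (the surrounding quotes) while replacing the likely-sensitive content between them, mirroring the structural-vs-sensitive split observed in the evaluation.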
To evaluate the effectiveness of our technique, we imple-
mented it in a prototype tool, called camouflage, and car-
ried out an empirical evaluation of 170 failure-inducing in-
CC '05, ICSE '05, ICSE '07, ISSTA '07, ASE '07, ISSTA '09, ICSE '10, Tech. Report
RESEARCH OVERVIEW