Automating the Generation of Benchmark Suites

Creation, Assessment, and Management of Effective Test Corpora
Ben Hermann
@benhermann
Joint work with Lisa Nguyen Quang Do, Michael Eichberg, Karim Ali, and Eric Bodden
National Java Resource Workshop @ SPLASH, Vancouver
October 23rd, 2017

Evaluation of Code Analyses
• Compare results of an analysis against
• A ground truth, to show soundness
• A previous analysis, to show improvement (e.g., in precision)
• The new analysis and the previous analyses both analyze the evaluation corpus; the ground truth is based on it

Construction of a Corpus
• Size
• Content
• Representativeness
• Permanence
(Criteria from Tempero et al. 2010)
• Sources
• Purpose
How to determine this?
How to achieve this?

Sourcing Projects for the Corpus
• ABM collects projects from GitHub, BitBucket, …
• Criteria such as size, license, or programming language apply
• ABM builds the collected projects into compiled projects
• We currently support Maven and sbt, but are expanding (e.g., Gradle)
• Criteria addressed: Size, Content
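
The collect step can be pictured as a filter over project candidates. The following is a minimal sketch under stated assumptions, not ABM's actual implementation: the `Candidate` record, its fields, and `CandidateFilter.select` are hypothetical names introduced only for illustration. The sketch keeps candidates that match the requested language, license, and minimum size and that use a build tool the slide names as supported.

```scala
// Hypothetical model of a project candidate; all field names are assumptions.
final case class Candidate(
  name: String,
  language: String,
  license: String,
  sizeInKLoc: Int,
  buildTool: String
)

object CandidateFilter {
  // Build tools the slide names as currently supported.
  val supportedBuildTools: Set[String] = Set("maven", "sbt")

  // Keep candidates that satisfy every criterion and can be built.
  def select(
    candidates: Seq[Candidate],
    language: String,
    allowedLicenses: Set[String],
    minKLoc: Int
  ): Seq[Candidate] =
    candidates.filter { c =>
      c.language == language &&
      allowedLicenses.contains(c.license) &&
      c.sizeInKLoc >= minKLoc &&
      supportedBuildTools.contains(c.buildTool)
    }
}

object CandidateFilterDemo {
  // Made-up sample data, purely for demonstration.
  val candidates: Seq[Candidate] = Seq(
    Candidate("alpha", "Java", "MIT", 120, "maven"),
    Candidate("beta", "Java", "GPL-3.0", 300, "gradle"),
    Candidate("gamma", "Scala", "Apache-2.0", 40, "sbt"),
    Candidate("delta", "Java", "Apache-2.0", 5, "maven")
  )

  def main(args: Array[String]): Unit = {
    val selected =
      CandidateFilter.select(candidates, "Java", Set("MIT", "Apache-2.0"), 10)
    println(selected.map(_.name).mkString(", "))
  }
}
```

In this sample only `alpha` passes: `beta` fails the license check, `gamma` is not a Java project, and `delta` is below the size threshold.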

How can we achieve representativeness for a corpus?

Representativeness in Custom Collections

We used the three algorithms to construct respective call
graphs for a large set of libraries: the 100 most used distinct
Java related libraries from Maven Central Repository. The
set is representative for a wide range of libraries.

It contains very small (e.g., JUnit) to very large (e.g., Scala
Library) libraries; libraries developed primarily in an
industrial context (e.g., Guava) or in an open-source
setting (e.g., Apache Commons); libraries from very
different domains: testing (e.g., Hamcrest, Mockito),
databases (e.g., HSQLDB), bytecode engineering (e.g.,
cglib), runtime environments (e.g., Scala Runtime),
containers (e.g., Netty), and also general utility libraries
(e.g., osgi.core).

Additionally, it contains two libraries that have unusual
properties: jsr305 and easymockclassextension both do not
contain a single instance method call. The jsr305 project is
just a collection of annotations and easymockclassextension
only contains interface definitions and a few classes with
static methods.

Lastly, the set also contains libraries that are written in other
languages, such as Scala (e.g., ScalaTest), whose compilers
only use a subset of the JVM’s concepts. The Scala
compiler, e.g., does not use package and protected visibility.
This significantly limits our possibilities to identify the
library-private implementation (recall that LibCHACPA
identifies a library’s private implementation based on the
evaluation of the code elements’ visibilities). For each
library, we also downloaded all of its dependencies to build
complete class hierarchies for them.

Description of the Darmstadt Library Corpus (DLC) from:
Michael Reif, Michael Eichberg, Ben Hermann, Johannes Lerch, and Mira Mezini. 2016. Call graph construction for Java libraries. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2016).

Representativeness in ABM
• ABM builds the collected projects into compiled projects
• Hermes inspects the compiled projects and selects from them
• Criterion addressed: Representativeness

How Hermes Works
• Corpus candidates go into Hermes, which produces an optimal corpus
• Feature queries, followed by manual or automatic selection
• Built on OPAL (introduced at SOAP 2014)
• Hermes was introduced at SOAP 2017

Feature Queries
trait FeatureQuery {
  // …
  def apply[S](
    projectConfiguration: ProjectConfiguration,
    project: Project[S],
    rawClassFiles: Traversable[(da.ClassFile, S)]
  ): TraversableOnce[Feature[S]]
  // …
}

Parameters and result of apply:
• projectConfiguration: identifier, project JAR files, library JAR files, statistics
• project: complete reified project information (classes, fields, methods, bodies, etc.)
• rawClassFiles: raw class file information (e.g., for extracting information from the constant pool)
• Result: list of detected features in the codebase (id, frequency of occurrence, (opt.) locations)
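
To make the shape of a feature query concrete, here is a minimal self-contained sketch. It deliberately does not use OPAL: `SimpleMethod`, `SimpleClass`, `SimpleFeature`, and `SimpleFeatureQuery` are hypothetical, simplified stand-ins invented for this example, and methods are modeled as plain lists of JVM opcode mnemonics. The query reports an id, a frequency, and locations, mirroring the result described above, for methods whose body contains no return instruction.

```scala
// Hypothetical, simplified stand-ins for illustration; these are NOT OPAL's types.
final case class SimpleMethod(name: String, opcodes: Seq[String])
final case class SimpleClass(fqn: String, methods: Seq[SimpleMethod])
final case class SimpleFeature(id: String, count: Int, locations: Seq[String])

trait SimpleFeatureQuery {
  def apply(classes: Seq[SimpleClass]): Seq[SimpleFeature]
}

// Finds methods that have a body but no return instruction
// (for example, methods that always throw).
object MethodsWithoutReturns extends SimpleFeatureQuery {
  // The six JVM return instructions.
  private val returnOpcodes: Set[String] =
    Set("ireturn", "lreturn", "freturn", "dreturn", "areturn", "return")

  def apply(classes: Seq[SimpleClass]): Seq[SimpleFeature] = {
    val locations = for {
      c <- classes
      m <- c.methods
      // Skip methods without a body (abstract or native).
      if m.opcodes.nonEmpty && !m.opcodes.exists(returnOpcodes.contains)
    } yield s"${c.fqn}.${m.name}"
    Seq(SimpleFeature("Method without returns", locations.size, locations))
  }
}

object FeatureQueryDemo {
  // Made-up sample class with one returning, one always-throwing,
  // and one body-less method.
  val sample: Seq[SimpleClass] = Seq(
    SimpleClass(
      "demo.A",
      Seq(
        SimpleMethod("ok", Seq("iconst_0", "ireturn")),
        SimpleMethod("fail", Seq("new", "dup", "invokespecial", "athrow")),
        SimpleMethod("abstractM", Seq.empty)
      )
    )
  )

  def main(args: Array[String]): Unit =
    MethodsWithoutReturns(sample).foreach(println)
}
```

A real query would receive the reified project and raw class files shown in the trait above instead of these toy records; the sketch only illustrates the id, frequency, and locations shape of the result.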

Already Implemented Queries
• Existence of Bytecode Instructions
• Class File Versions
• Class Types
• Trivial Reflection
• Fan-In/Fan-Out
• Field Access
• Method w/o Returns
• Method Types
• Various Metrics
• Recursive Data Structures
• Size of Inheritance Tree
• API Usage

Feature Queries for API Usage
• Bytecode Instrumentation
• Class Loader
• GUI
• Crypto
• JDBC
• Reflection
• System
• Thread
• Unsafe

Constructing a Minimal Corpus
• Dead-Path Analysis [FSE15]
• Original evaluation conducted on the complete Qualitas Corpus
• Minimal corpus consists of only 5 of the 100 projects in the Qualitas Corpus
• Evaluation time cut from 16.77 minutes to 2.82 minutes (~6x faster), while coverage is only 1.06% below that of the original corpus
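
One way to picture automatic selection of a small corpus is as a set-cover problem: choose few projects that together still exhibit all observed features. The sketch below uses a greedy set-cover approximation. This is an assumption made for illustration, since the slides do not say which selection algorithm Hermes uses, and `GreedyCorpus`, the project names, and the feature ids are hypothetical.

```scala
object GreedyCorpus {
  // Greedy set cover: repeatedly pick the project that covers the most
  // still-uncovered features, until every feature is covered.
  def select(featuresByProject: Map[String, Set[String]]): List[String] = {
    @annotation.tailrec
    def loop(uncovered: Set[String], chosen: List[String]): List[String] =
      if (uncovered.isEmpty) chosen.reverse
      else {
        val (best, gain) = featuresByProject
          .map { case (p, fs) => (p, (fs & uncovered).size) }
          .toSeq
          .sortBy { case (p, g) => (-g, p) } // ties broken by name for determinism
          .head
        if (gain == 0) chosen.reverse
        else loop(uncovered -- featuresByProject(best), best :: chosen)
      }
    loop(featuresByProject.values.flatten.toSet, Nil)
  }
}

object GreedyCorpusDemo {
  // Made-up feature sets, purely for demonstration.
  val features: Map[String, Set[String]] = Map(
    "a" -> Set("f1", "f2", "f3"),
    "b" -> Set("f3", "f4"),
    "c" -> Set("f1", "f2"),
    "d" -> Set("f4")
  )

  def main(args: Array[String]): Unit =
    println(GreedyCorpus.select(features).mkString(", "))
}
```

Greedy selection is not guaranteed to find the smallest possible corpus; an exact formulation (for example, as an optimization problem) could do better at a higher computational cost.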

Collection Permanence
• We store and retain collection definitions
• Download the corpus of collected projects and provide it on your infrastructure
• Publish the complete corpus and use a DOI for papers
• We would love to see more services like this

Bringing it all together
• ABM collects projects from GitHub, BitBucket, … and builds them
• Hermes inspects the built projects
• Publish the complete corpus and use a DOI for papers

Joint work with Michael Reif, Michael Eichberg, and Mira Mezini
Thank you!
