Automating the Generation of Benchmark Suites
SOFTWARE TECHNIK
Creation, Assessment, and Management of Effective Test Corpora
Ben Hermann
@benhermann
Joint work with Lisa Nguyen Quang Do, Michael Eichberg, Karim Ali, and Eric Bodden
National Java Resource Workshop @ SPLASH, Vancouver
October 23rd, 2017
@benhermann · ABM @ NJR 2017
Evaluation of Code Analyses
• Compare results of an analysis against
  • A ground truth, to show soundness
  • A previous analysis, to show improvement (e.g., in precision)
[Diagram: the new analysis and the previous analyses each analyze the evaluation corpus; the ground truth is based on the evaluation corpus.]
Construction of a Corpus
• Size
• Content
• Representativeness
• Permanence
(Criteria from Tempero et al. 2010)
• Sources
• Purpose
How to determine this?
How to achieve this?
Sourcing Projects for the Corpus
• ABM collects projects from GitHub, BitBucket, …
• Criteria such as size, license, or programming language apply
• ABM builds the collected projects into compiled projects
• We currently support Maven and sbt, but are expanding (e.g., Gradle)
• Criteria addressed: Size, Content
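The selection criteria mentioned for collecting projects (size, license, programming language) can be pictured as a simple filter over candidate projects. The following is a minimal sketch only: `Candidate`, `Criteria`, and `CandidateFilter` are hypothetical names invented for illustration and do not reflect ABM's actual data model or API.

```scala
// Hypothetical model of a corpus candidate; NOT ABM's actual data model.
final case class Candidate(name: String, language: String, license: String, sizeInKLoc: Int)

// Hypothetical selection criteria mirroring the ones named on the slide.
final case class Criteria(languages: Set[String], licenses: Set[String], minKLoc: Int, maxKLoc: Int) {
  // A candidate is accepted only if it satisfies every criterion.
  def accepts(c: Candidate): Boolean =
    languages.contains(c.language) &&
      licenses.contains(c.license) &&
      c.sizeInKLoc >= minKLoc && c.sizeInKLoc <= maxKLoc
}

object CandidateFilter {
  // Keeps the candidates that satisfy the criteria, preserving input order.
  def select(candidates: Seq[Candidate], criteria: Criteria): Seq[Candidate] =
    candidates.filter(criteria.accepts)

  def main(args: Array[String]): Unit = {
    val candidates = Seq(
      Candidate("project-a", "Java", "Apache-2.0", 120),
      Candidate("project-b", "Scala", "MIT", 15),
      Candidate("project-c", "Java", "GPL-3.0", 300),
      Candidate("project-d", "Python", "MIT", 40)
    )
    val criteria = Criteria(Set("Java", "Scala"), Set("Apache-2.0", "MIT"), minKLoc = 10, maxKLoc = 200)
    // Prints the names of the accepted candidates, one per line.
    select(candidates, criteria).foreach(c => println(c.name))
  }
}
```

In a real pipeline the accepted candidates would then be handed to the build step; that step is not modeled here.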
Representativeness in Custom Collections
Description of the Darmstadt Library Corpus (DLC) from:
Michael Reif, Michael Eichberg, Ben Hermann, Johannes Lerch, and Mira Mezini. 2016. Call graph construction for Java libraries. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2016).

We used the three algorithms to construct respective call graphs for a large set of libraries: the 100 most used distinct Java-related libraries from Maven Central Repository. The set is representative for a wide range of libraries.

It contains very small (e.g., JUnit) to very large (e.g., Scala Library) libraries; libraries developed primarily in an industrial context (e.g., Guava) or in an open-source setting (e.g., Apache Commons); libraries from very different domains: testing (e.g., Hamcrest, Mockito), databases (e.g., HSQLDB), bytecode engineering (e.g., cglib), runtime environments (e.g., Scala Runtime), containers (e.g., Netty), and also general utility libraries (e.g., osgi.core).

Additionally, it contains two libraries that have unusual properties: jsr305 and easymockclassextension both do not contain a single instance method call. The jsr305 project is just a collection of annotations, and easymockclassextension only contains interface definitions and a few classes with static methods.

Lastly, the set also contains libraries that are written in other languages, such as Scala (e.g., ScalaTest), whose compilers only use a subset of the JVM’s concepts. The Scala compiler, e.g., does not use package and protected visibility. This significantly limits our possibilities to identify the library-private implementation (recall that LibCHACPA identifies a library’s private implementation based on the evaluation of the code elements’ visibilities). For each library, we also downloaded all of its dependencies to build complete class hierarchies for them.
How Hermes Works
Corpus candidates → Hermes → Optimal corpus
• Hermes runs feature queries on the corpus candidates
• The optimal corpus is chosen by manual or automatic selection
• Hermes builds on OPAL
• OPAL was introduced at SOAP 2014; Hermes was introduced at SOAP 2017
Feature Queries
trait FeatureQuery {
  // …
  def apply[S](
    // Identifier, project JAR files, library JAR files, statistics
    projectConfiguration: ProjectConfiguration,
    // Complete reified project information (classes, fields, methods, bodies, etc.)
    project: Project[S],
    // Raw class file information (e.g., for extracting information from the constant pool)
    rawClassFiles: Traversable[(da.ClassFile, S)]
  ): TraversableOnce[Feature[S]] // List of detected features in the codebase (id, frequency of occurrence, (opt.) locations)
  // …
}
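To make the shape of a feature query more concrete, here is a self-contained sketch of a query that counts methods without a return instruction. It deliberately does not use OPAL or Hermes: `Method`, `ClassFile`, `Project`, `Feature`, `SimpleFeatureQuery`, and `MethodsWithoutReturns` are simplified stand-ins invented for this example, and instructions are modeled as plain mnemonic strings. The real `FeatureQuery` trait takes additional parameters and works on OPAL's reified project representation.

```scala
// Simplified stand-ins for illustration only; these are NOT the OPAL/Hermes types.
final case class Method(name: String, instructions: Seq[String])
final case class ClassFile(fqn: String, methods: Seq[Method])
final case class Project(classFiles: Seq[ClassFile])

// A detected feature: id, frequency of occurrence, and (optional) locations.
final case class Feature(id: String, count: Int, locations: Seq[String])

// Hypothetical, reduced query interface: one project in, detected features out.
trait SimpleFeatureQuery {
  def apply(project: Project): Seq[Feature]
}

// Hypothetical query: reports methods that contain no return instruction.
object MethodsWithoutReturns extends SimpleFeatureQuery {
  // JVM return mnemonics (ireturn, lreturn, freturn, dreturn, areturn, return)
  // all end in "return".
  private def isReturn(instruction: String): Boolean =
    instruction.toLowerCase.endsWith("return")

  def apply(project: Project): Seq[Feature] = {
    val locations = for {
      cf <- project.classFiles
      m <- cf.methods
      if !m.instructions.exists(isReturn)
    } yield s"${cf.fqn}.${m.name}"
    Seq(Feature("Method w/o Returns", locations.size, locations))
  }
}

object FeatureQueryDemo {
  def main(args: Array[String]): Unit = {
    val project = Project(Seq(
      ClassFile("demo.A", Seq(
        Method("loopForever", Seq("goto")),
        Method("answer", Seq("bipush", "ireturn"))
      ))
    ))
    // Reports one occurrence, located at demo.A.loopForever.
    MethodsWithoutReturns(project).foreach(println)
  }
}
```

Returning the locations alongside the count mirrors the "(opt.) locations" part of the feature description: a count is enough to rank projects, while locations help when inspecting why a project exhibits a feature.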
Already Implemented Queries
• Existence of Bytecode Instructions
• Class File Versions
• Class Types
• Trivial Reflection
• Fan-In/Fan-Out
• Field Access
• Method w/o Returns
• Method Types
• Various Metrics
• Recursive Data Structures
• Size of Inheritance Tree
• API Usage
Feature Queries for API Usage
• Bytecode Instrumentation
• Class Loader
• GUI
• Crypto
• JDBC
• Reflection
• System
• Thread
• Unsafe
Constructing a Minimal Corpus
• Dead-Path Analysis [FSE15]
  • Original evaluation conducted on the complete Qualitas Corpus
  • Minimal corpus consists of only 5 of the 100 projects in the Qualitas Corpus
  • Evaluation time cut from 16.77 minutes to 2.82 minutes (~6x faster), while coverage is only 1.06% below that of the original corpus
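The slides do not state how the minimal corpus is computed. One way to picture the idea is as a set-cover problem: choose few projects whose detected features together cover all features seen in the full corpus. The sketch below uses a greedy heuristic as a stand-in; it is an assumption made for illustration, not Hermes' actual selection algorithm, and `MinimalCorpus` and the feature names are hypothetical.

```scala
object MinimalCorpus {
  // Greedy set-cover heuristic: repeatedly pick the project that adds the most
  // not-yet-covered features. Returns the chosen project names in pick order.
  def select(featuresByProject: Map[String, Set[String]]): List[String] = {
    @annotation.tailrec
    def loop(
        uncovered: Set[String],
        remaining: Map[String, Set[String]],
        picked: List[String]
    ): List[String] =
      if (uncovered.isEmpty || remaining.isEmpty) picked.reverse
      else {
        // Sort by name first so that the scan order is deterministic.
        val (best, feats) =
          remaining.toSeq.sortBy(_._1).maxBy { case (_, fs) => (fs & uncovered).size }
        // Stop if no remaining project contributes anything new.
        if ((feats & uncovered).isEmpty) picked.reverse
        else loop(uncovered -- feats, remaining - best, best :: picked)
      }

    loop(featuresByProject.values.flatten.toSet, featuresByProject, Nil)
  }

  def main(args: Array[String]): Unit = {
    val corpus = Map(
      "a" -> Set("f1", "f2", "f3"),
      "b" -> Set("f3", "f4"),
      "c" -> Set("f1"),
      "d" -> Set("f2")
    )
    // "a" covers f1..f3, then "b" adds f4; "c" and "d" add nothing new.
    println(select(corpus)) // prints List(a, b)
  }
}
```

A greedy pick is not guaranteed to yield the smallest possible corpus, but it is simple and usually close; an exact solution would require an optimization formulation.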
Collection Permanence
• ABM: we store and retain collection definitions
• Collected projects: download the corpus and provide it on your own infrastructure
• Publish the complete corpus and use its DOI for papers
• We would love to see more services like this
• Criterion addressed: Permanence
Bringing it all together
• ABM collects projects from GitHub, BitBucket, …
• ABM builds the collected projects
• Hermes inspects the built projects
• Publish the complete corpus and use its DOI for papers
Automating the Generation of Benchmark Suites
Creation, Assessment, and Management of Effective Test Corpora
Ben Hermann
@benhermann
Joint work with Michael Reif, Michael Eichberg, and Mira Mezini
Thank you!