An Implementation of Preregistration
We think that the incentive structure for fuzzing research is broken, so we would like to introduce preregistration to fix this.

Preregistration

Stage 1
• Establish significance.
• Motivate the problem.
• Establish novelty.
• Discuss hypothesis for solution.
• Discuss related work.
• Establish soundness.
• Experimental design.
• Research questions & claims.
• Benchmarks & baselines.

Outcomes of Stage 1:
• In-principle Accepted! Go to Stage 2.
• Major / Minor Revision. Back to Stage 1.
• Rejected.
Stage 2
• Establish conformity.
• Execute the agreed experimental protocol.
• Explain small deviations from the protocol.
• Investigate unexpected results.
• Establish reproducibility.
• Submit evidence towards the key claims in the paper.
Outcomes of Stage 2:
• Accept.
• Major / Minor Revision: explain deviations / unexpected results; improve artifact / reproducibility.
• Reject: severe deviations from the experimental protocol.
Why Preregistration
• To get your fuzzing paper published, you need strong positive results.
• We believe this unhealthy focus is a substantial inhibitor of scientific progress.
• Duplicated Efforts: Important investigations are never published.
  • A hypothesis or approach may be perfectly reasonable and scientifically appealing; but if the hypothesis proves to be invalid or the approach ineffective, other groups will never know.
• Overclaims: Incentive to overclaim the benefits of an approach.
  • Makes it difficult to reproduce the results and misinforms future investigations by the community.
  • Authors are uncomfortable sharing their research prototypes. In 2020, only 35 of 60 fuzzing papers we surveyed published code with the paper.
Why Preregistration
• Sound fuzzer evaluation imposes a high barrier to entry for newcomers:
  1. A well-designed experimental methodology.
  2. Substantial computational resources.
     • Huge variance due to randomness.
     • Repeat 20x, 24 hrs, X fuzzers, Y programs.
     • Statistical significance, effect size.
     • CPU centuries.
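The bullet “statistical significance, effect size” compresses a whole procedure into four words. Below is a minimal sketch of the kind of comparison commonly recommended for fuzzer evaluations, assuming 10-20 repeated campaigns per fuzzer on one benchmark program; the coverage numbers are hypothetical placeholders, not measurements.

# Minimal sketch: statistically comparing two fuzzers on one program.
# Hypothetical final branch coverage of 10 repeated campaigns each.
from scipy.stats import mannwhitneyu

fuzzer_a = [1510, 1492, 1538, 1475, 1520, 1501, 1489, 1533, 1497, 1512]
fuzzer_b = [1453, 1478, 1441, 1466, 1480, 1449, 1472, 1458, 1463, 1470]

# Mann-Whitney U: non-parametric significance test, appropriate given
# the huge run-to-run variance of randomized fuzzing campaigns.
_, p_value = mannwhitneyu(fuzzer_a, fuzzer_b, alternative="two-sided")

# Vargha-Delaney A12 effect size: probability that a random campaign
# of A beats a random campaign of B (0.5 means no difference).
def a12(xs, ys):
    greater = sum(1 for x in xs for y in ys if x > y)
    equal = sum(1 for x in xs for y in ys if x == y)
    return (greater + 0.5 * equal) / (len(xs) * len(ys))

print(f"p-value = {p_value:.4f}, A12 = {a12(fuzzer_a, fuzzer_b):.2f}")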
On the Reliability of Coverage-Based Fuzzer Benchmarking
Marcel Böhme (MPI-SP, Germany; Monash University, Australia), László Szekeres (Google, USA), Jonathan Metzman (Google, USA)
ICSE '22, May 21–29, 2022, Pittsburgh, PA, USA. https://doi.org/10.1145/3510003.3510230

ABSTRACT
Given a program where none of our fuzzers finds any bugs, how do we know which fuzzer is better? In practice, we often look to code coverage as a proxy measure of fuzzer effectiveness and consider the fuzzer which achieves more coverage as the better one.
Indeed, evaluating 10 fuzzers for 23 hours on 24 programs, we find that a fuzzer that covers more code also finds more bugs. There is a very strong correlation between the coverage achieved and the number of bugs found by a fuzzer. Hence, it might seem reasonable to compare fuzzers in terms of coverage achieved, and from that derive empirical claims about a fuzzer's superiority at finding bugs.
Curiously enough, however, we find no strong agreement on which fuzzer is superior if we compared multiple fuzzers in terms of coverage achieved instead of the number of bugs found. The fuzzer best at achieving coverage may not be best at finding bugs.

1 INTRODUCTION
In the recent decade, fuzzing has found widespread interest. In industry, we have large continuous fuzzing platforms employing 100k+ machines for automatic bug finding [23, 24, 46]. In academia, in 2020 alone, almost 50 fuzzing papers were published in the top conferences for Security and Software Engineering [62].
Imagine we have several fuzzers available to test our program. Hopefully, none of them finds any bugs. If indeed they don't, we might have some confidence in the correctness of the program. Then again, even a perfectly non-functional fuzzer would find no bugs in our program. So, how do we know which fuzzer has the highest "potential" of finding bugs? A widely used proxy measure of fuzzer effectiveness is the code coverage that is achieved. After all, a fuzzer cannot find bugs in code that it does not cover.
Indeed, in our experiments we identify a very strong positive correlation between the coverage achieved and the number of bugs found by a fuzzer. Correlation assesses the strength of the association between two random variables or measures. We conduct our empirical investigation on 10 fuzzers × 24 C programs × 20 fuzzing campaigns of 23 hours (≈ 13 CPU years). We use three measures of coverage and two measures of bug finding, and our results suggest: As the fuzzer covers more code, it also discovers more bugs.

[Figure 1: Scatterplot of the ranks of 10 fuzzers applied to 24 programs for (a) 1 hour fuzzing campaigns (τ = 0.38) and (b) 1 day fuzzing campaigns (τ = 0.49), ranking each fuzzer by the avg. number of branches covered (x-axis) and by the avg. number of bugs found (y-axis).]

Hence, it might seem reasonable to conjecture that the fuzzer which is better in terms of code coverage is also better in terms of bug finding—but is this really true? In Figure 1, we show the ranking of these fuzzers across all programs in terms of the average coverage achieved and the average number of bugs found in each benchmark. The ranks are visibly different. To be sure, we also conducted a pair-wise comparison between any two fuzzers where the difference in coverage and the difference in bug finding are statistically significant. The results are similar.
We identify no strong agreement on the superiority or ranking of a fuzzer when compared in terms of mean coverage versus mean bug finding. Inter-rater agreement assesses the degree to which two raters, here both types of benchmarking, agree on the superiority or ranking of a fuzzer when evaluated on multiple programs. Indeed, two measures of the same construct are likely to exhibit a high degree of correlation but can at the same time disagree substantially [41, 55]. We evaluate the agreement on fuzzer superiority when comparing any two fuzzers where the differences in terms of coverage and bug finding are statistically significant. We evaluate the agreement on fuzzer ranking when comparing all the fuzzers.
Concretely, our results suggest a moderate agreement. For fuzzer pairs where the differences in terms of coverage and bug finding are statistically significant, the results disagree for 10% to 15% of programs. Only when measuring the agreement between branch coverage and the number of bugs found, and when we require the differences to be statistically significant at p ≤ 0.0001 for coverage and bug finding, do we find a strong agreement. However, statistical significance at p ≤ 0.0001 only in terms of coverage is not sufficient; we again find only weak agreement. The increase in agreement with statistical significance is not observed when we measure bug finding using the time-to-error. We also find that the variance of the agreement reduces as more programs are used, and that results of 1h campaigns do not strongly agree with results of 23h campaigns.
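Two quantities in this excerpt can be checked or illustrated directly. The compute budget: 10 fuzzers × 24 programs × 20 campaigns × 23 hours = 110,400 CPU hours ≈ 12.6 CPU years, consistent with the "≈ 13 CPU years" above. And the rank agreement of Figure 1 is the kind of statistic one computes with Kendall's τ; the sketch below shows the per-program step on invented per-fuzzer averages, not the paper's data.

# Minimal sketch of the rank-agreement idea behind Figure 1.
# Hypothetical per-fuzzer averages on ONE benchmark program; the paper
# aggregates this kind of comparison over 24 programs.
from scipy.stats import kendalltau

avg_branches = [5210, 4980, 5105, 4875, 5300, 4730, 5010, 4890, 5150, 4820]
avg_bugs     = [   7,    5,    8,    4,    6,    3,    7,    5,    9,    2]

# kendalltau ranks both lists internally; tau near 1.0 means the
# coverage-based and bug-based rankings of the 10 fuzzers agree.
tau, p_value = kendalltau(avg_branches, avg_bugs)
print(f"Kendall's tau = {tau:.2f} (p = {p_value:.3f})")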
Many pitfalls of experimental design! Newcomers find out only when receiving the reviews, after conducting costly experiments following a flawed methodology. Symptomatic plus-one comments.
Why Preregistration
• Address both issues by switching to a 2-stage publication process that separates the review of (i) the methodology & ideas and (ii) the evidence.
• If a Registered Report is in-principle accepted and the proposed experimental design is followed without unexplained deviations, the results will be accepted as they are.
  • Minimizes the incentive to overclaim (while not reducing the quality of evaluation).
  • Allows publication of interesting ideas and investigations irrespective of results.
• Early feedback for newcomers:
  • On the significance and novelty of the problem/approach/hypothesis.
  • On the soundness and reproducibility of the experimental methodology.
  • To further lower the barrier, Google pledges help with fuzzer evaluation via FuzzBench (see the integration sketch after this list).
• We hope our initiative will turn the focus of the peer-reviewing process back to the innovation and key claims in a paper, while leaving the burden of evidence until after the in-principle acceptance.
  • Reviewers go from gate-keeping to productive feedback. Authors and reviewers work together to ensure the best study design possible.
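To make the FuzzBench pledge concrete: integrating a fuzzer essentially means contributing a Dockerfile plus a small fuzzer.py exposing build() and fuzz() entry points. The following is a rough, simplified AFL-style sketch; the utils.build_benchmark() helper, environment variables, and paths are assumptions based on FuzzBench's public documentation (https://github.com/google/fuzzbench), not a verified integration.

# fuzzers/myfuzzer/fuzzer.py -- rough sketch of a FuzzBench integration.
# build() and fuzz() are FuzzBench's documented entry points; the helper
# and paths below are assumed, so consult the FuzzBench repository for
# the authoritative interface.
import os
import subprocess

from fuzzers import utils  # in-tree FuzzBench helper module (assumed)


def build():
    """Build the benchmark with this fuzzer's compiler instrumentation."""
    os.environ['CC'] = '/afl/afl-clang-fast'      # hypothetical path
    os.environ['CXX'] = '/afl/afl-clang-fast++'   # hypothetical path
    os.environ['FUZZER_LIB'] = '/libAFLDriver.a'  # links the fuzz target
    utils.build_benchmark()


def fuzz(input_corpus, output_corpus, target_binary):
    """Run one fuzzing campaign until FuzzBench ends the trial."""
    subprocess.check_call([
        '/afl/afl-fuzz',
        '-i', input_corpus,   # seed corpus provided by FuzzBench
        '-o', output_corpus,  # findings used for coverage measurement
        '--', target_binary,
    ])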
Why Preregistration
Your thoughts or experience?
Why Preregistration
• What do you see as the main strengths of the model?
  • More reproducibility.
  • Fewer overclaims; mitigates publication bias; less unhealthy focus on positive results.
  • Publications are more sound. The publication process is more fair.
  • Allows interesting negative results, no forced positive results, less duplicated effort.
  • Ideas and methodology above positive results.
  “The main draws for me are the removal of the unhealthy focus on positive results (bad for students, bad for reproducibility, bad for impact), as well as the fact that the field is pushed forward by negative results regarding newly attempted studies that have already been performed by others. Lastly, it removes the questionable aspect of changing the approach until something working appears, with no regard for a validation step. In ML lingo, we only have a test set, no validation set, and we are implicitly overfitting to it with our early stopping.”
Why Preregistration
• What do you see as the main weaknesses of the model?
  • Time to publish is too long. Increased author / reviewing load.
    “At first, maybe a longer publication process because of the preregistration; but overall it could be even faster, once one also includes the time for rejection and rework, etc.”
  • Sound experimental designs may be hard to create and vet / review.
    • For the first time, preregistration enables conversations about the soundness of experimental design. It naturally creates and communicates community standards.
    • Previously, experimental design was either accepted as-is or criticized at a high cost to authors.
  • Is the model flexible enough to accommodate changes in experimental design?
    • Yes. Deviations from the agreed protocol are allowed but must be explained.
  • Ideas that look bad theoretically may work well in practice.
    • Without performing the experiment, we can't say whether an approach would be useful or not.
    • The model is not meant to substitute for the traditional publication model, but to augment it.
    • This model might not work very well for exploratory research (hypothesis generation); it might work better for confirmatory research (hypothesis testing).
Why Preregistration
• In your opinion, how could this publication model be improved?
  • Stage 2 publication in a conference, instead of a journal.
    • We see the conference as a forum for discussion (which happens in this workshop).
    • Maybe Stage 1 in a conference, Stage 2 in a journal (+ conference presentation)?
  • Fast-track through Stage 1 and Stage 2 when results already exist.
    • Sounds like a more traditional publication, not preregistration :)
  • A flexible author list, within reason, to incentivize post-announcement collaboration.
    • Preregistration (where Stage 1 is published) would also allow early deconflicting, or lead to increased collaboration between people with similar ideas and goals.
An Implementation of Preregistration

  • 1. We think that the incentive structure for fuzzing research is broken; so we would like to introduce preregistration to fi x this. Preregistration Stage 1 Stage 2
  • 2. We think that the incentive structure for fuzzing research is broken; so we would like to introduce preregistration to fi x this. Preregistration Stage 1 Stage 2 Stage 1
  • 3. We think that the incentive structure for fuzzing research is broken; so we would like to introduce preregistration to fi x this. Preregistration Stage 1 Stage 2 • Establish signi fi cance. • Motivate the problem. • Establish novelty. • Discuss hypothesis for solution. • Discuss related work. • Establish soundness. • Experimental design. • Research questions & claims. • Benchmarks & baselines. In-principle Accepted! Go to Stage 2. Outcomes of Stage 1:
  • 4. We think that the incentive structure for fuzzing research is broken; so we would like to introduce preregistration to fi x this. Preregistration Stage 1 Stage 2 • Establish signi fi cance. • Motivate the problem. • Establish novelty. • Discuss hypothesis for solution. • Discuss related work. • Establish soundness. • Experimental design. • Research questions & claims. • Benchmarks & baselines. In-principle Accepted! Go to Stage 2. Major / Minor Revision. Back to Stage 1. Outcomes of Stage 1:
  • 5. We think that the incentive structure for fuzzing research is broken; so we would like to introduce preregistration to fi x this. Preregistration Stage 1 Stage 2 • Establish signi fi cance. • Motivate the problem. • Establish novelty. • Discuss hypothesis for solution. • Discuss related work. • Establish soundness. • Experimental design. • Research questions & claims. • Benchmarks & baselines. In-principle Accepted! Go to Stage 2. Major / Minor Revision. Back to Stage 1. Rejected. Outcomes of Stage 1:
  • 6. We think that the incentive structure for fuzzing research is broken; so we would like to introduce preregistration to fi x this. Preregistration Stage 1 Stage 2 • Establish signi fi cance. • Motivate the problem. • Establish novelty. • Discuss hypothesis for solution. • Discuss related work. • Establish soundness. • Experimental design. • Research questions & claims. • Benchmarks & baselines. • Establish conformity. • Execute agreed exp. protocol. • Explain small deviations fr. protocol. • Investigate unexpected results. • Establish reproducibility. • Submit evidence towards the key claims in the paper.
  • 7. We think that the incentive structure for fuzzing research is broken; so we would like to introduce preregistration to fi x this. Preregistration Stage 2 • Establish conformity. • Execute agreed exp. protocol. • Explain small deviations fr. protocol. • Investigate unexpected results. • Establish reproducibility. • Submit evidence towards the key claims in the paper. Outcomes of Stage 2: Accept Major / Minor Revision Explain deviations / unexpected results. Improve artifact / reproducibility. Reject Severe deviations from experimental protocol.
  • 8. Why Preregistration • To get you fuzzing paper published, you need strong positive results. • We believe, this unhealthy focus is a substantial inhibitor of scienti fi c progress. • Duplicated E ff orts: Important investigations are never published.
  • 9. Why Preregistration • To get you fuzzing paper published, you need strong positive results. • We believe, this unhealthy focus is a substantial inhibitor of scienti fi c progress. • Duplicated E ff orts: Important investigations are never published. • Hypothesis / approach perfectly reasonable and scienti fi c appealing, If hypothesis proves to be invalid or approach ine ff ective, other groups will never now.
  • 10. Why Preregistration • To get you fuzzing paper published, you need strong positive results. • We believe, this unhealthy focus is a substantial inhibitor of scienti fi c progress. • Duplicated E ff orts: Important investigations are never published. • Overclaims: Incentive to overclaim the bene fi ts of an approach.
  • 11. Why Preregistration • To get you fuzzing paper published, you need strong positive results. • We believe, this unhealthy focus is a substantial inhibitor of scienti fi c progress. • Duplicated E ff orts: Important investigations are never published. • Overclaims: Incentive to overclaim the bene fi ts of an approach. • Di ffi cult to reproduce the results and misinforms future investigations by the community. • Authors are uncomfortable sharing their research prototypes. In 2020 only 35 of 60 fuzzing papers we surveyed published code w/ paper.
  • 12. Why Preregistration • Sound fuzzer evaluation imposes high barrier to entry for newcomers.
  • 13. Why Preregistration • Sound fuzzer evaluation imposes high barrier to entry for newcomers. 1. Well-designed experiment methodology. 2. Substantial computation resources. • Huge variance due to randomness • Repeat 20x, 24hrs, X fuzzers, Y programs • Statistical Signi fi cance, e ff ect size • CPU centuries. On the Reliability of Coverage-Based Fuzzer Benchmarking Marcel Böhme MPI-SP, Germany Monash University, Australia László Szekeres Google, USA Jonathan Metzman Google, USA ABSTRACT Given a program where none of our fuzzers �nds any bugs, how do we know which fuzzer is better? In practice, we often look to code coverage as a proxy measure of fuzzer e�ectiveness and consider the fuzzer which achieves more coverage as the better one. Indeed, evaluating 10 fuzzers for 23 hours on 24 programs, we �nd that a fuzzer that covers more code also �nds more bugs. There is a very strong correlation between the coverage achieved and the number of bugs found by a fuzzer. Hence, it might seem reasonable to compare fuzzers in terms of coverage achieved, and from that derive empirical claims about a fuzzer’s superiority at �nding bugs. Curiously enough, however, we �nd no strong agreement on which fuzzer is superior if we compared multiple fuzzers in terms of coverage achieved instead of the number of bugs found. The fuzzer best at achieving coverage, may not be best at �nding bugs. ACM Reference Format: Marcel Böhme, László Szekeres, and Jonathan Metzman. 2022. On the Relia- bility of Coverage-Based Fuzzer Benchmarking. In 44th International Confer- ence on Software Engineering (ICSE ’22), May 21–29, 2022, Pittsburgh, PA, USA. ACM, New York, NY, USA, 13 pages. https://doi.org/10.1145/3510003.3510230 1 INTRODUCTION In the recent decade, fuzzing has found widespread interest. In industry, we have large continuous fuzzing platforms employing 100k+ machines for automatic bug �nding [23, 24, 46]. In academia, in 2020 alone, almost 50 fuzzing papers were published in the top conferences for Security and Software Engineering [62]. Imagine, we have several fuzzers available to test our program. Hopefully, none of them �nds any bugs. If indeed they don’t, we might have some con�dence in the correctness of the program. Then again, even a perfectly non-functional fuzzer would �nd no bugs in our program. So, how do we know which fuzzer has the highest “potential” of �nding bugs? A widely used proxy measure of fuzzer e�ectiveness is the code coverage that is achieved. After all, a fuzzer cannot �nd bugs in code that it does not cover. Indeed, in our experiments we identify a very strong positive correlation between the coverage achieved and the number of bugs found by a fuzzer. Correlation assesses the strength of the associa- tion between two random variables or measures. We conduct our empirical investigation on 10 fuzzers ⇥ 24 C programs ⇥ 20 fuzzing campaigns of 23 hours (⇡ 13 CPU years). We use three measures of coverage and two measures of bug �nding, and our results suggest: As the fuzzer covers more code, it also discovers more bugs. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for pro�t or commercial advantage and that copies bear this notice and the full citation on the �rst page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). 
ICSE ’22, May 21–29, 2022, Pittsburgh, PA, USA © 2022 Copyright held by the owner/author(s). ACM ISBN 978-1-4503-9221-1/22/05. https://doi.org/10.1145/3510003.3510230 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10 Fuzzer Ranks by avg. #branches covered Fuzzer Ranks by avg. #bugs discovered 0 2 4 6 8 10 #benchmarks (a) 1 hour fuzzing campaigns (d = 0.38). 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10 Fuzzer Ranks by avg. #branches covered Fuzzer Ranks by avg. #bugs discovered 0 2 4 6 8 10 #benchmarks (b) 1 day fuzzing campaigns (d = 0.49). Figure 1: Scatterplot of the ranks of 10 fuzzers applied to 24 programs for (a) 1 hour and (b) 23 hours, when ranking each fuzzer in terms of the avg. number of branches covered (x- axis) and in terms of the avg. number of bugs found (y-axis). Hence, it might seem reasonable to conjecture that the fuzzer which is better in terms of code coverage is also better in terms of bug �nding—but is this really true? In Figure 1, we show the ranking of these fuzzers across all programs in terms of the average coverage achieved and the average number of bugs found in each benchmark. The ranks are visibly di�erent. To be sure, we also conducted a pair-wise comparison between any two fuzzers where the di�erence in coverage and the di�erence in bug �nding are statistically signi�cant. The results are similar. We identify no strong agreement on the superiority or ranking of a fuzzer when compared in terms of mean coverage versus mean bug �nding. Inter-rater agreement assesses the degree to which two raters, here both types of benchmarking, agree on the superi- ority or ranking of a fuzzer when evaluated on multiple programs. Indeed, two measures of the same construct are likely to exhibit a high degree of correlation but can at the same time disagree sub- stantially [41, 55]. We evaluate the agreement on fuzzer superiority when comparing any two fuzzers where the di�erences in terms of coverage and bug �nding are statistically signi�cant. We evaluate the agreement on fuzzer ranking when comparing all the fuzzers. Concretely, our results suggest a moderate agreement. For fuzzer pairs, where the di�erences in terms of coverage and bug �nding is statistically signi�cant, the results disagree for 10% to 15% of programs. Only when measuring the agreement between branch coverage and the number of bugs found and when we require the di�erences to be statistically signi�cant at ?  0.0001 for coverage and bug �nding, do we �nd a strong agreement. However, statistical signi�cance at ?  0.0001 only in terms of coverage is not su�cient; we again �nd only weak agreement. The increase in agreement with statistical signi�cance is not observed when we measure bug �nding using the time-to-error. We also �nd that the variance of the agreement reduces as more programs are used, and that results of 1h campaigns do not strongly agree with results of 23h campaigns. ICSE’22
  • 14. Why Preregistration • Sound fuzzer evaluation imposes high barrier to entry for newcomers. 1. Well-designed experiment methodology. 2. Substantial computation resources. • Huge variance due to randomness • Repeat 20x, 24hrs, X fuzzers, Y programs • Statistical Signi fi cance, e ff ect size • CPU centuries. Many pitfalls of experimental design! Newcomers find out only when receiving the reviews and after conducting costly experiments following a flawed methodology. Symptomatic plus-one comments.
  • 15. Why Preregistration • Address both issues by switching to a 2-stage publication process that separates the review of (i) the methodology & ideas and (ii) the evidence.
  • 16. Why Preregistration • Address both issues by switching to a 2-stage publication process that separates the review of (i) the methodology & ideas and (ii) the evidence. • If Registered Report is in-principle accepted and proposed exp. design is followed without unexplained deviations, results will be accepted as they are.
  • 17. Why Preregistration • Address both issues by switching to a 2-stage publication process that separates the review of (i) the methodology & ideas and (ii) the evidence. • If Registered Report is in-principle accepted and proposed exp. design is followed without unexplained deviations, results will be accepted as they are. • Minimizes incentive to overclaim (while not reducing quality of evaluation). • Allow publication of interesting ideas and investigations irrespective of results.
  • 18. Why Preregistration • Address both issues by switching to a 2-stage publication process that separates the review of (i) the methodology & ideas and (ii) the evidence. • If Registered Report is in-principle accepted and proposed exp. design is followed without unexplained deviations, results will be accepted as they are. • Early feedback for newcomers. • On significance and novelty of the problem/approach/hypothesis. • On soundness and reproducibility of experimental methodology. • To further lower the barrier, Google pledges to help with fuzzer evaluation via FuzzBench.
  • 19. Why Preregistration • Address both issues by switching to a 2-stage publication process that separates the review of (i) the methodology & ideas and (ii) the evidence. • If Registered Report is in-principle accepted and proposed exp. design is followed without unexplained deviations, results will be accepted as they are. • Early feedback for newcomers. • We hope our initiative will turn the focus of the peer-reviewing process back to the innovation and key claims in a paper, while leaving the burden of evidence until after the in-principle acceptance.
  • 20. Why Preregistration • Address both issues by switching to a 2-stage publication process that separates the review of (i) the methodology & ideas and (ii) the evidence. • If Registered Report is in-principle accepted and proposed exp. design is followed without unexplained deviations, results will be accepted as they are. • Early feedback for newcomers. • We hope our initiative will turn the focus of the peer-reviewing process back to the innovation and key claims in a paper, while leaving the burden of evidence until after the in-principle acceptance. • Reviewers go from gate-keeping to productive feedback. Authors and reviewers work to ensure best study design possible.
  • 23. Why Preregistration • What do you see as the main strengths of the model? • More reproducibility. • Fewer overclaims, mitigates publication bias, less unhealthy focus on positive results. • Publications are more sound. Publication process is more fair. • Allows interesting negative results, no forced positive result, less duplicated effort. • Ideas and methodology above positive results.
  • 24. Why Preregistration • What do you see as the main strengths of the model? The main draws for me are the removal of the unhealthy focus on positive results (bad for students, bad for reproducibility, bad for impact) as well as the fact that the furthering of the field is pushed forward with negative results regarding newly attempted studies that have already been performed by others. Lastly, it removes the questionable aspect of changing the approach until something working appears, with no regard for a validation step. In ML lingo, we only have a test set, no validation set, and are implicitly overfitting to it with our early stopping. “ “
  • 25. Why Preregistration • What do you see as the main weaknesses of the model?
  • 26. Why Preregistration • What do you see as the main weaknesses of the model? • Time to publish is too long. Increased author / reviewing load.
  • 27. Why Preregistration • What do you see as the main weaknesses of the model? • Time to publish is too long. Increased author / reviewing load. At first hand maybe longer publication process because of the pre-registration, but overall it could be even faster, when someone also includes the time for rejection and re-work etc. “ “
  • 28. Why Preregistration • What do you see as the main weaknesses of the model? • Time to publish is too long. Increased author / reviewing load. • Sound experimental designs may be hard to create and vet / review. • For the first time, preregistration enables conversations about the soundness of experimental design. It naturally creates and communicates community standards. • Previously, experimental design was either accepted as is or criticized with a high cost to authors.
  • 29. Why Preregistration • What do you see as the main weaknesses of the model? • Time to publish is too long. Increased author / reviewing load. • Sound experimental designs may be hard to create and vet / review. • Is the model flexible enough to accommodate changes in experimental design?
  • 30. Why Preregistration • What do you see as the main weaknesses of the model? • Time to publish is too long. Increased author / reviewing load. • Sound experimental designs may be hard to create and vet / review. • Is the model flexible enough to accommodate changes in experimental design? • Yes. Deviations from the agreed protocol are allowed but must be explained.
  • 31. Why Preregistration • What do you see as the main weaknesses of the model? • Time to publish is too long. Increased author / reviewing load. • Sound experimental designs may be hard to create and vet / review. • Is the model flexible enough to accommodate changes in experimental design? • Ideas that look bad theoretically may work well in practice. • Without performing the experiment, we can't say if it could be useful or not. • The model is not meant to substitute the traditional publication model, but to augment it. • This model might not work very well for exploratory research (hypothesis generation). • This model might work better for confirmatory research (hypothesis testing).
  • 32. Why Preregistration • In your opinion, how could this publication model be improved?
  • 33. Why Preregistration • In your opinion, how could this publication model be improved? • Stage 2 publication in conference, instead of a journal.
  • 34. Why Preregistration • In your opinion, how could this publication model be improved? • Stage 2 publication in conference, instead of a journal. • We see conference as a forum for discussion (which happens in this workshop). • Maybe Stage 1 in conference, Stage 2 in journal (+ conference presentation)?
  • 35. Why Preregistration • In your opinion, how could this publication model be improved? • Stage 2 publication in conference, instead of a journal. • Fast-track through Stage 1 and Stage 2 when results exist. • Sounds like a more traditional publication, not preregistration :)
  • 36. Why Preregistration • In your opinion, how could this publication model be improved? • Stage 2 publication in conference, instead of a journal. • Fast-track through Stage 1 and Stage 2 when results exist.
  • 37. Why Preregistration • In your opinion, how could this publication model be improved? • Stage 2 publication in conference, instead of a journal. • Fast-track through Stage 1 and Stage 2 when results exist. • Flexible author-list within reason, to incentivize post-announcement collaboration. • Preregistration (where Stage 1 is published) would also allow early deconflicting or lead to increased collaboration between people with similar ideas and goals.
  • 38. Why Preregistration • In your opinion, how could this publication model be improved? • Stage 2 publication in conference, instead of a journal. • Fast-track through Stage 1 and Stage 2 when results exist. • Flexible author-list within reason, to incentivize post-announcement collaboration.
  • 39. Why Preregistration We think that the incentive structure for fuzzing research is broken; so we would like to introduce preregistration to fix this. Preregistration Stage 1 Stage 2 • Establish significance. • Motivate the problem. • Establish novelty. • Discuss hypothesis for solution. • Discuss related work. • Establish soundness. • Experimental design. • Research questions & claims. • Benchmarks & baselines. • Establish conformity. • Execute agreed exp. protocol. • Explain small deviations fr. protocol. • Investigate unexpected results. • Establish reproducibility. • Submit evidence towards the key claims in the paper.
  • 40. Why Preregistration We think that the incentive structure for fuzzing research is broken; so we would like to introduce preregistration to fix this. Preregistration Stage 1 Stage 2 • Establish significance. • Motivate the problem. • Establish novelty. • Discuss hypothesis for solution. • Discuss related work. • Establish soundness. • Experimental design. • Research questions & claims. • Benchmarks & baselines. • Establish conformity. • Execute agreed exp. protocol. • Explain small deviations fr. protocol. • Investigate unexpected results. • Establish reproducibility. • Submit evidence towards the key claims in the paper. Why Preregistration • Sound fuzzer evaluation imposes high barrier to entry for newcomers. 1. Well-designed experiment methodology. 2. Substantial computation resources. • Huge variance due to randomness • Repeat 20x, 24hrs, X fuzzers, Y programs • Statistical Significance, effect size • CPU centuries. Many pitfalls of experimental design! Newcomers find out only when receiving the reviews and after conducting costly experiments following a flawed methodology. Symptomatic plus-one comments.
  • 41. Why Preregistration We think that the incentive structure for fuzzing research is broken; so we would like to introduce preregistration to fix this. Preregistration Stage 1 Stage 2 • Establish significance. • Motivate the problem. • Establish novelty. • Discuss hypothesis for solution. • Discuss related work. • Establish soundness. • Experimental design. • Research questions & claims. • Benchmarks & baselines. • Establish conformity. • Execute agreed exp. protocol. • Explain small deviations fr. protocol. • Investigate unexpected results. • Establish reproducibility. • Submit evidence towards the key claims in the paper. Why Preregistration • Sound fuzzer evaluation imposes high barrier to entry for newcomers. 1. Well-designed experiment methodology. 2. Substantial computation resources. • Huge variance due to randomness • Repeat 20x, 24hrs, X fuzzers, Y programs • Statistical Significance, effect size • CPU centuries. Many pitfalls of experimental design! Newcomers find out only when receiving the reviews and after conducting costly experiments following a flawed methodology. Symptomatic plus-one comments. Why Preregistration • Address both issues by switching to a 2-stage publication process that separates the review of (i) the methodology & ideas and (ii) the evidence. • If Registered Report is in-principle accepted and proposed exp. design is followed without unexplained deviations, results will be accepted as they are. • Early feedback for newcomers. • We hope our initiative will turn the focus of the peer-reviewing process back to the innovation and key claims in a paper, while leaving the burden of evidence until after the in-principle acceptance. • Reviewers go from gate-keeping to productive feedback. Authors and reviewers work to ensure best study design possible.
  • 42. Why Preregistration We think that the incentive structure for fuzzing research is broken; so we would like to introduce preregistration to fix this. Preregistration Stage 1 Stage 2 • Establish significance. • Motivate the problem. • Establish novelty. • Discuss hypothesis for solution. • Discuss related work. • Establish soundness. • Experimental design. • Research questions & claims. • Benchmarks & baselines. • Establish conformity. • Execute agreed exp. protocol. • Explain small deviations fr. protocol. • Investigate unexpected results. • Establish reproducibility. • Submit evidence towards the key claims in the paper. Why Preregistration • Sound fuzzer evaluation imposes high barrier to entry for newcomers. 1. Well-designed experiment methodology. 2. Substantial computation resources. • Huge variance due to randomness • Repeat 20x, 24hrs, X fuzzers, Y programs • Statistical Significance, effect size • CPU centuries. Many pitfalls of experimental design! Newcomers find out only when receiving the reviews and after conducting costly experiments following a flawed methodology. Symptomatic plus-one comments. Why Preregistration Your thoughts or experience? Why Preregistration • Address both issues by switching to a 2-stage publication process that separates the review of (i) the methodology & ideas and (ii) the evidence. • If Registered Report is in-principle accepted and proposed exp. design is followed without unexplained deviations, results will be accepted as they are. • Early feedback for newcomers. • We hope our initiative will turn the focus of the peer-reviewing process back to the innovation and key claims in a paper, while leaving the burden of evidence until after the in-principle acceptance. • Reviewers go from gate-keeping to productive feedback. Authors and reviewers work to ensure best study design possible.