Performing quantitative software analytics studies can be an immensely rewarding activity for scientists conducting empirical research. However, such studies often pose numerous engineering challenges. The researcher must hunt down appropriate data sets, devise bespoke collection and processing tools, and optimise performance to match the size of the collected data. I will discuss principles and strategies that can be used to deal with these problems, and present examples of associated tools and techniques. Some particularly effective strategies associated with data set construction involve recursion, web searching, synthesis, probing, instrumentation, and the nurturing of alliances. On the processing front, approaches include the opportunistic scavenging of tool front-ends, the exploratory development of pipelines, as well as the exploitation of tool interoperability, scripting languages, and their rich libraries. The required performance can be obtained through parallelism, stream processing, the judicious use of low-level facilities, and the choice of appropriate samples. I will finish the presentation with an overview of open problems and challenges in software analytics concerning vertical domains, data analysis, and under-represented stakeholders.
1. 1
Engineering Software Analytics Studies
Diomidis Spinellis
Department of Management Science and Technology
Athens University of Economics and Business
http://www.spinellis.gr/
@CoolSWEng
18/9/2014
29. 29
$ # Print author names
git log --format='%an' |
# Order them by author
sort |
# Count number of commits for each author
uniq --count |
# Order them by number of commits
sort --numeric-sort --reverse |
# Print top ten
head -10
20131 Linus Torvalds
8445 David S. Miller
7692 Andrew Morton
5156 Greg Kroah-Hartman
5116 Mark Brown
4723 Russell King
4584 Takashi Iwai
4385 Al Viro
4220 Ingo Molnar
3276 Tejun Heo
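The counting stages of this pipeline do not depend on git, so they can be wrapped for reuse on any stream of values. The following is a minimal sketch, not something shown in the talk: `top_values` is a hypothetical helper name, and the short options (`-c`, `-n`) are used on the assumption that they are more portable than the GNU long options above.

```shell
#!/bin/sh
# top_values [N]: read one value per line on standard input and print
# the N (default 10) most frequent values as "count<TAB>value".
# Hypothetical helper for illustration; not part of the original slides.
top_values() {
  # Bring identical values together
  sort |
  # Count the occurrences of each value
  uniq -c |
  # Order by descending count, breaking ties by value
  sort -k1,1nr -k2 |
  # Keep the N most frequent
  head -n "${1:-10}" |
  # Replace uniq's padded count column with "count<TAB>value"
  awk '{ n = $1; sub(/^ *[0-9]+ /, ""); print n "\t" $0 }'
}
```

With this in place, the slide's query could be written as `git log --format='%an' | top_values 10`, and the same function could be pointed at other attributes, such as committer names or file extensions.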
30. 30
[Figure: chart with axis ranges 0 to 100 and 0 to 90; axis label: "Number of developers in the annotated file"]
46. 46
Tool front ends
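# Use make(1) itself as the front end for reading each port's metadata
find /usr/ports/ -maxdepth 3 -name Makefile |
sed 's,/Makefile$,,' |
while read -r dir
do
  cd "$dir" || continue
  # Print the port's name and its dependency variables, one per line
  make -V PORTNAME \
    -V RUN_DEPENDS -V BUILD_DEPENDS \
    -V LIB_DEPENDS -V FETCH_DEPENDS \
    -V DEPENDS
done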
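The output of such a front end usually needs a further pipeline stage before it can be analysed. As a rough sketch of that step, the function below turns the loop's output into "port dependency" pairs. It rests on several assumptions that are not stated on the slide: that `make -V` prints exactly one line per requested variable (so six lines per port, name first), that dependency entries are whitespace-separated, and that each entry has the form `target:origin`, with the origin as the second colon-separated field. `deps_to_edges` is a hypothetical name.

```shell
#!/bin/sh
# deps_to_edges: read groups of six lines (PORTNAME followed by five
# dependency variables) and print one "port origin" pair per dependency.
# Hypothetical helper; the six-line grouping and the target:origin entry
# format are assumptions, not facts taken from the slides.
deps_to_edges() {
  awk '
    # First line of each six-line group: remember the port name
    NR % 6 == 1 { port = $0; next }
    # Remaining lines: emit the origin of every target:origin entry
    {
      for (i = 1; i <= NF; i++) {
        n = split($i, part, ":")
        if (n >= 2)
          print port, part[2]
      }
    }'
}
```

Appending `| deps_to_edges` to the loop above would then yield an edge list that tools such as `sort`, `uniq`, or a graph library can consume directly.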
66. 66
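import java.util.ArrayList;
import java.util.BitSet;
import java.util.Date;

// Set of months, stored as one bit per month
class RangeMap {
    private BitSet active;
    // Number of months covered: (2009 - 2001) * 12 = 96
    public static final int NMONTH = (2009 - 2001) * 12;

    public RangeMap() {
        active = new BitSet(NMONTH);
    }
}

// Data kept in memory for each entry
class EntryDetails {
    RangeMap defined, stub;
    ArrayList<Integer> nrefs;
    String definer, referrer, firstRef;
    Date firstDef;
    int numReferences, numContributors;
    int numRevisions, numReverts;

    public EntryDetails() {
        defined = new RangeMap();
        stub = new RangeMap();
        // Initial capacity sized to hold one value per month
        nrefs = new ArrayList<Integer>(RangeMap.NMONTH);
    }
}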
[Figure: bar chart, values in MB: C++ 7,212; Java 12,287]