2. Requirements Traceability
• Requirements traceability is defined as “the
ability to describe and follow the life of a
requirement, in both a forwards and
backwards direction” (Gotel, 1994)
3. What’s Requirements Traceability Good For?
• Program Comprehension
• Discover what code needs to change to handle a new requirement
• Aid in determining whether a specification is completely implemented
4. IR-based Approaches
• Vector Space Model (Antoniol et al., 2002)
• Latent Semantic Indexing (Marcus and Maletic, 2003)
• Jensen-Shannon Divergence (Abadi et al., 2008)
• Latent Dirichlet Allocation (Asuncion, 2010)
6. Goal
• Mining software repositories to improve the recovery of traceability links
• Using software repository links to improve an expert’s trust in an automatically recovered link
7. Inspiration
• Web trust model (Palmer, 2000; McKnight, 2002)
• Initial Trust
• Reputation Trust
15. Case Studies
                      Pooka        SIP Communicator
Version               2.0          1.0
Number of Classes     298          1,771
Number of Methods     20,868       31,502
LOC                   244K         487K
SVN History           2000–2010    2005–2010

SIP Communicator: a voice over IP (VoIP) and instant messaging client
Pooka: an email client
16. Hypotheses
H01: There is no statistically significant difference in the precision of the recovered traceability links when using Trustrace or a VSM-based approach
H02: There is no statistically significant difference in the recall of the recovered traceability links when using Trustrace or a VSM-based approach
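As a minimal sketch of the two measures that H01 and H02 compare, assume a recovered link is a (requirement, class) pair and the oracle is the set of correct links. Precision is then the fraction of recovered links that appear in the oracle, and recall is the fraction of oracle links that were recovered. The names `Link` and `precision_recall` and the sample links are hypothetical.

```python
# Assumed representation: a link is a (requirement id, class name) pair.
Link = tuple[str, str]

def precision_recall(recovered: set[Link], oracle: set[Link]) -> tuple[float, float]:
    """Precision and recall of recovered links with respect to an oracle."""
    correct = recovered & oracle  # links that are both recovered and in the oracle
    precision = len(correct) / len(recovered) if recovered else 0.0
    recall = len(correct) / len(oracle) if oracle else 0.0
    return precision, recall

if __name__ == "__main__":
    # Hypothetical links; requirement and class names are placeholders.
    oracle = {("R1", "ClassA"), ("R1", "ClassB"), ("R2", "ClassC")}
    recovered = {("R1", "ClassA"), ("R2", "ClassC"), ("R2", "ClassD"), ("R1", "ClassE")}
    p, r = precision_recall(recovered, oracle)
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Under H01 and H02, these two values would be computed for both Trustrace and the VSM-based approach against the same oracle and then compared statistically.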
19. SVN Logs Preprocessing
We extract CVS/SVN commits and discard those that:
1. Are tagged as “delete”
2. Do not concern source code (e.g., changed only manual pages or documentation)
3. Have messages of two words or fewer
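The three filtering rules above could be sketched as follows. This is an illustration, not the authors' implementation: the `Commit` record, the "delete" action label, and recognizing source files by the `.java` extension (both systems are Java applications) are all assumptions.

```python
from dataclasses import dataclass, field

# Assumption: source files are recognized by their extension (Java systems).
SOURCE_EXTENSIONS = (".java",)

@dataclass
class Commit:
    """Hypothetical, simplified view of one CVS/SVN log entry."""
    revision: int
    action: str        # assumed label, e.g. "delete" for deletion commits
    message: str
    paths: list[str] = field(default_factory=list)

def keep_commit(c: Commit) -> bool:
    # Rule 1: discard commits tagged as "delete".
    if c.action.lower() == "delete":
        return False
    # Rule 2: discard commits that touch no source file (e.g., docs only).
    if not any(p.endswith(SOURCE_EXTENSIONS) for p in c.paths):
        return False
    # Rule 3: discard commits whose message has two words or fewer.
    return len(c.message.split()) > 2

def preprocess(commits: list[Commit]) -> list[Commit]:
    return [c for c in commits if keep_commit(c)]

if __name__ == "__main__":
    log = [
        Commit(1, "modify", "Fix crash when opening an empty folder", ["src/FolderView.java"]),
        Commit(2, "delete", "Remove obsolete dialog class", ["src/OldDialog.java"]),
        Commit(3, "modify", "Update the user manual", ["docs/manual.html"]),
        Commit(4, "modify", "typo fix", ["src/Util.java"]),
    ]
    print([c.revision for c in preprocess(log)])  # [1]
```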
21. Information Retrieval (IR) Methods
• Vector Space Model (VSM) (Salton et al., 1975)
– Each document, d, is represented by a vector of ranks of the terms in the vocabulary V:
$v_d = [r_d(w_1), r_d(w_2), \ldots, r_d(w_{|V|})]$
– The query is similarly represented by a vector
– The similarity between the query and document is the
cosine of the angle between their respective vectors
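A minimal VSM sketch follows. The slide does not say how the ranks $r_d(w)$ are computed, so this sketch assumes tf-idf weights, a simple lower-case tokenizer with no stemming or stop-word removal, and idf values taken from the document corpus only. It ranks documents (e.g., classes) against a query (e.g., a requirement) by the cosine of the angle between their vectors. All function names are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Assumed preprocessing: lower-case and keep alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_idf(docs_tokens: list[list[str]]) -> dict[str, float]:
    # idf(w) = log(N / df(w)), where df(w) counts documents containing w.
    n = len(docs_tokens)
    df = Counter(t for toks in docs_tokens for t in set(toks))
    return {t: math.log(n / d) for t, d in df.items()}

def vectorize(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    # Sparse vector: r_d(w) = tf(w, d) * idf(w); terms outside V are dropped.
    tf = Counter(tokens)
    return {t: f * idf[t] for t, f in tf.items() if t in idf}

def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    # Cosine of the angle between two sparse vectors (0.0 if either is zero).
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank(query: str, docs: list[str]) -> list[tuple[int, float]]:
    # Return (document index, similarity) pairs, most similar first.
    docs_tokens = [tokenize(d) for d in docs]
    idf = build_idf(docs_tokens)
    q = vectorize(tokenize(query), idf)
    scores = [(i, cosine(q, vectorize(toks, idf))) for i, toks in enumerate(docs_tokens)]
    return sorted(scores, key=lambda s: s[1], reverse=True)

if __name__ == "__main__":
    classes = ["send message smtp transport", "address book contact list", "sip call dial"]
    print(rank("send an email message", classes))  # class 0 ranks first
```

Under this weighting, a term that appears in every document gets an idf of zero and therefore does not affect the similarity.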
28. Discussion
• Using different sources of information reduces an expert’s effort by up to 50%
• Using temporal information with IR-based approaches yields better results
• The results tend to improve as the size of the SVN commit log increases
• Trustrace also improves LSI results, with k=50 for Pooka and k=200 for SIP Communicator
29. Threats to Validity
• External validity:
• We analyzed only two systems
• Construct validity:
• Two researchers built both oracles
• The oracles were validated by two other experts
• Internal validity: different λ values may lead to different results
• Reliability validity: a replication package is available online at www.ptidej.net
• The tool is available online at www.factrace.net
30. Ongoing work
• More IR approaches and datasets
• Empirical study
• Include other friends (bug reports, etc.)
• Determine heuristics to identify the best λ
31. Summary
• A similarity value alone is not enough to trust a link
• Other sources of information are required to increase trust in a link