Oleksandr ZAITSEV defends his PhD in Informatics at the University of Lille.
Title: Data Mining-based Tools to Support Library Update
Date: October 28, 2022
Location: Inria Lille - Nord Europe. Park Plaza, Parc scientifique de la Haute-Borne, 6 Rue Heloïse Bât B, 59650 Villeneuve-d'Ascq, France
Composition of the jury:
Supervisor: Stéphane DUCASSE
Co-supervisor: Nicolas ANQUETIL
Industrial advisor: Arnaud THIEFAINE
Reviewers: Romain ROBBES, Coen de ROOVER
Examiner: Olga KOUCHNARENKO
This was a Cifre PhD between the Inria research institute and the Arolla software company. Oleksandr ZAITSEV is grateful to Arolla for sponsoring his research.
Data Mining-based Tools to Support Library Update. PhD Defence of Oleksandr ZAITSEV
1. Data Mining-based Tools to Support Library Update
Oleksandr ZAITSEV
PhD Thesis Defence
Composition of the jury:
Reviewers: Romain ROBBES, Coen DE ROOVER
Examiner: Olga KOUCHNARENKO
Supervisors: Stéphane DUCASSE, Nicolas ANQUETIL
Co-supervisor: Arnaud THIEFAINE
2. 2
Cifre PhD
Arolla (software company), Paris. Advisor: Arnaud THIEFAINE
Inria (research institute), Lille. Supervisors: Stéphane DUCASSE, Nicolas ANQUETIL
3. 3
Arolla is a consulting company specialised in
the advanced techniques of software development:
Clean Code, TDD, BDD, Legacy Remediation, etc.
https://www.arolla.fr/
9. 9
Scope of my Thesis
Library update problem: How to change client code to use the new version of a library?
My thesis is about: Supporting developers during library update by building tools
Part 1 / 6
10. 10
Disambiguation
Library Update (v1.0 → v2.0): How to use the new version of the same library? (e.g., update Struts v1.0 to Struts v1.2)
Library Migration (A → B): How to use a different (although similar) library? (e.g., replace EasyMock with Mockito)
Part 1 / 6
11. 11
Motivating Example
from ailib.models import LinearRegression
from ailib.data import readData
data = readData('dataset.csv', type='CSV')
model = LinearRegression()
model.train(data, ycolumn='salary')
salary = model.predict([26, 'female'])
ailib v1.0
Client
Developer
depends
Part 1 / 6
12. 12
Motivating Example
from ailib.models import LinearRegression
from ailib.data import readData
data = readData('dataset.csv', type='CSV')
model = LinearRegression()
model.train(data, ycolumn='salary')
salary = model.predict([26, 'female'])
ailib v2.0
Client
Developer
depends
Part 1 / 6
! Error: LinearRegression not found
! Error: readData() not found
! Error: wrong argument passed to predict()
?
13. 13
Motivating Example
ailib v2.0
Client
Developer
depends
Part 1 / 6
1. Rename LinearRegression to AILinearRegression
2. Replace readData(file, type) with readCsv(data)
3. predict() should now accept a list of rows instead of a single row
Library
Developer
from ailib.models import AILinearRegression
from ailib.data import readCsv
data = readCsv('dataset.csv')
model = AILinearRegression()
model.train(data, ycolumn='salary')
salary = model.predict([[26, 'female']])
communicate
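One way a library developer could communicate the rename from item 1 directly in code is a deprecated alias. The sketch below is only an illustration in Python: ailib is the hypothetical library from the slides, the class bodies are dummy stand-ins invented here, and the alias technique is a common Python idiom rather than something the slides prescribe.

```python
import warnings


class AILinearRegression:
    """Stand-in for the renamed model class of the hypothetical ailib v2.0."""

    def predict(self, rows):
        # v2.0 contract from the slide: a list of rows in, one result per row out.
        # The returned values are dummies (row lengths), used only for illustration.
        return [len(row) for row in rows]


class LinearRegression(AILinearRegression):
    """Old name kept as a deprecated alias so that v1.0 clients keep working."""

    def __init__(self):
        warnings.warn(
            "LinearRegression was renamed to AILinearRegression",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the client code, not at this class
        )
        super().__init__()

    def predict(self, row):
        # Adapt the v1.0 call (a single row) to the v2.0 call (a list of rows).
        return super().predict([row])[0]
```

With such an alias, old client code still runs but is told at call time what to change, which is the role that deprecations play in the rest of the talk.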
14. 14
3 Aspects of the Problem
Empirical: Understand library update in practice:
‣ What problems do developers face?
‣ What support do they need?
Language: Propose a language to express code transformations
Automation: Automatic discovery of transformations
Part 1 / 6
15. 15
Pharo Programming Language
Pharo is a pure object-oriented dynamically-typed programming language.
We focus on Pharo because:
1. We have access to its core developers
2. Pharo is convenient for manipulating source code
Part 1 / 6
16. 1. Introduction
2. State of the Art
3. Developer Survey of Library Update
4. Deprewriter: Smart deprecation rewriting
5. DepMiner: Recommending transformation rules
6. Conclusion
Plan
19. 19
Empirical Studies
Paper            Survey   Code Analysis   Client Dev.   Library Dev.
[Robbes 2012a]   X        ✓               ✓             X
[Jezek 2015]     X        ✓               ✓             ✓
[Hora 2015]      X        ✓               ✓             X
[Bogart 2016]    ✓        X               ✓             ✓
[Sawant 2016]    X        ✓               ✓             X
[Xavier 2017a]   X        ✓               ✓             ✓
[Xavier 2017b]   ✓        X               X             ✓
[Hora 2018]      X        ✓               ✓             X
[Kula 2018a]     ✓        ✓               ✓             X
[Kula 2018b]     X        ✓               X             ✓
[Brito 2019]     ✓        X               ✓             ✓
Part 2 / 6
20. 20
Empirical Studies
Shortcomings:
1. Most surveys analyse specific cases of breaking changes.
2. No survey has asked client developers what makes library update hard and what makes it easy.
Part 2 / 6
22. 22
Tools to Support Library Update
Sources of Information:
exp: developer expertise
doc: documentation
hist: commit history
2v: two versions of source code
L: library
C: already updated clients
T: unit tests
Technique:
TS: textual similarity
SS: structural similarity
CD: call dependency
Part 2 / 6
23. 23
Tools to Support Library Update
Paper             Update   Library dev.   Source             Technique     Dynamic Rewriting
[Chow 1996]       ✓        ✓              exp(L)             -             X
[Henkel 2005]     ✓        ✓              exp(L)             -             X
[Kim 2007]        ✓        -              2v(L)              TS            X
[Xing 2007]       ✓        X              2v(L)              SS, TS        X
[Dagenais 2008]   ✓        X              hist(L)            CD            X
[Schäfer 2008]    ✓        X              2v(C,T)            CD            X
[Wu 2010]         ✓        X              2v(L)              CD, TS        X
[Nguyen 2010]     ✓        X              2v(L,C,T)          CD, TS, SS    X
[Meng 2012]       ✓        X              hist(L)            CD            X
[Teyton 2013]     X        X              hist(L)            CD            X
[Hora 2014]       ✓        X              hist(L)            CD            ✓
[Pandita 2015]    X        X              2v(L)              TS            X
[Alrubaye 2019]   X        X              hist(C), doc(L)    CD, TS        X
[Alrubaye 2020]   X        X              doc(L)             CD, TS        X
Part 2 / 6
24. 24
Shortcomings:
1. No studies propose automated tools to support library developers.
2. Most studies do not consider dynamic rewriting.
3. Most studies focus on statically-typed programming languages.
Tools to Support Library Update
Part 2 / 6
25. Part 3: Developer Survey (empirical aspect)
O. Zaitsev, S. Ducasse, N. Anquetil, and A. Thiefaine. How Libraries Evolve: A Survey of Two Industrial Companies and an Open-Source Community. APSEC (industrial track), 2022.
29. 29
Selected Findings (Client Developers)
Q: How often do they face the problem of library update?
Three times a year or more often: 17
Twice a year: 10
Once a year: 3
Less often: 3
We do not do it regularly: 3
Part 3 / 6
30. 30
Selected Findings (Client Developers)
Q: What makes library update easy?
Factor                            dev.
Documentation                     15
Absence of breaking changes       11
Test coverage                     6
Tool support                      5
Deprecations                      4
Simple breaking changes           4
Community support                 3
Q: What makes it hard?
Factor                            dev.
Breaking changes                  11
Absent or bad documentation       10
Indirect dependencies             7
Big changes to the API            7
Poor test coverage                4
Removed functionality             3
Changed hooks or abs. classes     3
Part 3 / 6
31. 31
Selected Findings (Library Developers)
Q: What is the impact of breaking changes on their clients?
Very small impact: 1
Small impact: 0
Moderate impact: 9
Big impact: 6
Very big impact: 2
Part 3 / 6
32. 32
Selected Findings (Library Developers)
Q: Is it important for library developers to encourage their clients to update?
Not important at all: 0
Of little importance: 1
Of average importance: 5
Very important: 8
Absolutely essential: 4
Part 3 / 6
33. 33
Survey Conclusion
Client Developers:
‣ Often have to deal with the problem of library update
‣ Need documentation and support for breaking changes
Library Developers:
‣ Want to help their clients to update
Part 3 / 6
37. 37
Transformation Rule
isSpecial
    self
        deprecated: 'Renamed to #needsFullDefinition'
        transformWith: '`@receiver isSpecial' -> '`@receiver needsFullDefinition'.
    ^ self needsFullDefinition
Antecedent (left hand side) matches the method calls that should be replaced
Consequent (right hand side) defines the replacement
Part 4 / 6
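To make the antecedent/consequent idea concrete, here is a rough Python analogue that rewrites calls of one method name into another on the syntax tree. It is only a sketch: Deprewriter itself works on Pharo code and rewrites the caller at run time, whereas this example is static, and the names RenameCall and rewrite are invented for illustration.

```python
import ast


class RenameCall(ast.NodeTransformer):
    """Rewrites `<receiver>.old_name(...)` into `<receiver>.new_name(...)`."""

    def __init__(self, old_name, new_name):
        self.old_name = old_name  # plays the role of the antecedent
        self.new_name = new_name  # plays the role of the consequent

    def visit_Call(self, node):
        self.generic_visit(node)  # rewrite nested calls first
        func = node.func
        # Only method calls on some receiver are matched, like `@receiver.
        if isinstance(func, ast.Attribute) and func.attr == self.old_name:
            func.attr = self.new_name
        return node


def rewrite(source, old_name, new_name):
    """Return `source` with every matching method call renamed."""
    tree = RenameCall(old_name, new_name).visit(ast.parse(source))
    return ast.unparse(tree)  # ast.unparse requires Python 3.9 or later
```

For example, `rewrite("x = node.isSpecial()", "isSpecial", "needsFullDefinition")` yields `x = node.needsFullDefinition()`. A plain attribute access such as `node.isSpecial` without a call is left untouched.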
40. 40
Pharo
41 %
59 %
Rewriting deprecations
(contain a transformation rule)
Non-rewriting deprecations
(no transformation rule)
Analysis of Deprecations in Pharo 8
(367)
Part 4 / 6
41. 41
Pharo
9 %
32 %
59 %
Rewriting deprecations
(contain a transformation rule)
Non-rewriting deprecations
(no transformation rule)
Obvious opportunity
Analysis of Deprecations in Pharo 8
(367)
Part 4 / 6
42. 42
Java
33 %
67 %
Deprecations with helpful
replacement messages
Deprecations without helpful
replacement messages
C#
22 %
78 %
JS
33 %
67 %
[Brito et al., 2018] [Brito et al., 2018] [Nascimento et al., 2020]
Replacement Messages
Part 4 / 6
43. 43
Deprewriter Summary
Contributions:
✓ First documentation of Deprewriter approach
✓ Validity criteria for the transformation rules (detected 8 invalid rules in Pharo, merged PR)
✓ Analysis and discussion of non-rewriting deprecations in Pharo 8
✓ Survey of developers who use Deprewriter
Part 4 / 6
44. Part 5: DepMiner: Inferring rules from the commit history (automation aspect)
O. Zaitsev, S. Ducasse, N. Anquetil, and A. Thiefaine. DepMiner: Automatic Recommendation of Transformation Rules for Method Deprecation. ICSR, 2022.
45. 45
The Need for Automation
[Diagram. Labels: Library Developer, rules, Deprewriter, update, Client System]
Part 5 / 6
46. 46
The Need for Automation
[Diagram. Labels: Library Developer, Commit History, Tool, rules, Deprewriter, update, Client System]
Part 5 / 6
47. 47
Step 1. Detect Breaking Changes
Missing methods: public methods that were present in the old version and no longer exist in the new version.
[Diagram. Labels: old API, new API]
Part 5 / 6
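The definition of missing methods amounts to a set difference between the public methods of the two versions. The sketch below illustrates that in Python. The dictionary layout of an API and the leading-underscore rule for private methods are assumptions made for this example (a Python convention), not the heuristics used for Pharo.

```python
def public_methods(api):
    """Return the set of 'Class.method' names considered public.

    `api` maps a class name to the list of its method names (hypothetical layout).
    Assumption: a leading underscore marks a private method.
    """
    return {
        f"{cls}.{method}"
        for cls, methods in api.items()
        for method in methods
        if not method.startswith("_")
    }


def missing_methods(old_api, new_api):
    """Public methods present in the old version and absent from the new one."""
    return public_methods(old_api) - public_methods(new_api)


old_api = {"Node": ["next", "value", "_link"]}
new_api = {"Node": ["nextNode", "value"]}
missing = missing_methods(old_api, new_api)  # only Node.next is missing
```

The private `_link` is ignored even though it also disappeared, because removing it does not break clients under the stated assumption.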
48. 48
Step 1. Detect Breaking Changes
Challenge 1: Absence of method visibility.
How we address it: Define language-specific heuristics.
https://github.com/olekscode/VisibilityDeductor
Part 5 / 6
49. 49
{
Id: ef4fdd35fb05e74aa12aad4d22a37e17a8d87b5b,
Removed methods: […],
Added methods: […],
Modified methods: [
{
Old source code: …,
New source code: …,
Removed method calls: [smartDescription],
Added method calls: [description],
}],
Added classes: […],
Removed classes: […],
…
}
Step 2. Data Representation
Line-based diffs. Q: Which lines of code were added or removed?
High-level commits. Q: Which methods, classes, or packages were added, removed, or modified?
Part 5 / 6
50. 50
Step 3. Market Basket Analysis
Transactions:
Customer 1: { bread, butter, avocado }
Customer 2: { bread, butter, bananas }
Customer 3: { bread, butter, milk, cereal }
Customer 4: { bread, milk, cereal }
Customer 5: { butter, milk, cereal }
Q1: What are the products that are frequently purchased together? (frequent itemsets)
Q2: What can we recommend to people who buy bread? (association rules)
Part 5 / 6
51. 51
Step 3. Market Basket Analysis
Transactions:
Customer 1: { bread, butter, avocado }
Customer 2: { bread, butter, bananas }
Customer 3: { bread, butter, milk, cereal }
Customer 4: { bread, milk, cereal }
Customer 5: { butter, milk, cereal }
Q1: What are the products that are frequently purchased together?
{ bread, butter } Support: 60%
{ milk, cereal } Support: 60%
Q2: What can we recommend to people who buy bread?
{ bread } → { butter } Confidence: 75%
Part 5 / 6
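The support and confidence values on this slide can be checked with a few lines of Python. This is a minimal sketch using the usual definitions (support as the share of transactions containing an itemset, confidence as the share of antecedent transactions that also contain the consequent). The function names are chosen here for illustration.

```python
transactions = [
    {"bread", "butter", "avocado"},
    {"bread", "butter", "bananas"},
    {"bread", "butter", "milk", "cereal"},
    {"bread", "milk", "cereal"},
    {"butter", "milk", "cereal"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions)


def support(itemset, transactions):
    """Share of all transactions that contain `itemset`."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Share of transactions with `antecedent` that also contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / count(antecedent, transactions)


bread_butter = support({"bread", "butter"}, transactions)        # 3 of 5 transactions
milk_cereal = support({"milk", "cereal"}, transactions)          # 3 of 5 transactions
bread_to_butter = confidence({"bread"}, {"butter"}, transactions)  # 3 of 4 bread buyers
```

Bread appears in four baskets and three of them also contain butter, which gives the 75% confidence shown on the slide.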
52. 52
Step 3. Market Basket Analysis
Q1: What are the operations that frequently appear together in method changes?
{ remove(next), add(nextNode) } Support: 60%
Q2: What can we recommend as a replacement for next() ?
{ next } → { nextNode } Confidence: 75%
Part 5 / 6
53. 53
Step 4. Generate Deprecations
Missing Method: Node >> next
Association Rule: {next} → {nextNode}
generate:
Node >> next
    self
        deprecated: 'Use #nextNode instead.'
        transformWith: '`@receiver next' -> '`@receiver nextNode'.
    ^ self nextNode
Part 5 / 6
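The generation step can be pictured as filling a template with the missing method and the rule's consequent. The Python sketch below produces the Pharo source text shown on this slide. It is an illustration only: the function name generate_deprecation is invented here, and the template handles only unary selectors (methods without arguments), which is a simplifying assumption.

```python
def generate_deprecation(class_name, old_selector, new_selector):
    """Return Pharo source text for a rewriting deprecation of a unary method.

    Assumption: both selectors take no arguments, as in the next / nextNode example.
    """
    return (
        f"{class_name} >> {old_selector}\n"
        f"    self\n"
        f"        deprecated: 'Use #{new_selector} instead.'\n"
        f"        transformWith: '`@receiver {old_selector}' ->\n"
        f"            '`@receiver {new_selector}'.\n"
        f"    ^ self {new_selector}\n"
    )


# Missing method Node >> next and association rule {next} -> {nextNode}
deprecation_source = generate_deprecation("Node", "next", "nextNode")
```

The generated method keeps the old selector available, forwards to the replacement, and carries the transformation rule that rewrites callers.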
54. 54
Step 4. Generate Deprecations
Challenge 2: Absence of static type information.
How we address it: Retain only those association rules where methods in the antecedent and consequent of the rule are defined in the same class.
Part 5 / 6
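The same-class filter can be sketched as follows. This is a hedged Python illustration, not DepMiner's code: rules are represented as (antecedent, consequent) tuples of method names, and `methods_by_class` is an assumed lookup that maps each class to the methods it defines across the old and the new version.

```python
def defined_in_same_class(rule, methods_by_class):
    """True if one class defines all methods of the antecedent and the consequent."""
    antecedent, consequent = rule
    needed = set(antecedent) | set(consequent)
    return any(needed <= set(methods) for methods in methods_by_class.values())


def filter_rules(rules, methods_by_class):
    """Keep only the rules whose methods all live in a single class."""
    return [rule for rule in rules if defined_in_same_class(rule, methods_by_class)]


# Hypothetical data: methods per class, old and new versions merged.
methods_by_class = {
    "Node": {"next", "nextNode"},
    "Stream": {"next", "peek"},
    "List": {"first"},
}
rules = [
    (("next",), ("nextNode",)),  # both defined in Node: kept
    (("next",), ("first",)),     # defined in different classes: dropped
]
kept = filter_rules(rules, methods_by_class)
```

Without static types there is no way to tell which `next` a call refers to, so the filter trades some recall for rules that are more likely to describe a real replacement.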
58. 58
Limitations of DepMiner
Limitation 1: Simplified recommendations. Developer may not want to deprecate ("It's correct but I will not deprecate").
Limitation 2: Unused / untested methods. Ineffective for methods that are not called internally (figure: tests, Coverage: 80%).
Limitation 3: Naive search. Search entire commit history (figure: v1.0, v2.0, breaking change, effect).
Part 5 / 6
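59. Overcoming Limitation 1
Questions: Does replacement exist? Is it automatable? Does developer want to deprecate?
Replacement exists and is automatable:
  developer wants to deprecate: Deprecation with a transformation rule
  developer does not want to deprecate: Library update script with automatic rules
Replacement exists but is not automatable:
  developer wants to deprecate: Deprecation with a replacement message
  developer does not want to deprecate: Documentation with replacement message
No replacement exists:
  developer wants to deprecate: Deprecation without a replacement message
  developer does not want to deprecate: Documentation: list of breaking changes
Identified with the help of developers from Pharo open-source community
Part 5 / 6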
61. 61
Contributions
Empirical:
1. Developer survey to understand the practice of library update
Language:
2. First documentation of the Deprewriter approach.
3. Study of its adoption by the community
Automation:
4. DepMiner: a novel approach to mine transformation rules.
5. Generalisation of DepMiner as a holistic approach.
Part 6 / 6
62. Future Work. Improve DepMiner
Limitation 1: Simplified recommendations. Consider other actions of library developers (based on the table).
Limitation 2: Unused / untested methods. Combine with existing approaches to detect refactoring.
Limitation 3: Naive search. Detect the commit that introduced the breaking change (BC). Search neighbour commits.
Part 6 / 6
63. Future Work. Challenging Cases
Method-to-method mapping: old API → new API
Documented challenges:
1. Reassigning the existing name
2. Circular renaming
3. Modifying abstract hooks
4. Cleaning up spurious objects
5. String literals used as identifiers
Part 6 / 6
64. 64
Publications (7 papers)
2 Journal Papers
‣ N. Anquetil, J. Delplanque, S. Ducasse, O. Zaitsev, C. Fuhrman, and Y.-G. Guéhéneuc. What Do Developers Consider Magic literals? A Smalltalk Perspective. IST, 2022.
‣ S. Ducasse, G. Polito, O. Zaitsev, M. Denker, and P. Tesone. Deprewriter: On the fly rewriting method deprecations. JOT, 2021.
3 Conference Papers
‣ O. Zaitsev, S. Ducasse, N. Anquetil, and A. Thiefaine. How Libraries Evolve: A Survey of Two Industrial Companies and an Open-Source Community. APSEC (industrial track), 2022.
‣ O. Zaitsev, S. Ducasse, N. Anquetil, and A. Thiefaine. DepMiner: Automatic Recommendation of Transformation Rules for Method Deprecation. ICSR, 2022.
‣ O. Zaitsev, S. Ducasse, A. Bergel, and M. Eveillard. Suggesting Descriptive Method Names: An Exploratory Study of Two Machine Learning Approaches. QUATIC, 2020.
+ 2 Workshop Papers & 1 technical report
2nd best paper award at IWST'22
Best poster award at GDR GPL
Part 6 / 6
65. Summary
Contributions:
✓ Library update survey of developers.
✓ First documentation of Deprewriter
✓ Analysis of its adoption by the community.
✓ DepMiner: a novel approach to mine rules.
✓ DepMiner as holistic approach.
[Diagram "Library Update". Labels: Library Developer, Library v1.0, Library v2.0, Client Developer, depends, Client System, Updated Client System, DepMiner tool]
7 papers, 2 awards, > 100 merged pull requests
68. Static & Dynamic Analysis
DepMiner: Statically mine rules from history (Commit History)
Deprewriter: Dynamically apply rules to client code (Client System)
69. Mining Method Call Replacements
1. From the history, take a local subset of commits for the missing method m()
2. Select the method changes that remove a call to m()
3. Transactions: {remove(m), add(m'), add(x)}, {remove(m), remove(n), add(m')}, …, {remove(m), add(m')}
4. A-Priori → Frequent Itemsets:
{remove(m), add(m')} count: 15
{remove(m), remove(n), add(x)} count: 5
5. Association Rules:
{m} → {m'} support: 15, confidence: 0.5
{m, n} → {x} support: 5, confidence: 0.3
70. 70
public static LinkedList insert(LinkedList list, int data)
{
Node new_node = new Node(data);
- new_node.setNext(null);
+ new_node.setNextNode(null);
if (list.head() == null) {
list.setHead(new_node);
}
else {
Node last = list.head;
- while (last.next() != null) {
- last = last.next();
+ while (last.nextNode() != null) {
+ last = last.nextNode();
}
last.next = new_node;
}
return list;
}
Method Change
Method change: one method modified by one commit
Part 5 / 6
71. 71
public static LinkedList insert(LinkedList list, int data)
{
Node new_node = new Node(data);
- new_node.setNext(null);
+ new_node.setNextNode(null);
if (list.head() == null) {
list.setHead(new_node);
}
else {
Node last = list.head;
- while (last.next() != null) {
- last = last.next();
+ while (last.nextNode() != null) {
+ last = last.nextNode();
}
last.next = new_node;
}
return list;
}
Method Change as Transaction
Transaction: set of added and removed method calls in a method change:
{ remove(setNext), add(setNextNode), remove(next), remove(next), add(nextNode), add(nextNode) }
Part 5 / 6
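Turning a method change into a transaction can be approximated by comparing the method calls in the old and the new source. The Python sketch below does this with a crude regular expression instead of a parser, which is an assumption made to keep the example short. It counts calls as a multiset, because the slide lists remove(next) and add(nextNode) twice.

```python
import re
from collections import Counter

# Crude call detector: a name that follows a dot and is followed by '('.
CALL = re.compile(r"\.(\w+)\s*\(")


def method_calls(source):
    """Count the method calls found in a piece of source code."""
    return Counter(CALL.findall(source))


def transaction(old_source, new_source):
    """List the removed and added method calls of one method change."""
    old_calls, new_calls = method_calls(old_source), method_calls(new_source)
    removed = old_calls - new_calls  # Counter subtraction keeps positive counts only
    added = new_calls - old_calls
    return (
        [f"remove({name})" for name in sorted(removed.elements())]
        + [f"add({name})" for name in sorted(added.elements())]
    )


old_source = """
new_node.setNext(null);
if (list.head() == null) { list.setHead(new_node); }
while (last.next() != null) { last = last.next(); }
"""
new_source = """
new_node.setNextNode(null);
if (list.head() == null) { list.setHead(new_node); }
while (last.nextNode() != null) { last = last.nextNode(); }
"""
items = transaction(old_source, new_source)
```

Calls that appear unchanged on both sides, such as head() and setHead(), cancel out, so only the operations relevant to the change remain in the transaction.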