1. Learning to Change Projects
Raymond Borges, Tim Menzies
Lane Department of Computer Science & Electrical Engineering
West Virginia University
PROMISE’12: Lund, Sweden
Sept 21, 2012
2. Sound bites
Less prediction, more decision
Data has shape
“Data mining” = “carving” out that shape
To reveal shape, remove irrelevancies
Cut the cr*p
Use reduction operators: dimension, column, row, rule
Show, don’t code
Once you can see the shape, inference is superfluous.
Implications for other research.
5. Decisions, Decisions...
Tom Zimmermann:
“We forget that the original motivation for predictive modeling was
making decisions about software projects.”
ICSE 2012 Panel on Software Analytics
“Prediction is all well and good, but what about decision making?”
Predictive models are useful:
they focus an inquiry onto particular issues,
but predictions are sub-routines of decision processes.
14. Q: How to Build Decision Systems?
1996: T Menzies, Applications of abduction: knowledge-level modeling,
International Journal of Human Computer Studies
Score contexts, e.g. Hate and Love; count the frequencies of attribute ranges in each:
Diagnosis = what went wrong. δ = Hate(now) − Love(past)
Monitor = what not to do. δ = Hate(next) − Love(now)
Planning = what to do next. δ = Love(next) − Hate(now)
δ = X − Y = contrast set = things frequent in X but rare in Y
TAR3 (2003), WHICH (2010), etc.
But for PROMISE effort estimation data
Contrast sets are obvious...
... Once you find the underlying shape of the data.
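To make the contrast-set arithmetic above concrete, here is a minimal sketch in Python (a hypothetical illustration, not the TAR3 or WHICH implementations): count how often each attribute range appears in the two scored contexts, then keep the ranges frequent in one but rare in the other. The `min_diff` threshold and the dict-of-ranges row format are assumptions made for this example.

```python
from collections import Counter

def contrast_set(hate_rows, love_rows, min_diff=0.3):
    """Ranges frequent in `hate_rows` but rare in `love_rows` (delta = Hate - Love).

    Rows are dicts of {attribute: range}; `min_diff` is an illustrative threshold.
    """
    def freq(rows):
        counts = Counter((attr, val) for row in rows for attr, val in row.items())
        n = max(len(rows), 1)
        return {key: count / n for key, count in counts.items()}

    hate, love = freq(hate_rows), freq(love_rows)
    # Keep ranges whose frequency drops sharply when moving from Hate to Love,
    # strongest contrasts first.
    return sorted(
        (key for key in hate if hate[key] - love.get(key, 0.0) >= min_diff),
        key=lambda key: love.get(key, 0.0) - hate[key],
    )

# Example: diagnosis (what went wrong) contrasts Hate(now) against Love(past).
now = [{"acap": 2, "pcap": 3}, {"acap": 2, "pcap": 4}]
past = [{"acap": 3, "pcap": 5}, {"acap": 3, "pcap": 5}]
print(contrast_set(now, past))   # e.g. [('acap', 2), ('pcap', 3), ('pcap', 4)]
```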
15. Q: How to find the underlying shape of the data?
Data mining = data carving
To find the signal in the noise...
Timm’s algorithm
1 Find some cr*p
2 Throw it away
3 Go to 1
18. IDEA = Iterative Dichotomization on Every Attribute
Timm’s algorithm
1 Find some cr*p
2 Throw it away
3 Go to 1
1 Dimensionality reduction
2 Column reduction
3 Row reduction
4 Rule reduction
And in the reduced data, inference is obvious.
23. IDEA = Iterative Dichotomization on Every Attribute
1 Dimensionality reduction (recursive fast PCA)
Fastmap (Faloutsos’95)
W = anything
X = furthest from W
Y = furthest from X
Takes time O(2N)
Let c = dist(X,Y)
If Z has distances a, b to X, Y then
Z projects to x = (a² + c² − b²) / (2c)
Platt’05: Fastmap = Nyström algorithm = fast & approximate PCA
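A small Python sketch of the FastMap-style projection just described, assuming only a generic distance function (illustrative, not the authors' code): two linear passes pick the distant pivots W, X, Y, then the cosine rule places every row on the X–Y line.

```python
import math

def fastmap_1d(rows, dist):
    """Project rows onto one FastMap axis.

    Two linear passes find distant pivots X and Y; the cosine rule then places
    each row Z at x = (a^2 + c^2 - b^2) / (2c), with a = dist(Z, X), b = dist(Z, Y).
    """
    w = rows[0]                                  # W = anything
    x = max(rows, key=lambda z: dist(w, z))      # X = furthest from W
    y = max(rows, key=lambda z: dist(x, z))      # Y = furthest from X
    c = dist(x, y) or 1e-12                      # guard against identical pivots
    return [(dist(z, x) ** 2 + c ** 2 - dist(z, y) ** 2) / (2 * c) for z in rows]

# Toy usage: Euclidean distance on 2-D points.
points = [(0, 0), (1, 0), (4, 3), (9, 1), (2, 2)]
print(fastmap_1d(points, math.dist))
```

Splitting at the median of that axis and recursing on each half is one way to obtain the recursive, PCA-like cluster tree that the later reduction steps operate on.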
28. IDEA = Iterative Dichotomization on Every Attribute
1 Dimensionality reduction (recursive fast PCA)
2 Column reduction (info gain)
Sort columns by their diversity
Keep columns that select for fewest clusters
e.g. nine rows in two clusters
cluster c1 has acap=2,3,3,3,3; pcap=3,3,4,5,5
cluster c2 has acap=2,2,2,3; pcap=3,4,4,5
p(acap = 2) = 0.44      p(acap = 3) = 0.55
p(pcap = 3) = p(pcap = 4) = p(pcap = 5) = 0.33
p(c1 | acap = 2) = 0.25   p(c2 | acap = 2) = 0.75
p(c1 | acap = 3) = 0.80   p(c2 | acap = 3) = 0.20
p(c1 | pcap = 3) = 0.67   p(c2 | pcap = 3) = 0.33
p(c1 | pcap = 4) = 0.33   p(c2 | pcap = 4) = 0.67
p(c1 | pcap = 5) = 0.67   p(c2 | pcap = 5) = 0.33
⇒ acap’s ranges select for single clusters more strongly than pcap’s, so acap is kept.
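The slides say only “info gain”, so the scoring below is an assumption: a standard entropy-based information gain of each column against the cluster labels, sketched in Python on the nine-row example above.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of cluster labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(column, clusters):
    """Drop in cluster-label entropy after splitting the rows on a column's values."""
    base, n, split = entropy(clusters), len(clusters), 0.0
    for value in set(column):
        subset = [c for v, c in zip(column, clusters) if v == value]
        split += len(subset) / n * entropy(subset)
    return base - split

# Nine rows, two clusters, from the slide's example.
clusters = ["c1"] * 5 + ["c2"] * 4
acap = [2, 3, 3, 3, 3,  2, 2, 2, 3]
pcap = [3, 3, 4, 5, 5,  3, 4, 4, 5]
print(info_gain(acap, clusters), ">", info_gain(pcap, clusters))  # acap is kept
```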
32. IDEA = Iterative Dichotomization on Every Attribute
1 Dimensionality reduction (recursive fast PCA)
2 Column reduction (info gain)
3 Row reduction (replace clusters with their mean)
Replace all leaf-cluster instances with their centroid
Each centroid is described using only the columns within 50% of the minimum diversity.
e.g. Nasa93 reduces to 12 columns and 13 centroids.
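A minimal sketch of that row reduction, assuming a centroid is simply the column-wise mean of a leaf cluster (hypothetical Python; the slides do not say how non-numeric columns or the 50%-diversity filter are applied here):

```python
def centroid(rows):
    """Column-wise mean of a leaf cluster (rows are equal-length numeric lists)."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def reduce_rows(leaf_clusters):
    """Replace every leaf cluster with a single centroid row."""
    return [centroid(rows) for rows in leaf_clusters if rows]

# Two toy leaf clusters collapse to two centroid rows.
leaves = [[[2, 3], [3, 3], [3, 4]], [[2, 4], [3, 5]]]
print(reduce_rows(leaves))   # [[2.67, 3.33], [2.5, 4.5]] (approximately)
```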
33. Nasa93 reduces to 12 columns and 13 centroids
35. IDEA = Iterative Dichotomization on Every Attribute
1 Dimensionality reduction (recursive fast PCA)
2 Column reduction (info gain)
3 Row reduction (replace clusters with their mean)
4 Rule reduction (contrast home vs neighbors)
Surprise: after steps 1, 2, 3...
further computation is superfluous:
visuals suffice for contrast-set generation.
36. Manual Construction of Contrast Sets
Table 5 = your “home” cluster
Table 6 = projects of similar size
Table 7 = a nearby project with fearsome effort
Contrast set = the delta on the last line
37. Why Cluster120?
Is it valid that cluster120 costs so much?
Yes, if building core services whose cost is amortized over N future apps.
No, if racing to get products to a competitive market.
We do not know, but at least we are focused on that issue.
39. Reductions on PROMISE data sets
Size of the reduced data sets:

data set          rows   columns
Albrecht             4         4
China               66        15
Cocomo81             8        18
Cocomo81e            4        16
Cocomo81o            4        16
Cocomo81s            2        16
Desharnais           8        19
Desharnais L1        6        10
Desharnais L2        4        10
Desharnais L3        2        10
Finnish              6         2
Kemerer              2         7
Miyazaki’94          6         3
Nasa93              13        12
Nasa93 center5       7        16
Nasa93 center1       2        15
Nasa93 center2       5        16
SDR                  4        21
Telcom1              2         1

[Scatterplot of the reduced data sets: rows (1–100, log scale) vs. columns (0–25).]
Q: throwing away too much?
42. Q: Throwing Away Too Much?
Estimates = class variable of the nearest centroid in the reduced space
Compare against 90 pre-processor*learner combinations from Kocagueneli et al.,
TSE 2011, On the Value of Ensemble Learning in Effort Estimation.
Performance measure = MRE = |predicted − actual| / actual
9 pre-processors:
1 norm: normalize numerics to 0..1, min..max
2 log: replace numerics of the non-class columns with their logarithms
3 PCA: replace non-class columns with principal components
4 SWReg: cull uninformative columns with stepwise regression
5 Width3bin: divide numerics into 3 bins of width (max−min)/3
6 Width5bin: divide numerics into 5 bins of width (max−min)/5
7 Freq3bins: split numerics into 3 equal-frequency (percentile) bins
8 Freq5bins: split numerics into 5 equal-frequency (percentile) bins
9 None: no pre-processor
10 learners:
1 1NN: simple one-nearest-neighbor
2 ABE0-1nn: analogy-based estimation using the nearest neighbor
3 ABE0-5nn: analogy-based estimation using the median of the five nearest neighbors
4 CART(yes): regression trees, with sub-tree post-pruning
5 CART(no): regression trees, no post-pruning
6 NNet: two-layered neural net
7 LReg: linear regression
8 PLSR: partial least squares regression
9 PCR: principal components regression
10 SWReg: stepwise regression
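For concreteness, a tiny Python sketch of that evaluation, under the assumptions above: estimate with the class (effort) value of the nearest reduced centroid, then score with MRE. The distance function and the toy numbers are illustrative only.

```python
def nearest_centroid_estimate(row, centroids, efforts, dist):
    """Return the effort (class value) of the centroid nearest to `row`."""
    best = min(range(len(centroids)), key=lambda i: dist(row, centroids[i]))
    return efforts[best]

def mre(predicted, actual):
    """Magnitude of relative error: |predicted - actual| / actual."""
    return abs(predicted - actual) / actual

# Toy example: two centroids in the reduced space with known efforts.
dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
centroids, efforts = [[2.7, 3.3], [4.5, 5.0]], [120.0, 480.0]
estimate = nearest_centroid_estimate([3.0, 3.0], centroids, efforts, dist)
print(estimate, mre(estimate, actual=150.0))   # 120.0 0.2
```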
44. Results
A perennial problem with assessing different effort estimation tools:
MRE is not normal: low valley, high hills (injects much variance).
IDEA’s predictions are neither better nor worse than the others, but it avoids all the hills.
46. Related Work
Cluster using (a) centrality (e.g. k-means);
(b) connectedness (e.g. DBSCAN); (c) separation (e.g. IDEA)
Who                          case-based   clustering                   feature selection   task
Shepperd (1997)                  √                                                         predict
Boley (1998)                                recursive PCA                                  predict
Bettenburg et al. (MSR’12)                  recursive regression                           predict
Posnett et al. (ASE’11)                     on file/package divisions                      predict
Menzies et al. (ASE’11)          √          FastMap                                        contrast
IDEA                             √          √                            √                 contrast
47. Back to the Sound bites
Less prediction, more decision
Data has shape
“Data mining” = “carving” out that shape
To reveal shape, remove irrelevancies
Cut the cr*p
IDEA = reduction operators: dimension, column, row, rule
Show, don’t code
Once you can see the shape, inference is superfluous.
Implications for other research.