1. Learning to Change Projects
Raymond Borges, Tim Menzies
Lane Department of Computer Science & Electrical Engineering
West Virginia University
PROMISE’12: Lund, Sweden
Sept 21, 2012
2. Sound bites
Less prediction, more decision
Data has shape
“Data mining” = “carving” out that shape
To reveal shape, remove irrelevancies
Cut the cr*p
Use reduction operators: dimension, column, row, rule
Show, don’t code
Once you can see the shape, inference is superfluous.
Implications for other research.
5. Decisions, Decisions...
Tom Zimmermann:
“We forget that the original motivation for predictive modeling was
making decisions about software projects.”
ICSE 2012 Panel on Software Analytics
“Prediction is all well and good, but what about decision making?”
Predictive models are useful:
they focus an inquiry onto particular issues,
but predictions are sub-routines of decision processes.
14. Q: How to Build Decision Systems?
1996: T Menzies, Applications of abduction: knowledge-level modeling,
International Journal of Human Computer Studies
Score contexts, e.g. Hate and Love; count the frequencies of attribute ranges in each:
Diagnosis = what went wrong. δ = Hate(now) − Love(past)
Monitor = what not to do. δ = Hate(next) − Love(now)
Planning = what to do next. δ = Love(next) − Hate(now)
δ = X − Y = contrast set = things frequent in X but rare in Y
TAR3 (2003), WHICH (2010), etc.
But for PROMISE effort estimation data
Contrast sets are obvious...
... Once you find the underlying shape of the data.
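To make the contrast-set arithmetic above concrete, here is a minimal sketch in Python (a hypothetical illustration, not the TAR3 or WHICH implementations): count how often each attribute range appears in the two scored contexts, then keep the ranges frequent in one but rare in the other. The `min_diff` threshold and the dict-of-ranges row format are assumptions made for this example.

```python
from collections import Counter

def contrast_set(hate_rows, love_rows, min_diff=0.3):
    """Ranges frequent in `hate_rows` but rare in `love_rows` (delta = Hate - Love).

    Rows are dicts of {attribute: range}; `min_diff` is an illustrative threshold.
    """
    def freq(rows):
        counts = Counter((attr, val) for row in rows for attr, val in row.items())
        n = max(len(rows), 1)
        return {key: count / n for key, count in counts.items()}

    hate, love = freq(hate_rows), freq(love_rows)
    # Keep ranges whose frequency drops sharply when moving from Hate to Love,
    # strongest contrasts first.
    return sorted(
        (key for key in hate if hate[key] - love.get(key, 0.0) >= min_diff),
        key=lambda key: love.get(key, 0.0) - hate[key],
    )

# Example: diagnosis (what went wrong) contrasts Hate(now) against Love(past).
now = [{"acap": 2, "pcap": 3}, {"acap": 2, "pcap": 4}]
past = [{"acap": 3, "pcap": 5}, {"acap": 3, "pcap": 5}]
print(contrast_set(now, past))   # e.g. [('acap', 2), ('pcap', 3), ('pcap', 4)]
```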
15. Q: How to find the underlying shape of the data?
Data mining = data carving
To find the signal in the noise...
Timm’s algorithm
1 Find some cr*p
2 Throw it away
3 Go to 1
18. IDEA = Iterative Dichotomization on Every Attribute
Timm’s algorithm
1 Find some cr*p
2 Throw it away
3 Go to 1
1 Dimensionality reduction
2 Column reduction
3 Row reduction
4 Rule reduction
And in the reduced data, inference is obvious.
23. IDEA = Iterative Dichotomization on Every Attribute
1 Dimensionality reduction (recursive fast PCA)
Fastmap (Faloutsos’95)
W = anything
X = furthest from W
Y = furthest from X
Takes time O(2N)
Let c = dist(X,Y)
If Z has distances a, b to X, Y then
Z projects to x = (a² + c² − b²) / (2c)
Platt’05: Fastmap = Nyström algorithm = fast & approximate PCA
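A small Python sketch of the FastMap-style projection just described, assuming only a generic distance function (illustrative, not the authors' code): two linear passes pick the distant pivots W, X, Y, then the cosine rule places every row on the X–Y line.

```python
import math

def fastmap_1d(rows, dist):
    """Project rows onto one FastMap axis.

    Two linear passes find distant pivots X and Y; the cosine rule then places
    each row Z at x = (a^2 + c^2 - b^2) / (2c), with a = dist(Z, X), b = dist(Z, Y).
    """
    w = rows[0]                                  # W = anything
    x = max(rows, key=lambda z: dist(w, z))      # X = furthest from W
    y = max(rows, key=lambda z: dist(x, z))      # Y = furthest from X
    c = dist(x, y) or 1e-12                      # guard against identical pivots
    return [(dist(z, x) ** 2 + c ** 2 - dist(z, y) ** 2) / (2 * c) for z in rows]

# Toy usage: Euclidean distance on 2-D points.
points = [(0, 0), (1, 0), (4, 3), (9, 1), (2, 2)]
print(fastmap_1d(points, math.dist))
```

Splitting at the median of that axis and recursing on each half is one way to obtain the recursive, PCA-like cluster tree that the later reduction steps operate on.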
28. IDEA = Iterative Dichotomization on Every Attribute
1 Dimensionality reduction (recursive fast PCA)
2 Column reduction (info gain)
Sort columns by their diversity
Keep columns that select for fewest clusters
e.g. nine rows in two clusters
cluster c1 has acap=2,3,3,3,3; pcap=3,3,4,5,5
cluster c2 has acap=2,2,2,3; pcap=3,4,4,5
p(acap = 2) = 0.44      p(acap = 3) = 0.55
p(pcap = 3) = p(pcap = 4) = p(pcap = 5) = 0.33
p(c1 | acap = 2) = 0.25   p(c2 | acap = 2) = 0.75
p(c1 | acap = 3) = 0.80   p(c2 | acap = 3) = 0.20
p(c1 | pcap = 3) = 0.67   p(c2 | pcap = 3) = 0.33
p(c1 | pcap = 4) = 0.33   p(c2 | pcap = 4) = 0.67
p(c1 | pcap = 5) = 0.67   p(c2 | pcap = 5) = 0.33
⇒ acap’s ranges select for single clusters more strongly than pcap’s, so acap is kept.
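The slides say only “info gain”, so the scoring below is an assumption: a standard entropy-based information gain of each column against the cluster labels, sketched in Python on the nine-row example above.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of cluster labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(column, clusters):
    """Drop in cluster-label entropy after splitting the rows on a column's values."""
    base, n, split = entropy(clusters), len(clusters), 0.0
    for value in set(column):
        subset = [c for v, c in zip(column, clusters) if v == value]
        split += len(subset) / n * entropy(subset)
    return base - split

# Nine rows, two clusters, from the slide's example.
clusters = ["c1"] * 5 + ["c2"] * 4
acap = [2, 3, 3, 3, 3,  2, 2, 2, 3]
pcap = [3, 3, 4, 5, 5,  3, 4, 4, 5]
print(info_gain(acap, clusters), ">", info_gain(pcap, clusters))  # acap is kept
```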
32. IDEA = Iterative Dichotomization on Every Attribute
1 Dimensionality reduction (recursive fast PCA)
2 Column reduction (info gain)
3 Row reduction (replace clusters with their mean)
Replace all leaf-cluster instances with their centroid
Each centroid is described using only the columns within 50% of the minimum diversity.
e.g. Nasa93 reduces to 12 columns and 13 centroids.
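A minimal sketch of that row reduction, assuming a centroid is simply the column-wise mean of a leaf cluster (hypothetical Python; the slides do not say how non-numeric columns or the 50%-diversity filter are applied here):

```python
def centroid(rows):
    """Column-wise mean of a leaf cluster (rows are equal-length numeric lists)."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def reduce_rows(leaf_clusters):
    """Replace every leaf cluster with a single centroid row."""
    return [centroid(rows) for rows in leaf_clusters if rows]

# Two toy leaf clusters collapse to two centroid rows.
leaves = [[[2, 3], [3, 3], [3, 4]], [[2, 4], [3, 5]]]
print(reduce_rows(leaves))   # [[2.67, 3.33], [2.5, 4.5]] (approximately)
```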
33. Nasa93 reduces to 12 columns and 13 centroids
35. IDEA = Iterative Dichotomization on Every Attribute
1 Dimensionality reduction (recursive fast PCA)
2 Column reduction (info gain)
3 Row reduction (replace clusters with their mean)
4 Rule reduction (contrast home vs neighbors)
Surprise: after steps 1, 2, 3...
further computation is superfluous:
visuals suffice for contrast-set generation.
36. Manual Construction of Contrast Sets
Table 5 = your “home” cluster
Table 6 = projects of similar size
Table 7 = a nearby project with fearsome effort
Contrast set = the delta on the last line
37. Why Cluster120?
Is it valid that cluster120 costs so much?
Yes, if building core services whose cost is amortized over N future apps.
No, if racing to get products to a competitive market.
We do not know, but at least we are focused on that issue.
39. Reductions on PROMISE data sets
Size of the reduced data sets:

data set          rows   columns
Albrecht             4         4
China               66        15
Cocomo81             8        18
Cocomo81e            4        16
Cocomo81o            4        16
Cocomo81s            2        16
Desharnais           8        19
Desharnais L1        6        10
Desharnais L2        4        10
Desharnais L3        2        10
Finnish              6         2
Kemerer              2         7
Miyazaki’94          6         3
Nasa93              13        12
Nasa93 center5       7        16
Nasa93 center1       2        15
Nasa93 center2       5        16
SDR                  4        21
Telcom1              2         1

[Scatterplot of the reduced data sets: rows (1–100, log scale) vs. columns (0–25).]
Q: throwing away too much?
42. Q: Throwing Away Too Much?
Estimates = class variable of the nearest centroid in the reduced space
Compare against 90 pre-processor*learner combinations from Kocagueneli et al.,
TSE 2011, On the Value of Ensemble Learning in Effort Estimation.
Performance measure = MRE = |predicted − actual| / actual
9 pre-processors:
1 norm: normalize numerics to 0..1, min..max
2 log: replace numerics of the non-class columns with their logarithms
3 PCA: replace non-class columns with principal components
4 SWReg: cull uninformative columns with stepwise regression
5 Width3bin: divide numerics into 3 bins of width (max−min)/3
6 Width5bin: divide numerics into 5 bins of width (max−min)/5
7 Freq3bins: split numerics into 3 equal-frequency (percentile) bins
8 Freq5bins: split numerics into 5 equal-frequency (percentile) bins
9 None: no pre-processor
10 learners:
1 1NN: simple one-nearest-neighbor
2 ABE0-1nn: analogy-based estimation using the nearest neighbor
3 ABE0-5nn: analogy-based estimation using the median of the five nearest neighbors
4 CART(yes): regression trees, with sub-tree post-pruning
5 CART(no): regression trees, no post-pruning
6 NNet: two-layered neural net
7 LReg: linear regression
8 PLSR: partial least squares regression
9 PCR: principal components regression
10 SWReg: stepwise regression
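For concreteness, a tiny Python sketch of that evaluation, under the assumptions above: estimate with the class (effort) value of the nearest reduced centroid, then score with MRE. The distance function and the toy numbers are illustrative only.

```python
def nearest_centroid_estimate(row, centroids, efforts, dist):
    """Return the effort (class value) of the centroid nearest to `row`."""
    best = min(range(len(centroids)), key=lambda i: dist(row, centroids[i]))
    return efforts[best]

def mre(predicted, actual):
    """Magnitude of relative error: |predicted - actual| / actual."""
    return abs(predicted - actual) / actual

# Toy example: two centroids in the reduced space with known efforts.
dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
centroids, efforts = [[2.7, 3.3], [4.5, 5.0]], [120.0, 480.0]
estimate = nearest_centroid_estimate([3.0, 3.0], centroids, efforts, dist)
print(estimate, mre(estimate, actual=150.0))   # 120.0 0.2
```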
44. Results
A perennial problem with assessing different effort estimation tools:
MRE is not normal: low valley, high hills (injects much variance).
IDEA’s predictions are neither better nor worse than the others, but it avoids all the hills.
46. Related Work
Cluster using (a) centrality (e.g. k-means);
(b) connectedness (e.g. DBSCAN); (c) separation (e.g. IDEA)
Who                          case-based   clustering                   feature selection   task
Shepperd (1997)                  √                                                         predict
Boley (1998)                                recursive PCA                                  predict
Bettenburg et al. (MSR’12)                  recursive regression                           predict
Posnett et al. (ASE’11)                     on file/package divisions                      predict
Menzies et al. (ASE’11)          √          FastMap                                        contrast
IDEA                             √          √                            √                 contrast
47. Back to the Sound bites
Less prediction, more decision
Data has shape
“Data mining” = “carving” out that shape
To reveal shape, remove irrelevancies
Cut the cr*p
IDEA = reduction operators: dimension, column, row, rule
Show, don’t code
Once you can see the shape, inference is superfluous.
Implications for other research.