This document summarizes a presentation on using statistical methods to improve forecasts of the Estimate at Completion (EAC) for programs using earned value management data. It discusses how current EAC calculations do not fully utilize past performance data and risks, and are therefore not statistically sound. The presentation proposes two solutions using predictive analytics techniques on the Earned Value Central Repository database to generate more accurate EAC forecasts that account for historical trends and risks. This would help program managers better anticipate and address future cost and schedule growth issues.
Forecasting Program Growth and EAC Using Earned Value Data
1. Thanks for attending our session today. We're going to take a quick tour through a touchy subject – unanticipated growth in EAC. We all know about this, read about it, and most likely work on programs that are in that condition, if not in worse conditions, like OTB or OTS. We'll present two solutions to the forecasting of EAC that address the core problems with today's approach:
• The EAC is not statistically sound
• Risk is not included in the EAC
• Compliance with Technical Performance Measures is not considered in the EAC calculation.
These solutions make use of existing data in the Earned Value Central Repository of the DOD, using tools available for free.
2. We all know of troubled programs. Programs that are OTB. Programs that are OTS. Programs that have failed to deliver their expected value on time and on budget. The literature on Nunn-McCurdy has detailed the root causes of many of these issues. But even if the program didn't go Nunn-McCurdy, the same root causes are likely in place. The Earned Value data from the program can't address the technical aspects of program performance. EV data is a secondary indicator of technical performance shortfalls. But EV data can provide an indicator of future EAC growth. This presentation will speak to mathematical methods of mining the data in the EV Central Repository (EV-CR), in an attempt to construct a statistically sound EAC in support of forecasting future growth.
3. We've all seen these pictures of unanticipated growth. I say unanticipated, because if we know cost and schedule growth is coming we can do something about it. In the current approach to performance analysis, much of the growth is unanticipated for a simple reason:
• The data in the EV-CR is used as Descriptive data. That is, it is analyzed from the point of view of past performance.
• Of course there are EAC calculations. But these calculations use the EV data in ways that wipe out past variances and use only current period data to make a forecast of future performance.
• This is statistically unsound at best, and naïve use of the data at worst.
This is strong language but it is mathematically true. Time series forecasting has been around a long time. Every high school statistics class has a section on time series forecasting. Every biology, chemistry, and physics class does as well. Social sciences, marketing, sales, ecology, sports coaching – nearly every topic has some understanding of time series forecasting. But the EV-CR and the reports use a formula that is missing the past statistical behaviors, the past risks, the future risks, and the time evolution of the underlying statistical processes driving the program's behavior.
4. With data held in the EV-CR we have an opportunity to change how we analyze the program's performance using statistically sound processes. This is currently called BIG DATA in commercial, scientific, and mathematical domains. There are three types of data analysis processes in the BIG DATA world:
1. Descriptive – which is what we do when we are looking at the IPMR
2. Predictive – which is the EAC calculations. These are of course naïve predictions for the reasons mentioned before
3. Prescriptive – which is where we want to get to eventually
The descriptive and current predictive forecasts also fail in one important way. With the data they don't tell the Program Manager what to do about the upcoming unfavorable outcomes. Where to look, how to fix them. How to conduct what-if assessments of the program, given the past performance. In other words – nice report, what do you expect me to do about it?
5. Let's look at the current descriptive analytics we get from the EV-CR. We get lots of data. Many would say too much data. But in the BIG DATA paradigm we want more data. The more data we have, the better chance we have of finding what we're looking for. This is counterintuitive for the non-mathematical among us. But it is in fact true. This is the basis of all BIG DATA initiatives, from Google, to Safeway, to the science and medical industries.
6. The Defense Acquisition University Gold Card lays out the formulas for computing the Estimate At Completion. These formulas are:
• Linear – addition and multiplication of EV variables
• Non-statistical – use of cumulative values wipes out the variance information from past performance
• Non-risk adjusted – no forward impacts on performance from risk are used
• Assume stationary behaviors – as the program moves from left to right, the underlying statistical processes are likely to change in their behavior.
This means the EAC does not address:
• Future risks to performance
• The non-stationary behavior of the underlying statistical processes that drive variance
• The non-stationary behavior of the risk probability distribution functions
• The coupling between work elements and deliverables not visible in the WBS and only visible in the physical system architecture, usually contained in the CAD system
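As a point of reference, here is a minimal sketch of the kind of linear, cumulative calculation being described (the numbers are made up and this is not the Gold Card itself), showing how the month-to-month variation in CPI disappears once everything is rolled into cumulative totals:

    # Hypothetical monthly values for one WBS element (illustrative only)
    bcwp <- c(100, 120,  90, 130, 110, 105)   # earned value by month
    acwp <- c(105, 130, 100, 150, 125, 115)   # actual cost by month
    bac  <- 1500                              # budget at completion

    cpi_monthly <- bcwp / acwp                # varies month to month
    cpi_cum     <- sum(bcwp) / sum(acwp)      # one point value; the variance is gone

    # The familiar cumulative-CPI style of EAC uses only that single index
    eac <- sum(acwp) + (bac - sum(bcwp)) / cpi_cum

A forecast built this way carries no information about how widely the monthly CPI has been swinging, which is exactly the objection raised above.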
7. We've skipped over the predictive analytics for now and moved to the prescriptive analytics. Prescriptive is what we want. Improved predictive we'll come back to. Without prescriptive analytics, we may know what is going to happen but have no way of doing anything about it in an analysis-of-alternatives way.
8. With prescriptive analytics, we can assess our alternatives for taking corrective actions. The milestone picture is here for effect. When we hear about assessing performance with milestones, we need to think of what the milestone actually represents – both now and in the Roman Empire. Milestones were rocks on the side of the road that showed the distance back to Rome. You only knew the distance back to Rome when you passed the milestone and looked back. We can't really manage the program to success using milestones, because we don't know we're late until we've passed the milestone. We need better forecasting of future performance. We have the data on a per-period basis in the EV-CR. We need to use it to forecast future performance in a statistically sound manner.
9. With the data submitted monthly to the EV-CR, using the WBS as the primary key – Format 1 – we now have the ability to start doing statistical time series forecasting in ways not available in the past. But the current static reporting processes – essentially looking at the contents of Format 1 with a viewer – offer little insight into the statistical nature of the program's performance. It is this statistical behavior that we need to know about. This comes from a fundamental principle of all project work. The variables of projects – cost, schedule, technical performance – are random variables. Some can be controlled, some cannot. But all the variables are generated by underlying stochastic processes. Some of these processes are stationary – they don't change with time. Some are non-stationary – they change with time. This information – for the most part – is currently in the EV-CR. We need to get at it and apply our tools to reveal information we currently don't have access to.
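As a small sketch of what "getting at it" might look like, assume the monthly CPI history for one WBS element has already been pulled from the EV-CR into an R vector called cpi (the name and the extraction step are assumptions here). A first look at whether the underlying process is stationary can be as simple as differencing the series, which is the step the "I" in ARIMA automates:

    cpi <- ts(cpi, frequency = 12)   # treat the values as a monthly series

    plot(cpi)            # a drifting mean suggests a non-stationary process
    plot(diff(cpi))      # first differences often remove the drift
    acf(diff(cpi))       # autocorrelation of the differenced series guides the ARIMA order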
10. This picture shows what we all know. The probability of being hit by a hurricane on the east coast of North America depends on where you live. The probability is not uniform. The statistical processes that drive the creation of storms in the Atlantic are actually well understood. But the modeling of the motion of the storms requires that they start on a path and have some past performance before an estimate of where they will strike land can be developed. In the same way, our program performance forecasting requires we have some past performance of the work processes, labor utilization, technical performance, and other variables before we can start making forecasts of where the project is going. While forecasting hurricanes is a complex process, forecasting where an indicator like the Cost Performance Index is going is relatively straightforward, given the past performance of the indicator. For the moment we'll make some simplifying assumptions to show how this can be done, using the data we already have in the EV-CR.
11. Let's remind ourselves again of what we're working with. The EV-CR contains data reported at the end of the month. The CPI and SPI are calculated from this data. The prior months are cumulated, and the current month is used to calculate our Estimate At Completion. In fact the system that generates these numbers is a non-stationary stochastic process. This system is generating random numbers from the underlying statistical processes. And we're assuming they are NOT random numbers drawn from an underlying probability distribution, but are "accounting" numbers: point values with no attached variance.
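To make the cumulation point concrete, here is a minimal sketch (the data frame df and the column names bcws, bcwp, and acwp are hypothetical stand-ins for the Format 1 fields) contrasting the period-by-period indices with the cumulative ones the EAC is normally built from:

    # df is assumed to hold the monthly Format 1 data for one WBS element
    cpi_period <- df$bcwp / df$acwp                # keeps the month-to-month variation
    spi_period <- df$bcwp / df$bcws

    cpi_cum <- cumsum(df$bcwp) / cumsum(df$acwp)   # the cumulative index used in the EAC
    spi_cum <- cumsum(df$bcwp) / cumsum(df$bcws)

The period-by-period series is the raw material for the statistical treatment described in the next slides; the cumulative series is the point value with no attached variance.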
12. With the EV-CR, we need a simple, inexpensive tool to start our statistical assessment. The R programming language is that solution. It provides the needed statistical analysis tools, including the ARIMA and PCA functions. Starting with the time series from the EV-CR, we can forecast future values of CPI and SPI, given the monthly EV numbers. R is well known in many other domains, but is just starting in the DOD community. There are lots of training materials, books, working code, and user groups through "Meet Up," in nearly every major city around the world.
13. With a sample CPI time series from an actual program, here's an example of the 4 lines of R needed to produce a forecast. The heavy lifting of this approach starts with credible time series data from the program. Then comes formatting that data into a raw structure usable by R. From that point on, it is literally as simple as the four lines of R in this example. Of course knowledge of ARIMA, and the details of setting up the parameters, is necessary, so we're not glossing over that. In our paper, we speak to some of the details of time series analysis and the related Principal Components Analysis, but for now – in our limited time – we'll assume all of this is understood and applicable to your own database of program performance data.
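The slide's exact four lines aren't reproduced here, but a minimal equivalent sketch using the widely available forecast package looks like the following (the package choice, the file name, and the column name are assumptions, not taken from the slide):

    library(forecast)                                    # provides auto.arima() and forecast()
    cpi <- ts(read.csv("cpi.csv")$cpi, frequency = 12)   # hypothetical monthly CPI extract
    fit <- auto.arima(cpi)                               # selects the ARIMA(p,d,q) order from the data
    plot(forecast(fit, h = 12))                          # 12-month-ahead forecast with prediction intervals

Choosing the order by hand with arima() and checking the residual diagnostics is the part that still takes judgment, which is the point above about not glossing over the parameter set-up.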
14. Let's take a short diversion. There is a dark secret of Earned Value. The units of measure of Earned Value Management are dollars. Not time, not business value, not technical performance. No other measure than dollars. And these dollars are budget dollars, not funding dollars. The second dark secret is that the IPMR reports wipe out the past statistical variances of the variables through the cumulative collection of the past. This actually prevents a credible forecast of EAC, since there is no past performance at the detailed level to base a forecasting algorithm on. In general the non-statistical nature of the current EAC calculations lays the groundwork for unanticipated EAC growth. This needs to be fixed on both the contractor side as well as the government side. Since the data of all projects is actually a non-stationary stochastic process model – each work activity has aleatory uncertainty driving its actual duration – the stochastic nature of the program is always present. The next secret is that the EV reports have no representation of the correlative effects of the work activities. We all know one late activity drives others to be late, but the static reports have no way to represent the connectivity of the work activities in a stochastic network of activities. The last secret is that none of the forecasts consider the future risks to performance.
15. ARIMA has been around for a long time and has been applied to forecasting time series for the same long time. Up to this point the data for EV has not been available in electronic form. With the EV-CR, the data can be used to forecast the future using ARIMA. The XML data stream provides this. Of course this data has to be well formed, and that is still an attribute that needs to be confirmed. But for the moment, let's assume it is. Then pointing the R tool at the EV-CR can reveal a statistically sound forecast of EAC at any level of detail needed. We have to remember Darrell Huff's How to Lie With Statistics statement about hiding the variance by aggregating those variances to the top. Analyzing the IPMR at the lowest level of the WBS (Format 1) is a daunting task. So let's hire a computer to do that for us. The current viewers provide a view to the data, but it is still more of the same. No predictive analysis to show the drivers of the variance. And no prescriptive analytics to show what to do about the variance.
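As a sketch of "hiring a computer to do it", assume the EV-CR extract has already been flattened into a data frame called ev with hypothetical columns wbs, period, and cpi (both the flattening step and the names are assumptions). The same forecast can then be repeated over every WBS element rather than only at the top:

    library(forecast)                       # provides auto.arima() and forecast()

    # one ARIMA forecast per WBS element, six months ahead
    by_wbs <- split(ev, ev$wbs)
    forecasts <- lapply(by_wbs, function(x) {
      series <- ts(x$cpi[order(x$period)], frequency = 12)
      forecast(auto.arima(series), h = 6)
    })

Elements whose forecast intervals sit well below 1.0 are candidates for the "drivers of the variance" that a static viewer of Format 1 will not surface.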
16. So what do we need? We need more power, Scotty. It's that simple and it's that complex. The data is available; we need to access it. The data has to be well formed, and we need the tools to make use of that data. The current approach, as mentioned before, does not address the underlying statistical nature of the program's performance, adjust any future forecast for risk or for past statistical behavior, or consider any future statistical behavior in calculating the EAC. As a quick sidebar, the underlying statistical processes of the program change as the program moves from left to right in time. This creates a non-stationary impact on the underlying processes. This non-stationary stochastic process is further complicated by the coupling of these processes with each other in the network of activities.
17. Our next step beyond ARIMA analysis of the time series of CPI and SPI from the program is the use of Principal Component Analysis. This technique takes in a large data set of attributes from the program and reveals which of them are the source of the largest variance – that is, which are the principal components of this variance. It is an exploratory technique which specifies a linear factor structure between variables, and is especially useful when the data under consideration are correlated. If the underlying data are uncorrelated then PCA has little utility. In our case we make the assumption that all the variables on the program are correlated in some way, since they are physically connected through the topology of the products being built, and logically connected through the network of activities in the Integrated Master Schedule. PCA analyzes a data table representing observations described by several dependent variables, which are, in general, inter-correlated. Its goal is to extract the important information from the data table and to express this information as a set of new orthogonal variables called principal components. Once the Principal Components have been discovered they have the following properties:
• Each factor accounts for as much variation in the underlying data as possible.
• Each factor is uncorrelated with every other factor.
• Principal components elucidate the dominant combinations of variables within the covariance structure of the data.
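The properties just listed can be seen directly with R's built-in prcomp function. The toy data here are made up purely to show the mechanics (two correlated variables plus noise), not drawn from any program:

    set.seed(1)
    x   <- rnorm(60)
    toy <- data.frame(a = x, b = 2 * x + rnorm(60, sd = 0.3), c = rnorm(60))

    fit <- prcomp(toy, scale. = TRUE)   # scale so no variable dominates by its units
    summary(fit)                        # proportion of variance captured by each component
    round(cor(fit$x), 2)                # the component scores are uncorrelated
    fit$rotation                        # loadings: which variables drive each component

Because a and b are strongly correlated, the first component captures most of the variance and loads heavily on both of them, while c is left to a later component.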
18. For PCA to have any value, we need to augment the EV-CR data – CPI/SPI by WBS element as a time series – with other program data for the same WBS elements. This starts with the data produced from the Systems Engineering Management Plan (SEMP): Measures of Effectiveness, Measures of Performance, Technical Performance Measures, Key Performance Parameters, risk retirement or buy-down time series, and other attributes of the program that are assessed at the same time as the Earned Value data. This creates a correlated set of information that is the raw data for the PCA process. It's beyond this presentation, and even our paper, to delve into PCA, but the PCA process is well developed in the literature. The R programming system has PCA functions built in, and with the EV-CR data sets and the augmented data available from the program – but not in the EV-CR – we can start to ask questions about the principal contributors to the variance in the EAC in ways not possible with the EV-CR data alone, or the other assessment data alone. The goals of PCA (again) are:
1. Extract the most important information from the data table;
2. Compress the size of the data set by keeping only this important information;
3. Simplify the description of the data set; and
4. Analyze the structure of the observations and the variables.
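A sketch of what that augmented table might look like, and how it feeds PCA, follows. Every column name is a hypothetical stand-in (the real attribute set comes from the program's SEMP and risk register), and the values here are random placeholders; the point is only that every row is the same WBS element and reporting period across all sources:

    # Hypothetical augmented data: one row per WBS element per month.
    n   <- 24                                       # two years of monthly assessments
    aug <- data.frame(
      cpi        = rnorm(n, mean = 0.95, sd = 0.05),  # from the EV-CR
      spi        = rnorm(n, mean = 0.97, sd = 0.04),  # from the EV-CR
      tpm_var    = rnorm(n),                          # TPM actual minus plan
      risk_var   = rnorm(n),                          # risk retirement actual minus plan
      margin_var = rnorm(n)                           # margin burn-down actual minus plan
    )

    fit <- prcomp(aug, scale. = TRUE)               # scale: the columns have different units
    summary(fit)
    fit$rotation[, 1:2]                             # which attributes load on the largest variance

On real program data, the loadings on the first one or two components are the "principal contributors to the variance in the EAC" referred to above; the EV-CR columns alone cannot show whether a TPM shortfall or a risk burn-down slip is travelling with the CPI variance.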
19. Here's a notional example of a PCA process on two-dimensional data. Our data sets will have 8 or 9 dimensions. But the number of dimensions is irrelevant; in fact the more dimensions the better. What this example shows is the principal component of the data set – the one with the most variance – displayed in a simple graphical form. When we add more dimensions, the result is no longer a two-dimensional PCA plot, but a bar graph showing the higher dimensions, where the dimensions are laid out linearly. The result, no matter the number of dimensions, is a reduction of all the data to the few Principal Components that represent the high variations in the data set.
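In R, that bar-graph view of the higher-dimensional case is the standard scree plot; continuing from the prcomp fit sketched above (still placeholder data, not program data):

    screeplot(fit, type = "barplot")                        # variance of each principal component as bars
    summary(fit)$importance["Proportion of Variance", ]     # the same information as numbers

Reading the bars left to right shows how few components are needed before the remaining ones contribute little additional variance.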
20. Here's a sample list of the components in a program. This is not all the components of a program, but these are ones we are familiar with.
• CPI/SPI are in the EV-CR
• Technical Performance Measures – have a time series of values with an upper control limit and a lower control limit, or an outside-the-bounds assessment, or some other comparison of actual to plan.
• The risk retirement buy-down assessment is similar to the TPMs. We have a planned retirement value and an actual assessment of the risk at each point in time, to create a variance between planned and actual.
• A similar time series is the cost or schedule margin burn-down: a comparison between the planned margin and the actual margin as a function of time.
21. In the short time we've had, we covered a lot of ground. Likely a two-semester university course on predictive analytics using Big Data. But here's our call to action for the next steps. It's hopefully obvious what each of these steps is.
1. The cleaning of the time series data in the EV-CR in preparation for further analysis. Without good data, no algorithm is going to help us. Currently the EV-CR has an opportunity for improvement. This is a normal startup process.
2. With good data – normalized using MIL-STD-881C, for example – the ARIMA tools can be pointed at its contents and we can get outcomes with ease.