Abstract: Machine learning models may be very powerful, but many data sets are only released in aggregated form, precluding their direct use. Various heuristics can be used to bridge the gap, but they are typically domain-specific. The data augmentation (DA) algorithm, a classic tool from Bayesian computation, can be applied more generally. We will present a brief review of DA and how to apply it to disaggregation problems. We will also discuss a case study on disaggregating daily pricing data, along with a reference R package implementation.
2. ● Many data sets are only available in aggregated form,
● precluding use of stock statistics / ML directly.
● Data augmentation, a classic tool from Bayesian computation,
can be applied more generally.
● Disaggregating across and within observations
Executive Summary
4. A Data Set
n Price
42 2.406
33 2.283
10 2.114
10 2.815
2 1.691
1 2.033
1 2.061
1 0.133
1 0.627
5. ● Like to use Price ~ LN(μ, σ²)
● Lognormal has a nice interpretation as a random walk of ±%
● Also won't go negative
● Common Alternatives: Exponential, Gamma
● Estimate both parameters for later use
● Actually, we want to do so for 10k items
Modeling Price
6. Log-normal Recap
● If Y ~ N(μ, σ²), X = exp(Y) ~ LN(μ, σ²)
● E(X) = exp(μ + σ²/2)
● Var(X) = [exp(σ²) - 1] exp(2μ + σ²)
● Standard estimators:
● MoM - uses log of mean of X
● MLE - uses mean of log X
7. Log-normal Recap
● Method of Moments
● s² = ln(Σ X² / N) - 2 ln(Σ X / N)
● m = ln(Σ X / N) - s²/2
● Maximum Likelihood
● m = Σ ln X / N
● s² = Σ (ln X - m)² / N
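Both estimators are a few lines of R on raw (unaggregated) prices. A minimal sketch; the function names are mine, not from the reference package:

```r
# Method of moments for LN(m, s2): uses the log of the mean of X
lnorm_mom <- function(x) {
  N  <- length(x)
  s2 <- log(sum(x^2) / N) - 2 * log(sum(x) / N)
  m  <- log(sum(x) / N) - s2 / 2
  c(m = m, s2 = s2)
}

# Maximum likelihood for LN(m, s2): uses the mean of log(X)
lnorm_mle <- function(x) {
  m  <- mean(log(x))
  s2 <- mean((log(x) - m)^2)
  c(m = m, s2 = s2)
}
```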
8. Estimation v0.1
What if we just ignore n and plug hourly averages into our estimators?
=> Gives equal weight to (n=1, $=0.133) as (n=42, $=2.406)
=> Everything is biased towards the small observations
9. Estimation v0.2
What if we just plug in weighted sample averages?
● Method of Moments:
● m = 0.342, s² = 0.996
● Expected Value: exp(0.342 + 0.996/2) = 2.32
● Maximum Likelihood:
● m = 0.811, s² = 0.105
● Expected Value: exp(0.811 + 0.105/2) = 2.37
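The weighted MLE plug-in is easy to reproduce in R: treat each group mean as if it were n identical prices. A sketch on the example data (the MoM version is analogous; small differences come from rounding of the displayed inputs):

```r
n     <- c(42, 33, 10, 10, 2, 1, 1, 1, 1)
price <- c(2.406, 2.283, 2.114, 2.815, 1.691, 2.033, 2.061, 0.133, 0.627)

# Weighted MLE plug-in: pretend each group mean is n identical prices
m  <- sum(n * log(price)) / sum(n)           # ~0.811
s2 <- sum(n * (log(price) - m)^2) / sum(n)   # ~0.105
exp(m + s2 / 2)                              # ~2.37
```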
11. Are these trustworthy?
To check if these make sense:
● Simulate from both estimates as ground truth
● Apply both estimators
● Inspect bias
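A minimal version of this check in R, shown for the weighted MLE plug-in with slide 9's estimates taken as the assumed truth (the MoM variant is analogous):

```r
set.seed(1)
n <- c(42, 33, 10, 10, 2, 1, 1, 1, 1)

# Generate aggregated data from assumed-true parameters, re-estimate with the
# weighted plug-in, and average the error over many replications
simulate_once <- function(mu, s2) {
  price  <- sapply(n, function(k) mean(rlnorm(k, mu, sqrt(s2))))
  m_hat  <- sum(n * log(price)) / sum(n)
  s2_hat <- sum(n * (log(price) - m_hat)^2) / sum(n)
  c(m = m_hat, s2 = s2_hat)
}
reps <- t(replicate(2000, simulate_once(0.811, 0.105)))
colMeans(reps) - c(0.811, 0.105)   # bias of the plug-in estimator
```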
13. Why are these not working?
● Many distributions are additive:
● N(0,1) + N(1,1) => N(1,2)
● Pois(4) + Pois(5) => Pois(9)
● Log-normal is not!
● So (n=42, $=2.406) is not LN, even if the individual prices are
● It is in fact a marginal distribution
● containing 41 integrals :(
● What about the CLT?
● Even if (n=42) is approx N, (n=10) and (n=2) are probably not
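A quick way to see this in R (parameters here are illustrative): if the mean of k lognormal prices were itself lognormal, its log would be normal, which a QQ plot contradicts for small k.

```r
set.seed(1)
# Mean of k = 2 lognormal prices; if it were LN, log(mean) would be normal
x_bar <- replicate(1e4, mean(rlnorm(2, meanlog = 0.8, sdlog = 1)))
qqnorm(log(x_bar)); qqline(log(x_bar))   # tails bend away from the line
```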
14. A Data Set
n Price
42 2.406
33 2.283
10 2.114
10 2.815
2 1.691
1 2.033
1 2.061
1 0.133
1 0.627
15. Part 1
Main Points
Violate iid at your own risk!
● Do NOT plug and chug
● Do NOT expect weights to fix your problem
● Do NOT use predictive models
● Do NOT use multi-armed bandits
Get better, unaggregated data!
… but if you can't ...
19. Estimation
● MCMC using stock methods, e.g. Metropolis-Hastings
● MH requires:
● Target distribution / probability model
● State transition functions / proposal distributions
● MH outputs:
● Numerical samples from the target distribution
20. Proposal Distribution
● Transitions on m and s² - easy
● Transitions on the missing prices?
● hourly constraints on total $
● Don't want to propose out-of-bounds
● Option 1 - draw from a Dirichlet,
● use that to disaggregate, transition whole hours at once
● Big steps => lots of rejections
● Option 2 - pairwise transitions within a group (see the sketch below)
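A sketch of Option 2 in R under the LN(m, s2) model (my sketch, not the reference implementation). The proposal shifts money between two prices in the same group, so the group total (and hence the observed average) is preserved; because it is symmetric, the MH ratio reduces to the likelihood ratio of the two changed prices.

```r
# One pairwise transition within a group of latent prices
pairwise_step <- function(prices, m, s2, step = 0.1) {
  i <- sample(length(prices), 2)          # pick two prices in the group
  delta <- runif(1, -step, step)
  prop <- prices
  prop[i[1]] <- prop[i[1]] + delta        # shift money between the pair;
  prop[i[2]] <- prop[i[2]] - delta        # the group total is unchanged
  if (any(prop[i] <= 0)) return(prices)   # out of bounds => reject
  log_r <- sum(dlnorm(prop[i],   m, sqrt(s2), log = TRUE)) -
           sum(dlnorm(prices[i], m, sqrt(s2), log = TRUE))
  if (log(runif(1)) < log_r) prop else prices
}

# e.g. initialize the n = 42 group at its observed mean and take one step
prices <- pairwise_step(rep(2.406, 42), m = 0.811, s2 = 0.105)
```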
23. Part 2
Main Points
Switching from aggregated to long format shows that
aggregation can be thought of as a form of missing data.
However, group averages => constraints on the missing data.
In our example data, 97/101 points are missing,
but we can still get reasonable estimates via MCMC.
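The aggregated-to-long conversion is mechanical; in R (column names are mine):

```r
n     <- c(42, 33, 10, 10, 2, 1, 1, 1, 1)
price <- c(2.406, 2.283, 2.114, 2.815, 1.691, 2.033, 2.061, 0.133, 0.627)

# One row per underlying price; only the n = 1 groups are fully observed
long <- data.frame(
  group = rep(seq_along(n), times = n),
  price = ifelse(rep(n, n) == 1, rep(price, n), NA)
)
sum(is.na(long$price))   # 97 of 101 prices are missing
```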
25. A Data Set
n Price
42 2.406
33 2.283
10 2.114
10 2.815
2 1.691
1 2.033
1 2.061
1 0.133
1 0.627
26. Additional Challenges
What if aggregation is over multiple heterogeneous groups, and we need
to split the money between the groups ("disaggregate")?
Do we know the split a priori?
What if we don't?
27. A Grouped Data Set
Known Groups
Desktop Mobile Price
38 4 2.406
27 6 2.283
2 8 2.114
6 4 2.815
0 2 1.691
0 1 2.033
1 0 2.061
1 0 0.133
0 1 0.627
28. Common Heuristics
● Linear disaggregation (sketched below)
● Weighted averages by another name
● Doesn't account for variation in other columns
● Iterative Proportional Fitting
● If you have subtotals in all dimensions
● Alternates disaggregating by rows/columns
Desktop Mobile Price
38 4 2.406
27 6 2.283
2 8 2.114
6 4 2.815
0 2 1.691
0 1 2.033
1 0 2.061
1 0 0.133
0 1 0.627
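For concreteness, linear disaggregation on the table above is just a proportional split. A sketch of the heuristic in R, assuming desktop and mobile prices are equal within an hour:

```r
desktop <- c(38, 27, 2, 6, 0, 0, 1, 1, 0)
mobile  <- c( 4,  6, 8, 4, 2, 1, 0, 0, 1)
price   <- c(2.406, 2.283, 2.114, 2.815, 1.691, 2.033, 2.061, 0.133, 0.627)

# Split each hour's revenue in proportion to counts
total       <- (desktop + mobile) * price
desktop_rev <- total * desktop / (desktop + mobile)
mobile_rev  <- total - desktop_rev
```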
30. A Grouped Data Set
Unknown Groups
n Prime Sub Price
42 ? ? 2.406
33 ? ? 2.283
10 ? ? 2.114
10 ? ? 2.815
2 ? ? 1.691
1 ? ? 2.033
1 ? ? 2.061
1 ? ? 0.133
1 ? ? 0.627
31. A Grouped Data Set
Unknown Groups
n Prime Sub Price
42 30 12 2.406
33 23 9 2.283
10 7 3 2.114
10 8 2 2.815
2 2 0 1.691
1 1 0 2.033
1 1 0 2.061
1 0 1 0.133
1 0 1 0.627
33. Part 3
Main Points
By extending the previous model, we can deal with
"heterogeneous aggregates".
If the grouping variable is known, solve it like a regression problem.
If it is not known / latent, solve it like a mixture problem (see the sketch below).
Either way, going Bayes lets you borrow strength between aggregates,
which disaggregation heuristics are not good at.
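In the latent-group case, one additional MH move type suffices in principle: reassign a single latent price between groups and accept by the likelihood ratio. A hypothetical sketch (names are mine; equal mixing weights assumed for brevity, otherwise the ratio would also include the mixing proportions):

```r
# z[i] in {1, 2} is the latent group of price i; pars holds per-group LN params
flip_label <- function(z, prices, pars) {
  i <- sample(length(z), 1)
  z_new <- z
  z_new[i] <- 3 - z[i]                    # toggle group 1 <-> 2
  log_r <- dlnorm(prices[i], pars$m[z_new[i]], sqrt(pars$s2[z_new[i]]), log = TRUE) -
           dlnorm(prices[i], pars$m[z[i]],     sqrt(pars$s2[z[i]]),     log = TRUE)
  if (log(runif(1)) < log_r) z_new else z
}
```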