1. 1/26
Factored MDPs for Optimal Prosumer
Decision-Making
Angelos Angelidakis
aggelos@intelligence.tuc.gr
Georgios Chalkiadakis
gehalk@intelligence.tuc.gr
School of Electronic and Computer Engineering
Technical University of Crete
Angelos Angelidakis & Georgios Chalkiadakis Factored MDPs for Optimal Prosumer Decision-Making
2. 2/26
Outline
1 Introduction
2 Background
3 Our Model
4 Solving the Factored MDP
5 Prosumer Production and Consumption Models
6 Experiments and Results
3. 3/26
Prosumer
Produces and consumes energy
Single residence, an industry, a neighbourhood
Connected to the electricity grid (or not)
Plays a key role in stabilizing the electricity network
4. 4/26
What we do in this paper
Focus on micro-grid prosumers:
– encompassing, e.g., wind turbine generators (WTG), photovoltaic systems (PVS), batteries and household neighbourhoods
Optimize prosumer operation decisions:
– buy and sell energy from/to utility companies
– store energy
– select electricity tariffs to subscribe to
while ensuring consumer needs are satisfied
5. 5/26
Key concepts and contributions
A complete framework for microgrid–prosumer decision making:
A Factored Markov Decision Process to model the
prosumer decision problem
– 24 hours ahead
Exact optimal solution, works for a microgrid of any size
Consumption and production-predicting submodels
Tested on a real-world dataset
Comparison with SPUDD
– a robust method for stochastic planning in large environments
7. 7/26
Stochastic Planning Using Decision Diagrams
(SPUDD)
finds (near-)optimal policies in very large problems
combines value iteration with algebraic decision diagrams
In our problem, SPUDD:
produces policies that coincide with ours
but cannot solve the problem within the required 24 hours
– operates over an input script which can grow large
8. 8/26
Outline
1 Introduction
2 Background
3 Our Model
FMDPs
Factored Representation
Physical Constraints
Transition Function
Factored Reward Representation
4 Solving the Factored MDP
5 Prosumer Production and Consumption Models
6 Experiments and Results
9. 9/26
Factored Markov Decision Processes (FMDPs)
A compact alternative to the standard MDP representation
The set of states corresponds to a multivariate random variable, $s = \{s_i\}$, with each $s_i \in \mathrm{DOM}(s_i)$
Reward functions are assumed to factor into specific components
FMDPs allow external signals to affect state variables
Various solution methods exist [1], e.g.:
– linear value functions
– approximate linear programming
– SPUDD
[1]
– Guestrin, Carlos, et al. "Efficient solution algorithms for factored MDPs." Journal of Artificial Intelligence Research, 2003.
– Hoey, Jesse, et al. "SPUDD: Stochastic planning using decision diagrams." Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, 1999.
10. 10/26
A Factored Representation of our model
States
Hour of day, DOM(tms): {1, ..., 24}
Energy stored in the batteries, DOM(bat): {0, ..., Battery_max}
Tariff the prosumer is subscribed to, DOM(tf): {tf_1, ..., tf_K}
Actions
Buy energy, DOM(buy): {−RES_nom, ..., Load_max}
Charge batteries, DOM(chg): {−Battery_max, ..., Battery_max}
Select tariff, DOM(sel_tf): {0, ..., K}
External Signals
Available price tariffs
– buying and selling prices provided by multiple utility companies, for each hour of the day
Predicted production, DOM(prod): {0, ..., RES_nom}
Predicted consumption, DOM(cons): {0, ..., Load_max}
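As a rough illustration of the factored state space above, the sketch below enumerates all (tms, bat, tf) combinations. It assumes battery levels are discretized into integer units and tariffs are indexed 1..K; the names `State` and `enumerate_states` are hypothetical and do not come from the slides.

```python
from itertools import product
from typing import List, NamedTuple


class State(NamedTuple):
    tms: int  # hour of day, 1..24
    bat: int  # stored energy, 0..battery_max (integer units are an assumption)
    tf: int   # index of the subscribed tariff, 1..K


def enumerate_states(battery_max: int, num_tariffs: int) -> List[State]:
    """List every joint assignment of the three state variables."""
    hours = range(1, 25)
    levels = range(battery_max + 1)
    tariffs = range(1, num_tariffs + 1)
    return [State(t, b, k) for t, b, k in product(hours, levels, tariffs)]


states = enumerate_states(battery_max=10, num_tariffs=3)
print(len(states))  # 24 hours * 11 battery levels * 3 tariffs
```

The state count is simply the product of the individual domain sizes, which is why a finer battery discretization or more tariffs enlarges the problem quickly.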
11. 11/26
Physical Constraints
The electrical energy balance must be maintained:
$prod_t - cons_t - chg_t + buy_t = 0$
The storage unit cannot be charged beyond its capacity:
$chg_t \le Battery_{max} - bat_t$
The energy discharged cannot exceed the quantity currently stored:
$-chg_t \le bat_t$
The state of charge must stay between 20% and 100% [2]:
$0.2 \le \frac{bat_t}{Battery_{max}} \le 1$
[2] Chiasson, John, and Baskar Vairamohan. "Estimating the state of charge of a battery." IEEE Transactions on Control Systems Technology, 2005.
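The constraints on this slide can be checked mechanically. The following is a minimal sketch, not the authors' implementation: `implied_buy` rearranges the energy balance to obtain the purchase that a charging decision forces, and `is_feasible` tests the remaining inequalities. Both function names are hypothetical, and applying the state-of-charge bound to the battery level after the action is an assumption of this sketch (the slide states the bound on $bat_t$).

```python
def implied_buy(prod: float, cons: float, chg: float) -> float:
    """Solve prod - cons - chg + buy = 0 for buy (negative means selling)."""
    return cons + chg - prod


def is_feasible(prod: float, cons: float, chg: float, buy: float,
                bat: float, battery_max: float,
                min_soc: float = 0.2, tol: float = 1e-9) -> bool:
    """Check the physical constraints for one time step."""
    balance_ok = abs(prod - cons - chg + buy) <= tol
    capacity_ok = chg <= battery_max - bat   # cannot overcharge
    discharge_ok = -chg <= bat               # cannot discharge more than stored
    # Assumption: the 20%..100% bound is enforced on the resulting level.
    soc_after = (bat + chg) / battery_max
    soc_ok = min_soc <= soc_after <= 1.0
    return balance_ok and capacity_ok and discharge_ok and soc_ok


buy = implied_buy(prod=5.0, cons=8.0, chg=2.0)
print(buy, is_feasible(5.0, 8.0, 2.0, buy, bat=4.0, battery_max=10.0))
```

Because the balance equation fixes `buy` once `chg` is chosen, a solver only needs to search over the charging and tariff-selection actions.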
12. 12/26
Transition Function
Stochastic state transitions in our model:
– successful charge (store c) with probability p:
$\Pr(bat_{t+1} = bat_t + c \mid chg_t = c, bat_t) = p$
– unsuccessful charge (store c) with total probability 1 − p, split evenly over the N battery levels in the bounded region $bound_{bat}$:
$\Pr(bat_{t+1} = bat \mid chg_t = c, bat_t) = \frac{1 - p}{N}$ for each $bat \in bound_{bat}$
– the tariff is determined by the tariff selection action $sel_{tf} \in \{sel_{tf_1}, \dots, sel_{tf_K}\}$
Overall transition probability:
$\Pr(tms_{t+1}, bat_{t+1}, tf_{t+1} \mid tms_t, bat_t, tf_t, chg_t, sel_{tf,t}) = \Pr(bat_{t+1} \mid bat_t, chg_t) \cdot \Pr(tf_{t+1} \mid tf_t, sel_{tf,t})$
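The factored transition model can be sketched as a product of per-variable distributions. The code below is illustrative only and rests on several assumptions not stated on the slide: the bounded region is passed in explicitly as a list of alternative battery levels, a tariff selection of 0 means "keep the current tariff", the selected tariff takes effect deterministically, and the hour advances deterministically with wrap-around. The function names are hypothetical.

```python
from typing import Dict, List, Tuple


def battery_transition(bat: int, chg: int, p: float,
                       bound: List[int]) -> Dict[int, float]:
    """Distribution over the next battery level.

    With probability p the charge succeeds (bat + chg); the remaining
    mass 1 - p is split evenly over the levels in `bound`.
    """
    dist = {bat + chg: p}
    share = (1.0 - p) / len(bound)
    for level in bound:
        dist[level] = dist.get(level, 0.0) + share
    return dist


def joint_transition(tms: int, bat: int, tf: int, chg: int, seltf: int,
                     p: float, bound: List[int]
                     ) -> Dict[Tuple[int, int, int], float]:
    """Joint next-state distribution as a product of the factors."""
    next_tms = tms % 24 + 1                   # assumption: hour 24 wraps to 1
    next_tf = tf if seltf == 0 else seltf     # assumption: 0 keeps the tariff
    return {(next_tms, b, next_tf): pr
            for b, pr in battery_transition(bat, chg, p, bound).items()}


print(joint_transition(tms=24, bat=4, tf=1, chg=2, seltf=3, p=0.9, bound=[5, 7]))
```

Since the hour and tariff factors are deterministic here, only the battery factor contributes more than one successor state, which keeps each backup in value iteration cheap.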
13. 13/26
Factored Reward Representation
Our rewards correspond to costs:
$Cost(s_t, a_t, s_{t+1}) = C_{energy} + C_{period} + C_{bl}$
$C_{energy}$: cost per Wh for buying electricity
$C_{period}$: periodic subscription cost of the tariff
$C_{bl}$: cost associated with battery life losses
15. 15/26
$C_{bl}$, the cost associated with battery life losses:
$C_{bl} = L_{loss} \cdot C_{init-bat}$
with $C_{init-bat}$ the initial investment cost for the batteries, and
$L_{loss} = \frac{A_c}{A_{total}}$
with $A_c$ the effective battery throughput and $A_{total}$ the total cumulative throughput [4]
[4] A battery of size Q Ah will deliver an effective $A_{total} = 390 \cdot Q$ Ah over its lifetime
16. 16/26
The effective throughput $A_c$ is the actual throughput $A'_c$ scaled by a weighting factor:
$A_c = \lambda_{soc} \cdot A'_c$
where $\lambda_{soc}$ is an effective weighting factor:
$\lambda_{soc} = k \cdot SOC + d$
[Figure: empirical datapoints $(SOC, \lambda_{soc})$ and the fitted line $\lambda_{soc} = k \cdot SOC + d$; horizontal axis from 0 to 1, vertical axis from 0.5 to 1.5 (axis labels in the source: Time, kWh)]
State of charge of the battery:
$SOC = \frac{bat_t}{Battery_{max}}$
Actual throughput:
$A'_c = \frac{chg_t}{V_{battery}}$
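Putting the battery-life formulas together, a worked computation might look like the sketch below. The function name and the values of `k` and `d` are hypothetical (the slides do not report the fitted coefficients). Taking the absolute value of the charge, so that discharging also counts as throughput, is an assumption of this sketch. The battery figures (12 V, 212 Ah, 269 per unit) are taken from the experimental setup.

```python
def battery_life_cost(chg_wh: float, bat: float, battery_max: float,
                      k: float, d: float, voltage: float,
                      capacity_ah: float, init_cost: float) -> float:
    """C_bl = L_loss * C_init-bat for a single charge/discharge action."""
    soc = bat / battery_max              # state of charge
    lam = k * soc + d                    # weighting factor lambda_soc
    # Assumption: throughput counts energy moved in either direction.
    actual_ah = abs(chg_wh) / voltage    # actual throughput chg_t / V_battery
    effective_ah = lam * actual_ah       # effective throughput A_c
    total_ah = 390.0 * capacity_ah       # lifetime throughput A_total
    return effective_ah / total_ah * init_cost


# Hypothetical coefficients k and d; battery data from the experiments slide.
cost = battery_life_cost(chg_wh=120.0, bat=5.0, battery_max=10.0,
                         k=-1.0, d=1.5, voltage=12.0,
                         capacity_ah=212.0, init_cost=269.0)
print(round(cost, 6))
```

With these inputs the state of charge is 0.5, the weighting factor is 1.0, and the action moves 10 Ah out of a lifetime budget of 82,680 Ah, so the cost is that fraction of the purchase price.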
18. 18/26
Solving the Factored MDP
for all instantiations of s do
    set $V_{T+1}(s) = 0$
end
for all time-steps t in descending order (i.e., with 1, ..., T stages-to-go) do
    for all instantiations of $s_t$ do
        $V_t(s_t) \leftarrow \max_{a_t} \sum_{s_{t+1}} \Pr(s_{t+1} \mid a_t, s_t) \cdot \left[ R(s_t, a_t, s_{t+1}) + V_{t+1}(s_{t+1}) \right]$
    end
end
for all instantiations of s and all time-steps t do
    $\pi(s, t) = \arg\max_{a} \sum_{s'} \Pr(s' \mid a, s) \left( R(s, a, s') + V_{t+1}(s') \right)$
end
Value Iteration operating on a finite-horizon problem:
provides the optimal solution for a prosumer of any size
within the required time
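The backward-induction procedure on this slide can be written generically in a few lines. The sketch below is not the authors' code: `finite_horizon_value_iteration` is a hypothetical name, the model is supplied through plain callables, and the toy three-level battery used to exercise it (unit price, deterministic charging) is invented purely for illustration.

```python
from typing import Callable, Dict, Hashable, Iterable, Tuple


def finite_horizon_value_iteration(
        states: Iterable[Hashable],
        actions: Callable[[Hashable], Iterable[Hashable]],
        transition: Callable[[Hashable, Hashable], Dict[Hashable, float]],
        reward: Callable[[Hashable, Hashable, Hashable], float],
        horizon: int) -> Tuple[dict, dict]:
    """Return value tables V[t][s] and the policy pi[(s, t)] for t = 1..T."""
    states = list(states)
    V = {horizon + 1: {s: 0.0 for s in states}}   # terminal values are zero
    policy = {}
    for t in range(horizon, 0, -1):               # descending time-steps
        V[t] = {}
        for s in states:
            best_a, best_q = None, float("-inf")
            for a in actions(s):
                # Expected immediate reward plus value of the successor.
                q = sum(pr * (reward(s, a, s2) + V[t + 1][s2])
                        for s2, pr in transition(s, a).items())
                if q > best_q:
                    best_a, best_q = a, q
            V[t][s] = best_q
            policy[(s, t)] = best_a
    return V, policy


# Toy model (hypothetical): battery levels 0..2, charge by -1, 0 or +1.
levels = [0, 1, 2]
acts = lambda s: [a for a in (-1, 0, 1) if 0 <= s + a <= 2]
trans = lambda s, a: {s + a: 1.0}                 # deterministic charging
rew = lambda s, a, s2: -a * 1.0                   # discharging earns 1 per unit

V, pi = finite_horizon_value_iteration(levels, acts, trans, rew, horizon=2)
print(V[1][2], pi[(2, 1)])
```

Rewards are maximized here, so a cost-based model such as the one in this deck would pass the negated cost as `reward`. The running time is proportional to the horizon times the number of state-action-successor triples.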
19. 19/26
Outline
1 Introduction
2 Background
3 Our Model
4 Solving the Factored MDP
5 Prosumer Production and Consumption Models
Production Prediction
Consumption Prediction
6 Experiments and Results
20. 20/26
Production Prediction
RENES: a web-based PVS and WTG production prediction tool
Employs free-of-charge weather forecasts
Developed in our lab [5]
[5] http://www.intelligence.tuc.gr/renes/
21. 21/26
Consumption prediction on real household data
Polynomial degree   MSE
1                   0.022372
2                   0.021312
3                   0.020175
4                   0.017679
5                   0.016861
6                   0.017329
7                   0.017355
8                   0.017167
9                   0.017399
10                  0.017611
Table: MSE of Bayesian linear regression Φ functions, by polynomial degree
Method                                MSE
GP with polynomial kernel (GP-poly)   0.0173
GP with Gaussian kernel (GP-G)        0.006943
Bayesian linear regression (BLR)      0.0169
Table: MSE of GP and Bayesian linear regression
[Figure: consumption in kWh (0.2 to 1.6) against time (0 to 25), showing the variance of the trained area, the data points (x, y), training points (x_train, y_train), test points (x_test, y_test), and the GP-poly, GP-G and BLR fits]
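To make the degree comparison concrete, here is a minimal sketch of Bayesian linear regression with polynomial basis functions, using the standard posterior mean $m = \beta (\alpha I + \beta \Phi^\top \Phi)^{-1} \Phi^\top y$. It is not the authors' code: the prior precision `alpha`, noise precision `beta`, and the synthetic daily load curve are all assumptions made for illustration, so the printed errors will not match the table.

```python
import numpy as np


def poly_features(x: np.ndarray, degree: int) -> np.ndarray:
    """Design matrix with columns 1, x, x^2, ..., x^degree."""
    return np.vander(x, degree + 1, increasing=True)


def blr_fit(x: np.ndarray, y: np.ndarray, degree: int,
            alpha: float = 1e-3, beta: float = 25.0) -> np.ndarray:
    """Posterior mean of the weights under a zero-mean Gaussian prior."""
    phi = poly_features(x, degree)
    s_inv = alpha * np.eye(degree + 1) + beta * phi.T @ phi
    return beta * np.linalg.solve(s_inv, phi.T @ y)


def blr_predict(x: np.ndarray, weights: np.ndarray) -> np.ndarray:
    return poly_features(x, len(weights) - 1) @ weights


def mse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.mean((y_true - y_pred) ** 2))


# Synthetic hourly load (hypothetical data, hours rescaled to [0, 1)).
rng = np.random.default_rng(0)
hours = np.arange(24) / 24.0
load = 0.8 + 0.4 * np.sin(2 * np.pi * hours) + 0.05 * rng.standard_normal(24)

for degree in range(1, 6):
    w = blr_fit(hours, load, degree)
    print(degree, mse(load, blr_predict(hours, w)))
```

Rescaling the hour to the unit interval keeps the higher powers well conditioned. For a fair comparison, as in the tables above, the error would be measured on held-out test points rather than on the training data used here.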
23. 23/26
Experiments and Results
30 households in New Hampshire
20 PV modules with nominal power 60 kW per module
2 wind turbines with nominal power 1000 kW each
24 deep-cycle 12 V batteries (212 Ah C20 / FMD200, VRLA/AGM), costing €269.00 each
Battery lifetime: 10-12 years
[Figure "RES−Load": Load and RES curves, energy in kWh (0 to 900) against time (0 to 30)]
25. 25/26
Results VI–SPUDD
Both SPUDD and our method compute the same (optimal) policies...
However...
Results (all times in hours)
Horizon  |S × A|   Bounded region size  Our method  SPUDD script generation  SPUDD execution time  SPUDD total time
24       664290    15                   1.76        13.4992                  0.184                 13.6832
24       664290    90                   15.84       46.9188                  1.19                  48.1088
24       2624490   15                   8.7603      36.98                    0.73975               37.71975
48       664290    15                   3.5         16.8221                  0.4271                17.2492
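The relative speed of the two methods follows directly from the table. The short sketch below only performs arithmetic on the reported numbers (our method's time versus SPUDD's total time); the variable names are arbitrary.

```python
# (horizon, |S x A|, bounded region size, our method [h], SPUDD total [h])
rows = [
    (24, 664290, 15, 1.76, 13.6832),
    (24, 664290, 90, 15.84, 48.1088),
    (24, 2624490, 15, 8.7603, 37.71975),
    (48, 664290, 15, 3.5, 17.2492),
]

# Ratio of SPUDD total time to our method's time, rounded to two decimals.
speedups = [round(spudd / ours, 2) for (_, _, _, ours, spudd) in rows]
print(speedups)  # → [7.77, 3.04, 4.31, 4.93]
```

On these four configurations SPUDD's total time is roughly 3 to 8 times that of plain value iteration, and most of SPUDD's time goes to script generation rather than execution.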
26. 26/26
Wrapping–Up
A complete framework for optimal microgrid-prosumer
decision-making
Simple yet effective solution method
Tested on a real-world dataset
Vastly outperforms a known stochastic planning method (SPUDD) in terms of solution computation time
In progress: testing alternative methods [6] and developing novel techniques for tackling large-scale problems
[6]
– Munos, Remi, and Csaba Szepesvari. "Finite-time bounds for fitted value iteration." The Journal of Machine Learning Research, 2008.
– Guestrin, Carlos, et al. "Efficient solution algorithms for factored MDPs." Journal of Artificial Intelligence Research, 2003.
Thank you, any questions?