1. Hierarchical Approximation Methods for
Option Pricing and Stochastic Reaction Networks
PhD Defense by: Chiheb Ben Hammouda
Approved by the Committee Members:
Prof. Raúl Tempone (Advisor)
Prof. Ahmed Kebaier (External Examiner)
Prof. Emmanuel Gobet
Prof. Diogo Gomes
Prof. Ajay Jasra
July 2, 2020
2. Publications
1 Chiheb Ben Hammouda, Alvaro Moraes, and Raúl Tempone.
“Multilevel hybrid split-step implicit tau-leap”. In: Numerical
Algorithms 74.2 (2017), pp. 527–560.
2 Christian Bayer, Chiheb Ben Hammouda, and Raúl Tempone.
“Hierarchical adaptive sparse grids and quasi-Monte Carlo for
option pricing under the rough Bergomi model”. In: Quantitative
Finance (2020), pp. 1–17. doi: 10.1080/14697688.2020.1744700.
3 Chiheb Ben Hammouda, Nadhir Ben Rached, and Raúl Tempone.
“Importance sampling for a robust and efficient multilevel Monte
Carlo estimator for stochastic reaction networks”. In: arXiv
preprint arXiv:1911.06286 (2019).
4 Christian Bayer, Chiheb Ben Hammouda, and Raúl Tempone.
“Numerical smoothing and hierarchical approximations for
efficient option pricing and density estimation”. In: arXiv
preprint arXiv:2003.05708 (2020).
3. Outline
1 Part I: Multilevel Monte Carlo for Stochastic Reaction Networks
Introduction
Multilevel Hybrid Split-Step Implicit Tau-Leap (TL)
(Ben Hammouda, Moraes, and Tempone 2017)
Pathwise Importance Sampling for a Robust and Efficient
Multilevel Estimator (Ben Hammouda, Ben Rached, and
Tempone 2019)
2 Part II: Hierarchical Approximations and Smoothing Techniques for
Option Pricing
Introduction
Option Pricing under the Rough Bergomi Model (Bayer,
Ben Hammouda, and Tempone 2020a)
Numerical Smoothing for Efficient Option Pricing and Density
Estimation (Bayer, Ben Hammouda, and Tempone 2020b)
3 Academic Outcomes of My PhD
4 Backup Slides (More Details)
4. 1 Part I: Multilevel Monte Carlo for Stochastic Reaction Networks
Introduction
Multilevel Hybrid Split-Step Implicit Tau-Leap (TL)
(Ben Hammouda, Moraes, and Tempone 2017)
Pathwise Importance Sampling for a Robust and Efficient
Multilevel Estimator (Ben Hammouda, Ben Rached, and
Tempone 2019)
2 Part II: Hierarchical Approximations and Smoothing Techniques for
Option Pricing
Introduction
Option Pricing under the Rough Bergomi Model (Bayer,
Ben Hammouda, and Tempone 2020a)
Numerical Smoothing for Efficient Option Pricing and Density
Estimation (Bayer, Ben Hammouda, and Tempone 2020b)
3 Academic Outcomes of My PhD
4 Backup Slides (More Details)
5. Stochastic Reaction Networks (SRNs): Motivation
Deterministic models describe an average (macroscopic) behavior
and are only valid for large populations.
Species of small population ⇒ Considerable experimental
evidence: dynamics dominated by stochastic effects1.
⇒ Discrete state-space/stochastic simulation approaches more
relevant than continuous state-space/deterministic approaches
⇒ Theory of Stochastic Reaction Networks (SRNs).
Figure 1.1: Gene expression is affected
by both extrinsic and intrinsic noise
(Elowitz et al. 2002).
Figure 1.2: Gene expression can be
very noisy (Raj et al. 2006).
1
Populations of cells exhibit substantial phenotypic variation due to i) intrinsic
noise: biochemical process of gene expression and ii) extrinsic noise: fluctuations in
other cellular components.
6. SRNs Applications: Epidemic Processes (Anderson and Kurtz 2015)
and Virus Kinetics (Hensel, Rawlings, and Yin 2009),. . .
Biological Models
In-vivo population control: regulating the
expected number of proteins.
Figure 1.3: DNA transcription and
mRNA translation (Briat, Gupta,
and Khammash 2015)
Chemical reactions
Expected number of molecules.
Sudden extinction of species
Figure 1.4: Chemical reaction
network (Briat, Gupta, and
Khammash 2015)
7. Stochastic Reaction Networks (SRNs)
An SRN is a continuous-time, discrete-space Markov chain, X(t),
defined on a probability space (Ω, F, P)2:
X(t) = (X^(1)(t), . . . , X^(d)(t)) : [0, T] × Ω → N^d,
described by J reaction channels, R_j = (ν_j, a_j), where
▸ ν_j ∈ Z^d: stoichiometric (state-change) vector.
▸ a_j : R^d_+ → R_+: propensity (jump intensity) function.
a_j satisfies
Prob(X(t + ∆t) = x + ν_j | X(t) = x) = a_j(x)∆t + o(∆t), j = 1, . . . , J.   (1.1)
Kurtz’s random time-change representation (Ethier and Kurtz 1986)
X(t) = x_0 + ∑_{j=1}^J Y_j( ∫_{t_0}^t a_j(X(s)) ds ) ν_j,
where Y_j are independent unit-rate Poisson processes.
2 In this setting the i-th component, X^(i)(t), may describe the abundance of the i-th
species present in the system at time t.
8. Typical Computational Tasks in the Context of SRNs
Estimation of the expected value of a given functional, g, of the
SRN, X, at a certain time T, i.e., E[g(X(T))].
▸ Example: The expected number of the i-th species, where g(X) = X^(i).
Estimation of hitting times of X: the elapsed random time that the
process X takes to reach a certain subset B of the state space for the
first time, i.e., τ_B = inf{t ∈ R_+ : X(t) ∈ B}.
▸ Example: The time of the sudden extinction of one of the species.
. . .
⇒ One needs to design efficient Monte Carlo (MC) methods for these
tasks, and consequently one needs to sample paths of SRNs efficiently.
9. Monte Carlo (MC)
Setting: Let X be a stochastic process and g : R^d → R a function
of the state of the system which gives a measurement of interest.
Aim: Approximate E[g(X(T))] efficiently, using Z_∆t(T) as an
approximation of X(T).
MC estimator: Let µ_M be the MC estimator of E[g(Z_∆t(T))]:
µ_M = (1/M) ∑_{m=1}^M g(Z_{∆t,[m]}(T)),
where Z_{∆t,[m]} are independent paths generated via the
approximate algorithm with a step size of ∆t.
Complexity: Given a pre-selected tolerance, TOL,
MC complexity = (cost per path ≈ T/∆t = TOL^{-1}) × (#paths = M = TOL^{-2}) = O(TOL^{-3}).
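A generic sketch of this estimator; `sample_endpoint` and the observable `g` are placeholders for whatever approximate scheme Z_∆t is in use (e.g., a tau-leap simulator for an SRN):

```python
import numpy as np

def mc_estimator(sample_endpoint, g, M, seed=0):
    """Plain MC estimate of E[g(Z_dt(T))] from M independent approximate paths.

    sample_endpoint: callable(rng) -> Z_dt(T), one approximate path endpoint.
    Returns the sample mean mu_M and its standard error.
    """
    rng = np.random.default_rng(seed)
    vals = np.array([g(sample_endpoint(rng)) for _ in range(M)])
    return vals.mean(), vals.std(ddof=1) / np.sqrt(M)
```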
10. Multilevel Monte Carlo (MLMC)
(Kebaier 2005; Giles 2008a)
Aim: Improve the MC complexity when estimating E[g(Z_{∆t_L}(T))].
Setting:
▸ A hierarchy of nested meshes of [0, T], indexed by ℓ = 0, . . . , L.
▸ ∆t_ℓ = K^{-ℓ} ∆t_0: the time-step size for levels ℓ ≥ 1; K > 1, K ∈ N.
▸ Z_ℓ = Z_{∆t_ℓ}: the approximate process generated using a step size of ∆t_ℓ.
MLMC idea
E[g(Z_L(T))] = E[g(Z_0(T))] + ∑_{ℓ=1}^L E[g(Z_ℓ(T)) − g(Z_{ℓ−1}(T))]   (1.2)
Var[g(Z_0(T))] ≫ Var[g(Z_ℓ(T)) − g(Z_{ℓ−1}(T))] as ℓ increases
⇒ M_0 ≫ M_ℓ as ℓ increases.
MLMC estimator: Q = ∑_{ℓ=0}^L Q_ℓ, with
Q_0 = (1/M_0) ∑_{m_0=1}^{M_0} g(Z_{0,[m_0]}(T));
Q_ℓ = (1/M_ℓ) ∑_{m_ℓ=1}^{M_ℓ} ( g(Z_{ℓ,[m_ℓ]}(T)) − g(Z_{ℓ−1,[m_ℓ]}(T)) ), 1 ≤ ℓ ≤ L.
MLMC complexity (Cliffe et al. 2011)
O( TOL^{−2−max(0, (γ−β)/α)} log(TOL)^{2·1{β=γ}} )   (1.3)
i) Weak rate: |E[g(Z_ℓ(T)) − g(X(T))]| ≤ c_1 2^{−αℓ}
ii) Strong rate: Var[g(Z_ℓ(T)) − g(Z_{ℓ−1}(T))] ≤ c_2 2^{−βℓ}
iii) Work rate: W_ℓ ≤ c_3 2^{γℓ}   (W_ℓ: expected cost per coupled path at level ℓ)
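The telescoping sum (1.2) translates directly into code; `coarse_sample` and `coupled_diff` are user-supplied samplers, since the level coupling is model-specific (e.g., the split-Poisson coupling of tau-leap levels discussed later in this part):

```python
import numpy as np

def mlmc_estimator(coarse_sample, coupled_diff, M, L, seed=0):
    """MLMC estimate of E[g(Z_L(T))] via the telescoping sum (1.2).

    coarse_sample: callable(rng) -> g(Z_0(T)) for one level-0 path.
    coupled_diff: callable(l, rng) -> g(Z_l(T)) - g(Z_{l-1}(T)) for one
                  coupled pair of paths driven by the same randomness.
    M: list of per-level sample sizes, length L + 1.
    """
    rng = np.random.default_rng(seed)
    Q = np.mean([coarse_sample(rng) for _ in range(M[0])])
    for l in range(1, L + 1):
        Q += np.mean([coupled_diff(l, rng) for _ in range(M[l])])
    return Q
```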
11. 1 Part I: Multilevel Monte Carlo for Stochastic Reaction Networks
Introduction
Multilevel Hybrid Split-Step Implicit Tau-Leap (TL)
(Ben Hammouda, Moraes, and Tempone 2017)
Pathwise Importance Sampling for a Robust and Efficient
Multilevel Estimator (Ben Hammouda, Ben Rached, and
Tempone 2019)
2 Part II: Hierarchical Approximations and Smoothing Techniques for
Option Pricing
Introduction
Option Pricing under the Rough Bergomi Model (Bayer,
Ben Hammouda, and Tempone 2020a)
Numerical Smoothing for Efficient Option Pricing and Density
Estimation (Bayer, Ben Hammouda, and Tempone 2020b)
3 Academic Outcomes of My PhD
4 Backup Slides (More Details)
12. Multilevel Hybrid Split-Step Implicit Tau-Leap
for SRNs
Setting:
i) X is an SRN with an initial state X(0) = x_0,
ii) a scalar observable of X, g : R^d → R,
iii) a user-selected tolerance, TOL, and
iv) a confidence level 1 − α_c with 0 < α_c ≪ 1 (typically α_c = 0.05).
Goal: Design an accurate MLMC estimator Q of E[g(X(T))] such that
P( |E[g(X(T))] − Q| < TOL ) > 1 − α_c,   (1.4)
▸ for a class of systems characterized by having simultaneously
fast and slow time scales (stiff systems),
▸ with near-optimal expected computational work.
13. Multilevel Hybrid SSI-TL: Contributions
For SRN systems with the presence of slow and fast timescales:
1 Novel scheme: split-step implicit tau-leap (SSI-TL) to
simulate single paths of SRNs.
▸ The explicit TL scheme (Gillespie 2001; Aparicio and Solari 2001)
suffers from numerical instability.
▸ SSI-TL produces values of the process on the integer lattice N^d.
▸ SSI-TL is easy to couple with explicit TL in the context of MLMC.
2 Novel hybrid multilevel estimator
▸ Uses a novel implicit scheme only at the coarser levels.
▸ Starting from a certain interface level, it switches to the
explicit scheme.
3 Our proposed MLMC estimator achieves a computational
complexity of order O(TOL^{-2} log(TOL)^2) with a significantly
smaller constant than the MLMC explicit tau-leap of (Anderson and
Higham 2012).
14. Split-Step Implicit Tau-Leap (SSI-TL) Scheme
Random time-change representation (Ethier and Kurtz 1986)
X(t) = x_0 + ∑_{j=1}^J Y_j( ∫_{t_0}^t a_j(X(s)) ds ) ν_j,
where Y_j are independent unit-rate Poisson processes.
The explicit TL (Gillespie 2001), with {P_j(·)}_{j=1}^J independent Poisson rdvs:
Z^exp(t + ∆t) = z + ∑_{j=1}^J P_j(a_j(z)∆t) ν_j;  Z^exp(t) = z ∈ N^d.
Caveat: Stiff systems: numerical stability ⇒ ∆t^exp ≪ 1 ⇒ Expensive cost.
The SSI-TL scheme (Ben Hammouda, Moraes, and Tempone 2017): z = Z^imp(t),
y = z + ∑_{j=1}^J a_j(y)∆t ν_j   (Drift-implicit step)   (1.5)
Z^imp(t + ∆t) = y + ∑_{j=1}^J ( P_j(a_j(y)∆t) − a_j(y)∆t ) ν_j
             = z + ∑_{j=1}^J P_j(a_j(y)∆t) ν_j ∈ N^d   (Tau-leap step)
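A minimal sketch of one SSI-TL step, assuming the drift-implicit stage (1.5) is solved by simple fixed-point iteration (our choice for illustration; a Newton solve may be preferable for strongly stiff propensities); `nu` and `a` follow the notation above:

```python
import numpy as np

def ssi_tl_step(z, nu, a, dt, rng, n_fp=50):
    """One split-step implicit tau-leap step, eq. (1.5).

    z: current state (d,); nu: stoichiometric matrix (J, d);
    a: propensity function, a(x) -> (J,) nonnegative rates.
    """
    # Drift-implicit stage: solve y = z + sum_j a_j(y) dt nu_j by fixed point.
    y = z.astype(float)
    for _ in range(n_fp):
        y_new = z + dt * a(y) @ nu
        if np.allclose(y_new, y):
            break
        y = y_new
    # Tau-leap stage: the Poisson update keeps the state on the lattice.
    rates = np.maximum(a(y) * dt, 0.0)
    return z + rng.poisson(rates) @ nu
```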
15. Multilevel Hybrid SSI-TL: Idea
Our multilevel hybrid SSI-TL estimator:
Q = Q_{L^imp_c} + ∑_{ℓ=L^imp_c+1}^{L^int−1} Q_ℓ + Q_{L^int} + ∑_{ℓ=L^int+1}^{L} Q_ℓ,   (1.6)
where
Q_{L^imp_c} = (1/M_{i,L^imp_c}) ∑_{m=1}^{M_{i,L^imp_c}} g(Z^imp_{L^imp_c,[m]}(T));
Q_ℓ = (1/M_{ii,ℓ}) ∑_{m_ℓ=1}^{M_{ii,ℓ}} ( g(Z^imp_{ℓ,[m_ℓ]}(T)) − g(Z^imp_{ℓ−1,[m_ℓ]}(T)) ),  L^imp_c + 1 ≤ ℓ ≤ L^int − 1;
Q_{L^int} = (1/M_{ie,L^int}) ∑_{m=1}^{M_{ie,L^int}} ( g(Z^exp_{L^int,[m]}(T)) − g(Z^imp_{L^int−1,[m]}(T)) );
Q_ℓ = (1/M_{ee,ℓ}) ∑_{m_ℓ=1}^{M_{ee,ℓ}} ( g(Z^exp_{ℓ,[m_ℓ]}(T)) − g(Z^exp_{ℓ−1,[m_ℓ]}(T)) ),  L^int + 1 ≤ ℓ ≤ L.
In (Ben Hammouda, Moraes, and Tempone 2017): a methodology for
estimating the parameters:
▸ L^imp_c: the coarsest discretization level;
▸ L^int: the interface level;
▸ L: the deepest discretization level;
▸ The number of samples per level:
M = { M_{i,L^imp_c}, {M_{ii,ℓ}}_{ℓ=L^imp_c+1}^{L^int−1}, M_{ie,L^int}, {M_{ee,ℓ}}_{ℓ=L^int+1}^{L} }.
16. Example (The decaying-dimerizing reaction (Gillespie 2001))
System of three species, S1, S2, and S3, and four reaction channels:
S1 →(θ1) ∅,        S1 + S1 →(θ2) S2,
S2 →(θ3) S1 + S1,  S2 →(θ4) S3,
with θ = (1, 10, 10^3, 10^{-1}), T = 0.2, X(t) = (S1(t), S2(t), S3(t)) and
X_0 = (400, 798, 0).
The stability limit of the explicit TL is ∆t^exp_lim ≈ 2.3 × 10^{-4}.
The stoichiometric matrix and the propensity functions are
ν = ( −1  0  0
      −2  1  0
       2 −1  0
       0 −1  1 ),   a(X) = ( θ1 S1
                             θ2 S1(S1 − 1)
                             θ3 S2
                             θ4 S2 ).
We are interested in approximating E[X^(3)(T)].
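This system can be encoded directly; the sketch below runs the explicit TL with a step below the stated stability limit (the projection onto nonnegative counts is our own safeguard, not part of the scheme):

```python
import numpy as np

# Decaying-dimerizing network from this slide: 4 channels, 3 species.
nu = np.array([[-1,  0, 0],
               [-2,  1, 0],
               [ 2, -1, 0],
               [ 0, -1, 1]])
theta = np.array([1.0, 10.0, 1e3, 1e-1])

def a(x):
    """Propensities (theta1*S1, theta2*S1*(S1-1), theta3*S2, theta4*S2)."""
    return np.array([theta[0] * x[0], theta[1] * x[0] * (x[0] - 1),
                     theta[2] * x[1], theta[3] * x[1]])

def explicit_tl_path(x0, T, dt, rng):
    """Explicit tau-leap path; dt should stay below the stability
    limit (about 2.3e-4 here), otherwise the iteration can blow up."""
    x = np.array(x0, dtype=np.int64)
    for _ in range(int(round(T / dt))):
        x = x + rng.poisson(a(x) * dt) @ nu
        x = np.maximum(x, 0)  # our safeguard against negative counts
    return x
```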
17. Why Hybrid Estimator?
Figure 1.5: Comparison of the expected cost per sample per level for the
multilevel explicit TL and SSI-TL methods, using 10^4 samples. The first
observation corresponds to the time of a single path for the coarsest level;
the other observations correspond to the time of the coupled paths per level.
[Plot: expected cost per sample vs. discretization level ℓ = 0, . . . , 15,
for multilevel SSI-TL and multilevel explicit TL.]
18. Multilevel Hybrid SSI-TL: Results I
Method / TOL                                        | 0.02     | 0.005
Expected work of multilevel explicit TL: W^exp_MLMC | 890 (9)  | 2.2e+04 (96)
Expected work of multilevel SSI-TL: W^SSI_MLMC      | 24 (0.8) | 5.3e+02 (7)
Ratio W^SSI_MLMC / W^exp_MLMC in %                  | 2.7%     | 2.4%
Table 1.1: Comparison of the expected total work (in seconds) for the different
methods, using 100 multilevel runs. (·) refers to the standard deviation.
Figure 1.6: Comparison of the expected total work for the different methods for
different values of the tolerance TOL, using 100 multilevel runs. [Plot: expected
total work (seconds) vs. TOL for multilevel SSI-TL (L^imp_c = 0, slope ≈ −2.25),
multilevel explicit TL (L^exp_c = 11, slope ≈ −2.37), and MC SSI-TL
(slope ≈ −2.77), against the reference curves TOL^{-2} log(TOL)^2 and TOL^{-3}.]
19. Multilevel Hybrid SSI-TL: Results II
Figure 1.7: TOL versus the actual computational error. The numbers above
the straight line show the percentage of runs that had errors larger than the
required tolerance (between 1% and 5% across tolerances). We observe that
the computational error follows the imposed tolerance with the expected
confidence of 95%.
20. 1 Part I: Multilevel Monte Carlo for Stochastic Reaction Networks
Introduction
Multilevel Hybrid Split-Step Implicit Tau-Leap (TL)
(Ben Hammouda, Moraes, and Tempone 2017)
Pathwise Importance Sampling for a Robust and Efficient
Multilevel Estimator (Ben Hammouda, Ben Rached, and
Tempone 2019)
2 Part II: Hierarchical Approximations and Smoothing Techniques for
Option Pricing
Introduction
Option Pricing under the Rough Bergomi Model (Bayer,
Ben Hammouda, and Tempone 2020a)
Numerical Smoothing for Efficient Option Pricing and Density
Estimation (Bayer, Ben Hammouda, and Tempone 2020b)
3 Academic Outcomes of My PhD
4 Backup Slides (More Details)
21. MLMC with Importance Sampling: Motivation
Issue: Catastrophic coupling (characteristic of pure jump processes)
Prob{Y_ℓ = X_ℓ(T) − X_{ℓ−1}(T) = 0} ≈ 1 − ∆t_ℓ, as ∆t_ℓ → 0,   (1.7)
X_{ℓ−1}, X_ℓ: two coupled tau-leap approximations of the true process X based
on two consecutive grid levels (ℓ − 1, ℓ).
Consequences:
Large kurtosis problem (Moraes, Tempone, and Vilanova 2016)
⇒ Expensive cost for reliable/robust estimates of sample statistics in MLMC.
▸ Why large kurtosis is bad: σ_{S^2}(Y_ℓ) = (Var[Y_ℓ]/√M_ℓ) √( (κ_ℓ − 1) + 2/(M_ℓ − 1) ); need M_ℓ ≫ κ_ℓ.
▸ Why accurate variance estimates are important: M*_ℓ ∝ √(V_ℓ / W_ℓ) ∑_{k=0}^L √(V_k W_k).
Goal: Design a pathwise-dependent importance sampling (IS) scheme to improve
the robustness and the complexity of the MLMC estimator.
Notation
σ_{S^2}(Y_ℓ): standard deviation of the sample variance of Y_ℓ;
κ_ℓ: the kurtosis; V_ℓ = Var[Y_ℓ]; M_ℓ: number of samples;
M*_ℓ: optimal number of samples per level; W_ℓ: cost per sample path.
22. Contributions of our Work 3
1 Novel method that combines a pathwise dependent IS with the
MLMC estimator.
2 Our theoretical estimates and numerical analysis show that our
approach
1 Dramatically decreases the high kurtosis (due to catastrophic
coupling) observed at the deep levels of MLMC.
2 Improves the strong convergence rate from β = 1 for the standard
case (without IS) to β = 1 + δ (0 < δ < 1).
3 Improves the complexity from O(TOL^{-2} log(TOL)^2) to
O(TOL^{-2}) (optimal complexity), without steps simulated with an
exact scheme as in (Anderson and Higham 2012; Moraes, Tempone,
and Vilanova 2016).
4 These improvements are achieved with a negligible additional cost.
3
Chiheb Ben Hammouda, Nadhir Ben Rached, and Raúl Tempone. “Importance
sampling for a robust and efficient multilevel Monte Carlo estimator for stochastic
reaction networks”. In: arXiv preprint arXiv:1911.06286 (2019)
23. Standard Path Coupling: One Species & One Reaction Channel
Notation
▸ X_{ℓ−1}, X_ℓ: two explicit TL approximations of the process X based on levels (ℓ − 1, ℓ);
▸ N_{ℓ−1}: number of time steps at level ℓ − 1;
▸ For 0 ≤ n ≤ N_{ℓ−1} − 1
☀ {t_n, t_{n+1}}: two consecutive time-mesh points for X_{ℓ−1};
☀ {t_n, t_n + ∆t_ℓ, t_{n+1}}: three consecutive time-mesh points for X_ℓ;
☀ ∆a^1_{ℓ−1,n} = a(X_ℓ(t_n)) − a(X_{ℓ−1}(t_n)) in the coupling interval [t_n, t_n + ∆t_ℓ];
☀ ∆a^2_{ℓ−1,n} = a(X_ℓ(t_n + ∆t_ℓ)) − a(X_{ℓ−1}(t_n)) in the coupling interval [t_n + ∆t_ℓ, t_{n+1}];
☀ P'_n, Q'_n, P''_n, Q''_n: independent Poisson rdvs.
Coupling idea illustration
X_ℓ(t_{n+1}) − X_{ℓ−1}(t_{n+1}) = X_ℓ(t_n) − X_{ℓ−1}(t_n)
+ ν_1 ( P'_n(∆a^1_{ℓ−1,n} ∆t_ℓ) 1_{∆a^1_{ℓ−1,n} > 0} − P''_n(−∆a^1_{ℓ−1,n} ∆t_ℓ) 1_{∆a^1_{ℓ−1,n} < 0} )   [coupling in [t_n, t_n + ∆t_ℓ]]
+ ν_1 ( Q'_n(∆a^2_{ℓ−1,n} ∆t_ℓ) 1_{∆a^2_{ℓ−1,n} > 0} − Q''_n(−∆a^2_{ℓ−1,n} ∆t_ℓ) 1_{∆a^2_{ℓ−1,n} < 0} ).   [coupling in [t_n + ∆t_ℓ, t_{n+1}]]   (1.8)
In the following, we denote, for 0 ≤ n ≤ N_{ℓ−1} − 1,
∆a_{ℓ,2n} = ∆a^1_{ℓ−1,n}, in [t_n, t_n + ∆t_ℓ];
∆a_{ℓ,2n+1} = ∆a^2_{ℓ−1,n}, in [t_n + ∆t_ℓ, t_{n+1}].   (1.9)
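For one species and one channel, the coupling can be sketched via the common/residual Poisson splitting that underlies (1.8) (our equivalent reformulation: both levels share a Poisson variate of rate min of the two propensities, and only the level with the larger propensity adds an independent residual; the coarse propensity stays frozen at its t_n value, as in (1.8)):

```python
import numpy as np

def coupled_tl_step(x_f, x_c, a, nu, dt_f, rng):
    """One coarse interval (two fine sub-steps) of the coupled explicit TL
    pair (X_l, X_{l-1}) for one species / one reaction channel.

    x_f, x_c: current fine/coarse states; a: scalar propensity function;
    nu: state change; dt_f: fine step. Returns the updated pair; sharing
    the min-rate Poisson variate preserves both marginal laws exactly.
    """
    x_c_new = x_c
    for _ in range(2):
        a_f, a_c = a(x_f), a(x_c)  # a(x_c): frozen over the coarse interval
        shared = rng.poisson(min(a_f, a_c) * dt_f)   # jumps common to both
        extra = rng.poisson(abs(a_f - a_c) * dt_f)   # residual, one level only
        x_f = x_f + nu * (shared + (extra if a_f > a_c else 0))
        x_c_new = x_c_new + nu * (shared + (extra if a_c > a_f else 0))
    return x_f, x_c_new
```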
24. Pathwise Importance Sampling (IS) Idea 4
For 1 ≤ j ≤ J: Instead of ∆a^j_{ℓ,n} ∆t_ℓ in (1.8), we propose λ^j_{ℓ,n} ∆t_ℓ.
The parameter λ^j_{ℓ,n} is determined such that our change of measure
i) decreases the kurtosis of the MLMC estimator at the deep levels;
ii) improves the strong convergence rate.
For ℓ = 1, . . . , L and 0 ≤ n ≤ N_ℓ − 1: change measure when
(i) j ∈ J_1 = {1 ≤ j ≤ J; g(X + ν_j) ≠ g(X)}, (ii) ∆a^j_{ℓ,n} ≠ 0,
and (iii) ∆g_ℓ(t_n) = 0, where g_ℓ = g(X_ℓ).
Our analysis suggests that a sub-optimal choice of {λ^j_{ℓ,n}}_{j∈J_1} is
λ^j_{ℓ,n} = c_ℓ ∆a^j_{ℓ,n} = ∆t_ℓ^{−δ} ∆a^j_{ℓ,n},  0 < δ < 1,   (1.10)
δ: a scale parameter in our IS algorithm.
4 Chiheb Ben Hammouda, Nadhir Ben Rached, and Raúl Tempone. “Importance
sampling for a robust and efficient multilevel Monte Carlo estimator for stochastic
reaction networks”. In: arXiv preprint arXiv:1911.06286 (2019)
25. Main Results and Contributions
Our theoretical estimates and numerical experiments in (Ben Hammouda,
Ben Rached, and Tempone 2019) show:
Quantity of Interest | MLMC without IS (standard case) | MLMC with IS (0 < δ < 1)
κ_ℓ                  | O(∆t_ℓ^{-1})                    | O(∆t_ℓ^{δ−1})
V_ℓ                  | O(∆t_ℓ)                         | O(∆t_ℓ^{1+δ})
Work_MLMC            | O(TOL^{-2} log(TOL)^2)          | O(TOL^{-2})
W_{ℓ,sample}         | ≈ 2 × J × C_p × ∆t_ℓ^{-1}       | ≈ 2 × J × C_p × ∆t_ℓ^{-1}
Table 1.2: Main results for the comparison of MLMC combined with our IS
algorithm and standard MLMC.
Notation
κ_ℓ: the kurtosis of the coupled MLMC paths at level ℓ.
V_ℓ: the variance of the coupled MLMC paths at level ℓ.
W_{ℓ,sample}: the average cost of simulating coupled MLMC paths at level ℓ.
TOL: a pre-selected tolerance for the MLMC estimator.
C_p: the cost of generating one Poisson rdv.
26. Example (Michaelis–Menten Enzyme Kinetics (Rao and Arkin 2003))
The catalytic conversion of a substrate, S, into a product, P, via
an enzymatic reaction involving an enzyme, E, described by
Michaelis–Menten enzyme kinetics with three reactions:
E + S →(θ1) C,    C →(θ2) E + S,    C →(θ3) E + P,
θ = (10^{-3}, 5 × 10^{-3}, 10^{-2}), T = 1, X(t) = (E(t), S(t), C(t), P(t)) and
X_0 = (100, 100, 0, 0).
The stoichiometric matrix and the propensity functions are
ν = ( −1 −1  1  0
       1  1 −1  0
       1  0 −1  1 ),   a(X) = ( θ1 E S
                                θ2 C
                                θ3 C ).
The QoI is E[X^(3)(T)].
27. Summary of Results of Example 2
Example                           | α    | β    | γ | κ_L  | Work_MLMC
Example 2: MLMC without IS        | 1.02 | 1.03 | 1 | 1220 | O(TOL^{-2} log(TOL)^2)
Example 2: MLMC with IS (δ = 1/4) | 1.02 | 1.25 | 1 | 215  | O(TOL^{-2})
Example 2: MLMC with IS (δ = 1/2) | 1.02 | 1.49 | 1 | 36.5 | O(TOL^{-2})
Example 2: MLMC with IS (δ = 3/4) | 1.03 | 1.75 | 1 | 5.95 | O(TOL^{-2})
Table 1.3: Comparison of the convergence rates (α, β, γ) and the kurtosis at the
deepest level of MLMC, κ_L, for the different numerical examples with and
without the IS algorithm. α, β, γ are the estimated rates of weak convergence,
strong convergence and computational work, respectively, estimated with
M = 10^6 samples. 0 < δ < 1 is a parameter in our IS algorithm.
28. Cost Analysis: Illustration
Figure 1.8: Example 2: Comparison of the average cost per sample path per
level (in CPU time, estimated with 10^6 samples), with and without IS; δ = 3/4
for IS. [Plot: W_{ℓ,sample} vs. level ℓ = 0, . . . , 12, against the reference 2^ℓ.]
Figure 1.9: Example 2: Average number of time steps for different MLMC levels,
with IS (δ = 3/4), using 10^5 samples. [Plot: average number of IS steps vs.
level ℓ, against N_ℓ = 2^ℓ = ∆t_ℓ^{-1}.]
29. Effect of our IS on MLMC Complexity
Figure 1.10: Comparison of the numerical complexity of the different methods:
i) MC with the exact scheme (SSA), against the reference TOL^{-2}; ii) standard
MLMC + TL, against TOL^{-2} log(TOL)^2; and iii) MLMC + TL combined with
importance sampling (δ = 3/4), against TOL^{-2}. [Plot: E[W] vs. TOL.]
30. 1 Part I: Multilevel Monte Carlo for Stochastic Reaction Networks
Introduction
Multilevel Hybrid Split-Step Implicit Tau-Leap (TL)
(Ben Hammouda, Moraes, and Tempone 2017)
Pathwise Importance Sampling for a Robust and Efficient
Multilevel Estimator (Ben Hammouda, Ben Rached, and
Tempone 2019)
2 Part II: Hierarchical Approximations and Smoothing Techniques for
Option Pricing
Introduction
Option Pricing under the Rough Bergomi Model (Bayer,
Ben Hammouda, and Tempone 2020a)
Numerical Smoothing for Efficient Option Pricing and Density
Estimation (Bayer, Ben Hammouda, and Tempone 2020b)
3 Academic Outcomes of My PhD
4 Backup Slides (More Details)
31. Options and Pricing
Option: Financial security that gives the holder the right, but
not the obligation, to buy (Call) or sell (Put) a specified
quantity of a specified underlying instrument (asset) at a
specified price (K: strike) on (European) or before (Bermudan,
American) a specified date (T: maturity).
Why use options?
▸ Hedging purposes: an effective hedging instrument against a declining
stock market, limiting downside losses.
▸ Speculative purposes, such as wagering on the direction of a stock.
To value (price) the option is to compute the fair price of this
contract.
32. Martingale Representation and Notation
Theorem (Fair Value of Financial Derivatives)
The fair value of a financial derivative which can be exercised at time
T is given by (Harrison and Pliska 1981)
V(S, 0) = e^{−rT} E_Q[g(S_T)],
where E_Q is the expectation under the local martingale measure Q.
{S_t ∈ R^d : t ≥ 0}: stochastic process that represents the prices of
the underlying assets at time t, defined on a continuous-time
probability space (Ω, F, Q).
r: the risk-free interest rate.
Payoff function g : R^d → R. E.g., for a European call option,
g(S_T) = max{S_T − K, 0}, where K is the strike price.
d: the number of assets.
33. Some of the Challenges in Option Pricing
Issue 1: S a function of a high-dimensional random vector
▸ Case 1: Time-discretization of a stochastic differential equation
(large N (number of time steps)).
▸ Case 2: A large number of underlying assets (large d).
⇒ Curse of dimensionality5
when using deterministic
quadrature methods.
Issue 2: The payoff function g typically has low regularity ⇒
▸ Deterministic quadrature methods suffer from slow convergence.
▸ Multilevel Monte Carlo (MLMC) suffers from
☀ High variance of the coupled levels and a low strong convergence rate
⇒ Badly affecting the complexity of the MLMC estimator.
☀ High kurtosis at the deep levels ⇒ Expensive cost to get
reliable/robust estimates of sample statistics.
Solutions: In (Bayer, Ben Hammouda, and Tempone 2020a; Bayer,
Ben Hammouda, and Tempone 2020b), we develop novel methods
to effectively address these challenges.
5 Curse of dimensionality: an exponential growth of the work (number of
function evaluations) in terms of the dimension of the integration problem.
34. Sparse Grids (I)
Notation:
▸ Given F : R^d → R and a multi-index β ∈ N^d_+.
▸ F_β = Q^{m(β)}[F]: a quadrature operator based on a Cartesian quadrature grid
(m(β_n) points along dimension y_n).
Issue: Approximating E[F] with F_β is not an appropriate option due to the
well-known curse of dimensionality.
Alternative idea: A quadrature estimate of E[F] is
M_I[F] = ∑_{β∈I} ∆[F_β],
where
▸ The first-order difference operators
∆_i F_β = { F_β − F_{β−e_i},  if β_i > 1
            F_β,              if β_i = 1,   (2.1)
where e_i denotes the i-th d-dimensional unit vector;
▸ The mixed (first-order tensor) difference operators
∆[F_β] = ⊗_{i=1}^d ∆_i F_β = ∑_{α∈{0,1}^d} (−1)^{∑_i α_i} F_{β−α}.   (2.2)
35. Sparse Grids (II)
E[F] ≈ M_I[F] = ∑_{β∈I} ∆[F_β]
Product approach: I = {‖β‖_∞ ≤ ℓ; β ∈ N^d_+} ⇒ E_Q(M) = O(M^{−r/d})
(for functions with bounded total derivatives up to order r).
Regular sparse grids:
I = {‖β‖_1 ≤ ℓ + d − 1; β ∈ N^d_+} ⇒ E_Q(M) = O(M^{−s} (log M)^{(d−1)(s+1)})
(for functions with bounded mixed derivatives up to order s).
Adaptive sparse grids quadrature (ASGQ): I = I^{ASGQ} (defined in the next slides)
⇒ E_Q(M) = O(M^{−s_w}) (for functions with bounded weighted mixed derivatives
up to order s_w).
Notation: M: number of quadrature points; E_Q: quadrature error.
Figure 2.1: Left: product grids ∆_{β_1} ⊗ ∆_{β_2} for 1 ≤ β_1, β_2 ≤ 3. Right: the
corresponding sparse grid construction.
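The combination M_I[F] = ∑_{β∈I} ∆[F_β] over the regular sparse set is short to write down; nested trapezoidal rules with m(β) = 2^{β−1} + 1 nodes on [0,1]^d are our illustrative choice (the slides leave the 1D quadrature family open):

```python
import itertools
import numpy as np

def tensor_quad(F, beta):
    """Cartesian-product quadrature F_beta with m(beta_i) = 2**(beta_i-1) + 1
    trapezoidal nodes per dimension on [0, 1]^d."""
    grids, weights = [], []
    for b in beta:
        g = np.linspace(0.0, 1.0, 2**(b - 1) + 1)
        w = np.full(len(g), 1.0 / (len(g) - 1))
        w[0] *= 0.5
        w[-1] *= 0.5
        grids.append(g)
        weights.append(w)
    total = 0.0
    for idx in itertools.product(*map(range, map(len, grids))):
        pt = [g[i] for g, i in zip(grids, idx)]
        wt = np.prod([w[i] for w, i in zip(weights, idx)])
        total += wt * F(pt)
    return total

def sparse_quad(F, d, level):
    """Regular sparse-grid estimate: sum of mixed differences (2.2) over
    |beta|_1 <= level + d - 1, via inclusion-exclusion; terms with a zero
    index are skipped, matching the beta_i = 1 case of (2.1)."""
    total = 0.0
    for beta in itertools.product(range(1, level + d), repeat=d):
        if sum(beta) > level + d - 1:
            continue
        for alpha in itertools.product((0, 1), repeat=d):
            shifted = tuple(b - a for b, a in zip(beta, alpha))
            if min(shifted) < 1:
                continue
            total += (-1) ** sum(alpha) * tensor_quad(F, shifted)
    return total
```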
36. ASGQ in Practice
E[F] ≈ M_{I^ASGQ}[F] = ∑_{β∈I^ASGQ} ∆[F_β]
The construction of I^ASGQ is done by profit thresholding:
I^ASGQ = {β ∈ N^d_+ : P_β ≥ T}.
Profit of a hierarchical surplus: P_β = ∆E_β / ∆W_β.
Error contribution: ∆E_β = M_{I∪{β}} − M_I.
Work contribution: ∆W_β = Work[M_{I∪{β}}] − Work[M_I].
Figure 2.2: A posteriori, adaptive construction as in (Beck et al. 2012;
Haji-Ali et al. 2016): given an index set I_k, compute the profits of the
neighbor indices and select the most profitable one.
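The profit-thresholding loop can be sketched abstractly; `profit` and `neighbors` are assumed to be supplied by the user (in practice P_β would come from the ∆E_β/∆W_β estimates above):

```python
def asgq(profit, neighbors, beta0, threshold):
    """Greedy profit-thresholded construction of the index set I^ASGQ (sketch).

    profit: callable(beta, I) -> P_beta for a candidate index beta.
    neighbors: callable(I) -> admissible forward neighbors of the set I.
    Repeatedly adds the most profitable neighbor until every candidate's
    profit falls below the threshold T.
    """
    I = {beta0}
    while True:
        candidates = [b for b in neighbors(I) if b not in I]
        if not candidates:
            break
        best = max(candidates, key=lambda b: profit(b, I))
        if profit(best, I) < threshold:
            break
        I.add(best)
    return I
```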
42. Wiener Path Generation Methods
{t_i}_{i=0}^N: grid of time steps; {B_{t_i}}_{i=0}^N: Brownian motion values.
Random walk
▸ Proceeds incrementally: given B_{t_i},
B_{t_{i+1}} = B_{t_i} + √∆t Z_i,  Z_i ∼ N(0, 1).
▸ All components of Z = (Z_1, . . . , Z_N) have the same scale of importance (in
terms of variance): isotropic.
Hierarchical Brownian bridge
▸ Given a past value B_{t_i} and a future value B_{t_k}, the value B_{t_j} (with
t_i < t_j < t_k) can be generated according to (ρ = (j − i)/(k − i))
B_{t_j} = (1 − ρ) B_{t_i} + ρ B_{t_k} + Z_j √(ρ(1 − ρ)(k − i)∆t),  Z_j ∼ N(0, 1).   (2.3)
▸ The most important values (capturing a large part of the total variance) are
the first components of Z = (Z_1, . . . , Z_N).
▸ Reduces the effective dimension (# important dimensions) through the
anisotropy between different directions ⇒ Faster convergence of deterministic
quadrature methods.
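A sketch of the midpoint-displacement construction implied by (2.3) with ρ = 1/2 at each refinement; N is assumed a power of two, and ordering Z so that Z[0] fixes the endpoint reflects the "most important components first" point above:

```python
import numpy as np

def brownian_bridge_path(Z, T):
    """Hierarchical (Brownian-bridge) construction of a Brownian path on
    [0, T] from N standard normals, following eq. (2.3) with rho = 1/2.

    Z[0] fixes the endpoint B_T (the coarsest, most important scale);
    later components fill midpoints of ever finer intervals.
    Returns B at the N + 1 grid points t_i = i*T/N.
    """
    N = len(Z)
    B = np.zeros(N + 1)
    B[N] = np.sqrt(T) * Z[0]  # endpoint first
    h, used = N, 1
    while h > 1:
        half = h // 2
        for left in range(0, N, h):  # fill midpoint of [t_left, t_{left+h}]
            mid, right = left + half, left + h
            var = (half * T / N) / 2.0  # rho(1-rho)(k-i)dt with rho = 1/2
            B[mid] = 0.5 * (B[left] + B[right]) + np.sqrt(var) * Z[used]
            used += 1
        h = half
    return B
```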
43. 1 Part I: Multilevel Monte Carlo for Stochastic Reaction Networks
Introduction
Multilevel Hybrid Split-Step Implicit Tau-Leap (TL)
(Ben Hammouda, Moraes, and Tempone 2017)
Pathwise Importance Sampling for a Robust and Efficient
Multilevel Estimator (Ben Hammouda, Ben Rached, and
Tempone 2019)
2 Part II: Hierarchical Approximations and Smoothing Techniques for
Option Pricing
Introduction
Option Pricing under the Rough Bergomi Model (Bayer,
Ben Hammouda, and Tempone 2020a)
Numerical Smoothing for Efficient Option Pricing and Density
Estimation (Bayer, Ben Hammouda, and Tempone 2020b)
3 Academic Outcomes of My PhD
4 Backup Slides (More Details)
44. Modeling Stock Price Dynamics I
Stock prices S_t are often modeled as
dS_t = µ_t S_t dt + σ_t S_t dW_t,   (2.4)
µ_t: drift; W_t: Brownian motion; σ_t: the volatility.
Classical Black–Scholes (BS) model: σ_t = σ (constant).
Caveat: market implied volatilities are far from constant across strikes and maturities:
Figure 2.8: SPX implied volatility surface (June 17, 2020), against log-moneyness
k = log(K/S_0) and time to maturity (years). SPX: index based on the 500 largest
traded companies in the US.
45. Local and Stochastic Volatility Models
Local volatility models: The volatility is given as a
deterministic function: σ_t = σ(S_t, t).
Stochastic volatility models: The volatility is given as a
stochastic process, described by an SDE with respect to a
Brownian motion. (Notation: v_t = σ_t^2)
▸ The Heston model
dS_t = µ S_t dt + √v_t S_t dW^1_t
dv_t = κ(θ − v_t) dt + ξ √v_t ( ρ dW^1_t + √(1 − ρ^2) dW^2_t ),
where κ, θ, ξ > 0 are parameters and ρ ∈ [−1, 1] is the correlation between
the two standard Brownian motions W^1 and W^2.
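As an illustration of simulating such a model, a minimal Euler sketch of the Heston dynamics above (the full-truncation flooring of v at zero is our own choice for handling the square root, not prescribed by the slides):

```python
import numpy as np

def heston_paths(S0, v0, mu, kappa, theta, xi, rho, T, N, M, rng):
    """Euler discretization of the Heston model with full truncation:
    v is floored at 0 wherever it enters the drift or the diffusion.
    Returns the terminal prices S_T for M paths on N time steps."""
    dt = T / N
    S = np.full(M, float(S0))
    v = np.full(M, float(v0))
    for _ in range(N):
        Z1 = rng.standard_normal(M)
        Z2 = rho * Z1 + np.sqrt(1 - rho**2) * rng.standard_normal(M)
        vp = np.maximum(v, 0.0)  # full truncation
        S = S * (1 + mu * dt + np.sqrt(vp * dt) * Z1)
        v = v + kappa * (theta - vp) * dt + xi * np.sqrt(vp * dt) * Z2
    return S
```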
46. Rough Volatility 6
6
Jim Gatheral, Thibault Jaisson, and Mathieu Rosenbaum. “Volatility is rough”.
In: Quantitative Finance 18.6 (2018), pp. 933–949
47. The Rough Bergomi Model 7
This model, under a pricing measure, is given by
dS_t = √v_t S_t dZ_t,
v_t = ξ_0(t) exp( η W̃^H_t − (1/2) η^2 t^{2H} ),
Z_t = ρ W^1_t + ρ̄ W^⊥_t ≡ ρ W^1_t + √(1 − ρ^2) W^⊥_t,   (2.5)
(W^1, W^⊥): two independent standard Brownian motions.
W̃^H is a Riemann–Liouville process, defined by
W̃^H_t = ∫_0^t K^H(t − s) dW^1_s,  t ≥ 0,
K^H(t − s) = √(2H) (t − s)^{H−1/2},  ∀ 0 ≤ s ≤ t.   (2.6)
H ∈ (0, 1/2] controls the roughness of the paths, ρ ∈ [−1, 1] and η > 0.
t ↦ ξ_0(t): forward variance curve at time 0; ξ_s(t) = E^Q_s[v_t],
0 ≤ s ≤ t (ξ_s(t) is a martingale in s).
7 Christian Bayer, Peter Friz, and Jim Gatheral. “Pricing under rough volatility”.
In: Quantitative Finance 16.6 (2016), pp. 887–904
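A crude left-rule discretization of the Volterra integral in (2.6), only to fix ideas; the hybrid and exact schemes mentioned later treat the kernel singularity near s = t properly, which this sketch does not. For H = 1/2 the kernel equals 1 and the process reduces to W^1 itself, which the construction reproduces exactly:

```python
import numpy as np

def rl_process(H, T, N, rng):
    """One path of the Riemann-Liouville process on the grid t_i = i*T/N,
    via the left-point Riemann sum
        W~^H_{t_i} ~ sum_{k<i} K^H(t_i - t_k) (W^1_{t_{k+1}} - W^1_{t_k}),
    which avoids evaluating the kernel at its singularity but is
    inaccurate near s -> t for small H."""
    dt = T / N
    dW = np.sqrt(dt) * rng.standard_normal(N)
    W_H = np.zeros(N + 1)
    for i in range(1, N + 1):
        s = np.arange(i) * dt  # left endpoints t_0, ..., t_{i-1}
        W_H[i] = np.sum(np.sqrt(2 * H) * (i * dt - s) ** (H - 0.5) * dW[:i])
    return W_H
```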
48. Model Challenges
Numerically:
▸ The model is non-Markovian and non-affine ⇒ Standard numerical
methods (PDEs, characteristic functions) seem inapplicable.
▸ The only prevalent pricing method, even for mere vanilla options, is Monte
Carlo (MC) (Bayer, Friz, and Gatheral 2016; McCrickerd and
Pakkanen 2018), which remains a computationally expensive task.
▸ Discretization methods have a poor strong-error behavior
(strong convergence rate of order H ∈ (0, 1/2]) (Neuenkirch and
Shalaiko 2016) ⇒ Variance reduction methods, such as multilevel
Monte Carlo (MLMC), are inefficient for very small values of H.
Theoretically:
▸ No proper weak error analysis has been done in the rough volatility context,
due to
☀ Non-Markovianity ⇒ infinite-dimensional state.
☀ The singularity in the kernel K^H(·) in (2.6).
49. Methodology 8
We design fast hierarchical pricing methods for options whose
underlyings follow the rough Bergomi dynamics, based on
1 Analytic smoothing to uncover available regularity.
2 Approximating the resulting integral of the smoothed payoff using
deterministic quadrature methods:
▸ Adaptive sparse grids quadrature (ASGQ).
▸ Quasi-Monte Carlo (QMC).
3 Combining our methods with hierarchical representations:
▸ Brownian bridges as a Wiener path generation method ⇒ Reduces the
effective dimension of the problem.
▸ Richardson extrapolation (condition: weak error expansion in
∆t) ⇒ Faster convergence of the weak error ⇒ Reduces the number of time
steps (smaller input dimension).
8 Christian Bayer, Chiheb Ben Hammouda, and Raúl Tempone. “Hierarchical
adaptive sparse grids and quasi-Monte Carlo for option pricing under the rough
Bergomi model”. In: Quantitative Finance (2020), pp. 1–17. doi:
10.1080/14697688.2020.1744700
50. Contributions 9
1 First to design a fast pricing method based on deterministic
integration methods for options under rough volatility models.
2 Our proposed methods significantly outperform the MC approach,
which is the only prevalent method in this context.
3 An original way to overcome the high dimensionality of the
integration domain, by
▸ Reducing the total input dimension using Richardson extrapolation.
▸ Combining the Brownian bridge construction with ASGQ or QMC.
4 Our methodology shows a robust performance with respect to the
values of the Hurst parameter H.
5 First to show numerically that both the hybrid and exact schemes
have a weak error of order one, in the practical ranges of
discretization, and for a certain class of payoff functions.
9
Christian Bayer, Chiheb Ben Hammouda, and Raúl Tempone. “Hierarchical
adaptive sparse grids and quasi-Monte Carlo for option pricing under the rough
Bergomi model”. In: Quantitative Finance (2020), pp. 1–17. doi:
10.1080/14697688.2020.1744700 40
51. Conditional Expectation for Analytic Smoothing
$$
\begin{aligned}
C_{\mathrm{rB}}(T,K) &= E\big[(S_T - K)^+\big] \\
&= E\Big[E\big[(S_T - K)^+ \,\big|\, \sigma(W^1(t),\, t \le T)\big]\Big] \\
&= E\bigg[C_{\mathrm{BS}}\Big(S_0 = \exp\Big(\rho \int_0^T \sqrt{v_t}\,\mathrm{d}W^1_t - \tfrac{1}{2}\rho^2 \int_0^T v_t\,\mathrm{d}t\Big),\; k = K,\; \sigma^2 = (1-\rho^2)\int_0^T v_t\,\mathrm{d}t\Big)\bigg] \\
&\approx \int_{\mathbb{R}^{2N}} C_{\mathrm{BS}}\big(G_{\mathrm{rB}}(\mathbf{w}^{(1)}, \mathbf{w}^{(2)})\big)\, \rho_N(\mathbf{w}^{(1)})\, \rho_N(\mathbf{w}^{(2)})\, \mathrm{d}\mathbf{w}^{(1)}\, \mathrm{d}\mathbf{w}^{(2)} \;=\; C^N_{\mathrm{rB}}. \qquad (2.7)
\end{aligned}
$$
Idea: Efficiently approximate the resulting integral (2.7) using ASGQ and QMC,
combined with hierarchical representations (Brownian bridges and Richardson
extrapolation).
Notation:
$C_{\mathrm{BS}}(S_0, k, \sigma^2)$: the Black–Scholes call price for initial spot price $S_0$, strike price $k$, and variance $\sigma^2$;
$G_{\mathrm{rB}}$: maps the $2N$ independent standard Gaussian random inputs to the parameters fed to the Black–Scholes formula;
$\rho_N$: the $N$-variate standard Gaussian density;
$N$: number of time steps.
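The analytic smoothing thus reduces pricing to integrating the smooth map $C_{\mathrm{BS}}$ over Gaussian inputs. A minimal sketch of the Black–Scholes call price written directly in terms of the total variance $\sigma^2$ (the form that appears inside (2.7); function names are ours):

```python
import math

def norm_cdf(x):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_scholes_call(S0, k, total_var):
    """Black-Scholes call price (zero rate), parameterized by the *total*
    variance sigma^2 already integrated over [0, T], as used in (2.7)."""
    if total_var <= 0.0:
        return max(S0 - k, 0.0)    # degenerate case: intrinsic value
    s = math.sqrt(total_var)
    d1 = (math.log(S0 / k) + 0.5 * total_var) / s
    d2 = d1 - s
    return S0 * norm_cdf(d1) - k * norm_cdf(d2)
```

Note that `black_scholes_call` is smooth in all its arguments whenever `total_var > 0`, which is why the outer integral in (2.7) becomes tractable for deterministic quadrature.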
41
52. Numerical Experiments
Table 2.1: Reference solution (using MC with 500 time steps and M = 8 × 10⁶ samples) of the call option price under the rough Bergomi model, for different parameter constellations. The numbers in parentheses are the statistical error estimates.

Parameters | Reference solution
Set 1: H = 0.07, K = 1, S0 = 1, T = 1, ρ = −0.9, η = 1.9, ξ0 = 0.0552 | 0.0791 (5.6e−05)
Set 2: H = 0.02, K = 1, S0 = 1, T = 1, ρ = −0.7, η = 0.4, ξ0 = 0.1 | 0.1246 (9.0e−05)
Set 3: H = 0.02, K = 0.8, S0 = 1, T = 1, ρ = −0.7, η = 0.4, ξ0 = 0.1 | 0.2412 (5.4e−05)
Set 4: H = 0.02, K = 1.2, S0 = 1, T = 1, ρ = −0.7, η = 0.4, ξ0 = 0.1 | 0.0570 (8.0e−05)
Set 1 is the closest to the empirical findings of (Gatheral, Jaisson, and
Rosenbaum 2018), which suggest H ≈ 0.1. The choice η = 1.9 and
ρ = −0.9 is justified by (Bayer, Friz, and Gatheral 2016).
The remaining three sets test the potential of our method in a
very rough regime, where variance reduction methods are inefficient.
42
53. Relative Errors and Computational Gains
Table 2.2: Computational gains achieved by ASGQ and QMC over the MC method to meet a given error tolerance. The ratios (ASGQ/MC) and (QMC/MC) are CPU-time ratios in %, computed for the best configuration with Richardson extrapolation for each method.

Set   | Relative error | CPU time ratio (ASGQ/MC) in % | CPU time ratio (QMC/MC) in %
Set 1 | 1%             | 7%                            | 10%
Set 2 | 0.2%           | 5%                            | 1%
Set 3 | 0.4%           | 4%                            | 5%
Set 4 | 2%             | 20%                           | 10%
43
54. Computational Work of the Different Methods
with their Best Configurations
Figure 2.9: Computational work comparison of the different methods with the
best configurations, for the case of parameter set 1 in Table 2.1.
[Log–log plot of CPU time versus relative error. Fitted slopes: MC + Richardson (level 1) ≈ −2.51; QMC + Richardson (level 1) ≈ −1.57; ASGQ + Richardson (level 2) ≈ −1.27.]
55. 1 Part I: Multilevel Monte Carlo for Stochastic Reaction Networks
Introduction
Multilevel Hybrid Split-Step Implicit Tau-Leap (TL)
(Ben Hammouda, Moraes, and Tempone 2017)
Pathwise Importance Sampling for a Robust and Efficient
Multilevel Estimator (Ben Hammouda, Ben Rached, and
Tempone 2019)
2 Part II: Hierarchical Approximations and Smoothing Techniques for
Option Pricing
Introduction
Option Pricing under the Rough Bergomi Model (Bayer,
Ben Hammouda, and Tempone 2020a)
Numerical Smoothing for Efficient Option Pricing and Density
Estimation (Bayer, Ben Hammouda, and Tempone 2020b)
3 Academic Outcomes of My PhD
4 Backup Slides (More Details)
56. Framework in
(Bayer, Ben Hammouda, and Tempone 2020b)
Efficiently approximate E[g(X(T))].
The payoff g : Rᵈ → R has either jumps or kinks. Given φ : Rᵈ → R:
▸ Hockey-stick functions: g(x) = max(φ(x), 0) (put or call payoffs).
▸ Indicator functions: g(x) = 1₍φ(x)≥0₎ (digital options, distribution functions, ...).
▸ Dirac delta functions: g(x) = δ(φ(x)) (density estimation, financial Greeks).
We assume, for some j,
$$\frac{\partial \varphi}{\partial x_j}(x) > 0, \quad \forall x \in \mathbb{R}^d \quad \text{(monotonicity condition)},$$
and either
$$\lim_{x_j \to +\infty} \varphi(x_j, x_{-j}) = +\infty, \ \forall x_{-j} \in \mathbb{R}^{d-1}, \quad \text{or} \quad \frac{\partial^2 \varphi}{\partial x_j^2}(x) \ge 0, \ \forall x \in \mathbb{R}^d \quad \text{(growth condition)}.$$
Notation: x₋ⱼ denotes the vector of length d − 1 of all variables other than xⱼ in x.
The process X is approximated (via a discretization scheme) by X̄:
▸ One- or multi-dimensional geometric Brownian motion (GBM) process.
▸ Multi-dimensional stochastic volatility model: the Heston model
$$\mathrm{d}X_t = \mu X_t\,\mathrm{d}t + \sqrt{v_t}\, X_t\,\mathrm{d}W^X_t, \qquad \mathrm{d}v_t = \kappa(\theta - v_t)\,\mathrm{d}t + \xi \sqrt{v_t}\,\mathrm{d}W^v_t,$$
where (W^X_t, W^v_t) are correlated Wiener processes with correlation ρ.
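As a point of comparison for the schemes discussed later, a standard full-truncation Euler discretization of the Heston dynamics above can be sketched as follows. This is a baseline scheme, not the smooth OU-based scheme proposed in the thesis, and all parameter values in the usage note are illustrative:

```python
import numpy as np

def heston_euler_paths(S0, v0, mu, kappa, theta, xi, rho, T, N, M, seed=0):
    """Full-truncation Euler scheme for the Heston model (baseline scheme only).
    Returns M terminal asset values S_T simulated with N time steps."""
    rng = np.random.default_rng(seed)
    dt = T / N
    S = np.full(M, float(S0))
    v = np.full(M, float(v0))
    for _ in range(N):
        z1 = rng.standard_normal(M)
        # correlate the variance driver with the asset driver
        z2 = rho * z1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(M)
        vp = np.maximum(v, 0.0)                      # full truncation of v
        S *= np.exp((mu - 0.5 * vp) * dt + np.sqrt(vp * dt) * z1)
        v += kappa * (theta - vp) * dt + xi * np.sqrt(vp * dt) * z2
    return S
```

The truncation `max(v, 0)` keeps the square root well-defined but introduces exactly the kind of low regularity in the variance path that motivates the smoother OU-based scheme.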
57. Our Methodology for Addressing
Option Pricing Challenges10
We design efficient hierarchical pricing and density estimation methods:
1 Numerical smoothing to uncover the available regularity:
▸ Root finding to determine the discontinuity location.
▸ Pre-integration (conditional expectation): one-dimensional
integration with respect to a single, well-chosen variable.
2 Approximating the resulting integral of the smoothed payoff:
▸ Adaptive sparse grids quadrature (ASGQ) combined with
hierarchical representations to overcome the high dimensionality:
☀ Brownian bridges ⇒ smaller effective dimension of the problem.
☀ Richardson extrapolation ⇒ smaller total dimension of the
input space.
▸ MLMC estimator combined with numerical smoothing.
10
Christian Bayer, Chiheb Ben Hammouda, and Raúl Tempone. “Numerical
smoothing and hierarchical approximations for efficient option pricing and density
estimation”. In: arXiv preprint arXiv:2003.05708 (2020)
58. Contributions in the Context of
Deterministic Quadrature Methods
In (Bayer, Ben Hammouda, and Tempone 2020b):
1 We consider cases where analytic smoothing is not possible
(Bayer, Siebenmorgen, and Tempone 2018; Xiao and Wang 2018;
Bayer, Ben Hammouda, and Tempone 2020a) and introduce a novel
numerical smoothing technique, combined with hierarchical
quadrature methods.
2 Our novel approach substantially outperforms the MC method in
high-dimensional cases and for dynamics that require discretization.
3 We provide a smoothness analysis of the smoothed integrand in the
time-stepping setting.
4 We show that traditional schemes for the Heston dynamics have low
regularity, and propose a smooth scheme to simulate the volatility
process based on a sum of Ornstein–Uhlenbeck (OU) processes.
59. Contributions in the Context of MLMC Methods 11
1 Compared to (Giles 2008b; Giles, Debrabant, and Rößler 2013),
our approach can be applied when analytic smoothing is not
possible.
2 Compared to the case without smoothing:
▸ We significantly reduce the kurtosis at the deep levels of MLMC.
▸ We improve the strong convergence rate ⇒ the MLMC complexity
improves from $O(\mathrm{TOL}^{-2.5})$ to $O(\mathrm{TOL}^{-2} \log(\mathrm{TOL})^2)$,
without resorting to higher-order schemes such as the Milstein
scheme used in (Giles 2008b; Giles, Debrabant, and Rößler 2013).
3 Contrary to (Giles, Nagapetyan, and Ritter 2015), our numerical
smoothing approach:
▸ Does not deteriorate the strong convergence behavior.
▸ Is easier to apply for any dynamics and QoI (no prior knowledge of
the degree of smoothness of the integrand is needed).
▸ When estimating densities: our pointwise error does not increase
exponentially with respect to the dimension of the state vector.
11
Christian Bayer, Chiheb Ben Hammouda, and Raúl Tempone. “Numerical
smoothing and hierarchical approximations for efficient option pricing and density
estimation”. In: arXiv preprint arXiv:2003.05708 (2020)
60. Numerical Smoothing and Pre-Integration Steps
$$
\begin{aligned}
E[g(X(T))] \approx E\big[g\big(\bar{X}^{\Delta t}(T)\big)\big]
&= \int_{\mathbb{R}^{d \times N}} G(\mathbf{z})\, \rho_{d \times N}(\mathbf{z})\, \mathrm{d}z^{(1)}_1 \cdots \mathrm{d}z^{(1)}_N \cdots \mathrm{d}z^{(d)}_1 \cdots \mathrm{d}z^{(d)}_N \\
&= \int_{\mathbb{R}^{dN-1}} I\big(y_{-1}, z^{(1)}_{-1}, \dots, z^{(d)}_{-1}\big)\, \rho_{d-1}(y_{-1})\, \mathrm{d}y_{-1}\, \rho_{dN-d}\big(z^{(1)}_{-1}, \dots, z^{(d)}_{-1}\big)\, \mathrm{d}z^{(1)}_{-1} \cdots \mathrm{d}z^{(d)}_{-1} \\
&= E\big[I\big(Y_{-1}, Z^{(1)}_{-1}, \dots, Z^{(d)}_{-1}\big)\big] \;\approx\; E\big[\bar{I}\big(Y_{-1}, Z^{(1)}_{-1}, \dots, Z^{(d)}_{-1}\big)\big], \qquad (2.8)
\end{aligned}
$$
with
$$
\begin{aligned}
I\big(y_{-1}, z^{(1)}_{-1}, \dots, z^{(d)}_{-1}\big)
&= \int_{\mathbb{R}} G\big(y_1, y_{-1}, z^{(1)}_{-1}, \dots, z^{(d)}_{-1}\big)\, \rho_{y_1}(y_1)\, \mathrm{d}y_1 \\
&= \int_{-\infty}^{y^*_1} G\big(y_1, y_{-1}, z^{(1)}_{-1}, \dots, z^{(d)}_{-1}\big)\, \rho_{y_1}(y_1)\, \mathrm{d}y_1
 + \int_{y^*_1}^{+\infty} G\big(y_1, y_{-1}, z^{(1)}_{-1}, \dots, z^{(d)}_{-1}\big)\, \rho_{y_1}(y_1)\, \mathrm{d}y_1 \\
&\approx \bar{I}\big(y_{-1}, z^{(1)}_{-1}, \dots, z^{(d)}_{-1}\big)
 = \sum_{k=0}^{N_q} \eta_k\, G\big(\zeta_k(\bar{y}^*_1), y_{-1}, z^{(1)}_{-1}, \dots, z^{(d)}_{-1}\big), \qquad (2.9)
\end{aligned}
$$
Notation:
$G$ maps the $N \times d$ Gaussian random inputs to $g(\bar{X}^{\Delta t}(T))$;
$Y = A Z_1$, with $Z_1 = (Z^{(1)}_1, \dots, Z^{(d)}_1)$ the most important directions, and $A$ a problem-dependent rotation matrix;
$y^*_1$: the exact discontinuity location; $\bar{y}^*_1$: the discontinuity location approximated via root finding;
$N_q$: the number of Laguerre quadrature points $\zeta_k \in \mathbb{R}$, with corresponding weights $\eta_k$;
$\rho_{d \times N}(\mathbf{z}) = (2\pi)^{-dN/2}\, e^{-\frac{1}{2}\mathbf{z}^T\mathbf{z}}$.
48
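The pre-integration step (2.9) splits the inner Gaussian integral at the discontinuity and applies quadrature on each half-line. A minimal one-dimensional sketch of this idea, assuming a single jump location found by root finding and using Gauss–Laguerre nodes on each half-line (the substitution y = y* ± t maps each half-line to [0, ∞); all names here are illustrative, not the thesis code):

```python
import numpy as np
from scipy.optimize import brentq

def pre_integrate(G, y_star, n_nodes=64):
    """Approximate  ∫_R G(y) φ(y) dy  for an integrand G with a single
    kink/jump at y_star, by splitting at y_star and applying Gauss-Laguerre
    quadrature on each half-line."""
    t, w = np.polynomial.laguerre.laggauss(n_nodes)
    phi = lambda y: np.exp(-0.5 * y**2) / np.sqrt(2.0 * np.pi)
    # the factor e^t cancels the Laguerre weight e^{-t} built into (t, w)
    right = np.sum(w * G(y_star + t) * phi(y_star + t) * np.exp(t))
    left = np.sum(w * G(y_star - t) * phi(y_star - t) * np.exp(t))
    return left + right

# Example: digital-type integrand G(y) = 1_{y >= c}; the jump location is
# found by root finding (trivial here, for illustration only)
c = 0.3
G = lambda y: (y >= c).astype(float)
y_star = brentq(lambda y: y - c, -5.0, 5.0)
p = pre_integrate(G, y_star)   # ≈ 1 - Φ(0.3), the Gaussian tail probability
```

Because the quadrature nodes never straddle the discontinuity, the resulting map (the analogue of Ī) is smooth in the remaining variables.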
61. Numerical Smoothing and Pre-Integration Steps
$$
\begin{aligned}
E[g(X(T))] \approx E\big[g\big(\bar{X}^{\Delta t}(T)\big)\big]
&= \int_{\mathbb{R}^{d \times N}} G(\mathbf{z})\, \rho_{d \times N}(\mathbf{z})\, \mathrm{d}z^{(1)}_1 \cdots \mathrm{d}z^{(1)}_N \cdots \mathrm{d}z^{(d)}_1 \cdots \mathrm{d}z^{(d)}_N \\
&= \int_{\mathbb{R}^{dN-1}} I\big(y_{-1}, z^{(1)}_{-1}, \dots, z^{(d)}_{-1}\big)\, \rho_{d-1}(y_{-1})\, \mathrm{d}y_{-1}\, \rho_{dN-d}\big(z^{(1)}_{-1}, \dots, z^{(d)}_{-1}\big)\, \mathrm{d}z^{(1)}_{-1} \cdots \mathrm{d}z^{(d)}_{-1} \\
&= E\big[I\big(Y_{-1}, Z^{(1)}_{-1}, \dots, Z^{(d)}_{-1}\big)\big] \;\approx\; E\big[\bar{I}\big(Y_{-1}, Z^{(1)}_{-1}, \dots, Z^{(d)}_{-1}\big)\big]. \qquad (2.10)
\end{aligned}
$$
In (Bayer, Ben Hammouda, and Tempone 2020b) we show:
▸ I and Ī are highly smooth functions.
▸ How to choose the transformation A and how to approximate Ī.
▸ How the ASGQ and MLMC methods efficiently approximate the expectation in (2.10).
49
62. Density Estimation
Goal: Approximate the density ρ_X at u, for a stochastic process X:
ρ_X(u) = E[δ(X − u)], where δ is the Dirac delta function.
Without any smoothing technique (regularization, kernel
density, ...), MC and MLMC fail, due to the infinite variance caused by
the singularity of δ.
Strategy in (Bayer, Ben Hammouda, and Tempone 2020b):
1 Exact conditioning with respect to the Brownian bridge:
$$\rho_X(u) = \frac{1}{\sqrt{2\pi}}\, E\!\left[\exp\!\big(-(y^*_1(u))^2/2\big)\, \frac{\mathrm{d}y^*_1}{\mathrm{d}x}(u)\right]
\approx \frac{1}{\sqrt{2\pi}}\, E\!\left[\exp\!\big(-(\bar{y}^*_1(u))^2/2\big)\, \frac{\mathrm{d}\bar{y}^*_1}{\mathrm{d}x}(u)\right], \qquad (2.11)$$
where $y^*_1$ is the exact discontinuity and $\bar{y}^*_1$ the approximated one.
2 We use the MLMC method to efficiently approximate (2.11).
Kernel density techniques or parametric regularization, as in (Giles,
Nagapetyan, and Ritter 2015), lead to a pointwise error that increases
exponentially with respect to the dimension of the state vector X.
50
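For a one-step GBM, the conditioning in (2.11) can be carried out in closed form, which makes it easy to verify: the map from the Gaussian input to X_T is monotone, y*(u) is explicit, no outer expectation remains, and the formula reproduces the lognormal density. A small illustrative sketch (single exact time step; parameter values are ours):

```python
import math

def gbm_density_via_conditioning(u, X0=1.0, mu=0.05, sigma=0.2, T=1.0):
    """Density of X_T = X0*exp((mu - sigma^2/2)*T + sigma*sqrt(T)*Y),
    Y ~ N(0,1), evaluated via the conditioning formula (2.11): with a single
    Gaussian input, y*(u) is explicit and no outer expectation is left."""
    # exact "discontinuity" location: the y solving X_T(y) = u
    y_star = (math.log(u / X0) - (mu - 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    dy_du = 1.0 / (u * sigma * math.sqrt(T))     # dy*/du by the chain rule
    return math.exp(-0.5 * y_star**2) / math.sqrt(2.0 * math.pi) * dy_du
```

Summing the returned values over a fine grid of u recovers total mass one, as it must for a probability density; in the multi-step or multi-dimensional case, y* is no longer explicit and the root finding plus MLMC machinery of the thesis takes over.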
63. Error Discussion for ASGQ
(Bayer, Ben Hammouda, and Tempone 2020b)
$Q_N$: the ASGQ estimator.
$$
\begin{aligned}
E[g(X(T))] - Q_N
&= \underbrace{E[g(X(T))] - E\big[g\big(\bar{X}^{\Delta t}(T)\big)\big]}_{\text{Error I: bias (weak error)} \;=\; O(\Delta t)} \\
&\quad + \underbrace{E\big[I\big(Y_{-1}, Z^{(1)}_{-1}, \dots, Z^{(d)}_{-1}\big)\big] - E\big[\bar{I}\big(Y_{-1}, Z^{(1)}_{-1}, \dots, Z^{(d)}_{-1}\big)\big]}_{\text{Error II: numerical smoothing error} \;=\; O(N_q^{-s}) + O(\mathrm{TOL}_{\mathrm{Newton}}^{\kappa+1})} \\
&\quad + \underbrace{E\big[\bar{I}\big(Y_{-1}, Z^{(1)}_{-1}, \dots, Z^{(d)}_{-1}\big)\big] - Q_N}_{\text{Error III: ASGQ error} \;=\; O(N_{\mathrm{ASGQ}}^{-p})}, \qquad (2.12)
\end{aligned}
$$
so that
$$E_{\text{total, ASGQ}} = E[g(X(T))] - Q_N = O(\Delta t) + O\big(N_{\mathrm{ASGQ}}^{-p}\big) + O\big(N_q^{-s}\big) + O\big(\mathrm{TOL}_{\mathrm{Newton}}^{\kappa+1}\big).$$
Under certain conditions on the regularity parameters s and p (p, s ≫ 1), we show:
▸ ASGQ: Work_ASGQ = O(TOL⁻¹) (best case).
▸ MC: Work_MC = O(TOL⁻³) (best case).
Notation:
p = p(N,d) > 0: mixed derivatives of Ī in (2.9), in the dN − 1 dimensional space, are bounded up to order p;
s > 0: derivatives of G in (2.9) with respect to y₁ are bounded up to order s;
N_ASGQ: the number of quadrature points used by the ASGQ estimator;
N_q: the number of points used by the Laguerre quadrature in the one-dimensional pre-integration step;
TOL_Newton: tolerance of the Newton method;
κ ≥ 0 (κ = 0: Heaviside payoff (digital option); κ = 1: call or put payoffs).
51
64. Numerical Results for ASGQ
Example                        | Total relative error | CPU time ratio (ASGQ/MC) in %
Single digital option (GBM)    | 0.7%                 | 0.7%
Single call option (GBM)       | 0.5%                 | 0.8%
4d-basket call option (GBM)    | 0.8%                 | 7.4%
Single digital option (Heston) | 0.6%                 | 6.2%
Single call option (Heston)    | 0.5%                 | 17.2%
Table 2.3: Summary of relative errors and computational gains, achieved by
the different methods. In this table, we highlight the computational gains
achieved by ASGQ over MC method to meet a certain error tolerance. We
note that the ratios are computed for the best configuration with Richardson
extrapolation for each method.
52
65. Numerical Results for MLMC
Method                                         | κ_L | α | β   | γ | Numerical complexity
Without smoothing + digital under GBM          | 709 | 1 | 1/2 | 1 | O(TOL⁻²·⁵)
With numerical smoothing + digital under GBM   | 3   | 1 | 1   | 1 | O(TOL⁻² (log TOL)²)
Without smoothing + digital under Heston       | 245 | 1 | 1/2 | 1 | O(TOL⁻²·⁵)
With numerical smoothing + digital under Heston| 7   | 1 | 1   | 1 | O(TOL⁻² (log TOL)²)
With numerical smoothing + GBM density         | 5   | 1 | 1   | 1 | O(TOL⁻² (log TOL)²)
With numerical smoothing + Heston density      | 8   | 1 | 1   | 1 | O(TOL⁻² (log TOL)²)

Table 2.4: Summary of the MLMC numerical results observed for the different examples. κ_L is the kurtosis at the deepest MLMC levels; (α, β, γ) are the weak, strong, and work rates, respectively; TOL is the user-selected MLMC tolerance.
53
66. Digital Option under the Heston Model:
Without Smoothing
[Four-panel MLMC convergence diagnostics across levels ℓ = 0..6; the kurtosis panel grows to ≈ 200 at the deepest levels.]
Figure 2.10: Digital option under Heston: Convergence plots of MLMC
without smoothing, combined with the fixed truncation scheme.
54
67. Digital Option under the Heston Model
With Numerical Smoothing
[Four-panel MLMC convergence diagnostics across levels ℓ = 0..7; the kurtosis panel stays between ≈ 8 and 12.]
Figure 2.11: Digital option under Heston: Convergence plots for MLMC with
numerical smoothing, combined with the Heston OU based scheme.
55
68. Digital Option under the Heston Model:
Numerical Complexity Comparison
[Log–log plot of expected work E[W] versus TOL: MLMC without smoothing follows ∝ TOL⁻²·⁵; MLMC with numerical smoothing follows ∝ TOL⁻² log(TOL)².]
Figure 2.12: Digital option under Heston: Comparison of the numerical
complexity of i) standard MLMC (based on fixed truncation scheme), and ii)
MLMC with numerical smoothing (based on Heston OU based scheme).
56
69. 1 Part I: Multilevel Monte Carlo for Stochastic Reaction Networks
Introduction
Multilevel Hybrid Split-Step Implicit Tau-Leap (TL)
(Ben Hammouda, Moraes, and Tempone 2017)
Pathwise Importance Sampling for a Robust and Efficient
Multilevel Estimator (Ben Hammouda, Ben Rached, and
Tempone 2019)
2 Part II: Hierarchical Approximations and Smoothing Techniques for
Option Pricing
Introduction
Option Pricing under the Rough Bergomi Model (Bayer,
Ben Hammouda, and Tempone 2020a)
Numerical Smoothing for Efficient Option Pricing and Density
Estimation (Bayer, Ben Hammouda, and Tempone 2020b)
3 Academic Outcomes of My PhD
4 Backup Slides (More Details)
70. Academic Outcomes of My PhD
Publications:
1 Chiheb Ben Hammouda, Alvaro Moraes, and Raúl Tempone. “Multilevel hybrid
split-step implicit tau-leap”. In: Numerical Algorithms 74.2 (2017), pp. 527–560.
2 Christian Bayer, Chiheb Ben Hammouda, and Raúl Tempone. “Hierarchical
adaptive sparse grids and quasi-Monte Carlo for option pricing under the rough
Bergomi model”. In: Quantitative Finance (2020), pp. 1–17. doi:
10.1080/14697688.2020.1744700.
3 Chiheb Ben Hammouda, Nadhir Ben Rached, and Raúl Tempone. “Importance
sampling for a robust and efficient multilevel Monte Carlo estimator for
stochastic reaction networks”. In: arXiv preprint arXiv:1911.06286 (2019).
4 Christian Bayer, Chiheb Ben Hammouda, and Raúl Tempone. “Numerical
smoothing and hierarchical approximations for efficient option pricing and
density estimation”. In: arXiv preprint arXiv:2003.05708 (2020).
Prize: Best poster award at the SIAM Conference on Financial Mathematics
(FM19), Toronto, Canada, 2019.
Co-supervised two Bachelor theses and one Master thesis:
1 Statistical and Numerical Analysis of Rough Volatility Models.
2 Efficient Option Pricing Using Fourier Techniques.
3 Importance Sampling for Pure Jump Processes.
57
72. 1 Part I: Multilevel Monte Carlo for Stochastic Reaction Networks
Introduction
Multilevel Hybrid Split-Step Implicit Tau-Leap (TL)
(Ben Hammouda, Moraes, and Tempone 2017)
Pathwise Importance Sampling for a Robust and Efficient
Multilevel Estimator (Ben Hammouda, Ben Rached, and
Tempone 2019)
2 Part II: Hierarchical Approximations and Smoothing Techniques for
Option Pricing
Introduction
Option Pricing under the Rough Bergomi Model (Bayer,
Ben Hammouda, and Tempone 2020a)
Numerical Smoothing for Efficient Option Pricing and Density
Estimation (Bayer, Ben Hammouda, and Tempone 2020b)
3 Academic Outcomes of My PhD
4 Backup Slides (More Details)
73. References I
David F Anderson. “A modified next reaction method for simulating chemical
systems with time dependent propensities and delays”. In: The Journal of chemical
physics 127.21 (2007), p. 214107.
David F. Anderson and Desmond J. Higham. “Multilevel Monte Carlo for continuous
time Markov chains, with applications in biochemical kinetics”. In: SIAM Multiscale
Model. Simul. 10.1 (2012).
David F Anderson and Thomas G Kurtz. Stochastic analysis of biochemical systems.
Vol. 1. Springer, 2015.
Juan P Aparicio and Hernán G Solari. “Population dynamics: Poisson approximation
and its relation to the Langevin process”. In: Physical Review Letters 86.18 (2001),
p. 4183.
Christian Bayer, Chiheb Ben Hammouda, and Raúl Tempone. “Hierarchical adaptive
sparse grids and quasi-Monte Carlo for option pricing under the rough Bergomi
model”. In: Quantitative Finance (2020), pp. 1–17. doi:
10.1080/14697688.2020.1744700.
Christian Bayer, Chiheb Ben Hammouda, and Raúl Tempone. “Numerical smoothing
and hierarchical approximations for efficient option pricing and density estimation”.
In: arXiv preprint arXiv:2003.05708 (2020).
58
74. References II
Christian Bayer, Peter Friz, and Jim Gatheral. “Pricing under rough volatility”. In:
Quantitative Finance 16.6 (2016), pp. 887–904.
Christian Bayer, Markus Siebenmorgen, and Raúl Tempone. “Smoothing the payoff
for efficient computation of basket option pricing.” In: Quantitative Finance 18.3
(2018), pp. 491–505.
Joakim Beck et al. “On the optimal polynomial approximation of stochastic PDEs by
Galerkin and collocation methods”. In: Mathematical Models and Methods in
Applied Sciences 22.09 (2012), p. 1250023.
Chiheb Ben Hammouda, Nadhir Ben Rached, and Raúl Tempone. “Importance
sampling for a robust and efficient multilevel Monte Carlo estimator for stochastic
reaction networks”. In: arXiv preprint arXiv:1911.06286 (2019).
Chiheb Ben Hammouda, Alvaro Moraes, and Raúl Tempone. “Multilevel hybrid
split-step implicit tau-leap”. In: Numerical Algorithms 74.2 (2017), pp. 527–560.
Mikkel Bennedsen, Asger Lunde, and Mikko S Pakkanen. “Hybrid scheme for
Brownian semistationary processes”. In: Finance and Stochastics 21.4 (2017),
pp. 931–965.
59
75. References III
Corentin Briat, Ankit Gupta, and Mustafa Khammash. “A Control Theory for
Stochastic Biomolecular Regulation”. In: SIAM Conference on Control Theory and
its Applications. SIAM. 2015.
K Andrew Cliffe et al. “Multilevel Monte Carlo methods and applications to elliptic
PDEs with random coefficients”. In: Computing and Visualization in Science 14.1
(2011), p. 3.
Nathan Collier et al. “A continuation multilevel Monte Carlo algorithm”. In: BIT
Numerical Mathematics 55.2 (2014), pp. 399–432.
Michael B Elowitz et al. “Stochastic gene expression in a single cell”. In: Science
297.5584 (2002), pp. 1183–1186.
Stewart N. Ethier and Thomas G. Kurtz. Markov processes : characterization and
convergence. Wiley series in probability and mathematical statistics. New York,
Chichester: J. Wiley & Sons, 1986.
Jim Gatheral, Thibault Jaisson, and Mathieu Rosenbaum. “Volatility is rough”. In:
Quantitative Finance 18.6 (2018), pp. 933–949.
60
76. References IV
Michael Giles, Kristian Debrabant, and Andreas Rößler. “Numerical analysis of
multilevel Monte Carlo path simulation using the Milstein discretisation”. In: arXiv
preprint arXiv:1302.4676 (2013).
Michael B Giles. “Multilevel Monte Carlo methods”. In: Acta Numerica 24 (2015),
pp. 259–328.
Michael B Giles. “Multilevel Monte Carlo path simulation”. In: Operations Research
56.3 (2008), pp. 607–617.
Michael B Giles, Tigran Nagapetyan, and Klaus Ritter. “Multilevel Monte Carlo
approximation of distribution functions and densities”. In: SIAM/ASA Journal on
Uncertainty Quantification 3.1 (2015), pp. 267–295.
Mike Giles. “Improved multilevel Monte Carlo convergence using the Milstein
scheme”. In: Monte Carlo and Quasi-Monte Carlo Methods 2006. Springer, 2008,
pp. 343–358.
Daniel T Gillespie. “A general method for numerically simulating the stochastic time
evolution of coupled chemical reactions”. In: Journal of computational physics 22.4
(1976), pp. 403–434.
61
77. References V
Daniel T Gillespie. “Approximate accelerated stochastic simulation of chemically
reacting systems”. In: The Journal of Chemical Physics 115.4 (2001), pp. 1716–1733.
Abdul-Lateef Haji-Ali et al. “Multi-index stochastic collocation for random PDEs”.
In: Computer Methods in Applied Mechanics and Engineering 306 (2016), pp. 95–122.
J Michael Harrison and Stanley R Pliska. “Martingales and stochastic integrals in
the theory of continuous trading”. In: Stochastic processes and their applications
11.3 (1981), pp. 215–260.
Sebastian C Hensel, James B Rawlings, and John Yin. “Stochastic kinetic modeling
of vesicular stomatitis virus intracellular growth”. In: Bulletin of mathematical
biology 71.7 (2009), pp. 1671–1692.
Ahmed Kebaier et al. “Statistical Romberg extrapolation: a new variance reduction
method and applications to option pricing”. In: The Annals of Applied Probability
15.4 (2005), pp. 2681–2705.
Thomas G. Kurtz. “Representation and approximation of counting processes”. In:
Advances in Filtering and Optimal Stochastic Control. Vol. 42. Lecture Notes in
Control and Information Sciences. Springer Berlin Heidelberg, 1982, pp. 177–191.
62
78. References VI
Ryan McCrickerd and Mikko S Pakkanen. “Turbocharging Monte Carlo pricing for
the rough Bergomi model”. In: Quantitative Finance (2018), pp. 1–10.
Alvaro Moraes, Raúl Tempone, and Pedro Vilanova. “Multilevel hybrid Chernoff
tau-leap”. In: BIT Numerical Mathematics 56.1 (2016), pp. 189–239.
Andreas Neuenkirch and Taras Shalaiko. “The order barrier for strong approximation
of rough volatility models”. In: arXiv preprint arXiv:1606.03854 (2016).
Dirk Nuyens. The construction of good lattice rules and polynomial lattice rules.
2014.
Arjun Raj et al. “Stochastic mRNA synthesis in mammalian cells”. In: PLoS biology
4.10 (2006), e309.
Christopher V Rao and Adam P Arkin. “Stochastic chemical kinetics and the
quasi-steady-state assumption: Application to the Gillespie algorithm”. In: The
Journal of chemical physics 118.11 (2003), pp. 4999–5010.
Ian H Sloan. “Lattice methods for multiple integration”. In: Journal of
Computational and Applied Mathematics 12 (1985), pp. 131–143.
63
79. References VII
Ye Xiao and Xiaoqun Wang. “Conditional quasi-Monte Carlo methods and
dimension reduction for option pricing and hedging with discontinuous functions”.
In: Journal of Computational and Applied Mathematics 343 (2018), pp. 289–308.
64
80. Deterministic Vs Stochastic
Figure 5.1: DNA transcription and mRNA translation (Briat, Gupta, and Khammash 2015).
Deterministic model:
$$\frac{\mathrm{d}[\mathrm{mRNA}]}{\mathrm{d}t} = -\gamma_r[\mathrm{mRNA}] + k_r, \qquad \frac{\mathrm{d}[\mathrm{protein}]}{\mathrm{d}t} = -\gamma_p[\mathrm{protein}] + k_p[\mathrm{mRNA}].$$
Stochastic model:
The probability that a single mRNA is transcribed in time dt is $k_r\,\mathrm{d}t$.
The probability that a single mRNA is degraded in time dt is $(\#\mathrm{mRNA})\,\gamma_r\,\mathrm{d}t$.
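The deterministic model above is a linear ODE system whose steady state is $k_r/\gamma_r$ for mRNA and $k_p k_r / (\gamma_p \gamma_r)$ for protein. A quick check of that fixed point (the rate constants here are illustrative, not taken from the slides):

```python
from scipy.integrate import solve_ivp

# illustrative rate constants (not from the slides)
k_r, gamma_r = 10.0, 1.0    # transcription / mRNA degradation
k_p, gamma_p = 5.0, 0.5     # translation / protein degradation

def rhs(t, y):
    mrna, protein = y
    return [k_r - gamma_r * mrna,
            k_p * mrna - gamma_p * protein]

sol = solve_ivp(rhs, (0.0, 50.0), [0.0, 0.0], rtol=1e-8, atol=1e-10)
mrna_ss, protein_ss = sol.y[:, -1]
# steady state: mRNA -> k_r/gamma_r = 10, protein -> k_p*k_r/(gamma_p*gamma_r) = 100
```

The stochastic model has the same means for this linear network, but it also captures the fluctuations around them, which is exactly what the deterministic ODE throws away.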
65
81. Why not Chemical Master Equation (CME)?
Notation: $p(x,t) = \mathrm{Prob}(x, t \mid x_0, t_0)$.
CME:
$$\frac{\partial p(x,t)}{\partial t} = \underbrace{\sum_{j=1}^{J} \big(a_j(x - \nu_j)\, p(x - \nu_j, t) - a_j(x)\, p(x,t)\big)}_{=\,\mathcal{A}p(x,t)}, \quad \forall x \in \mathbb{N}^d, \qquad p(x,0) = p_0(x).$$
Numerical approximations are computed on
$$\Omega_\xi = \{x \in \mathbb{N}^d : x_1 < \xi_1, \dots, x_d < \xi_d\} \subset \mathbb{N}^d.$$
Caveat: curse of dimensionality: $\mathcal{A} \in \mathbb{R}^{N \times N}$ with $N = \prod_{i=1}^{d} \xi_i$ very large.
66
82. Simulation of SRNs
Pathwise-exact methods (Exact statistical distribution of the
SRN process)
▸ Stochastic simulation algorithm (SSA) (Gillespie 1976).
▸ Modified next reaction algorithm (MNRA) (Anderson 2007).
Caveat: Computationally expensive.
Pathwise-approximate methods
▸ Explicit tau-leap (explicit-TL) (Gillespie 2001; Aparicio and Solari
2001).
Caveat: The explicit-TL is not adequate when dealing with stiff
systems (systems characterized by having simultaneously fast and
slow timescales) ⇒ Numerical instability issues
▸ Split step implicit tau-leap (SSI-TL) (Ben Hammouda, Moraes, and
Tempone 2017).
67
83. The Stochastic Simulation Algorithm (SSA)
(Gillespie 1976)
1 Initialize x ← x₀ (the initial number of molecules of each species) and t ← 0.
2 In state x at time t, compute the propensities $(a_j(x))_{j=1}^{J}$ and their sum $a_0(x) = \sum_{j=1}^{J} a_j(x)$.
3 Generate two independent uniform(0,1) random numbers r₁ and r₂.
4 Set the time until the next reaction fires:
$$\tau = \frac{1}{a_0} \ln(1/r_1).$$
Caveat: $E[\tau \mid X(t) = x] = (a_0(x))^{-1}$ can be very small.
5 Find j ∈ {1, ..., J} such that
$$a_0^{-1} \sum_{k=1}^{j-1} a_k < r_2 \le a_0^{-1} \sum_{k=1}^{j} a_k,$$
which is equivalent to choosing among reactions {1, ..., J} with the jth
reaction having probability mass function $(a_j(x)/a_0(x))_{j=1}^{J}$.
6 Update: t ← t + τ and x ← x + νⱼ.
7 Record (t, x). Return to step 2 if t < T; otherwise end the simulation.
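The steps above can be sketched compactly for a generic network (propensities and stoichiometry passed in; the pure-birth usage example is illustrative):

```python
import numpy as np

def ssa(x0, nu, propensities, T, rng):
    """Gillespie's stochastic simulation algorithm (SSA).
    x0: initial state; nu: J x d stoichiometric matrix;
    propensities: x -> array of J rates; T: final time.
    Returns the state at time T."""
    x, t = np.array(x0, dtype=float), 0.0
    while True:
        a = propensities(x)
        a0 = a.sum()
        if a0 <= 0.0:
            return x                               # no reaction can fire
        r1, r2 = rng.random(), rng.random()
        tau = np.log(1.0 / r1) / a0                # time to the next firing
        if t + tau > T:
            return x
        j = np.searchsorted(np.cumsum(a) / a0, r2) # pick reaction j w.p. a_j/a0
        t += tau
        x = x + nu[j]

# Usage: pure birth reaction  ∅ -> X  with rate 5; the number of firings by
# T = 1 is Poisson(5), so the sample mean should be close to 5
rng = np.random.default_rng(0)
nu = np.array([[1]])
mean_count = np.mean([ssa([0], nu, lambda x: np.array([5.0]), 1.0, rng)[0]
                      for _ in range(2000)])
```

The expensive case flagged in the caveat shows up here directly: when `a0` is large, `tau` is tiny and the `while` loop runs for very many iterations per path.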
84. Coupling Idea in the Context of SRNs
(Kurtz 1982; Anderson and Higham 2012)
To couple two Poisson random variables P₁(λ₁) and P₂(λ₂), with rates λ₁ and λ₂,
respectively, we define λ* = min{λ₁, λ₂} and consider the decomposition
$$P_1(\lambda_1) = Q(\lambda^\star) + Q_1(\lambda_1 - \lambda^\star), \qquad P_2(\lambda_2) = Q(\lambda^\star) + Q_2(\lambda_2 - \lambda^\star),$$
where $Q(\lambda^\star)$, $Q_1(\lambda_1 - \lambda^\star)$, and $Q_2(\lambda_2 - \lambda^\star)$ are three independent Poisson random variables.
The coupled random variables then have a small variance:
$$\mathrm{Var}[P_1(\lambda_1) - P_2(\lambda_2)] = \mathrm{Var}[Q_1(\lambda_1 - \lambda^\star) - Q_2(\lambda_2 - \lambda^\star)] = |\lambda_1 - \lambda_2|.$$
Observe: if P₁(λ₁) and P₂(λ₂) are independent, we instead have the larger
variance Var[P₁(λ₁) − P₂(λ₂)] = λ₁ + λ₂.
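The decomposition is straightforward to implement and to verify empirically (function name ours):

```python
import numpy as np

def coupled_poisson(lam1, lam2, size, rng):
    """Sample pairs (P1, P2) via the common-part coupling
    P1 = Q + Q1, P2 = Q + Q2, with Q ~ Poisson(min(lam1, lam2))
    and Q1, Q2 carrying only the excess rates."""
    lam_star = min(lam1, lam2)
    Q = rng.poisson(lam_star, size)
    Q1 = rng.poisson(lam1 - lam_star, size)
    Q2 = rng.poisson(lam2 - lam_star, size)
    return Q + Q1, Q + Q2

rng = np.random.default_rng(1)
P1, P2 = coupled_poisson(10.0, 10.5, 200_000, rng)
var_diff = np.var(P1 - P2)   # ≈ |lam1 - lam2| = 0.5 (vs 20.5 if independent)
```

This forty-fold variance reduction between neighboring levels is what makes the MLMC difference estimators cheap.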
85. Estimating L_c^imp
The coarsest discretization level, L_c^imp, is determined by the numerical
stability constraint of our MLMC estimator; two conditions must be satisfied:
The stability of a single path: determined by a linearized stability
analysis of the backward Euler method applied to the
deterministic ODE model corresponding to our system.
The stability of the variance of the coupled paths of our MLMC
estimator, expressed by
$$\mathrm{Var}\big[g(Z_{L_c^{\mathrm{imp}}+1}) - g(Z_{L_c^{\mathrm{imp}}})\big] \ll \mathrm{Var}\big[g(Z_{L_c^{\mathrm{imp}}})\big].$$
70
86. Estimating L and M
The total number of levels, L, and the set of numbers of
samples per level, M, are selected to satisfy the accuracy constraint
$$P\big(\big|E[g(X(T))] - \hat{Q}\big| < \mathrm{TOL}\big) > 1 - \alpha_c. \qquad (5.1)$$
Assuming normality of the MLMC estimator, the MLMC
algorithm should bound the bias and the statistical error as
follows (Collier et al. 2014):
$$E[g(X(T)) - \hat{Q}] \le (1-\theta)\,\mathrm{TOL}, \qquad (5.2)$$
$$\mathrm{Var}[\hat{Q}] \le \left(\frac{\theta\,\mathrm{TOL}}{C_{\alpha_c}}\right)^2, \qquad (5.3)$$
for a given confidence parameter $C_{\alpha_c}$ such that
$\Phi(C_{\alpha_c}) = 1 - \alpha_c/2$; here, Φ is the cumulative distribution function
of a standard normal random variable.
71
87. Checking Normality of the MLMC Estimator
[Left panel: empirical probability mass function of the multilevel SSI-TL estimator for TOL = 0.005 (Example 2). Right panel: QQ-plot of the QoI quantiles versus standard normal quantiles.]
Figure 5.2: Left: Empirical probability mass function for 100 multilevel
SSI-TL estimates. Right: QQ-plot for the multilevel SSI-TL estimates.
72
88. Estimating L
The deepest discretization level, L, is determined by satisfying
relation (5.2) with θ = 1/2, implying
$$\mathrm{Bias}(L) = E[g(X(T)) - g(Z_L(T))] < \frac{\mathrm{TOL}}{2}.$$
In our numerical experiments, we use the following approximation (see
Giles 2008a):
$$\mathrm{Bias}(L) \approx E[g(Z_L(T)) - g(Z_{L-1}(T))].$$
73
89. Estimating L^int and M
1 The first step is to solve (5.4) for a fixed value of the interface level, L^int:
$$\min_{\mathbf{M}} \; W_{L^{\mathrm{int}}}(\mathbf{M}) \quad \text{s.t.} \quad C_{\alpha_c}\sqrt{\sum_{\ell=L_c^{\mathrm{imp}}}^{L} M_\ell^{-1} V_\ell} \;\le\; \frac{\mathrm{TOL}}{2}. \qquad (5.4)$$
▸ $V_\ell = \mathrm{Var}[g(Z_\ell(T)) - g(Z_{\ell-1}(T))]$ is estimated by extrapolating the
sample variances obtained at the coarsest levels, due to the presence of
large kurtosis (we provide a solution to this issue in (Ben Hammouda,
Ben Rached, and Tempone 2019)).
▸ $W_{L^{\mathrm{int}}}$ is the expected computational cost of the MLMC estimator, given by
$$W_{L^{\mathrm{int}}} = C_{i,L_c^{\mathrm{imp}}}\, M_{i,L_c^{\mathrm{imp}}}\, \Delta t_{L_c^{\mathrm{imp}}}^{-1}
+ \sum_{\ell=L_c^{\mathrm{imp}}+1}^{L^{\mathrm{int}}-1} C_{ii,\ell}\, M_{ii,\ell}\, \Delta t_{\ell}^{-1}
+ C_{ie,L^{\mathrm{int}}}\, M_{ie,L^{\mathrm{int}}}\, \Delta t_{L^{\mathrm{int}}}^{-1}
+ \sum_{\ell=L^{\mathrm{int}}+1}^{L} C_{ee,\ell}\, M_{ee,\ell}\, \Delta t_{\ell}^{-1}, \qquad (5.5)$$
where $C_i$, $C_{ii}$, $C_{ie}$, and $C_{ee}$ are, respectively, the expected computational
costs of simulating a single SSI-TL step, a coupled SSI-TL step, a coupled
SSI/explicit-TL step, and a coupled explicit-TL step.
74
90. Estimating L^int and M
2 Let M*(L^int) denote the solution of (5.4). The optimal value
of the switching parameter, L^int*, is then chosen by solving
$$\min_{L^{\mathrm{int}}} \; W_{L^{\mathrm{int}}}\big(\mathbf{M}^*(L^{\mathrm{int}})\big) \quad \text{s.t.} \quad L_c^{\mathrm{exp}} \le L^{\mathrm{int}} \le L.$$
In our numerical examples, we found that the lowest computational cost is
achieved for L^int* = L_c^exp, i.e., the same level at which the explicit-TL is stable.
75
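Constrained problems of the form (5.4) have the classical Lagrange-multiplier solution $M_\ell \propto \sqrt{V_\ell / C_\ell}$. A small sketch of that standard allocation for a generic per-level cost $C_\ell$ (the slides use finer-grained costs $C_i, C_{ii}, C_{ie}, C_{ee}$; the variances and costs below are illustrative numbers, not from the thesis):

```python
import math

def mlmc_sample_sizes(V, C, tol, C_alpha=1.96, theta=0.5):
    """Standard MLMC allocation: minimize sum(M_l * C_l) subject to
    C_alpha^2 * sum(V_l / M_l) <= (theta * tol)^2, whose Lagrange solution
    is M_l proportional to sqrt(V_l / C_l)."""
    s = sum(math.sqrt(v * c) for v, c in zip(V, C))
    return [math.ceil((C_alpha / (theta * tol))**2 * math.sqrt(v / c) * s)
            for v, c in zip(V, C)]

# illustrative level variances (decaying) and per-sample costs (growing)
V = [1e-2, 2.5e-3, 6e-4]
C = [1.0, 2.0, 4.0]
M = mlmc_sample_sizes(V, C, tol=0.01)   # many coarse samples, few fine ones
```

With `theta = 0.5` the statistical-error budget matches the TOL/2 split used in (5.4); rounding up with `ceil` keeps the variance constraint satisfied.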
91. Trajectories of Dimer Species of Example 1
[SSA trajectories of the three dimer species X₁, X₂, X₃ on [0, 0.25].]
Figure 5.3: Trajectories of the dimer species simulated by the SSA up to the final time,
T = 0.2. This system has multiple timescales; the step size of the explicit-TL
method must therefore be taken extremely small to ensure numerical stability
(Δt_exp^lim ≈ 2.3 × 10⁻⁴).
76
92. Empirical Probability Mass Function of g(Z_ℓ) − g(Z_{ℓ−1})
[Histograms of g(Z_ℓ) − g(Z_{ℓ−1}) under the three coupling methods: SSI/explicit TL, explicit TL, and SSI-TL.]
Figure 5.4: Empirical probability mass function of g(Z_ℓ) − g(Z_{ℓ−1}) simulated
with the different coupling methods (Example 1, ℓ = 11), with 10³ samples.
77
93. Illustration of Catastrophic Coupling
Consider an example where g takes values in {0, 1}, and let g_ℓ denote the
corresponding level-ℓ numerical approximation in the MLMC estimator:
$$Y_\ell = g_\ell - g_{\ell-1} = \begin{cases} 1, & \text{with probability } p_\ell \\ -1, & \text{with probability } q_\ell \\ 0, & \text{with probability } 1 - p_\ell - q_\ell. \end{cases} \qquad (5.6)$$
If $p_\ell, q_\ell \ll 1$, then $E[Y_\ell] \approx 0$ and
$$\kappa_\ell = \frac{E\big[(Y_\ell - E[Y_\ell])^4\big]}{(\mathrm{Var}[Y_\ell])^2} \approx (p_\ell + q_\ell)^{-1} \gg 1,$$
so that
▸ $M_\ell \gg \kappa_\ell \xrightarrow{\ell \to \infty} \infty$ is required, since the standard deviation of the sample variance is
$$\sigma_{S^2(Y_\ell)} = \frac{\mathrm{Var}[Y_\ell]}{\sqrt{M_\ell}} \sqrt{(\kappa_\ell - 1) + \frac{2}{M_\ell - 1}};$$
▸ otherwise, we may get all samples $Y_\ell = 0$ ⇒ an estimated variance of zero.
78
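The kurtosis blow-up in (5.6) is easy to compute exactly from the three-point distribution (function name ours):

```python
def three_point_kurtosis(p, q):
    """Exact kurtosis of Y in (5.6): Y = 1 w.p. p, -1 w.p. q, 0 otherwise."""
    mean = p - q
    var = (p + q) - mean**2
    # fourth central moment over the three support points
    m4 = (p * (1 - mean)**4 + q * (-1 - mean)**4
          + (1 - p - q) * mean**4)
    return m4 / var**2

k = three_point_kurtosis(0.01, 0.01)   # = 1/(p + q) = 50 exactly when p = q
```

A kurtosis of 50 already means that reliable variance estimation needs well over 50 samples per level, and κ_ℓ keeps growing as p_ℓ, q_ℓ shrink at deeper levels, which is the "catastrophic coupling" the importance sampling is designed to cure.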
94. Cost Analysis
$$W^{\text{without IS}}_{\ell,\text{sample}} \approx 2 \times J \times C_p \times \Delta t_\ell^{-1},$$
$$W^{\text{with IS}}_{\ell,\text{sample}} \approx 2 \times J \times C_p \times \Delta t_\ell^{-1} + \underbrace{C_{\mathrm{lik}}}_{\ll\, C_p} \times \underbrace{\sum_{j \in J_1} \#I^s_{\ell,j}}_{\ll\, \Delta t_\ell^{-1}} \;\approx\; W^{\text{without IS}}_{\ell,\text{sample}}.$$
$$V^{\text{with IS}}_\ell \ll V^{\text{without IS}}_\ell \;\Rightarrow\; M^{\text{with IS}}_\ell \ll M^{\text{without IS}}_\ell.$$
Notation:
$W^{\text{without IS}}_{\ell,\text{sample}}$, $W^{\text{with IS}}_{\ell,\text{sample}}$: costs of simulating one sample path at level ℓ without and with importance sampling;
$C_p$: cost of generating one Poisson random variable;
$C_{\mathrm{lik}}$: cost of computing the likelihood factor;
$I^s_{\ell,j}$: the set of time steps at level ℓ where we simulate the jth reaction channel under the new measure;
$\sum_{j \in J_1} \#I^s_{\ell,j}$: the average number of time steps at level ℓ where the jth reaction channel is simulated under the new measure.
79
95. Example (Gene Transcription and Translation (Anderson and Higham 2012))
This system consists of five reactions. We suppose the system starts
with one gene and no other molecules, that is X₀ = (0, 0, 0), where
X(t) = (R(t), P(t), D(t)):
$$G \xrightarrow{\theta_1} G + R \quad \text{(transcription of a single gene into mRNA)}$$
$$R \xrightarrow{\theta_2} R + P \quad \text{(mRNA translated into proteins)}$$
$$2P \xrightarrow{\theta_3} D \quad \text{(stable dimers are produced from the proteins)}$$
$$R \xrightarrow{\theta_4} \emptyset, \qquad P \xrightarrow{\theta_5} \emptyset \quad \text{(decay of mRNA and proteins)}$$
θ = (25, 10³, 10⁻³, 10⁻¹, 1), T = 1. The QoI is $E[X^{(1)}(T)]$.
The stoichiometric matrix and the propensity functions:
$$\nu = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -2 & 1 \\ -1 & 0 & 0 \\ 0 & -1 & 0 \end{pmatrix}, \qquad a(X) = \begin{pmatrix} \theta_1 \\ \theta_2 R \\ \theta_3 P(P-1) \\ \theta_4 R \\ \theta_5 P \end{pmatrix}.$$
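The stoichiometry and propensities above translate directly into code, and a crude explicit tau-leap step can be sketched on top of them. The non-negativity projection below is a simplification for illustration; the thesis's hybrid SSI-TL schemes handle stiffness and negativity more carefully:

```python
import numpy as np

theta = np.array([25.0, 1e3, 1e-3, 1e-1, 1.0])

# rows: the five reactions; columns: species (R, P, D)
nu = np.array([[ 1,  0, 0],
               [ 0,  1, 0],
               [ 0, -2, 1],
               [-1,  0, 0],
               [ 0, -1, 0]])

def propensities(x):
    R, P, D = x
    return np.array([theta[0],
                     theta[1] * R,
                     theta[2] * P * (P - 1),
                     theta[3] * R,
                     theta[4] * P])

def explicit_tau_leap(x0, T, n_steps, rng):
    """One explicit tau-leap path: fire Poisson numbers of each reaction
    per step, then apply a crude projection to keep the state non-negative."""
    x, dt = np.array(x0, dtype=float), T / n_steps
    for _ in range(n_steps):
        a = np.maximum(propensities(x), 0.0)
        k = rng.poisson(a * dt)          # reaction counts over the step
        x = np.maximum(x + k @ nu, 0.0)  # illustrative non-negativity fix
    return x
```

Each row of `nu` records the net change in (R, P, D) when the corresponding reaction fires once, so `k @ nu` accumulates the state update for the whole step.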
96. Summary of Results of Example 3
Example                     | α    | β    | γ | κ_L  | Work_MLMC
Example 3 without IS        | 0.97 | 0.96 | 1 | 3170 | O(TOL⁻² log(TOL)²)
Example 3 with IS (δ = 1/4) | 0.98 | 1.21 | 1 | 386  | O(TOL⁻²)
Example 3 with IS (δ = 1/2) | 1    | 1.47 | 1 | 51   | O(TOL⁻²)
Example 3 with IS (δ = 3/4) | 1    | 1.72 | 1 | 5.83 | O(TOL⁻²)

Table 5.1: Comparison of the convergence rates (α, β, γ) and the kurtosis at the
deepest MLMC levels, κ_L, for the different numerical examples with and
without the importance sampling (IS) algorithm. α, β, γ are the estimated rates of weak
convergence, strong convergence, and computational work, respectively, with
M = 10⁶ samples. 0 < δ < 1 is a parameter in our importance sampling algorithm.
81
97. Effect of our Method on the Catastrophic Coupling: Example 3 (Without IS)

[Histograms omitted.]
Figure 5.5: Histogram of g_ℓ − g_{ℓ−1} (g_ℓ = X^{(1)}_ℓ(T)) for Example 3 without IS, with M = 10⁵ samples. a) ℓ = 3. b) ℓ = 11.
98. Effect of our Method on the Catastrophic Coupling: Example 3 (With IS)

[Histograms omitted.]
Figure 5.6: Histogram of g_ℓ − g_{ℓ−1} (g_ℓ = X^{(1)}_ℓ(T)) for Example 3 with IS (δ = 3/4), with M = 10⁵ samples. a) ℓ = 3. b) ℓ = 11.
99. Effect of our Method on the Catastrophic Coupling: Example 2 (Without IS)

[Histograms omitted.]
Figure 5.7: Histogram of g_ℓ − g_{ℓ−1} (g_ℓ = X^{(3)}_ℓ(T)) for Example 2 without importance sampling, with M = 10⁵ samples. a) ℓ = 3. b) ℓ = 11.
100. Effect of our Method on the Catastrophic Coupling: Example 2 (With IS)

[Histograms omitted.]
Figure 5.8: Histogram of g_ℓ − g_{ℓ−1} (g_ℓ = X^{(3)}_ℓ(T)) for Example 2 with importance sampling (δ = 3/4), with M = 10⁵ samples. a) ℓ = 3. b) ℓ = 11.
101. Effect of our IS on MLMC Convergence Results: MLMC without IS

[Convergence and kurtosis plots omitted.]
Figure 5.9: Convergence plots of MLMC without IS. M = 10⁶.
102. Effect of our IS on MLMC Convergence Results: MLMC with IS

[Convergence and kurtosis plots omitted.]
Figure 5.10: Convergence plots of MLMC with IS (δ = 3/4). M = 10⁶.
103. Effect of our IS on MLMC Complexity

[Complexity plots omitted.]
Figure 5.11: MC with SSA vs. MLMC combined with IS (δ = 3/4); the reference slope is TOL⁻².
Figure 5.12: Standard MLMC vs. MLMC combined with IS (δ = 3/4); reference slopes are TOL⁻² log(TOL)² and TOL⁻².
104. Why not the Black-Scholes Model?

C: market data for option prices; σ_imp: implied volatility; BS_price: the price obtained under the BS model.

Find σ_imp such that C = BS_price(S0, K, r, T, σ_imp(S0, K, r, T)).

[Surface plot omitted.]
Figure 5.13: The SPX¹² volatility surface (June 17, 2020), with time in years and log-moneyness k = log(K/S0).

¹² SPX is a stock index based on the 500 largest companies with shares listed for trading on the NYSE or NASDAQ.
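Computing σ_imp is a one-dimensional root-finding problem, since the Black-Scholes call price is strictly increasing in σ. A minimal bisection sketch (zero dividends, hypothetical market data used for the round-trip check):

```python
import math

def norm_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S0, K, r, T, sigma):
    # Standard Black-Scholes call price.
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S0 * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)

def implied_vol(C, S0, K, r, T, lo=1e-6, hi=5.0, tol=1e-10):
    # Bisection on sigma -> BS_price(sigma): the map is monotone increasing,
    # so an arbitrage-free price C pins down a unique implied volatility.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call(S0, K, r, T, mid) < C:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Round-trip check with hypothetical inputs: invert a known BS price.
sigma_true = 0.25
C = bs_call(1.0, 1.1, 0.01, 0.5, sigma_true)
sigma_imp = implied_vol(C, 1.0, 1.1, 0.01, 0.5)
```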
105. Conditional Expectation for Analytic Smoothing

C_rB(T, K) = E[(S_T − K)⁺]
           = E[E[(S_T − K)⁺ | σ(W¹(t), t ≤ T)]]
           = E[C_BS(S0 = exp(ρ ∫₀ᵀ √v_t dW¹_t − ½ ρ² ∫₀ᵀ v_t dt), k = K, σ² = (1 − ρ²) ∫₀ᵀ v_t dt)]
           ≈ ∫_{R^{2N}} C_BS(G_rB(w⁽¹⁾, w⁽²⁾)) ρ_N(w⁽¹⁾) ρ_N(w⁽²⁾) dw⁽¹⁾ dw⁽²⁾ = C^N_rB. (5.7)

Idea: (5.7) is obtained by using the orthogonal decomposition of S_t into
S¹_t = E{ρ ∫₀ᵗ √v_s dW¹_s}, S²_t = E{√(1 − ρ²) ∫₀ᵗ √v_s dW^⊥_s},
and then applying conditional log-normality.

Notation:
C_BS(S0, k, σ²): the Black-Scholes call price for initial spot price S0, strike price k, and variance σ².
G_rB: maps the 2N independent standard Gaussian random inputs to the parameters fed to the Black-Scholes formula.
ρ_N: the multivariate Gaussian density; N: number of time steps.
For a continuous (semi)martingale Z, E{Z}_t = exp(Z_t − Z_0 − ½ [Z, Z]_{0,t}), where [Z, Z]_{0,t} is the quadratic variation of Z.
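One sample of the inner (analytic) conditional expectation in (5.7) can be sketched as follows, assuming S0 = 1 and zero rates. The variance path v and the W¹ increments would come from a rough Bergomi simulation scheme; the helper names here are hypothetical.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call_totalvar(S0, K, w):
    # Black-Scholes call (zero rates) parametrized by total variance w.
    if w <= 0:
        return max(S0 - K, 0.0)
    d1 = (math.log(S0 / K) + 0.5 * w) / math.sqrt(w)
    return S0 * norm_cdf(d1) - K * norm_cdf(d1 - math.sqrt(w))

def conditional_bs_payoff(v, dW1, rho, K):
    # One sample of the inner expectation in (5.7): given a discretized
    # variance path v and the W^1 increments, the conditional price is
    # Black-Scholes with an adjusted spot exp(rho * int sqrt(v) dW^1
    # - 0.5 rho^2 * int v dt) and residual variance (1 - rho^2) * int v dt.
    dt = 1.0 / len(v)
    int_v = sum(vi * dt for vi in v)
    stoch = sum(math.sqrt(vi) * dw for vi, dw in zip(v, dW1))
    S0_eff = math.exp(rho * stoch - 0.5 * rho ** 2 * int_v)
    return bs_call_totalvar(S0_eff, K, (1.0 - rho ** 2) * int_v)

# Sanity check: with rho = 0 and a flat variance path v = xi0, this
# reduces to the plain Black-Scholes price with total variance xi0 * T.
xi0 = 0.0552
price = conditional_bs_payoff([xi0] * 100, [0.0] * 100, rho=0.0, K=1.0)
```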
106. Error Comparison

E_tot: the total error of approximating the expectation in (2.7).

When using the ASGQ estimator Q_N,
E_tot ≤ |C_RB − C^N_RB| + |C^N_RB − Q_N| ≤ E_B(N) + E_Q(TOL_ASGQ, N),
where E_Q is the quadrature error, E_B is the bias, and TOL_ASGQ is a user-selected tolerance for the ASGQ method.

When using the randomized QMC or MC estimator Q^{MC (QMC)}_N,
E_tot ≤ |C_RB − C^N_RB| + |C^N_RB − Q^{MC (QMC)}_N| ≤ E_B(N) + E_S(M, N),
where E_S is the statistical error and M is the number of samples used for the MC or randomized QMC method.

M^QMC and M^MC are chosen so that E_{S,QMC}(M^QMC) and E_{S,MC}(M^MC) satisfy
E_{S,QMC}(M^QMC) = E_{S,MC}(M^MC) = E_B(N) = E_tot/2.
107. Randomized QMC

A (rank-1) lattice rule (Sloan 1985; Nuyens 2014) with n points:
Q_n(f) = (1/n) ∑_{k=0}^{n−1} f((kz mod n)/n),
where z = (z₁, ..., z_d) ∈ Nᵈ is the generating vector.

A randomly shifted lattice rule:
Q_{n,q}(f) = (1/q) ∑_{i=0}^{q−1} Q^{(i)}_n(f) = (1/q) ∑_{i=0}^{q−1} ((1/n) ∑_{k=0}^{n−1} f((kz + ∆^{(i)} mod n)/n)), (5.8)
where {∆^{(i)}}_{i=1}^q are independent random shifts, and M^QMC = q × n.
▸ Unbiased approximation of the integral.
▸ Practical error estimate.
E_Q(M) = O(M⁻¹ log(M)^{d−1}) (for functions with bounded mixed derivatives).
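A minimal implementation of the randomly shifted rule (5.8) might look as follows. This sketch uses the common variant with continuous uniform shifts modulo 1, and a classical Fibonacci generating vector (n = 610, z = (1, 377)) in d = 2 rather than a CBC-constructed one; the q independent shift estimates also supply the practical error estimate mentioned above.

```python
import math
import random

def shifted_lattice(f, d, n, q, z, rng):
    # Randomly shifted rank-1 lattice rule: average q independent random
    # shifts of the n-point lattice {frac(k z / n)} over [0,1)^d.
    estimates = []
    for _ in range(q):
        shift = [rng.random() for _ in range(d)]
        s = 0.0
        for k in range(n):
            x = [((k * z[j]) / n + shift[j]) % 1.0 for j in range(d)]
            s += f(x)
        estimates.append(s / n)
    mean = sum(estimates) / q
    # The q i.i.d. shifted estimates give an unbiased mean and a
    # practical standard-error estimate.
    var = sum((e - mean) ** 2 for e in estimates) / (q * (q - 1))
    return mean, math.sqrt(var)

# Smooth test integrand over [0,1]^2 with known value (e - 1)^2.
rng = random.Random(7)
f = lambda x: math.exp(x[0] + x[1])
mean, stderr = shifted_lattice(f, d=2, n=610, q=16, z=[1, 377], rng=rng)
```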
108. Simulation of the Rough Bergomi Dynamics

Goal: Simulate (W¹_t, W̃^H_t : 0 ≤ t ≤ T) jointly, resulting in W¹_{t₁}, ..., W¹_{t_N} and W̃^H_{t₁}, ..., W̃^H_{t_N} along a given grid t₁ < ⋯ < t_N.

1 Covariance-based approach (Bayer, Friz, and Gatheral 2016)
▸ 2N-dimensional Gaussian random input vector.
▸ Based on the Cholesky decomposition of the covariance matrix of the 2N-dimensional Gaussian random vector (W¹_{t₁}, ..., W¹_{t_N}, W̃^H_{t₁}, ..., W̃^H_{t_N}).
▸ Exact method, but slow.
▸ Work: O(N²).

2 The hybrid scheme (Bennedsen, Lunde, and Pakkanen 2017)
▸ 2N-dimensional Gaussian random input vector.
▸ Based on Euler discretization.
▸ Approximates the kernel function in (2.6) by a power function near zero and by a step function elsewhere.
▸ Accurate scheme that is much faster than the covariance-based approach.
▸ Work: O(N) up to logarithmic factors.
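The covariance-based approach amounts to sampling a Gaussian vector through a Cholesky factor. A generic pure-Python sketch, illustrated with standard Brownian motion (cov(W_s, W_t) = min(s, t)) rather than the joint rough Bergomi covariance, whose closed-form entries are given in Bayer, Friz, and Gatheral (2016):

```python
import math
import random

def cholesky(C):
    # Plain Cholesky factorization C = L L^T: O(N^3), but done only once.
    n = len(C)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(C[i][i] - s)
            else:
                L[i][j] = (C[i][j] - s) / L[j][j]
    return L

def sample_path(L, rng):
    # Each exact sample costs one matrix-vector product, i.e. the O(N^2)
    # per-path work quoted for the covariance-based approach.
    n = len(L)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)]

# Illustration: exact Brownian motion samples on a uniform grid.
N, T = 8, 1.0
grid = [(i + 1) * T / N for i in range(N)]
C = [[min(s, t) for t in grid] for s in grid]
L = cholesky(C)
path = sample_path(L, random.Random(3))
```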
109. On the Choice of the Simulation Scheme

[Plots omitted. a) With the hybrid scheme (fitted rate ∆t^{1.02}, reference rate ∆t^{1.00}). b) With the exact scheme (fitted rate ∆t^{0.76}, reference rate ∆t^{1.00}).]
Figure 5.14: Convergence of the relative weak error E_B, using MC with 6 × 10⁶ samples, for the example parameters H = 0.07, K = 1, S0 = 1, T = 1, ρ = −0.9, η = 1.9, ξ0 = 0.0552. The upper and lower bounds are 95% confidence intervals.
110. Computational Work of the MC Method with Different Configurations

[Plot omitted; fitted slopes −3.33, −3.33, and −2.51 appear in the legend for MC, MC+Rich(level 1), and MC+Rich(level 2).]
Figure 5.15: Computational work of the MC method with the different configurations in terms of the Richardson extrapolation level. Case of parameter set 1 in Table 2.1.
111. Computational Work of the QMC Method with Different Configurations

[Plot omitted; fitted slopes −1.98 and −1.57 appear in the legend for QMC, QMC+Rich(level 1), and QMC+Rich(level 2).]
Figure 5.16: Computational work of the QMC method with the different configurations in terms of the Richardson extrapolation level. Case of parameter set 1 in Table 2.1.
112. Computational Work of the ASGQ Method with Different Configurations

[Plot omitted; fitted slopes −5.18, −1.79, and −1.27 appear in the legend for ASGQ, ASGQ+Rich(level 1), and ASGQ+Rich(level 2).]
Figure 5.17: Computational work of the ASGQ method with the different configurations in terms of the Richardson extrapolation level. Case of parameter set 1 in Table 2.1.
113. Optimal Smoothing Direction in Continuous Time (I)

X = (X^{(1)}, ..., X^{(d)}) is described by the SDE
dX^{(i)}_t = a_i(X_t) dt + ∑_{j=1}^d b_{ij}(X_t) dW^{(j)}_t, (5.9)
where {W^{(j)}}_{j=1}^d are standard Brownian motions.

Hierarchical representation of W:
W^{(j)}(t) = (t/T) W^{(j)}(T) + B^{(j)}(t) = (t/√T) Z_j + B^{(j)}(t),
with Z_j ∼ N(0, 1) (i.i.d. coarse factors) and {B^{(j)}}_{j=1}^d the Brownian bridges.

Hierarchical representation of Z = (Z₁, ..., Z_d), with v the smoothing direction:
Z = P₀Z + P⊥Z = (Z, v) v + w = Z_v v + w, (5.10)
where P₀Z = Z_v v is the one-dimensional projection onto v and P⊥Z = w is the projection onto the orthogonal complement.
114. Optimal Smoothing Direction in Continuous Time (II)

Using (5.9) and (5.10), and writing H_v(Z_v, w) = g(X(T)), observe
E[g(X(T))] = E[E[H_v(Z_v, w) | w]]
Var[g(X(T))] = E[Var[H_v(Z_v, w) | w]] + Var[E[H_v(Z_v, w) | w]].

The optimal smoothing direction v solves
max_{v ∈ Rᵈ, ‖v‖ = 1} E[Var[H_v(Z_v, w) | w]] ⟺ min_{v ∈ Rᵈ, ‖v‖ = 1} Var[E[H_v(Z_v, w) | w]]. (5.11)

Solving the optimization problem (5.11) is a hard task.
The optimal smoothing direction v is problem dependent.
In (Bayer, Ben Hammouda, and Tempone 2020b), we determine v heuristically, given the structure of the problem at hand.
115. Discrete Time Formulation: GBM Example

Consider a basket option under the multi-dimensional GBM model:
▸ The payoff function: g(X(T)) = max(∑_{j=1}^d c_j X^{(j)}(T) − K, 0).
▸ The dynamics of the stock prices: dX^{(j)}_t = σ^{(j)} X^{(j)}_t dW^{(j)}_t.

The numerical approximation of {X^{(j)}(T)}_{j=1}^d, with time step ∆t:
X^{(j)}(T) = X^{(j)}₀ ∏_{n=0}^{N−1} [1 + (σ^{(j)}/√T) Z^{(j)}₁ ∆t + σ^{(j)} ∆B^{(j)}_n], 1 ≤ j ≤ d
▸ (Z^{(1)}₁, ..., Z^{(1)}_N, ..., Z^{(d)}₁, ..., Z^{(d)}_N): N × d independent Gaussian random variables.
▸ {∆B^{(j)}}_{j=1}^d are the Brownian bridge increments.

E[g(X(T))] ≈ E[g(X^{(1)}_T, ..., X^{(d)}_T)] = E[g(X^{∆t}(T))]
= ∫_{R^{d×N}} G(z) ρ_{d×N}(z) dz^{(1)}₁ ... dz^{(1)}_N ... dz^{(d)}₁ ... dz^{(d)}_N,
▸ z = (z^{(1)}₁, ..., z^{(1)}_N, ..., z^{(d)}₁, ..., z^{(d)}_N).
▸ ρ_{d×N}(z) = (2π)^{−dN/2} e^{−½ zᵀz}.
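The hierarchical split W(t) = (t/T) W(T) + B(t) used above can be sketched as follows for a single asset. The function names are illustrative; the bridge increments are obtained here by subtracting the mean increment from ordinary Brownian increments, so that the coarse factor Z₁ and the bridge together reproduce the original path.

```python
import math
import random

def hierarchical_increments(N, T, rng):
    # Split W into one coarse factor Z_1 = W(T)/sqrt(T) and Brownian-bridge
    # increments dB that sum to zero (the bridge is pinned at 0 and T).
    dt = T / N
    dW = [math.sqrt(dt) * rng.gauss(0.0, 1.0) for _ in range(N)]
    WT = sum(dW)
    z1 = WT / math.sqrt(T)                 # coarse factor, Z_1 ~ N(0, 1)
    dB = [dw - WT * dt / T for dw in dW]   # bridge increments, sum to 0
    return z1, dB

def gbm_terminal(X0, sigma, N, T, z1, dB):
    # Forward-Euler product form from the slide: the coarse factor enters
    # every step through (sigma / sqrt(T)) * Z_1 * dt.
    dt = T / N
    x = X0
    for n in range(N):
        x *= 1.0 + sigma / math.sqrt(T) * z1 * dt + sigma * dB[n]
    return x

rng = random.Random(11)
z1, dB = hierarchical_increments(N=64, T=1.0, rng=rng)
XT = gbm_terminal(1.0, 0.2, 64, 1.0, z1, dB)
```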
116. Numerical Smoothing Step

We consider Z₁ = (Z^{(1)}₁, ..., Z^{(d)}₁) to be the most important directions.

Design of a sub-optimal smoothing direction via Y = A Z₁, where A is a rotation matrix that is problem dependent.¹³
The smoothing direction v (in the continuous-time formulation) is given by the first row of A.

One-dimensional root-finding problem to solve for y₁:
K = ∑_{j=1}^d c_j X^{(j)}₀ ∏_{n=0}^{N−1} F^{(j)}_n(y₁(K), y₋₁), (5.12)
F^{(j)}_n(y₁, y₋₁) = 1 + (σ^{(j)} ∆t/√T) (((A)⁻¹)_{j1} y₁ + ∑_{i=2}^d ((A)⁻¹)_{ji} y_i) + σ^{(j)} ∆B^{(j)}_n.

¹³ In this example, a sufficiently good choice of A is a rotation matrix whose first row leads to Y₁ = ∑_{i=1}^d Z^{(i)}₁, up to re-scaling.
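The root-finding step (5.12) can be sketched for a toy two-asset basket. For clarity the bridge part ∆B is frozen at zero and A is chosen so that y₁ enters every asset identically (in the spirit of footnote 13); all parameter values are hypothetical, and a finite-difference Newton iteration stands in for the exact-derivative version.

```python
import math

def basket_minus_K(y1, params):
    # Phi(y1) = sum_j c_j X0_j prod_n F_n^{(j)}(y1, y_-1) - K: the scalar map
    # whose root y1* locates the payoff kink, as in (5.12). The remaining
    # inputs y_-1 (here: the bridge increments dB) are frozen.
    c, X0, sigma, dB, K, T = params
    N = len(dB[0])
    dt = T / N
    total = 0.0
    for j in range(len(c)):
        prod = X0[j]
        for n in range(N):
            prod *= 1.0 + sigma[j] / math.sqrt(T) * y1 * dt + sigma[j] * dB[j][n]
        total += c[j] * prod
    return total - K

def newton_root(f, x0, params, tol=1e-12, itmax=100):
    # Newton iteration with a central finite-difference derivative.
    x = x0
    for _ in range(itmax):
        fx = f(x, params)
        h = 1e-6
        dfx = (f(x + h, params) - f(x - h, params)) / (2.0 * h)
        step = fx / dfx
        x -= step
        if abs(step) < tol:
            break
    return x

# Hypothetical 2-asset basket; bridge part set to zero for clarity.
params = ([0.5, 0.5], [1.0, 1.0], [0.2, 0.3],
          [[0.0] * 16, [0.0] * 16], 1.2, 1.0)
y1_star = newton_root(basket_minus_K, 0.0, params)
```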
117. Numerical Smoothing: Motivation

Idea: Assume that the integration domain Ω can be divided into two parts, Ω₁ and Ω₂:
▸ In Ω₁ the integrand G is smooth and positive.
▸ G(x) = 0 in Ω₂.
▸ Along the boundary between Ω₁ and Ω₂, the integrand is non-differentiable or discontinuous.

Procedure:
1 Determine Ω₂ numerically by a root-finding algorithm.
2 Compute ∫_Ω G = ∫_{Ω₁} G + ∫_{Ω₂} G = ∫_{Ω₁} G.
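The two-step procedure above, in one dimension: take G(z) = max(e^{σz} − K, 0) against the Gaussian weight. Here the kink z* (the boundary of Ω₂ = {G = 0}) is available in closed form, whereas in general a root finder would locate it; the integral over Ω₁ is then computed with a standard quadrature and checked against the closed-form value. All numbers are illustrative.

```python
import math

def phi(z):
    # Standard normal density.
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    # Standard normal CDF.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

sigma, K = 0.4, 1.1

# Step 1: locate the kink z* bounding Omega_2 = {G = 0}.
z_star = math.log(K) / sigma

# Step 2: integrate the smooth restriction of G * phi over Omega_1 = (z*, inf),
# truncated far in the Gaussian tail, with composite Simpson quadrature.
def simpson(f, a, b, n=2000):
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3.0

approx = simpson(lambda z: (math.exp(sigma * z) - K) * phi(z),
                 z_star, z_star + 12.0)
exact = math.exp(0.5 * sigma ** 2) * Phi(sigma - z_star) - K * Phi(-z_star)
```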
118. Why not Kernel Density Techniques in High Dimension?

Similar to approaches based on parametric regularization, as in Giles 2015: this class of approaches has a pointwise error that increases exponentially with respect to the dimension of the state vector X.

For a d-dimensional problem, a kernel density estimator with bandwidth matrix H = diag(h, ..., h) satisfies
MSE ≈ c₁ M⁻¹ h⁻ᵈ + c₂ h⁴, (5.13)
where M is the number of samples, and c₁ and c₂ are constants.

Our approach in high dimension: For u ∈ Rᵈ,
ρ_X(u) = E[δ(X − u)] = E[ρ_d(y*(u)) |det(J(u))|] ≈ E[ρ_d(ȳ*(u)) |det(J̄(u))|], (5.14)
where J is the Jacobian matrix with J_ij = ∂y*_i/∂x_j, and ρ_d(⋅) is the multivariate Gaussian density.

Thanks to the exact conditional expectation with respect to the Brownian bridge, the error of our approach is restricted only to the error of finding the approximate location of the discontinuity ⇒ the error in our approach is insensitive to the dimension of the problem.
119. Multilevel Monte Carlo (MLMC) (Giles 2008a)

Aim: Estimate E[Ī(Y₋₁, Z^{(1)}₋₁, ..., Z^{(d)}₋₁)] efficiently.

Setting
▸ A hierarchy of nested meshes of the time interval [0, T], indexed by {ℓ}_{ℓ=0}^L.
▸ ∆t_ℓ = K^{−ℓ} ∆t₀: the size of the subsequent time steps for levels ℓ ≥ 1, where K > 1 is a given integer constant and ∆t₀ is the step size used at level ℓ = 0.
▸ Ī_ℓ: the level-ℓ approximation of Ī, computed with step size ∆t_ℓ, N_{q,ℓ} Laguerre quadrature points, and TOL_{Newton,ℓ} as the tolerance of the Newton method at level ℓ.

MLMC estimator
E[Ī_L] = E[Ī₀] + ∑_{ℓ=1}^L E[Ī_ℓ − Ī_{ℓ−1}]
Var[Ī₀] ≫ Var[Ī_ℓ − Ī_{ℓ−1}] as ℓ increases; M₀ ≫ M_ℓ as ℓ increases.

By defining Q₀ = (1/M₀) ∑_{m₀=1}^{M₀} Ī_{0,[m₀]} and Q_ℓ = (1/M_ℓ) ∑_{m_ℓ=1}^{M_ℓ} (Ī_{ℓ,[m_ℓ]} − Ī_{ℓ−1,[m_ℓ]}), we arrive at the unbiased MLMC estimator Q:
Q = ∑_{ℓ=0}^L Q_ℓ. (5.15)
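The telescoping estimator (5.15) can be sketched generically. The level-ℓ sampler below uses coupled Euler paths of a driftless GBM as a stand-in for Ī_ℓ (not the thesis setting with Laguerre pre-integration); the essential MLMC ingredient is that the coarse path reuses the fine Brownian increments, so the level differences have small variance.

```python
import math
import random

def mlmc(sampler, M, L, rng):
    # Telescoping MLMC estimator (5.15): Q = sum_l mean of (I_l - I_{l-1}),
    # with coupled fine/coarse samples at each level l >= 1.
    Q = 0.0
    for level in range(L + 1):
        s = 0.0
        for _ in range(M[level]):
            fine, coarse = sampler(level, rng)
            s += fine - (coarse if level > 0 else 0.0)
        Q += s / M[level]
    return Q

def gbm_sampler(level, rng, sigma=0.2, T=1.0):
    # Coupled Euler paths of dX = sigma X dW with dt_l = 2^-l * T; the
    # coarse path sums pairs of the fine Brownian increments.
    N = 2 ** level
    dt = T / N
    dW = [math.sqrt(dt) * rng.gauss(0.0, 1.0) for _ in range(N)]
    xf = 1.0
    for n in range(N):
        xf *= 1.0 + sigma * dW[n]
    if level == 0:
        return xf, 0.0
    xc = 1.0
    for n in range(0, N, 2):
        xc *= 1.0 + sigma * (dW[n] + dW[n + 1])
    return xf, xc   # E[X_T] = 1 for this discrete martingale

rng = random.Random(42)
M = [4000, 2000, 1000, 500]      # hypothetical per-level sample sizes
estimate = mlmc(gbm_sampler, M, L=3, rng=rng)
```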
120. Error and Work Discussion for ASGQ (I)

Q_N: the ASGQ estimator.
E[g(X(T))] − Q_N = E[g(X(T))] − E[g(X^{∆t}(T))]  (Error I: bias or weak error)
 + E[I(Y₋₁, Z^{(1)}₋₁, ..., Z^{(d)}₋₁)] − E[Ī(Y₋₁, Z^{(1)}₋₁, ..., Z^{(d)}₋₁)]  (Error II: numerical smoothing error)
 + E[Ī(Y₋₁, Z^{(1)}₋₁, ..., Z^{(d)}₋₁)] − Q_N  (Error III: ASGQ error). (5.16)

For schemes based on forward Euler to simulate the asset dynamics,
Error I = O(∆t).

Given the smoothness analysis in (Bayer, Ben Hammouda, and Tempone 2020b), we have
Error III = O(N_{ASGQ}^{−p}),
where N_{ASGQ} is the number of quadrature points used by the ASGQ estimator, and p > 0 (the mixed derivatives of Ī, in the (dN − 1)-dimensional space, are bounded up to order p).
121. Error and Work Discussion for ASGQ (II)

Error II = E[I(Y₋₁, Z^{(1)}₋₁, ..., Z^{(d)}₋₁)] − E[Ī(Y₋₁, Z^{(1)}₋₁, ..., Z^{(d)}₋₁)]
 ≤ sup_{y₋₁, z^{(1)}₋₁, ..., z^{(d)}₋₁} |I(y₋₁, z^{(1)}₋₁, ..., z^{(d)}₋₁) − Ī(y₋₁, z^{(1)}₋₁, ..., z^{(d)}₋₁)|
 = O(N_q^{−s}) + O(|y*₁ − ȳ*₁|^{κ+1})
 = O(N_q^{−s}) + O(TOL_Newton^{κ+1}). (5.17)

y*₁: the exact location of the non-smoothness.
ȳ*₁: the approximate location of the non-smoothness obtained by Newton iteration ⇒ |y*₁ − ȳ*₁| = TOL_Newton.
κ ≥ 0 (κ = 0: Heaviside payoff (digital option); κ = 1: call or put payoffs).
N_q: the number of points used by the Laguerre quadrature for the one-dimensional pre-integration step.
s > 0: the derivatives of G with respect to y₁ are bounded up to order s.
122. Error and Work Discussion for ASGQ (III)

An optimal performance of ASGQ is given by
min_{(N_ASGQ, N_q, TOL_Newton)} Work_ASGQ ∝ N_ASGQ × N_q × ∆t⁻¹
s.t. E_total,ASGQ = TOL, (5.18)

E_total,ASGQ = E[g(X(T))] − Q_N = O(∆t) + O(N_ASGQ^{−p}) + O(N_q^{−s}) + O(TOL_Newton^{κ+1}).

We show in (Bayer, Ben Hammouda, and Tempone 2020b) that, under certain conditions on the regularity parameters s and p (p, s ≫ 1):
▸ ASGQ: Work_ASGQ = O(TOL⁻¹) (for the best case).
▸ The MC method: Work_MC = O(TOL⁻³) (for the best case).
123. Error and Work Discussion for MLMC (I)

Q̂: the MLMC estimator, as defined in (5.15).
E[g(X(T))] − Q̂ = E[g(X(T))] − E[g(X^{∆t_L}(T))]  (Error I: bias or weak error)
 + E[I_L(Y₋₁, Z^{(1)}₋₁, ..., Z^{(d)}₋₁)] − E[Ī_L(Y₋₁, Z^{(1)}₋₁, ..., Z^{(d)}₋₁)]  (Error II: numerical smoothing error, same as in (5.17))
 + E[Ī_L(Y₋₁, Z^{(1)}₋₁, ..., Z^{(d)}₋₁)] − Q̂  (Error III: MLMC statistical error). (5.19)

For schemes based on forward Euler to simulate the asset dynamics,
Error I = O(∆t_L).

Error III = O(√(∑_{ℓ=L₀}^L M_ℓ⁻¹ V_ℓ)) = O(√(∑_{ℓ=L₀}^L √(N_{q,ℓ}) log(TOL_{Newton,ℓ}⁻¹))).

Notation: V_ℓ = Var[Ī_ℓ − Ī_{ℓ−1}]; M_ℓ: number of samples at level ℓ.
124. Error and Work Discussion for MLMC (II)

An optimal performance of MLMC is given by
min_{(L, L₀, {M_ℓ}_{ℓ=0}^L, N_q, TOL_Newton)} Work_MLMC ∝ ∑_{ℓ=L₀}^L M_ℓ (N_{q,ℓ} ∆t_ℓ⁻¹)
s.t. E_total,MLMC = TOL, (5.20)

E_total,MLMC = E[g(X(T))] − Q̂
 = O(∆t_L) + O(√(∑_{ℓ=L₀}^L √(N_{q,ℓ}) log(TOL_{Newton,ℓ}⁻¹))) + O(N_{q,L}^{−s}) + O(TOL_{Newton,L}^{κ+1}).
125. Error Discussion for MLMC (Bayer, Ben Hammouda, and Tempone 2020b)

Q̂: the MLMC estimator, as defined in (5.15).
E[g(X(T))] − Q̂ = E[g(X(T))] − E[g(X^{∆t_L}(T))]  (Error I: bias or weak error = O(∆t_L))
 + E[I_L(Y₋₁, Z^{(1)}₋₁, ..., Z^{(d)}₋₁)] − E[Ī_L(Y₋₁, Z^{(1)}₋₁, ..., Z^{(d)}₋₁)]  (Error II: numerical smoothing error = O(N_q^{−s}) + O(TOL_Newton^{κ+1}))
 + E[Ī_L(Y₋₁, Z^{(1)}₋₁, ..., Z^{(d)}₋₁)] − Q̂  (Error III: MLMC statistical error = O(√(∑_{ℓ=L₀}^L √(N_{q,ℓ}) log(TOL_{Newton,ℓ}⁻¹)))).

E_total,MLMC = E[g(X(T))] − Q̂
 = O(∆t_L) + O(√(∑_{ℓ=L₀}^L √(N_{q,ℓ}) log(TOL_{Newton,ℓ}⁻¹))) + O(N_{q,L}^{−s}) + O(TOL_{Newton,L}^{κ+1}).

Notation: V_ℓ = Var[Ī_ℓ − Ī_{ℓ−1}]; M_ℓ: number of samples at level ℓ.