The document describes using reinforcement learning to implement hedging of derivatives. It discusses:
1) Setting up a hedging model using reinforcement learning, where the state includes the asset price and time to maturity, the action is the hedge position, and the reward minimizes hedging costs.
2) Conducting experiments hedging a short call option using the model, comparing performance under geometric Brownian motion and stochastic volatility.
3) Finding that the reinforcement learning model outperforms delta hedging strategies and achieves lower hedging costs, demonstrating the effectiveness of using reinforcement learning for hedging derivatives.
1. Deep Hedging of Derivatives Using
Reinforcement Learning
Hull et al. working paper
Presenter: ์ค์ง์
Graduate School of Information. Yonsei Univ.
Machine Learning & Computational Finance Lab.
4. 1. Introduction
When conducting risk management, hedging is a very common and important task.
But theoretical hedging cannot fit real-world problems exactly because of market frictions.
5. 1. Introduction
Hedging is a sequential optimal control task,
and RL solves sequential optimal control tasks.
So can we apply RL to the hedging task to reduce the total hedging cost?
11. 2. Hedging
Delta-hedging
Δ = N(d_1) = ∂C/∂S
So when we take a position of size Δ in the underlying, the portfolio profit is almost zero (delta-neutral).
If the volatility of the underlying asset is very high, or the rebalancing interval is too wide, the hedge will not be effective.
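As a quick illustration of the delta above, the Black-Scholes call delta N(d_1) can be computed directly. A minimal sketch (the function name `bs_call_delta` is mine, not from the paper):

```python
from math import log, sqrt
from statistics import NormalDist

def bs_call_delta(S, K, T, r, sigma):
    """Black-Scholes delta of a European call: N(d1) = dC/dS."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    return NormalDist().cdf(d1)

# An at-the-money call has a delta close to 0.5
print(bs_call_delta(S=100, K=100, T=0.25, r=0.02, sigma=0.2))
```

Holding Δ units of the stock per short call then offsets small moves in S, which is the benchmark the RL hedger is compared against.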
15. 3. Setting Hedging model
State
1. The holding of the asset during the previous time period ((n − 1)Δt ~ nΔt) : H_{n−1}
2. The asset price at time nΔt : S_n
3. The time to maturity : (N − n)Δt
Action
The amount of the asset to be held from time nΔt to time (n + 1)Δt : H_n
State & Action
• Time-step : Δt
• The life of the option : NΔt
16. 3. Setting Hedging model
Accounting P&L formulation
When we derive the reward function from the accounting P&L formulation, the per-step reward (whose negative is the cost to minimize) is:
R_{n+1} = (V_{n+1} − V_n) + H_n (S_{n+1} − S_n) − κ |S_{n+1} (H_{n+1} − H_n)|
where
• V_n : derivative value at time-step nΔt
• S_n : underlying asset value at time-step nΔt
• H_n : position in the underlying asset relative to the position in the derivative (positive if long, negative if short)
• κ : trading cost parameter
In addition, there are an initial reward −κ |S_0 H_0| and a final reward −κ |S_N H_N| to set up (liquidate) the hedge position at the first (last) time-step.
17. 3. Setting Hedging model
Cash Flow formulation
When we derive the reward function from the cash flow formulation, the per-step reward is:
R_{n+1} = S_{n+1} (H_n − H_{n+1}) − κ |S_{n+1} (H_{n+1} − H_n)|
where
• S_n : underlying asset value at time-step nΔt
• H_n : position in the underlying asset relative to the position in the derivative (positive if long, negative if short)
• κ : trading cost parameter
In addition, there are other rewards:
• Initial reward : −S_0 H_0 − κ |S_0 H_0| at the first time-step
• Final reward : S_N H_N − κ |S_N H_N| + payoff of the derivative at the last time-step
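The cash flow formulation can be sketched the same way (function names are mine; the payoff term for a short call would itself be negative):

```python
def cash_flow_reward(S_n1, H_n, H_n1, kappa):
    """Per-step reward: cash received/paid from rebalancing the hedge,
    minus trading cost on the position change."""
    return S_n1 * (H_n - H_n1) - kappa * abs(S_n1 * (H_n1 - H_n))

def initial_cash_flow(S0, H0, kappa):
    """Cash paid to buy the initial hedge position, plus its trading cost."""
    return -S0 * H0 - kappa * abs(S0 * H0)

def final_cash_flow(SN, HN, payoff, kappa):
    """Cash from liquidating the hedge, minus cost, plus the derivative payoff."""
    return SN * HN - kappa * abs(SN * HN) + payoff
```

Note that, unlike the accounting P&L version, no derivative value V_n appears inside the episode: only actual cash flows.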
19. 3. Setting Hedging model
Approach Comparison
Accounting P&L approach reward
• At time-step 1 : −κ |S_0 H_0|
• At time-steps 2 ~ (N − 1) : (V_{n+1} − V_n) + H_n (S_{n+1} − S_n) − κ |S_{n+1} (H_{n+1} − H_n)|
• At time-step N : −κ |S_N H_N|
Cash Flow approach reward
• At time-step 1 : −S_0 H_0 − κ |S_0 H_0|
• At time-steps 2 ~ (N − 1) : S_{n+1} (H_n − H_{n+1}) − κ |S_{n+1} (H_{n+1} − H_n)|
• At time-step N : S_N H_N − κ |S_N H_N| + payoff of the derivative
When we use the Accounting P&L approach reward, we must know a derivative pricing model (to obtain V_n at every time-step).
20. 3. Setting Hedging model
Approach Comparison
[Figure: per-step rewards plotted against time-step for the two approaches]
▪ Accounting P&L approach rewards are all close to zero.
→ To minimize cost, the model just has to learn to make the reward at every time-step equal to zero.
▪ However, Cash Flow approach rewards differ widely across time-steps.
→ To minimize cost, the model must implicitly learn the pricing model, and training is hard to converge because of the credit assignment problem.
21. 3. Setting Hedging model
Model in this work
Set the cost function Y(t) to minimize:
Y(t) = E[C_t] + c √( E[C_t²] − E[C_t]² )
where E[C_t] is the expected hedging cost from time t to maturity: the first term is the expectation of the hedging cost, the second its volatility (standard deviation).
Two Q-values are introduced:
• Q1 estimates the expected cost for state-action combinations : Q1 ≈ E[C_t]
• Q2 estimates the expected value of the square of the cost for state-action combinations : Q2 ≈ E[C_t²]
The objective approximated by the networks is then:
F(S_t, a) = Q1(S_t, a) + c √( Q2(S_t, a) − Q1(S_t, a)² )
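The objective Y(t) is easy to estimate from sampled episode costs. A minimal sketch assuming `costs` holds per-episode total hedging costs (the function name is mine):

```python
from math import sqrt

def hedging_objective(costs, c):
    """Y = E[C] + c * sqrt(E[C^2] - E[C]^2):
    mean hedging cost plus c times its standard deviation."""
    n = len(costs)
    m1 = sum(costs) / n                      # sample E[C]
    m2 = sum(x * x for x in costs) / n       # sample E[C^2]
    return m1 + c * sqrt(max(m2 - m1 * m1, 0.0))  # clamp guards tiny negatives

print(hedging_objective([1.0, 2.0, 3.0], c=1.5))
```

The trade-off constant c controls how strongly the agent is penalized for variance in the hedging cost, not just its mean.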
22. 3. Setting Hedging model
Model in this work
Critics Q1 & Q2 update with the loss functions:
( R_{t+1} + γ Q1(S_{t+1}, μ(S_{t+1})) − Q1(S_t, A_t; w1) )²
( R_{t+1}² + γ² Q2(S_{t+1}, μ(S_{t+1})) + 2γ R_{t+1} Q1(S_{t+1}, μ(S_{t+1})) − Q2(S_t, A_t; w2) )²
since the expected value of Q2(S_t, A_t) equals the expected value of (R_{t+1} + γ C_{t+1})², whose expansion gives the target in the second loss.
Actor μ updates as:
θ ← θ − α ∇_θ F(S_t, μ(S_t; θ))
∇_θ F(S_t, μ(S_t; θ)) = ∇_θ Q1(S_t, μ) + Φ ( ∇_θ Q2(S_t, μ) − 2 Q1(S_t, μ) ∇_θ Q1(S_t, μ) )
where Φ = (c / 2) ( Q2(S_t, μ) − Q1(S_t, μ)² )^(−1/2)
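The two critic targets are plain arithmetic once the next-state Q-values are available. A sketch (the name `critic_targets` is mine, and the next-state values are assumed to come from target networks):

```python
def critic_targets(r_next, q1_next, q2_next, gamma):
    """Bellman targets for the two critics.
    Q1 tracks E[C]; Q2 tracks E[C^2], using the expansion
    (r + gamma*C')^2 = r^2 + gamma^2*C'^2 + 2*gamma*r*C'."""
    t1 = r_next + gamma * q1_next
    t2 = r_next**2 + gamma**2 * q2_next + 2 * gamma * r_next * q1_next
    return t1, t2
```

Squaring each critic's difference from these targets reproduces the two loss functions above; the actor then descends the gradient of F built from Q1 and Q2.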
25. 4. Experiments
Setting
• We are in a short position in 1 call option, with two different times to maturity:
1. 1 month
2. 3 months
• Strike price of the call option K = S_0 (ATM at time-step 0)
• We can only use the underlying stock to hedge.
• Using the DDPG algorithm.
• Implementing the prioritized experience replay method.
• Using the Accounting P&L approach.
28. 4. Experiments
II. Stochastic Volatility Test
SABR model (β = 1):
dS = μ S dt + σ S dz_1
dσ = v σ dz_2
E[dz_1 dz_2] = ρ dt
where v is the volatility of volatility.
Parameters: ρ = −0.4, σ_0 = 20%, v = 60%, others equal to the GBM test.
When an option is ATM, its implied volatility is approximately σ_0 B; taking σ_0 B into the Black-Scholes model as the input σ, we can value the call option, where
F_0 = S_0 e^{(r − q)T}
B = 1 + ( ρ v σ_0 / 4 + (2 − 3ρ²) v² / 24 ) T
φ = (v / σ_0) ln(F_0 / K)
χ = ln( ( √(1 − 2ρφ + φ²) + φ − ρ ) / (1 − ρ) )
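A sketch of the SABR (β = 1) dynamics and the ATM implied-vol factor σ_0·B above. Function names are mine; the simulation uses simple lognormal Euler steps with correlated Gaussian draws:

```python
import random
from math import exp, sqrt

def sabr_path(S0, sigma0, mu, v, rho, T, n_steps, seed=0):
    """Simulate one SABR (beta = 1) path:
    dS = mu*S*dt + sigma*S*dz1, dsigma = v*sigma*dz2, corr(dz1, dz2) = rho."""
    rng = random.Random(seed)
    dt = T / n_steps
    S, sigma = S0, sigma0
    for _ in range(n_steps):
        z1 = rng.gauss(0, 1)
        z2 = rho * z1 + sqrt(1 - rho**2) * rng.gauss(0, 1)  # correlated shock
        S *= exp((mu - 0.5 * sigma**2) * dt + sigma * sqrt(dt) * z1)
        sigma *= exp(-0.5 * v**2 * dt + v * sqrt(dt) * z2)  # driftless lognormal vol
    return S, sigma

def atm_implied_vol(sigma0, v, rho, T):
    """ATM SABR implied volatility approximation: sigma0 * B."""
    B = 1 + (rho * v * sigma0 / 4 + (2 - 3 * rho**2) * v**2 / 24) * T
    return sigma0 * B
```

With the slide's parameters (ρ = −0.4, σ_0 = 20%, v = 60%) and a short maturity, the ATM correction B stays very close to 1.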
29. 4. Experiments
II. Stochastic Volatility Test
Our model is compared with 2 delta-hedging strategies:
1. Bartlett Delta : delta calculated from the SABR model
2. Practitioner Delta : delta calculated from the market implied volatility
31. 4. Experiments
a. Our hedge instrument position is close to the theoretical hedge position: delta hedging
b. Our hedge instrument position is much less than the theoretical hedge position: under-hedging
c. Our hedge instrument position is much more than the theoretical hedge position: over-hedging
Since the transaction cost is significant, the model does not take as large a hedge position as theoretically required.
32. 4. Experiments
Since transaction cost is significant,
model donโt take hedge position as much as model required
When 0.6 delta is required and we take 0.5 delta hedge position, model take 0.1 delta more
When 0.9 delta is required and we take 0.5 delta hedge position, model take only 0.25 delta more
When 0.2 delta is required and we take 0.5 delta hedge position, model take only -0.2 delta more
34. 5. Conclusion
1. Use not only simulated data but also real-world data.
2. A more well-structured architecture is needed.
3. Practical hedging methods such as volatility hedging, as well as delta-hedging, should be controlled by RL.
4. Adaptive transaction costs can be introduced.