There exist a large number of metrics to evaluate the performance-risk trade-off of a portfolio. Although those metrics have proven to be useful tools in practice, most of them require a large amount of data and implicitly assume returns to be normally distributed. Bayesian modeling is a statistical framework that allows great flexibility in modeling financial returns as well as risk metrics. In addition, uncertainty of these metrics can be directly quantified in terms of the posterior distribution.
In this talk, Thomas will briefly provide an overview of Bayesian statistics and how Probabilistic Programming frameworks like PyMC can be used to build and estimate complex statistical models. He will then show how several common financial risk metrics like the Sharpe ratio can be expressed as a probabilistic program. Using real-world data from anonymized algorithms running on Quantopian, he will demonstrate how the normality assumption can strongly bias the Sharpe ratio and how heavy-tailed distributions can remedy this problem.
This presentation was part of the QuantCon 2015 Conference hosted by Quantopian. Visit us at: www.quantopian.com.
3. About me
Lead Data Scientist at Quantopian Inc (https://www.quantopian.com): Building a
crowd-sourced hedge fund.
PhD from Brown University -- research on computational neuroscience and machine
learning using Bayesian modeling.
bayesian_risk_perf_v3 slides http://twiecki.github.io/bayesian_risk_perf_v3.slides.html?print-pdf#/
3 of 86 03/17/2015 08:47 PM
5. The problem we're gonna solve
Two real-money strategies:
In [76]: plot_strats()
7. Types of risk
Systematic and Unsystematic Risk
Volatility
Tail risk
Beta
Drawdown
11. Types of risk
Systematic and Unsystematic Risk
Model misspecification
Estimation uncertainty
Programming errors
Data issues
Model Risk
Volatility
Tail risk
Beta
Drawdown
15. Short primer on random variables
Represents our beliefs about an unknown state.
Probability distribution assigns a probability to each possible state.
Not a single number (e.g. most likely state).
"When I bet on horses, I never lose. Why? I bet on all the horses." Tom Haverford
17. You already know what a variable is...
In [8]: coin = 0 # 0 for tails
coin = 1 # 1 for heads
19. A random variable assigns all possible values a certain probability
In [ ]: coin = {0: 50%,
1: 50%}
Alternatively:
coin ~ Bernoulli(p=0.5)
coin is a random variable
Bernoulli is a probability distribution
~ reads as "is distributed as"
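A minimal sketch of the coin as a random variable, using scipy.stats.bernoulli. The seed and the number of flips are arbitrary choices made here for illustration; they are not from the talk.

```python
import numpy as np
from scipy import stats

# coin ~ Bernoulli(p=0.5): 0 for tails, 1 for heads
coin = stats.bernoulli(p=0.5)

# The distribution assigns a probability to each possible state.
print(coin.pmf(0), coin.pmf(1))

# Each draw is one realized state of the random variable.
flips = coin.rvs(size=10_000, random_state=42)
frac_heads = flips.mean()
print(frac_heads)
```

With many flips, the fraction of heads should settle close to p.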
21. This was discrete (binary), what about the continuous case?
returns ~ Normal(μ, σ²)
22.
In [77]: from scipy import stats
sns.distplot(data_0, kde=False, fit=stats.norm)
plt.xlabel('returns')
23. How to estimate μ and σ?
Naive: point estimate
Set mu = mean(data) and sigma = std(data)
Maximum Likelihood Estimate
Correct answer as n → ∞
Bayesian analysis
Most of the time n ≠ ∞ ...
Uncertainty about μ and σ
Turn μ and σ into random variables
How to estimate?
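To see why finite n matters, here is a rough sketch that repeats the naive point estimate on synthetic data at two sample sizes. The "true" parameters, sample sizes, and the helper point_estimates are all hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
true_mu, true_sigma = 0.001, 0.01  # hypothetical "true" daily return parameters

def point_estimates(n):
    """Draw n synthetic returns and return the plug-in estimates of mu and sigma."""
    data = rng.normal(true_mu, true_sigma, size=n)
    return data.mean(), data.std()

# Repeat the experiment many times to see how much the estimate of mu varies.
spread_small = np.std([point_estimates(50)[0] for _ in range(500)])
spread_large = np.std([point_estimates(5000)[0] for _ in range(500)])
print(spread_small, spread_large)
```

The estimate of mu scatters far more at n = 50 than at n = 5000. A single point estimate hides that scatter, which is the uncertainty the Bayesian treatment makes explicit.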
25. Bayes Formula!
[Diagram: Prior and Data are combined via Bayes' formula to give the Posterior.]
Use prior knowledge and data to update our beliefs.
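For reference, Bayes' formula in standard notation. The symbols θ (unknown parameters) and D (data) are chosen here and do not appear on the slide.

```latex
P(\theta \mid D) = \frac{P(D \mid \theta)\, P(\theta)}{P(D)}
```

Here P(θ) is the prior, P(D | θ) the likelihood of the data, and P(θ | D) the posterior.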
29. Probabilistic Programming
Model unknown causes (e.g. μ) of a phenomenon as random variables.
Write a programmatic story of how unknown causes result in observable data.
Use Bayes formula to invert generative model to infer unknown causes.
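The three steps above can be sketched without any probabilistic programming framework by evaluating the posterior on a grid. This is only an illustration under stated assumptions: σ is treated as known, the prior on μ is Normal(0, 0.01), and the data are synthetic. None of these choices come from the talk.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Generative story (all numbers are illustrative assumptions):
# an unknown mean mu produces observed daily returns with known sigma.
sigma = 0.01
data = rng.normal(0.002, sigma, size=100)

# Candidate values for the unknown cause mu, with a Normal(0, 0.01) prior.
mu_grid = np.linspace(-0.01, 0.01, 2001)
log_prior = stats.norm(0, 0.01).logpdf(mu_grid)

# Log-likelihood of the data under every candidate mu.
log_lik = stats.norm(mu_grid[:, None], sigma).logpdf(data).sum(axis=1)

# Bayes' formula: the posterior is proportional to prior times likelihood.
log_post = log_prior + log_lik
post = np.exp(log_post - log_post.max())
post /= post.sum()

post_mean = (mu_grid * post).sum()
print(post_mean, data.mean())
```

A grid only works for one or two unknowns, which is why sampling methods are used for larger models.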
31. Approximating the posterior with MCMC sampling
In [81]: plot_want_get()
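As a rough idea of what an MCMC sampler does, here is a minimal random-walk Metropolis sketch for the mean of normally distributed returns. It is not the sampler used in the talk: σ is assumed known, and the prior, proposal width, burn-in, and synthetic data are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma = 0.01                                   # assumed known for simplicity
data = rng.normal(0.002, sigma, size=200)      # synthetic returns

def log_posterior(mu):
    """Unnormalized log posterior: Normal(0, 0.01) prior plus Normal likelihood."""
    log_prior = -0.5 * (mu / 0.01) ** 2
    log_lik = -0.5 * np.sum(((data - mu) / sigma) ** 2)
    return log_prior + log_lik

samples = np.empty(5000)
mu = 0.0
for i in range(samples.size):
    proposal = mu + rng.normal(0, 0.001)       # symmetric random-walk proposal
    # Accept with probability min(1, posterior ratio).
    if np.log(rng.uniform()) < log_posterior(proposal) - log_posterior(mu):
        mu = proposal
    samples[i] = mu

trace = samples[1000:]                         # discard burn-in
print(trace.mean(), trace.std())
```

The retained samples approximate the posterior of μ, so their histogram stands in for the distribution we want but cannot write down directly.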
33. PyMC3
Probabilistic Programming framework written in Python.
Allows for construction of probabilistic models using intuitive syntax.
Features advanced MCMC samplers.
Fast: Just-in-time compiled by Theano.
Extensible: easily incorporates custom MCMC algorithms and unusual probability
distributions.
Authors: John Salvatier, Chris Fonnesbeck, Thomas Wiecki
Upcoming beta release!
45. Graphical model of returns
[Diagram: priors on µ and σ are combined with the data via Bayes' formula to give posteriors for µ and σ.]
47. This is what the data looks like
In [9]: print(data_0.head())
2013-12-31 21:00:00 0.002143
2014-01-02 21:00:00 -0.028532
2014-01-03 21:00:00 -0.001577
2014-01-06 21:00:00 -0.000531
2014-01-07 21:00:00 0.011310
Name: 0, dtype: float64
53. Analyzing the posterior
In [84]: sns.distplot(results_normal[0][0]['mean returns'], hist=False, label='etrade')
sns.distplot(results_normal[1][0]['mean returns'], hist=False, label='IB')
plt.title('Posterior of the mean'); plt.xlabel('mean returns')
Out[84]: <matplotlib.text.Text at 0x7fde80cb5850>
61. Value at Risk with uncertainty
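One way to attach uncertainty to Value at Risk is to compute it once per posterior sample. The sketch below assumes the normal returns model and uses randomly generated stand-ins for the posterior samples of μ and σ. It is an illustration, not necessarily how the talk computed it.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Stand-ins for MCMC output: hypothetical posterior samples of mu and sigma.
mu_samples = rng.normal(0.001, 0.0005, size=4000)
sigma_samples = np.abs(rng.normal(0.01, 0.001, size=4000))

# One 5% VaR per posterior sample: the 5th percentile of Normal(mu, sigma).
var_5 = stats.norm.ppf(0.05, loc=mu_samples, scale=sigma_samples)

# Summarize the uncertainty in VaR with a 95% credible interval.
lower, upper = np.percentile(var_5, [2.5, 97.5])
print(var_5.mean(), lower, upper)
```

The result is a distribution over VaR rather than a single number, so the width of the interval shows how well the data pin the metric down.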
63. Interim summary
Bayesian stats allows us to reformulate common risk metrics, use priors and
quantify uncertainty.
IB strategy seems better in almost every regard. Is it though?
67. Is this a good model?
In [93]: sns.distplot(data_1, label='data IB', kde=False, norm_hist=True, color='.5')
for p in ppc_dist_normal:
    plt.plot(x, p, c='r', alpha=.1)
plt.plot(x, p, c='r', alpha=.5, label='Normal model')
plt.xlabel('Daily returns')
plt.legend();
69. Can it be improved? Yes!
Same model as before, but with a heavy-tailed T distribution:
returns ~ T(ν, μ, σ²)
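To get a feel for what "heavy-tailed" means, the sketch below compares the probability of a move beyond 4 scale units under a standard normal and a Student T distribution. The threshold and ν = 3 are arbitrary illustrative choices, not values from the talk.

```python
from scipy import stats

# Probability of a move beyond 4 scale units in either direction.
threshold = 4.0
p_normal = 2 * stats.norm.sf(threshold)
p_t = 2 * stats.t.sf(threshold, df=3)   # nu = 3 is an arbitrary illustrative choice
print(p_normal, p_t, p_t / p_normal)
```

The T distribution puts far more probability on extreme returns, so a few outliers do not force the scale estimate upward the way they do under a normal model.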
70.
In [94]: sns.distplot(data_1, label='data IB', kde=False, norm_hist=True, color='.5')
for p in ppc_dist_t:
    plt.plot(x, p, c='y', alpha=.1)
71. Let's compare posteriors of the normal and T model
81. Comparing the Bayesian T-Sharpe ratios
In [101]: sns.distplot(results_t[0][0]['sharpe'], hist=False, label='etrade')
sns.distplot(results_t[1][0]['sharpe'], hist=False, label='IB')
plt.xlabel('Bayesian Sharpe ratio'); plt.ylabel('Probability Density');
82.
In [42]: print('P(Sharpe ratio IB > Sharpe ratio etrade) = %.2f%%' %
      (np.mean(results_t[1][0]['sharpe'] > results_t[0][0]['sharpe']) * 100))
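The comparison above works because the Sharpe ratio is computed once per posterior sample. Below is a self-contained sketch of that idea with made-up posterior samples for two hypothetical strategies, A and B. The helper sharpe_samples, the annualization factor of 252, and the zero risk-free rate are assumptions made here, not details confirmed by the slides.

```python
import numpy as np

rng = np.random.default_rng(4)

def sharpe_samples(mu, sigma, periods=252):
    """Annualized Sharpe ratio for each posterior sample (risk-free rate taken as 0)."""
    return mu / sigma * np.sqrt(periods)

# Hypothetical posterior samples of (mu, sigma) for two strategies, A and B.
sharpe_a = sharpe_samples(rng.normal(0.0005, 0.0006, 4000),
                          np.abs(rng.normal(0.010, 0.0005, 4000)))
sharpe_b = sharpe_samples(rng.normal(0.0010, 0.0006, 4000),
                          np.abs(rng.normal(0.010, 0.0005, 4000)))

# Fraction of posterior samples in which B beats A.
p_b_better = np.mean(sharpe_b > sharpe_a)
print('P(Sharpe B > Sharpe A) = %.2f%%' % (p_b_better * 100))
```

Comparing the samples pairwise gives a direct probability statement about which strategy is better, which a pair of point estimates cannot provide.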
83. Conclusions
Bayesian statistics allows us to quantify uncertainty -- measure orthogonal sources
of risk.
Rich statistical framework to compare different models against each other.
Blackbox inference algorithms allow estimation of complex models.
PyMC3 puts advanced samplers at your fingertips.
85. Further reading
Quantopian (https://www.quantopian.com) -- Develop trading algorithms like this in your browser.
My blog for Bayesian linear regression (financial alpha and beta) (https://twiecki.github.io)
Probabilistic Programming for Hackers (http://camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/) -- IPython Notebook book on Bayesian stats using PyMC2.
Doing Bayesian Data Analysis (http://www.indiana.edu/~kruschke/DoingBayesianDataAnalysis/) -- Great book by Kruschke.
PyMC3 repository (https://github.com/pymc-devs/pymc3)
Twitter: @twiecki (https://twitter.com/twiecki)