Introduction to credit default swaps
About the consistency of clustering financial time series
Design of distances and alternative dependence coefficients
Summary and open questions
Some contributions to the clustering
of financial time series
and its applications to credit default swaps
Gautier Marti
November 10, 2017
Gautier Marti Some contributions to the clustering of financial time series
Context of the PhD thesis
PhD studies in hedge funds:
Hellebore Capital Management,
63 Avenue des Champs-Élysées, Paris, France
(April 1, 2014 - February 29, 2016)
Hellebore Capital Limited,
81 Fulham Road, London, United Kingdom
(March 1, 2016 - September 20, 2017)
AXA IM Chorus,
18 Westlands Road, Hong Kong
(October 1, 2017 - present)
Outline
1. Introduction to the credit default swap dataset
Contributions:
2. About the consistency of clustering financial time series
3. Improving standard distances between financial time series
i) a simple correlation + distribution distance
ii) a geometrical approach to define dependence coefficients from
copulas
4. Perspectives
Some motivation for using clustering
Statistical modelling is difficult because the time series
are non-stationary: economic regimes change, so using data from
too distant a past can be misleading
are nearly efficient, i.e. they behave almost like random walks (cf.
the efficient-market hypothesis (Fama, 1970) [5])
have a low signal-to-noise ratio, i.e. measurement artifacts hide
information in random fluctuations
are in an unfavorable statistical setting: too few relevant
observations (length) relative to the number of variables (time series)
Some motivation for using clustering
Clustering helps to reduce dimensionality, and can thus be used as
a preprocessing step for:
Risk management, e.g. filtering covariances, performance
and risk attribution
Investment, e.g. portfolio design, statistical arbitrage, beta neutralization
Data analysis, e.g. outlier detection, missing-value
imputation, exploration (www.datagrapple.com)
1 Introduction to credit default swaps
2 About the consistency of clustering financial time series
3 Design of distances and alternative dependence coefficients
Alternative representation and correlation+distribution distance
Copula-based dependence coefficients
4 Summary and open questions
Introduction to the credit default swap
Introduction to the CDS market
Introduction to the CDS raw dataset
Putting Self-Supervised Token Embedding on the Tables [15]
(ICMLA 2017)
A ‘tick-by-tick’ dataset
Autoregressive Convolutional Neural Networks for Asynchronous
Time Series [1] (ICML Time Series WS 2017)
Historical daily time series of spread
From the received ‘tick-by-tick’ prices, a synthetic order book is built. At
5pm London time, we save the mid-price between the best bid and the best offer
in the order book for each entity. The resulting dataset covers N ≈ 800 liquid
CDSs, with T ≈ 3000 daily observations each.
Clustering of Financial Time Series
Stylized fact I: Financial time series correlations have a strong
hierarchical block diagonal structure (Mantegna, 1999) [6]
https://gmarti.gitlab.io/ml/2017/09/07/how-to-sort-distance-matrix.html
Stylized fact II: Most correlations are spurious (Bouchaud,
1999) [3]
Motivation for clustering financial time series using correlation as a
similarity measure:
dimensionality reduction ≡ filtering noisy correlations
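The idea of clustering with correlation as a similarity measure can be illustrated with a minimal sketch. The distance d = sqrt(2 (1 - rho)), the helper name `correlation_clusters`, the choice of average linkage and the toy data are all assumptions made for this illustration; the slides do not prescribe them.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def correlation_clusters(returns, n_clusters, method="average"):
    """Cluster the columns of a (T, N) array of returns.

    Assumption for this sketch: correlations are turned into the
    distance d = sqrt(2 * (1 - rho)) before hierarchical clustering.
    """
    corr = np.corrcoef(returns, rowvar=False)
    dist = np.sqrt(np.clip(2.0 * (1.0 - corr), 0.0, None))
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method=method)
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Toy data: two latent factors, three noisy series per factor.
rng = np.random.default_rng(0)
factors = rng.standard_normal((500, 2))
returns = np.repeat(factors, 3, axis=1) + 0.5 * rng.standard_normal((500, 6))
labels = correlation_clusters(returns, n_clusters=2)
print(labels)
```

Series driven by the same factor should end up in the same cluster, which replaces 15 pairwise correlations by two groups.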
Challenge for the statistical practitioner
The dilemma:
the longer the time interval, the more precise the correlation
estimates, but also
the longer the time interval, the more unrealistic the
stationarity hypothesis for these time series.
Question: How does the clustering behave with statistical errors
of the correlation estimates?
A first theoretical approach - simplified setting
We consider the following framework:
financial time series ≡ random walks
they follow a joint elliptical distribution (e.g. Gaussian,
Student) parameterized by a correlation matrix
the correlation matrix has a hierarchical block structure:
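A simplified setting of this kind could be simulated as follows. This is only a sketch under stated assumptions: the block structure has a single level (one within-cluster and one between-cluster correlation) rather than a full hierarchy, and the function names, cluster sizes and correlation values are hypothetical.

```python
import numpy as np


def block_correlation(sizes, rho_intra, rho_inter):
    """Correlation matrix with one block per cluster (single-level sketch)."""
    n = sum(sizes)
    corr = np.full((n, n), rho_inter)
    start = 0
    for size in sizes:
        corr[start:start + size, start:start + size] = rho_intra
        start += size
    np.fill_diagonal(corr, 1.0)
    return corr


def sample_random_walks(corr, n_obs, rng, dof=None):
    """Random walks with Gaussian (dof=None) or Student-t (dof > 0) increments."""
    chol = np.linalg.cholesky(corr)
    increments = rng.standard_normal((n_obs, corr.shape[0])) @ chol.T
    if dof is not None:
        # Elliptical Student-t: one chi-square mixing variable per time step.
        increments /= np.sqrt(rng.chisquare(dof, size=(n_obs, 1)) / dof)
    return np.cumsum(increments, axis=0)


rng = np.random.default_rng(42)
corr = block_correlation([3, 3, 4], rho_intra=0.7, rho_inter=0.2)
walks = sample_random_walks(corr, n_obs=1000, rng=rng, dof=3)
returns = np.diff(walks, axis=0)  # increments used to estimate correlations
```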
Hierarchical clustering algorithms - A taxonomy
We consider Hierarchical Agglomerative Clustering algorithms,
such as single linkage, average linkage and Ward.
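Space contracting vs. space conserving vs. space dilating [2]
Space contracting:
\[ D^{(t+1)}\left(C_i^{(t)} \cup C_j^{(t)}, C_k^{(t)}\right) \le \min\left(D_{ik}^{(t)}, D_{jk}^{(t)}\right) \]
Space conserving:
\[ D^{(t+1)}\left(C_i^{(t)} \cup C_j^{(t)}, C_k^{(t)}\right) \in \left[\min\left(D_{ik}^{(t)}, D_{jk}^{(t)}\right), \max\left(D_{ik}^{(t)}, D_{jk}^{(t)}\right)\right] \]
Space dilating:
\[ D^{(t+1)}\left(C_i^{(t)} \cup C_j^{(t)}, C_k^{(t)}\right) \ge \max\left(D_{ik}^{(t)}, D_{jk}^{(t)}\right) \]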
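The taxonomy can be checked numerically on the update rules of the usual linkages. The sketch below uses the standard update formulas for single, complete and (size-weighted) average linkage; the function names are hypothetical, and Ward is left out because its update also depends on the distance between the two merged clusters.

```python
def merged_distance(d_ik, d_jk, n_i, n_j, rule):
    """Distance from the merged cluster (C_i U C_j) to a third cluster C_k."""
    if rule == "single":
        return min(d_ik, d_jk)
    if rule == "complete":
        return max(d_ik, d_jk)
    if rule == "average":
        return (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
    raise ValueError(f"unknown rule: {rule}")


def is_space_conserving(d_new, d_ik, d_jk):
    """Check that the updated distance stays within [min, max] of the old ones."""
    return min(d_ik, d_jk) <= d_new <= max(d_ik, d_jk)


for rule in ("single", "complete", "average"):
    d_new = merged_distance(1.0, 4.0, n_i=2, n_j=3, rule=rule)
    print(rule, d_new, is_space_conserving(d_new, 1.0, 4.0))
```

Single linkage sits on the lower end of the interval and complete linkage on the upper end, while average linkage falls strictly in between.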
Simulations in the simplified setting
Some influential parameters:
clustering algorithm
number of observations T
number of variables N relative to T
contrast between the correlations, and their values
correlation estimator (e.g. Pearson, Spearman)
[Figure: three panels, "Empirical rates of convergence for Single Linkage", "Empirical rates of convergence for Average Linkage" and "Empirical rates of convergence for Ward". x-axis: sample size (100 to 500); y-axis: score (0.0 to 1.0); curves: Gaussian - Pearson, Gaussian - Spearman, Student - Pearson, Student - Spearman.]
Ratio of the number of correct clusterings obtained over the
number of trials, as a function of T
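An experiment of this type can be sketched in a few lines. The setup below is an assumption made for illustration, not the configuration used for the figure: 3 clusters of 5 Gaussian variables, correlations 0.6 within and 0.2 between clusters, the dissimilarity 1 - rho, and a clustering counted as correct only if it recovers the true partition exactly.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from scipy.stats import rankdata


def same_partition(labels_a, labels_b):
    """True if two label vectors describe the same partition, up to relabelling."""
    labels_a, labels_b = np.asarray(labels_a), np.asarray(labels_b)
    co_a = labels_a[:, None] == labels_a[None, :]
    co_b = labels_b[:, None] == labels_b[None, :]
    return bool(np.array_equal(co_a, co_b))


def success_ratio(n_obs, n_trials=20, method="average", estimator="pearson", seed=0):
    """Fraction of trials in which the true clusters are recovered from n_obs samples."""
    rng = np.random.default_rng(seed)
    truth = np.repeat(np.arange(3), 5)  # assumed model: 3 clusters of 5 variables
    corr = np.where(truth[:, None] == truth[None, :], 0.6, 0.2)
    np.fill_diagonal(corr, 1.0)
    chol = np.linalg.cholesky(corr)
    hits = 0
    for _ in range(n_trials):
        x = rng.standard_normal((n_obs, truth.size)) @ chol.T
        if estimator == "spearman":
            x = rankdata(x, axis=0)  # Spearman = Pearson correlation of ranks
        dist = 1.0 - np.corrcoef(x, rowvar=False)
        np.fill_diagonal(dist, 0.0)
        tree = linkage(squareform(dist, checks=False), method=method)
        labels = fcluster(tree, t=3, criterion="maxclust")
        hits += same_partition(labels, truth)
    return hits / n_trials


for n_obs in (25, 100, 400):
    print(n_obs, success_ratio(n_obs))
```

Passing `method="single"` or `method="ward"` and `estimator="spearman"` gives the other combinations; heavy-tailed (Student) samples would need a different sampler.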
A consistency proof & first convergence bounds
A 2-step proof. First step:
Which geometrical configurations lead to the true clustering?
For space-conserving algorithms (e.g. Single, Complete, Average
Linkage), a sufficient separability condition reads
\[ \max D_{\text{intra}} := \max_{\substack{1 \le i,j \le N \\ C(i) = C(j)}} d(X_i, X_j) \;<\; \min_{\substack{1 \le i,j \le N \\ C(i) \ne C(j)}} d(X_i, X_j) =: \min D_{\text{inter}} \]
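The separability condition is easy to test on a given distance matrix. A minimal sketch, assuming a symmetric matrix `dist`, at least two clusters, and at least one cluster with two or more points (the function name is hypothetical):

```python
import numpy as np


def is_separable(dist, labels):
    """Check max intra-cluster distance < min inter-cluster distance."""
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    off_diagonal = ~np.eye(labels.size, dtype=bool)
    max_intra = dist[same & off_diagonal].max()
    min_inter = dist[~same].min()
    return bool(max_intra < min_inter)


dist = np.array([
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.7, 0.9],
    [0.9, 0.7, 0.0, 0.2],
    [0.8, 0.9, 0.2, 0.0],
])
print(is_separable(dist, [0, 0, 1, 1]))  # clusters {0, 1} and {2, 3}
print(is_separable(dist, [0, 1, 0, 1]))  # clusters {0, 2} and {1, 3}
```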
A consistency proof & first convergence bounds
A 2-step proof. Second step:
How long does it take for the estimates of the correlation
coefficients to be precise enough to be with high probability in
a good configuration for the clustering algorithm?
Answer: Concentration inequalities for correlation coefficients.
Convergence bounds
Combining both steps, we get the following convergence rate:
Convergence rate
The probability of the clustering algorithm making an error is
\[ O\left(\frac{\log N}{T}\right). \]
Proof. Step 1 - A few more details
By induction.
Assume that the separability condition is satisfied at step t.
Then
\[ \min D_{\text{intra}}^{(t)} \le \max D_{\text{intra}}^{(t)} < \min D_{\text{inter}}^{(t)} \le \max D_{\text{inter}}^{(t)}. \]
From the space-conserving property, we get:
\[ D_{\text{intra}}^{(t+1)} \in \left[\min D_{\text{intra}}^{(t)}, \max D_{\text{intra}}^{(t)}\right] \quad \text{and} \quad D_{\text{inter}}^{(t+1)} \in \left[\min D_{\text{inter}}^{(t)}, \max D_{\text{inter}}^{(t)}\right]. \]
Therefore:
the separability condition is satisfied at step t + 1,
the clustering algorithm has not linked points from two
different clusters between step t and step t + 1.
Proof. Step 2 - A few more details
Maximum statistical error - (Marti and Andler, IJCAI 2016)
For space-conserving algorithms, the separability condition is met if
\[ \|\hat{\Sigma} - \Sigma\|_\infty < \frac{\min_{\rho_i, \rho_j} |\rho_i - \rho_j|}{2}, \]
where C(i) ≠ C(j).
This means that the statistical error has to be below half the minimum
correlation ‘contrast’ between the clusters.
The weaker the ‘contrast’, the more precise the correlation estimates have to be.
N.B. From the Cramér–Rao lower bound, we get for the Pearson correlation
estimator:
\[ \operatorname{var}(\hat{\rho}) \ge \frac{1}{I(\rho)} = \frac{(\rho - 1)^2 (\rho + 1)^2}{3(\rho^2 + 1)}. \]
When the correlation is high, it is easier to estimate. (Marti, SSP 2016)
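To see how quickly the bound shrinks as the correlation grows, the expression can simply be evaluated. The sketch below takes the formula exactly as stated on the slide (its derivation and sample-size normalization are not given here), and the function name is hypothetical.

```python
def crlb_pearson(rho):
    """Lower bound on var(rho_hat), using the expression quoted on the slide."""
    return (rho - 1.0) ** 2 * (rho + 1.0) ** 2 / (3.0 * (rho ** 2 + 1.0))


for rho in (0.0, 0.5, 0.9, 0.99):
    print(f"rho = {rho:4.2f}  bound = {crlb_pearson(rho):.6f}")
```

The bound is largest at rho = 0 and vanishes as rho approaches 1, which is the sense in which high correlations are easier to estimate.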
Correlation estimates concentration bounds
Number of variables N, number of observations T, minimum separation d.
Concentration bounds [4]
If Σ and \hat{Σ} are the population and empirical Spearman correlation
matrices respectively, then for $N \ge \frac{24}{\log T} + 2$, we have with
probability at least $1 - \frac{1}{T^2}$,
\[ \|\hat{\Sigma} - \Sigma\|_\infty \le 24 \sqrt{\frac{\log N}{T}}. \]
\[ P(\text{“correct clustering”}) \ge 1 - 2N^2 e^{-Td^2/24} \]
Not sharp enough for reasonable values of N, T, d.
For example, for N = 500, T = 2500, d = 0.2, the lower bound evaluates to
≈ −7750, which is vacuous for a probability.
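The numerical example can be reproduced directly from the bound 1 - 2 N^2 exp(-T d^2 / 24). The helper `observations_needed`, which inverts the bound for T, and its 95% target are additions made for illustration only.

```python
import math


def correct_clustering_lower_bound(n_vars, n_obs, min_sep):
    """Lower bound on P("correct clustering"): 1 - 2 N^2 exp(-T d^2 / 24)."""
    return 1.0 - 2.0 * n_vars ** 2 * math.exp(-n_obs * min_sep ** 2 / 24.0)


def observations_needed(n_vars, min_sep, confidence=0.95):
    """Smallest T for which the bound reaches `confidence` (illustrative helper)."""
    # Solve 1 - 2 N^2 exp(-T d^2 / 24) >= confidence for T.
    return math.ceil(
        24.0 / min_sep ** 2 * math.log(2.0 * n_vars ** 2 / (1.0 - confidence))
    )


bound = correct_clustering_lower_bound(n_vars=500, n_obs=2500, min_sep=0.2)
print(round(bound))  # about -7750, as on the slide
print(observations_needed(n_vars=500, min_sep=0.2))
```

With daily data, the number of observations required by the bound corresponds to several decades of history, far more than the T ≈ 2500 of the example.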
Future developments & open questions
Bounds are not sharp enough. We can try to refine them using:
(theoretical) Intrinsic dimension of the HCBM model [16];
(theoretical) Use PSD-ness to refine the bounds for the matrix
(theoretical/empirical) A distance between dendrograms
(instead of correct/incorrect) for a finer analysis;
(empirical) A study of ‘correctness’ isoquants:
Precise convergence rates of clustering methodologies can provide
a useful model selection criterion for practitioners!
Motivations
Not only correlation!
Different kinds of returns distributions may exist in the data.
We may want to refine ‘correlation’ clusters with this information.
A (too) naive distance and its pitfalls
Applying the L2 distance directly to the time series mixes correlation and
volatility.
We are looking for a better representation, so that the L2 distance is
meaningful.
Revisiting Sklar's theorem (1959)...
...and Deheuvels' empirical copulas (1981)
A novel representation for time series
Basically, we transform each time series of returns into a vector made of
(normalized ranks, square root of the marginal density).
Applying the L2 distance between two such vectors is then equivalent to
combining a distance based on Spearman correlation with the Hellinger
distance between the densities.
cf. (Marti et al., 2016) [13] (Pattern Recognition Letters) for more
details on this representation and the associated distance.
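A rough sketch of such a representation is given below. Several choices are assumptions made for this illustration rather than details taken from the slide: the marginal density is estimated with a histogram on bin edges shared by all series, the two parts are mixed with a weight `theta` in [0, 1], and the rank part is scaled so that its squared L2 distance equals 3 E[(F_X(X) - F_Y(Y))^2], which tends to (1 - rho_S) / 2.

```python
import numpy as np
from scipy.stats import rankdata


def represent(x, edges, theta=0.5):
    """Map a series to (scaled normalized ranks, scaled sqrt of histogram masses)."""
    x = np.asarray(x, dtype=float)
    ranks = rankdata(x) / x.size
    probs, _ = np.histogram(x, bins=edges)
    probs = probs / probs.sum()
    dependence = np.sqrt(3.0 * theta / x.size) * ranks            # rank part
    distribution = np.sqrt((1.0 - theta) / 2.0) * np.sqrt(probs)  # Hellinger part
    return np.concatenate([dependence, distribution])


rng = np.random.default_rng(3)
x = rng.standard_normal(1000)
same_ranks = 3.0 * x      # same ordering as x, different distribution
reversed_ranks = -x       # opposite ordering, similar distribution
edges = np.histogram_bin_edges(
    np.concatenate([x, same_ranks, reversed_ranks]), bins=50
)


def distance(a, b, theta):
    """Plain L2 distance between the two representations."""
    return float(np.linalg.norm(represent(a, edges, theta) - represent(b, edges, theta)))


print(distance(x, same_ranks, theta=1.0))      # dependence only
print(distance(x, reversed_ranks, theta=1.0))  # dependence only
print(distance(x, same_ranks, theta=0.0))      # distribution only
```

With `theta=1.0` only the ordering matters, so `x` and `3.0 * x` coincide; with `theta=0.0` only the marginal distributions are compared.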
Analysing the differences - Using the Sankey diagram
cf. (Marti et al., 2015) [14] (ICMLA 2015) for guidelines on how to
compare several clustering methodologies for financial time series.
Exploring and improving dependence coefficients –
Motivations
We want to use dependence measures which are:
robust to noise (not Pearson then) and preserve as much
information as possible, so that clusterings are more stable;
can be tuned to look for specific dependencies, e.g.
tail-dependence or more exotic ones.
As copulas are a convenient way to capture all the dependence
between two variables, we aim at leveraging them.
Minimum, Independence, Maximum copulas
Relation to existing dependence coefficients
Some dependence coefficients can be readily expressed as:
deviation from the Fréchet–Hoeffding bounds
Spearman’s $\rho_S = 1 - 6 \iint_{[0,1]^2} (u_i - u_j)^2 \, dC(u_i, u_j)$,
Gini’s γ,
Kendall distribution distance,
deviation from independence $u_i u_j$
Spearman’s $\rho_S = 12 \iint_{[0,1]^2} \left(C(u_i, u_j) - u_i u_j\right) du_i \, du_j$,
Copula MMD,
Schweizer-Wolff’s σ,
Hoeffding’s $\varphi^2$
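Both expressions of Spearman's rho can be checked numerically on an empirical copula. In the sketch below, the pseudo-observations (rank - 0.5) / T, the midpoint grid used to approximate the double integral, and the function names are choices made for this illustration.

```python
import numpy as np
from scipy.stats import rankdata


def pseudo_observations(x):
    """Normalized ranks in (0, 1), i.e. samples from the empirical copula margins."""
    return (rankdata(x) - 0.5) / len(x)


def spearman_from_diagonal(u, v):
    """rho_S = 1 - 6 * integral of (u - v)^2 dC, with C the empirical copula."""
    return 1.0 - 6.0 * np.mean((u - v) ** 2)


def spearman_from_independence(u, v, grid_size=100):
    """rho_S = 12 * integral of (C(u, v) - u v), approximated on a midpoint grid."""
    grid = (np.arange(grid_size) + 0.5) / grid_size
    below_u = (u[None, :] <= grid[:, None]).astype(float)  # shape (grid, T)
    below_v = (v[None, :] <= grid[:, None]).astype(float)
    copula = below_u @ below_v.T / len(u)                  # empirical C on the grid
    return 12.0 * np.mean(copula - np.outer(grid, grid))


rng = np.random.default_rng(7)
x = rng.standard_normal(2000)
y = 0.6 * x + 0.8 * rng.standard_normal(2000)
u, v = pseudo_observations(x), pseudo_observations(y)
reference = np.corrcoef(rankdata(x), rankdata(y))[0, 1]  # Pearson correlation of ranks
print(reference, spearman_from_diagonal(u, v), spearman_from_independence(u, v))
```

The three numbers should agree up to discretization error, which illustrates that the same coefficient can be read either as a deviation from the comonotone copula or as a deviation from independence.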
Idea: Relative position of empirical copula wrt ‘targets’
and ‘forgets’
In a classical setting, we choose the positive and negative
dependence copulas as ‘targets’, and the independence one as a
‘forget’ dependence.
Optimal Transport between empirical copulas
cf. (Marti et al., 2016) [12] (ICASSP 2016)
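One simple way to compute an optimal transport cost between two empirical copulas is sketched below. It is not necessarily the discretization used in [12]: here each empirical copula is treated as a cloud of T equally weighted points with a Euclidean ground cost, in which case exact optimal transport reduces to an assignment problem. The function names are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist
from scipy.stats import rankdata


def empirical_copula_points(x, y):
    """Points (rank(x) / T, rank(y) / T) supporting the empirical copula."""
    n = len(x)
    return np.column_stack([rankdata(x) / n, rankdata(y) / n])


def copula_transport_cost(points_a, points_b):
    """Exact transport cost between two equal-size, uniformly weighted point clouds."""
    cost = cdist(points_a, points_b)          # Euclidean ground cost
    rows, cols = linear_sum_assignment(cost)  # optimal one-to-one matching
    return float(cost[rows, cols].mean())


rng = np.random.default_rng(1)
n = 300
x = rng.standard_normal(n)
comonotone = empirical_copula_points(x, x)
countermonotone = empirical_copula_points(x, -x)
independent = empirical_copula_points(x, rng.standard_normal(n))

print(copula_transport_cost(comonotone, comonotone))
print(copula_transport_cost(comonotone, independent))
print(copula_transport_cost(comonotone, countermonotone))
```

The cost grows as the dependence structure moves away from perfect positive dependence, first towards independence and then towards perfect negative dependence.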
Why choose Optimal Transport over f-divergences?
Distances between Gaussian copulas C1, C2, C3:
cf. (Marti et al., 2016) [7] (SSP 2016)
Standard setting: TFDC vs. Spearman
cf. (Marti et al., 2017) [8] (NIPS Time Series WS 2016)
Power of TFDC and state-of-the-art dependence measures
Some applications of the Target/Forget Dependence
Coefficient
Applications in non-standard settings: We can look for particular
associations between random variables.
cf. (Marti et al., 2017) [8] (NIPS Time Series 2016)
Impact of different coefficients on clustering
Different coefficients lead to different clusterings. Stability and empirical
convergence rates may help in choosing one over the others.
Summary of contributions
The contribution of the PhD thesis:
bring a greater focus on statistical reliability (convergence
rates and consistency) [9] (IJCAI 2016)
consider alternative representation and distances [13]
(Pattern Recognition Letters), [8] (NIPS Time Series 2016)
visualizations [10] and a framework to test for clustering
stability [14] (ICMLA 2015)
an extensive and regularly updated survey of the literature:
https://arxiv.org/pdf/1703.00485.pdf [11] (350+
references)
Perspectives and open questions
“How many clusters?”
What is multivariate correlation? How to use it for
hierarchical clustering?
Using several time series representing a given entity, and
dependence between random vectors?
Riemannian geometry of correlation matrices (not a totally
geodesic submanifold of the well-explored manifold of
covariances)
Entities switching clusters: noise or signal?
More precise results for (empirical) convergence rates?
[1] Mikolaj Binkowski, Gautier Marti, and Philippe Donnat.
Autoregressive convolutional neural networks for asynchronous time series.
arXiv preprint arXiv:1703.04122, 2017.
[2] Zhenmin Chen and John W Van Ness.
Space-conserving agglomerative algorithms.
Journal of Classification, 13(1):157–168, 1996.
[3] Laurent Laloux, Pierre Cizeau, Marc Potters, and Jean-Philippe Bouchaud.
Random matrix theory and financial correlations.
International Journal of Theoretical and Applied Finance, 3(03):391–397, 2000.
[4] Han Liu, Fang Han, Ming Yuan, John Lafferty, Larry Wasserman, et al.
High-dimensional semiparametric Gaussian copula graphical models.
The Annals of Statistics, 40(4):2293–2326, 2012.
[5] Burton G Malkiel and Eugene F Fama.
Efficient capital markets: A review of theory and empirical work.
The Journal of Finance, 25(2):383–417, 1970.
[6] Rosario N Mantegna.
Hierarchical structure in financial markets.
The European Physical Journal B-Condensed Matter and Complex Systems, 11(1):193–197, 1999.
[7] Gautier Marti, Sébastien Andler, Frank Nielsen, and Philippe Donnat.
Optimal transport vs. Fisher-Rao distance between copulas for clustering multivariate time series.
In Statistical Signal Processing Workshop (SSP), 2016 IEEE, pages 1–5. IEEE, 2016.
[8] Gautier Marti, Sébastien Andler, Frank Nielsen, and Philippe Donnat.
Exploring and measuring non-linear correlations: Copulas, lightspeed transportation and clustering.
In NIPS 2016 Time Series Workshop, pages 59–69, 2017.
[9] Gautier Marti, Sébastien Andler, Frank Nielsen, and Philippe Donnat.
Clustering financial time series: How long is enough?
2016.
[10] Gautier Marti, Philippe Donnat, Frank Nielsen, and Philippe Very.
HCMapper: An interactive visualization tool to compare partition-based flat clustering extracted from pairs of dendrograms.
arXiv preprint arXiv:1507.08137, 2015.
[11] Gautier Marti, Frank Nielsen, Mikolaj Bińkowski, and Philippe Donnat.
A review of two decades of correlations, hierarchies, networks and clustering in financial markets.
arXiv preprint arXiv:1703.00485, 2017.
[12] Gautier Marti, Frank Nielsen, and Philippe Donnat.
Optimal copula transport for clustering multivariate time series.
In Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on, pages 2379–2383. IEEE, 2016.
[13] Gautier Marti, Philippe Very, and Philippe Donnat.
Toward a generic representation of random variables for machine learning.
arXiv preprint arXiv:1506.00976, 2015.
[14] Gautier Marti, Philippe Very, Philippe Donnat, and Frank Nielsen.
A proposal of a methodological framework with experimental guidelines to investigate clustering stability on financial time series.
In 14th IEEE International Conference on Machine Learning and Applications, ICMLA 2015, Miami, FL, USA, December 9-11, 2015, pages 32–37, 2015.
[15] Marc Szafraniec, Gautier Marti, and Philippe Donnat.
Putting self-supervised token embedding on the tables.
arXiv preprint arXiv:1708.04120, 2017.
[16] Joel A Tropp.
An introduction to matrix concentration inequalities.
arXiv preprint arXiv:1501.01571, 2015.
Appendix - CRLB for correlation
Appendix - CRLB for correlation - Proof
Appendix - Fisher-Rao geodesic distance
Appendix - Optimal Transport distances
Other transportation distances: regularized discrete optimal
transport, Sinkhorn distances, etc.
Appendix - Geometry of covariances
Appendix - The standard methodology: Pearson + MST
Appendix - The Target/Forget Dependence Coefficient
Appendix - The Copula Transform
Appendix - The correlation + distribution distance
Appendix - Pearson correlation

Some contributions to the clustering of financial time series - Applications to credit default swaps

  • 1. Introduction to credit default swaps About the consistency of clustering financial time series Design of distances and alternative dependence coefficients Summary and open questions Some contributions to the clustering of financial time series and its applications to credit default swaps Gautier Marti November 10, 2017 Gautier Marti Some contributions to the clustering of financial time series
  • 2. Context of the PhD thesis. PhD studies in hedge funds: Hellebore Capital Management, 63 Avenue des Champs-Élysées, Paris, France (April 1, 2014 - February 29, 2016); Hellebore Capital Limited, 81 Fulham Road, London, United Kingdom (March 1, 2016 - September 20, 2017); AXA IM Chorus, 18 Westlands Road, Hong Kong (October 1, 2017 - present).
  • 3. Outline. 1. Introduction to the credit default swap dataset. Contributions: 2. About the consistency of clustering financial time series; 3. Improving standard distances between financial time series: i) a simple correlation + distribution distance, ii) a geometrical approach to define dependence coefficients from copulas. 4. Perspectives.
  • 4. Some motivation for using clustering. Statistical modelling is difficult because the time series: are non-stationary (e.g. economic regimes change, so it can be misleading to use data from too distant a past); are nearly efficient, i.e. behave nearly like random walks (cf. the efficient-market hypothesis (Fama, 1970) [5]); have a low signal-to-noise ratio, i.e. measurement artifacts hide information in random fluctuations; are in an unfavorable statistical setting, with too few relevant observations (length) with respect to the number of variables (time series).
  • 5. Some motivation for using clustering. Clustering helps to reduce dimensionality, and can thus be used as a preprocessing step for: risk management, e.g. filtering covariances, performance and risk attribution; investment, e.g. portfolio design, statistical arbitrage, beta neutralization; data analysis, e.g. outlier detection, missing-value imputation, exploration (www.datagrapple.com).
  • 6. Outline: 1. Introduction to credit default swaps; 2. About the consistency of clustering financial time series; 3. Design of distances and alternative dependence coefficients (alternative representation and correlation+distribution distance; copula-based dependence coefficients); 4. Summary and open questions.
  • 7. Introduction to the credit default swap
  • 8. Introduction to the CDS market
  • 9. Introduction to the CDS raw dataset. cf. Putting Self-Supervised Token Embedding on the Tables [15] (ICMLA 2017).
  • 10. A ‘tick-by-tick’ dataset. cf. Autoregressive Convolutional Neural Networks for Asynchronous Time Series [1] (ICML Time Series WS 2017).
  • 11. Historical daily time series of spread. From the received ‘tick-by-tick’ prices, a synthetic order book is built. At 5pm London time, we save the mid-price of the best bid and best offer in the order book for each entity. N ≈ 800 liquid CDSs, with T ≈ 3000.
  • 12. Outline (repeated).
  • 13. Clustering of financial time series. Stylized fact I: financial time series correlations have a strong hierarchical block-diagonal structure (Mantegna, 1999) [6]; see https://gmarti.gitlab.io/ml/2017/09/07/how-to-sort-distance-matrix.html. Stylized fact II: most correlations are spurious (Bouchaud, 1999) [3]. Motivation for clustering financial time series using correlation as a similarity measure: dimensionality reduction ≡ filtering noisy correlations.
  • 14. Challenge for the statistical practitioner. The dilemma: the longer the time interval, the more precise the correlation estimates, but also the more unrealistic the stationarity hypothesis for these time series. Question: how does the clustering behave under statistical errors in the correlation estimates?
  • 15. A first theoretical approach - simplified setting. We consider the following framework: financial time series ≡ random walks; they follow a joint elliptical distribution (e.g. Gaussian, Student) parameterized by a correlation matrix; the correlation matrix has a hierarchical block structure.
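As an illustration of what a hierarchical block correlation matrix may look like, here is a minimal sketch. The helper name `block_correlation`, the two nesting levels and all numerical values are assumptions made for this example; they are not taken from the slides, and the sketch does not check that the resulting matrix is positive semi-definite.

```python
def block_correlation(labels_by_level, rho_by_level, rho_background):
    """Build a hierarchical block correlation matrix (hypothetical helper).

    labels_by_level: cluster labels per level, ordered from coarse to fine;
                     finer clusters are assumed to be nested in coarser ones.
    rho_by_level:    within-cluster correlation at each level.
    rho_background:  correlation between series that share no cluster.
    """
    n = len(labels_by_level[0])
    corr = [[rho_background] * n for _ in range(n)]
    # Finer levels come last and overwrite the coarser correlation.
    for labels, rho in zip(labels_by_level, rho_by_level):
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    corr[i][j] = rho
    for i in range(n):
        corr[i][i] = 1.0
    return corr


# Illustrative values: 6 series, 2 coarse clusters, 3 fine clusters.
coarse = [0, 0, 0, 0, 1, 1]
fine = [0, 0, 1, 1, 2, 2]
corr = block_correlation([coarse, fine], [0.3, 0.7], rho_background=0.1)
```

In this toy matrix, two series in the same fine cluster are correlated at 0.7, two series sharing only the coarse cluster at 0.3, and all other pairs at 0.1.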
  • 16. Hierarchical clustering algorithms - A taxonomy. We consider hierarchical agglomerative clustering algorithms, such as single linkage, average linkage and Ward. They can be classified as space contracting, space conserving or space dilating [2]. Space contracting: $D^{(t+1)}\left(C_i^{(t)} \cup C_j^{(t)}, C_k^{(t)}\right) \le \min\left(D_{ik}^{(t)}, D_{jk}^{(t)}\right)$. Space conserving: $D^{(t+1)}\left(C_i^{(t)} \cup C_j^{(t)}, C_k^{(t)}\right) \in \left[\min\left(D_{ik}^{(t)}, D_{jk}^{(t)}\right), \max\left(D_{ik}^{(t)}, D_{jk}^{(t)}\right)\right]$. Space dilating: $D^{(t+1)}\left(C_i^{(t)} \cup C_j^{(t)}, C_k^{(t)}\right) \ge \max\left(D_{ik}^{(t)}, D_{jk}^{(t)}\right)$.
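The space-conserving property can be checked numerically on the textbook update rules for single, complete and average linkage. The sketch below is only a sanity check on one set of illustrative values (function names and numbers are assumptions for this example), not a proof.

```python
def single_linkage(d_ik, d_jk, n_i, n_j):
    # Distance from the merged cluster to C_k: the smaller previous distance.
    return min(d_ik, d_jk)


def complete_linkage(d_ik, d_jk, n_i, n_j):
    # Distance from the merged cluster to C_k: the larger previous distance.
    return max(d_ik, d_jk)


def average_linkage(d_ik, d_jk, n_i, n_j):
    # Size-weighted mean of the two previous distances.
    return (n_i * d_ik + n_j * d_jk) / (n_i + n_j)


def is_space_conserving(update, d_ik, d_jk, n_i, n_j, tol=1e-12):
    """True if the updated distance stays within [min, max] of the old ones."""
    d_new = update(d_ik, d_jk, n_i, n_j)
    return min(d_ik, d_jk) - tol <= d_new <= max(d_ik, d_jk) + tol


# Illustrative values: clusters of sizes 3 and 5 at distances 0.4 and 0.9 from C_k.
checks = {
    f.__name__: is_space_conserving(f, d_ik=0.4, d_jk=0.9, n_i=3, n_j=5)
    for f in (single_linkage, complete_linkage, average_linkage)
}
```

Single and complete linkage sit on the two boundaries of the interval, while average linkage falls strictly inside it whenever the two previous distances differ.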
  • 17. Simulations in the simplified setting. Some influential parameters: clustering algorithm; number of observations T; number of variables N relative to T; contrast between the correlations, and their values; correlation estimator (e.g. Pearson, Spearman). [Figures: empirical rates of convergence for Single Linkage, Average Linkage and Ward; x-axis: sample size (100 to 500); y-axis: score (0 to 1); curves: Gaussian - Pearson, Gaussian - Spearman, Student - Pearson, Student - Spearman.] Caption: ratio of the number of correct clusterings obtained over the number of trials, as a function of T.
  • 18. A consistency proof & first convergence bounds. A 2-step proof. First step: which geometrical configurations lead to the true clustering? For space-conserving algorithms (e.g. Single, Complete, Average Linkage), a sufficient separability condition reads $\max D_{\mathrm{intra}} := \max_{1 \le i,j \le N,\ C(i) = C(j)} d(X_i, X_j) < \min_{1 \le i,j \le N,\ C(i) \ne C(j)} d(X_i, X_j) =: \min D_{\mathrm{inter}}$.
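The separability condition is easy to test on a given distance matrix. The following sketch assumes the matrix is a symmetric list of lists and that the true cluster labels are known; the helper name `separability` and the toy values are illustrative assumptions.

```python
def separability(dist, labels):
    """Return (max intra-cluster distance, min inter-cluster distance, condition)."""
    n = len(labels)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    intra = [dist[i][j] for i, j in pairs if labels[i] == labels[j]]
    inter = [dist[i][j] for i, j in pairs if labels[i] != labels[j]]
    max_intra = max(intra, default=0.0)
    min_inter = min(inter, default=float("inf"))
    return max_intra, min_inter, max_intra < min_inter


# Toy example: two clusters of two points each.
labels = [0, 0, 1, 1]
dist = [
    [0.0, 0.2, 0.8, 0.9],
    [0.2, 0.0, 0.7, 0.8],
    [0.8, 0.7, 0.0, 0.3],
    [0.9, 0.8, 0.3, 0.0],
]
max_intra, min_inter, is_separable = separability(dist, labels)
```

Here the largest within-cluster distance (0.3) is below the smallest between-cluster distance (0.7), so the condition holds.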
  • 19. A consistency proof & first convergence bounds. A 2-step proof. Second step: how long does it take for the estimates of the correlation coefficients to be precise enough to be, with high probability, in a good configuration for the clustering algorithm? Answer: concentration inequalities for correlation coefficients.
  • 20. Convergence bounds. Combining both steps, we get the following convergence rate: the probability of the clustering algorithm making an error is $O(\log N / T)$.
  • 21. Proof. Step 1 - A bit more details. By induction. Let’s assume the separability condition is satisfied at step t; then $\min D^{(t)}_{\mathrm{intra}} \le \max D^{(t)}_{\mathrm{intra}} < \min D^{(t)}_{\mathrm{inter}} \le \max D^{(t)}_{\mathrm{inter}}$. From the space-conserving property, we get $D^{(t+1)}_{\mathrm{intra}} \in \left[\min D^{(t)}_{\mathrm{intra}}, \max D^{(t)}_{\mathrm{intra}}\right]$ and $D^{(t+1)}_{\mathrm{inter}} \in \left[\min D^{(t)}_{\mathrm{inter}}, \max D^{(t)}_{\mathrm{inter}}\right]$. Hence the separability condition is still satisfied at step t + 1: the clustering algorithm has not linked points from two different clusters between step t and step t + 1.
  • 22. Proof. Step 2 - A bit more details. Maximum statistical error (Marti and Andler, IJCAI 2016): for space-conserving algorithms, the separability condition is met if $\|\hat{\Sigma} - \Sigma\|_\infty < \frac{\min_{\rho_i, \rho_j} |\rho_i - \rho_j|}{2}$, where $C(i) \ne C(j)$. This means that the statistical error has to be below the minimum correlation ‘contrast’ between the clusters. The weaker the ‘contrast’, the more precise the correlation estimates have to be. N.B. From the Cramér–Rao lower bound, we get for the Pearson correlation estimator: $\mathrm{var}(\hat{\rho}) \ge \frac{1}{I(\rho)} = \frac{(\rho - 1)^2 (\rho + 1)^2}{3(\rho^2 + 1)}$. When the correlation is high, it is easier to estimate. (Marti, SSP 2016)
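To see why a high correlation is easier to estimate, one can evaluate the right-hand side of the Cramér–Rao expression quoted on the slide for a few values of ρ. The sketch below only evaluates that expression as written; the function name `crlb_pearson` and the chosen values of ρ are assumptions for this example.

```python
def crlb_pearson(rho):
    """Right-hand side of the lower bound as written on the slide."""
    return (rho - 1) ** 2 * (rho + 1) ** 2 / (3 * (rho ** 2 + 1))


# The bound shrinks as |rho| grows, and vanishes when |rho| = 1.
bounds = {rho: crlb_pearson(rho) for rho in (0.0, 0.5, 0.9)}
```

The expression is even in ρ, so strongly negative correlations are as easy to estimate as strongly positive ones under this bound.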
  • 23. Correlation estimates concentration bounds. Notation: number of variables N, number of observations T, minimum separation d. Concentration bounds [4]: if $\Sigma$ and $\hat{\Sigma}$ are the population and empirical Spearman correlation matrices respectively, then for $N \ge \frac{24}{\log T} + 2$, we have with probability at least $1 - \frac{1}{T^2}$, $\|\hat{\Sigma} - \Sigma\|_\infty \le 24 \sqrt{\frac{\log N}{T}}$. As a consequence, $P(\text{“correct clustering”}) \ge 1 - 2N^2 e^{-Td^2/24}$. This is not sharp enough for reasonable values of N, T, d. For example, for N = 500, T = 2500, d = 0.2, we obtain a lower bound of ≈ −7750, which is vacuous for a probability.
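The numerical example on this slide can be reproduced directly from the bound 1 − 2N² e^(−Td²/24). The sketch below also inverts the bound to find how many observations would be needed for it to become informative; the function names and the target level δ = 0.05 are assumptions made for this illustration.

```python
import math


def correct_clustering_lower_bound(n_series, n_obs, min_sep):
    """Evaluate 1 - 2 N^2 exp(-T d^2 / 24)."""
    return 1.0 - 2.0 * n_series ** 2 * math.exp(-n_obs * min_sep ** 2 / 24.0)


def observations_needed(n_series, min_sep, delta):
    """Smallest integer T for which the bound reaches 1 - delta.

    Obtained by solving 2 N^2 exp(-T d^2 / 24) <= delta for T.
    """
    return math.ceil(24.0 / min_sep ** 2 * math.log(2.0 * n_series ** 2 / delta))


# Values from the slide: N = 500, T = 2500, d = 0.2 (bound of roughly -7750).
bound = correct_clustering_lower_bound(500, 2500, 0.2)

# Hypothetical target: bound of at least 0.95 for the same N and d.
t_needed = observations_needed(500, 0.2, delta=0.05)
```

With these inputs the required T comes out at several thousand observations more than the T = 2500 of the example, which is one way to read the remark that the bound is not sharp enough.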
  • 24. Future developments & open questions. Bounds are not sharp enough. We can try to refine them using: (theoretical) the intrinsic dimension of the HCBM model [16]; (theoretical) PSD-ness to refine the bounds for the matrix; (theoretical/empirical) a distance between dendrograms (instead of correct/incorrect) for a finer analysis; (empirical) a study of ‘correctness’ isoquants. Precise convergence rates of clustering methodologies can provide a useful model selection criterion for practitioners!
  • 25. Outline (repeated).
  • 26. Motivations. Not only correlation! Different kinds of return distributions may exist in the data. We may want to refine ‘correlation’ clusters with this information.
  • 27. A (too) naive distance and its pitfalls. Applying the L2 distance directly to the time series mixes correlation and volatility. We are looking for a better representation in which an L2 distance is meaningful.
  • 28. A (too) naive distance and its pitfalls
  • 29. Revisiting Sklar's theorem (1959)...
  • 30. ... and Deheuvels' empirical copulas (1981)
  • 31. A novel representation for time series. Basically, we transform each time series of returns into a (normalized ranks, square root of marginal density) vector. Applying an L2 distance between two of these vectors is then equivalent to combining a distance in Spearman correlation with a Hellinger distance between the densities. cf. (Marti et al., 2016) [13] (Pattern Recognition Letters) for more details on this representation and the associated distance.
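A rough sketch of this kind of representation is given below. It is not the exact definition from [13]: the mixing weight `theta`, the histogram density estimate, the bin range and all function names are assumptions made for this illustration, and ties in the data are assumed absent.

```python
import math


def normalized_ranks(x):
    """Ranks scaled to (0, 1]; ties are assumed absent in this sketch."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    for r, i in enumerate(order, start=1):
        ranks[i] = r / len(x)
    return ranks


def sqrt_density(x, bins, lo, hi):
    """Square root of a histogram estimate of the marginal probabilities."""
    counts = [0] * bins
    for v in x:
        k = int((v - lo) / (hi - lo) * bins)
        counts[min(max(k, 0), bins - 1)] += 1  # clip values outside [lo, hi)
    return [math.sqrt(c / len(x)) for c in counts]


def squared_distance(x, y, theta=0.5, bins=10, lo=-1.0, hi=1.0):
    """Weighted sum of a rank-based part and a distribution part."""
    u, v = normalized_ranks(x), normalized_ranks(y)
    # Dependence part: for large T this is close to (1 - Spearman rho) / 2.
    d_dep = 3.0 * sum((a - b) ** 2 for a, b in zip(u, v)) / len(x)
    p, q = sqrt_density(x, bins, lo, hi), sqrt_density(y, bins, lo, hi)
    # Distribution part: squared Hellinger distance between the histograms.
    d_dist = 0.5 * sum((a - b) ** 2 for a, b in zip(p, q))
    return theta * d_dep + (1.0 - theta) * d_dist


x = [-0.5, -0.2, 0.1, 0.3, 0.6]
y = [0.5 * value for value in x]          # same ranks, different distribution
same_ranks = squared_distance(x, y, theta=1.0)
different_marginals = squared_distance(x, y, theta=0.0)
```

With `theta=1.0` only the dependence part is used, so rescaling a series leaves the distance at zero; with `theta=0.0` only the marginal distributions matter, which is the information a pure correlation distance ignores.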
  • 32. Analysing the differences - Using the Sankey diagram. cf. (Marti et al., 2015) [14] (ICMLA 2015) for guidelines on how to compare several clustering methodologies for financial time series.
  • 33. Exploring and improving dependence coefficients – Motivations. We want to use dependence measures which: are robust to noise (not Pearson, then) and preserve as much information as possible, so that clusterings are more stable; can be tuned to look for specific dependencies, e.g. tail dependence or more exotic ones. As copulas are a convenient way to capture all the dependence between two variables, we aim at leveraging them.
  • 34. Minimum, Independence, Maximum copulas
  • 35. Relation to existing dependence coefficients. Some dependence coefficients can be readily expressed as: a deviation from the Fréchet-Hoeffding bounds: Spearman’s $\rho_S = 1 - 6 \int_{[0,1]^2} (u_i - u_j)^2 \, dC(u_i, u_j)$, Gini’s γ, Kendall distribution distance; a deviation from independence $u_i u_j$: Spearman’s $\rho_S = 12 \int_{[0,1]^2} \left(C(u_i, u_j) - u_i u_j\right) du_i \, du_j$, Copula MMD, Schweizer-Wolff’s σ, Hoeffding’s $\phi^2$.
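The first expression of Spearman's coefficient, 1 − 6 ∫ (u_i − u_j)² dC, has a simple empirical counterpart: replace the copula by the normalized ranks of the sample. The sketch below compares this plug-in estimate with the classical rank-difference formula; function names are assumptions for this example and ties are assumed absent.

```python
def ranks(x):
    """Integer ranks 1..T (no ties assumed)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0] * len(x)
    for k, i in enumerate(order, start=1):
        r[i] = k
    return r


def spearman_rank_formula(x, y):
    """Classical no-ties formula: 1 - 6 * sum(d^2) / (T * (T^2 - 1))."""
    t = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d2 / (t * (t * t - 1))


def spearman_copula_plugin(x, y):
    """Plug-in version of 1 - 6 * integral of (u - v)^2 dC(u, v)."""
    t = len(x)
    u = [r / t for r in ranks(x)]
    v = [r / t for r in ranks(y)]
    return 1.0 - 6.0 * sum((a - b) ** 2 for a, b in zip(u, v)) / t


# Perfectly decreasing relationship between the two samples.
x = list(range(100))
y = list(reversed(x))
exact = spearman_rank_formula(x, y)
plugin = spearman_copula_plugin(x, y)
```

The two values differ only by a finite-sample factor (T² − 1)/T² applied to the deviation from 1, so they agree closely once T is in the hundreds.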
  • 36. Idea: relative position of the empirical copula with respect to ‘targets’ and ‘forgets’. In a classical setting, we choose the positive and negative dependence copulas as ‘targets’, and the independence copula as a ‘forget’ dependence.
  • 37. Optimal Transport between empirical copulas. cf. (Marti et al., 2016) [12] (ICASSP 2016)
  • 38. Why choose Optimal Transport over f-divergences? Distances between Gaussian copulas C1, C2, C3: cf. (Marti et al., 2016) [7] (SSP 2016)
  • 39. Standard setting: TFDC (Target/Forget Dependence Coefficient) vs. Spearman. cf. (Marti et al., 2017) [8] (NIPS Time Series WS 2016)
  • 40. Power of TFDC and state-of-the-art dependence measures
  • 41. Some applications of the Target/Forget Dependence Coefficient. Applications in non-standard settings: we can look for particular associations between random variables. cf. (Marti et al., 2017) [8] (NIPS Time Series 2016)
  • 42. Impact of different coefficients on clustering. Different results... Stability and empirical convergence rates may help in choosing one over the others.
  • 43. Outline (repeated).
  • 44. Summary of contributions. The contributions of the PhD thesis: a greater focus on statistical reliability (convergence rates and consistency) [9] (IJCAI 2016); alternative representations and distances [13] (Pattern Recognition Letters), [8] (NIPS Time Series 2016); visualizations [10] and a framework to test for clustering stability [14] (ICMLA 2015); an extensive and regularly updated survey of the literature: https://arxiv.org/pdf/1703.00485.pdf [11] (350+ references).
  • 45. Perspectives and open questions. “How many clusters?” What is multivariate correlation? How to use it for hierarchical clustering? Using several time series representing a given entity, and dependence between random vectors? Riemannian geometry of correlation matrices (not a totally geodesic submanifold of the well-explored manifold of covariances). Entities switching clusters: noise or signal? More precise results for (empirical) convergence rates?
  • 46. References. [1] Mikolaj Binkowski, Gautier Marti, and Philippe Donnat. Autoregressive convolutional neural networks for asynchronous time series. arXiv preprint arXiv:1703.04122, 2017. [2] Zhenmin Chen and John W Van Ness. Space-conserving agglomerative algorithms. Journal of Classification, 13(1):157–168, 1996. [3] Laurent Laloux, Pierre Cizeau, Marc Potters, and Jean-Philippe Bouchaud. Random matrix theory and financial correlations. International Journal of Theoretical and Applied Finance, 3(03):391–397, 2000. [4] Han Liu, Fang Han, Ming Yuan, John Lafferty, Larry Wasserman, et al. High-dimensional semiparametric Gaussian copula graphical models. The Annals of Statistics, 40(4):2293–2326, 2012.
  • 47. [5] Burton G Malkiel and Eugene F Fama. Efficient capital markets: A review of theory and empirical work. The Journal of Finance, 25(2):383–417, 1970. [6] Rosario N Mantegna. Hierarchical structure in financial markets. The European Physical Journal B - Condensed Matter and Complex Systems, 11(1):193–197, 1999. [7] Gautier Marti, Sébastien Andler, Frank Nielsen, and Philippe Donnat. Optimal transport vs. Fisher-Rao distance between copulas for clustering multivariate time series. In Statistical Signal Processing Workshop (SSP), 2016 IEEE, pages 1–5. IEEE, 2016.
  • 48. [8] Gautier Marti, Sébastien Andler, Frank Nielsen, and Philippe Donnat. Exploring and measuring non-linear correlations: Copulas, lightspeed transportation and clustering. In NIPS 2016 Time Series Workshop, pages 59–69, 2017. [9] Gautier Marti, Sébastien Andler, Frank Nielsen, and Philippe Donnat. Clustering financial time series: How long is enough? 2016. [10] Gautier Marti, Philippe Donnat, Frank Nielsen, and Philippe Very. HCMapper: An interactive visualization tool to compare partition-based flat clustering extracted from pairs of dendrograms. arXiv preprint arXiv:1507.08137, 2015.
  • 49. [11] Gautier Marti, Frank Nielsen, Mikolaj Bińkowski, and Philippe Donnat. A review of two decades of correlations, hierarchies, networks and clustering in financial markets. arXiv preprint arXiv:1703.00485, 2017. [12] Gautier Marti, Frank Nielsen, and Philippe Donnat. Optimal copula transport for clustering multivariate time series. In Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on, pages 2379–2383. IEEE, 2016.
  • 50. [13] Gautier Marti, Philippe Very, and Philippe Donnat. Toward a generic representation of random variables for machine learning. arXiv preprint arXiv:1506.00976, 2015. [14] Gautier Marti, Philippe Very, Philippe Donnat, and Frank Nielsen. A proposal of a methodological framework with experimental guidelines to investigate clustering stability on financial time series. In 14th IEEE International Conference on Machine Learning and Applications, ICMLA 2015, Miami, FL, USA, December 9-11, 2015, pages 32–37, 2015. [15] Marc Szafraniec, Gautier Marti, and Philippe Donnat. Putting self-supervised token embedding on the tables. arXiv preprint arXiv:1708.04120, 2017.
  • 51. [16] Joel A Tropp. An introduction to matrix concentration inequalities. arXiv preprint arXiv:1501.01571, 2015.
  • 52. Appendix - CRLB for correlation
  • 53. Appendix - CRLB for correlation - Proof
  • 54. Appendix - Fisher-Rao geodesic distance
  • 55. Appendix - Optimal Transport distances Other transportation distances: regularized discrete optimal transport, Sinkhorn distances, etc.
  • 56. Appendix - Geometry of covariances
  • 57. Appendix - The standard methodology: Pearson + MST
  • 58. Appendix - The Target/Forget Dependence Coefficient
  • 59. Appendix - The Copula Transform
  • 60. Appendix - The correlation + distribution distance
  • 61. Appendix - The correlation + distribution distance
  • 62. Appendix - Pearson correlation
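
The appendix slides on "Pearson + MST" and "Pearson correlation" survive only as titles in this transcript. The sketch below is not taken from the slides. It illustrates one common form of that methodology in the literature, under stated assumptions: the distance is assumed to be d = sqrt(2(1 - rho)), the tree is built with Prim's algorithm, and the function names and toy data are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def correlation_distance(rho):
    """Assumed mapping from a correlation in [-1, 1] to a distance in [0, 2]."""
    # max(...) guards against tiny negative values caused by rounding.
    return math.sqrt(max(0.0, 2.0 * (1.0 - rho)))


def minimum_spanning_tree(dist):
    """Prim's algorithm on a dense symmetric distance matrix.

    Returns a list of (i, j, distance) edges; n nodes give n - 1 edges.
    """
    n = len(dist)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Cheapest edge joining a node inside the tree to a node outside it.
        best = min(
            ((i, j, dist[i][j])
             for i in in_tree
             for j in range(n) if j not in in_tree),
            key=lambda edge: edge[2],
        )
        edges.append(best)
        in_tree.add(best[1])
    return edges


def pearson_mst(returns):
    """Build the correlation-distance matrix, then extract its MST."""
    n = len(returns)
    dist = [
        [0.0 if i == j
         else correlation_distance(pearson(returns[i], returns[j]))
         for j in range(n)]
        for i in range(n)
    ]
    return minimum_spanning_tree(dist)


# Hypothetical toy data: two pairs of series, each pair driven by a shared factor.
rng = random.Random(0)
factor_a = [rng.gauss(0, 1) for _ in range(250)]
factor_b = [rng.gauss(0, 1) for _ in range(250)]
returns = [
    [f + 0.3 * rng.gauss(0, 1) for f in factor_a],
    [f + 0.3 * rng.gauss(0, 1) for f in factor_a],
    [f + 0.3 * rng.gauss(0, 1) for f in factor_b],
    [f + 0.3 * rng.gauss(0, 1) for f in factor_b],
]
tree = pearson_mst(returns)
for i, j, d in tree:
    print(f"{i} - {j}: distance {d:.3f}")
```

On the toy data, the two series sharing a factor end up directly linked in the tree, and a single longer edge connects the two groups. Prim's algorithm is used here only for brevity; any minimum spanning tree algorithm would serve.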