... two decades of correlation, hierarchies, networks and clustering in financial markets
Summary of some of my past research work at Complex Networks 2022.
The study of correlations, hierarchies, networks and communities (or clustering) has more than 20 years of history in econophysics.
However, for the practitioner, these tools do not seem fully ready yet:
many questions about their proper use for trading or risk monitoring remain unanswered.
Deep learning might help solve some hard problems, such as finding communities (or clusters) and their number more reliably.
Running large simulations (based on GANs, VAEs or realistic market simulators) could also help understand when complex network methods can give wrong insights (e.g. not enough data, data that is not stationary enough, or correlations that are too low).
Conference: Complex Networks 2022 in Palermo, Sicily, Italy.
1. Gautier Marti, COMPLEX NETWORKS 2022
What deep learning can bring to
two decades of correlation, hierarchies, networks and
clustering in financial markets
3. Quant trader concerns
Problems poorly addressed by the literature
• Which datasets are relevant to build financial networks between companies, to predict what?
• We cannot use future data, i.e. we must use a rolling or expanding window: How long is enough?
• (Too) many clustering and network methods available: Which one should we use, and why?
• Very expensive, IP-protected, not very suitable for academic research; Explains the focus on stock returns...
• Many studies are full sample without out-of-sample validation: Prediction is not the focus.
• No well-defined benchmarks: It makes it hard to compare methods.
5. How much data is necessary?
One possible criterion to choose amongst methods
• Deep Learning for simulations, and finding 'laws' in large amounts of data.
The Hierarchical Correlation Block Model (HCBM)
is a convenient assumption to do some math
(matrix concentration inequalities) but it is
challenging to obtain practical results.
Within this model, simulations help to choose the best method:
Ward + Spearman correlation with at least 200 days of past returns.
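The kind of simulation described above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses a flat, one-level block correlation model with Gaussian returns (a simplification, not the full hierarchical HCBM), measures accuracy with the adjusted Rand index, and all names and parameter values (`mean_accuracy`, `rho_in`, `rho_out`, 40 assets, 4 clusters) are hypothetical choices made for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from scipy.stats import rankdata


def block_correlation(n_assets, n_clusters, rho_in, rho_out):
    """Flat block correlation matrix (assumed model) and true cluster labels."""
    labels = np.arange(n_assets) % n_clusters
    same = labels[:, None] == labels[None, :]
    corr = np.where(same, rho_in, rho_out).astype(float)
    np.fill_diagonal(corr, 1.0)
    return corr, labels


def spearman_corr(returns):
    """Spearman correlation: Pearson correlation of the column-wise ranks."""
    return np.corrcoef(rankdata(returns, axis=0), rowvar=False)


def ward_clusters(corr, n_clusters):
    """Ward linkage on the distance sqrt(2 (1 - rho)), cut into n_clusters."""
    dist = np.sqrt(np.clip(2.0 * (1.0 - corr), 0.0, None))
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method="ward")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two labelings (1.0 = identical partitions)."""
    _, ai = np.unique(np.asarray(a), return_inverse=True)
    _, bi = np.unique(np.asarray(b), return_inverse=True)
    cont = np.zeros((ai.max() + 1, bi.max() + 1), dtype=np.int64)
    np.add.at(cont, (ai, bi), 1)                      # contingency table
    pairs = lambda x: x * (x - 1) / 2.0               # number of pairs
    sum_ij = pairs(cont).sum()
    sum_a, sum_b = pairs(cont.sum(axis=1)).sum(), pairs(cont.sum(axis=0)).sum()
    expected = sum_a * sum_b / pairs(len(ai))
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:
        return 1.0
    return float((sum_ij - expected) / (max_index - expected))


def mean_accuracy(length, n_trials=5, n_assets=40, n_clusters=4,
                  rho_in=0.7, rho_out=0.2, seed=0):
    """Average clustering accuracy over simulated samples of `length` days."""
    rng = np.random.default_rng(seed)
    corr, truth = block_correlation(n_assets, n_clusters, rho_in, rho_out)
    chol = np.linalg.cholesky(corr)
    scores = []
    for _ in range(n_trials):
        returns = rng.standard_normal((length, n_assets)) @ chol.T
        found = ward_clusters(spearman_corr(returns), n_clusters)
        scores.append(adjusted_rand_index(truth, found))
    return float(np.mean(scores))


if __name__ == "__main__":
    for length in (20, 50, 100, 200, 500):
        print(length, round(mean_accuracy(length), 3))
```

Swapping `spearman_corr` or the `method="ward"` argument for alternatives and comparing the resulting curves is one way to rank methods within the model; the numbers obtained depend entirely on the assumed correlation levels and should not be read as a reproduction of the 200-day figure above.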
6. Many challenges to overcome...
before implementing the 'simulator'
• The simulator module:
  • Financial time series simulators:
    Generative Adversarial Networks for Financial Trading Strategies Fine-Tuning and Combination (2019)
  • Financial correlations simulator:
    CorrGAN: Sampling Realistic Financial Correlation Matrices Using Generative Adversarial Networks (2019)
  • Both at the same time?
    => It does not exist yet (TTBOMK)
7. From simulations...
to supervised learning of clustering accuracy
• For a given fuzzy HCBM model, one can collect
  X := noisy estimates (empirical correlation matrices from the simulated time series of length T),
  y := clustering accuracy with respect to the model.
• How can we go from (empirical correlation matrix, T) to an expected clustering accuracy?
  => supervised learning.
What is a relevant feature space to describe empirical correlation matrices?
For example:
- correlation coefficients summary statistics
- percentage of variance explained by the k first eigenvalues
- first eigenvector summary statistics
- minimum spanning tree statistics (centrality, average shortest path length)
- cophenetic correlation coefficient
- condition number
- ...
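A minimal sketch of how such a feature vector could be computed from one empirical correlation matrix follows. It is an assumption-laden illustration rather than the feature set used in the talk: the function name `correlation_features`, the dictionary keys, the distance `sqrt(2 (1 - rho))`, the use of maximum degree as the centrality statistic, and Ward linkage for the cophenetic coefficient are all choices made here for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.sparse.csgraph import minimum_spanning_tree, shortest_path
from scipy.spatial.distance import squareform


def correlation_features(corr, n_top=3):
    """Hand-crafted descriptors of one empirical correlation matrix."""
    corr = np.asarray(corr, dtype=float)
    n = corr.shape[0]
    upper = np.triu_indices(n, k=1)
    coefs = corr[upper]                              # off-diagonal coefficients

    eigvals, eigvecs = np.linalg.eigh(corr)          # eigenvalues in ascending order
    first_vec = eigvecs[:, -1]
    first_vec = first_vec * np.sign(first_vec.sum() or 1.0)  # remove sign ambiguity

    # Distance shared by the spanning tree and the hierarchy (assumed choice).
    dist = np.sqrt(np.clip(2.0 * (1.0 - corr), 0.0, None))
    np.fill_diagonal(dist, 0.0)
    condensed = squareform(dist, checks=False)

    # Minimum spanning tree, symmetrised so it can be read as an undirected graph.
    # Note: zero distances (perfectly correlated pairs) are treated as "no edge".
    mst = minimum_spanning_tree(dist).toarray()
    mst = mst + mst.T
    degrees = (mst > 0).sum(axis=1)
    paths = shortest_path(mst, directed=False)       # weighted path lengths on the tree

    coph_corr, _ = cophenet(linkage(condensed, method="ward"), condensed)

    return {
        "coef_mean": float(coefs.mean()),
        "coef_std": float(coefs.std()),
        "coef_min": float(coefs.min()),
        "coef_max": float(coefs.max()),
        "var_explained_top_k": float(eigvals[-n_top:].sum() / eigvals.sum()),
        "eigvec1_mean": float(first_vec.mean()),
        "eigvec1_std": float(first_vec.std()),
        "mst_max_degree": int(degrees.max()),
        "mst_avg_path_length": float(paths[upper].mean()),
        "cophenetic_corr": float(coph_corr),
        "condition_number": float(np.linalg.cond(corr)),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = rng.standard_normal((250, 8))           # placeholder data, not market returns
    for name, value in correlation_features(np.corrcoef(sample, rowvar=False)).items():
        print(f"{name}: {value:.4f}")
```

Each dictionary can serve as one row of X in the supervised setting above, with the window length T appended as an extra column.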
A poor choice of a somewhat arbitrary feature space may bias learning and results...
Deep learning provides an end-to-end approach from
raw empirical matrices to target variables (clustering accuracy).
- CNN (seeing the correlation matrix as an image)
- GNN/GCN (the correlation matrix as a network)
We plan to investigate using convolutional and graph neural networks,
and compare predictive results with standard machine learning approaches.
https://marti.ai/qfin/2020/08/17/empirical-matrices-portfolio-comparisons.html
8. Application to clustering...
for quants
• One can use the predictive model to determine the smallest possible window in order to get a valid clustering, given what the empirical correlation matrices look like.
• It should be useful for:
  • statistical arbitrage
  • risk factors and risk models
  • portfolio allocation methods (HRP, HCAA, HERC)
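The window selection described above could look roughly like the sketch below. It assumes a trained model is available as a callable `predict_accuracy(corr, window)`; that interface, the function name `smallest_valid_window`, and the `toy_predictor` stand-in are hypothetical and exist only to make the example runnable.

```python
import numpy as np


def smallest_valid_window(returns, predict_accuracy, windows, threshold=0.9):
    """Return the shortest candidate window whose predicted clustering
    accuracy reaches `threshold`, or None if no candidate qualifies."""
    returns = np.asarray(returns, dtype=float)
    for window in sorted(windows):
        if window > len(returns):
            break                                    # not enough history for this window
        # Empirical correlation matrix on the most recent `window` observations.
        corr = np.corrcoef(returns[-window:], rowvar=False)
        if predict_accuracy(corr, window) >= threshold:
            return window
    return None


def toy_predictor(corr, window):
    """Placeholder for a trained model (NOT a real accuracy estimate):
    the score simply grows as the window outgrows the number of assets."""
    return max(0.0, 1.0 - corr.shape[0] / window)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    history = rng.standard_normal((500, 10))         # placeholder data, not market returns
    print(smallest_valid_window(history, toy_predictor, [50, 100, 200, 400], 0.85))
```

In practice `toy_predictor` would be replaced by the supervised model trained on simulated (correlation matrix, T) pairs, and the threshold would be set by how much clustering error the downstream use can tolerate.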
Clustering of global CDS based on Hellebore Capital's proprietary data
10. Number of clusters, hierarchies
and their automatic detection
• Automated detection of:
  • flat clustering
  • hierarchical clustering
  • together with the relevant number of clusters or hierarchical levels.
• Not all clusters found by standard methods are true clusters! Filtering criteria are ad hoc and not stable for trading/risk systems.
• A task similar to Object Detection and Recognition with Deep Learning in Computer Vision
11. New open PiT datasets
for empirical financial networks research
• Networks from text instead of correlation of stock returns
• Use of novel large language models easily available from Hugging Face to build networks of similar products & services companies (cf. Hoberg and Phillips Text Based Industry Classifications for early work using crude NLP techniques)
Illustrations from
Text-Based Representations of Market Structures, Gerard Hoberg
12. Why clustering at all?
end-to-end deep learning
• An end-to-end approach with a particular downstream task in mind can, maybe, recover the 'optimal' clustering, which is then used implicitly...
• Is it better than relying on expert knowledge to find a good combination of relevant distance, clustering algorithm, hyper-parameters, sufficient rolling window, and post-processing of the signals based on the clusters obtained?