SlideShare a Scribd company logo
1 of 36
Center for Financial Studies at the Goethe University
PhD Mini-course
Frankfurt, 25 January 2013




Financial Networks

V. Inferring Links




                                                        Dr. Kimmo Soramäki
                                                        Founder and CEO
                                                        FNA, www.fna.fi
I. Mapping        II. Mapping
Systemic Risk   Financial Markets




                                    2
Observing vs inferring

• Observing links
  – Exposures, payment flow, trade, co-ownership, joint
    board membership, etc.
  – Cause of link is known


• Inferring links
  – Observing the effects and inferring a relationship
  – Cause of link is unknown
  – Time series on asset prices, trade volumes, balance
    sheet items

                                                          3
Agenda

V. Inferring Links
•   Prices and Returns
•   Controlling for common factors
•   Correlation and dependence
•   Significant correlations
•   Multiple Comparisons


VI. Correlation Networks
•   Distance and Hierarchical Clustering
•   Minimum Spanning Tree & PMFG
•   Filtering
•   Visual Layouts
                                           4
Prices and Returns




                     5
Prices vs Returns
• Benefit of using returns vs prices for correlations:

•   Normalization. All variables are denoted in same units
•   Required by some matrix algebra
•   Returns (on asset prices) have low serial correlation
•   Returns are what matter once investement is made

• We'll use returns




                                                             6
Log vs Arithmetics Returns
• Most often logarithmic returns are used in Finance. Why?

• Benefit of using log returns vs arithmetic returns
   – Assumptions in statistical estimates (log-normal returns)
   – Easier calculations (integration, compounding) lead to smaller
     algorithmic complexity

• Normaliy assumptions make log returns better

• We'll use log returns




                                                                      7
Data Issues
•   Which prices
     – daily (low, high, close)
     – time-zone issues
     – high-frequency


•   Missing prices
     –   Errors in source data
     –   Partial holidays
     –   Weekends/holidays
     –   Exit and entry of assets


•   Handling options
     –   Look up from other sources
     –   Edit values, e.g. replace with same as previous (prices) or 0 (returns)
     –   Exclude days/series from analysis
     –   Decide on weekend returns depending on data
                                                                                   8
Ticker   Name                     Sector                         Weight


DAX stocks
             ADS      Adidas                   clothing and footwear               2.0
             ALV      Allianz                  insurance                           6.7
             BAS      BASF                     speciality chemicals                9.6
             BAYN     Bayer                    speciality chemicals                7.6
             BEI      Beiersdorf               personal products                   0.9
             BMW      BMW                      automobile manufacturers            3.3
             CBK      Commerzbank              credit banks                        1.0
             CON      Continental              car parts manufacturers             0.8
             DAI      Daimler                  automobile manufacturers            5.8
             DB1      Deutsche Börse           securities brokers                  1.5
             DBK      Deutsche Bank            credit banks                        5.1
             DPW      Deutsche Post            logistics                           1.9
             DTE      Deutsche Telekom         fixed-line telecommunication        5.4
             EOAN     E.ON                     multi-utilities                     6.2
             FME      Fresenius Medical Care   health care                         2.2
             FRE      Fresenius                health care                         1.6
             HEI      HeidelbergCement         building materials                  0.9
             HEN3     Henkel                   personal products                   1.5
             IFX      Infineon Technologies    semiconductors                      1.3
             LHA      Deutsche Lufthansa       airlines                            0.8
             LIN      Linde                    industrial gases                    3.8
             LXS      Lanxess                  specialty chemicals                 0.7
             MRK      Merck                    pharmaceuticals                     1.0
             MUV2     Munich Re                re-insurance                        2.9
             RWE      RWE                      multi-utilities                     2.2
             SAP      SAP                      software                            7.7
             SDF      K+S                      commodity chemicals                 1.2
             SIE      Siemens                  diversified industrials            10.0
             TKA      ThyssenKrupp             diversified industrials             1.3
             VOW3     Volkswagen Group         automobile manufacturers            3.4

                                                                                         9
Data: Prices of 30 stocks in DAX in 2011


          date         ADS           ALV           BAYN       BAS           ...
          2011-12-30         50.26         73.91       49.4         53.89
          2011-12-29         49.94         73.21      48.73         53.17
          2011-12-28         49.79         73.15      47.36         52.39
          2011-12-27          50.3         75.45      48.27         53.17
          2011-12-23         50.29         75.97      48.13         52.83
          2011-12-22         49.98         75.81      47.92         52.63
          2011-12-21         49.43         75.37      47.04          52.4
          2011-12-20         49.56         75.43      46.99          52.9
          2011-12-19         47.96         72.23      44.75         51.15
          2011-12-16          48.1         72.77      45.04         51.71
          2011-12-15         47.89         73.47      44.99         51.13
          2011-12-14            48         71.53       44.8         50.58
          2011-12-13         48.45         73.04      45.73          51.1

           ...

                                                                                  10
calculatereturns -command
•   File (-file) : File with price data. The file must contain a table where each
    row has the observation date as its first element, and asset prices as
    subsequent elements separated by a field delimiter. The first line read must
    contain names for each asset column (except the first column which
    contains the date). Mandatory.
•   Save As (-saveas) : File where result should be saved. Mandatory.
•   Date format (-dateformat) : Format of date in input file. Default value yyyy-
    MM-dd. Optional.
•   Method of calculation (-method) : Arithmetic (arithmetic) or logarithmic
    (log) returns. Mandatory. By default 'log'.
•   Number of observations (-obs) : Larger than or equal to 2. Optional. By
    default '2'.
•   Length of interval (-interval) : Larger than or equal to 1. Optional. By
    default '1'.
•   Date order (-dateorder) : Date order in the file. Optional. Allowed values:
    [asc, desc]. By default 'asc'.

                                                                                11
Examples
# Calculate daily log returns
calculatereturns -file daxprices-2011.csv -saveas daxreturns-2011.csv -method
log -dateorder desc

# Calculate two-day log returns
calculatereturns -file daxprices-2011.csv -saveas daxreturnsl-2dl2.csv -method
log -obs 2 -interval 2 -dateorder desc

# Calculate two-day log returns with a one-day window
calculatereturns -file daxprices-2011.csv -saveas daxreturnsl-2dl1.csv -method
log -obs 2 -interval 1 -dateorder desc

# Calculate daily arithmetic returns
calculatereturns -file daxprices-2011.csv -saveas daxreturns-2da1.acsv -method
arithmetic -dateorder desc
                                                                                 12
Data: Returns of 30 stocks in DAX in 2011


          date         ADS.DE ALV.DE BAYN.DE BAS.DE      ...
          2011-12-29      0.0064  0.0095  0.0137  0.0135
          2011-12-28      0.0030  0.0008  0.0285  0.0148
          2011-12-27     -0.0102 -0.0310 -0.0190 -0.0148
          2011-12-23      0.0002 -0.0069  0.0029  0.0064
          2011-12-22      0.0062  0.0021  0.0044  0.0038
          2011-12-21      0.0111  0.0058  0.0185  0.0044
          2011-12-20     -0.0026 -0.0008  0.0011 -0.0095
          2011-12-19      0.0328  0.0433  0.0488  0.0336
          2011-12-16     -0.0029 -0.0074 -0.0065 -0.0109
          2011-12-15      0.0044 -0.0096  0.0011  0.0113
          2011-12-14     -0.0023  0.0268  0.0042  0.0108
          2011-12-13     -0.0093 -0.0209 -0.0205 -0.0102
          2011-12-12      0.0058 -0.0163 -0.0020 -0.0099

           ...

                                                               13
Common Factors
• A common factor may be affecting all the returns
   – E.g bullish/bearish market


• If we know the market return, we can deduct it from the returns to
  look at excess returns to market

• If we don't know, we can try to control for this unknown common
  factor via statistical methods

• Principal Component Analysis allows us to identify and remove the
  common factor




                                                                       14
Eigenvectors - example
                         A plot of the normalised
                         data (mean subtracted)
                         -> First component
                         explains maximum
                         variance

                         The data has two variables
                         (x and y), the covariance
                         matrix is 2x2 and thus
                         there are two Eigenvectors

                         The eigenvectors of
                         the covariance matrix are
                         overlayed as dotted lines


                                                15
Eigenvectors of asset correlations
•   The eigenvectors of a correlation matrix are orthogonal to each other. The
    data can be projected in terms of its eigenvectors

•   Each eigenvector represents a Priciple Component
      – if values are similar, the component affects all returns similarly


•   Eigenvalues scale with the share of variance explaned by each component
      – Divide each by the sum to get share

•   Laloux et al (1999) find on S&P500 (and other markets)
      –    The largest eigenvalue is well separated from the bulk and corresponds to the whole market
           as the corresponding eigenvector has roughly equal components
      –    The next largest eigenvectors carry information about the real correlations and can be used
           in identifying clusters of strongly interacting assets
      –    The smaller eigenvectors are noise (95%)

L. Laloux, P. Cizeau, J.-P. Bouchaud, M. Potters, Noise Dressing of Financial Correlation Matrices. Phys. Rev. Lett. 83 (1999)
1467 ; T. Heimo, J. Saramaki, J-P. Onnela, K. Kaski (2007). Spectral and network methods in the analysis of correlation
matrices of stock returns, Physica A 383, pp. 147–151                                                                            16
pca -command
•   File (-file) : File with input data.
•   Components (-components) : Components to keep, e.g. 2- or 1 or 2-10
    Optional.
•   Observations (-obs) : Number of observations in each period.
•   Interval (-interval) : Number of observations between each period.
•   Saveas (-saveas) : Name of file to store amended data.

Examples

# calculate returns corresponding to all but principal component
pca -file dax-returns-2011.csv -obs 30 -interval 30 -components 2- -saveas dax-montly1.csv


# calculate returns corresponding to second to 10th largest component
pca -file dax-returns-2011.csv -obs 30 -interval 30 -components 2-10 -saveas dax-montly2.csv




                                                                                               17
Correlation and Covariance




                             18
19
buildbycorrelation -command
•   File (-file) : File from which to read data. To skip lines or use non-default field
    delimiter use the 'delimiter' and 'skiplines' arguments. Mandatory.
•   Save as property (-saveas) : Property name of saved result. Optional. By default
    'correlation'.

•   Number of observations (-obs) : Larger than or equal to 2. Optional.
•   Length of interval (-interval) : Larger than or equal to 1. Optional.

•   -commitinterval : Larger than or equal to 1. Optional. By default '10000'.

•   Save standard deviation (-savestdev) : Option to save standard deviation as vertex
    property. Optional. By default 'false'.
•   Save average returns (-savereturns) : Option to save average returns as vertex
    property. Optional. By default 'true'.

•   Preserve (-preserve) : Do not overwrite existing networks in database. Optional. By
    default 'true'.
                                                                                          20
•   Missing values (-missing) : Handling of missing or non-numeric values in data.
    Alternatives:
     –   Zero : missing values are considered as 0.
     –   NaN : default value corresponds to 'NaN and missing values' setting in 'Default format
         settings'.
     –   Alert : missing values cause the calculation to fail.
    Optional. Allowed values: [Zero, NaN, Alert]. By default 'Zero'.

•   Start date (-start) : Starting date of calculation. Optional.
•   End date (-end) : Ending date of calculation. Optional.
•   Date format (-dateformat) : Format of date in input file. Default value corresponds to
    'Date format' setting in 'Default format settings'. Optional.
•   Date order (-dateorder) : Order of dates in the input file. If dates are ascending order,
    the newest network has fewer observations than other networks. If dates are
    descending order, the oldest network has fewer observations. Optional. Allowed
    values: [asc, desc]. By default 'desc'.


                                                                                                  21
Correlation Matrix & Network




              A          B          C
          A          0       0.72       0.31
          B       0.72          0       0.60
          C       0.31       0.60          0




                                               22
Examples
# Calculate 100 day pairwise correlations with 1 day sliding window, consider
any missing/non-numeric return as 0
buildbycorrelationd -file daxreturns-2011.csv -missing Zero -days 100 -interval
1 -preserve false

# Calculate 100 day pairwise correlations with no sliding window, consider any
missing/non-numeric return as 0
buildbycorrelationd -file daxreturns-2011.csv -missing Zero -days 100 -interval
100 -preserve false

# Calculate pairwise correlations for whole data, alert about missing values
buildbycorrelation -file daxreturns-2011.csv -missing Alert -preserve false




                                                                               23
Heatmap -command
•   Creates time series of heatmaps
•   Like viz -command
•   Allows mapping of data to cell size, shape, color, label, hover
•   Allows definition of color ranges and domains



# create heatmap from correlations
heatmap -sortv vertex_id -p correlation -symmetric true -
cellsizedefault 13 -transition 0 -cellhover correlation -colordomain (-1)-
1 -palette darkblue-lightgray-darkred -saveas daxheat-def-Y




                                                                        24
28&29




All components   Principal Component Removed


                                               25
About Color Perception

A and B are the same
shade of gray




  http://upload.wikimedia.org/wikipedia/comm
  ons/9/93/Optical_illusion_greysquares.gif
Anscombes Quartet
Constructed in 1973 by Francis Anscombe to demonstrate both the
importance of graphing data before analyzing it and the effect
of outliers on statistical properties

                                             Mean of x                         9
                                             Variance of x                     11

                                             Mean of y                         ~7.50
                                             Variance of y                     ~4.1

                                             Correlation                       ~0.816

                                             Linear regression:
                                             y = 3.00 + 0.500x

                                          Chatterjee, Sangit; Firat, Aykut (2007). "Generating Data with
                                          Identical Statistics but Dissimilar Graphics: A Follow up to the
                                          Anscombe Dataset". American Statistician 61 (3): 248–254.
                                                                                             27
Significance of Correlations

• We are estimating 'true' correlations by means of
  samples
• Our estimates have errors

• Question: What is the chance that random data
  exhibits an observed correlation?




                                                  28
Example - distribution of correlation in 30
trials with random numbers
         20 pairs                50 pairs




        100 pair                 200 pairs




                                              29
Test for significance

• Errors:
   – Type I: False positive - we identify a correlation where there is
     none
   – Type II: False negative - we identify no correlation when there is
     one


• We can control for Type I
   – Fisher transformation -> Normal distr. -> Confidence Interval


• Possible to estimate Type II error rate - only more data
  will bring this down

                                                                      30
30-32


alpha -parameter

Significance level (-alpha) : Two-sided significance level for
confidence interval.

Example:
buildbycorrelation -file daxreturns-2011.csv -missing Alert
-obs 30 -interval 1 -alpha 0.05 -preserve false




                                                              31
No signifance test, Principal   Only significant correlations ( =0.05),
Component Removed               Principal Component Removed

                                                                     32
Problem of Multiple Comparisons
• Occurs when considering multiple simultanous estimates
    – e.g. with correlation matrix of n=30 we have 30*30/2-30 = 420
      estimates
    – > too many false positives by random chance alone
    – the system may not be accurate enough
• Familywise error rate (FWER) - Probability of making one or more
  false positives among all the hypotheses when performing multiple
  hypotheses tests.
    – Solved with e.g. Bonferroni correction
• False discovery rate (FDR) - Proportion of significant correlations
  that are false positives
    – Benjamini-Hochberg -procedure


• Question: which is more important? Missing correlations that exist,
  or disregarding ones that do?                                      33
33-35


bonferroni -parameter
Enable Bonferroni correction (-bonferroni) : Option to use Bonferroni
correction in significance testing. Optional. By default 'false'.


Example
buildbycorrelation -file daxreturns.csv -missing Alert -obs
30 -interval 1 -alpha 0.05 -bonferroni true -preserve false




                                                                        34
Only significant correlations   Bonferroni correction, Only
( =0.05), Principal Component   significant correlations ( =0.05),
Removed                         Principal Component Removed
                                                                     35
Blog, Library and Demos at www.fna.fi




Dr. Kimmo Soramäki
kimmo@soramaki.net
Twitter: soramaki

More Related Content

Similar to Financial Networks V - Inferring Links

SICK AG Corp presentation basics_2012-10_en
SICK AG Corp presentation basics_2012-10_enSICK AG Corp presentation basics_2012-10_en
SICK AG Corp presentation basics_2012-10_en
SICK, Inc.
 
JDE & Peoplesoft 3 _ Marc Weintraub _ PeopleSoft Roadmap.pdf
JDE & Peoplesoft 3 _ Marc Weintraub _ PeopleSoft Roadmap.pdfJDE & Peoplesoft 3 _ Marc Weintraub _ PeopleSoft Roadmap.pdf
JDE & Peoplesoft 3 _ Marc Weintraub _ PeopleSoft Roadmap.pdf
InSync2011
 
EE 460 Project Plan V1
EE 460 Project Plan V1EE 460 Project Plan V1
EE 460 Project Plan V1
Large Kellider
 
Chronos Systems Presentation 2009 English
Chronos Systems Presentation 2009 EnglishChronos Systems Presentation 2009 English
Chronos Systems Presentation 2009 English
anemeti
 

Similar to Financial Networks V - Inferring Links (20)

DISCUS for FAI
DISCUS for FAIDISCUS for FAI
DISCUS for FAI
 
OSMC 2021 || Open Source Application Performance Monitoring in the Enterprise
OSMC 2021 || Open Source Application Performance Monitoring in the EnterpriseOSMC 2021 || Open Source Application Performance Monitoring in the Enterprise
OSMC 2021 || Open Source Application Performance Monitoring in the Enterprise
 
SICK AG Corp presentation basics_2012-10_en
SICK AG Corp presentation basics_2012-10_enSICK AG Corp presentation basics_2012-10_en
SICK AG Corp presentation basics_2012-10_en
 
Intro_toDB.ppt
Intro_toDB.pptIntro_toDB.ppt
Intro_toDB.ppt
 
Unit 04: From Requirements to the UX Model
Unit 04: From Requirements to the UX ModelUnit 04: From Requirements to the UX Model
Unit 04: From Requirements to the UX Model
 
A Holistic Approach to Addressing the Cloud's Paradox of Choice
A Holistic Approach to Addressing the Cloud's Paradox of ChoiceA Holistic Approach to Addressing the Cloud's Paradox of Choice
A Holistic Approach to Addressing the Cloud's Paradox of Choice
 
JDE & Peoplesoft 3 _ Marc Weintraub _ PeopleSoft Roadmap.pdf
JDE & Peoplesoft 3 _ Marc Weintraub _ PeopleSoft Roadmap.pdfJDE & Peoplesoft 3 _ Marc Weintraub _ PeopleSoft Roadmap.pdf
JDE & Peoplesoft 3 _ Marc Weintraub _ PeopleSoft Roadmap.pdf
 
Measure Data Quality
Measure Data QualityMeasure Data Quality
Measure Data Quality
 
[SiriusCon 2018] Extensive Use of Custom Properties Views in a Banking DSL Wo...
[SiriusCon 2018] Extensive Use of Custom Properties Views in a Banking DSL Wo...[SiriusCon 2018] Extensive Use of Custom Properties Views in a Banking DSL Wo...
[SiriusCon 2018] Extensive Use of Custom Properties Views in a Banking DSL Wo...
 
Real World Business Intelligence and Data Warehousing
Real World Business Intelligence and Data WarehousingReal World Business Intelligence and Data Warehousing
Real World Business Intelligence and Data Warehousing
 
Finalyser - the Plug-In and ExtendScripts Collection for Adobe FrameMaker an...
Finalyser - the Plug-In and ExtendScripts  Collection for Adobe FrameMaker an...Finalyser - the Plug-In and ExtendScripts  Collection for Adobe FrameMaker an...
Finalyser - the Plug-In and ExtendScripts Collection for Adobe FrameMaker an...
 
Can you trust the cloud provider?
Can you trust the cloud provider?Can you trust the cloud provider?
Can you trust the cloud provider?
 
CSA14_Congress%20Top_5%2075_Brokering_PPT
CSA14_Congress%20Top_5%2075_Brokering_PPTCSA14_Congress%20Top_5%2075_Brokering_PPT
CSA14_Congress%20Top_5%2075_Brokering_PPT
 
Finit Solutions - What is New in Hyperion Financial Management 11.1.2.2
Finit Solutions - What is New in Hyperion Financial Management 11.1.2.2Finit Solutions - What is New in Hyperion Financial Management 11.1.2.2
Finit Solutions - What is New in Hyperion Financial Management 11.1.2.2
 
Cloud computing for Department of Transportation Federal Motor Carrier Safety...
Cloud computing for Department of Transportation Federal Motor Carrier Safety...Cloud computing for Department of Transportation Federal Motor Carrier Safety...
Cloud computing for Department of Transportation Federal Motor Carrier Safety...
 
Leadership Management Dashboard Open Source Design Document
Leadership Management Dashboard Open Source Design DocumentLeadership Management Dashboard Open Source Design Document
Leadership Management Dashboard Open Source Design Document
 
EE 460 Project Plan V1
EE 460 Project Plan V1EE 460 Project Plan V1
EE 460 Project Plan V1
 
My Other Computer is a Data Center: The Sector Perspective on Big Data
My Other Computer is a Data Center: The Sector Perspective on Big DataMy Other Computer is a Data Center: The Sector Perspective on Big Data
My Other Computer is a Data Center: The Sector Perspective on Big Data
 
Military cloud computing 2011 kevin jackson
Military cloud computing 2011 kevin jacksonMilitary cloud computing 2011 kevin jackson
Military cloud computing 2011 kevin jackson
 
Chronos Systems Presentation 2009 English
Chronos Systems Presentation 2009 EnglishChronos Systems Presentation 2009 English
Chronos Systems Presentation 2009 English
 

More from Kimmo Soramaki

More from Kimmo Soramaki (20)

Applications of Network Theory in Finance
Applications of Network Theory in FinanceApplications of Network Theory in Finance
Applications of Network Theory in Finance
 
Applications of Network Theory in Finance and Production
Applications of Network Theory in Finance and ProductionApplications of Network Theory in Finance and Production
Applications of Network Theory in Finance and Production
 
Global Network of Payment Flows - Presentation at Commerzbank Cash Forum
Global Network of Payment Flows - Presentation at Commerzbank Cash ForumGlobal Network of Payment Flows - Presentation at Commerzbank Cash Forum
Global Network of Payment Flows - Presentation at Commerzbank Cash Forum
 
Visualizing Financial Stress - Talk at European Central Bank
Visualizing Financial Stress - Talk at European Central BankVisualizing Financial Stress - Talk at European Central Bank
Visualizing Financial Stress - Talk at European Central Bank
 
Financial Cartography
Financial CartographyFinancial Cartography
Financial Cartography
 
Financial Cartography at Bogazici University
Financial Cartography at Bogazici UniversityFinancial Cartography at Bogazici University
Financial Cartography at Bogazici University
 
Network Simulations for Business Continuity
Network Simulations for Business ContinuityNetwork Simulations for Business Continuity
Network Simulations for Business Continuity
 
Financial Cartography for Payments and Markets
Financial Cartography for Payments and MarketsFinancial Cartography for Payments and Markets
Financial Cartography for Payments and Markets
 
Quantitative Oversight of Financial Market Infrastructures
Quantitative Oversight of Financial Market InfrastructuresQuantitative Oversight of Financial Market Infrastructures
Quantitative Oversight of Financial Market Infrastructures
 
Emerging Stress Scenarios
Emerging Stress ScenariosEmerging Stress Scenarios
Emerging Stress Scenarios
 
Network Approaches for Interbank Markets
Network Approaches for Interbank MarketsNetwork Approaches for Interbank Markets
Network Approaches for Interbank Markets
 
System shock analysis and complex network effects
System shock analysis and complex network effectsSystem shock analysis and complex network effects
System shock analysis and complex network effects
 
Adaptive Stress Testing
Adaptive Stress TestingAdaptive Stress Testing
Adaptive Stress Testing
 
Illuminating Interconnectedness and Contagion
Illuminating Interconnectedness and ContagionIlluminating Interconnectedness and Contagion
Illuminating Interconnectedness and Contagion
 
Financial Networks and Cartography
Financial Networks and CartographyFinancial Networks and Cartography
Financial Networks and Cartography
 
Financial Networks VI - Correlation Networks
Financial Networks VI - Correlation NetworksFinancial Networks VI - Correlation Networks
Financial Networks VI - Correlation Networks
 
Financial Cartography - PRMIA Webinar
Financial Cartography - PRMIA WebinarFinancial Cartography - PRMIA Webinar
Financial Cartography - PRMIA Webinar
 
Financial Networks IV. Analyzing and Visualizing Exposures
Financial Networks IV. Analyzing and Visualizing ExposuresFinancial Networks IV. Analyzing and Visualizing Exposures
Financial Networks IV. Analyzing and Visualizing Exposures
 
Financial Networks III. Centrality and Systemic Importance
Financial Networks III. Centrality and Systemic ImportanceFinancial Networks III. Centrality and Systemic Importance
Financial Networks III. Centrality and Systemic Importance
 
Financial Cartography - Center for Financial Research
Financial Cartography - Center for Financial ResearchFinancial Cartography - Center for Financial Research
Financial Cartography - Center for Financial Research
 

Financial Networks V - Inferring Links

  • 1. Center for Financial Studies at the Goethe University PhD Mini-course Frankfurt, 25 January 2013 Financial Networks V. Inferring Links Dr. Kimmo Soramäki Founder and CEO FNA, www.fna.fi
  • 2. I. Mapping II. Mapping Systemic Risk Financial Markets 2
  • 3. Observing vs inferring • Observing links – Exposures, payment flow, trade, co-ownership, joint board membership, etc. – Cause of link is known • Inferring links – Observing the effects and inferring a relationship – Cause of link is unknown – Time series on asset prices, trade volumes, balance sheet items 3
  • 4. Agenda V. Inferring Links • Prices and Returns • Controlling for common factors • Correlation and dependence • Significant correlations • Multiple Comparisons VI. Correlation Networks • Distance and Hierarchical Clustering • Minimum Spanning Tree & PMFG • Filtering • Visual Layouts 4
  • 6. Prices vs Returns • Benefit of using returns vs prices for correlations: • Normalization. All variables are denoted in same units • Required by some matrix algebra • Returns (on asset prices) have low serial correlation • Returns are what matter once investement is made • We'll use returns 6
  • 7. Log vs Arithmetics Returns • Most often logarithmic returns are used in Finance. Why? • Benefit of using log returns vs arithmetic returns – Assumptions in statistical estimates (log-normal returns) – Easier calculations (integration, compounding) lead to smaller algorithmic complexity • Normaliy assumptions make log returns better • We'll use log returns 7
  • 8. Data Issues • Which prices – daily (low, high, close) – time-zone issues – high-frequency • Missing prices – Errors in source data – Partial holidays – Weekends/holidays – Exit and entry of assets • Handling options – Look up from other sources – Edit values, e.g. replace with same as previous (prices) or 0 (returns) – Exclude days/series from analysis – Decide on weekend returns depending on data 8
  • 9. Ticker Name Sector Weight DAX stocks ADS Adidas clothing and footwear 2.0 ALV Allianz insurance 6.7 BAS BASF speciality chemicals 9.6 BAYN Bayer speciality chemicals 7.6 BEI Beiersdorf personal products 0.9 BMW BMW automobile manufacturers 3.3 CBK Commerzbank credit banks 1.0 CON Continental car parts manufacturers 0.8 DAI Daimler automobile manufacturers 5.8 DB1 Deutsche Börse securities brokers 1.5 DBK Deutsche Bank credit banks 5.1 DPW Deutsche Post logistics 1.9 DTE Deutsche Telekom fixed-line telecommunication 5.4 EOAN E.ON multi-utilities 6.2 FME Fresenius Medical Care health care 2.2 FRE Fresenius health care 1.6 HEI HeidelbergCement building materials 0.9 HEN3 Henkel personal products 1.5 IFX Infineon Technologies semiconductors 1.3 LHA Deutsche Lufthansa airlines 0.8 LIN Linde industrial gases 3.8 LXS Lanxess specialty chemicals 0.7 MRK Merck pharmaceuticals 1.0 MUV2 Munich Re re-insurance 2.9 RWE RWE multi-utilities 2.2 SAP SAP software 7.7 SDF K+S commodity chemicals 1.2 SIE Siemens diversified industrials 10.0 TKA ThyssenKrupp diversified industrials 1.3 VOW3 Volkswagen Group automobile manufacturers 3.4 9
  • 10. Data: Prices of 30 stocks in DAX in 2011 date ADS ALV BAYN BAS ... 2011-12-30 50.26 73.91 49.4 53.89 2011-12-29 49.94 73.21 48.73 53.17 2011-12-28 49.79 73.15 47.36 52.39 2011-12-27 50.3 75.45 48.27 53.17 2011-12-23 50.29 75.97 48.13 52.83 2011-12-22 49.98 75.81 47.92 52.63 2011-12-21 49.43 75.37 47.04 52.4 2011-12-20 49.56 75.43 46.99 52.9 2011-12-19 47.96 72.23 44.75 51.15 2011-12-16 48.1 72.77 45.04 51.71 2011-12-15 47.89 73.47 44.99 51.13 2011-12-14 48 71.53 44.8 50.58 2011-12-13 48.45 73.04 45.73 51.1 ... 10
  • 11. calculatereturns -command • File (-file) : File with price data. The file must contain a table where each row has the observation date as its first element, and asset prices as subsequent elements separated by a field delimiter. The first line read must contain names for each asset column (except the first column which contains the date). Mandatory. • Save As (-saveas) : File where result should be saved. Mandatory. • Date format (-dateformat) : Format of date in input file. Default value yyyy- MM-dd. Optional. • Method of calculation (-method) : Arithmetic (arithmetic) or logarithmic (log) returns. Mandatory. By default 'log'. • Number of observations (-obs) : Larger than or equal to 2. Optional. By default '2'. • Length of interval (-interval) : Larger than or equal to 1. Optional. By default '1'. • Date order (-dateorder) : Date order in the file. Optional. Allowed values: [asc, desc]. By default 'asc'. 11
  • 12. Examples # Calculate daily log returns calculatereturns -file daxprices-2011.csv -saveas daxreturns-2011.csv -method log -dateorder desc # Calculate two-day log returns calculatereturns -file daxprices-2011.csv -saveas daxreturnsl-2dl2.csv -method log -obs 2 -interval 2 -dateorder desc # Calculate two-day log returns with a one-day window calculatereturns -file daxprices-2011.csv -saveas daxreturnsl-2dl1.csv -method log -obs 2 -interval 1 -dateorder desc # Calculate daily arithmetic returns calculatereturns -file daxprices-2011.csv -saveas daxreturns-2da1.acsv -method arithmetic -dateorder desc 12
  • 13. Data: Returns of 30 stocks in DAX in 2011 date ADS.DE ALV.DE BAYN.DE BAS.DE ... 2011-12-29 0.0064 0.0095 0.0137 0.0135 2011-12-28 0.0030 0.0008 0.0285 0.0148 2011-12-27 -0.0102 -0.0310 -0.0190 -0.0148 2011-12-23 0.0002 -0.0069 0.0029 0.0064 2011-12-22 0.0062 0.0021 0.0044 0.0038 2011-12-21 0.0111 0.0058 0.0185 0.0044 2011-12-20 -0.0026 -0.0008 0.0011 -0.0095 2011-12-19 0.0328 0.0433 0.0488 0.0336 2011-12-16 -0.0029 -0.0074 -0.0065 -0.0109 2011-12-15 0.0044 -0.0096 0.0011 0.0113 2011-12-14 -0.0023 0.0268 0.0042 0.0108 2011-12-13 -0.0093 -0.0209 -0.0205 -0.0102 2011-12-12 0.0058 -0.0163 -0.0020 -0.0099 ... 13
  • 14. Common Factors • A common factor may be affecting all the returns – E.g bullish/bearish market • If we know the market return, we can deduct it from the returns to look at excess returns to market • If we don't know, we can try to control for this unknown common factor via statistical methods • Principal Component Analysis allows us to identify and remove the common factor 14
  • 15. Eigenvectors - example A plot of the normalised data (mean subtracted) -> First component explains maximum variance The data has two variables (x and y), the covariance matrix is 2x2 and thus there are two Eigenvectors The eigenvectors of the covariance matrix are overlayed as dotted lines 15
  • 16. Eigenvectors of asset correlations • The eigenvectors of a correlation matrix are orthogonal to each other. The data can be projected in terms of its eigenvectors • Each eigenvector represents a Priciple Component – if values are similar, the component affects all returns similarly • Eigenvalues scale with the share of variance explaned by each component – Divide each by the sum to get share • Laloux et al (1999) find on S&P500 (and other markets) – The largest eigenvalue is well separated from the bulk and corresponds to the whole market as the corresponding eigenvector has roughly equal components – The next largest eigenvectors carry information about the real correlations and can be used in identifying clusters of strongly interacting assets – The smaller eigenvectors are noise (95%) L. Laloux, P. Cizeau, J.-P. Bouchaud, M. Potters, Noise Dressing of Financial Correlation Matrices. Phys. Rev. Lett. 83 (1999) 1467 ; T. Heimo, J. Saramaki, J-P. Onnela, K. Kaski (2007). Spectral and network methods in the analysis of correlation matrices of stock returns, Physica A 383, pp. 147–151 16
  • 17. pca -command • File (-file) : File with input data. • Components (-components) : Components to keep, e.g. 2- or 1 or 2-10 Optional. • Observations (-obs) : Number of observations in each period. • Interval (-interval) : Number of observations between each period. • Saveas (-saveas) : Name of file to store amended data. Examples # calculate returns corresponding to all but principal component pca -file dax-returns-2011.csv -obs 30 -interval 30 -components 2- -saveas dax-montly1.csv # calculate returns corresponding to second to 10th largest component pca -file dax-returns-2011.csv -obs 30 -interval 30 -components 2-10 -saveas dax-montly2.csv 17
  • 19. 19
  • 20. buildbycorrelation -command • File (-file) : File from which to read data. To skip lines or use non-default field delimiter use the 'delimiter' and 'skiplines' arguments. Mandatory. • Save as property (-saveas) : Property name of saved result. Optional. By default 'correlation'. • Number of observations (-obs) : Larger than or equal to 2. Optional. • Length of interval (-interval) : Larger than or equal to 1. Optional. • -commitinterval : Larger than or equal to 1. Optional. By default '10000'. • Save standard deviation (-savestdev) : Option to save standard deviation as vertex property. Optional. By default 'false'. • Save average returns (-savereturns) : Option to save average returns as vertex property. Optional. By default 'true'. • Preserve (-preserve) : Do not overwrite existing networks in database. Optional. By default 'true'. 20
  • 21. Missing values (-missing) : Handling of missing or non-numeric values in data. Alternatives: – Zero : missing values are considered as 0. – NaN : default value corresponds to 'NaN and missing values' setting in 'Default format settings'. – Alert : missing values cause the calculation to fail. Optional. Allowed values: [Zero, NaN, Alert]. By default 'Zero'. • Start date (-start) : Starting date of calculation. Optional. • End date (-end) : Ending date of calculation. Optional. • Date format (-dateformat) : Format of date in input file. Default value corresponds to 'Date format' setting in 'Default format settings'. Optional. • Date order (-dateorder) : Order of dates in the input file. If dates are ascending order, the newest network has fewer observations than other networks. If dates are descending order, the oldest network has fewer observations. Optional. Allowed values: [asc, desc]. By default 'desc'. 21
  • 22. Correlation Matrix & Network A B C A 0 0.72 0.31 B 0.72 0 0.60 C 0.31 0.60 0 22
  • 23. Examples # Calculate 100 day pairwise correlations with 1 day sliding window, consider any missing/non-numeric return as 0 buildbycorrelationd -file daxreturns-2011.csv -missing Zero -days 100 -interval 1 -preserve false # Calculate 100 day pairwise correlations with no sliding window, consider any missing/non-numeric return as 0 buildbycorrelationd -file daxreturns-2011.csv -missing Zero -days 100 -interval 100 -preserve false # Calculate pairwise correlations for whole data, alert about missing values buildbycorrelation -file daxreturns-2011.csv -missing Alert -preserve false 23
  • 24. Heatmap -command • Creates time series of heatmaps • Like viz -command • Allows mapping of data to cell size, shape, color, label, hover • Allows definition of color ranges and domains # create heatmap from correlations heatmap -sortv vertex_id -p correlation -symmetric true - cellsizedefault 13 -transition 0 -cellhover correlation -colordomain (-1)- 1 -palette darkblue-lightgray-darkred -saveas daxheat-def-Y 24
  • 25. 28&29 All components Principal Component Removed 25
  • 26. About Color Perception A and B are the same shade of gray http://upload.wikimedia.org/wikipedia/comm ons/9/93/Optical_illusion_greysquares.gif
  • 27. Anscombes Quartet Constructed in 1973 by Francis Anscombe to demonstrate both the importance of graphing data before analyzing it and the effect of outliers on statistical properties Mean of x 9 Variance of x 11 Mean of y ~7.50 Variance of y ~4.1 Correlation ~0.816 Linear regression: y = 3.00 + 0.500x Chatterjee, Sangit; Firat, Aykut (2007). "Generating Data with Identical Statistics but Dissimilar Graphics: A Follow up to the Anscombe Dataset". American Statistician 61 (3): 248–254. 27
  • 28. Significance of Correlations • We are estimating 'true' correlations by means of samples • Our estimates have errors • Question: What is the chance that random data exhibits an observed correlation? 28
  • 29. Example - distribution of correlation in 30 trials with random numbers 20 pairs 50 pairs 100 pair 200 pairs 29
  • 30. Test for significance • Errors: – Type I: False positive - we identify a correlation where there is none – Type II: False negative - we identify no correlation when there is one • We can control for Type I – Fisher transformation -> Normal distr. -> Confidence Interval • Possible to estimate Type II error rate - only more data will bring this down 30
  • 31. 30-32 alpha -parameter Significance level (-alpha) : Two-sided significance level for confidence interval. Example: buildbycorrelation -file daxreturns-2011.csv -missing Alert -obs 30 -interval 1 -alpha 0.05 -preserve false 31
  • 32. No signifance test, Principal Only significant correlations ( =0.05), Component Removed Principal Component Removed 32
  • 33. Problem of Multiple Comparisons • Occurs when considering multiple simultanous estimates – e.g. with correlation matrix of n=30 we have 30*30/2-30 = 420 estimates – > too many false positives by random chance alone – the system may not be accurate enough • Familywise error rate (FWER) - Probability of making one or more false positives among all the hypotheses when performing multiple hypotheses tests. – Solved with e.g. Bonferroni correction • False discovery rate (FDR) - Proportion of significant correlations that are false positives – Benjamini-Hochberg -procedure • Question: which is more important? Missing correlations that exist, or disregarding ones that do? 33
  • 34. 33-35 bonferroni -parameter Enable Bonferroni correction (-bonferroni) : Option to use Bonferroni correction in significance testing. Optional. By default 'false'. Example buildbycorrelation -file daxreturns.csv -missing Alert -obs 30 -interval 1 -alpha 0.05 -bonferroni true -preserve false 34
  • 35. Only significant correlations Bonferroni correction, Only ( =0.05), Principal Component significant correlations ( =0.05), Removed Principal Component Removed 35
  • 36. Blog, Library and Demos at www.fna.fi Dr. Kimmo Soramäki kimmo@soramaki.net Twitter: soramaki