Knowledge Discovery in the Stock Market

Supervised and Unsupervised Learning with BayesiaLab




Stefan Conrady, stefan.conrady@conradyscience.com

Dr. Lionel Jouffe, jouffe@bayesia.com

June 29, 2011




Conrady Applied Science, LLC - Bayesia’s North American Partner for Sales and Consulting


Table of Contents

Tutorial
  Highlights
  Background & Objective
      Notation
  Dataset
  Data Preparation and Transformation
  Data Import
      Determining Discretization Intervals
      Modeling Mode
  Unsupervised Learning
      Bayesian Network versus Correlation Matrix
      Inference with Bayesian Networks
        Inference with Hard Evidence
        Inference with Soft Evidence
      Bayesian Network Metrics
        Arc Force
        Mutual Information
        Correlation
      Summary - Unsupervised Learning
  Supervised Learning
      Inference with Supervised Learning
      Adaptive Questionnaire
      Summary - Supervised Learning

Appendix
  Appendix
      Markov Blanket
      Bayes’ Theorem
  About the Authors
        Stefan Conrady
        Lionel Jouffe
  Contact Information
        Conrady Applied Science, LLC
        Bayesia S.A.S.
  Copyright




Tutorial

Highlights
• Unsupervised Learning with BayesiaLab can rapidly generate plausible structures of unfamiliar problem domains, as
  illustrated in this paper with examples from the U.S. stock market.

• Supervised Learning with BayesiaLab delivers reliable models in high-dimensional domains, providing both powerful
  predictive performance and a platform for simulating domain dynamics.

• Knowledge representation with Bayesian networks is highly intuitive and effectively provides computable knowledge
  that allows inference and reasoning under uncertainty.



Background & Objective
Perhaps more than any other kind of time series data, financial markets have been scrutinized by countless mathematicians, economists, investors and speculators over hundreds of years. Even in modern times, despite all scientific advances, the effort of predicting future movements of the stock market sometimes still bears resemblance to the ancient alchemistic aspirations of turning base metals into gold. That is not to say that there is no genuine scientific effort in studying financial markets, but distinguishing serious research from charlatanism (or even fraud) remains remarkably difficult.

We neither aspire to develop a crystal ball for investors nor do we expect to contribute to the economic and econometric literature. However, we find the wealth of data in the financial markets to be fertile ground for experimenting with knowledge discovery algorithms and for generating knowledge representations in the form of Bayesian networks. This area can perhaps serve as a very practical proof of the powerful properties of Bayesian networks, as we can quickly compare machine-learned findings with our own understanding of market dynamics. For instance, the prevailing opinions among investors regarding the relationships between major stocks should be reflected in any structure that is to be discovered by our algorithms.

More specifically, we will utilize the unsupervised and supervised learning algorithms of the BayesiaLab software package to automatically generate Bayesian networks from daily stock returns over a six-year period. We will examine 459 stocks from the S&P 500 index, for which observations are available over the entire timeframe. We selected the S&P 500 as the basis for our study, as the companies listed on this index are presumably among the best-known corporations worldwide, so even a casual observer should be able to critically review the machine-learned findings. In other words, we are trying to machine-learn the obvious, as any mistakes in this process would automatically become self-evident. Quite often experts’ reaction to such machine-learned findings is, “well, we already knew that.” That is the very point we want to make, as machine learning can — within seconds — catch up with human expertise accumulated over years, and then rapidly expand beyond what is already known.

The power of such algorithmic learning will be still more apparent in entirely unknown domains. However, if we were
to machine-learn the structure of a foreign equity market for expository purposes in this paper, chances are that many
readers would not immediately be able to judge the resulting structure as plausible or not.







In addition to generating human-readable and interpretable structures, we want to illustrate how we can immediately use machine-learned Bayesian networks as “computable knowledge” for automated inference and prediction. Our objective is to gain both a qualitative and quantitative understanding of the stock market by using Bayesian networks. In the quantitative context, we will also show how BayesiaLab can reliably carry out inference with multiple pieces of uncertain and even conflicting evidence. The inherent ability of Bayesian networks to perform computations under uncertainty makes them highly suitable for a wide range of real-world applications.

Continuing the practice established in our previous white papers, we attempt to present the proposed approach in the style of a tutorial, so that each step can be immediately replicated (and scrutinized) by any reader equipped with the BayesiaLab software.1 This reflects our desire to establish a high degree of transparency regarding all proposed methods and to minimize the risk of Bayesian networks being perceived as a black-box technology.


Notation
To clearly distinguish between natural language, software-specific functions and example-specific variable names, the following notation is used:

• Bayesian network and BayesiaLab-specific functions, keywords, commands, etc., are capitalized and shown in bold type.

• Names of attributes, variables, and nodes are italicized.




1   The preprocessed dataset with daily return data is available for download from our website:
www.conradyscience.com/white_papers/financial/SP500_v6_dlog_b.csv





Dataset
The S&P 500 is a free-float capitalization-weighted index of the prices of 500 large-cap common stocks actively traded in the United States, which has been published since 1957. The stocks included in the S&P 500 are those of large publicly held companies that trade on either of the two largest American stock exchanges: the New York Stock Exchange and the NASDAQ. For our case study we have tracked the daily closing prices of all stocks included in the S&P 500 index from January 3, 2005 through December 30, 2010, excluding only those stocks that were not traded continuously over the entire study period. This leaves a total of 459 stock prices with 1,510 observations each.



[Figure: Daily closing prices of the first 36 stocks in alphabetical order, A through APH, over the 1,510 trading days of the study period.]







Data Preparation and Transformation
Rather than treating the time series in levels, we will difference the stock prices and compute the daily returns. More specifically, we will take differences of the logarithms of the levels, which is a good approximation of the daily stock return in percentage terms. After this transformation, 1,509 observations remain and a selection of the first 36 stocks (in alphabetical order) is shown below.
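
To make the transformation concrete, here is a minimal sketch in Python (the file name and column layout are hypothetical; the dataset linked in footnote 1 is already preprocessed this way):

    import numpy as np
    import pandas as pd

    # Hypothetical layout: one row per trading day, one column per ticker.
    prices = pd.read_csv("SP500_prices.csv", index_col=0)

    # Daily log return: r_t = ln(P_t) - ln(P_t-1), a close approximation
    # of the percentage return for small daily moves.
    returns = np.log(prices).diff().dropna()
    print(returns.shape)  # differencing leaves 1,509 of the 1,510 rows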



[Figure: Daily log returns of the first 36 stocks in alphabetical order, A through APH.]







Data Import
We use BayesiaLab’s Data Import Wizard to load all 459 time series2 into memory from a comma-separated file. BayesiaLab automatically detects the column headers, which contain the ticker symbols3 as variable names.




The next step identifies the data types contained in the dataset and, as expected, BayesiaLab finds 459 continuous variables.




2   Although the dataset has a temporal ordering, for expository simplicity we will treat each time interval as an independent observation.
3   A ticker symbol is a short abbreviation used to uniquely identify publicly traded stocks.





There are no missing values in the dataset and we do not want to filter out any observations, so the next screen of the Data Import Wizard can be skipped entirely.




The next step, however, is critical. As part of every data import process into BayesiaLab we must discretize any continuous variables, which means all 459 variables in our particular case.

BayesiaLab offers a number of algorithms to automatically discretize the continuous variables and one of the most practical ones, for subsequent Unsupervised Learning, is the K-Means algorithm. It provides a very quick way to capture the salient characteristics of probability density curves and creates suitable thresholds for binning purposes.
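
As an aside, the idea can be sketched outside of BayesiaLab as well. The snippet below uses scikit-learn’s KBinsDiscretizer with the “kmeans” strategy purely as an illustration; BayesiaLab’s own K-Means discretization may differ in its details, and the input matrix here is simulated:

    import numpy as np
    from sklearn.preprocessing import KBinsDiscretizer

    # Simulated stand-in for the 1,509 x 459 matrix of daily log returns.
    rng = np.random.default_rng(0)
    returns = rng.normal(0.0, 0.015, size=(1509, 459))

    # 1-D k-means per variable; bin edges fall between neighboring centers.
    disc = KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="kmeans")
    states = disc.fit_transform(returns)
    print(disc.bin_edges_[0])  # the six bin boundaries for the first variable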


Determining Discretization Intervals
Analyst judgment is required, though, for choosing an appropriate number of intervals. A common heuristic found in the statistical literature is five observations per parameter. We adapt this as a guide for the minimum number of observations required for each cell in any of the yet-to-be-learned Conditional Probability Tables (CPT).

In our particular case we already know that we will initially perform Unsupervised Learning with the Maximum Weight Spanning Tree algorithm. This tree structure implies that each Node will have only one parent, which, in turn, means that each CPT will have a size determined by the number of parent states times the number of child states. Choosing five intervals for the discretization process would thus mean a CPT size of 25 cells.4

With a uniform distribution of the states this would suggest that we have approximately 60 observations per cell, which would clearly be more than enough. However, upon visual inspection of the actual distributions of the variables, the uniform distribution assumption definitely does not hold. The graph below shows the distribution of variable AA:




4   Other learning algorithms do not have this one-parent constraint and, for instance, a five-interval discretization with three parents per node would generate CPTs consisting of 625 cells. Even when assuming uniform distributions, the available observations would be insufficient for estimation purposes.






Rather, looking at this graph, it may be more appropriate to assume a normal distribution.5 Given that each Node will have one parent, we would perhaps further assume a bivariate normal distribution for the joint distribution of each pair of Nodes. We need to emphasize that we are not attempting to fit distributions per se, but that we are rather trying to find a heuristic that allows us to establish the minimum number of observations needed to characterize the tail ends of the distributions.

An assumed bivariate normal distribution would yield a discrete probability density function similar to what is shown in the table below. In other words, this is what we would expect the Conditional Probability Table (CPT) to approximately look like, once we have discretized the states and learned the CPT from the actual occurrences. However, we have not yet discretized the states, much less estimated the CPT. Actually, we have not really determined how many discretization levels are correct. So, it is a catch-22 and hence the need for a heuristic.

Our heuristic is that we use our qualitative understanding of the distributions to determine a reasonable number of intervals that provides a minimum number of samples for the tails. More formally, the “thinnest tail” is the minimal local joint probability (MLJP). Assuming 5 states for parent and child each, and with a total of 1,509 observations, this would translate into approximately 4 observations for the MLJP (highlighted in red).

[Table: Assumed discrete probability density for the 5-by-5 Conditional Probability Table, with the five States of the Parent Node as rows and the five States of the Child Node as columns; the probability mass peaks in the center cell and is smallest in the four corner cells. Based on 1,509 Observations.]

[Table: The corresponding expected observation counts per cell, i.e. each cell probability multiplied by 1,509; the four corner cells, the MLJP, contain approximately 4 observations each (highlighted in red).]


Although the number of expected samples for the MLJP appears to be below the recommended minimum, we will for now proceed on this basis and set the number of intervals to 5. Only upon completion of the discretization, and after learning the network including the CPTs, will we know for sure whether this was indeed a reasonable assumption or not.
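
The arithmetic behind the tables above can be reproduced with a short script. The correlation and the cut points below are illustrative assumptions (the real thresholds come from the K-Means discretization), so the resulting counts only approximate the tables:

    import numpy as np
    from scipy.stats import multivariate_normal

    rho, n_obs = 0.5, 1509  # assumed parent-child correlation
    mvn = multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]])
    edges = np.array([-10.0, -1.6, -0.6, 0.6, 1.6, 10.0])  # +/-10 ~ infinity

    counts = np.empty((5, 5))
    for i in range(5):
        for j in range(5):
            # Rectangle probability via the joint CDF (inclusion-exclusion).
            p = (mvn.cdf([edges[i + 1], edges[j + 1]])
                 - mvn.cdf([edges[i], edges[j + 1]])
                 - mvn.cdf([edges[i + 1], edges[j]])
                 + mvn.cdf([edges[i], edges[j]]))
            counts[i, j] = n_obs * p

    print(np.round(counts))  # the corner cells, i.e. the MLJP, hold the fewest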



5   We omit plotting the distributions of all variables, but all the variables’ distributions do indeed resemble the normal
distribution.






Clicking Finish will now perform the discretization. A progress bar will be shown to track the state of this process.




Modeling Mode
Upon conclusion, the variables are delivered as blue Nodes into the Graph Panel of BayesiaLab and by default we are now in the Modeling Mode. The original variable names, which were stored in the first line of the database, become our Node Names.







At this point it is practical to add Node Comments to the Node Names. Node Comments are typically used in BayesiaLab for longer and more descriptive titles, which can be turned on or off, depending on the desired view of the graph. Here, we associate a dictionary of the complete company names with the Node Comments, while the more compact ticker symbols remain as Node Names.6

The syntax for this association is rather straightforward: we simply define a text file which includes one Node Name per line. Each Node Name is followed by the equal sign (“=”), or alternatively TAB or SPACE, and then by the full company name, which will serve as the Node Comment.
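
For instance, the first few lines of such a dictionary file might read as follows (entries shown for illustration only):

    JNJ=Johnson & Johnson
    KMB=Kimberly-Clark Corporation
    PG=The Procter & Gamble Company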




This file can then be loaded into BayesiaLab via Data>Associate Dictionary>Node>Comments.




Once the comments are loaded, a small call-out symbol will appear next to each Node Name. This indicates that Node Comments are available for display.




6   To maintain a compact presentation, we will typically use the ticker symbol when referencing a particular stock rather
than the full company name.






As the name implies, selecting View>Display Node Comments (or alternatively the keyboard shortcut “M”) will reveal
the company names.







Node Comments can be displayed for either all Nodes or only for selected ones.




Before proceeding with the first learning step, it is also recommended to briefly switch into the Validation Mode (F5) and to check the distributions of the states of the Nodes. The Monitors of the first nine Nodes are shown below. At first glance, the distributions appear to be plausible representations of the historical return distributions.







Unsupervised Learning
To perform the first Unsupervised Learning algorithm on our dataset, we switch back into Modeling Mode (F4) and select Learning>Association Discovering>Maximum Spanning Tree.7 This starts the Maximum Weight Spanning Tree algorithm, which is the fastest of the Unsupervised Learning algorithms and thus recommended at the beginning of most studies.8 As the name implies, this algorithm generates a tree structure, i.e. it permits only one parent per Node. This constraint is one of the reasons for the extreme learning speed of this algorithm.9 Performing the algorithm with a file of this size should only take a few seconds.




7   In BayesiaLab nomenclature, Unsupervised Learning is listed in the Learning menu as “Association Discovering.”
8   Several other Unsupervised Learning algorithms are available in BayesiaLab, including Taboo, EQ, SopLEQ and Taboo Order.
9   It goes beyond the scope of this tutorial to discuss the different types of learning algorithms and their specific properties.





At first glance, however, the resulting network does not appear simple and tree-like at all.




This can be quickly resolved with BayesiaLab’s built-in layout algorithms. Selecting View>Automatic Layout (shortcut
“P”) rearranges the network instantly to reveal a much more intuitive structure.







The resulting, reformatted Bayesian network representing the stock returns can now be read and interpreted
immediately:10 11




For instance, we can zoom into the branch of the Bayesian network which contains Procter & Gamble (PG). BayesiaLab offers a search function (shortcut Ctrl-F or ⌘-F), which helps find individual nodes very easily.




10   A separate, high-resolution PDF of this Bayesian network can be downloaded here: www.conradyscience.com/white_papers/financial/SP500_V13.pdf. This allows those readers without an active BayesiaLab installation to explore the network graph in much greater detail.
11   For expositional clarity we have only learned contemporaneous relationships and, as a result, potential lag structures will not appear in this network. However, in BayesiaLab, Unsupervised Learning can be generalized to a temporal application. A white paper specifically focusing on learning temporal (or dynamic) Bayesian networks is planned for the near future.






The neighborhood of Procter & Gamble contains many familiar company names, mostly from the CPG industry.12 Perhaps these companies appear all-too-obvious and the reader may wonder what insight is gained at this point. Chances are that even a casual observer of the industry would have mentioned Kimberly-Clark, Colgate-Palmolive and Johnson & Johnson as businesses operating in the same field as Procter & Gamble, which would therefore presumably have somewhat related stock price movements.

The key point is that without any prior knowledge of this domain a computer algorithm automatically extracted this structure, i.e. a Bayesian network, which intuitively matches the understanding that we have established over years as consumers of these companies’ products.

Clearly, if this was an unfamiliar domain, the knowledge gain for the reader would be far greater. However, a lesser-known domain would presumably prevent the reader’s intuitive verification of the machine-discovered structure here.




12   CPG stands for Consumer Packaged Goods.





Bayesian Network versus Correlation Matrix
The benefit of the concise representation as a Bayesian network is further demonstrated by juxtaposing it with a correlation matrix, which would perhaps be the first step in a traditional statistical analysis of this domain. Even when using heat map-style color-coding, the sheer number of relationships13 makes an immediate visual interpretation of the correlation matrix very difficult (see the subset of 25 by 25 cells from the correlation matrix below).

        A            AA         AAPL       ABC        ADI        ADM        ADP        ADSK       AEE        AEP        AES        AET        AFL        AGN        AIV        AIZ         AKAM       AKS        ALL        ALTR       AMAT       AMD        AMGN       AMT        AMZN
A       1              0.570668    0.46678   0.408163   0.533252   0.425324   0.535525   0.495613   0.531351   0.486749   0.490094   0.384297   0.476417   0.465186   0.506165   0.450875      0.4315   0.533276   0.490529   0.521889   0.541416   0.454983   0.388191   0.526454   0.447969
AA          0.570668 1            0.412423   0.363121   0.432512    0.49727   0.513374   0.453742   0.540668   0.487494   0.555778   0.386198   0.505749   0.417878   0.533665   0.525495    0.433653   0.691676   0.558741   0.443481   0.502896   0.406542   0.357239   0.532022   0.369067
AAPL         0.46678   0.412423 1            0.236667    0.43525   0.323588   0.403402   0.417302   0.340484   0.322327   0.319482   0.289725   0.334087   0.328982   0.402068   0.340316     0.38855   0.432112   0.351426   0.444068   0.463454   0.395558   0.330339   0.437053   0.450858
ABC         0.408163   0.363121   0.236667 1            0.329262   0.298421   0.416881    0.31158   0.440094   0.417974   0.347976   0.408529   0.294418   0.391646    0.33699   0.360633    0.288028   0.340885    0.39043   0.318401   0.309671   0.244243    0.36276   0.347773   0.269919
ADI         0.533252   0.432512    0.43525   0.329262 1            0.321593   0.483858   0.482746   0.425898   0.371848   0.343594   0.314271   0.389693   0.366576   0.462091   0.371839    0.426141   0.460124   0.423266   0.691107   0.638214   0.495377   0.330517   0.467126   0.420969
ADM         0.425324    0.49727   0.323588   0.298421   0.321593 1            0.378516   0.322902   0.452433   0.403492   0.417093   0.305003   0.366817   0.304062   0.366267   0.358504    0.389176   0.452943   0.392224   0.352995   0.339473   0.274791   0.266671   0.414046   0.313261
ADP         0.535525   0.513374   0.403402   0.416881   0.483858   0.378516 1            0.452686   0.542809   0.527541   0.456298   0.372908    0.50101   0.486193   0.526986   0.507023    0.406286   0.476395   0.514611   0.513513   0.515278   0.394056   0.406387    0.48288    0.41627
ADSK        0.495613   0.453742   0.417302    0.31158   0.482746   0.322902   0.452686 1            0.421398   0.402325   0.442238   0.349215   0.417223   0.389226   0.447525   0.405751    0.392804    0.43849    0.41419    0.46149   0.497755   0.396007   0.333145    0.45594   0.383973
AEE         0.531351   0.540668   0.340484   0.440094   0.425898   0.452433   0.542809   0.421398 1            0.756735   0.590583   0.424766   0.513378   0.475327   0.474898   0.473565    0.321768   0.452686   0.537636   0.447271   0.436028    0.31983   0.390525   0.465076    0.32218
AEP         0.486749   0.487494   0.322327   0.417974   0.371848   0.403492   0.527541   0.402325   0.756735 1            0.565275   0.403458    0.42596   0.440173   0.419188   0.458727    0.318872   0.422276   0.459285   0.396228   0.417472   0.292099   0.398822   0.446867   0.314108
AES         0.490094   0.555778   0.319482   0.347976   0.343594   0.417093   0.456298   0.442238   0.590583   0.565275 1            0.378383   0.476892    0.40224   0.420327   0.453099     0.34483   0.492532   0.476188   0.349014   0.398017   0.315139   0.308978   0.438492    0.28071
AET         0.384297   0.386198   0.289725   0.408529   0.314271   0.305003   0.372908   0.349215   0.424766   0.403458   0.378383 1            0.370713   0.421565   0.364347   0.420521    0.249157   0.360531   0.427641   0.290668   0.279035   0.275143   0.321026   0.401321   0.280863
AFL         0.476417   0.505749   0.334087   0.294418   0.389693   0.366817    0.50101   0.417223   0.513378    0.42596   0.476892   0.370713 1            0.418877   0.588516   0.588617    0.351403   0.446767   0.634718   0.390395   0.459462   0.364762   0.285856    0.50493   0.359955
AGN         0.465186   0.417878   0.328982   0.391646   0.366576   0.304062   0.486193   0.389226   0.475327   0.440173    0.40224   0.421565   0.418877 1            0.422619   0.396071    0.323589   0.388559   0.443402   0.332295   0.393542   0.347243   0.345897   0.461649   0.336944
AIV         0.506165   0.533665   0.402068    0.33699   0.462091   0.366267   0.526986   0.447525   0.474898   0.419188   0.420327   0.364347   0.588516   0.422619 1            0.558192    0.408232    0.49093   0.644666   0.485371   0.541239   0.390922    0.30768   0.512831   0.397449
AIZ         0.450875   0.525495   0.340316   0.360633   0.371839   0.358504   0.507023   0.405751   0.473565   0.458727   0.453099   0.420521   0.588617   0.396071   0.558192 1             0.353718    0.45162   0.616235   0.378966   0.430116   0.315676   0.343417   0.513195   0.347806
AKAM          0.4315   0.433653    0.38855   0.288028   0.426141   0.389176   0.406286   0.392804   0.321768   0.318872    0.34483   0.249157   0.351403   0.323589   0.408232   0.353718 1             0.438362   0.364883   0.435992   0.428331   0.368554   0.245363   0.419715   0.385661
AKS         0.533276   0.691676   0.432112   0.340885   0.460124   0.452943   0.476395    0.43849   0.452686   0.422276   0.492532   0.360531   0.446767   0.388559    0.49093     0.45162   0.438362 1            0.478014   0.420897   0.475609   0.423204   0.337167   0.508704   0.390437
ALL         0.490529   0.558741   0.351426    0.39043   0.423266   0.392224   0.514611    0.41419   0.537636   0.459285   0.476188   0.427641   0.634718   0.443402   0.644666   0.616235    0.364883   0.478014 1            0.436321   0.503192   0.387605   0.312268   0.525026   0.351342
ALTR        0.521889   0.443481   0.444068   0.318401   0.691107   0.352995   0.513513    0.46149   0.447271   0.396228   0.349014   0.290668   0.390395   0.332295   0.485371   0.378966    0.435992   0.420897   0.436321 1            0.645041   0.490712   0.332572   0.480285   0.443469
AMAT        0.541416   0.502896   0.463454   0.309671   0.638214   0.339473   0.515278   0.497755   0.436028   0.417472   0.398017   0.279035   0.459462   0.393542   0.541239   0.430116    0.428331   0.475609   0.503192   0.645041 1            0.481282   0.354883   0.482778   0.435212
AMD         0.454983   0.406542   0.395558   0.244243   0.495377   0.274791   0.394056   0.396007    0.31983   0.292099   0.315139   0.275143   0.364762   0.347243   0.390922   0.315676    0.368554   0.423204   0.387605   0.490712   0.481282 1            0.230527   0.390012   0.318144
AMGN        0.388191   0.357239   0.330339    0.36276   0.330517   0.266671   0.406387   0.333145   0.390525   0.398822   0.308978   0.321026   0.285856   0.345897    0.30768   0.343417    0.245363   0.337167   0.312268   0.332572   0.354883   0.230527 1            0.327344   0.330847
AMT         0.526454   0.532022   0.437053   0.347773   0.467126   0.414046    0.48288    0.45594   0.465076   0.446867   0.438492   0.401321    0.50493   0.461649   0.512831   0.513195    0.419715   0.508704   0.525026   0.480285   0.482778   0.390012   0.327344 1            0.412541
AMZN        0.447969   0.369067   0.450858   0.269919   0.420969   0.313261    0.41627   0.383973    0.32218   0.314108    0.28071   0.280863   0.359955   0.336944   0.397449   0.347806    0.385661   0.390437   0.351342   0.443469   0.435212   0.318144   0.330847   0.412541 1




Admittedly, there are a number of statistical techniques available which can help in this situation, but the point is that generating a Bayesian network (e.g. with the Maximum Weight Spanning Tree algorithm we used) takes the practitioner about the same amount of time as computing a correlation matrix, yet the former yields a much richer picture.

Beyond visual interpretability, there is another key distinction between these two representations. Whereas the correlation matrix is merely descriptive, the Bayesian network is actually computable. By its very nature, any Bayesian network is a functioning model. With the correlation matrix, on the other hand, one could not predict the value of one stock given the observation of several others. For this purpose, we would have to fit and estimate specific models, e.g. a regression. In a Bayesian network, however, we can use the graph of the Bayesian network itself for computing inference. For instance, given that we observe the values of JNJ and CL, we immediately obtain an updated value for PG and, at the same time, also updated values for all other Nodes in the network. We refer to this property as omnidirectional inference, which reflects the updating of beliefs given evidence according to Bayes’ Rule.14 We shall illustrate carrying out omnidirectional inference in the next section.


Inference with Bayesian Networks
We have shown that the Maximum Weight Spanning Tree algorithm can generate a readily interpretable and fully computable Bayesian network from daily stock return data. However, we have not yet explained in detail what this structure specifically represents.

Each Arc in this structure represents a probabilistic relationship between a pair of Nodes. The parameters15 of these relationships are encoded in Conditional Probability Tables. In the example of the PG and JNJ relationship shown below, the table defines the probabilities of the states of PG, given the states of JNJ. This table can be accessed in the Modeling Mode by simply double-clicking on the desired Node, which opens up the Node Editor.




13   (459² − 459) / 2 = 105,111

14   See appendix for a brief summary of Bayes’ Theorem.
15   We use the term “parameter” rather loosely in this context, as Bayesian networks are entirely nonparametric models
in BayesiaLab.






For clarity, we show the relevant portion of the network for JNJ and PG below, plus an enlarged version of the conditional probability table from the Node Editor:




This says, among other things, given that we observe a JNJ return greater than 1.2%, there would be a 50.9% probability that we would observe a PG return of greater than 1.2% (see bottom right cell in the above table). More formally we can also write, P(PG > 0.012 | JNJ > 0.012) = 50.9%.

The upper left cell says, given that we observe a JNJ return smaller than -0.9%, there is a 46.5% probability that we will observe a PG return smaller than -1.3%, i.e. P(PG <= -0.013 | JNJ <= -0.009) = 46.5%.16

If we follow the network “downstream,” i.e. from PG to KMB, we see that their relationship is quantified in yet another conditional probability table.




16   As the discretization intervals were generated by the K-Means algorithm, the bins do not necessarily have the same
interval size, which we see in this example.






This can be interpreted in the same way: given that we observe a return of PG greater than 1.2%, there is a 42.4% probability that we would also observe a KMB return of higher than 1.2%. This kind of inference is perhaps the simplest type, as we can directly read the table, i.e. “given this, then that.”

Inference with Hard Evidence
Beyond reviewing the conditional probability tables directly in Modeling Mode in the Node Editor, as above, we can
carry out inference conveniently in the Validation Mode (shortcut F5) of BayesiaLab.




This allows setting evidence and observing inference directly via the Monitors in the Monitor Panel (right side of screenshot). We will now highlight JNJ and PG and focus on their Monitors only. Prior to setting any evidence, we will simply see their marginal distributions in the Monitors. As we would expect, we see the returns distributed around 0 and the expected value of the returns is 0.








Observing a specific state of a Node is equivalent to setting evidence and we can do that directly on the histograms inside the Monitors. For instance, we can double-click on the state JNJ > 0.012, which sets it to a 100% probability, as indicated by the green bar. Setting such evidence will automatically propagate this evidence throughout the network and we can immediately observe the new distribution of PG. The gray arrows indicate how the distributions have changed compared to before setting evidence.




So far, this provides no more insight than what we could read from the Conditional Probability Table in the Node Editor of the PG Node. What is not readily accessible from the CPT is the inverse probability, obtained by carrying out inference in the opposite direction of the Arc, i.e. setting evidence on PG and computing JNJ. Bayes’ Rule specifies the necessary computation in this case.17
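
In symbols, writing j and p for states of JNJ and PG, the inverse computation is simply Bayes’ Rule applied to the entries of the CPT and the marginal distribution of JNJ:

    P(JNJ = j | PG = p) = [P(PG = p | JNJ = j) × P(JNJ = j)] / [Σj' P(PG = p | JNJ = j') × P(JNJ = j')]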




17   See appendix for more details about Bayes’ Rule. Although this calculation is straightforward, application errors are unfortunately commonplace. The error is so common that it is now widely known as the Prosecutor’s Fallacy. In a recent white paper, Paradoxes and Fallacies, we dedicated a chapter to this problem: www.conradyscience.com/index.php/paradoxes





In BayesiaLab the inference computation of JNJ is automatic once we set evidence on PG. To illustrate this, we arbitrarily set the PG return to <= -1.3% and we can immediately see the updated distribution of JNJ.




So far, this could have been computed quite easily by directly applying Bayes’ Rule. It becomes a bit more challenging
when we look at more than two Nodes at the same time. This time we will examine JNJ, PG and KMB (their relevant
subnetwork is shown for reference below).
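
Under the hood, this chain propagation is plain probability algebra. A minimal numerical sketch with made-up two-state CPTs (placeholder numbers, not the five-state tables learned from the data):

    import numpy as np

    # Chain JNJ -> PG -> KMB with hypothetical 2-state CPTs.
    p_jnj = np.array([0.8, 0.2])              # P(JNJ)
    p_pg_jnj = np.array([[0.7, 0.3],          # P(PG | JNJ), one row per JNJ state
                         [0.5, 0.5]])
    p_kmb_pg = np.array([[0.8, 0.2],          # P(KMB | PG), one row per PG state
                         [0.6, 0.4]])

    # "Downstream" propagation of hard evidence JNJ = state 1:
    p_pg = p_pg_jnj[1]                        # P(PG | JNJ = 1)
    p_kmb = p_pg @ p_kmb_pg                   # P(KMB | JNJ = 1)

    # "Upstream" inference via the joint distribution and Bayes' Rule:
    joint = p_jnj[:, None, None] * p_pg_jnj[:, :, None] * p_kmb_pg[None, :, :]
    p_jnj_given_kmb1 = joint[:, :, 1].sum(axis=1)   # P(JNJ, KMB = 1)
    p_jnj_given_kmb1 /= p_jnj_given_kmb1.sum()      # normalize: P(JNJ | KMB = 1)
    print(p_kmb, p_jnj_given_kmb1)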




Once again, prior to setting any evidence, the Monitors show the marginal distributions of JNJ, PG and KMB.







Upon setting JNJ > 0.012, we can now see how the evidence not only propagates to PG, but also further “downstream”
to KMB:




We can also invert the chain of inference by simply setting evidence at the other end of the network, e.g. KMB > 0.012:







Or, we can set evidence on both ends, i.e. on JNJ and KMB, and then read the inference in the middle, for PG.




This inference will probably not surprise us: we now have an 80% probability that PG will have a return greater than
1.2%, given that we set both JNJ and KMB to >0.012.

Inference with Soft Evidence
We are not limited to only setting “hard evidence,” as we did above. In the real world, observations often provide “soft evidence” only. So, instead of setting any of these variables to a state with a 100% probability and thus making them “hard evidence,” we can use BayesiaLab to set any evidence according to its nature, even when it is uncertain.

For illustration purposes, we will now generate two kinds of “soft evidence,” one for JNJ and one for KMB.

1. We set the evidence directly by right-clicking on the JNJ Monitor and selecting Enter Probabilities:




  We can now adjust the histogram by dragging the bars to the desired probability levels which reflect our subjective belief.







  Clicking the light-green button confirms our choice of probabilities.




  In addition, we right-click on the Monitor again to Fix Probabilities, meaning that we want to hold these values regardless of any subsequent evidence we enter.




2. Assuming that we have a more general expectation regarding the KMB return, without having any beliefs regarding the probabilities of specific states, we can set the expected mean of the entire KMB distribution. For instance, we set the expected mean of the states of KMB to -1% by right-clicking the KMB Monitor and selecting Distribution for Target Value/Mean.







  We type in “-0.01” into the dialog box,




  which generates a new KMB distribution with the desired mean value of -0.01 or -1%.




  It is obvious that an infinite number of combinations could generate a mean value of -1%. However, as an aid to the analyst, BayesiaLab computes which distribution with a mean value of -1% would be “closest” to the a-priori distribution.
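
BayesiaLab’s notion of “closest” is not spelled out in this paper; one standard formalization is the distribution that minimizes the Kullback-Leibler divergence from the prior subject to the mean constraint, which leads to an exponential tilting of the prior. A sketch under that assumption, with hypothetical state values and priors:

    import numpy as np
    from scipy.optimize import brentq

    # Minimum cross-entropy update: p_i ~ q_i * exp(lam * x_i), where lam
    # is chosen so that the posterior mean equals the target m.
    def tilt(q, x, m):
        def mean_gap(lam):
            p = q * np.exp(lam * x)
            p /= p.sum()
            return p @ x - m
        lam = brentq(mean_gap, -500.0, 500.0)
        p = q * np.exp(lam * x)
        return p / p.sum()

    q = np.array([0.10, 0.22, 0.36, 0.22, 0.10])       # hypothetical prior
    x = np.array([-0.020, -0.008, 0.0, 0.008, 0.020])  # state central values
    print(tilt(q, x, -0.01))  # shifted distribution with mean -1%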

Not only are these observations “soft,” in this example they are also of opposite signs, i.e. JNJ has a positive mean return and KMB has a negative mean return.




As a result, carrying out inference generates a more uniform probability distribution for PG (rather than a narrower
distribution), effectively increasing our uncertainty about the state of PG compared to the marginal distribution. The
knowledge gain for the analyst is that greater volatility for PG must be expected.

We have limited our example to inference within a small subnetwork of only three Nodes, but we could have applied the same approach over the entire Bayesian network of 459 Nodes. With this, the analyst has complete freedom to set any number of pieces of evidence of all different kinds, both hard and soft, and to carry out inference “backwards” and “forwards” within the network. For users of the BayesiaLab software, the automatic computation of inference and the instant visual updating of the Monitors is comparable to recalculating all cells in a large spreadsheet.






Bayesian Network Metrics
As shown in these examples, the Arcs represent the probabilistic relationships between Nodes. In addition to visually interpreting the network structure, and beyond carrying out inference, we can also review the “summary statistics” of the network and its components with several metrics.

It is important to point out that we use the information theory-based concepts of Entropy, Arc Force and Mutual Information as central metrics in generating and analyzing Bayesian networks. This is a clear departure from commonly used metrics in traditional statistics, such as covariance and correlation. While these information theory-based metrics may appear novel to end-users of research, they have many advantages. Most importantly, we can entirely discard the (often incorrect) assumptions regarding linearity and normal distributions. As a result, highly nonlinear dynamics can be easily captured in a Bayesian network.

Arc Force
For instance, the importance of each Arc can be highlighted by displaying the associated Arc Force and its contribution
with respect to the overall network. From within the Validation Mode, the Arc Force can be displayed by selecting
Analysis>Graphic>Arc Force (or with the shortcut “F”).
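
As general background (the formula is not given in this paper), the Arc Force of an arc can be understood as the Kullback-Leibler divergence between the joint distribution represented by the network and the joint distribution represented by the same network with that arc removed:

    AF(X → Y) = Σz Pwith(z) × log2( Pwith(z) / Pwithout(z) )

where z ranges over the joint states of all Nodes.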







Mutual Information
A perhaps more accessible interpretation is possible by displaying the Mutual Information, which can be obtained by
selecting Analysis>Graphic>Arcs’ Mutual Information.18




The Mutual Information I(X,Y) measures how much (on average) the observation of random variable Y tells us about the uncertainty of X, i.e. by how much the entropy of X is reduced if we have information on Y. Mutual Information is a symmetric metric, which reflects the uncertainty reduction of X by knowing Y as well as of Y by knowing X.

In our example, knowing the value of PG on average reduces the uncertainty of the value of KMB by 0.2843 bits, which means that it reduces its uncertainty by 13.27% (shown in blue, in the direction of the arc). Conversely, knowing KMB reduces the uncertainty of PG by 13.09% (shown in red, in the opposite direction of the arc).
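
The computation itself is straightforward once a joint distribution is available. A toy sketch (placeholder 2x2 joint, not the learned PG/KMB tables):

    import numpy as np

    # I(X,Y) in bits, computed from a joint probability table.
    def mutual_information(joint):
        px = joint.sum(axis=1, keepdims=True)
        py = joint.sum(axis=0, keepdims=True)
        nz = joint > 0
        return (joint[nz] * np.log2(joint[nz] / (px @ py)[nz])).sum()

    joint = np.array([[0.35, 0.15],
                      [0.10, 0.40]])
    mi = mutual_information(joint)
    px = joint.sum(axis=1)
    h_x = -(px * np.log2(px)).sum()  # entropy H(X)
    print(mi, mi / h_x)              # absolute bits and relative reduction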




18   Although interpreting Mutual Information is somewhat more intuitive, in the case of a network tree, Mutual Information is identical to Arc Force. For Bayesian networks that are not trees, this distinction becomes very important.





Correlation
While we emphasize the importance of Arc Force and Mutual Information as measures capable of capturing nonlinear relationships, BayesiaLab can also display Pearson’s R for the network (select Analysis>Graphic>Pearson’s Correlation or shortcut “G”).




By displaying the Pearson correlation coefficient, we implicitly make the assumption of linear relationships between the connected Nodes, which may often not hold in practice. Special care must thus be taken when interpreting low values of R, as they may reflect nonlinearity rather than independence. On the other hand, R values close to 1 do indeed suggest the presence of a linear relationship. Furthermore, Pearson’s R can be very helpful for determining the sign of the relationship between variables. BayesiaLab will color-code positive and negative correlations by highlighting the associated Arcs in blue and red respectively. Finally, correlation is typically a much more accessible metric for audiences who are not acquainted with Mutual Information.


Summary - Unsupervised Learning
In summary, Unsupervised Learning is an excellent approach to obtain a general understanding of the simultaneous relationships between many variables in a dataset. The learned Bayesian network allows immediate visual interpretation and immediate computation of omnidirectional inference based on any type of evidence, including uncertain and conflicting observations. Given these properties, Unsupervised Learning with Bayesian networks becomes a universal and robust tool for knowledge discovery and modeling in unknown problem domains.







Supervised Learning
Upon gaining a general understanding of a domain, questions typically arise regarding individual variables and how to predict them specifically. Even though we can use Unsupervised Learning to discover a network structure and use it for prediction, Supervised Learning is often a more appropriate method when studying a specific target variable. By focusing on a single target variable, BayesiaLab’s learning algorithms fit a (generative) model to that target rather than a model that balances the fit across all variables.

To remain consistent with the example we started earlier, we will once again use PG for illustration purposes. More specifically, we will characterize PG as the Target Node. We can do so by right-clicking on the node and then selecting Set as Target Node from the contextual menu (or by double-clicking the Node while holding “T”).




Now that we have defined a Target Node, we can perform a range of Supervised Learning algorithms implemented in BayesiaLab.19

The Markov Blanket20 algorithm is suitable for this kind of application and its speed is particularly helpful when dealing with hundreds or even thousands of variables. Furthermore, BayesiaLab offers the Augmented Markov Blanket, which starts with the Markov Blanket structure and then uses an unsupervised search to find the probabilistic relations that hold between the variables belonging to the Markov Blanket.21 This unsupervised search requires additional computation time but generally results in an improved predictive performance of the model.

The learning process can be started by selecting Learning>Target Node Characterization>Augmented Markov Blanket
from the menu.22



19   For expositional clarity we will only learn contemporaneous relationships and, as a result, potential lag structures will not appear in the resulting networks. However, in BayesiaLab, Supervised Learning can be generalized to a temporal application.
20   See appendix for a definition of the Markov Blanket.
21   Intuitively, the “augmented” part of the network plays the same role as the interaction terms between independent variables in a regression.
22   In BayesiaLab nomenclature, Supervised Learning is listed in the Learning menu as “Target Node Characterization.”






As we still have our previous network that was generated through Unsupervised Learning, we need to confirm the deletion of that original network before proceeding with Supervised Learning.




After a few seconds, we will see the result of the Supervised Learning process. Our Target Node PG is now connected to all variables in its Markov Blanket. This means that, given knowledge of the Nodes in the Markov Blanket, PG is independent of the remaining network. This effectively identifies the subset of variables which are most important for predicting the value of the Target Node, PG.




As stated in the introduction, it is not our intention to forecast stock prices per se, but rather to identify meaningful and relevant structures in the market. The Augmented Markov Blanket is one such structure: a stock market analyst can use it to identify a relevant subset of stocks for in-depth analysis, perhaps with the objective of establishing a buy/sell recommendation or of trading directly on such knowledge.

Once we have this network, we can use it to analyze these Nodes’ relationships in a number of ways within BayesiaLab.
For instance, we can select Analysis>Graphic>Target Mean Analysis, which graphs PG as a function of the other Nodes
in the network.






Alternatively, by selecting Analysis>Report>Target Analysis>Correlation with the Target Node,




we obtain a table displaying the Mutual Information between the Nodes in the network and the Target Variable, PG:








By clicking Quadrants, these values can be displayed as a graph:




Inference with Supervised Learning
To illustrate potential applications of Supervised Learning, beyond interpretation, we have created a simple simulation
of possible stock market conditions. Despite the hypothetical nature of these scenarios, the underlying Bayesian network
was learned from actual market data (as is the case for this entire white paper) and, as a result, the computed inference
based on these assumed conditions is “real.”

One could imagine this purely hypothetical scenario: Colgate-Palmolive and Johnson & Johnson are involved in a patent lawsuit and an investment analyst speculates about the impact of the imminent verdict in this court case. It is fairly easy to imagine that a verdict in favor of Johnson & Johnson would result in a boost to its stock price and simultaneously cause a sharp drop for Colgate-Palmolive’s stock. Conversely, a win for Colgate-Palmolive would result in just the opposite. However, our question is how either outcome would affect Procter & Gamble’s return, PG. We can best answer this question by simulating either outcome within the Bayesian network we learned.

Prior to setting any evidence, the marginal distributions of returns are as follows, i.e., this is what we would expect on any given day without any other knowledge:




If we were now to believe in a Johnson & Johnson win in combination with a Colgate-Palmolive loss, and the corresponding stock price movements for both of them, we could create the following scenario:




The gray arrows now highlight the impact on all other stocks in this model, including our target variable, PG. The model suggests that the new distribution for PG would be distinctly bimodal, as opposed to the approximately normal marginal distribution.
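
To make the mechanics of such a computation tangible, here is a minimal sketch of evidence propagation by exhaustive enumeration in a hypothetical two-state chain JNJ -> PG -> KMB. The structure and all CPT values are invented for illustration; they are not the CPTs learned from the dataset, and BayesiaLab uses far more efficient inference algorithms than brute-force enumeration.

```python
import numpy as np

states = ["down", "up"]

# Invented parameters for a tiny chain JNJ -> PG -> KMB.
p_jnj = np.array([0.5, 0.5])                 # P(JNJ)
p_pg_given_jnj = np.array([[0.7, 0.3],       # P(PG | JNJ=down)
                           [0.3, 0.7]])      # P(PG | JNJ=up)
p_kmb_given_pg = np.array([[0.6, 0.4],       # P(KMB | PG=down)
                           [0.35, 0.65]])    # P(KMB | PG=up)

# Full joint distribution P(JNJ, PG, KMB) from the chain factorization.
joint = (p_jnj[:, None, None]
         * p_pg_given_jnj[:, :, None]
         * p_kmb_given_pg[None, :, :])

def posterior_pg(jnj=None, kmb=None):
    """P(PG | evidence), with hard evidence on either end of the chain."""
    j = joint
    if jnj is not None:
        mask = np.zeros(2)
        mask[states.index(jnj)] = 1.0
        j = j * mask[:, None, None]
    if kmb is not None:
        mask = np.zeros(2)
        mask[states.index(kmb)] = 1.0
        j = j * mask[None, None, :]
    p = j.sum(axis=(0, 2))
    return p / p.sum()

print(posterior_pg(jnj="up"))             # forward inference
print(posterior_pg(kmb="up"))             # backward inference
print(posterior_pg(jnj="up", kmb="up"))   # evidence on both ends
```

Because the full joint distribution is materialized, the same function answers forward, backward, and two-sided queries, which is precisely the omnidirectional inference described earlier.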







Now considering the opposite verdict, i.e. a Colgate-Palmolive win and a Johnson & Johnson defeat, we can once again
assume their resulting stock price movements and then infer the impact on PG.




This time, a gain for PG would be much more probable.

So, if an analyst had a deep understanding of the subject matter (or insider knowledge23) and hence could anticipate the patent trial’s outcome, he should, everything else being equal, update his beliefs regarding the Procter & Gamble stock return according to the computed inference of our model.

It is important to stress that this does not mean we have discovered a causal pathway, but rather that we are taking advantage of historically observed associations between returns, from which the model in the form of a Bayesian network was learned. The Bayesian network simply allows us to exploit that learned knowledge consistently.


Adaptive Questionnaire
The Bayesian network from above can perhaps also serve to illustrate how evidence-gathering can be optimized in BayesiaLab. Once again, this is purely hypothetical, but let’s assume that a stock trader seeks to predict tomorrow’s return of PG. Tomorrow, as it turns out, earnings will also be released for numerous other stocks in the CPG industry, excluding PG. With limited time, our stock trader needs to concentrate his research resources on those stocks that will be most informative of the PG return. BayesiaLab has a convenient function, the Adaptive Questionnaire, which allows the analyst to adapt his evidence-seeking process in light of the most recent information obtained, given the previously learned Bayesian network (shown again below for reference). The ranking idea behind this function is sketched in code after this paragraph.
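
Conceptually, the Adaptive Questionnaire orders the candidate Nodes by how much information they carry about the Target, re-ranking after each new piece of evidence. The toy sketch below reproduces only the initial ranking step, using Mutual Information on synthetic discretized data; the variable names reuse the paper’s tickers, but the samples and their relationships are entirely invented.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Hypothetical discretized samples (5 states each) for the target PG and
# three candidate evidence nodes with deliberately different link strengths.
pg = rng.integers(0, 5, n)
cl = (pg + rng.integers(-1, 2, n)) % 5    # strongly related to PG
kmb = (pg + rng.integers(-1, 3, n)) % 5   # more weakly related to PG
jnj = rng.integers(0, 5, n)               # unrelated to PG

def mutual_information(a, b, k=5):
    joint = np.zeros((k, k))
    np.add.at(joint, (a, b), 1)           # joint frequency counts
    joint /= joint.sum()
    pa, pb = joint.sum(1), joint.sum(0)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / np.outer(pa, pb)[nz])).sum())

candidates = {"CL": cl, "KMB": kmb, "JNJ": jnj}
ranking = sorted(candidates,
                 key=lambda v: mutual_information(pg, candidates[v]),
                 reverse=True)
print("Ask about, in order:", ranking)    # expected: CL first, JNJ last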




23   It should be noted that insider trading can refer to both legal and illegal conduct. See
http://www.sec.gov/answers/insider.htm






The function can be called by selecting Inference>Adaptive Questionnaire. The following pop-up window then prompts us to select and confirm the Target.




Initially, the analyst’s research should begin with CL as the most informative Node, which is listed at the top of all
Monitors, right below the Target, PG.







Let’s now assume he receives a tip suggesting that CL earnings are coming in much higher than expected. He translates this updated, subjective belief into “soft” evidence and thus sets P(CL>0.017)=60%, P(CL<=0.017)=30%, and P(CL<=0.05)=10%, with the remaining states set to zero.

Upon entering this probability distribution, the Adaptive Questionnaire will move CL to the bottom (green bars with
gray background) and scroll up the next most important Node to study, in this case KMB.




Upon setting this evidence, the probabilities need to be fixed by right-clicking the Monitor and selecting Fix Probabilities.




This is important because other simultaneous beliefs have yet to be set. Without fixing the probabilities of CL, subsequent evidence could inadvertently update the probabilities that were just defined.

Next, the analyst may obtain inconclusive views from his sources on KMB and thus cannot set any new evidence on this particular Node, although it would be the most informative evidence at this point. Rather, he moves on to CLX, which is widely believed to meet the expected earnings without any surprises. As a result, our analyst sets hard negative evidence on either end of the return distribution, meaning that he anticipates no major swings either way: P(CLX<=-0.11)=0 and P(CLX>0.13)=0. Upon setting this evidence, and once again fixing it, the Adaptive Questionnaire presents a new order of Nodes. Interestingly, given the evidence set on CLX, KMB has declined in importance with respect to PG.
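
For intuition, hard negative evidence on a discrete Node amounts to zeroing out the excluded states and renormalizing the rest, as in this minimal sketch. The five-bin distribution is invented; in BayesiaLab, the renormalized finding is additionally propagated through the entire network.

```python
import numpy as np

# Made-up five-bin marginal return distribution for CLX.
p_clx = np.array([0.12, 0.22, 0.33, 0.21, 0.12])

p_clx[[0, 4]] = 0.0     # rule out the extreme-return bins
p_clx /= p_clx.sum()    # renormalize; mass shifts to the middle bins
print(p_clx.round(3))   # [0.    0.289 0.434 0.276 0.   ]
```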




In the new order, JNJ is next, and our analyst determines that the stock will definitely gain, based on insider rumors he has heard. He translates this insight into a certain JNJ return greater than 1.2% and sets it as “hard” evidence accordingly.




Given all the evidence he has gathered, although some of it may be vague, the analyst concludes that there is now a 90% probability of a PG return greater than 0.3%. Perhaps more importantly, the chance of a decline of -1.3% or below has diminished to virtually zero. This translates into an expected mean return of 1.5%, versus the a priori expectation of 0%.
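
The expected mean shown in a Monitor is simply each bin’s posterior probability weighted by a representative value for that bin. A minimal sketch with invented numbers (neither the bin values nor the posterior are those of the actual network):

```python
import numpy as np

bin_values = np.array([-0.020, -0.007, 0.000, 0.007, 0.020])  # bin midpoints
posterior = np.array([0.00, 0.02, 0.08, 0.40, 0.50])          # after evidence

expected_return = float(posterior @ bin_values)
print(f"E[PG return | evidence] = {expected_return:.4f}")
```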

With the Bayesian network generated through Supervised Learning and the subsequent application of the Adaptive Questionnaire, the analyst has optimized his information-seeking process, spending a minimum of resources for a maximum reduction of uncertainty regarding the variable of interest.






Summary - Supervised Learning
In many ways, Supervised Learning with BayesiaLab resembles traditional modeling and can thus be benchmarked against a wide range of statistical techniques. In addition to its predictive performance, BayesiaLab offers an array of analysis tools, which can provide the analyst with a deeper understanding of the domain’s underlying dynamics. The Bayesian network also provides the basis for a wide range of scenario simulation and optimization algorithms implemented in BayesiaLab. Beyond mere one-time predictions, BayesiaLab allows dealing with evidence interactively and incrementally, which makes it a highly adaptive tool for real-time inference.







Appendix

Appendix

Markov Blanket
In many cases, the Markov Blanket algorithm is a good starting point for any predictive model, whether used for scoring or classification. This algorithm is extremely fast and can even be applied to databases with thousands of variables and millions of records.

The Markov Blanket for a node A is the set of nodes composed of A’s parents, its children, and its children’s other parents (i.e., spouses).




The Markov Blanket of node A contains all the variables that, if we know their states, shield node A from the rest of the network. This means that the Markov Blanket of a node is the only knowledge needed to predict the behavior of that node. Learning a Markov Blanket selects relevant predictor variables, which is particularly helpful when there is a large number of variables in the database. In fact, this can also serve as a highly efficient variable selection method in preparation for other types of modeling outside the Bayesian network framework.
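
Given a DAG, the blanket can be read off directly from the definition above. Here is a minimal sketch, assuming the graph is represented as a mapping from each node to the list of its parents; the tiny example graph is invented.

```python
def markov_blanket(dag: dict[str, list[str]], node: str) -> set[str]:
    """Parents, children, and children's other parents (spouses) of node."""
    parents = set(dag.get(node, []))
    children = {c for c, ps in dag.items() if node in ps}
    spouses = {p for c in children for p in dag[c]}
    return (parents | children | spouses) - {node}

# Example: A has parent P and child C; C has another parent S (A's spouse);
# X lies outside A's blanket.
dag = {"A": ["P"], "C": ["A", "S"], "P": [], "S": [], "X": ["P"]}
print(markov_blanket(dag, "A"))   # {'P', 'C', 'S'}; X is excluded
```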


Bayes’ Theorem
Bayes’ theorem relates the conditional and marginal probabilities of discrete events A and B, provided that the probability of B does not equal zero:


P(A|B) = P(B|A) P(A) / P(B)

In Bayes’ theorem, each probability has a conventional name:

• P(A) is the prior probability (or “unconditional” or “marginal” probability) of A. It is “prior” in the sense that it does not take into account any information about B. The unconditional probability P(A) was called “a priori” by Ronald A. Fisher.

• P(A|B) is the conditional probability of A, given B. It is also called the posterior probability because it is derived from, or depends upon, the specified value of B.






• P(B|A) is the conditional probability of B given A. It is also called the likelihood.

• P(B) is the prior or marginal probability of B.

Bayes’ theorem in this form gives a mathematical representation of how the conditional probability of event A given B is related to the converse conditional probability of B given A.
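
A worked numeric instance makes the mechanics concrete; the prior and likelihood values below are invented purely to exercise the formula.

```python
def bayes(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """P(A|B) = P(B|A) * P(A) / P(B), assuming P(B) > 0."""
    return p_b_given_a * p_a / p_b

# Hypothetical: P(B|A)=0.6, P(A)=0.3, P(B)=0.4  ->  P(A|B)=0.45
print(bayes(0.6, 0.3, 0.4))
```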



About the Authors
Stefan Conrady
Stefan Conrady is the cofounder and managing partner of Conrady Applied Science, LLC, a privately held consulting firm specializing in knowledge discovery and probabilistic reasoning with Bayesian networks. In 2010, Conrady Applied Science was appointed the authorized sales and consulting partner of Bayesia S.A.S. for North America.

Stefan Conrady studied Electrical Engineering and has extensive management experience in the fields of product planning, marketing, and analytics, working at Daimler and BMW Group in Europe, North America, and Asia. Prior to establishing his own firm, he headed the Analytics & Forecasting group at Nissan North America.

Lionel Jouffe
Dr. Lionel Jouffe is cofounder and CEO of France-based Bayesia S.A.S. Lionel Jouffe holds a Ph.D. in Computer Science and has been working in the field of Artificial Intelligence since the early 1990s. He and his team have been developing BayesiaLab since 1999 and it has emerged as the leading software package for knowledge discovery, data mining, and knowledge modeling using Bayesian networks. BayesiaLab enjoys broad acceptance in academic communities as well as in business and industry. The relevance of Bayesian networks, especially in the context of consumer research, is highlighted by Bayesia’s strategic partnership with Procter & Gamble, which has deployed BayesiaLab globally since 2007.







Contact Information

Conrady Applied Science, LLC
312 Hamlet’s End Way
Franklin, TN 37067
USA
+1 888-386-8383
info@conradyscience.com
www.conradyscience.com

Bayesia S.A.S.
6, rue Léonard de Vinci
BP 119
53001 Laval Cedex
France
+33(0)2 43 49 75 69
info@bayesia.com
www.bayesia.com



Copyright
© 2011 Conrady Applied Science, LLC and Bayesia S.A.S. All rights reserved.

Any redistribution or reproduction of part or all of the contents in any form is prohibited other than the following:

• You may print or download this document for your personal and noncommercial use only.

• You may copy the content to individual third parties for their personal use, but only if you acknowledge Conrady Applied Science, LLC and Bayesia S.A.S. as the source of the material.

• You may not, except with our express written permission, distribute or commercially exploit the content. Nor may
  you transmit it or store it in any other website or other form of electronic retrieval system.





Weitere ähnliche Inhalte

Andere mochten auch

Energy Sense Benelux Introductie
Energy Sense Benelux IntroductieEnergy Sense Benelux Introductie
Energy Sense Benelux Introductie
LeonCoolen
 
Medicina En La Era De Facebook.Ppt [Recuperado]
Medicina En La Era De Facebook.Ppt [Recuperado]Medicina En La Era De Facebook.Ppt [Recuperado]
Medicina En La Era De Facebook.Ppt [Recuperado]
Josea Perez
 
Aoe 2009 Pp 9.30.09
Aoe 2009 Pp 9.30.09Aoe 2009 Pp 9.30.09
Aoe 2009 Pp 9.30.09
guestb55a12
 
Panpattana
PanpattanaPanpattana
Panpattana
sakeenan
 
Fy2006 Mfc Construction
Fy2006 Mfc ConstructionFy2006 Mfc Construction
Fy2006 Mfc Construction
Paul Melton
 

Andere mochten auch (18)

NEDMAInno14: Targeting Audiences with Direct Response Campaigns on Mobile - T...
NEDMAInno14: Targeting Audiences with Direct Response Campaigns on Mobile - T...NEDMAInno14: Targeting Audiences with Direct Response Campaigns on Mobile - T...
NEDMAInno14: Targeting Audiences with Direct Response Campaigns on Mobile - T...
 
Energy Sense Benelux Introductie
Energy Sense Benelux IntroductieEnergy Sense Benelux Introductie
Energy Sense Benelux Introductie
 
Medicina En La Era De Facebook.Ppt [Recuperado]
Medicina En La Era De Facebook.Ppt [Recuperado]Medicina En La Era De Facebook.Ppt [Recuperado]
Medicina En La Era De Facebook.Ppt [Recuperado]
 
How the West was One Gold Rush Survival Kit
How the West was One Gold Rush Survival KitHow the West was One Gold Rush Survival Kit
How the West was One Gold Rush Survival Kit
 
Tutorial Search With Custom Column Slide Share
Tutorial Search With Custom Column Slide ShareTutorial Search With Custom Column Slide Share
Tutorial Search With Custom Column Slide Share
 
Agenda grupo(1) 2013-2014
Agenda grupo(1) 2013-2014Agenda grupo(1) 2013-2014
Agenda grupo(1) 2013-2014
 
Tilting at Windmills with ctypes and cygwinreg
Tilting at Windmills with ctypes and cygwinregTilting at Windmills with ctypes and cygwinreg
Tilting at Windmills with ctypes and cygwinreg
 
Aoe 2009 Pp 9.30.09
Aoe 2009 Pp 9.30.09Aoe 2009 Pp 9.30.09
Aoe 2009 Pp 9.30.09
 
Css3 fontface
Css3 fontfaceCss3 fontface
Css3 fontface
 
Blank Canvas
Blank CanvasBlank Canvas
Blank Canvas
 
Solutions.hw1
Solutions.hw1Solutions.hw1
Solutions.hw1
 
Panpattana
PanpattanaPanpattana
Panpattana
 
Customer Service by Jamie Haenggi
Customer Service by Jamie HaenggiCustomer Service by Jamie Haenggi
Customer Service by Jamie Haenggi
 
NEDMA14: 10 Types of Visuals to Boost Your Social Media Engagement - Bob Car...
NEDMA14: 10 Types of Visuals to Boost Your Social Media Engagement  - Bob Car...NEDMA14: 10 Types of Visuals to Boost Your Social Media Engagement  - Bob Car...
NEDMA14: 10 Types of Visuals to Boost Your Social Media Engagement - Bob Car...
 
Automotive industry sario
Automotive industry sarioAutomotive industry sario
Automotive industry sario
 
NEDMA14: Targeting Audiences with Direct Response Campaigns on Mobile - Ted M...
NEDMA14: Targeting Audiences with Direct Response Campaigns on Mobile - Ted M...NEDMA14: Targeting Audiences with Direct Response Campaigns on Mobile - Ted M...
NEDMA14: Targeting Audiences with Direct Response Campaigns on Mobile - Ted M...
 
Favorite Apps and Business Tools
Favorite Apps and Business ToolsFavorite Apps and Business Tools
Favorite Apps and Business Tools
 
Fy2006 Mfc Construction
Fy2006 Mfc ConstructionFy2006 Mfc Construction
Fy2006 Mfc Construction
 

Ähnlich wie Knowledge Discovery in Stock Market

Visualization library and tools
Visualization library and toolsVisualization library and tools
Visualization library and tools
seung hyun Seo
 
Enabling Collaborative Research Data Management with SQLShare
Enabling Collaborative Research Data Management with SQLShareEnabling Collaborative Research Data Management with SQLShare
Enabling Collaborative Research Data Management with SQLShare
University of Washington
 

Ähnlich wie Knowledge Discovery in Stock Market (12)

Probabilistic Latent Factor Induction and
 Statistical Factor Analysis
Probabilistic Latent Factor Induction and
 Statistical Factor AnalysisProbabilistic Latent Factor Induction and
 Statistical Factor Analysis
Probabilistic Latent Factor Induction and
 Statistical Factor Analysis
 
Bayesia Lab Choice Modeling 1
Bayesia Lab Choice Modeling 1Bayesia Lab Choice Modeling 1
Bayesia Lab Choice Modeling 1
 
The Bayesia Portfolio of Research Software
The Bayesia Portfolio of Research SoftwareThe Bayesia Portfolio of Research Software
The Bayesia Portfolio of Research Software
 
Visualization library and tools
Visualization library and toolsVisualization library and tools
Visualization library and tools
 
BayesiaLab 5.0 Introduction
BayesiaLab 5.0 IntroductionBayesiaLab 5.0 Introduction
BayesiaLab 5.0 Introduction
 
Adapting Alax Solr to Compare different sets of documents - Joan Codina
Adapting Alax Solr to Compare different sets of documents - Joan CodinaAdapting Alax Solr to Compare different sets of documents - Joan Codina
Adapting Alax Solr to Compare different sets of documents - Joan Codina
 
Enabling Collaborative Research Data Management with SQLShare
Enabling Collaborative Research Data Management with SQLShareEnabling Collaborative Research Data Management with SQLShare
Enabling Collaborative Research Data Management with SQLShare
 
Putting Together the Pieces - A Guide to S&OP Technology Selection- 20 AUGUST...
Putting Together the Pieces - A Guide to S&OP Technology Selection- 20 AUGUST...Putting Together the Pieces - A Guide to S&OP Technology Selection- 20 AUGUST...
Putting Together the Pieces - A Guide to S&OP Technology Selection- 20 AUGUST...
 
smartAPIs: EUDAT Semantic Working Group Presentation @ RDA 9th Plenary
smartAPIs:  EUDAT Semantic Working Group Presentation @ RDA 9th PlenarysmartAPIs:  EUDAT Semantic Working Group Presentation @ RDA 9th Plenary
smartAPIs: EUDAT Semantic Working Group Presentation @ RDA 9th Plenary
 
Open sourcebi
Open sourcebiOpen sourcebi
Open sourcebi
 
Microarray Analysis with BayesiaLab
Microarray Analysis with BayesiaLabMicroarray Analysis with BayesiaLab
Microarray Analysis with BayesiaLab
 
Information Architecture 3.0 (Second Life)
Information Architecture 3.0 (Second Life)Information Architecture 3.0 (Second Life)
Information Architecture 3.0 (Second Life)
 

Knowledge Discovery in Stock Market

  • 1. Knowledge Discovery in the Stock Market Supervised and Unsupervised Learning with BayesiaLab Stefan Conrady, stefan.conrady@conradyscience.com Dr. Lionel Jouffe, jouffe@bayesia.com June 29, 2011 Conrady Applied Science, LLC - Bayesia’s North American Partner for Sales and Consulting
  • 2. Knowledge Discovery in the Stock Market with Bayesian Networks Table of Contents Tutorial Highlights 1 Background & Objective 1 Notation 2 Dataset 3 Data Preparation and Transformation 4 Data Import 5 Determining Discretization Intervals 6 Modeling Mode 8 Unsupervised Learning 12 Bayesian Network versus Correlation Matrix 16 Inference with Bayesian Networks 16 Inference with Hard Evidence 18 Inference with Soft Evidence 22 Bayesian Network Metrics 25 Arc Force 25 Mutual Information 26 Correlation 27 Summary - Unsupervised Learning 27 Supervised Learning 29 Inference with Supervised Learning 32 Adaptive Questionnaire 34 Summary - Supervised Learning 38 Appendix Appendix 39 Markov Blanket 39 Bayes’ Theorem 39 About the Authors 40 www.conradyscience.com | www.bayesia.com ii
  • 3. Knowledge Discovery in the Stock Market with Bayesian Networks Stefan Conrady 40 Lionel Jouffe 40 Contact Information 41 Conrady Applied Science, LLC 41 Bayesia S.A.S. 41 Copyright 41 www.conradyscience.com | www.bayesia.com iii
  • 4. Knowledge Discovery in the Stock Market with Bayesian Networks Tutorial Highlights • Unsupervised Learning with BayesiaLab can rapidly generate plausible structures of unfamiliar problem domains, as illustrated in this paper with examples from the U.S. stock market. • Supervised Learning with BayesiaLab delivers reliable models in high-dimensional domains, providing both powerful predictive performance plus a platform for simulating domain dynamics. • Knowledge representation with Bayesian networks is highly intuitive and effectively provides computable knowledge that allows inference and reasoning under uncertainty. Background & Objective Perhaps more than any other kind of time series data, nancial markets have been scrutinized by countless mathemati- cians, economists, investors and speculators over hundreds of years. Even in modern times, despite all scienti c ad- vances, the effort of predicting future movements of the stock market sometimes still bears resemblance to the ancient alchemistic aspirations of turning base metals into gold. That is not to say that there is no genuine scienti c effort in studying nancial markets, but distinguishing serious research from charlatanism (or even fraud) remains remarkably dif cult. We neither aspire to develop a crystal ball for investors nor do we expect to contribute to the economic and economet- ric literature. However, we nd the wealth of data in the nancial markets to be fertile ground for experimenting with knowledge discovery algorithms and for generating knowledge representations in the form of Bayesian networks. This area can perhaps serve as a very practical proof of the powerful properties of Bayesian networks, as we can quickly compare machine-learned ndings with our own understanding of market dynamics. For instance, the prevailing opin- ions among investors regarding the relationships between major stocks should be re ected in any structure that is to be discovered by our algorithms. More speci cally, we will utilize the unsupervised and supervised learning algorithms of the BayesiaLab software pack- age to automatically generate Bayesian networks from daily stock returns over a six-year period. We will examine 459 stocks from the S&P 500 index, for which observations are available over the entire timeframe. We selected the S&P 500 as the basis for our study, as the companies listed on this index are presumably among the best-known corporations worldwide, so even a casual observer should be able to critically review the machine-learned ndings. In other words, we are trying to machine-learn the obvious, as any mistakes in this process would automatically become self-evident. Quite often experts’ reaction to such machine-learned ndings is, “well, we already knew that.” That is the very point we want to make, as machine-learning can — within seconds — catch up with human expertise accumulated over years, and then rapidly expand beyond what is already known. The power of such algorithmic learning will be still more apparent in entirely unknown domains. However, if we were to machine-learn the structure of a foreign equity market for expository purposes in this paper, chances are that many readers would not immediately be able to judge the resulting structure as plausible or not. www.conradyscience.com | www.bayesia.com 1
  • 5. Knowledge Discovery in the Stock Market with Bayesian Networks In addition to generating human-readable and interpretable structures, we want to illustrate how we can immediately use machine-learned Bayesian networks as “computable knowledge” for automated inference and prediction. Our ob- jective is to gain both a qualitative and quantitative understanding of the stock market by using Bayesian networks. In the quantitative context, we will also show how BayesiaLab can reliably carry out inference with multiple pieces of un- certain and even con icting evidence. The inherent ability of Bayesian networks to perform computations under uncer- tainty makes them highly suitable for a wide range of real-world applications. Continuing the practice established in our previous white papers, we attempt to present the proposed approach in the style of a tutorial, so that each step can be immediately replicated (and scrutinized) by any reader equipped with the BayesiaLab software.1 This re ects our desire to establish a high degree of transparency regarding all proposed methods and to minimize the risk of Bayesian networks being perceived as a black-box technology. Notation To clearly distinguish between natural language, software-speci c functions and example-speci c variable names, the following notation is used: • Bayesian network and BayesiaLab-speci c functions, keywords, commands, etc., are capitalized and shown in bold type. • Names of attributes, variables, nodes and are italicized. 1 The preprocessed dataset with daily return data is available for download from our website: www.conradyscience.com/white_papers/ nancial/SP500_v6_dlog_b.csv www.conradyscience.com | www.bayesia.com 2
  • 6. Knowledge Discovery in the Stock Market with Bayesian Networks Dataset The S&P 500 is a free- oat capitalization-weighted index of the prices of 500 large-cap common stocks actively traded in the United States, which has been published since 1957. The stocks included in the S&P 500 are those of large pub- licly held companies that trade on either of the two largest American stock market exchanges; the New York Stock Ex- change and the NASDAQ. For our case study we have tracked the daily closing prices of all stocks included in the S&P 500 index from January 3, 2005 through December 30, 2010, only excluding those stocks which were not traded con- tinuously over the entire study period. This leaves a total of 459 stock prices with 1,510 observations each. 60 40 A AA 300 AAPL ABC ABT 60 ACE 40 30 20 20 40 20 100 40 0 400 800 1200 0 400 800 1200 0 400 800 1200 0 400 800 1200 0 400 800 1200 0 400 800 1200 40 60 ADBE ADI ADM ADP ADSK 40 AEE 40 40 45 20 35 20 20 20 20 0 400 800 1200 0 400 800 1200 0 400 800 1200 0 400 800 1200 0 400 800 1200 0 400 800 1200 60 80 AEP AES AET 60 AFL AGN AIG 40 20 1000 30 10 40 500 20 20 0 400 800 1200 0 400 800 1200 0 400 800 1200 0 400 800 1200 0 400 800 1200 0 400 800 1200 75 60 75 60 40 AIV AIZ AKAM AKS ALL ALTR 30 10 25 20 25 20 20 0 400 800 1200 0 400 800 1200 0 400 800 1200 0 400 800 1200 0 400 800 1200 0 400 800 1200 30 20 AMAT AMD 80 AMGN AMT AMZN AN 40 50 150 20 30 10 10 40 50 0 400 800 1200 0 400 800 1200 0 400 800 1200 0 400 800 1200 0 400 800 1200 0 400 800 1200 150 80 100 75 ANF AON APA APC APD APH 40 50 40 30 25 20 50 50 0 400 800 1200 0 400 800 1200 0 400 800 1200 0 400 800 1200 0 400 800 1200 0 400 800 1200 www.conradyscience.com | www.bayesia.com 3
  • 7. Knowledge Discovery in the Stock Market with Bayesian Networks Data Preparation and Transformation Rather than treating the time series in levels, we will difference the stock prices and compute the daily returns. More speci cally, we will take differences of the logarithms of the levels, which is a good approximation of the daily stock return in percentage terms. After this transformation, 1,509 observations remain and a selection of the rst 36 stocks (in alphabetical order) is shown below. A AA 0.1 AAPL 0.1 ABC 0.05 ABT ACE 0.1 0.1 0.1 0.0 -0.05 -0.1 -0.1 -0.1 -0.1 0 400 800 1200 0 400 800 1200 0 400 800 1200 0 400 800 1200 0 400 800 1200 0 400 800 1200 0.1 0.1 ADBE ADI 0.1 ADM 0.05 ADP ADSK AEE 0.1 0.1 -0.1 -0.1 -0.1 -0.1 -0.05 -0.1 0 400 800 1200 0 400 800 1200 0 400 800 1200 0 400 800 1200 0 400 800 1200 0 400 800 1200 0.2 AEP AES AET 0.25 AFL AGN 0.5 AIG 0.05 0.2 0.1 0.0 -0.1 0.0 -0.05 -0.25 -0.5 0 400 800 1200 0 400 800 1200 0 400 800 1200 0 400 800 1200 0 400 800 1200 0 400 800 1200 0.2 0.2 AIV AIZ 0.2 AKAM AKS ALL 0.1 ALTR 0.2 0.1 0.0 -0.1 -0.2 -0.1 -0.2 -0.2 0 400 800 1200 0 400 800 1200 0 400 800 1200 0 400 800 1200 0 400 800 1200 0 400 800 1200 AMAT 0.2 AMD AMGN AMT 0.2 AMZN AN 0.1 0.1 0.1 0.1 0.0 0.0 0.0 -0.2 -0.2 -0.1 0 400 800 1200 0 400 800 1200 0 400 800 1200 0 400 800 1200 0 400 800 1200 0 400 800 1200 0.1 ANF AON APA 0.1 APC 0.05 APD APH 0.1 0.1 0.1 0.0 0.0 -0.1 -0.1 -0.05 -0.1 0 400 800 1200 0 400 800 1200 0 400 800 1200 0 400 800 1200 0 400 800 1200 0 400 800 1200 www.conradyscience.com | www.bayesia.com 4
  • 8. Knowledge Discovery in the Stock Market with Bayesian Networks Data Import We use BayesiaLab’s Data Import Wizard to load all 459 time series2 into memory from a comma-separated le. BayesiaLab automatically detects the column headers, which contain the ticker symbols3 as variable names. The next step identi es the data types contained in the dataset and, as expected, BayesiaLab nds 459 continuous vari- ables. 2 Although the dataset has a temporal ordering, for expository simplicity we will treat each time interval as an inde- pendent observation. 3 A ticker symbol is a short abbreviation used to uniquely identify publicly traded stocks. www.conradyscience.com | www.bayesia.com 5
  • 9. Knowledge Discovery in the Stock Market with Bayesian Networks There are no missing values in the dataset and we do not want to lter out any observations, so the next screen of the Data Import Wizard can be skipped entirely. The next step, however, is critical. As part of every data import process into BayesiaLab we must discretize any con- tinuous variables, which means all 459 variables in our particular case. BayesiaLab offers a number of algorithms to automatically discretize the continuous variables and one of the most prac- tical ones, for subsequent Unsupervised Learning, is the K-Means algorithm. It provides a very quick way to capture the salient characteristics of probability density curves and creates suitable thresholds for binning purposes. Determining Discretization Intervals Analyst judgement is required though for choosing an appropriate number of intervals. A common heuristic found in the statistical literature is ve observations per parameter. We adapt this as a guide for the minimum number of obser- vations required for each cell in any of the yet-to-be-learned Conditional Probability Tables (CPT). In our particular case we already know that we will initially perform Unsupervised Learning with the Maximum Weight Spanning Tree algorithm. This tree structure implies that each Node will have only have one parent, which, in turn, means that each CPT will have the size determined by number of parent states times the number of child states. Choos- ing ve intervals for the discretization process would thus mean a CPT size of 25 cells.4 With a uniform distribution of the states this would suggest that we have approximately 60 observations per cell, which would clearly be more than enough. However, upon visual inspection of the actual distributions of the variables, the uniform distribution assumption does de nitely not hold. The graph below shows the distribution of variable AA: 4 Other learning algorithms do not have this one-parent constraint and, for instance, a ve-interval discretization with three parents per node would generate CPTs consisting of 625 cells. Even when assuming uniform distributions, the available observations would be insuf cient for estimation purposes. www.conradyscience.com | www.bayesia.com 6
  • 10. Knowledge Discovery in the Stock Market with Bayesian Networks Rather, looking at this graph, it may be more appropriate to assume a normal distribution.5 Given that each Node will have one parent, we would perhaps further assume a bivariate normal distribution for the joint distribution of each pair of Nodes. We need to emphasize that we are not attempting to t distributions per se, but that we are rather trying to nd a heuristic that allows us to establish the minimum number of observations needed to characterize the tail ends of the distributions. An assumed bivariate normal distribution would yield a discrete probability density function similar to what is shown in the table below. In other words, this is what we would expect the Conditional Probability Table (CPT) to approxi- mately look like, once we have discretized the states and learned the CPT from the actual occurrences. However, we have not yet discretized the states and much less estimated the CPT. Actually, we have not really determined how many discretization levels are correct. So, it is a catch-22 and hence the need for a heuristic. Our heuristic is that we use our qualitative understanding of the distributions to determine a reasonable number of in- tervals that provides a minimum number of samples for the tails. More formally, the “thinnest tail” is the minimal local joint probability (MLJP). Assuming 5 states for parent and child each, and with a total of 1,509 observations, this would translate into approximately 4 observations for the MLJP (highlighted in red). :212.-;4<;7=3>?;@4?.- 789 !" !# $ # " !" !"#$% &"'&% #"&(% &"'&% !"#$% :212.-;4<; !# &"'&% (")(% $"*(% (")(% &"'&% 81/.52; $ #"&(% $"*(% &("$#% $"*(% #"&(% @4?.- # &"'&% (")(% $"*(% (")(% &"'&% " !"#$% &"'&% #"&(% &"'&% !"#$% &(!$ +,-./012345- :212.-;4<;7=3>?;@4?.- 789 !" !# $ # " !" 6 #! '' #! 6 :212.-;4<; !# #! )) &6* )) #! 81/.52; $ '' &6* #6! &6* '' @4?.- # #! )) &6* )) #! " 6 #! '' #! 6 Although the number of expected samples for the MLJP appears to be below the recommended minimum, we will for now proceed on this basis and set the number of intervals to 5. Only upon completion of the discretization, and after learning the network including the CPTs, we will know for sure whether this was indeed a reasonable assumption or not. 5 We omit plotting the distributions of all variables, but all the variables’ distributions do indeed resemble the normal distribution. www.conradyscience.com | www.bayesia.com 7
  • 11. Knowledge Discovery in the Stock Market with Bayesian Networks Clicking Finish will now perform the discretization. A progress bar will be shown to track the state of this process. Modeling Mode Upon conclusion, the variables are delivered as blue Nodes into the Graph Panel of BayesiaLab and by default we are now in the Modeling Mode. The original variable names, which were stored the rst line of the database, become our Node Names. www.conradyscience.com | www.bayesia.com 8
  • 12. Knowledge Discovery in the Stock Market with Bayesian Networks At this point it is practical to add Node Comments to the Node Names. Node Comments are typically used in BayesiaLab for longer and more descriptive titles, which can be turned on or off, depending on the desired view of the graph. Here, we associate a dictionary of the complete company names with the Node Comments, while the more com- pact ticker symbols remain as Node Names.6 The syntax for this association is rather straightforward: we simply de ne a text le which includes one Node Name per line. Each Node Name is followed by the equal sign (“=”), or alternatively TAB or SPACE, and then by the full com- pany name, which will serve as the Node Comment. This le can then be loaded into BayesiaLab via Data>Associate Dictionary>Node>Comments. Once the comments are loaded, a small call-out symbol will appear next to each Node Name. This indicates that Node Comments are available for display. 6 To maintain a compact presentation, we will typically use the ticker symbol when referencing a particular stock rather than the full company name. www.conradyscience.com | www.bayesia.com 9
  • 13. Knowledge Discovery in the Stock Market with Bayesian Networks As the name implies, selecting View>Display Node Comments (or alternatively the keyboard shortcut “M”) will reveal the company names. www.conradyscience.com | www.bayesia.com 10
  • 14. Knowledge Discovery in the Stock Market with Bayesian Networks Node Comments can be displayed for either all Nodes or only for selected ones. Before proceeding with the rst learning step, it is also recommended to brie y switch into the Validation Mode (F5) and to check the distributions of the states of the Nodes. The Monitors of the rst nine Nodes are shown below. At rst glance, the distributions appear to be plausible representations of the historical return distributions. www.conradyscience.com | www.bayesia.com 11
  • 15. Knowledge Discovery in the Stock Market with Bayesian Networks Unsupervised Learning To perform the rst Unsupervised Learning algorithm on our dataset, we switch back into Modeling Mode (F4) and select Learning>Association Discovering>Maximum Spanning Tree.7 This starts the Maximum Weight Spanning Tree algorithm, which is the fastest of the Unsupervised Learning algorithms and thus recommended at the beginning of most studies.8 As the name implies, this algorithm generates a tree structure, i.e. it permits only one parent per Node. This constraint is one of the reasons for the extreme learning speed of this algorithm.9 Performing the algorithm with a le of this size should only take a few seconds. 7 In BayesiaLab nomenclature, Unsupervised Learning is listed in the Learning menu as “Association Discovering” 8 Several other Unsupervised Learning algorithms are available in BayesiaLab, including Taboo, EQ, SopLEQ and Ta- boo Order. 9 It goes beyond the scope of this tutorial to discuss the different types of learning algorithms and their speci c proper- ties. www.conradyscience.com | www.bayesia.com 12
  • 16. Knowledge Discovery in the Stock Market with Bayesian Networks At rst glance, however, the resulting network does not appear simple and tree-like at all. This can be quickly resolved with BayesiaLab’s built-in layout algorithms. Selecting View>Automatic Layout (shortcut “P”) rearranges the network instantly to reveal a much more intuitive structure. www.conradyscience.com | www.bayesia.com 13
  • 17. Knowledge Discovery in the Stock Market with Bayesian Networks The resulting, reformatted Bayesian network representing the stock returns can now be read and interpreted immediately:10 11 For instance, we can zoom into the branch of the Bayesian network which contains Procter & Gamble (PG). BayesiaLab offers a search function (shortcut Ctrl-F or ⌘-F), which helps nd individual nodes very easily. 10 A separate, high-resolution PDF of this Bayesian network can be downloaded here: www.conradyscience.com/white_papers/ nancial/SP500_V13.pdf. This allows those readers without an active BayesiaLab installation to explore the network graph in much greater detail. 11 For expositional clarity we have only learned contemporaneous relationships and, as a result, potential lag structures will not appear in this network. However, in BayesiaLab, Unsupervised Learning can be generalized to a temporal ap- plication. A white paper speci cally focusing on learning temporal (or dynamic) Bayesian networks is planned for the near future. www.conradyscience.com | www.bayesia.com 14
  • 18. Knowledge Discovery in the Stock Market with Bayesian Networks The neighborhood of Procter & Gamble contains many familiar company names, mostly from the CPG industry.12 Per- haps these companies appear all-too-obvious and the reader may wonder what insight is gained at this point. Chances are that even a casual observer of the industry would have mentioned Kimberly-Clark, Colgate-Palmolive and Johnson & Johnson as businesses operating in the same eld as Procter & Gamble, which would therefore presumably have somewhat related stock price movements. The key point is that without any prior knowledge of this domain a computer algorithm automatically extracted this structure, i.e. a Bayesian network, which intuitively matches the understanding that we have established over years as consumers of these companies’ products. Clearly, if this was an unfamiliar domain, the knowledge gain for the reader would be far greater. However, a lesser- known domain would presumably prevent the reader’s intuitive veri cation of the machine-discovered structure here. 12 CPG stands for Consumer Packaged Goods. www.conradyscience.com | www.bayesia.com 15
  • 19. Knowledge Discovery in the Stock Market with Bayesian Networks Bayesian Network versus Correlation Matrix The bene t of the concise representation as a Bayesian network is further demonstrated by juxtaposing it to a correla- tion matrix, which would perhaps be the rst step in a traditional statistical analysis of this domain. Even when using heat map-style color-coding, the sheer number of relationships13 makes an immediate visual interpretation of the corre- lation matrix very dif cult (see the subset of 25 by 25 cells from the correlation matrix below). A AA AAPL ABC ADI ADM ADP ADSK AEE AEP AES AET AFL AGN AIV AIZ AKAM AKS ALL ALTR AMAT AMD AMGN AMT AMZN A 1 0.570668 0.46678 0.408163 0.533252 0.425324 0.535525 0.495613 0.531351 0.486749 0.490094 0.384297 0.476417 0.465186 0.506165 0.450875 0.4315 0.533276 0.490529 0.521889 0.541416 0.454983 0.388191 0.526454 0.447969 AA 0.570668 1 0.412423 0.363121 0.432512 0.49727 0.513374 0.453742 0.540668 0.487494 0.555778 0.386198 0.505749 0.417878 0.533665 0.525495 0.433653 0.691676 0.558741 0.443481 0.502896 0.406542 0.357239 0.532022 0.369067 AAPL 0.46678 0.412423 1 0.236667 0.43525 0.323588 0.403402 0.417302 0.340484 0.322327 0.319482 0.289725 0.334087 0.328982 0.402068 0.340316 0.38855 0.432112 0.351426 0.444068 0.463454 0.395558 0.330339 0.437053 0.450858 ABC 0.408163 0.363121 0.236667 1 0.329262 0.298421 0.416881 0.31158 0.440094 0.417974 0.347976 0.408529 0.294418 0.391646 0.33699 0.360633 0.288028 0.340885 0.39043 0.318401 0.309671 0.244243 0.36276 0.347773 0.269919 ADI 0.533252 0.432512 0.43525 0.329262 1 0.321593 0.483858 0.482746 0.425898 0.371848 0.343594 0.314271 0.389693 0.366576 0.462091 0.371839 0.426141 0.460124 0.423266 0.691107 0.638214 0.495377 0.330517 0.467126 0.420969 ADM 0.425324 0.49727 0.323588 0.298421 0.321593 1 0.378516 0.322902 0.452433 0.403492 0.417093 0.305003 0.366817 0.304062 0.366267 0.358504 0.389176 0.452943 0.392224 0.352995 0.339473 0.274791 0.266671 0.414046 0.313261 ADP 0.535525 0.513374 0.403402 0.416881 0.483858 0.378516 1 0.452686 0.542809 0.527541 0.456298 0.372908 0.50101 0.486193 0.526986 0.507023 0.406286 0.476395 0.514611 0.513513 0.515278 0.394056 0.406387 0.48288 0.41627 ADSK 0.495613 0.453742 0.417302 0.31158 0.482746 0.322902 0.452686 1 0.421398 0.402325 0.442238 0.349215 0.417223 0.389226 0.447525 0.405751 0.392804 0.43849 0.41419 0.46149 0.497755 0.396007 0.333145 0.45594 0.383973 AEE 0.531351 0.540668 0.340484 0.440094 0.425898 0.452433 0.542809 0.421398 1 0.756735 0.590583 0.424766 0.513378 0.475327 0.474898 0.473565 0.321768 0.452686 0.537636 0.447271 0.436028 0.31983 0.390525 0.465076 0.32218 AEP 0.486749 0.487494 0.322327 0.417974 0.371848 0.403492 0.527541 0.402325 0.756735 1 0.565275 0.403458 0.42596 0.440173 0.419188 0.458727 0.318872 0.422276 0.459285 0.396228 0.417472 0.292099 0.398822 0.446867 0.314108 AES 0.490094 0.555778 0.319482 0.347976 0.343594 0.417093 0.456298 0.442238 0.590583 0.565275 1 0.378383 0.476892 0.40224 0.420327 0.453099 0.34483 0.492532 0.476188 0.349014 0.398017 0.315139 0.308978 0.438492 0.28071 AET 0.384297 0.386198 0.289725 0.408529 0.314271 0.305003 0.372908 0.349215 0.424766 0.403458 0.378383 1 0.370713 0.421565 0.364347 0.420521 0.249157 0.360531 0.427641 0.290668 0.279035 0.275143 0.321026 0.401321 0.280863 AFL 0.476417 0.505749 0.334087 0.294418 0.389693 0.366817 0.50101 0.417223 0.513378 0.42596 0.476892 0.370713 1 0.418877 0.588516 0.588617 0.351403 0.446767 0.634718 0.390395 0.459462 0.364762 0.285856 0.50493 0.359955 AGN 0.465186 0.417878 
0.328982 0.391646 0.366576 0.304062 0.486193 0.389226 0.475327 0.440173 0.40224 0.421565 0.418877 1 0.422619 0.396071 0.323589 0.388559 0.443402 0.332295 0.393542 0.347243 0.345897 0.461649 0.336944 AIV 0.506165 0.533665 0.402068 0.33699 0.462091 0.366267 0.526986 0.447525 0.474898 0.419188 0.420327 0.364347 0.588516 0.422619 1 0.558192 0.408232 0.49093 0.644666 0.485371 0.541239 0.390922 0.30768 0.512831 0.397449 AIZ 0.450875 0.525495 0.340316 0.360633 0.371839 0.358504 0.507023 0.405751 0.473565 0.458727 0.453099 0.420521 0.588617 0.396071 0.558192 1 0.353718 0.45162 0.616235 0.378966 0.430116 0.315676 0.343417 0.513195 0.347806 AKAM 0.4315 0.433653 0.38855 0.288028 0.426141 0.389176 0.406286 0.392804 0.321768 0.318872 0.34483 0.249157 0.351403 0.323589 0.408232 0.353718 1 0.438362 0.364883 0.435992 0.428331 0.368554 0.245363 0.419715 0.385661 AKS 0.533276 0.691676 0.432112 0.340885 0.460124 0.452943 0.476395 0.43849 0.452686 0.422276 0.492532 0.360531 0.446767 0.388559 0.49093 0.45162 0.438362 1 0.478014 0.420897 0.475609 0.423204 0.337167 0.508704 0.390437 ALL 0.490529 0.558741 0.351426 0.39043 0.423266 0.392224 0.514611 0.41419 0.537636 0.459285 0.476188 0.427641 0.634718 0.443402 0.644666 0.616235 0.364883 0.478014 1 0.436321 0.503192 0.387605 0.312268 0.525026 0.351342 ALTR 0.521889 0.443481 0.444068 0.318401 0.691107 0.352995 0.513513 0.46149 0.447271 0.396228 0.349014 0.290668 0.390395 0.332295 0.485371 0.378966 0.435992 0.420897 0.436321 1 0.645041 0.490712 0.332572 0.480285 0.443469 AMAT 0.541416 0.502896 0.463454 0.309671 0.638214 0.339473 0.515278 0.497755 0.436028 0.417472 0.398017 0.279035 0.459462 0.393542 0.541239 0.430116 0.428331 0.475609 0.503192 0.645041 1 0.481282 0.354883 0.482778 0.435212 AMD 0.454983 0.406542 0.395558 0.244243 0.495377 0.274791 0.394056 0.396007 0.31983 0.292099 0.315139 0.275143 0.364762 0.347243 0.390922 0.315676 0.368554 0.423204 0.387605 0.490712 0.481282 1 0.230527 0.390012 0.318144 AMGN 0.388191 0.357239 0.330339 0.36276 0.330517 0.266671 0.406387 0.333145 0.390525 0.398822 0.308978 0.321026 0.285856 0.345897 0.30768 0.343417 0.245363 0.337167 0.312268 0.332572 0.354883 0.230527 1 0.327344 0.330847 AMT 0.526454 0.532022 0.437053 0.347773 0.467126 0.414046 0.48288 0.45594 0.465076 0.446867 0.438492 0.401321 0.50493 0.461649 0.512831 0.513195 0.419715 0.508704 0.525026 0.480285 0.482778 0.390012 0.327344 1 0.412541 AMZN 0.447969 0.369067 0.450858 0.269919 0.420969 0.313261 0.41627 0.383973 0.32218 0.314108 0.28071 0.280863 0.359955 0.336944 0.397449 0.347806 0.385661 0.390437 0.351342 0.443469 0.435212 0.318144 0.330847 0.412541 1 Admittedly, there are a number of statistical techniques available which can help in this situation, but the point is that generating a Bayesian network (e.g. with the Maximum Weight Spanning Tree algorithm we used) takes the practitioner about the same amount of time as computing a correlation matrix, yet the former yields a much richer picture. Beyond visual interpretability, there is another key distinction between these two representations. Whereas the correla- tion matrix is merely descriptive, the Bayesian network is actually computable. By its very nature, any Bayesian network is a functioning model. On the other hand, with the correlation matrix one could not predict the value of one stock given the observation of several others. For this purpose, we would have to t and estimate speci c models, e.g. a re- gression. 
In a Bayesian network, however, we can use the graph of the Bayesian network itself for computing inference. For instance, given that we observe the values of JNJ and CL, we immediately obtain an updated value for PG and, at the same time, also updated values for all other Nodes in the network. We refer to this property as omnidirectional in- ference, which re ects the updating of beliefs given evidence according to Bayes’ Rule.14 We shall illustrate carrying out omnidirectional inference in the next section. Inference with Bayesian Networks We have shown that the Maximum Weight Spanning Tree algorithm can generate a readily-interpretable and fully- computable Bayesian network from daily stock return data. However, we have not yet explained in detail what this structure represents speci cally. Each Arc in this structure represents a probabilistic relationship between a pair of Nodes. The parameters15 of these relationships are encoded in Conditional Probability Tables. In the example of the PG and JNJ relationship shown be- low, the table de nes the probabilities of the states of PG, given the states of JNJ. This table can be accessed in the Modeling Mode by simply double-clicking on the desired Node, which opens up the Node Editor. 13 459 2 − 459 = 105,111 2 14 See appendix for a brief summary of Bayes’ Theorem. 15 We use the term “parameter” rather loosely in this context, as Bayesian networks are entirely nonparametric models in BayesiaLab. www.conradyscience.com | www.bayesia.com 16
  • 20. Knowledge Discovery in the Stock Market with Bayesian Networks For clarity, we show the relevant portion of the network for JNJ and PG below plus an enlarged version of the condi- tional probability table from the Node Editor: This says, among other things, given that we observe a JNJ return greater than 1.2%, there would be a 50.9% probabil- ity that we would observe a PG return of greater than 1.2% (see bottom right cell in the above table). More formally we can also write, P(PG>0.012 | JNJ > 0.012) = 50.9%. The upper left cell says, given that we observe a JNJ return smaller than -0.9% there is a 46.5% probability that we will observe a PG return smaller than -1.3%, i.e. P(PG<=0.013 | JNJ <=0.009) = 46.5%.16 If we follow the network “downstream,” i.e from PG to KMB, we see that their relationship is quanti ed in yet another conditional probability table. 16 As the discretization intervals were generated by the K-Means algorithm, the bins do not necessarily have the same interval size, which we see in this example. www.conradyscience.com | www.bayesia.com 17
  • 21. Knowledge Discovery in the Stock Market with Bayesian Networks This can be interpreted in the same way: given that we observe a return of PG greater than 1.2%, there is a 42.4% probability that we would also observe a KMB return of higher than 1.2%. This kind of inference is perhaps the sim- plest type, as we can directly read the table, i.e. “given this, then that.” Inference with Hard Evidence Beyond reviewing the conditional probability tables directly in Modeling Mode in the Node Editor, as above, we can carry out inference conveniently in the Validation Mode (shortcut F5) of BayesiaLab. This allows setting evidence and observing inference directly via the Monitors in the Monitor Panel (right side of screen- shot). We will now highlight JNJ and PG and focus on their Monitors only. Prior to setting any evidence, we will sim- ply see their marginal distributions in the Monitors. As we would expect, we see the returns distributed around 0 and the expected value of the returns is 0. www.conradyscience.com | www.bayesia.com 18
  • 22. Knowledge Discovery in the Stock Market with Bayesian Networks Observing a speci c state of a Node is equivalent to setting evidence and we can do that directly on the histograms in- side the Monitors. For instance, we can double-click on the state JNJ > 0.012, which sets it to a 100% probability, as indicated by the green bar. Setting such evidence will automatically propagate this evidence throughout the network and we can immediately observe the new distribution of PG. The gray arrows indicate how the distributions have changed compared to before setting evidence. So far, this provides no more insight than what we could read from the Conditional Probability Table in the Node Edi- tor of the PG Node. What is not readily accessible from the CPT is the inverse probability by carrying out inference in the opposite direction of the Arc, i.e. setting evidence on PG and computing JNJ. Bayes’ Rule speci es the necessary computation in this case.17 17 See appendix for more details about Bayes’ Rule. Although this calculation is straightforward, application errors are unfortunately commonplace. The error is so common that is now widely known as the Prosecutor’s Fallacy. In a recent white paper, Paradoxes and Fallacies, we dedicated a chapter to this problem: www.conradyscience.com/index.php/paradoxes www.conradyscience.com | www.bayesia.com 19
  • 23. Knowledge Discovery in the Stock Market with Bayesian Networks In BayesiaLab the inference computation of JNJ is automatic once we set evidence to PG. To illustrate this, we arbitrar- ily set the PG return to <=-1.3% and we can immediately see the updated distribution of JNJ. So far, this could have been computed quite easily by directly applying Bayes’ Rule. It becomes a bit more challenging when we look at more than two Nodes at the same time. This time we will examine JNJ, PG and KMB (their relevant subnetwork is shown for reference below). Once again, prior to setting any evidence, the Monitors show the marginal distributions of JNJ, PG and KMB. www.conradyscience.com | www.bayesia.com 20
  • 24. Knowledge Discovery in the Stock Market with Bayesian Networks Upon setting JNJ > 0.012, we can now see how the evidence not only propagates to PG, but also further “downstream” to KMB: We can also invert the chain of inference by simply setting evidence at the other end of the network, e.g. KMB > 0.012: www.conradyscience.com | www.bayesia.com 21
  • 25. Knowledge Discovery in the Stock Market with Bayesian Networks Or, we can set evidence on both ends, i.e. on JNJ and KMB, and then read the inference in the middle, for PG. This inference will probably not surprise us: we now have an 80% probability that PG will have a return greater than 1.2%, given that we set both JNJ and KMB to >0.012. Inference with Soft Evidence We are not limited to only setting “hard evidence,” as we did above. In the real world, observations often provide “soft evidence” only. So, instead of setting any of these variables to a state with a 100% probability and thus make them “hard evidence,” we can use BayesiaLab to set any evidence according to its nature, even when it is uncertain. For illustration purposes, we will now generate two kinds of “soft evidence,” one for JNJ and one for KMB. 1. We set the evidence directly by right-clicking on the JNJ Monitor and selecting Enter Probabilities: We can now adjust the histogram by dragging the bars to the desired probability levels which re ect our subjective belief. www.conradyscience.com | www.bayesia.com 22
Clicking the light-green button confirms our choice of probabilities. In addition, we right-click on the Monitor again to Fix Probabilities, meaning that we want to hold these values regardless of any subsequent evidence we enter.
2. Assuming that we have a more general expectation regarding the KMB return, without having any beliefs regarding the probabilities of specific states, we can set the expected mean of the entire KMB distribution. For instance, we set the expected mean of the states of KMB to -1% by right-clicking the KMB Monitor and selecting Distribution for Target Value/Mean.
We type "-0.01" into the dialog box, which generates a new KMB distribution with the desired mean value of -0.01, or -1%. Obviously, an infinite number of distributions could produce a mean value of -1%. However, as an aid to the analyst, BayesiaLab computes the distribution with a mean value of -1% that is "closest" to the a-priori distribution (see the sketch at the end of this section).
Not only are these observations "soft," in this example they also have opposite signs, i.e. JNJ has a positive mean return and KMB a negative one. As a result, carrying out inference generates a more uniform probability distribution for PG (rather than a narrower one), effectively increasing our uncertainty about the state of PG compared to the marginal distribution. The knowledge gain for the analyst is that greater volatility must be expected for PG.
We have limited our example to inference within a small subnetwork of only three Nodes, but we could have applied the same approach to the entire Bayesian network of 459 Nodes. With this, the analyst has complete freedom to set an unlimited number of different kinds of evidence, both hard and soft, and to carry out inference "backwards" and "forwards" within the network. For users of the BayesiaLab software, the automatic computation of inference and the instant visual updating of the Monitors is comparable to recalculating all cells in a large spreadsheet.
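The paper does not spell out how BayesiaLab defines "closest," but a standard formalization is the distribution that minimizes the Kullback-Leibler divergence from the prior subject to the mean constraint, which reduces to an exponential tilting of the prior. A sketch under that assumption, with hypothetical state values and prior:

```python
import numpy as np
from scipy.optimize import brentq

# Representative state values and a hypothetical prior for KMB.
x = np.array([-0.03, -0.012, 0.0, 0.012, 0.03])
prior = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
target_mean = -0.01

def tilted(lam):
    """Exponentially tilted prior; normalizing yields a valid distribution."""
    w = prior * np.exp(lam * x)
    return w / w.sum()

# Solve for the tilt parameter that achieves the target mean.
lam = brentq(lambda l: tilted(l) @ x - target_mean, -1000, 1000)
posterior = tilted(lam)
print(posterior, posterior @ x)           # mean is -0.01 by construction
```

Among all distributions with the required mean, the tilted one distorts the prior as little as possible, which matches the intent described above.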
Bayesian Network Metrics
As shown in these examples, the Arcs represent the probabilistic relationships between Nodes. In addition to visually interpreting the network structure, and beyond carrying out inference, we can also review the "summary statistics" of the network and its components with several metrics.
It is important to point out that we use the information theory-based concepts of Entropy, Arc Force and Mutual Information as central metrics in generating and analyzing Bayesian networks. This is a clear departure from commonly used metrics in traditional statistics, such as covariance and correlation. While these information theory-based metrics may appear novel to end-users of research, they have many advantages. Most importantly, we can entirely discard the (often incorrect) assumptions of linearity and normal distributions. As a result, highly nonlinear dynamics can be easily captured in a Bayesian network.

Arc Force
For instance, the importance of each Arc can be highlighted by displaying the associated Arc Force and its contribution with respect to the overall network. From within the Validation Mode, the Arc Force can be displayed by selecting Analysis>Graphic>Arc Force (or with the shortcut "F").
Mutual Information
A perhaps more accessible interpretation is possible by displaying the Mutual Information, which can be obtained by selecting Analysis>Graphic>Arcs' Mutual Information.18 The Mutual Information I(X,Y) measures how much (on average) the observation of random variable Y tells us about the uncertainty of X, i.e. by how much the entropy of X is reduced if we have information on Y. Mutual Information is a symmetric metric, which reflects the uncertainty reduction of X by knowing Y as well as of Y by knowing X. In our example, knowing the value of PG on average reduces the uncertainty of the value of KMB by 0.2843 bits, which means that it reduces its uncertainty by 13.27% (shown in blue, in the direction of the arc). Conversely, knowing KMB reduces the uncertainty of PG by 13.09% (shown in red, in the opposite direction of the arc).

18 Although interpreting Mutual Information is somewhat more intuitive, in the case of a network tree, Mutual Information is identical to Arc Force. For Bayesian networks that are not trees, this distinction becomes very important.
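The sketch below computes Mutual Information from a joint distribution and shows why the two percentages differ even though I(X,Y) itself is symmetric: the same quantity in bits is expressed relative to H(X) in one direction and to H(Y) in the other. The joint table is a hypothetical placeholder, not the learned PG/KMB distribution.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits, ignoring zero-probability states."""
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

# Hypothetical joint distribution of two discretized returns X and Y.
p_xy = np.array([[0.35, 0.15],
                 [0.05, 0.45]])
p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)

mi = entropy(p_x) + entropy(p_y) - entropy(p_xy.ravel())
print(f"I(X;Y) = {mi:.4f} bits")                    # symmetric in X and Y
print(f"reduction of H(X): {mi / entropy(p_x):.2%}")
print(f"reduction of H(Y): {mi / entropy(p_y):.2%}")
```

Because H(X) and H(Y) generally differ, the relative reductions differ as well, exactly as with the 13.27% and 13.09% figures above.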
Correlation
While we emphasize the importance of Arc Force and Mutual Information as measures capable of capturing nonlinear relationships, BayesiaLab can also display Pearson's R for the network (select Analysis>Graphic>Pearson's Correlation or shortcut "G").
By displaying the Pearson correlation coefficient, we implicitly assume linear relationships between the connected Nodes, which may often not hold in practice. Special care must thus be taken when interpreting low values of R, as they may reflect nonlinearity rather than independence. On the other hand, R values close to 1 do indeed suggest the presence of a linear relationship. Furthermore, Pearson's R can be very helpful for determining the sign of the relationship between variables. BayesiaLab color-codes positive and negative correlations by highlighting the associated Arcs in blue and red, respectively. Finally, correlation is typically a much more familiar metric for audiences who are not acquainted with Mutual Information.
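A quick demonstration of the caveat about low R values: a variable that is fully determined by another, but nonlinearly, can still show near-zero correlation.

```python
import numpy as np

# A deterministic but nonlinear relationship with near-zero Pearson R,
# illustrating why low R must not be read as independence.
rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = x ** 2                                 # y is fully determined by x
r = np.corrcoef(x, y)[0, 1]
print(f"Pearson R = {r:.3f}")              # approximately 0
```

Mutual Information, by contrast, would flag this pair as strongly dependent.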
Summary - Unsupervised Learning
In summary, Unsupervised Learning is an excellent approach for obtaining a general understanding of the simultaneous relationships between many variables in a dataset. The learned Bayesian network allows immediate visual interpretation plus immediate computation of omnidirectional inference based on any type of evidence, including uncertain and conflicting observations. Given these properties, Unsupervised Learning with Bayesian networks becomes a universal and robust tool for knowledge discovery and modeling in unknown problem domains.
Supervised Learning
Upon gaining a general understanding of a domain, questions typically arise regarding individual variables and how to predict them specifically. Even though we can use Unsupervised Learning to discover a network structure and use it for prediction, Supervised Learning is often the more appropriate method when studying a specific target variable. By focusing on a single target variable, BayesiaLab's learning algorithms fit a (generative) model to that target rather than a model that balances the fit across all variables.
To remain consistent with the example we started earlier, we will once again use PG for illustration purposes. More specifically, we will designate PG as the Target Node. We can do so by right-clicking on the Node and then selecting Set as Target Node from the contextual menu (or by double-clicking the Node while holding "T"). Now that we have defined a Target Node, we can perform a range of Supervised Learning algorithms implemented in BayesiaLab.19
The Markov Blanket20 algorithm is suitable for this kind of application, and its speed is particularly helpful when dealing with hundreds or even thousands of variables. Furthermore, BayesiaLab offers the Augmented Markov Blanket, which starts with the Markov Blanket structure and then uses an unsupervised search to find the probabilistic relations that hold between the variables belonging to the Markov Blanket.21 This unsupervised search requires additional computation time but generally results in improved predictive performance of the model. The learning process can be started by selecting Learning>Target Node Characterization>Augmented Markov Blanket from the menu.22

19 For expositional clarity, we will only learn contemporaneous relationships and, as a result, potential lag structures will not appear in the resulting networks. However, in BayesiaLab, Supervised Learning can be generalized to a temporal application.
20 See appendix for a definition of the Markov Blanket.
21 Intuitively, the "augmented" part of the network plays the same role as the interaction terms between independent variables in a regression.
22 In BayesiaLab nomenclature, Supervised Learning is listed in the Learning menu as "Target Node Characterization."
As we still have our previous network that was generated through Unsupervised Learning, we need to confirm the deletion of that original network before proceeding with Supervised Learning. After a few seconds, we will see the result of the Supervised Learning process. Our Target Node PG is now connected to all variables in its Markov Blanket. This means that, given knowledge of the Nodes in the Markov Blanket, PG is independent of the remaining network. This effectively identifies the subset of variables that are most important for predicting the value of the Target Node, PG.
As stated in the introduction, it is not our intention to forecast stock prices per se, but rather to identify meaningful and relevant structures in the market. This Augmented Markov Blanket is such a structure, and a stock market analyst can use it to identify a relevant subset of stocks for in-depth analysis, perhaps with the objective of establishing a buy/sell recommendation or of trading directly on such knowledge.
Once we have this network, we can use it to analyze these Nodes' relationships in a number of ways within BayesiaLab. For instance, we can select Analysis>Graphic>Target Mean Analysis, which graphs PG as a function of the other Nodes in the network.
Alternatively, by selecting Analysis>Report>Target Analysis>Correlation with the Target Node, we obtain a table displaying the Mutual Information between the Nodes in the network and the Target Variable, PG.
By clicking Quadrants, these values can be displayed as a graph.

Inference with Supervised Learning
To illustrate potential applications of Supervised Learning, beyond interpretation, we have created a simple simulation of possible stock market conditions. Despite the hypothetical nature of these scenarios, the underlying Bayesian network was learned from actual market data (as is the case for this entire white paper) and, as a result, the computed inference based on these assumed conditions is "real."
One could imagine this purely hypothetical scenario: Colgate-Palmolive and Johnson & Johnson are involved in a patent lawsuit, and an investment analyst speculates about the impact of the imminent verdict in this court case. It is fairly easy to imagine that a verdict in favor of Johnson & Johnson would result in a boost to its stock price and simultaneously
cause a sharp drop for Colgate-Palmolive's stock. Conversely, a win for Colgate-Palmolive would result in just the opposite. However, our question is how either outcome would affect Procter & Gamble's return, PG. We can best answer this question by simulating either outcome within the Bayesian network we learned.
Prior to setting any evidence, our marginal distributions of returns would be as follows, i.e. this is what we would expect on any given day without any other knowledge. If we were now to believe in a Johnson & Johnson win in combination with a Colgate-Palmolive loss, and the corresponding stock price movements for both of them, we could create the following scenario. The gray arrows now highlight the impact on all other stocks in this model, including our target variable, PG. The model suggests that the new distribution for PG would now be distinctly bimodal, as opposed to the normal marginal distribution.
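One way such a bimodal shape can arise: the target's posterior is a mixture of its conditional distributions, weighted by the shifted beliefs about its neighbors. A schematic sketch with hypothetical numbers, not the learned model:

```python
import numpy as np

# Three PG states [down, flat, up]; two competing market scenarios whose
# conditionals pull PG in opposite directions. All numbers are placeholders.
p_scenario = np.array([0.5, 0.5])
p_pg_given_scenario = np.array([
    [0.60, 0.30, 0.10],                    # scenario 1: PG skews negative
    [0.10, 0.30, 0.60],                    # scenario 2: PG skews positive
])
p_pg = p_scenario @ p_pg_given_scenario    # mixture of the two conditionals
print(p_pg)                                # [0.35 0.3 0.35], two modes
```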
Now considering the opposite verdict, i.e. a Colgate-Palmolive win and a Johnson & Johnson defeat, we can once again assume their resulting stock price movements and then infer the impact on PG. This time, a gain for PG would be much more probable.
So, if an analyst had a deep understanding of the subject matter (or insider knowledge23) and hence could anticipate the patent trial's outcome, he should, everything else being equal, update his beliefs regarding the Procter & Gamble stock return according to the computed inference of our model.
It is important to stress that this does not mean we have discovered a causal pathway, but rather that we are taking advantage of historically observed associations between returns, which have generated a model in the form of a Bayesian network. The Bayesian network simply allows us to exploit our learned knowledge consistently.

Adaptive Questionnaire
The Bayesian network from above can perhaps also serve to illustrate how evidence-gathering can be optimized in BayesiaLab. Once again, this is purely hypothetical, but let's assume that a stock trader seeks to predict tomorrow's return of PG. Tomorrow, as it turns out, earnings will also be released for numerous other stocks in the CPG industry, excluding PG. With limited time, our stock trader needs to prioritize his research resources on those stocks which will be most informative of the PG return. BayesiaLab has a convenient function, the Adaptive Questionnaire, which allows the analyst to adapt his evidence-seeking process as per the most recent information obtained and given the previously learned Bayesian network (shown again below for reference).

23 It should be noted that insider trading can refer to both legal and illegal conduct. See http://www.sec.gov/answers/insider.htm
The function can be called by selecting Inference>Adaptive Questionnaire. The following pop-up window then prompts us to select and confirm the Target. Initially, the analyst's research should begin with CL as the most informative Node, which is listed at the top of all Monitors, right below the Target, PG.
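BayesiaLab's exact ordering criterion is not detailed here, but ranking candidates by their mutual information with the target under the current evidence captures the idea: query first the node whose observation is expected to reduce the target's entropy the most. A sketch with hypothetical joint tables:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits, ignoring zero-probability states."""
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def expected_info_gain(p_joint):
    """p_joint[i, j] = P(candidate = i, target = j) under current evidence."""
    p_cand = p_joint.sum(axis=1)
    p_target = p_joint.sum(axis=0)
    posterior_entropy = sum(
        p_cand[i] * entropy(p_joint[i] / p_cand[i])
        for i in range(len(p_cand)) if p_cand[i] > 0)
    return entropy(p_target) - posterior_entropy

# Hypothetical joint tables of each candidate node with the target, PG.
candidates = {
    "CL":  np.array([[0.30, 0.10], [0.05, 0.55]]),
    "KMB": np.array([[0.25, 0.15], [0.15, 0.45]]),
}
ranking = sorted(candidates, key=lambda n: expected_info_gain(candidates[n]),
                 reverse=True)
print(ranking)                             # query the first node first
```

As evidence accumulates, these joint tables change, which is why the questionnaire reorders the remaining nodes after every observation.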
Let's now assume he receives a tip suggesting that CL earnings are coming in much higher than expected. He translates these updated, subjective beliefs into "soft" evidence and thus sets P(CL>0.017)=60%, P(CL<=0.017)=30%, P(CL<=0.05)=10%, plus the remaining states to zero. Upon entering this probability distribution, the Adaptive Questionnaire will move CL to the bottom (green bars with gray background) and scroll up the next most important Node to study, in this case KMB.
Upon setting this evidence, the probabilities need to be fixed by right-clicking the Monitor and selecting Fix Probabilities. This is important as other simultaneous beliefs have yet to be set. By not fixing the probabilities of CL, subsequent evidence could inadvertently update the probabilities that were just defined.
Next, the analyst may obtain inconclusive views from his sources on KMB and thus cannot set any new evidence on this particular Node, although it would be the most informative evidence at this point. Rather, he moves on to CLX, which is widely believed to meet the expected earnings without any surprises. As a result, our analyst sets hard negative evidence on either end of the return distribution, meaning that he anticipates no major swings either way: P(CLX<=-0.11)=0 and P(CLX>0.13)=0. Upon setting this evidence, and once again fixing it, the Adaptive Questionnaire
presents a new order of Nodes. Interestingly, given the evidence set on CLX, KMB has declined in importance with respect to PG. JNJ is next in the new order, and our analyst determines that the stock will definitely gain, based on insider rumors he heard. He translates this insight into a certain JNJ return greater than 1.2% and sets it as "hard" evidence accordingly.
Given all the evidence he gathered, although some of it may be vague, the analyst concludes that there is now a 90% probability of a PG return greater than 0.3%. Perhaps more importantly, the chance of a decline of -1.3% or below has diminished to virtually zero. This translates into an expected mean return of 1.5%, versus the a-priori expectation of 0%.
With the Bayesian network generated through Supervised Learning and the subsequent application of the Adaptive Questionnaire, the analyst has optimized his information-seeking process and thus spent the least amount of resources for a maximum reduction of uncertainty regarding the variable of interest.
Summary - Supervised Learning
In many ways, Supervised Learning with BayesiaLab resembles traditional modeling and can thus be benchmarked against a wide range of statistical techniques. In addition to its predictive performance, BayesiaLab offers an array of analysis tools that can provide the analyst with a deeper understanding of the domain's underlying dynamics. The Bayesian network also provides the basis for a wide range of scenario simulation and optimization algorithms implemented in BayesiaLab. Beyond mere one-time predictions, BayesiaLab allows dealing with evidence interactively and incrementally, which makes it a highly adaptive tool for real-time inference.
Appendix

Markov Blanket
In many cases, the Markov Blanket algorithm is a good starting point for any predictive model, whether used for scoring or classification. This algorithm is extremely fast and can even be applied to databases with thousands of variables and millions of records.
The Markov Blanket of a node A is the set of nodes composed of A's parents, its children, and its children's other parents (spouses). The Markov Blanket of node A contains all the variables that, if we know their states, will shield node A from the rest of the network. This means that the Markov Blanket of a node is the only knowledge needed to predict the behavior of that node. Learning a Markov Blanket selects the relevant predictor variables, which is particularly helpful when there is a large number of variables in the database. (In fact, this can also serve as a highly efficient variable selection method in preparation for other types of modeling, outside the Bayesian network framework.)
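Given a known structure, the blanket can be read directly off the graph, as in this sketch; the edge list is a hypothetical stand-in for a structure learned by BayesiaLab.

```python
# Hypothetical directed edges (parent, child) of a small learned network.
edges = [("CL", "PG"), ("JNJ", "PG"), ("PG", "KMB"), ("CLX", "KMB")]

def markov_blanket(node, edges):
    """Parents, children, and the children's other parents (spouses)."""
    parents = {u for u, v in edges if v == node}
    children = {v for u, v in edges if u == node}
    spouses = {u for u, v in edges if v in children and u != node}
    return parents | children | spouses

print(markov_blanket("PG", edges))   # {'CL', 'JNJ', 'KMB', 'CLX'}
```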
Bayes' Theorem
Bayes' theorem relates the conditional and marginal probabilities of discrete events A and B, provided that the probability of B does not equal zero:

P(A | B) = P(B | A) × P(A) / P(B)

In Bayes' theorem, each probability has a conventional name:
• P(A) is the prior probability (or "unconditional" or "marginal" probability) of A. It is "prior" in the sense that it does not take into account any information about B. The unconditional probability P(A) was called "a priori" by Ronald A. Fisher.
• P(A|B) is the conditional probability of A, given B. It is also called the posterior probability because it is derived from or depends upon the specified value of B.
• P(B|A) is the conditional probability of B given A. It is also called the likelihood.
• P(B) is the prior or marginal probability of B.
Bayes' theorem in this form gives a mathematical representation of how the conditional probability of event A given B relates to the converse conditional probability of B given A.

About the Authors

Stefan Conrady
Stefan Conrady is the cofounder and managing partner of Conrady Applied Science, LLC, a privately held consulting firm specializing in knowledge discovery and probabilistic reasoning with Bayesian networks. In 2010, Conrady Applied Science was appointed the authorized sales and consulting partner of Bayesia S.A.S. for North America.
Stefan Conrady studied Electrical Engineering and has extensive management experience in the fields of product planning, marketing and analytics, working at Daimler and BMW Group in Europe, North America and Asia. Prior to establishing his own firm, he was heading the Analytics & Forecasting group at Nissan North America.

Lionel Jouffe
Dr. Lionel Jouffe is cofounder and CEO of France-based Bayesia S.A.S. Lionel Jouffe holds a Ph.D. in Computer Science and has been working in the field of Artificial Intelligence since the early 1990s. He and his team have been developing BayesiaLab since 1999, and it has emerged as the leading software package for knowledge discovery, data mining and knowledge modeling using Bayesian networks. BayesiaLab enjoys broad acceptance in academic communities as well as in business and industry. The relevance of Bayesian networks, especially in the context of consumer research, is highlighted by Bayesia's strategic partnership with Procter & Gamble, which has deployed BayesiaLab globally since 2007.
Contact Information

Conrady Applied Science, LLC
312 Hamlet's End Way
Franklin, TN 37067
USA
+1 888-386-8383
info@conradyscience.com
www.conradyscience.com

Bayesia S.A.S.
6, rue Léonard de Vinci
BP 119
53001 Laval Cedex
France
+33(0)2 43 49 75 69
info@bayesia.com
www.bayesia.com

Copyright
© 2011 Conrady Applied Science, LLC and Bayesia S.A.S. All rights reserved. Any redistribution or reproduction of part or all of the contents in any form is prohibited other than the following:
• You may print or download this document for your personal and noncommercial use only.
• You may copy the content to individual third parties for their personal use, but only if you acknowledge Conrady Applied Science, LLC and Bayesia S.A.S. as the source of the material.
• You may not, except with our express written permission, distribute or commercially exploit the content. Nor may you transmit it or store it in any other website or other form of electronic retrieval system.