University of Dublin
TRINITY COLLEGE
Modelling Financial Market Forces Using
Regression and Sentiment Analysis
Mark John Lyons
B.A.I. Engineering
Final Year Project April 2016
Supervisor: Professor Khurshid Ahmad
School of Computer Science and Statistics
O’Reilly Institute, Trinity College, Dublin 2, Ireland
Abstract
The aim of the project was to model the dynamics of
financial markets in order to observe and test the
validity of financial theories, specifically mean
reversion, volatility clustering and attribute
framing. Regression and sentiment analysis were
combined to achieve this. The R programming language
was used to compute the statistics involved in the
project.
Acknowledgements
I would like to thank my supervisor, Khurshid Ahmad,
for giving me this opportunity. He provided great advice
and helpful nudges throughout the project. I enjoyed the
discussions on a range of topics: finance, computer
science, maths and a small bit of history.
Thank you also to Stephen Kelly, PhD student at Trinity
College, for his help along the way, particularly for
introducing me to the Rocksteady program and
discussing R with me.
Finally, thank you to my family and friends for their
support throughout, in particular my mother and father,
Margaret and Pat Lyons, for all the help they gave me.
Table of Contents:
Abstract
Acknowledgements
1. Introduction
2. Motivation and Lit Review
2.1. Economics
2.2. Financial Markets & Monetary Policy
2.3. Behavioural Finance
2.4. Conclusion
3. Method
3.1. Stochastics & Financial Series
3.2. Stationarity & Returns
3.3. Stylised Facts & Summary Statistics
3.4. Linear Regression
3.5. Ordinary Least Squares
3.6. OLS Assumptions
3.7. Autoregression
3.8. Vector Autoregression
3.9. LexisNexis Corpus
3.10. Sentiment Analysis
3.11. Tableau
4. Case Studies and Results
4.1. Summary Statistics
4.2. Autoregression
4.3. Vector Autoregression
5. Conclusion and Future Work
5.1. Work Completed
5.2. Conclusions
5.3. Future Work
6. References
1 Introduction:
This project began from a desire to pursue a project in
an area of personal interest: currency, or forex, markets. The
ever-changing value of money and the effect it can have on
whole countries has always fascinated me. The wish was to
learn Computer Engineering skills to better understand the
market. Statistical research was first carried out on the forex
market and was then extended to financial markets as a whole. A
comparison of the behaviour of three of the major financial markets
(the bond market, the stock market and, as stated, the forex
market) will be made.
Traditional finance theory states that markets should
usually be rational and exhibit mean reversion to reassert an
asset's price. The newer, emerging field of behavioural finance
states that the market is not as efficient as theorised and that the
sentiment of traders will affect it. The project aimed to model
market behaviour after the Global Financial Crisis to see whether
it performed mean reversion over the period, as theory proposes.
Behavioural finance theories were assessed to see
whether sentiment affects market movement when the market acts
irrationally. Volatility clustering was also of interest and
was observed in the market. Modelling the nature of the market is
essential for finding investment opportunities and for risk
management in financial portfolios.
The project involved learning data analysis techniques such
as the ETL (extract, transform and load) process. Text analysis
was applied to determine the implicit sentiment on the markets
in major publications around the world. Knowledge of statistical
theory in the context of finance was essential before proceeding
to accurate modelling of the data. The R programming
environment and language was learned to warehouse the large
data sets and apply the statistical methods. Lastly, Tableau, an
emerging data visualisation standard, was used to display the
results as clearly as possible.
2 Motivation and Lit Review:
2.1 Economics
Economics is a social science studying the movement of
goods, services and wealth. BusinessDictionary.com defines
economics as “the theories, principles, and models that deal
with how the market process works”. Economies and financial
markets are intrinsically linked in the capitalist system.
The theoretical relationship between the US economy and its
stock market, for example, is shown in figure 1 below. Also of
note from the figure is the idea of economic cycles, a
periodic cycle through recession and recovery.
Figure 1
The economic crisis of 2007–2008, from which many
countries including the U.S. are still only slowly recovering,
was of historic proportions, involving a financial market
collapse, a rapid rise in unemployment, an unprecedented
decline in world trade and massive government intervention
aimed at reversing the downturn (Heilbroner and Milberg,
2011). This economic crisis, called the Global Financial Crisis,
was the worst seen since the Great Depression of the 1930s. Of
interest to this project was the behaviour of the stock markets
as an economic indicator post-crisis.
The U.S. National Bureau of Economic Research (NBER)
determined the economic trough of the Global Financial Crisis
to be June 2009. This trough is marked in figure 2, a graph of
the S&P 500 and NASDAQ Composite stock market indices.
The figure gives empirical evidence
of the relationship graphed in figure 1. The S&P 500 index will be
used as the indicator when modelling stock market behaviour.
Figure 2
2.2 Financial Markets & Monetary Policy
The U.S. Federal Reserve (Fed) sets monetary policy in
order to control this economic growth and contraction. Before
discussing monetary policy we need to understand how
financial assets are priced in the market. Asset prices
are determined through price discovery. The invisible
hand of the pricing mechanism coordinates supply and demand
in markets in a way that is automatically in the best interests of
society (Scott, 2006). Traditional finance theory would have us
believe that the free market will keep prices fair and balanced
and that arbitrageurs (people who utilise arbitrage*) will take
advantage of any deviations, thus restoring the equilibrium. The
return to equilibrium occurs through mean reversion, which is
defined as “the theory that interest rates, security prices, and
various economic indicators will, over time, return to their
long-term averages after a significant short-term move”. This view
of a self-correcting market is associated with the efficient market
hypothesis.
*Investopedia: Arbitrage is the simultaneous purchase and sale
of an asset in order to profit from a difference in the price.
Figure 3
Intuitively, mean reversion means that a positive
change in price tends to be followed by a negative change and vice
versa. Figure 3 depicts mean reversion.
The three primary methods of implementing monetary
policy are setting interest rates, buying or selling U.S. treasuries
on the open market and changing dollar reserve requirements.
These actions affect the value of the dollar and the rate of return on
treasury bonds. A deeper discussion of monetary policy and the
means by which it is carried out is not presented here. The rate
of return on U.S. treasuries is an important indicator of the
market's trust in the U.S. economy, and as such we will
assess it also.
The Fed discusses monetary policy in Federal Open
Market Committee (FOMC) meetings, 8 of which are scheduled
every year. After each 2-day meeting the Fed reveals its view on
economic activity, forecasts for future activity and changes to
monetary policy in an announcement. It is stated
that the volatility of asset prices such as the S&P 500 and dollar
foreign exchange rates increases on announcement days and in
particular around the release time of the announcement. To
test this statement, the minute-by-minute rate of change, or
volatility, of the EUR/USD exchange rate on both an average
day and an FOMC announcement day was calculated. This is
shown in figure 4. Unfortunately, intraday data for the S&P 500
and Treasury bonds could not be obtained freely to do similar
calculations. The graphs' y-axes are normalised to the maximum
return value of both series.
Figure 4
Note the overall increase in volatility during the day and
the particular increases at the announcement time and the end
of the trading day. Per the mathematician
Mandelbrot (1963), “large changes tend to be followed by large
changes, of either sign, and small changes tend to be followed
by small changes.” This is known as volatility clustering. The
U.S. Dollar Index (DXY) will be the final economic indicator modelled.
A paper by Romer and Romer (2000) discusses the Fed's
FOMC forecasts and concludes that “the Fed has information
about future inflation that market participants do not have”.
Future inflation levels determine the future value of the dollar.
The FOMC announcement is therefore heavily dissected by market
participants, who speculate on the future prospects of the dollar.
The asymmetric holding of information by the Fed can therefore
cause movement in the dollar price as arbitrageurs buy or sell
dollars to capitalise on its long-term change. As previously
described, traditional finance theory assumes the supply and
demand equilibrium will be restored by the rational market and
the true price rediscovered. However, the speculation about the
future value also has an effect beyond supply and demand on
the current price. This effect falls under the branch of
behavioural finance, which attempts to explain price anomalies
in terms of the biased behaviour of individuals.
2.3 Behavioural Finance
Behavioural finance is “a new approach to financial
markets that has emerged, at least in part, in response to the
difficulties faced by the traditional paradigm. In broad terms, it
argues that some financial phenomena can be better
understood using models in which some agents are not fully
rational” (Barberis and Thaler, 2003). It states that the market
is exuberant rather than rational and that arbitrage may not always
offset shocks to the market. Figure 5 shows empirical evidence
that assets demonstrate high volatility much more frequently
than expected. These instances of irrationality are of concern
to market participants. From figure 4 it can be seen that
volatility on FOMC days regularly exceeds expectation. We infer
that the market is acting exuberantly and turn to behavioural
finance theory to see if it can explain the activity.
Figure 5 (reproduced from Khurshid Ahmad's Behavioural Finance lectures)
One important observation in behavioural finance is
framing. Entman (1993) summarises framing as “selecting some
aspects of perceived reality and make them more salient in the
communicating text, in such a way as to promote a particular
problem definition, causal interpretation, moral evaluation
and/or treatment recommendation for the item described”. We
hypothesise that the framing of information about the FOMC
announcement can steer uncertain speculators
towards a certain bias. This type of framing is called attribute
framing. Panasiak and Terry (2013) say of attribute framing that “an
event can receive different reviews when it is framed in a
positive vs negative light”. To test the hypothesis that positive
or negative framing can sway the bias, and consequently the
activity, of the market, we evaluate the sentiment of major
worldwide publications when reviewing the FOMC meetings. If
there is correlation between market movement and the
sentiment over a long period of time, our hypothesis is
supported.
2.4 Conclusion
The goal is to model mean reversion of the market over
a long period and also to analyse sentiment's effect on market
movement. How do we achieve this? By using regression analysis,
a branch of statistical modelling which
aims to estimate the relationship between variables.
In order to model mean reversion the autoregressive area
of regression analysis was researched. Autoregressive models
of time series estimate the effect of previous values on future
values of a variable. Using the definition of mean reversion
that a positive change will be followed by a negative one (and
vice versa) we see it is necessary to model the relationship
between price changes and their prior price change. Therefore
autoregression is a suitable model for testing for mean
reversion.
Modelling the influence of sentiment on price movement
involves the simultaneous analysis of multiple variables,
termed multivariate analysis. The vector autoregressive model
explains an endogenous variable by a range of exogenous
variables. In the words of Del Negro and Schorfheide (2011), “at
first glance, VARs appear to be straightforward multivariate
generalisations of univariate autoregressive models. At second
sight, they turn out to be one of the key empirical tools in
modern macroeconomics”. The power of vector autoregressive
models comes from the ability to model seemingly unrelated
variables and determine their interdependencies. Vector
autoregression is chosen to model the correlation between
sentiment and price change.
3 Method:
3.1 Stochastic Processes & Financial Series
A stochastic process is a sequence of random variables,
$\{X_t\}$, indexed by $t$, where $t$ usually ranges over a subset, $T$, of time $[0, \infty)$.
Many natural processes are modelled as stochastic due to their
random behaviour. Since the closing price of an asset
tomorrow, $P_{t+1}$, cannot be predicted today, we regard $P_{t+1}$ as a
random variable (Taylor, 2005). The set of prices, $\{P_t\}$, can
then be thought of as a set of random variables or a realisation
of a stochastic process.
Price changes can occur at any point on the time scale
during the trading day; therefore $P_t$ is a function of continuous
time. The financial time series analysed here are sampled
at regular time intervals and so are discrete stochastic
processes. Discretising the data makes for easier computation
and analysis of behaviour over specific periods.
3.2 Stationarity & Returns
Stationarity describes a property of the process to achieve a
certain state of statistical equilibrium so that the distribution of
the process does not change much (Rachev et al, 2007). Put
more simply, a stationary series can be defined as one with a
constant mean, constant variance and constant
autocovariances for each given lag.
The probability distribution of a financial time series
depends heavily on the time period, as prices naturally rise
due to inflation. The mean and standard deviation of the series
over a long period of time will not give an accurate
representation of the series' behaviour over the period. To
achieve stationarity of our series we find price returns. Price
returns are the change in price over a time period. The formula
for log returns, denoted by $r_t$, is given by Ruppert and
Matteson (2011) as:
$$r_t = \log(1 + R_t) = \log\left(\frac{P_t}{P_{t-1}}\right)$$
where $R_t$ is the net return, $R_t = (P_t / P_{t-1}) - 1$.
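As a minimal sketch of the return calculation above, the following Python snippet computes net and log returns from a list of prices. The project itself used R; Python, the function names and the price values here are assumptions made purely for illustration and are not taken from the project's data or code.

```python
import math

def net_returns(prices):
    """Net returns R_t = P_t / P_{t-1} - 1 for consecutive prices."""
    return [p_now / p_prev - 1.0 for p_prev, p_now in zip(prices, prices[1:])]

def log_returns(prices):
    """Log returns r_t = log(P_t / P_{t-1}) for consecutive prices."""
    return [math.log(p_now / p_prev) for p_prev, p_now in zip(prices, prices[1:])]

# Hypothetical closing prices, for illustration only (not project data).
prices = [100.0, 101.0, 99.5, 100.2]
R = net_returns(prices)
r = log_returns(prices)

# Each log return equals log(1 + net return), as in the formula above.
for R_t, r_t in zip(R, r):
    assert abs(r_t - math.log(1.0 + R_t)) < 1e-12
```

One convenient property of log returns is that they add across periods: the sum of the series equals the log of the last price over the first.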
Taking the return instead of the raw price gives the series
an approximately constant mean and constant
variance over time. Figure 6 shows the EUR/GBP exchange
rate, from 2013 to 2016, in orange and the return series
generated from it in blue. Note how the EUR/GBP rate has
been detrended to a constant level over time in the return
series. The probability distribution of the return series gives a
more accurate indication of market behaviour than the raw
series distribution.
The expected mean of the return distribution is 0 with some
constant variance $\sigma^2$.
Figure 6
From here on, any mention of the economic indicator series
refers to their return series.
3.3 Stylised Facts & Summary Statistics
Stylised facts are “general properties that are expected to
be present in any set of returns” and “are pervasive across
time as well as across markets” (Taylor, 2005). One important
stylised fact Taylor states is that “the distribution of returns is not
normal”. The assumption of normality of returns is important for
many financial techniques, so the returns distribution is
analysed.
The summary statistics mean ($\bar{r}$), standard deviation ($s$),
skewness ($b$) and kurtosis ($k$) are used to describe the
characteristics of a distribution. They are defined, for a set of
$n$ returns, to be:
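$$\bar{r} = \frac{1}{n} \sum_{t=1}^{n} r_t, \qquad s^2 = \frac{1}{n-1} \sum_{t=1}^{n} (r_t - \bar{r})^2,$$
$$b = \frac{1}{n-1} \sum_{t=1}^{n} \frac{(r_t - \bar{r})^3}{s^3}, \qquad k = \frac{1}{n-1} \sum_{t=1}^{n} \frac{(r_t - \bar{r})^4}{s^4}.$$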
The summary statistics, which are based on the moments of the data, are
used to assess how close the distribution of returns is to a
normal distribution.
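The four definitions above can be sketched directly in code. This is an illustrative Python version under the same $1/(n-1)$ convention used in the formulas; the project computed its statistics in R, and the function name and sample values below are hypothetical.

```python
import math

def summary_stats(returns):
    """Mean, standard deviation, skewness and kurtosis of a return series,
    using the 1/(n-1) scaling from the definitions in the text."""
    n = len(returns)
    mean = sum(returns) / n
    variance = sum((r - mean) ** 2 for r in returns) / (n - 1)
    s = math.sqrt(variance)
    skewness = sum((r - mean) ** 3 for r in returns) / ((n - 1) * s ** 3)
    kurtosis = sum((r - mean) ** 4 for r in returns) / ((n - 1) * s ** 4)
    return mean, s, skewness, kurtosis

# Hypothetical daily returns, for illustration only.
sample = [0.004, -0.012, 0.007, 0.001, -0.003, 0.009, -0.006]
mean, s, b, k = summary_stats(sample)
print(f"mean={mean:.5f} sd={s:.5f} skewness={b:.3f} kurtosis={k:.3f}")
```

A symmetric sample gives a skewness of zero, which is a quick sanity check on any implementation of these formulas.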
The mean, the first moment, and the standard deviation, the square
root of the variance (the second central moment), are elementary
measures and it is assumed the reader understands them
already. Briefly, the mean indicates the central
tendency of the distribution and the standard deviation
reveals the dispersion of data points. The standard deviation is
also important for standardising the distribution using z-scores,
particularly in multivariate analysis.
Skewness is a measure of the asymmetry of the distribution
about the central tendency. Outliers produce skewed
distributions. A visual display of skewness measurement is
shown below in figure 7.
Figure 7
Kurtosis is the relative concentration of scores in the center,
the upper and lower ends (tails), and the shoulders (between
the center and the tails) of a distribution (Norusis, 1994).
Kurtosis measures how peaked a distribution is. In a normal
distribution kurtosis is equal to three. To compare a
distribution's kurtosis to the normal, the “excess kurtosis” is
found by subtracting three from the measured kurtosis. A
distribution is called leptokurtic if the excess kurtosis is
positive, mesokurtic if there is no excess kurtosis and
platykurtic if the excess kurtosis is negative. A visual
representation of kurtosis is given in figure 8.
Figure 8
A final summary statistic, the z-statistic, defined as:
$$z = \frac{\bar{r}}{s / \sqrt{n}}$$
is used to “assess the null hypothesis that the expected return
is zero” (Taylor, 2005).
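The z-statistic and the excess-kurtosis classification described in this section are simple enough to sketch in a few lines. The Python below is an illustration only; the function names and input numbers are hypothetical and the project's own calculations were done in R.

```python
import math

def z_statistic(mean, sd, n):
    """z = mean / (sd / sqrt(n)), testing a zero expected return."""
    return mean / (sd / math.sqrt(n))

def classify_kurtosis(kurtosis, tol=1e-9):
    """Label a distribution by its excess kurtosis (kurtosis minus three)."""
    excess = kurtosis - 3.0
    if excess > tol:
        return "leptokurtic"
    if excess < -tol:
        return "platykurtic"
    return "mesokurtic"

# Hypothetical inputs: a mean return of 0.05%, a standard deviation of 1%
# and 1600 daily observations.
z = z_statistic(0.0005, 0.01, 1600)
print(z, classify_kurtosis(6.56))
```

Under the usual normal approximation, a value of $|z|$ above roughly 1.96 would lead to rejecting the zero-mean hypothesis at the 5% level.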
3.4 Linear Regression
Regression analysis is an area of statistics which aims to
model the effect of a given set of explanatory random variables
$x = \{x_1, \ldots, x_k\}$, also called regressors, on a variable of primary
interest $y$. “A main characteristic of regression models is that
the relationship between the response variable y [and x] is not a
deterministic function f (x) of x (as often is the case in
classical physics), but rather shows random errors” (Fahrmeir
et al, 2013).
Linear regression methods estimate the relationship between
y and x by modelling the best fitting linear relationship between
the response and explanatory variables. The ordinary least
squares method is a popular technique to model the best linear
fit and will be discussed shortly.
Linear regression models are composed of a “systematic (or
deterministic) component, $\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$, and an
idiosyncratic (or stochastic) component, $\varepsilon$” (Miller, 2014). The
deterministic part consists of the vertical axis intercept $\beta_0$ and a
summation of the regressors weighted by a set of matching
regression coefficients $\{\beta_1, \ldots, \beta_k\}$. The regression coefficients
are unknown parameters which weight how much effect each
variable in the set $x$ has on $y$. “More precisely, $\beta_i$ is the partial
derivative of the expected response with respect to the ith
regressor” (Ruppert and Matteson, 2011). The stochastic
component is an error (or residual) term, $\varepsilon$, which accounts
for the distance between the line and the data points.
The linear regression model equation is:
$$y = \beta_0 + \sum_{i=1}^{k} \beta_i x_i + \varepsilon$$
Figure 9 shows a plot of the response of y to a variable x
and a line fitted as a linear response of y to x. The lengths of
the thin lines connecting the points above and below the line to it
are the error terms for the points.
Figure 9
The unknown regression coefficients are estimated by the
ordinary least squares (OLS) method. Once these are
determined they are substituted back into the equation above to
give the OLS linear regression model.
3.5 Ordinary Least Squares
The OLS method estimates the optimal regression
coefficients (the intercept and the slopes) by minimising
the difference between the observed response variable data
points, $y_i$, and their linearly predicted values, $y_i - \varepsilon_i$.
Wooldridge (2000) gives a good description of OLS and so to
detail the process we follow his explanation. Paraphrasing his
discussion of a two-regressor system to a general
system, he states “given n observations on $y, x_1, x_2, \ldots, x_k$,
$\{(x_{i1}, x_{i2}, \ldots, x_{ik}, y_i) : i = 1, 2, \ldots, n\}$, the estimates of $\beta$, $\{\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k\}$,
are chosen simultaneously to make:
$$SSE = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \cdots - \hat{\beta}_k x_{ik})^2 = \sum_{i=1}^{n} \varepsilon_i^2$$
as small as possible.” The residuals are squared so that
positive and negative values do not cancel each other out. That is, for all
observation points $i = 1, \ldots, n$ the
squared error terms ($\varepsilon_i = y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ij}$, from the linear
regression model equation) are summed so that the
minimising solution can be found.
Multivariable calculus is used to reduce this minimisation
problem to a system of $k+1$ linear equations in the $k+1$ unknowns
$\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k$. We want to find the critical points of the SSE
equation in order to minimise it. Taking the first partial
derivative of the equation with respect to each of the $\hat{\beta}_j$,
evaluating them at the solutions, and setting them equal to
zero gives:
$$-2 \sum_{i=1}^{n} x_{ij} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \hat{\beta}_2 x_{i2} - \cdots - \hat{\beta}_k x_{ik}) = 0, \quad \text{for all } j = 0, \ldots, k,$$
where $x_{i0} = 1$ for the intercept. Cancel the $-2$ and we have the desired system of linear
equations. This system, whose sums run over all n observations
and which is called the OLS first order conditions, can be solved very
quickly through standard linear equation methods by R, the
statistics software used in this project.
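To make the first order conditions concrete, the sketch below builds the $k+1$ linear equations and solves them by Gaussian elimination. It is an illustrative Python version written for this explanation, not the routine R uses internally, and the function names and data are hypothetical.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: regressors are collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def ols_fit(X, y):
    """Return [b0, b1, ..., bk] solving the OLS first order conditions.
    X is a list of observation rows without the intercept column."""
    rows = [[1.0] + list(row) for row in X]  # x_i0 = 1 for the intercept
    k1 = len(rows[0])
    XtX = [[sum(r[a] * r[b] for r in rows) for b in range(k1)] for a in range(k1)]
    Xty = [sum(r[a] * y_i for r, y_i in zip(rows, y)) for a in range(k1)]
    return solve_linear_system(XtX, Xty)

# Hypothetical data generated exactly from y = 1 + 2*x1 - 3*x2.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, -2, 0, 2]
beta = ols_fit(X, y)
print(beta)
```

Because the example data lie exactly on a plane, the fitted coefficients should match the generating ones up to floating-point rounding. Statistical software typically prefers a more numerically stable factorisation over solving the normal equations directly, so this is best read as a teaching sketch.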
3.6 OLS Assumptions
OLS will produce regression coefficients for any
arbitrary response variable and set of explanatory variables.
However, in order to uniquely determine the regression
parameters, and be confident of the inferences we make based
on the model, we need to make assumptions about our
variables. Per Miller (2014) these assumptions are:
A1: The relationship between the regressor and the
regressand is linear.
A2: $E[\varepsilon \mid x] = 0$
A3: $\mathrm{Var}[\varepsilon \mid x] = \sigma^2$
A4: $\mathrm{Cov}[\varepsilon_i, \varepsilon_j] = 0 \;\; \forall \; i \neq j$
A5: $\varepsilon_i \sim N(0, \sigma^2) \;\; \forall \; \varepsilon_i$
A6: The regressor is nonstochastic.
It is important to test the model against these assumptions to
be satisfied that it is valid. The images in this section are sourced
from a tutorial on R-bloggers.com called “Graphic Analysis of
Regression Assumptions”.
Assumption 1 should be self-explanatory. To linearly model y
against x there must be a linear relationship between them. To
test this assumption the residuals are plotted against the
values predicted by the model. If the graph shows an even
spread of data points about the x-axis then the linearity
assumption is met. Figure 10 shows an even spread and thus
linearity is supported.
Figure 10
Assumption 2 states that the expected value of the model's
error terms should be zero. This refers to the fact that a
well-fitted line will have residuals distributed evenly above
and below the line, leading to a mean value of zero. The mean
value of the model's residuals is computed to test this.
Assumptions 3 and 4 are sometimes grouped together. They
state that the error terms must have constant variance (3) and be
uncorrelated (4). If both hold, the error terms are called
spherical errors. Constant variance, also called
homoscedasticity, assumes that the variance of the error terms
does not change over time. To test this we look at the same
plot as for assumption 1 to see if the vertical spread of the
error terms grows consistently in either direction. If it does not,
then we have homoscedasticity. Figure 11 shows
homoscedasticity. Note how the variance rises and falls but
does not do so persistently.
Figure 11
Error terms should be completely random; correlation
among the terms means the OLS process made a systematic
error in fitting the line. The autocorrelation function is run on
the residuals to determine if there is correlation. If
autocorrelation, or serial correlation, is found this can undermine
trust in the model. The Durbin-Watson statistic, d, is used to
test the significance of this autocorrelation and consequently
the reliability of the model.
$$d = \frac{\sum_{t=2}^{N} (\varepsilon_t - \varepsilon_{t-1})^2}{\sum_{t=1}^{N} \varepsilon_t^2}$$
The value of d always lies between 0 and 4. If d is 2 there is
no autocorrelation. Values below 2 imply positive
autocorrelation (successive error terms are close in value to
one another) and values above 2 indicate negative
autocorrelation (successive error terms tend to alternate in sign).
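The Durbin-Watson statistic follows directly from its definition. The Python sketch below is for illustration only (the project ran the test in R) and the residual values are hypothetical, chosen to show the two extremes of the statistic.

```python
def durbin_watson(residuals):
    """d = sum of squared successive differences / sum of squared residuals."""
    numerator = sum((e_now - e_prev) ** 2
                    for e_prev, e_now in zip(residuals, residuals[1:]))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residuals that flip sign every period: negative autocorrelation.
alternating = [1.0, -1.0, 1.0, -1.0]
# Hypothetical residuals that never change: extreme positive autocorrelation.
persistent = [1.0, 1.0, 1.0, 1.0]

print(durbin_watson(alternating), durbin_watson(persistent))
```

The alternating series gives a value above 2 and the persistent series gives a value below 2, matching the interpretation given in the text.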
Assumption 5 says that the residuals must be normally
distributed. However, as Miller states, “many of the results of
the OLS model are true, regardless of this assumption.” The
assumption is therefore mostly useful for defining confidence
levels for the model parameters. A probability distribution of
the errors is graphed and analysed to determine the closeness
to a normal distribution. Figure 12 shows this comparison.
Figure 12
The summary statistics could also be useful here; they can
help in comparing the distribution to a normal
distribution. We omit the summary statistics for the error terms
and instead present the distribution visually. The less similar
the error terms' distribution is to the normal distribution, the less
reliable inference from the OLS model will be.
From a practical standpoint the OLS model holds true
regardless of the 6th assumption, so it will not be discussed
here nor will we test for it.
3.7 Autoregression
An autoregression model, notation AR(p), is a form of linear
regression model where the set of regressor variables is p lags
of the response variable. The equation for an AR(p) model is:
$$y_t = \beta_0 + \sum_{i=1}^{p} \beta_i y_{t-i} + \varepsilon_t$$
We wish to use the autoregressive model to detect whether mean
reversion occurred; therefore a lag of 1 is chosen for our model.
Taking the first lag captures the effect on a time period, t, of the
period before. The AR(1) model is:
$$y_t = \beta_0 + \beta_1 y_{t-1} + \varepsilon_t$$
The regression coefficient $\beta_1$ determines whether there is mean
reversion. Rouzet (2010) states that if $|\beta_1| < 1$ in an AR(1), the
process is mean reverting. This can be seen by thinking of a
realisation of $y_{t-1}$ as a non-zero number: the $\beta_1$
coefficient will then “shrink” $y_t$ towards our mean of zero. There is
then an inverse relationship between $|\beta_1|$ and mean reversion:
the smaller the absolute value of $\beta_1$, the stronger the reversion.
If $\beta_1$ is negative then we can say a positive change is
usually followed by a negative one.
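As an illustration of the AR(1) reasoning above, the sketch below simulates a series with a known $\beta_1$ and recovers it by regressing $y_t$ on $y_{t-1}$ with the closed-form simple regression solution. It is written in Python for illustration; the project fitted its models in R, and the function names, seed and parameter values are assumptions.

```python
import random

def simulate_ar1(beta0, beta1, n, sigma=1.0, seed=0):
    """Generate n values of y_t = beta0 + beta1 * y_{t-1} + eps_t, y_0 = 0."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n - 1):
        y.append(beta0 + beta1 * y[-1] + rng.gauss(0.0, sigma))
    return y

def fit_ar1(series):
    """OLS estimates (beta0, beta1) from regressing y_t on y_{t-1}."""
    x, y = series[:-1], series[1:]
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    beta1 = s_xy / s_xx
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1

def is_mean_reverting(beta1):
    """Mean reversion condition for an AR(1): |beta1| < 1."""
    return abs(beta1) < 1.0

# Hypothetical return-like series with a negative first-lag coefficient.
series = simulate_ar1(beta0=0.0, beta1=-0.3, n=5000, sigma=0.01, seed=42)
b0_hat, b1_hat = fit_ar1(series)
print(b0_hat, b1_hat, is_mean_reverting(b1_hat))
```

With a long simulated series the estimate of $\beta_1$ should land close to the value used to generate it, although the exact figure depends on the random draws.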
3.8 Vector Autoregression
The vector autoregressive model, notation VAR(p), is a
multivariate generalisation of the autoregressive model. It
extends the set of regressors from the lags of the
dependent variable to include the lags of exogenous variables
as well. The equation for a VAR(p)
model with k variables:
$$y_t = \beta_0 + \sum_{i=1}^{p} \beta_i y_{t-i} + \varepsilon_t$$
has the same form as the AR(p) model except that $\beta_0$, $\varepsilon_t$, and each
$y_{t-i}$ is a vector of length k and each $\beta_i$ is a $k \times k$ matrix.
To illustrate, the general example of a VAR(1) in 2 variables is
given:
$$\begin{bmatrix} y_{1,t} \\ y_{2,t} \end{bmatrix} = \begin{bmatrix} \beta_{1,0} \\ \beta_{2,0} \end{bmatrix} + \begin{bmatrix} \beta_{1,1} & \beta_{1,2} \\ \beta_{2,1} & \beta_{2,2} \end{bmatrix} \begin{bmatrix} y_{1,t-1} \\ y_{2,t-1} \end{bmatrix} + \begin{bmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \end{bmatrix}$$
Most importantly for us, regression coefficients are found for
the lags of both the dependent and the exogenous variables, so that the
effect of changes in the exogenous variables on the
dependent variable can be seen.
To test the effect of sentiment on our series we model the
indicators as the dependent variables and the sentiment as
regressor variables.
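A two-variable VAR(1) of the form shown above can be estimated one equation at a time, since each row is an ordinary regression on the same lagged values. The Python sketch below illustrates this with a closed-form two-regressor OLS; it is an assumption-laden teaching example rather than the project's R code, and the series and coefficient values are hypothetical and noise-free so the fit can be checked against them.

```python
def _mean(values):
    return sum(values) / len(values)

def ols_two_regressors(x1, x2, y):
    """OLS fit of y = b0 + b1*x1 + b2*x2 using demeaned normal equations."""
    m1, m2, my = _mean(x1), _mean(x2), _mean(y)
    d1 = [a - m1 for a in x1]
    d2 = [a - m2 for a in x2]
    dy = [a - my for a in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * b for a, b in zip(d1, dy))
    s2y = sum(a * b for a, b in zip(d2, dy))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2

def fit_var1(y1, y2):
    """Fit a 2-variable VAR(1): each variable regressed on both first lags."""
    lag1, lag2 = y1[:-1], y2[:-1]
    equation_1 = ols_two_regressors(lag1, lag2, y1[1:])
    equation_2 = ols_two_regressors(lag1, lag2, y2[1:])
    return equation_1, equation_2

# Hypothetical noise-free series generated from known coefficients.
returns, sentiment = [1.0], [2.0]
for _ in range(8):
    r_prev, s_prev = returns[-1], sentiment[-1]
    returns.append(0.1 + 0.5 * r_prev + 0.2 * s_prev)
    sentiment.append(-0.2 - 0.3 * r_prev + 0.4 * s_prev)

eq_returns, eq_sentiment = fit_var1(returns, sentiment)
print(eq_returns, eq_sentiment)
```

In the first fitted equation, the third coefficient is the one of interest for this project: it measures how the previous period's sentiment relates to the current return.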
3.9 LexisNexis Corpus
Nesselhauf (2005) defines a corpus as “a systematic
collection of naturally occurring texts (of both written and
spoken language).” A corpus of news articles is created to
analyse the sentiment and try to determine if the framing in these
articles follows market movement.
Figure 13
LexisNexis is a provider of legal, government, business and
high-tech information sources. TCD students are allowed free
use of the LexisNexis database. To build a corpus specific to
the project, articles were filtered based on the search shown in
figure 13.
3.10 Sentiment Analysis
Sentiment from the corpus was analysed using the
Rocksteady program. Rocksteady is a text analytics system
created in Trinity College Dublin by Khurshid Ahmad and his
postgraduate students.
Rocksteady uses a bag of words approach; it breaks the
corpus down into the words constituting it, regardless of order,
and compares them against a specialised dictionary. The
dictionary contains a weighting for the sentiment inherent in
each word. A z-score based on this weighting is computed for
each type of sentiment expressed in a daily aggregation of
articles. A z-score, or standard score, indicates how many
standard deviations a raw score is from the mean. In the
context of Rocksteady it indicates how much stronger a
sentiment expressed over a day is compared to normal.
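The bag of words idea and the daily z-score can be sketched as follows. This Python snippet does not reproduce Rocksteady: the word list, the scoring rule (share of negative words per day) and the example texts are all hypothetical stand-ins chosen to show the general technique.

```python
import math
import re

# Hypothetical mini-dictionary; Rocksteady's real dictionary is not shown here.
NEGATIVE_WORDS = {"crisis", "fall", "loss", "fear", "weak"}

def negative_share(text):
    """Fraction of words in the text that appear in the negative word list.
    Word order is ignored, as in a bag of words approach."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return sum(word in NEGATIVE_WORDS for word in words) / len(words)

def z_scores(values):
    """Standard scores: distance from the mean in units of standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

# Hypothetical daily aggregations of article text.
daily_texts = [
    "markets fear a crisis",
    "stocks rise on strong data",
    "calm trading day",
]
shares = [negative_share(text) for text in daily_texts]
daily_z = z_scores(shares)
print(shares, daily_z)
```

A day whose z-score is well above zero carries more negative wording than is typical for the corpus, which is the kind of signal the red and yellow highlights in a sentiment table are meant to flag.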
Figure 14
Rocksteady analyses positive, negative, active, passive,
strong, weak, economic, political and militant sentiment. Of
interest to the project are the first two types of sentiment.
Figure 14 shows a sample section of the Rocksteady output for
the project's corpus. Red boxes indicate extremely high levels
of a sentiment expressed that day and yellow boxes show
moderately high sentiment.
3.11 Tableau
Tableau is an emerging standard for data visualisation. It
offers a variety of options for graphing, enabling users to
display information as intuitively as possible. It also integrates
with the R programming environment using Rserve to open
communication between the programs. This combines the power
and flexibility available through R to compute advanced
statistical processes with the ease of Tableau's visualisation
process. The original images Figure 3, Figure 4 and Figure 6
in this report were created in Tableau. Two examples
of the Tableau interface are provided. The data loading
procedure, showing a left join of two datasets, is shown in
figure 15, while the graphing process is shown in figure 16.
Figure 15
Figure 16
4 Case Study & Results
4.1 Stylised facts & Summary Statistics
The summary statistics of the DXY, S&P 500 and 10-year Treasury
bond (T10) returns for the period from June 2009 to the present
day are presented in the table below.
Indicator   Mean Return (*10^4)   Standard Deviation (*10^2)   Skewness   Excess Kurtosis   Z statistic
T10         -3.89                 2.27                          0.15      0.81              -0.71
DXY          0.59                 0.45                         -0.03      1.54               0.54
S&P500       4.49                 1.02                         -0.42      3.56               1.83
All three series were leptokurtic, meaning the distributions
were quite peaked. The DXY and the S&P 500 have slight
negative skew about their means. Given a positive mean with
negative skew we can infer that daily returns over the period
were more likely to be positive. The T10 conversely had a
positive skew and negative mean. This means there was a
greater portion of negative returns over the period.
The standardised distributions of the T10 and the S&P 500
are given in figures 17 and 18, with a normal distribution curve
overlaid, to give context to the summary statistics.
Figure 17
Figure 18
Note in particular how peaked the S&P 500 distribution is.
Standardised distributions are graphed in units of standard
deviation on the x-axis. We can see that, unlike the T10, the
S&P 500 has extreme outliers beyond 4 standard deviations,
which cause this.
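Standardising a series and counting its extreme observations can be sketched as below. This is an illustrative Python example on made-up data; the threshold of 4 standard deviations follows the text, while the function names are hypothetical.

```python
import statistics


def standardise(returns):
    """Convert returns to units of standard deviation from the mean."""
    mean = statistics.fmean(returns)
    sd = statistics.stdev(returns)
    return [(r - mean) / sd for r in returns]


def count_outliers(standardised, threshold=4.0):
    """Count observations further than `threshold` standard deviations out."""
    return sum(1 for z in standardised if abs(z) > threshold)


# Made-up series: many small moves plus one extreme move.
sample = [0.001, -0.001] * 50 + [0.05]
z_scores = standardise(sample)
extreme = count_outliers(z_scores)
```

Under a normal distribution, observations beyond 4 standard deviations are very rare, so finding several in a sample is a sign of heavy tails.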
4.2 Autoregression
Assumptions Tests:
Plots of the standardised residuals against fitted values
for each model are shown in Figures 19, 20 and 21. These are
used for testing assumptions 1 and 3. A table containing the
mean residual value for each model is also provided for
testing assumption 2. To validate assumption 4, plots of the
autocorrelation of residuals are given in Figures 22, 23 and 24.
Assumption 1: Linearity. In each model there is an even
spread of standardised residuals about the fitted values.
Therefore the assumption that a linear relationship exists
between the response and regressor variables holds for each
model.
Figure 19 – DXY model
Assumption 2: Zero mean of errors. As we can see from the
table, the mean, or expected, values of the error terms are all
extremely small numbers. They are close enough to zero to
meet this assumption.
Figure 20 – S&P 500 Model
Model         T10          DXY         S&P 500
E[ ε | x ]    -4.18E-19    2.22E-17    -9.13E-19
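The mean residual check can be sketched with a first-order autoregression fitted by ordinary least squares. This is an illustrative Python version on made-up returns, not the project's R code. When the model includes an intercept, OLS residuals average to zero by construction, so values of the order of 1E-17 reflect floating-point rounding.

```python
def fit_ar1(returns):
    """OLS fit of r_t = a + b * r_{t-1} + e_t; returns (a, b, residuals)."""
    x = returns[:-1]  # lagged returns
    y = returns[1:]   # current returns
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return a, b, residuals


# Made-up returns, for illustration only.
sample = [0.004, -0.002, 0.001, -0.015, 0.003, 0.002, -0.001, 0.005, -0.004]
a, b, residuals = fit_ar1(sample)
mean_residual = sum(residuals) / len(residuals)
```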
Figure 21 – T10 Model
Assumption 3: Constant variance. There is no directional
growth of variance in any of the figures. The models are
homoscedastic, so assumption 3 is met.
Figure 22 – ACF of DXY model residuals
Figure 23 – ACF of S&P500 model residuals
Assumption 4: No autocorrelation of errors. As we can see
from the autocorrelation plots, there is persistent serial
correlation across the lags of each model. The correlation
between the errors and certain lagged errors, in each model, is
large enough to suggest that autocorrelation may be a problem.
The Durbin-Watson test was run on each model to determine the
statistical significance of this. The results are presented in
the Durbin-Watson table later in this section.
Figure 24 – ACF of T10 Model Residuals
Comparing the Durbin-Watson test statistics to 2, the value
expected when there is no first-order autocorrelation, we can
see that the autocorrelation is statistically insignificant.
Assumption 4 is validated.
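The Durbin-Watson statistic itself is simple to compute from a residual series. The sketch below is an illustrative Python version on made-up residuals. Values near 2 indicate no first-order autocorrelation, values towards 0 indicate positive autocorrelation and values towards 4 indicate negative autocorrelation.

```python
def durbin_watson(residuals):
    """DW = sum of squared successive differences / sum of squared residuals."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Made-up residual series, for illustration only.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # strong negative autocorrelation
trending = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]     # strong positive autocorrelation

dw_alternating = durbin_watson(alternating)
dw_trending = durbin_watson(trending)
```

The alternating series gives a statistic well above 2 and the trending series gives one well below 2, which is the pattern the test relies on.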
Assumption 5: Normality of errors. Though this
assumption does not need to be fully met in order for the model
to be valid, it does affect the confidence we can have in the
model. The distribution of errors for the T10 model is shown
below compared to a normal distribution. There is a very close
fit to the normal distribution, and so there can be confidence
in the results of the model. The distributions for the other
models are very similar and have been omitted.
Figure 25
Model      DXY      S&P 500   T10
DW-test    1.999    1.996     1.998
Results:
Indicator   Regression Coefficient   Reversion
T10         -0.0222                  Yes
DXY         -0.0433                  Yes
S&P 500     -0.0614                  Yes
All three series displayed mean reversion from one day to
the next. The fractional coefficients show that the magnitude
of returns shrank towards zero from day to day. Also of interest
is the negative sign of the coefficients: returns on consecutive
days tended to be in opposite directions.
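As a worked illustration of what these coefficients imply, the sketch below projects the expected next-day return after a hypothetical +1% daily return. It uses the coefficients reported in the results table above and assumes a zero intercept, which is a simplification made only for this example.

```python
# Coefficients reported in the results table above.
coefficients = {"T10": -0.0222, "DXY": -0.0433, "S&P 500": -0.0614}


def expected_next_return(today_return, coefficient, intercept=0.0):
    """One-step AR(1) projection; the zero intercept is an assumption."""
    return intercept + coefficient * today_return


today = 0.01  # a hypothetical +1% daily return
projection = {name: expected_next_return(today, b)
              for name, b in coefficients.items()}
```

For the S&P 500, a +1% day implies an expected move of about -0.06% the next day: small in size and opposite in sign.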
4.3 Vector Autoregression
The vector autoregressive model was run between
negative/positive sentiment and the three financial time series.
The estimated coefficients measure the effect of lagged
sentiment on each series. The t value of each coefficient is
the ratio of the estimated coefficient to its standard error,
that is, the distance between the estimate and the hypothesised
value of zero, measured in standard errors. The p value, denoted
by Pr(>|t|) in the images, is the probability of observing a t
value at least this extreme if the true coefficient were zero.
A small p value indicates that the coefficient is statistically
significant. Significance in this case means that a change in
the regressor variable, sentiment, had a detectable effect on
the response variable, the financial indicators.
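The t value and p value can be sketched for a single lagged regressor as follows. This Python example is a simplification: a full VAR equation includes several lags of every variable, and the p value here uses a normal approximation rather than the t distribution. The function name and the sample data are hypothetical.

```python
import math
from statistics import NormalDist


def lagged_regression(y, x, lag=1):
    """OLS of y_t on x_{t-lag} with an intercept; returns slope, t and p."""
    xs = x[:-lag]
    ys = y[lag:]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(xs, ys)]
    # Residual variance with n - 2 degrees of freedom (two fitted parameters).
    sigma2 = sum(e ** 2 for e in residuals) / (n - 2)
    std_error = math.sqrt(sigma2 / sxx)
    t_value = slope / std_error
    # Two-sided p value using a normal approximation to the t distribution.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(t_value)))
    return slope, t_value, p_value


# Made-up sentiment scores and returns, for illustration only.
sentiment = [0.02, 0.03, 0.01, 0.04, 0.02, 0.05, 0.03, 0.01]
returns = [0.001, -0.002, 0.003, -0.001, 0.002, -0.003, 0.004, 0.000]
slope, t_value, p_value = lagged_regression(returns, sentiment, lag=1)
```

A coefficient is usually called significant when its p value falls below a chosen threshold such as 0.05.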
Assumptions Tests:
For the sake of brevity we present the assumption test
results for the VAR model run on the DXY and positive
sentiment only. The tests for the other models are omitted, as
their results were very similar to each other and to the
results in the autoregression section.
Figure 26 shows the model to be linear with constant
variance. The autocorrelation plot in Figure 27 shows small
serial correlation; however, the Durbin-Watson test result of
1.999 renders this insignificant. The histogram of the residuals
in Figure 28 contains some negative skew but reflects a normal
distribution fairly well. The mean value of the errors was
-1.03e-21.
Figure 26
Figure 27
Figure 28
Results:
First the DXY:
Figure 29 – DXY and Negative Sentiment
Figure 30 – DXY and Positive Sentiment
The DXY shows no significant correlation with either
positive or negative sentiment.
Secondly the T10:
Figure 31 – T10 and Negative Sentiment
Figure 32 – T10 and Positive Sentiment
The T10 shows some effect from the negative sentiment of
articles 5 days prior. This is small, though, and may be an
artefact of the modelling.
Finally the S&P500:
Figure 33 – S&P 500 and Negative Sentiment
Figure 34 – S&P 500 and Positive Sentiment
There is no significant correlation between sentiment
expressed about the FOMC meetings and the S&P500 either.
The hypothesis that major publications can sway bias by
attribute framing, and so influence market movement, is not
supported under the parameters of this experiment.
5 Conclusion & Future Work
5.1 Work Completed
The project involved analysing sentiment in financial
markets using regression analysis. Extensive background
research into finance was necessary in order to create and
understand a context to analyse. Concurrently, statistical
methods were studied. Once the more basic statistical measures
were understood, work turned to understanding regression
analysis, particularly autoregression and vector autoregression.
As these methods produce results regardless of context, a deep
understanding of their properties was vital to ensure the
models produced accurate results. Throughout the project the R
programming language was learned in order to apply the
statistical methods to large data sets. Once the models were
created in R, property tests were applied to validate them. The
result was the modelling of mean reversion and sentiment in
financial markets.
5.2 Future Work
Reassessing the filters for the corpus may reveal better
results for the VAR models. LexisNexis limited corpus
downloads to 500 articles. Building a larger, more selective
corpus out of several of these limited corpora would also
benefit the project and would be attempted if more time were
available. Additionally, volatility clustering was observed in
Figure 4. Further modelling of this through a GARCH model would
be desirable.
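As a rough indication of what such a model involves, the sketch below computes the conditional variance path of a GARCH(1,1) model in Python. The parameter values and returns are made up for illustration; in practice the parameters would be estimated from the data, typically by maximum likelihood, which is not shown here.

```python
def garch_variance(returns, omega, alpha, beta):
    """Conditional variance path of a GARCH(1,1) model for given parameters."""
    n = len(returns)
    mean = sum(returns) / n
    # Start the recursion from the sample variance of the series.
    variance = [sum((r - mean) ** 2 for r in returns) / n]
    for t in range(1, n):
        shock = returns[t - 1] - mean
        variance.append(omega + alpha * shock ** 2 + beta * variance[t - 1])
    return variance


# Made-up returns and parameters; real parameters would be estimated from data.
sample = [0.001, -0.002, 0.030, -0.025, 0.002, 0.001, -0.001, 0.000]
sigma2 = garch_variance(sample, omega=1e-6, alpha=0.1, beta=0.85)
```

The variance rises after a large shock and then decays gradually, which is how the model captures volatility clustering.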
5.3 Conclusion
The project gave a good grounding in regression and
sentiment analysis and the methods involved. Regression
analysis is a powerful, flexible tool that can be applied to a
wide range of applications, and as such it was very beneficial
to learn. The results did not display the expected correlation
between sentiment and financial markets; however, further work
may prove more revealing.
6 References
Del Negro, M. and Schorfheide, F. (2011). Bayesian
Macroeconomics. The Oxford Handbook of Bayesian
Econometrics, vol. 1, pp. 293–389.
Entman, R. (1993). Framing: Toward Clarification of a
Fractured Paradigm. Journal of Communication, vol. 43,
pp. 51-58.
Fahrmeir, L. (2013). Regression: Models, Methods and
Applications. Berlin, Heidelberg: Springer Berlin Heidelberg.
Heilbroner, R. and Milberg, W. (2012). The making of
economic society. Upper Saddle River, N.J.: Pearson.
Mandelbrot, B. B. (1963). The variation of certain speculative
prices. Journal of Business, vol. 36, pp. 392–41.
Miller, M. (2012). Mathematics and statistics for financial
risk management. Hoboken, N.J.: Wiley.
Nesselhauf, N. (2005). Corpus Linguistics: A Practical
Introduction. Available at:
http://www.as.uni-heidelberg.de/personen/Nesselhauf/files/Corpus%20Linguistics%20Practical%20Introduction.pdf
[Accessed: 04/04/2016]
Barberis, N. and Thaler, R. (2003). A survey of behavioral
finance. Handbook of the Economics of Finance, vol. 1,
pp. 1053-1128.
Norusis, M. J. (1994). SPSS 6.1 base system user’s guide,
part 2. Chicago, IL: SPSS.
Panasiak, M. and Terry, E. (2013). Framing Effects and
Financial Decision Making. Proceedings of 8th Annual
London Business Research Conference. Imperial College,
London.
Rachev, S., Mittnik, S. and Fabozzi, F. (2007). Financial
econometrics. Hoboken, New Jersey: John Wiley & Sons.
Romer, C. and Romer, D. (2000). Federal Reserve
Information and the Behavior of Interest Rates. American
Economic Review, vol. 90, pp. 429-457. Available at:
http://www.cfapubs.org/doi/pdf/10.2469/dig.v31.n1.805
[Accessed: 06/03/2016]
Rouzet, D. (2010). Lecture slides on: Discounted Dividends
and Asset Prices. Available at:
http://isites.harvard.edu/fs/docs/icb.topic734133.files/Section6.pdf
[Accessed: 15/04/2016]
Ruppert, D. and Matteson, D. S. (2011). Statistics and Data
Analysis for Financial Engineering. New York, NY: Springer
New York.
Scott, B. R. (2006). The Political Economy of Capitalism.
Available at:
http://www.hbs.edu/faculty/Publication%20Files/07-037.pdf
[Accessed: 17/02/2016]
Taylor, S. (2007). Asset price dynamics, volatility, and
prediction. Princeton, N.J.: Princeton University Press.
Wooldridge, J. (2013). Introductory econometrics. Mason,
OH: South-Western Cengage Learning.
  • 1. 1 | P a g e University of Dublin TRINITY COLLEGE Modelling Financial Market Forces Using Regression and Sentiment Analysis Mark John Lyons B.A.I. Engineering Final Year Project April 2016 Supervisor: Professor Khurshid Ahmad School of Computer Science and Statistics O’Reilly Institute, Trinity College, Dublin 2, Ireland
  • 2. 2 | P a g e Abstract The aim of the project was to model the dynamics of financial markets so as to try observe and test the validity of financial theories. Specifically the theories mean reversion, volatility clustering and attribute framing. Regression and sentiment analysis were combined to achieve this. The R programming language was used to compute the statistics involved in the project.
  • 3. 3 | P a g e Acknowledgements I would like to thank my supervisor, Khurshid Ahmad, for giving me this opportunity. He provided great advice and helpful nudges throughout the project. I enjoyed the discussions on a range of topics; finance, computer science, maths and a small bit of history. Thank you also to Stephen Kelly, PhD student at Trinity College, for his help along the way. Particularly for introducing me to the Rocksteady program and discussing R with me. Finally thank you to my family and friends for their support throughout. In particular my Mother and Father Margaret and Pat Lyons for all the help they gave me.
  • 4. 4 | P a g e Table of Contents: Abstract...........................................................2 Acknowledgements .........................................3 1. Introduction..................................................6 2. Motivation and Lit Review 2.1. Economics…………………………………….7 2.2. Financial Markets & Monetary Policy…8 2.3. Behavioural Finance………………………11 2.4. Conclusion…………………………………..12 3. Method 3.1. Stochastics & Financial Series………..14 3.2. Stationarity & Returns……………………14 3.3. Stylised Facts & Summary Statistics..15 3.4. Linear Regression…………………………17 3.5. Ordinary Least Squares………………….19 3.6. OLS Assumptions………………………….20 3.7. Autoregression…………………………….23 3.8. Vector Autoregression…………………..23 3.9. LexisNexis Corpus………………………..24 3.10. Sentiment Analysis……………………….25 3.11. Tableau……………………………………….26 4. Case Studies and Results 4.1. Summary Statistics……………………….28 4.2. Autoregression…………………………….29 4.3. Vector Autoregression…………………..34 5. Conclusion and Future Work. 5.1. Work Completed……………………………40 5.2. Conclusions…………………………………40 5.3. Future Work…………………………………40
  • 5. 5 | P a g e 6. References…………………………………………….41
  • 6. 6 | P a g e 1 Introduction: This project began from a desire to pursue a project under an area of personal interest, currency, or forex, markets. The ever-changing value of money and the effect it could have on whole countries has always fascinated me. The wish was to learn Computer Engineering skills to better understand the market. First statistical research on the forex market was done and then evolved to financial markets as a whole. A comparison of the behaviour of 3 of the major financial markets (the bond market, the stock market and as stated the forex market) will be made. Traditional Finance theory states that markets should usually be rational and exhibit mean reversion to reassert an assets price. The newer emerging field of Behavioural Finance states that the market isn’t as efficient as theorised and the sentiment of traders will affect it. The project aimed to model post Global Financial Crisis market behaviour to see whether it, as theoretically proposed, performed mean reversion over the period. Behavioural finance theorems were assessed to see whether sentiment effects market movement when it acted irrationally. Volatility clustering was also of interest and observed in the market. Modelling the nature of the market is essential for finding investment opportunities and risk management in financial portfolios. The project involved learning data analysis techniques such as the ETL (extract, transform and load) process. Text analysis was applied to determine the implicit sentiment on the markets in major publications around the world. Knowledge of statistical theory in the context of finance was essential for preceding accurate modelling of the data. The R programming environment and language was learned to warehouse the large data sets and apply the statistical m ethods. Lastly emerging data visualisation standard Tableau was used to display the results as pleasingly as possible.
  • 7. 7 | P a g e 2 Motivation and Lit Review: 2.1 Economics Economics is a social science studying the movement of goods, services and wealth. BusinessDictionary.com defines economics as “the theories, principles, and models that deal with how the market process work s”. Economies and financial markets are then intrinsically linked in the capitalist system. The theoretical relationship between the US economy and its stock market for example is shown in figure 1 below. Also of note from the figure is the idea of econom ic cycles, a periodical cycle through recession and recovery. F ig u re 1 The economic crisis of 2007 –2008, from which many countries including the U.S. are still only slowly recovering, was of historic proportions, involving a f inancial market collapse, a rapid rise in unemployment, an unprecedented decline in world trade and massive government intervention aimed at reversing the downturn (Hielbroner and Milberg, 2011). This economic crisis, called the Global Financial Crisis, was the worst seen since the great depression of the 1930s. Of interest to this project was the behavior of the stock markets as an economic indicator post -crisis.
  • 8. The U.S. National Bureau of Economic Research (NBER) determined the economic trough of the Global Financial Crisis to be June 2009. This trough is marked in Figure 2, a graph of two stock market indices, the S&P 500 and the NASDAQ Composite. The figure gives empirical evidence of the relationship graphed above. The S&P 500 index will be used as the indicator when modelling stock market behaviour.

Figure 2

2.2 Financial Markets & Monetary Policy

The U.S. Federal Reserve (Fed) sets monetary policy in order to control this economic growth and contraction. Before discussing monetary policy we need to understand how financial assets are priced in the market. Asset prices are determined through price discovery. The invisible hand of the pricing mechanism coordinates supply and demand in markets in a way that is automatically in the best interests of society (Scott, 2006). Traditional finance theory would have us believe that the free market will keep prices fair and balanced, and that arbitrageurs will take advantage of any deviations, thus restoring the equilibrium. (Investopedia defines arbitrage as the simultaneous purchase and sale of an asset in order to profit from a difference in the price.) The return to equilibrium occurs through mean reversion, which is defined as “the theory that interest rates, security prices, and
  • 9. various economic indicators will, over time, return to their long-term averages after a significant short-term move”. This self-correcting view of prices underlies the efficient market hypothesis.

Figure 3

Intuitively, mean reversion means that a positive change in price tends to be followed by a negative change, and vice versa. Figure 3 depicts mean reversion.

The three primary methods of implementing monetary policy are setting interest rates, buying or selling U.S. treasuries on the open market and changing dollar reserve requirements. These actions affect the value of the dollar and the rate of return on treasury bonds. A deeper discussion of monetary policy and the means by which it is carried out is not presented here. The rate of return on U.S. treasuries is an important indicator of the market's trust in the U.S. economy, and as such we will assess it also.

The Fed discusses monetary policy at Federal Open Market Committee (FOMC) meetings, eight of which are scheduled every year. After each two-day meeting the Fed releases an announcement giving its view on economic activity, its forecasts for future activity and any changes to monetary policy. It is stated that the volatility of asset prices such as the S&P 500 and dollar foreign exchange rates increases on announcement days, in particular around the release time of the announcement. To test this statement, the minute-by-minute rate of change, or volatility, of the EUR/USD exchange rate was calculated for both an average day and an FOMC announcement day. This is shown in Figure 4. Unfortunately, intraday data for the S&P 500 and Treasury bonds could not be obtained freely to do similar
  • 10. calculations. The y-axes of the graphs are normalised to the maximum return value of both series.

Figure 4

Note the overall increase in volatility during the day and the particular increases at the announcement time and at the end of the trading day. Per Mandelbrot (1963), “large changes tend to be followed by large changes, of either sign, and small changes tend to be followed by small changes.” This is known as volatility clustering.

The DXY (the U.S. Dollar Index) will be the final economic indicator modelled. A paper by Romer and Romer (2000) discusses the Fed's FOMC forecasts and concludes that “the Fed has information about future inflation that market participants do not have”. Future inflation levels determine the future value of the dollar. The FOMC announcement is therefore heavily dissected by market participants, who speculate on the future prospects of the dollar. The asymmetric holding of information by the Fed can therefore cause movement in the dollar price as arbitrageurs buy or sell dollars to capitalise on its long-term change. As previously described, traditional finance theory assumes the supply and demand equilibrium will be restored by the rational market and
  • 11. the true price rediscovered. However, speculation about the future value also has an effect on the current price beyond supply and demand. This effect falls under the branch of behavioural finance, which attempts to explain price anomalies in terms of the biased behaviour of individuals.

2.3 Behavioural Finance

Behavioural finance is “a new approach to financial markets that has emerged, at least in part, in response to the difficulties faced by the traditional paradigm. In broad terms, it argues that some financial phenomena can be better understood using models in which some agents are not fully rational” (Barberis and Thaler, 2003). It states that the market is exuberant rather than rational and that arbitrage may not always offset shocks to the market. Figure 5 shows empirical evidence that assets demonstrate high volatility much more frequently than expected. These instances of irrationality are of concern to market participants. From Figure 4 it can be seen that volatility on FOMC days regularly exceeds expectation; we infer that the market is acting exuberantly and turn to behavioural finance theory to see if it can explain the activity.

Figure 5 (reproduced from Khurshid Ahmad's Behavioural Finance lectures)
  • 12. One important observation in behavioural finance is framing. Entman (1993) summarises framing as “selecting some aspects of perceived reality and make them more salient in the communicating text, in such a way as to promote a particular problem definition, causal interpretation, moral evaluation and/or treatment recommendation for the item described". We hypothesise that the framing of information about the FOMC announcement can push the uncertainty of speculators towards a certain bias. This type of framing is called attribute framing. Panasiak and Terry (2013) say of attribute framing that “an event can receive different reviews when it is framed in a positive vs negative light”. To test the hypothesis that positive or negative framing can sway the bias, and consequently the activity, of the market, we evaluate the sentiment of major worldwide publications when reviewing the FOMC meetings. If market movement and sentiment are correlated over a long period of time, our hypothesis is supported.

2.4 Conclusion

The goal, then, is to model mean reversion of the market over a long period and to analyse the effect of sentiment on market movement. How do we achieve this? By using regression analysis. Regression analysis is a branch of statistical modelling which aims to estimate the relationship between variables. In order to model mean reversion, the autoregressive area of regression analysis was researched. Autoregressive models of time series estimate the effect of previous values of a variable on its future values. Using the definition of mean reversion that a positive change will be followed by a negative one (and vice versa), we see that it is necessary to model the relationship between each price change and the price change before it. Autoregression is therefore a suitable model for testing for mean reversion. Modelling the influence of sentiment on price movement involves the simultaneous analysis of multiple variables, termed multivariate analysis.
The vector autoregressive model explains an endogenous variable by a range of exogenous variables. In the words of Del Negro and Schorfheide (2011), “at first glance, VARs appear to be straightforward multivariate generalisations of univariate autoregressive models. At second sight, they turn out to be one of the key empirical tools in modern macroeconomics”. The power of vector autoregressive models comes from the ability to model seemingly unrelated variables and determine their interdependencies. Vector
  • 13. autoregression is chosen to model the correlation between sentiment and price change.
  • 14. 3 Method

3.1 Stochastic Processes & Financial Series

A stochastic process is a sequence of random variables, $\{X_t\}$, indexed by $t$, where $t$ usually ranges over a subset $T$ of time $[0, \infty)$. Many natural processes are modelled as stochastic due to their random behaviour. Since the closing price of an asset tomorrow, $P_{t+1}$, cannot be predicted today, we regard $P_{t+1}$ as a random variable (Taylor, 2005). The set of prices, $\{P_t\}$, can then be thought of as a set of random variables or a realisation of a stochastic process. Price changes can occur at any point on the time scale during the trading day; therefore $P_t$ is a continuous function of time. The financial time series analysed here are sampled at regular time intervals and so are discrete stochastic processes. Discretising the data makes for easier computation and analysis of behaviour over specific periods.

3.2 Stationarity & Returns

Stationarity describes a property of the process to achieve a certain state of statistical equilibrium so that the distribution of the process does not change much (Rachev et al, 2007). Put more simply, a stationary series can be defined as one with a constant mean, constant variance and constant autocovariances for each given lag. The probability distribution of a financial time series depends heavily on the time period, as prices naturally rise due to inflation. The mean and standard deviation of the raw series over a long period of time will therefore not give an accurate representation of the behaviour of the series over that period. To achieve stationarity of our series we find price returns. Price returns are the change in price over a time period. The formula for log returns, denoted by $r_t$, is defined by Ruppert and Matteson (2011) as:

$$r_t = \log(1 + R_t) = \log\left(\frac{P_t}{P_{t-1}}\right)$$

where $R_t$ is the net return, $R_t = (P_t / P_{t-1}) - 1$.
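The return formulas above can be sketched in a few lines. The project itself used R; the snippet below is a minimal Python illustration only, and the price values and function names are hypothetical, not taken from the project data.

```python
import math

def net_returns(prices):
    """Net returns R_t = P_t / P_{t-1} - 1 for consecutive prices."""
    return [p1 / p0 - 1.0 for p0, p1 in zip(prices, prices[1:])]

def log_returns(prices):
    """Log returns r_t = log(P_t / P_{t-1}) = log(1 + R_t)."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

# Hypothetical closing prices, for illustration only.
prices = [100.0, 102.0, 99.96, 101.0]
R = net_returns(prices)
r = log_returns(prices)

# Log returns are additive: their sum is the log return over the whole period.
total_log_return = sum(r)
```

One reason log returns are convenient is visible in the last line: summing them over a window gives the log return for the whole window, which is not true of net returns.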
  • 15. Taking the return instead of the raw price achieves time invariance of the series, that is, a constant mean and a constant variance. Figure 6 shows the EUR/GBP exchange rate from 2013 to 2016 in orange and the return series generated from it in blue. Note how the EUR/GBP rate has been detrended to a constant level over time in the return series. The probability distribution of the return series gives a more accurate indication of market behaviour than the distribution of the raw series. The expected mean of the return distribution is 0, with some constant variance $\sigma^2$.

Figure 6

Further mention of the economic indicator series will refer to their return series.

3.3 Stylised Facts & Summary Statistics

Stylised facts are “general properties that are expected to be present in any set of returns” and “are pervasive across time as well as across markets” (Taylor, 2005). One important stylised fact Taylor states is “the distribution of returns is not
  • 16. normal”; the assumption of normality of returns is important for many financial techniques, so the distribution of returns is analysed. The summary statistics mean ($\bar{r}$), standard deviation ($s$), skewness ($b$) and kurtosis ($k$) are used to describe the characteristics of a distribution. For a set of $n$ returns they are defined as:

$$\bar{r} = \frac{1}{n}\sum_{t=1}^{n} r_t, \qquad s^2 = \frac{1}{n-1}\sum_{t=1}^{n}(r_t - \bar{r})^2,$$

$$b = \frac{1}{n-1}\sum_{t=1}^{n}\frac{(r_t - \bar{r})^3}{s^3}, \qquad k = \frac{1}{n-1}\sum_{t=1}^{n}\frac{(r_t - \bar{r})^4}{s^4}.$$

The summary statistics, which are based on the moments of the data, are used to find how close the distribution of returns is to a normal distribution. The mean (the first moment) and the standard deviation (the square root of the variance, the second central moment) are elementary probability measures, and it is assumed the reader understands them already. Briefly, the mean indicates the central tendency of the distribution and the standard deviation reveals the dispersion of the data points. The standard deviation is also important for standardising the distribution using z-scores, particularly in multivariate analysis.

Skewness is a measure of the asymmetry of the distribution about the central tendency. Outliers produce skewed distributions. A visual display of skewness is shown below in Figure 7.

Figure 7
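As a rough illustration, the four summary statistics can be computed directly from the definitions above, keeping the same $1/(n-1)$ normalisation. This is a Python sketch with a made-up sample, not the R code used in the project.

```python
import math

def summary_stats(returns):
    """Mean, standard deviation, skewness and kurtosis as defined in the text."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)
    s = math.sqrt(var)
    skew = sum((r - mean) ** 3 for r in returns) / ((n - 1) * s ** 3)
    kurt = sum((r - mean) ** 4 for r in returns) / ((n - 1) * s ** 4)
    return mean, s, skew, kurt

# Hypothetical, perfectly symmetric sample: skewness should be zero.
sample = [-2.0, -1.0, 0.0, 1.0, 2.0]
mean, s, skew, kurt = summary_stats(sample)
```

For this symmetric sample the cubed deviations cancel, so the skewness is zero, which matches the description of skewness as a measure of asymmetry.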
  • 17. Kurtosis is the relative concentration of scores in the center, the upper and lower ends (tails), and the shoulders (between the center and the tails) of a distribution (Norusis, 1994). Kurtosis measures how peaked a distribution is. In a normal distribution kurtosis is equal to three; to compare the kurtosis of a distribution with that of the normal, the “excess kurtosis” is found by subtracting three from the measured kurtosis. A distribution is called leptokurtic if the excess kurtosis is positive, mesokurtic if there is no excess kurtosis and platykurtic if the excess kurtosis is negative. A visual representation of kurtosis is given in Figure 8.

Figure 8

A final summary statistic, the z-statistic, defined as:

$$z = \frac{\bar{r}}{s / \sqrt{n}}$$

is used to “assess the null hypothesis that the expected return is zero” (Taylor, 2005).

3.4 Linear Regression

Regression analysis is an area of statistics which aims to model the effect of a given set of explanatory random variables $x = \{x_1, \dots, x_k\}$, also called regressors, on a variable of primary interest $y$. “A main characteristic of regression models is that the relationship between the response variable y [and x] is not a deterministic function f(x) of x (as often is the case in
  • 18. classical physics), but rather shows random errors” (Fahrmeir et al, 2013). Linear regression methods estimate the relationship between $y$ and $x$ by modelling the best-fitting linear relationship between the response and explanatory variables. The ordinary least squares method is a popular technique for finding the best linear fit and will be discussed shortly.

Linear regression models are composed of a “systematic (or deterministic) component, $\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$, and an idiosyncratic (or stochastic) component, ε” (Miller, 2014). The deterministic part consists of the vertical axis intercept $\beta_0$ and a sum of the regressors weighted by a set of matching regression coefficients $\{\beta_1, \dots, \beta_k\}$. The regression coefficients are unknown parameters which weight how much effect each variable in the set $x$ has on $y$. “More precisely, $\beta_i$ is the partial derivative of the expected response with respect to the ith regressor” (Ruppert and Matteson, 2011). The stochastic component is a set of error (or residual) terms, $\varepsilon$, which account for the difference between the fitted line and the data points. The linear regression model equation is:

$$y = \beta_0 + \sum_{i=1}^{k} \beta_i x_i + \varepsilon$$

Figure 9 shows a plot of the response of $y$ to a variable $x$ and a line fitted as a linear response of $y$ to $x$. The lengths of the thin lines connecting the points above and below the line to the line are the error terms for those points.

Figure 9
  • 19. The unknown regression coefficients are solved for by the ordinary least squares (OLS) method. Once they are determined, they are plugged back into the equation above to give the OLS linear regression model.

3.5 Ordinary Least Squares

The OLS method estimates the optimal regression coefficients (the intercept and the slopes) by minimising the difference between the observed values of the response variable, $y_i$, and their linearly predicted values, $y_i - \varepsilon_i$. Wooldridge (2000) gives a good description of OLS, so we follow his explanation to detail the process. Generalising his discussion of a system with two regressor variables, he states that “given n observations on $y, x_1, x_2, \dots, x_k$, $\{(x_{i1}, x_{i2}, \dots, x_{ik}, y_i): i = 1, 2, \dots, n\}$, the estimates of β, $\{\hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_k\}$, are chosen simultaneously to make:

$$SSE = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \dots - \hat{\beta}_k x_{ik})^2 = \sum_{i=1}^{n} \varepsilon_i^2$$

as small as possible.” The residuals are squared so that positive and negative values do not cancel each other out. That is, for all observation points $i = 1, \dots, n$ the squared error terms (with $\varepsilon_i = y_i - \hat{\beta}_0 - \sum_{j=1}^{k} \hat{\beta}_j x_{ij}$, from the linear regression model equation) are summed so that the minimising solution can be found. Multivariable calculus reduces this minimisation problem to a system of $k+1$ linear equations in the $k+1$ unknowns $\hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_k$. We want to find the critical points of the SSE equation in order to minimise it. Taking the first partial derivative of the equation with respect to each $\hat{\beta}_j$, evaluating at the solutions and setting the result equal to zero gives:

$$-2\sum_{i=1}^{n} x_{ij}\,(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \hat{\beta}_2 x_{i2} - \dots - \hat{\beta}_k x_{ik}) = 0 \quad \text{for all } j = 0, \dots, k,$$

where $x_{i0} = 1$ for the intercept term. Cancelling the $-2$ gives the desired system of linear equations. This system, called the OLS first order conditions, can be very large, but it can be solved very quickly through standard linear equation methods by R, the statistics software used in this project.
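For a single regressor ($k = 1$) the first order conditions can be solved by hand, which gives the familiar closed form for the slope and intercept. The following Python sketch, with hypothetical data, shows that solution and checks that both first order conditions hold at the estimates. It is an illustration of the method rather than the R routine used in the project.

```python
def ols_simple(x, y):
    """OLS estimates (b0, b1) for y = b0 + b1 * x + error, one regressor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx           # slope from the j = 1 condition
    b0 = y_bar - b1 * x_bar    # intercept from the j = 0 condition
    return b0, b1

# Hypothetical observations, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = ols_simple(x, y)

# Residuals and the two first order conditions, which should both be ~0.
residuals = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
foc_intercept = sum(residuals)
foc_slope = sum(xi * ei for xi, ei in zip(x, residuals))
sse = sum(e ** 2 for e in residuals)
```

With more regressors the same conditions form a $(k+1) \times (k+1)$ linear system, which is normally handed to a linear algebra routine rather than solved by hand.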
  • 20. 3.6 OLS Assumptions

OLS will resolve the regression coefficients given any arbitrary response variable or set of explanatory variables. However, in order to determine the regression parameters uniquely, and to be confident of the inferences we make based on the model, we need to make assumptions about our variables. Per Miller (2014) these assumptions are:

A1: The relationship between the regressor and the regressand is linear.
A2: $E[\varepsilon \mid x] = 0$
A3: $\mathrm{Var}[\varepsilon \mid x] = \sigma^2$
A4: $\mathrm{Cov}[\varepsilon_i, \varepsilon_j] = 0 \;\; \forall \, i \neq j$
A5: $\varepsilon_i \sim N(0, \sigma^2) \;\; \forall \, \varepsilon_i$
A6: The regressor is nonstochastic.

It is important to test the model against these assumptions to be satisfied that it is valid. The images in this section are sourced from a tutorial on R-bloggers.com called “Graphic Analysis of Regression Assumptions”.

Assumption 1 should be self-explanatory: to model $y$ linearly against $x$ there must be a linear relationship between them. To test this assumption the residuals are plotted against the values predicted by the model; if the graph shows an even spread of data points about the x-axis then the linearity assumption is met. Figure 10 shows an even spread, and thus linearity is confirmed.

Figure 10
  • 21. Assumption 2 states that the expected value of the model's error terms should be zero; this refers to the fact that a perfectly fitted line will have residuals distributed evenly above and below the line, leading to a mean value of zero. The mean value of the model's residuals is found to test this.

Assumptions 3 and 4 are sometimes grouped together. They state that the error terms must have constant variance (3) and be uncorrelated (4). If this is true the error terms are called spherical errors. Constant variance, also called homoscedasticity, assumes that the variance of the error terms does not change over time. To test this we look at the same plot as for assumption 1 to see if the vertical spread of the error terms grows consistently in either direction. If it does not, then we have homoscedasticity. Figure 11 shows homoscedasticity; note how the variance rises and falls but does not do so persistently.

Figure 11

Error terms should be completely random; correlation among the terms means the OLS process made a systematic error in fitting the line. The autocorrelation function is run on the residuals to determine if there is correlation. If autocorrelation, or serial correlation, is found, this can undermine trust in the model. The Durbin-Watson statistic, $d$, is used to test the significance of this autocorrelation and consequently the accuracy of the model.
  • 22.

$$d = \frac{\sum_{t=2}^{N} (\varepsilon_t - \varepsilon_{t-1})^2}{\sum_{t=1}^{N} \varepsilon_t^2}$$

The value of $d$ always lies between 0 and 4. If $d$ is 2 there is no autocorrelation; values below this imply positive autocorrelation (successive error terms are close in value to one another) and values above it indicate negative autocorrelation (successive error terms are different).

Assumption 5 says that the residuals must be normally distributed. However, as Miller states, “many of the results of the OLS model are true, regardless of this assumption.” The assumption is therefore mostly useful for defining confidence levels for the model parameters. A probability distribution of the errors is graphed and analysed to determine its closeness to a normal distribution. Figure 12 shows this comparison.

Figure 12

The summary statistics could also be useful here, as they can help in comparing the distribution to a normal distribution. We omit the summary statistics for the error terms and instead present the distribution visually. The less similar the distribution of the error terms is to the normal distribution, the less reliable the inferences drawn from the OLS model will be.
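The Durbin-Watson statistic above is simple to compute from a residual series. The Python sketch below uses two short, hypothetical residual patterns to show how $d$ moves below and above 2; it is an illustration only and does not reproduce the significance test run in R for the project.

```python
def durbin_watson(residuals):
    """Durbin-Watson d: sum of squared successive differences over sum of squares."""
    num = sum((e1 - e0) ** 2 for e0, e1 in zip(residuals, residuals[1:]))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Hypothetical residual patterns, for illustration only.
d_positive = durbin_watson([1.0, 1.0, -1.0, -1.0])   # neighbours tend to be alike
d_negative = durbin_watson([1.0, -1.0, 1.0, -1.0])   # neighbours alternate in sign
```

The first pattern gives $d = 1$ (below 2, positive autocorrelation) and the second gives $d = 3$ (above 2, negative autocorrelation), in line with the interpretation given in the text.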
  • 23. From a practical standpoint the OLS model holds true regardless of the 6th assumption, so it is not discussed here, nor do we test for it.

3.7 Autoregression

An autoregression model, denoted AR(p), is a form of linear regression model where the set of regressor variables is $p$ lags of the response variable. The equation for an AR(p) model is:

$$y_t = \beta_0 + \sum_{i=1}^{p} \beta_i y_{t-i} + \varepsilon_t$$

We wish to use the autoregressive model to detect whether mean reversion occurred, so a lag of 1 is chosen for our model. Taking the first lag captures the effect on a time period, $t$, of the period before. The AR(1) model is:

$$y_t = \beta_0 + \beta_1 y_{t-1} + \varepsilon_t$$

The regression coefficient $\beta_1$ determines whether there is mean reversion. Rouzet (2010) states that if $|\beta_1| < 1$ in an AR(1), the process is mean reverting. This can be seen by thinking of a realisation of $y_{t-1}$ as a non-zero number: the $\beta_1$ coefficient will then “shrink” $y_t$ towards our mean of zero. There is thus an inverse relationship between $\beta_1$ and mean reversion: the smaller the absolute value of $\beta_1$, the more reversion has occurred. If $\beta_1$ is negative then we can say that a positive change is usually followed by a negative one.

3.8 Vector Autoregression

The vector autoregressive model, denoted VAR(p), is a multivariate generalisation of the autoregressive model. It extends the set of regressors from the lags of the dependent variable alone to the lags of exogenous variables as well. The equation for a VAR(p) model with $k$ variables is:

$$y_t = \beta_0 + \sum_{i=1}^{p} \beta_i y_{t-i} + \varepsilon_t$$
  • 24. This has the same form as the AR(p) model, except that $\beta_0$, $\varepsilon_t$ and each $y_{t-i}$ are vectors of length $k$ and each $\beta_i$ is a $k \times k$ matrix. To illustrate, the general example of a VAR(1) in 2 variables is:

$$\begin{bmatrix} y_{1,t} \\ y_{2,t} \end{bmatrix} = \begin{bmatrix} \beta_{1,0} \\ \beta_{2,0} \end{bmatrix} + \begin{bmatrix} \beta_{1,1} & \beta_{1,2} \\ \beta_{2,1} & \beta_{2,2} \end{bmatrix} \begin{bmatrix} y_{1,t-1} \\ y_{2,t-1} \end{bmatrix} + \begin{bmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \end{bmatrix}$$

Most important to us, regression coefficients are found for the lags of both the dependent and the exogenous variables, so that the effect of changes in the exogenous variables on the dependent variable can be seen. To test the effect of sentiment on our series we model the indicators as the dependent variables and the sentiment as regressor variables.

3.9 LexisNexis Corpus

Nesselhauf (2005) defines a corpus as “a systematic collection of naturally occurring texts (of both written and spoken language).” A corpus of news articles is created to analyse the sentiment and to try to determine whether the framing in these articles follows market movement.

Figure 13
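Returning to the AR(1) and VAR(1) models of sections 3.7 and 3.8, the sketch below shows how $\beta_1$ can be estimated by regressing a series on its own first lag, and how a VAR(1) combines the lags of two variables in one step. It is a Python illustration under stated assumptions: the series, the coefficient values and the function names are all hypothetical, and the project's own models were fitted in R.

```python
def fit_ar1(y):
    """OLS estimates (b0, b1) for the AR(1) model y_t = b0 + b1 * y_{t-1} + e_t."""
    lagged, current = y[:-1], y[1:]
    n = len(lagged)
    lag_bar = sum(lagged) / n
    cur_bar = sum(current) / n
    s_xx = sum((a - lag_bar) ** 2 for a in lagged)
    s_xy = sum((a - lag_bar) * (b - cur_bar) for a, b in zip(lagged, current))
    b1 = s_xy / s_xx
    b0 = cur_bar - b1 * lag_bar
    return b0, b1

def var1_step(b0, B, y_prev):
    """Deterministic part of a VAR(1): b0 + B @ y_prev, without the error term."""
    k = len(b0)
    return [b0[i] + sum(B[i][j] * y_prev[j] for j in range(k)) for i in range(k)]

# Hypothetical noise-free series in which each value is -0.5 times the last,
# so the fit should recover b1 = -0.5 and b0 = 0.
series = [1.0, -0.5, 0.25, -0.125, 0.0625]
b0, b1 = fit_ar1(series)
is_mean_reverting = abs(b1) < 1

# Hypothetical VAR(1) coefficients: variable 1 is a return, variable 2 a sentiment score.
intercepts = [0.0, 0.0]
coeffs = [[-0.05, 0.01],
          [0.00, 0.30]]
next_values = var1_step(intercepts, coeffs, [0.02, 1.5])
```

In the VAR(1) step, the off-diagonal entry `coeffs[0][1]` plays the role of $\beta_{1,2}$: it is the coefficient that would carry any effect of yesterday's sentiment on today's return.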
  • 25. LexisNexis is a provider of legal, government, business and high-tech information sources. TCD members are allowed free use of the LexisNexis database. To build a corpus specific to the project, articles were filtered using the search shown in Figure 13.

3.10 Sentiment Analysis

Sentiment in the corpus was analysed using the Rocksteady program. Rocksteady is a text analytics system created in Trinity College Dublin by Khurshid Ahmad and his postgraduate students. Rocksteady uses a bag-of-words approach: it breaks the corpus down into its constituent words, regardless of order, and compares them against a specialised dictionary. The dictionary contains a weighting for the sentiment inherent in each word. A z-score based on this weighting is computed for each type of sentiment expressed in a daily aggregation of articles. A z-score, or standard score, indicates how many standard deviations a raw score is from the mean. In the context of Rocksteady it indicates how much stronger a sentiment expressed over a day is compared to normal.

Figure 14
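The bag-of-words idea and the daily z-score can be illustrated with a toy example. This Python sketch is not Rocksteady's implementation, whose dictionary and weighting are not described in this report; the word list, the sample texts and the equal weighting of every dictionary word are all assumptions made for illustration.

```python
import math

# Hypothetical negative-sentiment dictionary; every word is weighted equally here.
NEGATIVE_WORDS = {"crisis", "fear", "loss", "weak"}

def sentiment_share(text, lexicon):
    """Fraction of words in the text that appear in the lexicon (order ignored)."""
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    return sum(w in lexicon for w in words) / len(words)

def z_scores(values):
    """Standard scores: distance of each value from the mean in standard deviations."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

# Hypothetical daily aggregations of article text.
daily_text = [
    "Fed signals strong growth",
    "Markets fear loss as crisis deepens",
    "Weak data, but recovery continues",
]
negative_share = [sentiment_share(t, NEGATIVE_WORDS) for t in daily_text]
negative_z = z_scores(negative_share)
```

The second day has the largest share of dictionary words and therefore the highest z-score, which is the kind of day that would be flagged as unusually negative.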
  • 26. Rocksteady analyses positive, negative, active, passive, strong, weak, economic, political and militant sentiment. Of interest to the project are the first two types of sentiment. Figure 14 shows a sample section of the Rocksteady output for the project's corpus. Red boxes indicate extremely high levels of a sentiment expressed that day and yellow boxes show moderately high sentiment.

3.11 Tableau

Tableau is an emerging standard for data visualisation. It offers a variety of options for graphing, enabling users to display information as intuitively as possible. It also integrates with the R programming environment, using Rserve to open communication between the programs. This combines the power and flexibility available through R for computing advanced statistical processes with the ease of Tableau's visualisation process. The original images produced for this report, Figure 3, Figure 4 and Figure 6, were created in Tableau. Two examples of the Tableau interface are provided. The data loading procedure, showing a left join of two datasets, is shown in Figure 15, while the graphing process is shown in Figure 16 on the next page.

Figure 15
  • 27. Figure 16
  • 28. 4 Case Study & Results

4.1 Stylised Facts & Summary Statistics

The stylised facts of the DXY, S&P 500 and 10-year Treasury bond (T10) returns for the period from June 2009 to the present day are presented in the table below.

| Indicator | Mean Return (×10^4) | Standard Deviation (×10^2) | Skewness | Excess Kurtosis | Z statistic |
|-----------|---------------------|----------------------------|----------|-----------------|-------------|
| T10       | -3.89               | 2.27                       | 0.15     | 0.81            | -0.71       |
| DXY       | 0.59                | 0.45                       | -0.03    | 1.54            | 0.54        |
| S&P500    | 4.49                | 1.02                       | -0.42    | 3.56            | 1.83        |

All three series were leptokurtic, meaning the distributions were quite peaked. The DXY and the S&P 500 have a slight negative skew about their means. Given a positive mean with negative skew, we can infer that daily returns over the period were more likely to be positive. The T10 conversely had a positive skew and a negative mean. This means there was a greater proportion of negative returns over the period. The standardised distributions of the T10 and the S&P 500 are given in Figures 17 and 18, with a normal distribution curve overlaid, to give context to the summary statistics.

Figure 17
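As a consistency check on the table above, the z-statistic from section 3.3 can be recomputed from the reported means and standard deviations. The number of observations is not stated in the report, so the sketch below assumes roughly 1,720 daily returns (about the number of trading days between June 2009 and April 2016); that figure is an assumption, not a value from the project.

```python
import math

def z_statistic(mean, sd, n):
    """z = mean / (sd / sqrt(n)), testing the null that the expected return is zero."""
    return mean / (sd / math.sqrt(n))

N_ASSUMED = 1720  # assumed sample size; not reported in the source table

# (mean return, standard deviation) with the table's scaling factors undone.
reported = {
    "T10": (-3.89e-4, 2.27e-2),
    "DXY": (0.59e-4, 0.45e-2),
    "S&P500": (4.49e-4, 1.02e-2),
}
z_values = {name: round(z_statistic(m, s, N_ASSUMED), 2)
            for name, (m, s) in reported.items()}
```

Under this assumed sample size the recomputed values agree with the table to two decimal places (-0.71, 0.54 and 1.83). None of them exceeds the conventional two-sided 5% threshold of about 1.96, so on this reading the mean daily returns are not significantly different from zero.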
  • 29. Figure 18

Note in particular how peaked the S&P 500 distribution is. Standardised distributions are graphed in units of standard deviation on the x-axis; we can see that, unlike the T10, the S&P 500 has extreme outliers beyond 4 standard deviations, which cause this.

4.2 Autoregression

Assumptions Tests:

Plots of the standardised residuals against fitted values for each model are shown in Figures 19, 20 and 21. These are used for testing assumptions 1 and 3. A table containing the mean residual value for each model is also provided for testing assumption 2. To validate assumption 4, plots of the autocorrelation of the residuals are given in Figures 22, 23 and 24.

Assumption 1: Linearity. In each model there is an even spread of standardised residuals about the fitted values; therefore the assumption that a linear relationship exists between the response and regressor variables holds for each model.
  • 30. Figure 19 – DXY model

Assumption 2: As we can see from the table, the mean, or expected, values of the error terms are all extremely small numbers. They are close enough to zero to meet this assumption.

|            | T10       | DXY      | S&P 500   |
|------------|-----------|----------|-----------|
| E[ ε \| x ] | -4.18E-19 | 2.22E-17 | -9.13E-19 |

Figure 20 – S&P 500 Model
  • 31. Figure 21 – T10 Model

Assumption 3: Constant variance. There is no directional growth of variance in any of the figures. The models are homoscedastic, so assumption 3 is met.

Figure 22 – ACF of DXY model residuals
  • 32. Figure 23 – ACF of S&P500 model residuals

Assumption 4: As we can see from the autocorrelation plots, there is persistent serial correlation in the lags of the models. The correlation between errors and certain lagged errors, in each model, is large enough to suggest that autocorrelation may be a problem. The Durbin-Watson test was run on each model to determine the statistical significance of this. The results are presented in a table on the next page.

Figure 24 – ACF of T10 Model Residuals
  • 33. Comparing the Durbin-Watson test results to 2, we can see that the autocorrelation is statistically insignificant. Assumption 4 is validated.

|         | DXY   | S&P 500 | T10   |
|---------|-------|---------|-------|
| DW-test | 1.999 | 1.996   | 1.998 |

Assumption 5: Normality of errors. Though this assumption does not need to be fully met in order for the model to be true, it does affect the confidence we can have in the model. The distribution of errors for the T10 model is shown below compared to a normal distribution. There is a very close fit to the normal distribution, so there can be confidence in the results of the model. The distributions for the other models are very similar and have been omitted.

Figure 25
  • 34. Results:

| Indicator | Regression Coefficient | Reversion |
|-----------|------------------------|-----------|
| T10       | -0.0222                | Yes       |
| DXY       | -0.0433                | Yes       |
| S&P 500   | -0.0614                | Yes       |

All three series displayed mean reversion from one day to the next. The magnitude of the returns from day to day shrank towards zero, as seen from the fractional coefficients. Also of interest is the negative sign of the coefficients: returns on successive days tend to be in opposite directions.

4.3 Vector Autoregression

The vector autoregressive model was run between negative/positive sentiment and the three financial time series. The estimated coefficients determine the effect of the sentiment on each series. The t value of a coefficient is the ratio of the estimated coefficient to its standard error, so it measures the size of the estimate relative to the uncertainty in it. The p-value, denoted by Pr(>|t|) in the images, is the probability of observing a t value at least this extreme if the true coefficient were zero. A small p-value indicates that a change in the regressor variable, sentiment, had a statistically significant effect on the response variable, the financial indicator.

Assumptions Tests:

For the sake of brevity we present the assumption test results for the VAR model run on the DXY and positive sentiment, and omit the tests for the other models, as the results were very similar to each other and to the results in the autoregressive section. Figure 26 shows that the model is linear and has constant variance. The autocorrelation plot in Figure 27 shows small serial correlation; however, the Durbin-Watson test result of 1.999 renders this insignificant. The histogram of the residuals contains some negative skew but reflects a normal distribution fairly well. The mean value of the errors was -1.03e-21.
  • 35. Figure 26

Figure 27
  • 36. Figure 28

Results:

First the DXY:

Figure 29 – DXY and Negative Sentiment
  • 37. Figure 30 – DXY and Positive Sentiment

The DXY shows no significant correlation with positive or negative sentiment.

Secondly the T10:

Figure 31 – T10 and Negative Sentiment
  • 38. Figure 32 – T10 and Positive Sentiment

The T10 shows some effect from the negative sentiment of articles 5 days prior. This effect is small, though, and may be an artefact of the modelling.

Finally the S&P500:

Figure 33 – S&P 500 and Negative Sentiment
  • 39. Figure 34 – S&P 500 and Positive Sentiment

There is no significant correlation between the sentiment expressed about the FOMC meetings and the S&P 500 either. The hypothesis that major publications can sway bias by attribute framing, and so influence market movement, is not supported under the parameters of this experiment.
  • 40. 5 Conclusion & Future Work

5.1 Work Completed

The project involved analysing sentiment in financial markets using regression analysis. Extensive background research into finance was necessary in order to create and understand a context to analyse. Concurrently, statistical methods were studied. Once the more basic statistical measures were understood, work turned to understanding regression analysis, particularly autoregression and vector autoregression. As these methods produce results regardless of context, a deep understanding of their properties was vital to ensure the models created accurate results. Throughout the project the R programming language was learned in order to apply the statistical methods to big data sets. Once the models were created in R, property tests were applied to validate them. The result was the modelling of mean reversion and sentiment in financial markets.

5.2 Future Work

Reassessing the filters for the corpus may reveal better results for the VAR models. LexisNexis limited corpus downloads to 500 articles; building a larger, more selective corpus out of several of these limited corpora would also benefit the project and would be attempted if more time were available. Additionally, we observed volatility clustering in Figure 4. Further modelling of this through a GARCH model would be desirable.

5.3 Conclusion

The project gave a good grounding in regression and sentiment analysis and the methods involved. Regression analysis is a powerful, flexible tool that can be applied to a wide range of applications. As such it was very beneficial to learn. The results did not display the expected correlation between sentiment and financial markets; however, further work may prove more revealing.