"You know, I've been dealing with these big mathematical models for forecasting the economy... if I could figure out a way to determine whether or not people are more fearful, or changing to more euphoric, and have a third way of figuring out which of the two things are working, I don't need any of this other stuff. I could forecast the economy better than any way I know how." Alan Greenspan, November 2007
Abstract
This quote from the former Fed Chairman alludes to a theoretical understanding of market dynamics proposed by behavioral economists, suggesting that “psychological, social, cognitive, and emotional factors influence the economic decisions of individuals and institutions [and] have consequences for market prices, returns, and resource allocation” (Wikipedia).
Critics of Behavioral Economics cite the Efficient Market Hypothesis (EMH), which states that “it is impossible to ‘beat the market’ because stock market efficiency causes existing share prices to always incorporate and reflect all relevant information. According to the EMH, stocks always trade at their fair value on stock exchanges, making it impossible for investors to either purchase undervalued stocks or sell stocks for inflated prices. As such, it should be impossible to outperform the overall market through expert stock selection or market timing, and that the only way an investor can possibly obtain higher returns is by purchasing riskier investments.”
The debate between proponents of Behavioral Economics and the Efficient Market Theory spans a vast range of topics that are far beyond the scope of this paper. What follows is instead an introductory exploration of whether sentiment data (extracted from Twitter, StockTwits, and market-focused chat rooms) can be used to predict daily stock market returns. The analysis was performed on 3 years of sentiment data, from 2012 to 2015, provided by PsychSignal© for research purposes.
The data consists of two sentiment-based time series indicators (bullish and bearish) derived from a proprietary sentiment-mining process developed by PsychSignal. The SPDR S&P 500 ETF (SPY) was used as the target variable for the forecasting analysis.
Both contemporaneous and lagged relationships between the SPY and the PsychSignal indicators were explored, as well as relationships between the SPY and two composite indicators derived from manipulations of the raw PsychSignal data.
Finally, multivariate OLS regression models were applied to the data to determine whether PsychSignal’s sentiment-based indicators could be used to predict next-day returns in the SPDR S&P 500 ETF.
Data
The data for this analysis was obtained from Quandl.com and PsychSignal’s website. According to PsychSignal’s website:
• We measure two types of sentiment: Bullish and Bearish. They measure very specific financial language and not simply positive and negative.
• Bullish sentiment can be interpreted as buyers or those who favor the buy side.
• Bearish sentiment can be interpreted as sellers or those who favor the sell side.
• Our sentiment is measured on a 0 - 4 scale, 0 being the lowest and 4 being the highest. We sometimes convert the 0-4 scale to %.
• The 0 - 4 sentiment scale measures both the intensity of the sentiment expressed by individuals and the collective volume of that sentiment expressed over the crowd mood.
A clarification was made via an email exchange with PsychSignal’s founder, confirming that intensity refers to the degree of sentiment present in a statement. For example, compare the following statements:
1. The SPY is really strong today
2. The SPY doesn’t look like a bad buy here
The first statement would receive a higher intensity rating.
The sentiment data is then aggregated into 4 indicators, which are charted in Figures 1 and 2 alongside the SPY:
1. Bearish Sentiment (Intensity)
2. Bullish Sentiment (Intensity)
3. Bearish Volume (Quantity)
4. Bullish Volume (Quantity)
Figure 1 SPY vs. Bullish and Bearish Sentiment (weekend sentiment
data removed)
Figure 2 SPY vs. Bullish and Bearish Volume (weekend sentiment
data removed)
Data Challenges
Upon visual inspection of the original PsychSignal data sets, shown below in Figure 3, there appeared to be 3 distinct trends in the data that seemed to be the result of some systematic process associated with the way the data was collected.
Figure 3 Original PsychSignal Dataset revealing 3 systematic trends
in the data prior to 2012 and after 2015
You can see that prior to 2012 there is consistently extreme variance in the dataset. Additionally, there appears to be a systematic jump in both series after 2015. Neither phenomenon coincided with a similar change in the observed SPY data, so both were assumed to be the result of a systematic process associated with the data collection performed by PsychSignal. For that reason, analysis was confined to the 3-year period between 2012 and 2015. There are over 700 data points in this period, which should be sufficient for this analysis.
Second, the original PsychSignal data included sentiment data collected during the weekend. A zoomed-in inspection of the rolling 10-day correlation between the SPY and the PsychSignal indicators, shown below in Figure 4, reveals a consistent weekly (seasonal) decline in correlation with the SPY that coincides with the weekend sentiment data.
Figure 4. Rolling 10 day correlation between SPY and Bullish/Bearish
indicators. Cyclical pattern coincides with weekend sentiment
measurement.
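The rolling correlation in Figure 4 can be sketched as follows. This is a minimal NumPy illustration, not the original analysis code: it assumes the SPY and indicator series are already aligned day by day, uses the 10-observation window described above, and runs on synthetic data. The function name `rolling_corr` is hypothetical.

```python
import numpy as np


def rolling_corr(x, y, window=10):
    """Pearson correlation of x and y over a trailing window.

    Entry i uses observations i-window+1 .. i, so the first window-1
    entries are NaN (no full window is available yet).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    out = np.full(len(x), np.nan)
    for end in range(window, len(x) + 1):
        out[end - 1] = np.corrcoef(x[end - window:end], y[end - window:end])[0, 1]
    return out


# Synthetic example only: "bullish" is a noisy copy of "spy".
rng = np.random.default_rng(0)
spy = rng.standard_normal(30)
bullish = spy + 0.5 * rng.standard_normal(30)
rc = rolling_corr(spy, bullish, window=10)
```

A weekly dip like the one in Figure 4 would show up in such a series as a recurring drop every five to seven entries, depending on whether weekend rows are present.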
A choice was made to remove the weekend sentiment data, based on the assumption that market participants discussing the markets during the weekend were likely ruminating over carried trading positions they were unsure of, potentially introducing bias.
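Here is a minimal sketch of the two cleaning choices described in this section: confining the sample to 2012-2015 and dropping weekend observations. It assumes the sentiment data is held as (date, value) pairs. The helper `clean_sentiment` is hypothetical, and its exact start and end dates are assumptions, since the text gives only the years.

```python
from datetime import date, timedelta


def clean_sentiment(rows, start=date(2012, 1, 1), end=date(2015, 12, 31)):
    """Keep (date, value) rows inside [start, end] that fall Monday-Friday.

    The default start/end dates are illustrative assumptions.
    date.weekday() returns 0 for Monday through 6 for Sunday.
    """
    return [(d, v) for d, v in rows if start <= d <= end and d.weekday() < 5]


# Eleven consecutive calendar days starting Friday 2011-12-30.
rows = [(date(2011, 12, 30) + timedelta(days=i), float(i)) for i in range(11)]
cleaned = clean_sentiment(rows)
# Keeps Mon 2012-01-02 through Fri 2012-01-06, plus Mon 2012-01-09.
```

Filtering on the weekday alone keeps weekday market holidays. The toy output includes Monday 2012-01-02, the observed New Year's holiday. Aligning the sentiment dates to the SPY's own trading calendar would remove those days as well.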
Data Transformations
A visual inspection of the data reveals clear trends in the mean and variance of each series. Each series was differenced to remove the trend and induce stationarity. It was then standardized to z-scores (centered on the mean and scaled by the standard deviation) to minimize the impact of the different scales of measurement. The resulting distributions are shown below.
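The differencing-then-standardizing step can be sketched as follows, assuming a plain NumPy array per series. The name `difference_and_standardize` is hypothetical. Using the sample standard deviation (ddof=1) is an assumption, because the text does not say which variant was used.

```python
import numpy as np


def difference_and_standardize(series):
    """First-difference a series, then rescale the differences to z-scores.

    Differencing drops the first observation; the z-score uses the sample
    standard deviation (ddof=1).
    """
    x = np.asarray(series, dtype=float)
    diffs = np.diff(x)  # x[t] - x[t-1], e.g. close-to-close net change
    return (diffs - diffs.mean()) / diffs.std(ddof=1)


# Toy closing prices; the daily net changes are [1, 2, -1, 3].
z = difference_and_standardize([100.0, 101.0, 103.0, 102.0, 105.0])
```

Because every series ends up with mean 0 and unit variance, SPY changes and sentiment changes can be compared on the same axis, as in Figure 5.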
Figure 5. Distribution of data after differencing and standardization
It is necessary to point out that both daily and annualized returns in the S&P 500 are leptokurtic, with a large concentration of values centered at the mean and fatter tails than would be expected in a normal distribution. The SPY data used for this analysis is consistent with this historical tendency, as shown below.
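One way to check the tail claim numerically is sample excess kurtosis. It is about 0 for a normal distribution, positive for leptokurtic (fat-tailed) data, and negative for platykurtic (thin-tailed) data. The sketch below uses the simple moment estimator and synthetic draws (normal, Laplace, uniform) whose theoretical excess kurtosis is 0, 3, and -1.2. It does not use the SPY data, and `excess_kurtosis` is a hypothetical helper.

```python
import numpy as np


def excess_kurtosis(x):
    """Moment-based sample excess kurtosis: m4 / m2**2 - 3.

    About 0 for normal data, positive for leptokurtic (fat-tailed) data,
    negative for platykurtic (thin-tailed) data.
    """
    d = np.asarray(x, dtype=float)
    d = d - d.mean()
    m2 = np.mean(d ** 2)
    m4 = np.mean(d ** 4)
    return m4 / m2 ** 2 - 3.0


rng = np.random.default_rng(42)
n = 200_000
k_normal = excess_kurtosis(rng.standard_normal(n))  # theoretical value 0
k_laplace = excess_kurtosis(rng.laplace(size=n))    # theoretical value 3 (fat tails)
k_uniform = excess_kurtosis(rng.uniform(size=n))    # theoretical value -1.2 (thin tails)
```

Applied to the standardized daily SPY changes, a clearly positive value would support the fat-tailed shape seen in Figure 6.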
Figure 6. Leptokurtic Distribution of Daily SPY Returns between 2012 and 2015
It appears as if this phenomenon is also present in the PsychSignal data. The figure below shows an overlay of all of the distributions used in this analysis.
Figure 7. Overlay of data distributions
One of the assumptions made throughout the following analysis is that violating the assumption of normality in these data will have a negligible impact on results, due to the similarity in shape of each dataset’s distribution.
Additionally, it will be assumed that the data provided by PsychSignal is an accurate reflection of the true sentiment expressed by market participants, regardless of its potential impact on market prices. In other words, the data reflects the true feelings expressed by traders whether or not those feelings can predict market movement.
Figure 8. A visual inspection of the chart above reveals a consistent visual relationship between the PsychSignal index and survey-based data collected in the weekly NAAIM and AAII Sentiment Surveys.
All other assumptions regarding modeling will be discussed in later sections.
Contemporaneous Analysis
After the necessary data cleaning and transformation, analysis was performed to determine whether there were any contemporaneous relationships evident in the data. For clarification, contemporaneous in this context refers to the comparison of the standardized daily net change (closing price to closing price) in the SPY to the standardized daily change in the sentiment indicators as reported by PsychSignal. Contemporaneous does not refer to intraday data in this context.
Figure 9 is a correlation matrix useful for visualizing the degree of correlation between variables. Blue values indicate positive correlations and red values indicate negative correlations. The size of the dot also reflects the degree of correlation. Most of what is seen in this graphic is in line with what one would expect from this data, i.e., Bullish indicators are positively correlated with the SPY and Bearish indicators are negatively correlated. However, the Bull Volume series reveals a slight negative correlation to the SPY, suggesting that as prices increase, the volume of positive sentiment decreases slightly. Additionally, Bull Volume and Bear Volume exhibit a strong positive correlation rather than a negative correlation.
Figure 9. Contemporaneous Correlation Matrix using Pearson
Method. Scale is to the right. Both size and color of circle reflect
degree of correlation.
t-tests for the significance of each Pearson correlation coefficient were performed to determine whether these correlations were significant. The following table shows that we must reject the null hypothesis that r = 0 and accept the alternative hypothesis that there is a non-zero correlation between these variables. All four correlations are highly significant, with extremely low p-values.
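The degrees of freedom in the table (717, i.e. n - 2 for n = 719 days) and its confidence intervals are consistent with the standard t-test for a Pearson correlation and a Fisher z-transform interval. The sketch below reproduces the Bullish row under that assumption. `correlation_test` is hypothetical. The p-value uses a normal approximation to the t distribution, which is close at 717 degrees of freedom, so only the standard library is needed.

```python
import math


def correlation_test(r, n, z_crit=1.959963984540054):
    """Significance test and 95% CI for a Pearson correlation r from n pairs.

    t = r * sqrt(n - 2) / sqrt(1 - r**2) with n - 2 degrees of freedom.
    The CI uses the Fisher z-transform: atanh(r) +/- z_crit / sqrt(n - 3).
    The two-sided p-value uses a normal approximation to the t
    distribution (an assumption made to avoid a SciPy dependency).
    """
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1.0 - r * r)
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    ci = (math.tanh(z - half_width), math.tanh(z + half_width))
    p_approx = math.erfc(abs(t) / math.sqrt(2.0))
    return t, df, ci, p_approx


# Bullish row of the table: r = 0.4095 with 717 degrees of freedom (n = 719).
t, df, ci, p = correlation_test(0.4095, 719)
```

With r = 0.4095 this gives t of roughly 12 and an interval of about 0.3467 to 0.4686, matching the table. The p-value is many orders of magnitude below 2.2e-16. R, for example, prints such results only as the bound `p-value < 2.2e-16`.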
The scatterplot matrix in Figure 10 is further helpful in visualizing the relationships between the data. The red line is a LOESS regression smoothing line used to better visualize the nature of the relationship in each plot. The color-coded rectangles designate specific plots of interest:
• The blue rectangle highlights a very strong positive correlation between Bear Volume and Bull Volume.
• The green box highlights the relationship between the Bullish and Bearish intensity-based indicators and the SPY. It reveals a moderate positive and a moderate negative correlation respectively, as would be expected.
• The two red boxes both show the relationship between the SPY and the PsychSignal Volume indicators. Note the “V”-shaped, or curvilinear, relationship between Sentiment Volume and SPY daily returns.
Figures 11 and 12 below help visualize this relationship in more depth.
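The document does not say which LOESS implementation or span produced the red lines, so the following is only a basic sketch. It fits a local linear model with tricube weights at each point and performs no robustness iterations. The `frac` neighbourhood size and the function name `loess` are assumptions. The data is a synthetic V shape standing in for the volume indicators.

```python
import numpy as np


def loess(x, y, frac=0.3):
    """Basic LOESS: local linear fit with tricube weights at each x[i].

    frac is the fraction of points in each local neighbourhood.
    No robustness iterations are performed.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    r = min(n, max(int(np.ceil(frac * n)), 3))  # neighbourhood size
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        h = max(np.sort(dist)[r - 1], 1e-12)  # distance to r-th nearest point
        u = np.clip(dist / h, 0.0, 1.0)
        sw = np.sqrt((1.0 - u ** 3) ** 3)  # square root of tricube weights
        A = np.column_stack([np.ones(n), x - x[i]])
        beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        fitted[i] = beta[0]  # intercept = fitted value at x[i]
    return fitted


# Synthetic V-shaped relationship, like sentiment volume vs. SPY returns.
x = np.linspace(-3.0, 3.0, 61)
v_fit = loess(x, np.abs(x), frac=0.2)
line_fit = loess(x, 2.0 * x + 1.0, frac=0.3)
```

A smoother of this kind traces the V without assuming it in advance, which is why it exposes curvature that a straight-line correlation coefficient averages away.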
Figure 11. Density Plot comparing SPY on the X-axis to Bull Volume on the Y-axis. Notice the V-shaped pattern.
Figure 13 below provides perhaps the best snapshot of the entire contemporaneous dataset. It stratifies the SPY data into 4 equal quadrants (quartiles based on frequency). It then displays the distribution of each of the other datasets for the days that fall in each quadrant.
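The quartile stratification behind Figure 13 can be sketched as follows, assuming equal-frequency bins cut at the 25th, 50th, and 75th percentiles of SPY returns. `quartile_groups` is a hypothetical helper. Each returned group could then be plotted as a distribution or summarized, as in the toy `summary` line.

```python
import numpy as np


def quartile_groups(spy, indicator):
    """Split days into four equal-frequency bins of SPY returns and return
    the indicator values that fall in each bin, lowest SPY quartile first."""
    spy = np.asarray(spy, dtype=float)
    indicator = np.asarray(indicator, dtype=float)
    edges = np.quantile(spy, [0.25, 0.50, 0.75])
    bins = np.searchsorted(edges, spy, side="right")  # 0, 1, 2 or 3
    return [indicator[bins == q] for q in range(4)]


# Toy data: eight days of returns and a matching indicator.
spy_returns = np.arange(8.0)
groups = quartile_groups(spy_returns, 10.0 * spy_returns)
summary = [(len(g), float(g.mean())) for g in groups]  # size and mean per bin
```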
Indicator      Correlation with SPY (r)   Two-tailed p-value   Degrees of freedom   95% confidence interval
Bullish         0.4095                    < 2.2e-16            717                   0.3467 < r < 0.4686
Bearish        -0.3818                    < 2.2e-16            717                  -0.4425 < r < -0.3175
Bull Volume    -0.1636                    1.044e-05            717                  -0.2339 < r < -0.0915
Bear Volume    -0.4405                    < 2.2e-16            717                  -0.4976 < r < -0.3796
Figure 10. Scatterplot Matrix of Contemporaneous Pairwise Relationships. Colored rectangles highlight 3 primary areas of interest.
Figure 12. Density Plot comparing SPY (X-axis) to Bear Volume (Y-axis). Note that while Bullish Volume tends to increase for both large negative and large positive SPY returns, Bearish Volume exhibits a similar V shape but with far less density in the positive-return tail of the SPY.
Figure 13. Marginal Distribution Plot showing the distribution of
PsychSignal data corresponding to 4 equal sized quadrants of SPY
Data. Note the consistency in the order of the horizontal shifting of
the Bullish and Bearish Sentiment Indicators. Additionally, the
Volume Indicators are largely centered at zero for the interquartile
range of the SPY but differentiate at both tails. Bullish Volume
increases uniformly while Bearish Volume increases more for the
negative return tail.
This is reflected in the data table below, which displays descriptive statistics for Bullish and Bearish Volume on positive-return and negative-return days. It reveals that the mean of Bull Volume on up days and down days is not significantly different from zero, while the mean of Bear Volume is.
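The text does not name the test behind "not significantly different from zero". A one-sample t statistic on each subset is one common choice. It is sketched here with hypothetical helpers (`mean_t_stat`, `volume_by_direction`) and toy numbers rather than the PsychSignal data. Days with an SPY change of exactly zero are excluded from both groups in this sketch.

```python
import numpy as np


def mean_t_stat(x):
    """One-sample t statistic for H0: mean(x) == 0."""
    x = np.asarray(x, dtype=float)
    return x.mean() / (x.std(ddof=1) / np.sqrt(len(x)))


def volume_by_direction(spy, volume):
    """Mean and t statistic of a volume indicator on up days and down days."""
    spy = np.asarray(spy, dtype=float)
    volume = np.asarray(volume, dtype=float)
    up, down = volume[spy > 0], volume[spy < 0]
    return {"up": (up.mean(), mean_t_stat(up)),
            "down": (down.mean(), mean_t_stat(down))}


# Toy example: volume is flat on up days and elevated on down days.
spy_chg = [1.0, -1.0, 2.0, -2.0, 0.5, -0.5]
bear_vol = [0.1, 1.0, -0.1, 1.2, 0.0, 0.8]
result = volume_by_direction(spy_chg, bear_vol)
```

Comparing each t statistic to the critical value for len(subset) - 1 degrees of freedom gives the significance call.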
Summary of Key Findings of Contemporaneous
Analysis
1. All four sentiment-based indicators exhibit statistically significant non-zero correlations to daily SPY returns.
2. Bear Volume and Bull Volume indicators are positively correlated, contrary to expectation.
3. Bear Volume and Bull Volume indicators are stagnant on days when SPY volatility is low. Both increase on high-volatility days regardless of the direction of the price move.
4. Bullish Volume increases more uniformly for both positive and negative return days, while Bear Volume increases much more for negative days than it does for positive days.
Predictive Analysis
The second stage of my analysis focused on exploring whether the sentiment-based data provided by PsychSignal could be used to forecast next-day returns of the SPY.
Lags (t-1, t-2, t-3, and t-4) were used to predict the value of the target at time t in both analyses. A brief visual inspection of the data, shown in Figures 14 and 15, raised immediate doubt regarding the predictive power of the indicators.
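The lag structure can be made concrete with a small helper that lines up the target at day t with the indicators at days t-1 through t-4. `make_lags` is hypothetical. Suppose the four PsychSignal indicators are the columns, ordered Bullish, Bearish, Bull Volume, Bear Volume. The helper would then yield 16 regressors ordered lag by lag (Lag1_Bullish, Lag1_Bearish, ...), which matches the 16 regression degrees of freedom in the summary output. The first four days are lost because they have no complete lag history.

```python
import numpy as np


def make_lags(features, target, lags=(1, 2, 3, 4)):
    """Build a design matrix in which row t holds features[t - k] for each k.

    features: array of shape (T,) or (T, m); target: array of shape (T,).
    Returns X with shape (T - max(lags), m * len(lags)), columns ordered
    lag by lag, and y = target[max(lags):]. Only positive lags are handled.
    """
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    if features.ndim == 1:
        features = features[:, None]
    T, p = len(target), max(lags)
    # For rows t = p..T-1, features[t - k] is the slice [p - k : T - k].
    X = np.hstack([features[p - k:T - k] for k in lags])
    return X, target[p:]


# Toy data: a single "indicator" equal to the day index.
X, y = make_lags(np.arange(10.0), np.arange(10.0))
```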
Figure 14 Correlation Matrix of all lagged series. Notice there is
almost no correlation present between the SPY and any of the lagged
sentiment indicators.
Figure 15. Marginal Distribution Plot of PsychSignal indicators at Lag
1 conditioned on the following day’s returns in the SPY. Notice the
lack of differentiation in the distributions compared to the
contemporaneous distributions in Figure 13
A multivariate OLS model was then applied to the dataset using all four sentiment indicators, first at each lag individually and then with all 4 lags at the same time. The following assumptions were made:
• There is a linear relationship between variables.
• Multivariate normality.
• Little multicollinearity.
• No autocorrelation.
• Homoscedasticity.
Figure 16. Results of the OLS Multivariate Regression Model
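A multivariate OLS fit can be sketched in NumPy as below, reporting an intercept, coefficient t statistics, R-squared, and adjusted R-squared. This is an illustration under stated assumptions, not the tool used for the original analysis. The helper `ols` and the synthetic data are hypothetical.

```python
import numpy as np


def ols(X, y):
    """Ordinary least squares with an intercept.

    Returns coefficients (intercept first), their t statistics,
    R-squared and adjusted R-squared.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])  # prepend intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sse = resid @ resid
    sst = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    sigma2 = sse / (n - k - 1)  # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    return beta, beta / se, r2, adj_r2


# Synthetic check: y = 1 + 2*x1 - 0.5*x2 + small noise.
rng = np.random.default_rng(1)
X_demo = np.column_stack([np.arange(20.0), (np.arange(20.0) ** 2) % 7])
y_demo = 1.0 + 2.0 * X_demo[:, 0] - 0.5 * X_demo[:, 1] + 0.01 * rng.standard_normal(20)
beta, t_stats, r2, adj_r2 = ols(X_demo, y_demo)
```

Feeding it a 16-column lagged design matrix and the aligned SPY target would produce the same kinds of quantities as the summary output, namely coefficients, t statistics, R Square, and Adjusted R Square.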
Only two lagged series, Lag1_Bearish and Lag4_Bearish, were found to be significant at the 5% level. The model performed poorly overall, with an R-squared of 0.03 (adjusted R-squared 0.011; overall F-test p = 0.09).
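The R-squared and the model's overall F-test can be reproduced from the ANOVA sums of squares in the regression summary output. This assumes the standard OLS formulas with 16 slope coefficients and 719 observations. It is a worked check of the reported arithmetic, not a new result.

```python
# Values copied from the SUMMARY OUTPUT table.
n, k = 719, 16  # observations, slope coefficients (4 indicators x 4 lags)
ss_reg, ss_res = 23.98171227, 699.2002223

ss_tot = ss_reg + ss_res                        # total sum of squares
r2 = ss_reg / ss_tot                            # R Square
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)   # Adjusted R Square
f_stat = (ss_reg / k) / (ss_res / (n - k - 1))  # F = MS_regression / MS_residual
```

The F statistic of about 1.50 on (16, 702) degrees of freedom corresponds to the reported Significance F of 0.0915. The 16-regressor model as a whole is therefore not significant at the 5% level.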
Additional regression models were then applied iteratively, each using only one lag at a time. In addition to the lagged models, two models were built using forward lags. In other words, I ran a regression backwards in time, using the sentiment data at forward lags (t+1) and (t+2) to predict the SPY at time t. The reasoning was as follows. If the sentiment data could not predict the SPY forward in time but could "predict" it backwards in time, it would suggest that the sentiment data is reactive to price movement rather than anticipatory of it. The R-squared values of these models are shown below.
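The forward and reverse lag comparison can be sketched by regressing SPY at day t on the indicator at day t - k. Here k > 0 gives predictive lags, k = 0 the contemporaneous model, and k < 0 the reverse (forward-lag) models. `lagged_r2` is hypothetical. The synthetic sentiment series is deliberately constructed to react to same-day and prior-day returns. It therefore shows the pattern a purely reactive indicator would produce, not the PsychSignal results.

```python
import numpy as np


def lagged_r2(features, target, k):
    """R-squared of regressing target[t] on features[t - k] (with intercept).

    k > 0: sentiment precedes SPY (predictive models)
    k = 0: contemporaneous model
    k < 0: sentiment from after day t (reverse / forward-lag models)
    """
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    if features.ndim == 1:
        features = features[:, None]
    T = len(target)
    if k >= 0:
        X, y = features[:T - k], target[k:]
    else:
        X, y = features[-k:], target[:T + k]
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


# Synthetic, purely reactive sentiment: reacts to today's and yesterday's SPY.
rng = np.random.default_rng(7)
raw = rng.standard_normal(501)
spy_sim = raw[1:]
sent_sim = raw[1:] + 0.5 * raw[:-1] + 0.5 * rng.standard_normal(500)
r2_by_lag = {k: lagged_r2(sent_sim, spy_sim, k) for k in (-2, -1, 0, 1, 2)}
```

For this construction the population R-squared is about 0.67 at k = 0, about 0.17 at k = -1, and near 0 elsewhere. That is the same descending, backward-only pattern the reverse-lag models are designed to detect.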
Figure 17. R-Squared of Lagged, Contemporaneous, and Reverse
Lagged Models.
As you can see, all lagged models produce R-squared values of nearly 0. By contrast, the contemporaneous and reverse-lagged models exhibit larger, linearly descending R-squared values, suggesting that the sentiment reflected in the data is reactive rather than anticipatory.
Summary of Key Findings
1. All four sentiment-based indicators exhibit statistically significant non-zero contemporaneous correlations to daily SPY returns.
2. Bear Volume and Bull Volume indicators are positively correlated, contrary to expectation.
3. Bear Volume and Bull Volume indicators are stagnant on days when SPY volatility is low. Both increase on high-volatility days regardless of the direction of the price move.
4. Bullish Volume increases more uniformly for both positive and negative return days, while Bear Volume increases much more for negative return days than it does for positive days.
5. Correlation between SPY data and sentiment data drops off significantly at lags 1-5.
SUMMARY OUTPUT (Standardized)

Regression Statistics
Multiple R           0.182102668
R Square             0.033161382
Adjusted R Square    0.011125174
Standard Error       0.998003864
Observations         719

ANOVA
              df    SS            MS         F          Significance F
Regression    16    23.98171227   1.498857   1.504859   0.091512499
Residual      702   699.2002223   0.996012
Total         718   723.1819345

                  Coefficients    Standard Error   t Stat     P-value    Lower 95%      Upper 95%
Intercept         -0.001966444    0.037222494      -0.05283   0.957883   -0.075047192   0.071114
Lag1_Bullish       0.020831277    0.056744276       0.367108  0.713649   -0.090577543   0.13224
Lag1_Bearish      -0.1554315      0.068882939      -2.25646   0.024349   -0.290672751   -0.02019
Lag1_BullVolum    -0.064300473    0.10332598       -0.62231   0.533942   -0.267165435   0.138564
Lag1_BearVolume    0.12832188     0.121951763       1.052235  0.293054   -0.111111996   0.367756
Lag2_Bullish      -0.04651744     0.064697438      -0.719     0.47238    -0.173541091   0.080506
Lag2_Bearish      -0.12377006     0.076165343      -1.62502   0.104608   -0.273309211   0.025769
Lag2_BullVolum     0.05869661     0.120948614       0.485302  0.627614   -0.178767734   0.296161
Lag2_BearVolume   -0.045350618    0.145316264      -0.31208   0.755071   -0.330657163   0.239956
Lag3_Bullish      -0.072203302    0.064672649      -1.11644   0.264615   -0.199178284   0.054772
Lag3_Bearish      -0.143175415    0.075859529      -1.88738   0.059522   -0.292114147   0.005763
Lag3_BullVolum     0.013903224    0.121460004       0.114468  0.9089     -0.224565155   0.252372
Lag3_BearVolume    0.004262675    0.1460306         0.02919   0.976721   -0.28244636    0.290972
Lag4_Bullish      -0.089134713    0.056103892      -1.58874   0.112568   -0.199286235   0.021017
Lag4_Bearish      -0.226359518    0.067750585      -3.34107   0.000879   -0.359377564   -0.09334
Lag4_BullVolum     0.028900516    0.103969722       0.277971  0.781117   -0.175228335   0.233029
Lag4_BearVolume    0.08721789     0.12336881        0.706969  0.479821   -0.154998142   0.329434
6. OLS multivariate models were ineffective in predicting future returns in the SPY, yet were marginally effective when run backwards, suggesting that sentiment data is reactive rather than anticipatory.
7. Based on the contemporaneous analysis, there is evidence to suggest that market agents exhibit stronger tendencies to react to negative market movement than to positive market movement.
Ongoing Research
1. Applying more sophisticated data mining techniques to analyze the data
2. Exploration of intraday data
3. Exploring directional prediction models, i.e., “Up” or “Down” days
4. Development of a heuristic-based trading indicator

Más contenido relacionado

Was ist angesagt?

Flipside ba-block-chain-week
Flipside ba-block-chain-weekFlipside ba-block-chain-week
Flipside ba-block-chain-weekslandry
 
FQH Experimental Economics Final Paper
FQH Experimental Economics Final PaperFQH Experimental Economics Final Paper
FQH Experimental Economics Final PaperFaisal Haider
 
EGT and economics
EGT and economicsEGT and economics
EGT and economicsSSA KPI
 
Value investing and emerging markets
Value investing and emerging marketsValue investing and emerging markets
Value investing and emerging marketsNavneet Randhawa
 
The art of Forecast - Improving Forecasting accuracy
The art of Forecast - Improving Forecasting accuracyThe art of Forecast - Improving Forecasting accuracy
The art of Forecast - Improving Forecasting accuracyAndrea Terzaghi
 
Centre4 Testing Market Watch Autumn 2009
Centre4 Testing   Market Watch Autumn 2009Centre4 Testing   Market Watch Autumn 2009
Centre4 Testing Market Watch Autumn 2009ryanhannigan
 
Approach to BSA/AML Rule Thresholds
Approach to BSA/AML Rule ThresholdsApproach to BSA/AML Rule Thresholds
Approach to BSA/AML Rule ThresholdsMayank Johri
 
James Hamer – Proactive Advisor Magazine – Volume 3, Issue 12
James Hamer – Proactive Advisor Magazine – Volume 3, Issue 12James Hamer – Proactive Advisor Magazine – Volume 3, Issue 12
James Hamer – Proactive Advisor Magazine – Volume 3, Issue 12Proactive Advisor Magazine
 
Reducing False Positives - BSA AML Transaction Monitoring Re-Tuning Approach
Reducing False Positives - BSA AML Transaction Monitoring Re-Tuning ApproachReducing False Positives - BSA AML Transaction Monitoring Re-Tuning Approach
Reducing False Positives - BSA AML Transaction Monitoring Re-Tuning ApproachErik De Monte
 

Was ist angesagt? (10)

Flipside ba-block-chain-week
Flipside ba-block-chain-weekFlipside ba-block-chain-week
Flipside ba-block-chain-week
 
FQH Experimental Economics Final Paper
FQH Experimental Economics Final PaperFQH Experimental Economics Final Paper
FQH Experimental Economics Final Paper
 
EGT and economics
EGT and economicsEGT and economics
EGT and economics
 
Value investing and emerging markets
Value investing and emerging marketsValue investing and emerging markets
Value investing and emerging markets
 
The art of Forecast - Improving Forecasting accuracy
The art of Forecast - Improving Forecasting accuracyThe art of Forecast - Improving Forecasting accuracy
The art of Forecast - Improving Forecasting accuracy
 
Surstromming
Surstromming Surstromming
Surstromming
 
Centre4 Testing Market Watch Autumn 2009
Centre4 Testing   Market Watch Autumn 2009Centre4 Testing   Market Watch Autumn 2009
Centre4 Testing Market Watch Autumn 2009
 
Approach to BSA/AML Rule Thresholds
Approach to BSA/AML Rule ThresholdsApproach to BSA/AML Rule Thresholds
Approach to BSA/AML Rule Thresholds
 
James Hamer – Proactive Advisor Magazine – Volume 3, Issue 12
James Hamer – Proactive Advisor Magazine – Volume 3, Issue 12James Hamer – Proactive Advisor Magazine – Volume 3, Issue 12
James Hamer – Proactive Advisor Magazine – Volume 3, Issue 12
 
Reducing False Positives - BSA AML Transaction Monitoring Re-Tuning Approach
Reducing False Positives - BSA AML Transaction Monitoring Re-Tuning ApproachReducing False Positives - BSA AML Transaction Monitoring Re-Tuning Approach
Reducing False Positives - BSA AML Transaction Monitoring Re-Tuning Approach
 

Ähnlich wie PsychSignal_IntroductoryAnalysis-2

The Contribution Of Drug Abuse On Domestic Violence Essay
The Contribution Of Drug Abuse On Domestic Violence EssayThe Contribution Of Drug Abuse On Domestic Violence Essay
The Contribution Of Drug Abuse On Domestic Violence EssayErica Baldwin
 
Relationship Between Expectations And Price Movements Essay
Relationship Between Expectations And Price Movements EssayRelationship Between Expectations And Price Movements Essay
Relationship Between Expectations And Price Movements EssayKara Liu
 
Multiple Regression Model Essay
Multiple Regression Model EssayMultiple Regression Model Essay
Multiple Regression Model EssayMaria Padilla
 
The Dataset Diabetes Details From Efron Et Al
The Dataset Diabetes Details From Efron Et AlThe Dataset Diabetes Details From Efron Et Al
The Dataset Diabetes Details From Efron Et AlJenny Hill
 
Econometrics Project
Econometrics ProjectEconometrics Project
Econometrics ProjectAndrea Turner
 
Real Estate Data Set
Real Estate Data SetReal Estate Data Set
Real Estate Data SetSarah Jimenez
 
ACTIVITY-SHEET-WEEK-4-DAY-1.docx
ACTIVITY-SHEET-WEEK-4-DAY-1.docxACTIVITY-SHEET-WEEK-4-DAY-1.docx
ACTIVITY-SHEET-WEEK-4-DAY-1.docxJOANGUERRERO16
 
Aj Davis Department Stores
Aj Davis Department StoresAj Davis Department Stores
Aj Davis Department StoresDawn Robertson
 
Covariance and correlation
Covariance and correlationCovariance and correlation
Covariance and correlationRashid Hussain
 
Basic Statistics Essay Examples
Basic Statistics Essay ExamplesBasic Statistics Essay Examples
Basic Statistics Essay ExamplesHelp Paper UK
 
Statistics orientation
Statistics orientationStatistics orientation
Statistics orientationdarrincoe
 
raprap-opaw.pptx
raprap-opaw.pptxraprap-opaw.pptx
raprap-opaw.pptxLouieCase
 
Casual modelling in sociology carmine gelormini
Casual modelling in sociology   carmine gelorminiCasual modelling in sociology   carmine gelormini
Casual modelling in sociology carmine gelorminiCarmineGelormini
 
Statistics Is The Science Of Collecting Data And Analyzing It
Statistics Is The Science Of Collecting Data And Analyzing ItStatistics Is The Science Of Collecting Data And Analyzing It
Statistics Is The Science Of Collecting Data And Analyzing ItClaudia Brown
 
Completely Randomized Factorial Anova
Completely Randomized Factorial AnovaCompletely Randomized Factorial Anova
Completely Randomized Factorial AnovaDotha Keller
 
Correlation and Regression.pdf
Correlation and Regression.pdfCorrelation and Regression.pdf
Correlation and Regression.pdfAadarshSah1
 

Ähnlich wie PsychSignal_IntroductoryAnalysis-2 (20)

The Contribution Of Drug Abuse On Domestic Violence Essay
The Contribution Of Drug Abuse On Domestic Violence EssayThe Contribution Of Drug Abuse On Domestic Violence Essay
The Contribution Of Drug Abuse On Domestic Violence Essay
 
Relationship Between Expectations And Price Movements Essay
Relationship Between Expectations And Price Movements EssayRelationship Between Expectations And Price Movements Essay
Relationship Between Expectations And Price Movements Essay
 
Multiple Regression Model Essay
Multiple Regression Model EssayMultiple Regression Model Essay
Multiple Regression Model Essay
 
The Dataset Diabetes Details From Efron Et Al
The Dataset Diabetes Details From Efron Et AlThe Dataset Diabetes Details From Efron Et Al
The Dataset Diabetes Details From Efron Et Al
 
Econometrics Project
Econometrics ProjectEconometrics Project
Econometrics Project
 
Econometrics Project
Econometrics ProjectEconometrics Project
Econometrics Project
 
Real Estate Data Set
Real Estate Data SetReal Estate Data Set
Real Estate Data Set
 
ACTIVITY-SHEET-WEEK-4-DAY-1.docx
ACTIVITY-SHEET-WEEK-4-DAY-1.docxACTIVITY-SHEET-WEEK-4-DAY-1.docx
ACTIVITY-SHEET-WEEK-4-DAY-1.docx
 
Aj Davis Department Stores
Aj Davis Department StoresAj Davis Department Stores
Aj Davis Department Stores
 
Covariance and correlation
Covariance and correlationCovariance and correlation
Covariance and correlation
 
Basic Statistics Essay Examples
Basic Statistics Essay ExamplesBasic Statistics Essay Examples
Basic Statistics Essay Examples
 
Statistics orientation
Statistics orientationStatistics orientation
Statistics orientation
 
raprap-opaw.pptx
raprap-opaw.pptxraprap-opaw.pptx
raprap-opaw.pptx
 
Casual modelling in sociology carmine gelormini
Casual modelling in sociology   carmine gelorminiCasual modelling in sociology   carmine gelormini
Casual modelling in sociology carmine gelormini
 
Statistics Is The Science Of Collecting Data And Analyzing It
Statistics Is The Science Of Collecting Data And Analyzing ItStatistics Is The Science Of Collecting Data And Analyzing It
Statistics Is The Science Of Collecting Data And Analyzing It
 
Statistics an introduction (1)
Statistics  an introduction (1)Statistics  an introduction (1)
Statistics an introduction (1)
 
Basic concept of statistics
Basic concept of statisticsBasic concept of statistics
Basic concept of statistics
 
Completely Randomized Factorial Anova
Completely Randomized Factorial AnovaCompletely Randomized Factorial Anova
Completely Randomized Factorial Anova
 
Sampling Distribution
Sampling DistributionSampling Distribution
Sampling Distribution
 
Correlation and Regression.pdf
Correlation and Regression.pdfCorrelation and Regression.pdf
Correlation and Regression.pdf
 

PsychSignal_IntroductoryAnalysis-2

  • 1. "Youknow,I've been dealingwith these big mathematical modelsfor forecastingthe economy...ifI couldfigure outa wayto determine whether or not people are more fearful,or changingto more euphoric, and havea thirdway of figuringoutwhichof the two thingsare working, I don'tneed any ofthis other stuff. I couldforecast the economy better thanany way I know how" AlanGreenspan,November 2007 Abstract This quote fromformer Fed Chairmanalludes to a theoretical understandingofmarket dynamics proposedby Behavioral Economistssuggesting that “psychological,social,cognitive,andemotional factors influencethe economicdecisionsof individualsand institutionshaveconsequencesfor market prices, returns, and resource allocation”( Wikipedia). Criticsof Behavioral Economics cite the EfficientMarket Hypothesis statingthat “itis impossibleto"beat the market" because stock market efficiencycauses existingshareprices to always incorporateand reflect all relevant information. Accordingto the EMH, stocks alwaystrade at their fair valueon stock exchanges,makingit impossiblefor investorsto either purchaseundervaluedstocks or sell stocks for inflatedprices.As such, it shouldbe impossibletooutperformthe overall market through expert stock selection or market timing,and that the onlyway an investor canpossiblyobtainhigherreturns is by purchasingriskierinvestments” The debate between proponentsof Behavioral Economicsandthe EfficientMarket Theory spansa vast range oftopics that are far beyondthe scope of this paper. Whatfollowsrather isan introductory explorationofwhether or not sentimentdata— extracted from Twitter, Stock Twits,and market focusedchat rooms—canbe used to predict dailystock market returns. Analysiswasperformed on3 years of a sentiment data from 2012-2015 providedby PsychSignal© forresearchpurposes. The dataconsistsof two sentimentbased time series indicatorsderived froma proprietary sentiment miningprocessdeveloped by PsychSignal. 
TheSpider S&P500 ETF (SPY) was used as the target variable for the forecastinganalysis. Both contemporaneousandlagged relationshipsbetweenthe SPYand PsychSignal indicators were explored as well as relationships between the SPYand two compositeindicators derived from manipulationsofthe raw PsychSignal data. Finally,MultivariateOLSRegressionmodels were appliedto the data to determine whether PsychSignal’ssentimentbasedindicatorscouldbe used to predict next day returns in the Spider S&P500 ETF Data The data for thisanalysiswasobtainedfrom Quandl.comandPsychSignal’swebsite. Accordingto PsychSignal’sWebsite: • We measure twotypesof sentiment.Bullish and Bearish.Theymeasure veryspecific financial language andnotsimplypositiveand negative. • Bullishsentimentcanbe interpretedasbuyers or those whofavorthe buyside. • Bearishsentimentcanbe interpretedassellers or those whofavorthe sell side. • Our sentimentismeasuredona0 - 4 scale.0 beingthe lowest4beingthe highest.We sometimesconvertthe 0-4 scale to %. • The 0 - 4 sentimentscale measuresboththe intensity of the sentimentexpressedby individualsbutalsothe collective volumeof that sentimentexpressedoverthe crowdmood. A clarificationwasmade viaanemail exchange with PsychSignal’sfounderconfirmingthatintensity refersto the degree of sentimentpresentinastatement. For example, compare the followingstatements: 1. The SPY is reallystrongtoday 2. The SPY doesn’tlook like abadbuy here The firststatementwouldreceiveahigherintensity rating. The sentimentdataisthenaggregatedinto4 indicators and are chartedinFigures1 and 2 alongside the SPY: 1. BearishSentiment (Intensity) 2. BullishSentiment (Intensity) 3. BearishVolume (Quantity) 4. BullishVolume (Quantity)
  • 2. Figure 1 SPY vs. Bullish and Bearish Sentiment (weekend sentiment data removed) Figure 2 SPY vs. Bullish and Bearish Volume (weekend sentiment data removed) Data Challenges Upon visual inspectionof the original Psychsignal data setsshownbelowinFigure 3, it appearedasif there were 3 distincttrendsinthe data that seemedtobe the resultof some systematicprocessassociatedwiththe waythe data wascollected. Figure 3 Original PsychSignal Dataset revealing 3 systematic trends in the data prior to 2012 and after 2015 You can see that priorto 2012 there isconsistently extreme variance inthe dataset. Additionally,there appearsto be a systematicjumpinbothseries subsequentto2015. Neitherphenomenoncoincided witha similarinthe observedSPYdata. Therefore,it was assumedtobe the resultof a systematicprocess associatedwiththe datacollection processperformed by PsychSignal. Forthat reason,analysiswas confined to a 3 yearperiodbetween2012 and 2015. There are over700 data pointsinthistime periodwhichshouldbe sufficientforthisanalysis. Secondly,the original PsychSignal dataincluded sentimentdatacollectedduringthe weekend. A zoomedininspectionof the 10 daycorrelationbetween the PsychSignal indicatorsshownbelowreveala consistentseasonal decline incorrelation withthe SPY that coincideswiththe weekendsentimentdata. Figure 4. Rolling 10 day correlation between SPY and Bullish/Bearish indicators. Cyclical pattern coincides with weekend sentiment measurement. A choice wasmade to remove the weekendsentiment data basedon the assumptionthatmarketparticipants discussingthe marketsduringthe weekendwere likely ruminatingovercarriedtradingpositionswhichthey were unsure of potentiallyresultinginbias. Data Transformations A visual inspectionof the datarevealscleartrendsin the meanand variance of each series. Eachserieswas differencedto induce trendstationaryandthen standardizedtoz scorescenteredaroundthe mean to minimize the impactbetweendifferentscalesof measurement. 
The resultingdistributionsare shown below. Figure 5. Distribution of data after differencing and standardization It isnecessarytopointout that bothdailyand annualizedreturnsin S&P500 are platykurtic,witha large concentrationof valuescenteredatthe meanand largertailsthan wouldbe expectedinanormal distribution. The SPYdata usedforthisanalysisis consistentwiththishistorical tendencyasshownbelow.
  • 3. Figure 6. Platykurtic Distribution of Daily SPY Returns between 2012 and 2015 It appearsas if thisphenomenonisconsistentin the PsychSignal dataaswell. The figure belowshowsan overlayof all of the distributionsusedinthisanalysis. Figure 7. Overlay of data distributions One of the assumptionsthatwill be made throughout the followinganalysisisthatviolatingthe assumptionof normalityinthese datawill have anegligibleimpacton resultsdue tothe similarity inshape of eachdataset’s distribution. Additionally, itwill be assumedthatthe data provided by PsychSignal isanaccurate reflectionof the true sentimentexpressedbymarketparticipantsregardless of the potential impactonmarket prices. Inother words,the data reflectsthe true feelingsexpressedby tradersevenif those feelingscanorcannot predict marketmovement. Figure 8. A visual inspection of the chart above reveals a consistent visual relationship between the PsychSignal index and survey based data collected in the weeklyNAAIM and AAII Sentiment Surveys All otherassumptions regardingmodelingwill be discussedin latersections. ContemporaneousAnalysis Afterthe necessarydatacleaningandtransformation, analysiswasperformedtodetermine if there were any contemporaneousrelationshipsevidentinthe data. For clarification,contemporaneousinthiscontextrefersto the comparisonof the standardizeddailynetchange (closingprice toclosingprice) inthe SPYto the standardizeddailychange insentimentindicatorsas reportedbyPsychSignal. Contemporaneous doesnot referto intradaydatain thiscontext. Figure 9 is a CorrelationMatrix useful forvisualizingthe degree of correlationbetweenvariables. Blue values indicate positive correlationsandredvaluesindicate negative correlations. The size of the dotalsoreflects the degree of correlation. Mostof what isseeninthis graphicis inline withwhatone wouldexpectwiththis data ie.Bullishindicatorsare positivelyrelatedtothe SPY and BearishIndicatorsare negativelycorrelated. 
However, the Bull Volume series reveals a slight negative correlation to SPY, suggesting that on days when prices rise, the volume of positive sentiment tends to decrease slightly. Additionally, Bull Volume and Bear Volume exhibit a strong positive correlation rather than the negative correlation one might expect.
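The coefficients in Figure 9 are ordinary Pearson correlations, and the significance tests in the table that follows can be reproduced from just r and the sample size. The sketch below computes r, the t statistic t = r * sqrt(n - 2) / sqrt(1 - r^2) on n - 2 degrees of freedom, and a 95% confidence interval via the Fisher z-transformation. Two assumptions to flag: the p-value here uses a normal approximation to the t distribution (close at roughly 700 degrees of freedom, but not the exact t test), and the Fisher interval is assumed to be how the reported intervals were produced. Plugging in the Bullish row (r = 0.4095, n = 719) recovers its reported interval to within rounding. Function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Pairwise Pearson r for a dict of {name: values}, keyed by (name_a, name_b)."""
    return {(a, b): pearson_r(columns[a], columns[b]) for a in columns for b in columns}


def correlation_test(r, n, z_crit=1.959964):
    """Test H0: rho = 0 and build a 95% Fisher-z interval for rho.

    Returns (t, df, p_approx, (lower, upper)). p_approx uses a normal
    approximation to the t distribution, which is close only for large df.
    """
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1.0 - r * r)
    p_approx = math.erfc(abs(t) / math.sqrt(2.0))  # two-tailed normal tail area
    z = math.atanh(r)                               # Fisher z-transform
    half_width = z_crit / math.sqrt(n - 3)          # SE of z is 1 / sqrt(n - 3)
    return t, df, p_approx, (math.tanh(z - half_width), math.tanh(z + half_width))


if __name__ == "__main__":
    t, df, p, (lo, hi) = correlation_test(0.4095, 719)  # Bullish row of the table
    print(f"t = {t:.2f}, df = {df}, p ~ {p:.1e}, 95% CI = ({lo:.4f}, {hi:.4f})")
```

With t around 12, the true p-value is far below the 2.2E-16 floor that appears in the table, which is why those entries are best read as "less than" values.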
Figure 9. Contemporaneous Correlation Matrix using Pearson Method. Scale is to the right. Both size and color of circle reflect degree of correlation.

t-tests of the Pearson correlation coefficients were performed to determine whether these correlations were significant. The following table shows that we must reject the null hypothesis that ρ = 0 and accept the alternative hypothesis that there is a non-zero correlation between each indicator and the SPY. All four correlations are highly significant, with extremely low p-values.

Indicator     Correlation with SPY   Two-Tailed P-Value   Degrees of Freedom   95% Confidence Interval
Bullish        0.4095                < 2.2E-16            717                   0.3467 < R < 0.4686
Bearish       -0.3818                < 2.2E-16            717                  -0.4425 < R < -0.3175
BullVolume    -0.1636                0.00001044           717                  -0.2339 < R < -0.0915
BearVolume    -0.4405                < 2.2E-16            717                  -0.4976 < R < -0.3796

The scatterplot matrix in Figure 10 further helps visualize the relationships in the data. The red line is a LOESS regression smoothing line used to better visualize the nature of the relationship in each plot. The color-coded rectangles designate specific plots of interest. The blue rectangle highlights a very strong positive correlation between Bear Volume and Bull Volume. The green box highlights the relationship between the Bullish and Bearish intensity-based indicators and the SPY, revealing a moderate positive and a moderate negative correlation, respectively, as would be expected. The two red boxes both show the relationship between SPY and the PsychSignal Volume indicators. Note the "V"-shaped, curvilinear relationship between sentiment volume and the SPY daily change. Figures 11 and 12 show this relationship in more depth.

Figure 10. Scatter Plot Matrix of Contemporaneous Pairwise Relationships. Colored rectangles highlight 3 primary areas of interest.

Figure 11. Density Plot comparing SPY on the X-axis to Bull Volume on the Y-axis. Notice the V-shaped pattern.

Figure 12. Density Plot comparing SPY (X-axis) to Bear Volume (Y-axis). Note that while Bullish Volume tends to increase for both large negative and large positive SPY returns, Bearish Volume exhibits a similar V shape but with far less density in the positive return tail of the SPY.

Figure 13 provides perhaps the best snapshot of the entire contemporaneous dataset. It stratifies the SPY data into 4 equal-frequency quartiles and then displays the resulting distribution of each of the other datasets for the days falling in each quartile.
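The quartile stratification behind Figure 13 can be sketched as follows: split the same-day SPY changes at their quartiles and collect the corresponding sentiment values in each bin. This is a minimal sketch assuming `statistics.quantiles` (Python 3.8+, default "exclusive" method) for the cut points; the paper does not say which quantile method or tie rule was used, so bin boundaries may differ slightly from the author's. The function names and sample values are hypothetical.

```python
import statistics


def quartile_strata(spy, indicator):
    """Group indicator[t] by the quartile (1-4) of the same-day spy[t] value."""
    q1, q2, q3 = statistics.quantiles(spy, n=4)
    groups = {1: [], 2: [], 3: [], 4: []}
    for s, value in zip(spy, indicator):
        if s <= q1:
            groups[1].append(value)
        elif s <= q2:
            groups[2].append(value)
        elif s <= q3:
            groups[3].append(value)
        else:
            groups[4].append(value)
    return groups


def stratum_means(groups):
    """Mean indicator value per SPY quartile (None for an empty bin)."""
    return {k: (statistics.mean(v) if v else None) for k, v in groups.items()}


if __name__ == "__main__":
    # Hypothetical standardized values, for illustration only.
    spy = [-2.1, -0.8, -0.3, 0.0, 0.2, 0.4, 0.9, 2.5]
    bear_volume = [1.8, 0.3, -0.1, 0.0, -0.2, 0.1, 0.2, 0.6]
    print(stratum_means(quartile_strata(spy, bear_volume)))
```

Plotting each group's distribution, rather than just its mean, gives the marginal-distribution view used in the figure.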
Figure 13. Marginal Distribution Plot showing the distribution of PsychSignal data corresponding to 4 equal-sized quartiles of SPY data. Note the consistency in the order of the horizontal shifting of the Bullish and Bearish Sentiment Indicators. Additionally, the Volume Indicators are largely centered at zero for the interquartile range of the SPY but differentiate at both tails. Bullish Volume increases uniformly, while Bearish Volume increases more for the negative return tail.

This is reflected in the data table below, which displays the descriptive statistics for Bullish and Bearish Volume on positive return days and negative return days. It reveals that the difference in mean Bull Volume between up days and down days is not significantly different from zero, while the corresponding difference for Bear Volume is.

Summary of Key Findings of Contemporaneous Analysis
1. All four sentiment-based indicators exhibit statistically significant non-zero correlations to daily SPY returns.
2. Bear Volume and Bull Volume indicators are positively correlated, contrary to expectation.
3. Bear Volume and Bull Volume indicators remain near zero on days when SPY volatility is low, and both increase on high-volatility days regardless of the direction of the price move.
4. Bullish Volume increases more uniformly for both positive and negative return days, while Bear Volume increases much more for negative days than it does for positive days.

Predictive Analysis

The second stage of my analysis focused on exploring whether the sentiment-based data provided by PsychSignal could be used to forecast next-day returns of the SPY. Lags of t-1, t-2, t-3, and t-4 were used to predict the value of the target at time t in both analyses. A brief visual inspection of the data, shown in Figures 14 and 15, raised immediate doubt regarding the predictive power of the indicators.

Figure 14. Correlation Matrix of all lagged series. Notice there is almost no correlation present between the SPY and any of the lagged sentiment indicators.
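The lag structure used in the predictive analysis (sentiment at t-1 through t-4 paired with SPY at t), as well as the forward-lag "backwards in time" regressions described later, comes down to shifting one series against another before fitting. This is a minimal sketch with hypothetical function names; the column names follow the `Lag1_Bullish` pattern seen in the regression output, and each series is assumed to be a plain list aligned by trading day.

```python
def shifted_pairs(target, predictor, k):
    """Pair target[t] with predictor[t - k] wherever both exist.

    k > 0: lag (sentiment k days before the return),
    k == 0: contemporaneous,
    k < 0: forward lead (sentiment |k| days after the return), as in the
           reverse regressions.
    """
    xs, ys = [], []
    for t in range(len(target)):
        s = t - k
        if 0 <= s < len(predictor):
            xs.append(predictor[s])
            ys.append(target[t])
    return xs, ys


def lag_matrix(target, indicators, lags=(1, 2, 3, 4)):
    """Design rows [ind[t - k] for each lag k, for each indicator] aligned with target[t].

    Only positive lags are supported; rows start once every lag is available,
    and all indicator series are assumed to be as long as target.
    """
    max_lag = max(lags)
    names = [f"Lag{k}_{name}" for k in lags for name in indicators]
    rows, ys = [], []
    for t in range(max_lag, len(target)):
        rows.append([indicators[name][t - k] for k in lags for name in indicators])
        ys.append(target[t])
    return names, rows, ys
```

Dropping the first `max_lag` days keeps every row complete, which is why a model with lags up to t-4 has slightly fewer observations than the contemporaneous data.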
Figure 15. Marginal Distribution Plot of PsychSignal indicators at Lag 1 conditioned on the following day's returns in the SPY. Notice the lack of differentiation in the distributions compared to the contemporaneous distributions in Figure 13.

A multivariate OLS model was then applied to the dataset, first using all four sentiment indicators at each lag individually and then using all 4 lags at the same time. The following assumptions were made:
• There is a linear relationship between the variables.
• Multivariate normality.
• Little multicollinearity.
• No autocorrelation.
• Homoscedasticity.

Figure 16. Results of the OLS Multivariate Regression Model

SUMMARY OUTPUT (Standardized)

Regression Statistics
Multiple R            0.182102668
R Square              0.033161382
Adjusted R Square     0.011125174
Standard Error        0.998003864
Observations          719

ANOVA
              df    SS             MS         F          Significance F
Regression    16    23.98171227    1.498857   1.504859   0.091512499
Residual      702   699.2002223    0.996012
Total         718   723.1819345

                  Coefficients   Standard Error   t Stat     P-value    Lower 95%      Upper 95%
Intercept         -0.001966444   0.037222494      -0.05283   0.957883   -0.075047192    0.071114
Lag1_Bullish       0.020831277   0.056744276       0.367108  0.713649   -0.090577543    0.13224
Lag1_Bearish      -0.1554315     0.068882939      -2.25646   0.024349   -0.290672751   -0.02019
Lag1_BullVolum    -0.064300473   0.10332598       -0.62231   0.533942   -0.267165435    0.138564
Lag1_BearVolume    0.12832188    0.121951763       1.052235  0.293054   -0.111111996    0.367756
Lag2_Bullish      -0.04651744    0.064697438      -0.719     0.47238    -0.173541091    0.080506
Lag2_Bearish      -0.12377006    0.076165343      -1.62502   0.104608   -0.273309211    0.025769
Lag2_BullVolum     0.05869661    0.120948614       0.485302  0.627614   -0.178767734    0.296161
Lag2_BearVolume   -0.045350618   0.145316264      -0.31208   0.755071   -0.330657163    0.239956
Lag3_Bullish      -0.072203302   0.064672649      -1.11644   0.264615   -0.199178284    0.054772
Lag3_Bearish      -0.143175415   0.075859529      -1.88738   0.059522   -0.292114147    0.005763
Lag3_BullVolum     0.013903224   0.121460004       0.114468  0.9089     -0.224565155    0.252372
Lag3_BearVolume    0.004262675   0.1460306         0.02919   0.976721   -0.28244636     0.290972
Lag4_Bullish      -0.089134713   0.056103892      -1.58874   0.112568   -0.199286235    0.021017
Lag4_Bearish      -0.226359518   0.067750585      -3.34107   0.000879   -0.359377564   -0.09334
Lag4_BullVolum     0.028900516   0.103969722       0.277971  0.781117   -0.175228335    0.233029
Lag4_BearVolume    0.08721789    0.12336881        0.706969  0.479821   -0.154998142    0.329434

Only two lagged series, Lag1_Bearish and Lag4_Bearish, were significant at the 5% level, and the model performed poorly overall, with an R-Squared of .03 (adjusted R-Squared .011) and an overall F-test that was not significant at the 5% level (Significance F = 0.092).

Additional regression models were then fitted iteratively, each using only one lag at a time. In addition to the lagged models, two models were built using forward lags, i.e. I ran a regression backwards in time using the sentiment data at forward lags (t+1) and (t+2) to predict SPY at time t. The reasoning was that if the sentiment data could not predict SPY forward in time but could "predict" it backwards in time, this would suggest that the sentiment data is reactionary to price movement rather than anticipatory of it. The R-Squared values of these models are shown below.

Figure 17. R-Squared of Lagged, Contemporaneous, and Reverse Lagged Models.

As you can see, all lagged models produce R-Squared values of nearly 0, while the contemporaneous and reverse lagged models exhibit larger, linearly descending R-Squared values, suggesting that the sentiment reflected in the data is reactionary rather than anticipatory.

Summary of Key Findings
1. All four sentiment-based indicators exhibit statistically significant non-zero contemporaneous correlations to daily SPY returns.
2. Bear Volume and Bull Volume indicators are positively correlated, contrary to expectation.
3. Bear Volume and Bull Volume indicators remain near zero on days when SPY volatility is low, and both increase on high-volatility days regardless of the direction of the price move.
4. Bullish Volume increases more uniformly for both positive and negative return days, while Bear Volume increases much more for negative return days than it does for positive days.
5. Correlation between SPY data and sentiment data drops off significantly in lags 1-5.
6. OLS multivariate models were ineffective in predicting future returns in the SPY, yet were marginally effective when run backwards, suggesting that sentiment data is reactive rather than anticipatory.
7. Based on the contemporaneous analysis, there is evidence to suggest that market agents tend to react more strongly to negative market movement than to positive market movement.

Ongoing Research
1. Applying more sophisticated data mining techniques to analyze the data
2. Exploration of intraday data
3. Exploring directional prediction models, i.e. "Up" or "Down" days
4. Development of a heuristic-based trading indicator
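The summary statistics in the Figure 16 regression output can be cross-checked from its ANOVA sums of squares alone, which also shows why an R-Squared of .033 spread across 16 regressors shrinks to an adjusted R-Squared of about .011. The sketch below uses the standard OLS identities R^2 = SS_reg / SS_tot, adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1), F = (SS_reg / k) / (SS_res / (n - k - 1)) and standard error = sqrt(MS_res). The function name is hypothetical, and the p-value (Significance F) is omitted because it needs an F-distribution CDF, which the Python standard library does not provide.

```python
import math


def regression_summary(ss_reg, ss_res, n, k):
    """Derive R^2, adjusted R^2, F, standard error and multiple R from ANOVA sums of squares.

    ss_reg: regression (explained) sum of squares
    ss_res: residual sum of squares
    n: number of observations; k: number of regressors (excluding the intercept)
    """
    ss_tot = ss_reg + ss_res
    r2 = ss_reg / ss_tot
    df_res = n - k - 1
    ms_reg = ss_reg / k
    ms_res = ss_res / df_res
    return {
        "multiple_r": math.sqrt(r2),
        "r2": r2,
        "adj_r2": 1.0 - (1.0 - r2) * (n - 1) / df_res,
        "std_error": math.sqrt(ms_res),
        "F": ms_reg / ms_res,
    }


if __name__ == "__main__":
    # ANOVA values copied from the Figure 16 output: 719 observations, 16 lagged regressors.
    stats = regression_summary(ss_reg=23.98171227, ss_res=699.2002223, n=719, k=16)
    for name, value in stats.items():
        print(f"{name:>10}: {value:.6f}")
```

Run on the Figure 16 values, this reproduces the reported Multiple R, R Square, Adjusted R Square, Standard Error and F to about six decimal places.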