Figure 1 SPY vs. Bullish and Bearish Sentiment (weekend sentiment
data removed)
Figure 2 SPY vs. Bullish and Bearish Volume (weekend sentiment
data removed)
Data Challenges
Upon visual inspection of the original PsychSignal data
sets shown below in Figure 3, it appeared as if there
were 3 distinct trends in the data that seemed to be the
result of some systematic process associated with the
way the data was collected.
Figure 3 Original PsychSignal Dataset revealing 3 systematic trends
in the data prior to 2012 and after 2015
You can see that prior to 2012 there is consistently
extreme variance in the dataset. Additionally, there
appears to be a systematic jump in both series
subsequent to 2015. Neither phenomenon coincided
with a similar change in the observed SPY data. Therefore, it
was assumed to be the result of a systematic process
associated with the data collection process performed
by PsychSignal. For that reason, analysis was confined
to a 3 year period between 2012 and 2015. There are
over 700 data points in this time period, which should be
sufficient for this analysis.
Secondly, the original PsychSignal data included
sentiment data collected during the weekend. A
zoomed in inspection of the 10 day correlation between
the SPY and the PsychSignal indicators, shown below, reveals a
consistent seasonal decline in correlation with the SPY
that coincides with the weekend sentiment data.
Figure 4. Rolling 10 day correlation between SPY and Bullish/Bearish
indicators. Cyclical pattern coincides with weekend sentiment
measurement.
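A rolling correlation of this kind can be sketched as follows. This is a minimal illustration, not the code used for Figure 4: the function names and the sample values are hypothetical, and only the 10 day window comes from the text.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when a window is constant
    return sxy / sqrt(sxx * syy)

def rolling_correlation(x, y, window=10):
    """Correlation over each trailing window; None until the window fills."""
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    out = [None] * min(window - 1, len(x))
    for end in range(window, len(x) + 1):
        out.append(pearson(x[end - window:end], y[end - window:end]))
    return out

# Hypothetical daily values, for illustration only.
spy = [0.3, -1.2, 0.8, 0.1, -0.5, 1.4, -0.9, 0.2, 0.6, -0.7, 1.1, -0.4]
bullish = [0.2, -0.8, 0.5, 0.3, -0.1, 0.9, -1.0, 0.0, 0.4, -0.2, 0.7, -0.6]
print(rolling_correlation(spy, bullish, window=10))
```

Plotting the resulting series against the calendar is what would expose a weekly dip such as the one described above.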
A choice was made to remove the weekend sentiment
data based on the assumption that market participants
discussing the markets during the weekend were likely
ruminating over carried trading positions which they
were unsure of, potentially resulting in bias.
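The two filtering decisions described above (confining the analysis to the 2012 to 2015 window and dropping weekend observations) could be applied as in the sketch below. The function name, the sample rows, and the exact boundary dates are assumptions made for illustration; the text does not state the precise start and end dates.

```python
from datetime import date

def keep_weekdays_in_window(rows, start, end):
    """Keep (day, value) pairs dated Monday-Friday within [start, end]."""
    return [(day, value) for day, value in rows
            if start <= day <= end and day.weekday() < 5]

# Hypothetical observations; the window boundaries are an assumption.
rows = [
    (date(2011, 6, 1), 0.9),   # before the window: dropped
    (date(2014, 3, 7), 0.2),   # Friday: kept
    (date(2014, 3, 8), -0.4),  # Saturday: dropped
    (date(2014, 3, 9), 0.1),   # Sunday: dropped
    (date(2014, 3, 10), 0.3),  # Monday: kept
]
kept = keep_weekdays_in_window(rows, date(2012, 1, 1), date(2015, 12, 31))
print(kept)
```

Note that a weekday filter alone does not remove market holidays; aligning the sentiment dates to the SPY trading calendar would handle those as well.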
Data Transformations
A visual inspection of the data reveals clear trends in
the mean and variance of each series. Each series was
differenced to induce stationarity and then
standardized to z scores centered around the mean to
minimize the impact of the different scales of
measurement. The resulting distributions are shown
below.
Figure 5. Distribution of data after differencing and standardization
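The transformation described above can be sketched in a few lines. This is an illustration under assumptions: first-order differencing is assumed, the population standard deviation is used for the z scores (the text does not say which estimator was used), and the sample values are hypothetical.

```python
from statistics import mean, pstdev

def difference(series):
    """First difference: change from each observation to the next."""
    return [b - a for a, b in zip(series, series[1:])]

def zscore(series):
    """Center on the mean and scale by the (population) standard deviation."""
    mu = mean(series)
    sigma = pstdev(series)
    return [(v - mu) / sigma for v in series]

# Hypothetical closing prices, for illustration only.
closes = [200.0, 201.5, 199.8, 202.3, 203.1, 202.0]
standardized_changes = zscore(difference(closes))
print(standardized_changes)
```

After this step every series has mean 0 and standard deviation 1, so a one-unit move means "one standard deviation of daily change" for prices and sentiment alike.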
It is necessary to point out that both daily and
annualized returns in the S&P 500 are leptokurtic, with a
large concentration of values centered at the mean and
larger tails than would be expected in a normal
distribution. The SPY data used for this analysis is
consistent with this historical tendency as shown below.
Figure 6. Leptokurtic Distribution of Daily SPY Returns between 2012
and 2015
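The shape described here can be quantified with excess kurtosis, which is 0 for a normal distribution, positive when the tails are heavier than normal, and negative when they are lighter. The sketch below uses the simple moment-based estimator; the function name and both sample sets are hypothetical and are not the SPY data.

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2**2 - 3."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m4 = sum((v - mu) ** 4 for v in values) / n
    return m4 / (m2 ** 2) - 3.0

# Mostly small moves with a few extreme ones: heavy tails.
heavy_tailed = [0.0] * 98 + [10.0, -10.0]
# All mass at two points, no tails at all: light tails.
light_tailed = [-1.0, 1.0] * 50

print(excess_kurtosis(heavy_tailed))
print(excess_kurtosis(light_tailed))
```

Applied to the differenced and standardized SPY series, a clearly positive value would confirm the peaked, heavy-tailed shape visible in Figure 6.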
It appears as if this phenomenon is consistent in the
PsychSignal data as well. The figure below shows an
overlay of all of the distributions used in this analysis.
Figure 7. Overlay of data distributions
One of the assumptions that will be made throughout
the following analysis is that violating the assumption of
normality in these data will have a negligible impact on
results due to the similarity in shape of each dataset’s
distribution.
Additionally, it will be assumed that the data provided
by PsychSignal is an accurate reflection of the true
sentiment expressed by market participants regardless
of the potential impact on market prices. In other
words, the data reflects the true feelings expressed by
traders whether or not those feelings can predict
market movement.
Figure 8. A visual inspection of the chart above reveals a consistent
visual relationship between the PsychSignal index and survey based
data collected in the weekly NAAIM and AAII Sentiment Surveys
All other assumptions regarding modeling will be
discussed in later sections.
Contemporaneous Analysis
After the necessary data cleaning and transformation,
analysis was performed to determine if there were any
contemporaneous relationships evident in the data. For
clarification, contemporaneous in this context refers to
the comparison of the standardized daily net change
(closing price to closing price) in the SPY to the
standardized daily change in sentiment indicators as
reported by PsychSignal. Contemporaneous does not
refer to intraday data in this context.
Figure 9 is a Correlation Matrix useful for visualizing the
degree of correlation between variables. Blue values
indicate positive correlations and red values indicate
negative correlations. The size of the dot also reflects
the degree of correlation. Most of what is seen in this
graphic is in line with what one would expect with this
data, i.e., Bullish indicators are positively related to the
SPY and Bearish indicators are negatively correlated.
However, the Bull Volume series reveals a slight
negative correlation to SPY, suggesting that as prices
increase, the volume of Positive Sentiment decreases
slightly. Additionally, Bull Volume and Bear Volume
exhibit a strong positive correlation rather than a
negative correlation.
Figure 9. Contemporaneous Correlation Matrix using Pearson
Method. Scale is to the right. Both size and color of circle reflect
degree of correlation.
Paired sample t-tests for correlation were performed to
determine if these correlations were significant. The
correlation table in this section shows that we must reject the null
hypothesis that r=0 and accept the alternate hypothesis
that there is a non-zero correlation between these
variables. All four correlations are highly
significant, with extremely low p-values.
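As a worked check, the test statistic and confidence interval for a correlation can be recomputed from r and the sample size alone. The sketch below assumes the standard t statistic for a Pearson correlation and a Fisher z-transform interval; the text does not state how the reported intervals were produced, although this approach matches the Bullish row of the correlation table (r = 0.4095, 717 degrees of freedom, so n = 719). Function names are hypothetical.

```python
from math import atanh, sqrt, tanh

def correlation_t_stat(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)

def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% confidence interval via the Fisher z-transform."""
    z = atanh(r)
    half_width = z_crit / sqrt(n - 3)
    return tanh(z - half_width), tanh(z + half_width)

r, n = 0.4095, 719  # Bullish vs. SPY, from the correlation table
t_stat = correlation_t_stat(r, n)
low, high = fisher_ci(r, n)
print(round(t_stat, 2), round(low, 4), round(high, 4))
```

A t statistic of roughly 12 on 717 degrees of freedom is far beyond any conventional critical value, which is consistent with the very small p-values reported.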
The scatterplot matrix in Figure 10 is further helpful in
visualizing the relationships between the data. The red
line is a Loess Regression Smoothing line used to better
visualize the nature of the relationships in each plot.
The color coded rectangles are used to
designate specific plots of interest. The blue rectangle
highlights a very strong positive correlation between
Bear Volume and Bull Volume. The green box highlights
the relationship between the Bullish and Bearish
intensity based indicators and the SPY, revealing a
moderate positive and negative correlation respectively,
as would be expected. The two red boxes both show
the relationship between SPY and the PsychSignal
Volume indicators. Note the “V” or curvilinear
relationship between Sentiment Volume and S&P Price.
Figures 11 and 12 below are helpful in visualizing this
relationship in more depth.
Figure 11. Density Plot comparing SPY on the X-axis to Bull Volume
on the Y-axis. Notice V shaped pattern
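A V-shaped relationship is one that a linear correlation coefficient understates, because the positive and negative arms cancel. The sketch below shows the effect on an idealized, hypothetical example in which volume depends only on the size of the return; correlating volume with the absolute return recovers the relationship.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Idealized V: volume rises with the magnitude of the return,
# regardless of its sign (hypothetical values).
returns = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
volume = [abs(r) for r in returns]

linear = pearson(returns, volume)                          # arms cancel
magnitude = pearson([abs(r) for r in returns], volume)     # V unfolded
print(linear, magnitude)
```

This suggests that the modest linear correlations reported for the Volume indicators should be read alongside the density plots rather than on their own.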
Figure 13 provides perhaps the
best snapshot of the entire contemporaneous dataset.
It stratifies the SPY data into 4 equal quadrants (quartiles, based
on frequency) and then displays the resulting
distribution for all the other datasets that correspond to
the data in each quadrant.
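The stratification behind Figure 13 can be sketched as follows: rank the SPY observations, cut them into four equal-count bins, and summarize another series within each bin. The function names and sample values are hypothetical, and the figure itself shows full distributions per bin rather than the means computed here.

```python
def quartile_labels(values):
    """Assign each value a bin 0-3 so the bins hold (near-)equal counts."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * 4 // n
    return labels

def mean_by_bin(labels, series):
    """Mean of `series` within each of the four bins (None if a bin is empty)."""
    sums, counts = [0.0] * 4, [0] * 4
    for label, value in zip(labels, series):
        sums[label] += value
        counts[label] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]

# Hypothetical standardized values, for illustration only.
spy = [0.5, -2.0, 1.5, -0.5, 2.5, -1.5, 0.1, -0.1]
bear_volume = [1.0, 8.0, 3.0, 2.0, 4.0, 6.0, 1.0, 1.0]
labels = quartile_labels(spy)
print(mean_by_bin(labels, bear_volume))
```

In this made-up sample the lowest SPY bin has the highest average Bear Volume, which is the kind of asymmetry the figure is meant to reveal.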
Series        Correlation with SPY   Two-Tailed P-Value   Degrees of Freedom   95% Confidence Interval
Bullish        0.4095                2.20E-16             717                   0.3467 < R <  0.4686
Bearish       -0.3818                2.20E-16             717                  -0.4425 < R < -0.3175
Bull Volume   -0.1636                0.00001044           717                  -0.2339 < R < -0.0915
Bear Volume   -0.4405                2.20E-16             717                  -0.4976 < R < -0.3796
Figure 10. ScatterPlot Matrix of Contemporaneous
Pairwise Relationships. Colored rectangles
highlight 3 primary areas of interest
Figure 12 Density Plot comparing SPY (X axis) to Bear Volume (Y
axis). Note that while the Bullish Volume tends to increase for both
large negative and positive SPY returns, Bearish Volume exhibits a
similar V shape but with far less density in the positive return tails of
the SPY
Figure 13. Marginal Distribution Plot showing the distribution of
PsychSignal data corresponding to 4 equal sized quadrants of SPY
Data. Note the consistency in the order of the horizontal shifting of
the Bullish and Bearish Sentiment Indicators. Additionally, the
Volume Indicators are largely centered at zero for the interquartile
range of the SPY but differentiate at both tails. Bullish Volume
increases uniformly while Bearish Volume increases more for the
negative return tail.
This is reflected in the data table below, which displays
the descriptive statistics for Bullish and Bearish Volume
on positive return days and negative return days,
revealing that the mean of Bull Volume on Up days and
Down days is not significantly different from zero while
the mean of Bear Volume is.
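A comparison of this kind could be set up as in the sketch below, which splits a volume series by the sign of the SPY return and computes a one-sample t statistic for each group mean against zero. The choice of a one-sample t-test is an assumption, since the text does not name the test used, and the function names and sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def split_by_return_sign(returns, series):
    """Values of `series` on up days (return > 0) and down days (return < 0)."""
    up = [s for r, s in zip(returns, series) if r > 0]
    down = [s for r, s in zip(returns, series) if r < 0]
    return up, down

def one_sample_t(values):
    """t statistic for H0: mean = 0, with len(values) - 1 degrees of freedom."""
    return mean(values) / (stdev(values) / sqrt(len(values)))

# Hypothetical standardized values, for illustration only.
spy = [0.4, -0.6, 1.1, -1.3, 0.2, -0.8, 0.9, -0.1]
bear_volume = [-0.3, 0.7, -0.5, 1.2, -0.1, 0.9, -0.4, 0.2]
up, down = split_by_return_sign(spy, bear_volume)
print(one_sample_t(up), one_sample_t(down))
```

Each statistic would then be compared against the t distribution with the appropriate degrees of freedom to judge whether the group mean differs from zero.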
Summary of Key Findings of Contemporaneous
Analysis
1. All four sentiment based indicators exhibit
statistically significant non-zero correlations to
Daily SPY Returns.
2. Bear Volume and Bull Volume indicators are
positively correlated, contrary to expectation.
3. Bear Volume and Bull Volume indicators are
stagnant for days when SPY volatility is low and
both increase during high volatility days
regardless of the direction of the volatility.
4. Bullish Volume increases more uniformly for
both positive and negative return days while
Bear Volume increases much more for negative
days than it does for positive days.
Predictive Analysis
The second stage of my analysis was focused on
exploring whether the sentiment based data provided
by PsychSignal could be used to forecast next day
returns of the SPY.
Lags of (t-1, t-2, t-3, and t-4) were used to predict the
value of the target at time t in both analyses. A brief
visual inspection of the data, shown in Figure 14 and
Figure 15, raised immediate doubt regarding the
predictive power of the indicators.
Figure 14 Correlation Matrix of all lagged series. Notice there is
almost no correlation present between the SPY and any of the lagged
sentiment indicators.
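Building the lagged predictors amounts to pairing each SPY value at time t with the indicator values from t-1 through t-4. The sketch below is one way to do this; the function name and sample values are hypothetical, and the column naming simply follows the labels that appear in the regression output (for example Lag1_Bullish).

```python
def lagged_rows(target, indicators, lags=(1, 2, 3, 4)):
    """Pair target[t] with each indicator's values at t - k for k in lags.

    Rows without a full history (t < max lag) are dropped.
    """
    max_lag = max(lags)
    rows = []
    for t in range(max_lag, len(target)):
        features = {
            f"Lag{k}_{name}": values[t - k]
            for k in lags
            for name, values in indicators.items()
        }
        rows.append((target[t], features))
    return rows

# Hypothetical standardized values, for illustration only.
spy = [0.1, -0.4, 0.7, 0.2, -0.9, 0.5]
indicators = {"Bullish": [0.3, -0.2, 0.6, 0.0, -0.5, 0.4]}
for y, x in lagged_rows(spy, indicators):
    print(y, x)
```

Because only past values enter each row, any fit obtained from such a table is a genuine forecast of the return at time t rather than a description of it.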
Figure 15. Marginal Distribution Plot of PsychSignal indicators at Lag
1 conditioned on the following day’s returns in the SPY. Notice the
lack of differentiation in the distributions compared to the
contemporaneous distributions in Figure 13
A multivariate OLS model was then applied to the
dataset using all four sentiment indicators at each lag
individually and then with all 4 lags at the same time.
The following assumptions were made:
There is a linear relationship between variables.
Multivariate normality.
Little multicollinearity.
No auto-correlation.
Homoscedasticity.
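Two of these assumptions can be checked with simple diagnostics. The sketch below is illustrative only and does not reflect checks reported in the text: it computes the Durbin-Watson statistic on a set of residuals (values near 2 suggest no first-order autocorrelation, values toward 0 or 4 suggest positive or negative autocorrelation) and the variance inflation factor implied by the correlation between two predictors. The function names and inputs are hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    divided by the sum of squared residuals."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e * e for e in residuals)
    return num / den

def vif_for_pair(r):
    """Variance inflation factor when a predictor's only collinearity
    is a correlation r with one other predictor: 1 / (1 - r^2)."""
    return 1.0 / (1.0 - r * r)

# Hypothetical residuals and predictor correlation, for illustration only.
residuals = [0.2, -0.1, 0.3, -0.4, 0.1, 0.0, -0.2, 0.3]
print(durbin_watson(residuals))
print(vif_for_pair(0.9))
```

The multicollinearity check is particularly relevant here because Bull Volume and Bear Volume were found to be strongly positively correlated, which inflates the standard errors of their coefficients.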
Figure 16. Results of the OLS Multivariate Regression Model
Only two lagged series, Lag1_Bearish and Lag4_Bearish,
were found to be significant, and the model performed
poorly overall with an R-Squared of .03.
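The weakness of the fit can be confirmed from the summary figures in the regression output. With R Square = 0.033161382, 719 observations, and 16 predictors, the standard formulas reproduce the reported Adjusted R Square and F statistic. The function names below are hypothetical; the inputs are taken from the regression output.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def overall_f(r2, n, k):
    """F statistic for H0: all k slope coefficients are zero."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

r2, n, k = 0.033161382, 719, 16  # from the regression output
print(adjusted_r_squared(r2, n, k))
print(overall_f(r2, n, k))
```

After adjusting for 16 predictors, the model explains only about 1% of the variance, and the overall F test (Significance F of about 0.09 in the output) does not reject the hypothesis that all coefficients are zero at the 5% level.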
Additional regression models were then applied
iteratively, consisting of only 1 lag at a time. In addition
to the lagged models, two models were built using
forward lags, i.e., I ran a regression backwards in time
using the sentiment data at forward lags (t+1) and (t+2)
to predict SPY at time t. The reasoning for this was that
if the sentiment data could not predict SPY forward in
time but could predict backwards in time, it would
suggest that the sentiment data is reactionary to price
movement rather than anticipatory of it. The R-Squared
values of these models are shown below.
Figure 17. R-Squared of Lagged, Contemporaneous, and Reverse
Lagged Models.
As you can see, all lagged models produce R-Squared
values of nearly 0 while the contemporaneous models
and reverse lagged models exhibit larger and linearly
descending R-Squared values, indicating that the
sentiment reflected in the data is reactionary rather
than anticipatory.
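The logic of the lagged versus reverse lagged comparison can be sketched with a deliberately simplified, single-predictor version, in which the R-Squared of a one-variable OLS fit equals the squared Pearson correlation. The models in the text use all four indicators, so this is an illustration of the idea rather than a reproduction; the function names and the purely reactive sample series are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def r_squared_at_shift(spy, sentiment, shift):
    """R^2 of a one-predictor OLS of spy[t] on sentiment[t - shift].

    shift > 0: sentiment precedes the return (lagged, predictive).
    shift = 0: contemporaneous.
    shift < 0: sentiment follows the return (reverse lagged).
    """
    pairs = [(sentiment[t - shift], spy[t])
             for t in range(len(spy)) if 0 <= t - shift < len(sentiment)]
    xs, ys = zip(*pairs)
    return pearson(xs, ys) ** 2

# Hypothetical series: sentiment simply echoes yesterday's return,
# i.e. it is purely reactive.
spy = [0.3, -1.2, 0.8, 0.1, -0.5, 1.4, -0.9, 0.2, 0.6, -0.7, 1.1, -0.4]
sentiment = [0.0] + spy[:-1]

for shift in (2, 1, 0, -1, -2):
    print(shift, round(r_squared_at_shift(spy, sentiment, shift), 3))
```

For a reactive series like this one, the fit is perfect only when sentiment from the day after is used to explain the return, which mirrors the pattern the R-Squared comparison is designed to detect.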
Summary of Key Findings
1. All four sentiment based indicators exhibit
statistically significant non-zero
contemporaneous correlations to Daily SPY
Returns.
2. Bear Volume and Bull Volume indicators are
positively correlated, contrary to expectation.
3. Bear Volume and Bull Volume indicators are
stagnant for days when SPY volatility is low and
both increase during high volatility days
regardless of the direction of the volatility.
4. Bullish Volume increases more uniformly for
both positive and negative return days while
Bear Volume increases much more for negative
return days than it does for positive days.
5. Correlation between SPY data and Sentiment
Data drops off significantly in lags 1-5.
SUMMARY OUTPUT (Standardized)

Regression Statistics
Multiple R          0.182102668
R Square            0.033161382
Adjusted R Square   0.011125174
Standard Error      0.998003864
Observations        719

ANOVA
             df    SS            MS         F          Significance F
Regression   16    23.98171227   1.498857   1.504859   0.091512499
Residual     702   699.2002223   0.996012
Total        718   723.1819345

                  Coefficients   Standard Error   t Stat     P-value    Lower 95%      Upper 95%
Intercept         -0.001966444   0.037222494      -0.05283   0.957883   -0.075047192    0.071114
Lag1_Bullish       0.020831277   0.056744276       0.367108  0.713649   -0.090577543    0.13224
Lag1_Bearish      -0.1554315     0.068882939      -2.25646   0.024349   -0.290672751   -0.02019
Lag1_BullVolum    -0.064300473   0.10332598       -0.62231   0.533942   -0.267165435    0.138564
Lag1_BearVolume    0.12832188    0.121951763       1.052235  0.293054   -0.111111996    0.367756
Lag2_Bullish      -0.04651744    0.064697438      -0.719     0.47238    -0.173541091    0.080506
Lag2_Bearish      -0.12377006    0.076165343      -1.62502   0.104608   -0.273309211    0.025769
Lag2_BullVolum     0.05869661    0.120948614       0.485302  0.627614   -0.178767734    0.296161
Lag2_BearVolume   -0.045350618   0.145316264      -0.31208   0.755071   -0.330657163    0.239956
Lag3_Bullish      -0.072203302   0.064672649      -1.11644   0.264615   -0.199178284    0.054772
Lag3_Bearish      -0.143175415   0.075859529      -1.88738   0.059522   -0.292114147    0.005763
Lag3_BullVolum     0.013903224   0.121460004       0.114468  0.9089     -0.224565155    0.252372
Lag3_BearVolume    0.004262675   0.1460306         0.02919   0.976721   -0.28244636     0.290972
Lag4_Bullish      -0.089134713   0.056103892      -1.58874   0.112568   -0.199286235    0.021017
Lag4_Bearish      -0.226359518   0.067750585      -3.34107   0.000879   -0.359377564   -0.09334
Lag4_BullVolum     0.028900516   0.103969722       0.277971  0.781117   -0.175228335    0.233029
Lag4_BearVolume    0.08721789    0.12336881        0.706969  0.479821   -0.154998142    0.329434
6. OLS Multivariate Models were ineffective in
predicting future returns in the SPY yet were
marginally effective when run backwards,
suggesting that sentiment data is reactive
rather than anticipatory.
7. Based on contemporaneous analysis, there is
evidence to suggest that market agents exhibit
stronger tendencies to react to negative market
movement than positive market movement.
Ongoing Research
1. Applying more sophisticated data mining
techniques to analyze the data
2. Exploration of intraday data
3. Exploring directional prediction models, i.e.,
“Up” or “Down” days
4. Development of a heuristic based trading
indicator