A Dream of Predicting Elections and Trading Stocks using Twitter - Yelena Mejova, Qatar Computing Research Institute

A Dream of Predicting Elections
and Trading Stocks using Twitter
@yelenamm Yelena Mejova
Yet Another Conference
Moscow Nov 30 2014

Money and Power
Financial Indexes
• Movie box office sales
• Consumer confidence
• Dow Jones Industrial Average
• Individual stocks
Political Opinion
• Political leaning
• Polarization
• User classification
• Predicting elections!

More…
CIKM 2013 Tutorial
TWITTER AND THE REAL WORLD
with Ingmar Weber
https://sites.google.com/site/twitterandtherealworld/home
Finance, Politics, Public Health, Event Detection

Can I get rich on the stock market?

Answer: NO
• Efficient Market Hypothesis:
– Financial markets are information efficient:
prices fully reflect all available information
– Cannot be predicted
JUST AS WELL

Answer: NO MAYBE?
A non-random walk down Wall Street
(1999) Lo & MacKinlay
• Behavioral Economics:
overconfidence, overreaction,
information bias…
• Insider trading, governmental
manipulation…
• Speculative bubbles: information be
damned!
• Bitcoin: where is the value?
– pure bubble

1. content providers
http://gnip.com/
2. specialized providers
http://nymag.com/daily/intelligencer/2013/04/bloombergs-vip-terminal-tweeters.html
3. data analytics
http://dataminr.com/
4. traders
Self-reported Gains http://www.caymanatlantic.com/

Movies
Predicting the Future with Social Media
@sitaramasur Asur, Huberman @ WI-IAT 2010
• 2.89 million tweets
• 24 movies
• Correlation (tweet rate & box office gross) = 0.90
• Least squares linear regression:
– using previous week’s tweets to predict weekend box office gross: Adj R2 = 0.973
– …and sentiment (positive/negative) score to predict second weekend box office gross: Adj R2 = 0.94
– using previous week’s Hollywood Stock Exchange (HSX) scores to predict weekend box office gross: Adj R2 = 0.967
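
The regression above can be sketched in a few lines. This is a minimal illustration of a one-predictor least squares fit and its adjusted R2, not the authors' code; the function names and the numbers in the demo are made up for illustration.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def adjusted_r2(x, y, a, b, n_predictors=1):
    """Adjusted R^2 of the fitted line, penalised for the number of predictors."""
    n = len(y)
    mean_y = sum(y) / n
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)


if __name__ == "__main__":
    # Made-up data: tweets per hour in the week before release vs. weekend gross.
    tweet_rate = [1.0, 2.0, 3.0, 4.0, 5.0]
    gross = [2.1, 3.9, 6.2, 7.8, 10.1]
    a, b = fit_line(tweet_rate, gross)
    print("intercept", a, "slope", b, "adj R2", adjusted_r2(tweet_rate, gross, a, b))
```

With only 24 movies, the adjustment for the number of predictors matters more than it would on a large sample.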

Consumer Confidence
From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series
@brendan642 O’Connor, Balasubramanyan, Routledge, Smith @ ICWSM (2011)
• Index of Consumer Sentiment (ICS) (Reuters/UMich)
• Economic Confidence Index (ECI) (Gallup)
• Subjectivity Lexicon: Opinion Finder
• High day-to-day volatility.
• Average last k days.
• Keyword “jobs”
k = 1, 7, 30
• @ k=15 correlates with
ECI (Gallup) at r = 0.731
[some figures from authors’ original slides]
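
The "average last k days" step can be sketched as a trailing moving average over a daily sentiment ratio. This is an illustrative sketch only: the function names are hypothetical, the ratio is assumed to be positive over negative message counts for the keyword, and every day is assumed to have at least one negative message.

```python
def sentiment_ratio(pos_counts, neg_counts):
    """Daily ratio of positive to negative messages mentioning the keyword.

    Assumes every day has at least one negative message (no zero division guard).
    """
    return [p / n for p, n in zip(pos_counts, neg_counts)]


def smooth_last_k(series, k):
    """Average each day with up to k-1 preceding days (trailing window)."""
    smoothed = []
    for t in range(len(series)):
        window = series[max(0, t - k + 1): t + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


if __name__ == "__main__":
    # Made-up daily counts of positive and negative "jobs" messages.
    pos = [20, 35, 18, 40, 22, 30, 28]
    neg = [10, 12, 15, 11, 14, 10, 13]
    raw = sentiment_ratio(pos, neg)
    for k in (1, 3, 7):
        print(k, [round(v, 2) for v in smooth_last_k(raw, k)])
```

A larger k trades day-to-day volatility for responsiveness, which is why the choice of window matters when comparing against a slowly moving poll series.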

Consumer Confidence
From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series
@brendan642 O’Connor, Balasubramanyan, Routledge, Smith @ ICWSM (2011)
• Predicting 1 month in
the future using
previous 15 days
• Correlation with
Gallup poll:
– Twitter model: 77.5%
– Poll model: 80.4%
• As Twitter grows, so does
its accuracy

Twitter mood predicts the stock market
@jlbollen Bollen, Mao, Zeng @ Journal of Computational Science (2011)
Twitter 2008 (~10M tweets)
DJIA
• Opinion Finder: positive / negative
• GPOMS: calm, alert, sure, vital, kind and happy
888 citations!
Slight correlation only
with Calm GPOMS
mood (0.065 at 6 day
lag)
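
A lagged correlation of this kind can be sketched as follows. It is a plain Pearson correlation between a mood series on day t and an index series on day t + lag, written for illustration; it is not the analysis from the paper, and the series in the demo are made up.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(mood, index, lag):
    """Correlate mood on day t with the index on day t + lag (lag >= 0)."""
    if lag > 0:
        mood, index = mood[:-lag], index[lag:]
    return pearson(mood, index)


if __name__ == "__main__":
    # Made-up series: the index loosely follows the mood two days later.
    mood = [0.2, 0.5, 0.1, 0.7, 0.4, 0.9, 0.3, 0.6]
    index = [10.0, 10.1, 10.2, 10.5, 10.1, 10.7, 10.4, 10.9]
    for lag in range(4):
        print(lag, round(lagged_correlation(mood, index, lag), 3))
```

Trying many lags and many mood dimensions inflates the chance of a spurious hit, so any single best lag deserves a held-out check.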

Stocks
Tweets and trades: The information content of stock microblogs
@timmsprenger Sprenger, Tumasjan, Sandner, Welpe
@ European Financial Management (2013)
• Tracking stocks $STOCK

Stocks
Tweets and trades: The information content of stock microblogs
@timmsprenger Sprenger, Tumasjan, Sandner, Welpe
@ European Financial Management (2013)
• Tweets: Jan 1 – Jun 30, 2010
• S&P100 companies using $STOCK (price change & volume)
• Naïve Bayes classifier trained on 2,500 tweets
(buy/sell/hold): 81.2% accuracy
[Diagram: links among BULLISH, STOCK RETURNS, VOLUME and TRADING VOLUME; coefficients shown: -0.022 (p<0.05), 0.091 (p<0.001), 0.073 (p<0.001), 0.312 (p<0.001)]
• 1.5% of users posted 53.7% of all messages
– Their quality is not much better!
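
A buy/sell/hold classifier of this kind can be sketched as a multinomial Naïve Bayes model with add-one smoothing. This is a toy illustration, not the authors' classifier: the whitespace tokenizer, the class name and the three training messages are assumptions made up for the example.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens, with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # messages per class
        self.word_counts = defaultdict(Counter)  # class -> token counts
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = text.lower().split()
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            for tok in tokens:
                count = self.word_counts[label][tok] + 1  # add-one smoothing
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    # Made-up training messages; a real training set would be hand-labeled tweets.
    texts = [
        "buying $XYZ breakout looks strong",
        "sell $XYZ before earnings weak",
        "holding $XYZ no change",
    ]
    labels = ["buy", "sell", "hold"]
    model = NaiveBayes().fit(texts, labels)
    print(model.predict("strong breakout"))
```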

Stocks
Correlating financial time series with micro-blogging activity
Ruiz, Hristidis, Castillo, Gionis, Jaimes @ WSDM (2012)
• Twitter: Jan 1 – Jun 30, 2010
• 150 (randomly selected) companies in S&P 500
– Daily relative price change
– Traded volume normalized by mean traded volume for that company for entire time period
• Represent tweets as a GRAPH
– constrain graph to a company and a time window
– + similarity nodes connecting very similar tweets (RTs) using Jaccard distance
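
One graph feature, the number of connected components, can be sketched on a simplified graph that has only tweet nodes and links tweets whose token sets are very similar. This is a hedged illustration: the paper's graph is richer, the threshold is an arbitrary assumption, and Jaccard similarity is used here (distance is 1 minus similarity).

```python
def jaccard(a, b):
    """Jaccard similarity of two token collections (1.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def count_components(tweets, threshold=0.8):
    """Link tweets with similarity >= threshold; return the number of components."""
    tokens = [set(t.lower().split()) for t in tweets]
    parent = list(range(len(tokens)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens)):
            if jaccard(tokens[i], tokens[j]) >= threshold:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(tokens))})


if __name__ == "__main__":
    # Made-up tweets for one company and one time window.
    window = [
        "$XYZ up big today",
        "RT $XYZ up big today",
        "earnings call tomorrow",
    ]
    print(count_components(window))
```

Computed per company and per time window, a count like this becomes one daily time series that can be compared with price change or traded volume.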

Trading Simulation

Correlating financial time series with micro-blogging activity
Ruiz, Hristidis, Castillo, Gionis, Jaimes @ WSDM (2012)
• The proposed strategy is the only one that obtains a profit, during a period in which the Dow Jones fell 5.8%
• Best performance for vector auto-regression with the number of connected components

Don’t fire your stock broker yet
High-Speed Trading No Longer Hurtling Forward
http://www.nytimes.com/interactive/2012/10/15/business/Declining-US-High-Frequency-Trading.html?ref=business
Computer Flaws Get Wry Smile From Humans Displaced
http://dealbook.nytimes.com/2013/09/19/computer-flaws-get-wry-smile-from-humans-displaced/?ref=highfrequencyalgorithmictrading
How a Trading Algorithm Went Awry
http://online.wsj.com/article/SB10001424052748704029304575526390131916792.html

Can we track &
predict political
sentiment?

Elections
“the crowning of the Internet as the king of all political
media”
“the beginning of the Internet presidency”
- on Obama's 2008 victory
Mitch Wagner, InformationWeek
Transparency
“Instantaneous tweeting of shady government
practices -- and the resulting uproar -- means that
public bodies are more responsive than ever”.
- Wesley Donehue, CNN
Mobilization
“This exercise of power has produced a template for
political action on a massive scale fueled by social
media.”
- on PIPA and SOPA
Vivek Wadhwa, Washington Post
bloggeruniversity.wordpress.com

US politics
• Focus of most of the research presented here
• Clear left/right distinction
• Popular political figures
• High(ish) Twitter engagement
[Figure labels: REPUBLICAN (right), DEMOCRAT (left)]

let’s talk politics
• Sampling Twitter for political speech
– general keywords: #current
– event keywords: #debate08, #tweetdebate
– people: obama, romney, merkel
– parties: democrat, republican, pirate
– accounts: wefollow, twellow
– news stories, known URL retweets
• Caveats
– requires expert knowledge
– known best after the event
– selection bias (who do you want to ignore?)

political leaning classification
1. Text (text classification)
2. Network (label propagation)

Predicting the political alignment of twitter users
@vagabondjack Conover, Gonçalves, Ratkiewicz, Flammini, Menczer @ SocialCom (2011)
• Bootstrapped hashtag-based sample of political discussion
• Gardenhose Sep 14 - Nov 4, 2010
• Classes: right, left, ambiguous
TEXT-BASED
• remove stopwords, hashtags, mentions, urls, all words occurring
once in the corpus
• TFIDF weighting:
HASHTAG-BASED
• remove hashtags used by only one user
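
The TFIDF formula from the slide did not survive extraction, so the sketch below uses one common variant (term frequency times log inverse document frequency) as an assumption; the paper may use a different weighting. Each "document" here is taken to be all the text of one user, and the function name is hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {token: weight} dict per document.

    Assumed variant: (count / doc length) * log(n_docs / doc frequency).
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each token once per document
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            tok: (c / len(tokens)) * math.log(n_docs / doc_freq[tok])
            for tok, c in counts.items()
        })
    return vectors


if __name__ == "__main__":
    # Made-up per-user text, after stopword and hashtag removal.
    users = ["tax cuts now", "health care now"]
    for vec in tfidf(users):
        print(vec)
```

Tokens used by every user get weight zero under this variant, which is the intended effect: they carry no signal for telling left from right.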

NETWORK-BASED
• Label propagation
– Initialize cluster membership
arbitrarily
– Iteratively update each node’s
label according to the majority of
its neighbors
– Ties are broken randomly
• Cluster assignment by majority
cluster label (using manually
labeled data)
retweet network
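
The label propagation steps above can be sketched as follows. This is an illustrative version, not the authors' implementation: every node starts in its own cluster, a node keeps its current label when that label is already among the most frequent ones (a common convergence aid, assumed here), and remaining ties are broken randomly. The helper `name_clusters` is a hypothetical name for the final step that uses manually labeled users.

```python
import random
from collections import Counter


def label_propagation(adjacency, max_iters=100, seed=0):
    """adjacency: {node: [neighbors]}. Returns {node: cluster label}."""
    rng = random.Random(seed)
    labels = {node: node for node in adjacency}  # arbitrary initial clusters
    nodes = list(adjacency)
    for _ in range(max_iters):
        rng.shuffle(nodes)
        changed = False
        for node in nodes:
            if not adjacency[node]:
                continue
            counts = Counter(labels[n] for n in adjacency[node])
            top = max(counts.values())
            candidates = [lab for lab, c in counts.items() if c == top]
            if labels[node] in candidates:
                continue  # already carries a majority label
            labels[node] = rng.choice(candidates)  # random tie-break
            changed = True
        if not changed:
            break
    return labels


def name_clusters(labels, known):
    """Name each cluster by the majority leaning of its manually labeled nodes."""
    votes = {}
    for node, leaning in known.items():
        votes.setdefault(labels[node], Counter())[leaning] += 1
    return {cluster: c.most_common(1)[0][0] for cluster, c in votes.items()}


if __name__ == "__main__":
    # Made-up retweet network: two triangles with no edges between them.
    graph = {
        "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"],
        "d": ["e", "f"], "e": ["d", "f"], "f": ["d", "e"],
    }
    clusters = label_propagation(graph)
    names = name_clusters(clusters, {"a": "left", "d": "right"})
    print({node: names[cluster] for node, cluster in clusters.items()})
```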

• Classifier: Support Vector Machine

Political hashtag hijacking in the US
Hadgu, Garimella, Weber @ WWW (2013)
SEED-BASED (highly precise)
1. Start with few seed users of known leaning
2. The leaning of their followers is determined by which side
they retweet more
3. Propagate users’ leaning to their tweets/hashtags/etc
hashtag accuracy: 98.6%, 93%, 90% (by source)
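
The three seed-based steps can be sketched as below. This is a simplified illustration under stated assumptions: users who retweet both sides equally are skipped, hashtags take the majority leaning of the users who use them, and the function names and demo data are made up.

```python
from collections import Counter


def user_leanings(retweets, seeds):
    """retweets: [(user, retweeted_account)]; seeds: {account: 'left' | 'right'}.

    A user gets the leaning of the side whose seed accounts they retweet more.
    """
    tallies = {}
    for user, source in retweets:
        if source in seeds:
            tallies.setdefault(user, Counter())[seeds[source]] += 1
    leanings = {}
    for user, counts in tallies.items():
        if counts["left"] != counts["right"]:  # skip users with no clear side
            leanings[user] = "left" if counts["left"] > counts["right"] else "right"
    return leanings


def hashtag_leanings(tweets, leanings):
    """tweets: [(user, [hashtags])]. Majority leaning of each hashtag's users."""
    tallies = {}
    for user, tags in tweets:
        if user in leanings:
            for tag in tags:
                tallies.setdefault(tag, Counter())[leanings[user]] += 1
    return {tag: counts.most_common(1)[0][0] for tag, counts in tallies.items()}


if __name__ == "__main__":
    seeds = {"seedL": "left", "seedR": "right"}
    retweets = [
        ("u1", "seedL"), ("u1", "seedL"), ("u1", "seedR"),
        ("u2", "seedR"),
        ("u3", "seedL"), ("u3", "seedR"),
    ]
    tweets = [("u1", ["#tagA"]), ("u2", ["#tagB"]), ("u3", ["#tagA"])]
    users = user_leanings(retweets, seeds)
    print(users)
    print(hashtag_leanings(tweets, users))
```

Dropping ambiguous users trades coverage for precision, in line with the "highly precise" label on the slide.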

Visualizing media bias through Twitter
@JisunAn An, Cha, Gummadi, Crowcroft, Quercia @ AAAI (2012)
• Position news sources in leaning by considering the
overlap in common audience (followers on Twitter)
• Distance between two media: Jaccard similarity of their audience (co-subscribers)
• Correlates with ADA (Americans for Democratic Action) score:
– Spearman rank order correlation: .44
– Pearson product-moment correlation coefficient: .51

Russia, Ukraine, and the West: Social Media Sentiment in the Euromaidan Protests
@bretling Etling @ Berkman Center Research (2014)
• Nov 21, 2013 – Feb 26, 2014
• Classifier trained to identify pro- and anti-protest sentiment
• Twitter, blogs, news, forums, Facebook
[Figure panels: US & UK, Russia, Ukraine]
Does it reflect the overall
sentiment of the people?

look who’s talking
Vocal Minority versus Silent Majority: Discovering the Opinions of the Long Tail
@enimust Mustafaraj, Finn, Whitlock, Metaxas @ SocialCom (2011)
• 2010 US Senate special election in
Massachusetts
• Silent majority & vocal minority
tweet differently (different
agendas?)
• Spamming, fake grassroots
movements
number of tweets per user

Detecting and Tracking Political Abuse in Social Media
Ratkiewicz, Conover, Meiss, Goncalves, Flammini, Menczer @ ICWSM (2011)
• Truthiness is a quality characterizing a "truth" that a person making
an argument or assertion claims to know intuitively "from the gut"
or because it "feels right" without regard to evidence, logic,
intellectual examination, or facts.
Classifying memes for astroturf
Truthy project by Indiana University

[Figure: example diffusion networks, labeled LEGITIMATE or TRUTHY. Panels: #ampat, @PeaceKaren_25 & @HopeMarie_25, gopleader.gov, Chris Coons, #Truthy, @senjohnmccain, on.cnn.com/aVMu5y, “Obama said…”]
Detecting and Tracking Political Abuse in Social Media
Ratkiewicz, Conover, Meiss, Goncalves, Flammini, Menczer @ ICWSM (2011)

elections
Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment
Tumasjan, Sprenger, Sandner, Welpe @ AAAI (2010)
• 2009 German federal elections
• Sentiment profiles of leading candidates in tweets mentioning them (using LIWC2007)
“The mere number of tweets reflects voter preferences and comes close to traditional election polls”
CONTROVERSY!
638 citations!
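
The "mere number of tweets" claim amounts to comparing each party's share of mentions with its vote share. A minimal sketch, assuming mean absolute error (MAE) as the yardstick; the party names and all numbers are made up and are not the paper's data.

```python
def shares(counts):
    """Convert raw counts per party into percentage shares."""
    total = sum(counts.values())
    return {party: 100.0 * n / total for party, n in counts.items()}


def mean_absolute_error(predicted, actual):
    """Average absolute gap, in percentage points, over the parties in `actual`."""
    return sum(abs(predicted[p] - actual[p]) for p in actual) / len(actual)


if __name__ == "__main__":
    # Made-up tweet mention counts and made-up election results (percent).
    mentions = {"Party A": 5000, "Party B": 3000, "Party C": 2000}
    result = {"Party A": 45.0, "Party B": 35.0, "Party C": 20.0}
    predicted = shares(mentions)
    print(predicted)
    print(round(mean_absolute_error(predicted, result), 2))
```

Both the set of parties in `mentions` and the date range used to collect the counts change the shares, so they need to be fixed before the result is known.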

elections
Why the Pirate Party won the German election of 2009 or the trouble with predictions: A
response to Tumasjan, Sprenger, Sander, & Welpe, "Predicting elections with twitter: What
140 characters reveal about political sentiment"
@ajungherr Jungherr, Jürgens, Schoen @ SSCR V30/N2 (2012)
“show that the results of TSSW are contingent on arbitrary choices of the authors”
Choice of Parties
If results of polls played a role in deciding upon the inclusion of particular parties, the TSSW method is dependent on public opinion surveys
Choice of Dates
prediction analysis […] between [13.9] and [27.9], the day of the election, produces a MAE of 2.13, significantly higher than the MAE for TSSW

elections
• Non-US elections:
– Irish: On using twitter to monitor political sentiment and predict election
results, Bermingham, Smeaton (2011)
• “Our approach however has demonstrated an error which is not competitive with the traditional polling methods.”
– Dutch: Predicting the 2011 Dutch senate election results with twitter, Sang,
Bos (2012)
• Uses polls to correct for demographic imbalances, yet performance is still below traditional polls
– Singapore: Tweets and votes: A study of the 2011 singapore general election,
Skoric, Poor, Achananuparp, Lim, Jiang (2012)
• Not as accurate as traditional polls, performance at local government levels
– New Zealand: Can Social Media Predict Election Results? Evidence from New
Zealand, Michael P. Cameron (2013)
• “the size of the effect is small and it appears that social media presence will therefore
only make a difference in closely contested elections”
– many more coming out each day!

Limits of Electoral Predictions using Twitter
Daniel Gayo-Avello, Univ. of Oviedo (Spain)
Panagiotis T. Metaxas (@takis_metaxas), Wellesley College (USA)
Eni Mustafaraj (@enimust), Wellesley College (USA)
Metaxas et al. @ SocialCom (2011)

elections
How (Not) To Predict Elections
@takis_metaxas Metaxas et al. @ SocialCom (2011)
• A method of prediction should be an algorithm
finalized before the election
– specify data collection, cleaning, analysis, interpretation…
• Data from social media are fundamentally different
than data from natural phenomena
– people change their behavior next time around
– spammers & activists will try to take advantage
• Form a testable theory on why and when it predicts
(avoid self-deception!)
• (maybe) Learn from professional pollsters
– tweet ≠ user
– user ≠ eligible voter
– eligible voter ≠ voter
[from authors’ original slides]

What now?
Now-casting  Fore-casting
Show improvement over baseline
or that you could make money / a difference
Publish a paper: let us know!
(or go to Wall Street / a political think tank)

thank you
Yelena Mejova
@yelenamm
ymejova@qf.org.qa

day of the week market index
Fixed-effects panel regressions at 1 and 2 day lags
1. Bullishness is affected more strongly by returns than vice versa
2. Message volume predicts trading volume
3. … but high trading volume and volatility predict message volume
more
4. Agreement among traders leads to lower trading volumes
