A Dream of Predicting Elections and Trading Stocks using Twitter - Yelena Mejova, Qatar Computing Research Institute
1. A Dream of Predicting Elections
and Trading Stocks using Twitter
@yelenamm Yelena Mejova
Yet Another Conference
Moscow Nov 30 2014
3. Money and Power
Financial Indexes
• Movie box office sales
• Consumer confidence
• Dow Jones Industrial Average
• Individual stocks
Political Opinion
• Political leaning
• Polarization
• User classification
• Predicting elections!
4. More…
CIKM 2013 Tutorial
TWITTER AND THE REAL WORLD
with Ingmar Weber
https://sites.google.com/site/twitterandtherealworld/home
Finance, Politics, Public Health, Event Detection
6. Answer: NO
• Efficient Market Hypothesis:
– Financial markets are information efficient:
prices fully reflect all available information
– Cannot be predicted
JUST AS WELL
7. Answer: NO MAYBE?
A non-random walk down Wall Street
(1999) Lo & MacKinlay
• Behavioral Economics:
overconfidence, overreaction,
information bias…
• Insider trading, governmental
manipulation…
• Speculative bubbles: information be
damned!
• Bitcoin: where is the value?
– pure bubble
9. Movies
Predicting the Future with Social Media
@sitaramasur Asur, Huberman @ WI-IAT 2010
Hollywood Stock
Exchange
• 2.89 million tweets
• 24 movies
Correlation (tweet rate & box office gross) = 0.90
least squares linear regression
• using previous week’s tweets to predict weekend box office gross: Adj R2 = 0.973
• …and sentiment (positive/negative) score to predict second weekend box office gross: Adj R2 = 0.94
• using previous week’s HSX scores to predict weekend box office gross: Adj R2 = 0.967
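A minimal sketch of least squares linear regression with adjusted R2, the measure reported on this slide. The function name and the toy data are assumptions for illustration; none of the numbers come from the paper.

```python
import numpy as np

def fit_least_squares(X, y):
    """Fit y ~ b0 + X.b by ordinary least squares; return (coefficients, adjusted R^2)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape                                 # n observations, p predictors
    A = np.column_stack([np.ones(n), X])           # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)  # penalize the number of predictors
    return coef, adj_r2

# Hypothetical toy data: 24 movies, 7 daily tweet-rate variables each.
rng = np.random.default_rng(0)
tweet_rates = rng.uniform(10, 500, size=(24, 7))
gross = 0.05 * tweet_rates.sum(axis=1) + rng.normal(0, 2, size=24)
coef, adj_r2 = fit_least_squares(tweet_rates, gross)
```

Adjusted R2 matters here because with 24 movies and 7 predictors plain R2 would look optimistic.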
10. Consumer Confidence
From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series
@brendan642 O’Connor, Balasubramanyan, Routledge, Smith @ ICWSM (2011)
• Index of Consumer Sentiment (ICS) (Reuters/UMich)
• Economic Confidence Index (ECI) (Gallup)
• Subjectivity Lexicon: Opinion Finder
• High day-to-day volatility.
• Average last k days.
• Keyword “jobs”
k = 1, 7, 30
• @ k=15 correlates with
ECI (Gallup) at r = 0.731
[some figures from authors’ original slides]
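A rough sketch of the smoothing step: a daily positive/negative ratio, averaged over the last k days, then correlated with a poll series. Function names and the random placeholder data are assumptions, not the authors' code or data.

```python
import numpy as np

def daily_sentiment_ratio(pos_counts, neg_counts):
    """Ratio of positive to negative message counts, one value per day."""
    return np.asarray(pos_counts, dtype=float) / np.asarray(neg_counts, dtype=float)

def trailing_mean(x, k):
    """Average of the last k days, for every day that has k days of history."""
    x = np.asarray(x, dtype=float)
    return np.convolve(x, np.ones(k) / k, mode="valid")

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length series."""
    return float(np.corrcoef(a, b)[0, 1])

# Placeholder data (hypothetical): 60 days of counts and a made-up poll series.
rng = np.random.default_rng(1)
days, k = 60, 15
pos = rng.integers(50, 150, size=days)
neg = rng.integers(50, 150, size=days)
poll = rng.normal(0, 1, size=days)

smoothed = trailing_mean(daily_sentiment_ratio(pos, neg), k)
r = pearson_r(smoothed, poll[k - 1:])   # align: first smoothed value ends on day k-1
```

Larger k trades day-to-day volatility for a slower response to real changes.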
11. Consumer Confidence
From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series
@brendan642 O’Connor, Balasubramanyan, Routledge, Smith @ ICWSM (2011)
• Predicting 1 month in
the future using
previous 15 days
• Correlation with
Gallup poll:
– Twitter model: 77.5%
– Poll model: 80.4%
• As Twitter grows, so does
its accuracy
12. Twitter mood predicts the stock market
@jlbollen Bollen, Mao, Zeng @ Journal of Computational Science (2011)
Twitter 2008 (~10M tweets)
DJIA
• Opinion Finder: positive / negative
• GPOMS: calm, alert, sure, vital, kind and happy
[some figures from authors’ original slides]
888 citations!
Slight correlation only
with Calm GPOMS
mood (0.065 at 6 day
lag)
13. Stocks
Tweets and trades: The information content of stock microblogs
@timmsprenger Sprenger, Tumasjan, Sandner, Welpe
@ European Financial Management (2013)
• Tracking stocks $STOCK
14. Stocks
Tweets and trades: The information content of stock microblogs
@timmsprenger Sprenger, Tumasjan, Sandner, Welpe
@ European Financial Management (2013)
• Tweets: Jan 1 – Jun 30, 2010
• S&P100 companies using $STOCK (price change & volume)
• Naïve Bayes classifier trained on 2,500 tweets
(buy/sell/hold): 81.2% accuracy
-0.022 p<0.05
BULLISH STOCK RETURNS
0.091 p<0.001
0.073 p<0.001
VOLUME TRADING VOLUME
0.312 p<0.001
1.5% of users posted 53.7% of all messages
– Their quality is not much better!
15. Stocks
Correlating financial time series with micro-blogging activity
Ruiz, Hristidis, Castillo, Gionis, Jaimes @ WSDM (2012)
• Twitter: Jan 1 – Jun 30, 2010
• 150 (randomly selected) companies in S&P 500
– Daily relative price change
– Traded volume normalized by mean traded volume for
that company for entire time period
represent tweets
as a GRAPH
[some figures from authors’ original slides]
constrain graph to a company
and a time window
+ similarity nodes
connecting very
similar tweets (RTs)
using Jaccard distance
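A minimal sketch of Jaccard similarity and distance between two tweets, treating each tweet as a set of lowercase whitespace-separated tokens. The tokenization is an assumption; the paper may preprocess differently.

```python
def jaccard_similarity(tweet_a, tweet_b):
    """|A & B| / |A | B| over the token sets of two tweets."""
    a = set(tweet_a.lower().split())
    b = set(tweet_b.lower().split())
    if not a and not b:
        return 1.0          # two empty tweets are treated as identical
    return len(a & b) / len(a | b)

def jaccard_distance(tweet_a, tweet_b):
    """1 - similarity; near 0 for near-duplicates such as retweets."""
    return 1.0 - jaccard_similarity(tweet_a, tweet_b)

original = "buy $AAPL now"
retweet = "RT buy $AAPL now"
similarity = jaccard_similarity(original, retweet)   # 3 shared tokens out of 4
```

A similarity node would connect two tweets whose distance falls below some threshold; that threshold is a tuning choice not given on the slide.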
17. Correlating financial time series with micro-blogging activity
Ruiz, Hristidis, Castillo, Gionis, Jaimes @ WSDM (2012)
• The only one that obtains a profit during a period in which the Dow Jones fell 5.8%
• Best performance for vector auto-regression with the number of connected components (proposed)
18. Don’t fire your stock broker yet
High-Speed Trading No Longer Hurtling Forward
http://www.nytimes.com/interactive/2012/10/15/business/Declining-US-High-Frequency-Trading.html?ref=business
Computer Flaws Get Wry Smile From Humans Displaced
http://dealbook.nytimes.com/2013/09/19/computer-flaws-get-wry-smile-from-humans-displaced/?ref=highfrequencyalgorithmictrading
How a Trading Algorithm Went Awry
http://online.wsj.com/article/SB10001424052748704029304575526390131916792.html
20. Elections
“the crowning of the Internet as the king of all political
media”
“the beginning of the Internet presidency”
- on Obama's 2008 victory
Mitch Wagner, InformationWeek
Transparency
“Instantaneous tweeting of shady government
practices -- and the resulting uproar -- means that
public bodies are more responsive than ever”.
- Wesley Donehue, CNN
Mobilization
“This exercise of power has produced a template for
political action on a massive scale fueled by social
media.”
- on PIPA and SOPA
Vivek Wadhwa, Washington Post
bloggeruniversity.wordpress.com
21. US politics
• Most research will be presented
• Clear left/right distinction
• Popular political figures
• High(ish) Twitter engagement
REPUBLICAN (right)
DEMOCRAT (left)
22. let’s talk politics
• Sampling Twitter for political speech
– general keywords: #current
– event keywords: #debate08, #tweetdebate
– people: obama, romney, merkel
– parties: democrat, republican, pirate
– accounts: wefollow, twellow
– news stories, known URL retweets
• Caveats
– requires expert knowledge
– known best after the event
– selection bias (who do you want to ignore?)
24. political leaning classification
Predicting the political alignment of twitter users
@vagabondjack Conover, Gonçalves, Ratkiewicz, Flammini, Menczer @ SocialCom (2011)
• Bootstrapped hashtag-based sample of political discussion
• Gardenhose Sep 14 - Nov 4, 2010
• Classes: right, left, ambiguous
TEXT-BASED
• remove stopwords, hashtags, mentions, urls, all words occurring
once in the corpus
• TFIDF weighting:
HASHTAG-BASED
• remove hashtags used by only one user
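The weighting formula for the text-based features did not survive on this slide. As a stand-in, here is a sketch of one common TF-IDF variant (term frequency times log inverse document frequency), with each user's tweets treated as one document. It is an assumption and may differ from the formula the authors used.

```python
import math
from collections import Counter

def tfidf(user_tokens):
    """user_tokens: dict mapping user -> list of tokens from that user's tweets.
    Returns dict mapping user -> {term: weight}."""
    n_users = len(user_tokens)
    doc_freq = Counter()                      # number of users who use each term
    for tokens in user_tokens.values():
        doc_freq.update(set(tokens))
    weights = {}
    for user, tokens in user_tokens.items():
        term_freq = Counter(tokens)
        weights[user] = {
            term: (count / len(tokens)) * math.log(n_users / doc_freq[term])
            for term, count in term_freq.items()
        }
    return weights

# Hypothetical two-user corpus.
weights = tfidf({"u1": ["tax", "tax", "vote"], "u2": ["vote", "health"]})
```

Terms used by every user get weight 0, which is why "vote" carries no signal in this toy corpus.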
25. political leaning classification
Predicting the political alignment of twitter users
@vagabondjack Conover, Gonçalves, Ratkiewicz, Flammini, Menczer @ SocialCom (2011)
NETWORK-BASED
• Label propagation
– Initialize cluster membership
arbitrarily
– Iteratively update each node’s
label according to the majority of
its neighbors
– Ties are broken randomly
• Cluster assignment by majority
cluster label (using manually
labeled data)
retweet network
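The label propagation steps above can be sketched as follows. This is an illustrative version, not the authors' implementation: the graph format, the stopping rule (a full sweep with no change) and the helper names are assumptions.

```python
import random
from collections import Counter

def label_propagation(adjacency, seed=0, max_sweeps=1000):
    """adjacency: dict node -> list of neighbor nodes (undirected graph)."""
    rng = random.Random(seed)
    labels = {node: node for node in adjacency}   # arbitrary start: one cluster per node
    nodes = list(adjacency)
    for _ in range(max_sweeps):
        rng.shuffle(nodes)
        changed = False
        for node in nodes:
            if not adjacency[node]:
                continue                           # isolated nodes keep their label
            counts = Counter(labels[n] for n in adjacency[node])
            top = max(counts.values())
            tied = sorted(l for l, c in counts.items() if c == top)
            new_label = rng.choice(tied)           # ties are broken randomly
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels

def name_clusters(labels, known_leaning):
    """Give each cluster the majority leaning among its manually labeled users."""
    votes = {}
    for user, leaning in known_leaning.items():
        votes.setdefault(labels[user], Counter())[leaning] += 1
    return {cluster: c.most_common(1)[0][0] for cluster, c in votes.items()}

# Hypothetical retweet network: two disconnected triangles.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
clusters = label_propagation(graph)
names = name_clusters(clusters, {0: "left", 1: "left", 3: "right"})
```

The manually labeled users are needed only at the end, to give each discovered cluster a name.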
26. political leaning classification
Predicting the political alignment of twitter users
@vagabondjack Conover, Gonçalves, Ratkiewicz, Flammini, Menczer @ SocialCom (2011)
• Classifier: Support Vector Machine
27. political leaning classification
Political hashtag hijacking in the US
Hadgu, Garimella, Weber @ WWW (2013)
SEED-BASED (highly precise)
1. Start with few seed users of known leaning
2. The leaning of their followers is determined by which side
they retweet more
3. Propagate users’ leaning to their tweets/hashtags/etc
hashtag accuracy: 98.6%, 93%, 90% (by source)
28. political leaning classification
Visualizing media bias through Twitter
@JisunAn An, Cha, Gummadi, Crowcroft, Quercia @ AAAI (2012)
• Position news sources in leaning by considering the
overlap in common audience (followers on Twitter)
Correlates with ADA (Americans for Democratic Action) score:
– Spearman rank order correlation: .44
– Pearson product-moment correlation coefficient: .51
distance between two media: Jaccard similarity of their audience (co-subscribers)
29. political leaning classification
Russia, Ukraine, and the West: Social Media Sentiment in the Euromaidan Protests
@bretling Etling @ Berkman Center Research (2014)
• Nov 21, 2013 – Feb 26, 2014
• Classifier labeled to identify pro- and anti-protest sentiment
• Twitter, blogs, news, forums, Facebook
US & UK Russia Ukraine
Does it reflect the overall
sentiment of the people?
30. look who’s talking
Vocal Minority versus Silent Majority: Discovering the Opinions of the Long Tail
@enimust Mustafaraj, Finn, Whitlock, Metaxas @ SocialCom (2011)
• 2010 US Senate special election in
Massachusetts
• Silent majority & vocal minority
tweet differently (different
agendas?)
• Spamming, fake grassroots
movements
number of tweets per user
31. look who’s talking
Detecting and Tracking Political Abuse in Social Media
Ratkiewicz, Conover, Meiss, Gonçalves, Flammini, Menczer @ ICWSM (2011)
• Truthiness is a quality characterizing a "truth" that a person making
an argument or assertion claims to know intuitively "from the gut"
or because it "feels right" without regard to evidence, logic,
intellectual examination, or facts.
Classifying memes for astroturf
Truthy project by Indiana University
32. look who’s talking
#ampat @PeaceKaren_25 &
@HopeMarie_25
gopleader.gov Chris Coons
#Truthy @senjohnmccain on.cnn.com/aVMu5y “Obama said…”
LEGITIMATE TRUTHY
Detecting and Tracking Political Abuse in Social Media
Ratkiewicz, Conover, Meiss, Gonçalves, Flammini, Menczer @ ICWSM (2011)
33. elections
Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment
Tumasjan, Sprenger, Sandner, Welpe @ AAAI (2010)
• 2009 German federal elections
• sentiment profiles of leading candidates in tweets mentioning them (using LIWC2007)
“The mere number of tweets reflects voter preferences and comes close to traditional election polls”
CONTROVERSY!
638 citations!
34. elections
Why the Pirate Party won the German election of 2009 or the trouble with predictions: A
response to Tumasjan, Sprenger, Sander, & Welpe, "Predicting elections with twitter: What
140 characters reveal about political sentiment"
@ajungherr Jungherr, Jürgens, Schoen @ SSCR V30/N2 (2012)
“show that the results of TSSW are contingent on arbitrary choices of the authors”
Choice of Parties
If results of polls played a role in deciding upon the inclusion of particular parties, the TSSW method is dependent on public opinion surveys
Choice of Dates
prediction analysis […] between [13.9] and [27.9], the day of the election, produces a MAE of 2.13, significantly higher than the MAE for TSSW
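MAE is the mean absolute error between predicted and actual vote shares. A small sketch with made-up party names and shares (none of these numbers come from either paper):

```python
def mean_absolute_error(predicted, actual):
    """Average of |predicted - actual| over the parties listed in `actual`."""
    return sum(abs(predicted[party] - actual[party]) for party in actual) / len(actual)

# Hypothetical vote shares in percent.
predicted_share = {"Party A": 30.0, "Party B": 50.0, "Party C": 20.0}
actual_share = {"Party A": 35.0, "Party B": 45.0, "Party C": 20.0}
mae = mean_absolute_error(predicted_share, actual_share)   # (5 + 5 + 0) / 3
```

Because the average runs over the chosen parties, adding or dropping a party changes the MAE, which is the dependence on the choice of parties criticized above.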
35. elections
• Non-US elections:
– Irish: On using twitter to monitor political sentiment and predict election
results, Bermingham, Smeaton (2011)
• “Our approach however has demonstrated an error which is not competitive with the
traditional polling methods.”
– Dutch: Predicting the 2011 Dutch senate election results with twitter, Sang,
Bos (2012)
• Uses polls for demographic imbalances, yet performance still below traditional polls
– Singapore: Tweets and votes: A study of the 2011 singapore general election,
Skoric, Poor, Achananuparp, Lim, Jiang (2012)
• Not as accurate as traditional polls, performance at local government levels
– New Zealand: Can Social Media Predict Election Results? Evidence from New
Zealand, Michael P. Cameron (2013)
• “the size of the effect is small and it appears that social media presence will therefore
only make a difference in closely contested elections”
– many more coming out each day!
37. elections
How (Not) To Predict Elections
@takis_metaxas Metaxas et al. @ SocialCom (2011)
• A method of prediction should be an algorithm
finalized before the election
– specify data collection, cleaning, analysis, interpretation…
• Data from social media are fundamentally different
than data from natural phenomena
– people change their behavior next time around
– spammers & activists will try to take advantage
• Form a testable theory on why and when it predicts
(avoid self-deception!)
• (maybe) Learn from professional pollsters
– tweet ≠ user
– user ≠ eligible voter
– eligible voter ≠ voter
[from authors’ original slides]
38. What now?
Now-casting Fore-casting
Show improvement over baseline
or that you could make money / a difference
Publish a paper: let us know!
(or go to Wall Street / Political Thinktank )
41. day of the week market index
Fixed-effects panel regressions at 1 and 2 day lags
1. Bullishness is affected more strongly by returns than vice versa
2. Message volume predicts trading volume
3. … but high trading volume and volatility predict message volume
more
4. Agreement among traders leads to lower trading volumes
http://en.wikipedia.org/wiki/Efficient-market_hypothesis
A non-random walk down Wall Street: http://press.princeton.edu/chapters/i6558.html
GNIP
Provides firehose access to major social media APIs including Twitter
http://gnip.com/sources/
Bloomberg Twitter terminal
Added in April 2013, it shows tweets from a selection of users, including news services, financial writers, economists, and bloggers selected by Bloomberg’s terminal team.
http://nymag.com/daily/intelligencer/2013/04/bloombergs-vip-terminal-tweeters.html
Dataminr
A social analytics company with clients in finance and government that uses firehose access to find tweets which may be newsworthy and relevant to a particular market.
Article on Dataminr:
http://www.fastcoexist.com/1681873/twitter-can-predict-the-stock-market-if-youre-reading-the-right-tweets
Derwent Capital Markets and Cayman Atlantic
are firms that pioneered the use of social media sentiment analysis for financial trading. As their inspiration they cite the Bollen/Mao/Zeng study discussed later in this talk (“Twitter mood predicts the stock market”), which establishes some connection between emotion-related words on Twitter and subsequent moves in the Dow Jones Industrial Average.
http://en.wikipedia.org/wiki/Derwent_Capital_Markets
http://money.cnn.com/2013/07/10/investing/twitter-fund-trading/index.html
Of much interest to the internet community
Viral marketing by producers
Box-office revenues are an easy indicator of market success
Collected using a manually compiled term list
tweet rate: number of tweets referring to a movie per hour
linear regression uses 7 variables, one per day, each holding the tweet rate for that day
predicting the box office gross during the opening weekend
Using LingPipe for sentiment classification
HSX – Hollywood Stock Exchange
http://www.hsx.com/security/view/POPEY
Dataset: 1 billion tweets 2008-2009 (message volume increased by a factor of 50 during this period)
using Gardenhose
ICS: five questions administered monthly in telephone interviews
ECI: two questions administered daily (reported in 3-day averages)
a message is positive (/neg) if it has a pos (/neg) lex word
score = pos / neg
Prediction using 44 through 30 days before the target date
In a model with both variables, at first the importance of tweet text (its coefficient) is small, but starting in mid-2009 text becomes a much better predictor.
Dataset: ~10M tweets in 2008
Models DJIA closing values
Dow Jones Industrial Average - a stock market index -- price-weighted average of 30 significant stocks traded on the New York Stock Exchange and the Nasdaq
Granger Causality: whether lagged values of X provide statistically significant information about future values of Y
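A bare-bones sketch of the idea behind a Granger causality test: compare a regression of Y on its own lags with one that also includes lags of X, and form an F statistic. It uses only NumPy and stops short of computing a p-value. The function name and the simulated series are assumptions.

```python
import numpy as np

def granger_f_stat(x, y, lags=1):
    """F statistic for 'lagged x adds information about y beyond lagged y'."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    T = len(y)
    target = y[lags:]
    # Column i holds the series shifted back by i steps (i = 1..lags).
    y_lags = np.column_stack([y[lags - i:T - i] for i in range(1, lags + 1)])
    x_lags = np.column_stack([x[lags - i:T - i] for i in range(1, lags + 1)])
    ones = np.ones((T - lags, 1))

    def rss(design):
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_restricted = rss(np.hstack([ones, y_lags]))             # y's own history only
    rss_unrestricted = rss(np.hstack([ones, y_lags, x_lags]))   # plus x's history
    dof = (T - lags) - (2 * lags + 1)
    return ((rss_restricted - rss_unrestricted) / lags) / (rss_unrestricted / dof)

# Simulated example (hypothetical): y follows x with a one-step delay.
rng = np.random.default_rng(2)
x = rng.normal(size=300)
y = np.zeros(300)
y[1:] = 0.8 * x[:-1] + 0.1 * rng.normal(size=299)
f_x_to_y = granger_f_stat(x, y)   # large: past x helps predict y
f_y_to_x = granger_f_stat(y, x)   # small: past y does not help predict x
```

Running the test in both directions is what shows whether tweets lead the market or merely react to it.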
Twitter is mostly REACTIVE
Financial indicators: returns, abnormal returns, cumulative abnormal returns, trading volume, daily volatility
Twitter features: bullishness, message volume, agreement among messages
Built fixed-effects panel regressions at 1 and 2 day lags (coefficients are standardized)
Bullishness is affected more strongly by returns than vice versa
Message volume predicts trading volume
… but high trading volume and volatility predict message volume more
Agreement among traders leads to lower trading volumes
Vector autoregression (VAR) is an econometric model used to capture the linear interdependencies among multiple time series.
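A minimal sketch of a first-order VAR fitted by least squares, where each series at time t is a linear function of all series at time t-1. The two simulated series stand in for, say, a tweet feature and a price change; they are hypothetical, and the paper's actual model and lag order are not given here.

```python
import numpy as np

def fit_var1(series):
    """series: array of shape (T, k). Fit z_t = c + A z_{t-1} + noise by least squares."""
    Z = np.asarray(series, dtype=float)
    past = np.column_stack([np.ones(len(Z) - 1), Z[:-1]])   # intercept + previous values
    B, *_ = np.linalg.lstsq(past, Z[1:], rcond=None)
    c, A = B[0], B[1:].T                                    # A[i, j]: effect of series j on series i
    return c, A

def forecast_next(c, A, last_observation):
    """One-step-ahead forecast from the most recent observation."""
    return c + A @ np.asarray(last_observation, dtype=float)

# Simulated data (hypothetical coefficients).
rng = np.random.default_rng(3)
A_true = np.array([[0.5, 0.1], [0.0, 0.3]])
c_true = np.array([1.0, 2.0])
Z = np.zeros((500, 2))
for t in range(1, 500):
    Z[t] = c_true + A_true @ Z[t - 1] + 0.1 * rng.normal(size=2)

c_hat, A_hat = fit_var1(Z)
next_value = forecast_next(c_hat, A_hat, Z[-1])
```

The off-diagonal entries of A are the cross-series effects, for example how yesterday's tweet feature relates to today's price change.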
Is the improvement enough to compensate for the fees associated with trading stocks?
For full-text classification they represent text using TFIDF (removing stopwords, hashtags, mentions, urls, and all words occurring once in the corpus)
T_ij – importance of term i in the set of tweets produced by user j
Using retweet network where there is an undirected link between two users if either user mentions the other during the analysis period
Clusters: accept the majority cluster label
Adjusted Rand Index: similarity of two cluster label assignments (+1 when they agree exactly, about 0 when agreement is at chance level, negative when it is worse than chance)
Clusters + Tags: topological information with 19 hashtags selected using Hall’s feature selection algorithm
ADA: Americans for Democratic Action score, calculated based on various quantities such as the number of times a media outlet cites various think-tanks and other policy groups
Dashed lines: retweets, Yellow: mentions
#ampat – retweeted between two accounts that seem to be owned by the same person
@PeaceKaren_25 (and @HopeMarie_25) – two colluding accounts
gopleader.gov – promoted by the two *_25 accounts above
Chris Coons – a tweet smearing Chris Coons using bot accounts
#Truthy – injected by NPR Science Friday radio program
@senjohnmccain -- retweets from @ladygaga (don’t ask don’t tell) and mentions
LIWC – Linguistic Inquiry and Word Count
Second table: absolute errors
method of prediction should be an algorithm finalized before the elections:
(input) how Social Media data are to be collected, including the dates of data collection,
(filter) the way in which the cleanup of the data is to be performed (e.g., the selection of keywords relevant to the election),
(method) the algorithms to be applied on the data along with their input parameters, and
(output) the semantics under which the results are to be interpreted
Data from social media are fundamentally different than data from natural phenomena
people will change their behavior the next time around
spammers & activists will try to take advantage
Fixed-effects panel regressions of market features with the three tweet features as independent variables at 1 and 2 day lags
NWK: dummy variable signifying first trading day of the week
Market: market index as control
1. There is almost no effect of bullishness on next day returns, however bullishness 2 days ago is associated with negative returns. (bold: actual values, italicized: standardized)