Twitter Sentiment Analysis for Time Series Prediction

Sentiment Analysis

1. Discover a niche network of Twitter users
2. Model their emotions on topics
3. Use those feelings to predict a time series more accurately,
   e.g. the stock market or box office success
4. Are some users or networks more influential than others?

This Talk

• The Design Decision
• The Core Goals
• The 3 parts of the project:
  1. Classifying the SENTIMENT of tweets
  2. Building a NETWORK of Twitter users
  3. Finding a TIME SERIES of sentiment for each user

Sentiment Analysis Used Already

• Derwent Capital Markets - "The Twitter hedge fund"
  ◦ £25m fund
  ◦ 10% of tweets
  ◦ Predicts Dow Jones movement direction with 87.6% accuracy
• Returned 1.85% in its first month of trading
• Johan Bollen, Indiana University, used a bag-of-words approach

• Product reviews / ratings

• Social Media Analytics

Design Decision
• Many paragraphs of text (Product Reviews)
  + : Better accuracy of prediction
  - : Less data overall
• Huge number of short texts (Twitter)
  + : Opinions of a greater number of people,
      at high enough frequency to model as a signal
  - : Classification of opinion is very poor

=> TWITTER

2 Current Aims (will change later)

1. The project aims to be context independent (i.e. movies & products)

2. When context is given, use it to better classify tweets

1: Sentiment Analysis of Tweets
• Three-tier classification process:

  tweet
  ├─ spam
  └─ not spam
     ├─ objective
     └─ subjective
        ├─ positive
        └─ negative
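
The three tiers can be read as a cascade: each tweet only reaches the next classifier if it passes the previous one. Below is a minimal sketch of that control flow. The word lists and the spam rule are hypothetical placeholders chosen for illustration; the talk does not say which classifiers are used at each tier.

```python
import re

# Hypothetical word lists and spam rule, for illustration only.
POSITIVE = {"good", "great", "fantastic"}
NEGATIVE = {"bad", "awful", "boring"}


def tokens(text):
    """Lower-case the tweet and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def is_spam(text):
    # Toy rule (assumption): two or more links in one tweet.
    return text.lower().count("http") >= 2


def is_subjective(text):
    # Toy rule (assumption): the tweet contains a known opinion word.
    return any(w in POSITIVE or w in NEGATIVE for w in tokens(text))


def polarity(text):
    words = tokens(text)
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return "positive" if pos >= neg else "negative"


def classify_tweet(text):
    """Tier 1: spam? Tier 2: objective? Tier 3: positive or negative."""
    if is_spam(text):
        return "spam"
    if not is_subjective(text):
        return "objective"
    return polarity(text)
```

In a real system each tier would be a trained classifier; only the ordering of the three decisions is taken from the slide.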

• Double-Back Propagation Algorithm
  ◦ ACL Journal, March 2011, MIT Press
  ◦ Opinion Word Extraction & Target Extraction
  ◦ 4 rules
  ◦ "The phone has a good screen"
    => add "good" to list of adjectives
    => add "screen" to list of nouns
  ◦ Etc.
• Great for rating features of a product
• Not great for tweets
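
To make the "good screen" example concrete, here is a simplified sketch of a single propagation rule. It assumes sentences are already POS-tagged as (token, tag) pairs with "A" for adjectives and "N" for nouns, and it uses plain adjacency (adjective immediately followed by noun) as a stand-in for the syntactic rules of the published algorithm. It is not the paper's 4 rules, only an illustration of how opinion words and targets can expand each other.

```python
def double_propagation(tagged_sentences, seed_opinion_words):
    """Grow opinion words and targets from each other until nothing changes."""
    opinions = set(seed_opinion_words)
    targets = set()
    changed = True
    while changed:
        changed = False
        for sentence in tagged_sentences:
            # Look at every pair of neighbouring tokens.
            for (word, tag), (next_word, next_tag) in zip(sentence, sentence[1:]):
                if tag != "A" or next_tag != "N":
                    continue
                adj, noun = word.lower(), next_word.lower()
                # Known opinion word => the noun it modifies is a target.
                if adj in opinions and noun not in targets:
                    targets.add(noun)
                    changed = True
                # Known target => the adjective modifying it is an opinion word.
                if noun in targets and adj not in opinions:
                    opinions.add(adj)
                    changed = True
    return opinions, targets


sentences = [
    [("The", "D"), ("phone", "N"), ("has", "V"), ("a", "D"),
     ("good", "A"), ("screen", "N")],
    [("amazing", "A"), ("screen", "N"), ("and", "&"),
     ("amazing", "A"), ("battery", "N")],
]
opinions, targets = double_propagation(sentences, {"good"})
```

Starting from the single seed "good", the sketch picks up "screen" as a target, then "amazing" as an opinion word, then "battery" as a further target.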

• Twitter Part-of-Speech (POS) tagger:
  www.ark.cs.cmu.edu/TweetNLP/
  ◦ Written in Java
  ◦ Max Ent

  Example output (token, tag):

  "          ^
  Drive      ^
  "          ^
  ,          ,
  go         V
  and        &
  watch      V
  it         O
  !          ,
  Fantastic  A
  movie      N
  .          ,
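
As a small illustration of how the tagged output could be consumed downstream, the sketch below parses token/tag pairs and pulls out the adjectives and nouns. It assumes one whitespace-separated "token tag" pair per line, as laid out on the slide; the tagger's real output format may differ and would need its own parser.

```python
# Tagged tweet as shown on the slide: one "token tag" pair per line (assumed layout).
TAGGED = '''\
" ^
Drive ^
" ^
, ,
go V
and &
watch V
it O
! ,
Fantastic A
movie N
. ,
'''


def parse_tagged(text):
    """Return a list of (token, tag) pairs, skipping blank lines."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the last run of whitespace so the tag is always last.
        token, tag = line.rsplit(None, 1)
        pairs.append((token, tag))
    return pairs


def words_with_tag(pairs, tag):
    return [token for token, t in pairs if t == tag]


pairs = parse_tagged(TAGGED)
adjectives = words_with_tag(pairs, "A")
nouns = words_with_tag(pairs, "N")
```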

Bootstrapped Tweet SA improver
[Diagram: a stream of tweets alongside boxes labelled "IMDB Movie Review Corpora", "Tweet Sentiment Analysis" and "Double-Back Prop. Algo", with the annotation "Gives useful adjectives, nouns".]

2: Building a Network
• Collected my Twitter friends, friends of friends, and
  friends of friends of friends.
  ◦ => 115,896 users

• Community detection:
  ◦ Paper 1: "Near linear time algorithm for detecting community
    structures on large scale networks"

  ◦ Paper 2: "An LDA-based Community Structure Discovery Approach
    for Large-Scale Social Networks", Haizheng Zhang
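
The idea behind label propagation for community detection can be sketched in a few lines: every node starts with its own label and repeatedly adopts the label most common among its neighbours. The version below is an illustration only. It visits nodes in sorted order and breaks ties by taking the largest label so that the result is reproducible; these are simplifying assumptions, and randomised ordering and tie-breaking are the more usual choice.

```python
from collections import Counter


def label_propagation_communities(adjacency, max_sweeps=100):
    """Group nodes by repeatedly adopting the most common neighbour label."""
    labels = {node: node for node in adjacency}
    for _ in range(max_sweeps):
        changed = False
        for node in sorted(adjacency):
            counts = Counter(labels[n] for n in adjacency[node])
            if not counts:
                continue  # isolated node keeps its own label
            best = max(counts.values())
            candidates = [label for label, c in counts.items() if c == best]
            if labels[node] in candidates:
                continue  # current label is already a majority label
            labels[node] = max(candidates)  # deterministic tie-break (assumption)
            changed = True
        if not changed:
            break
    communities = {}
    for node, label in labels.items():
        communities.setdefault(label, set()).add(node)
    return sorted(communities.values(), key=min)


# Toy graph: two 4-cliques joined by the single edge 3-4.
cliques = ([0, 1, 2, 3], [4, 5, 6, 7])
edge_list = [(a, b) for c in cliques for a in c for b in c if a < b] + [(3, 4)]
adjacency = {n: set() for n in range(8)}
for a, b in edge_list:
    adjacency[a].add(b)
    adjacency[b].add(a)

communities = label_propagation_communities(adjacency)
```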


GraphLab

• Like MapReduce
  ◦ Instead of "map" and "reduce":
  ◦ Map = 'Update': modify overlapping sets of data
  ◦ Reduce = 'Sync': perform reductions in the background while sync is running
• Label Propagation & LDA
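
The Update/Sync split can be illustrated with a toy, single-threaded vertex program. This is not the GraphLab API: the function names and the driver loop are hypothetical, and everything runs sequentially, whereas the point of the real framework is to run updates in parallel and reductions in the background. The update here labels connected components by spreading the minimum label, chosen only because it is short and deterministic.

```python
def run_vertex_program(graph, data, update, sync, max_sweeps=100):
    """Apply `update` to every vertex until stable; `sync` reduces over all data."""
    summary = sync(data)
    for _ in range(max_sweeps):
        changed = False
        for vertex in sorted(graph):
            # Each update reads a vertex and its neighbours: overlapping scopes.
            new_value = update(vertex, data, graph[vertex])
            if new_value != data[vertex]:
                data[vertex] = new_value
                changed = True
        summary = sync(data)  # global reduction over all vertex data
        if not changed:
            break
    return data, summary


def min_label_update(vertex, data, neighbours):
    """'Update': take the smallest label in the vertex's neighbourhood."""
    return min([data[vertex]] + [data[n] for n in neighbours])


def count_labels_sync(data):
    """'Sync': a global aggregate, here the number of distinct labels."""
    return len(set(data.values()))


graph = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}
labels, n_components = run_vertex_program(
    graph, {v: v for v in graph}, min_label_update, count_labels_sync
)
```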

3: Time series prediction
• Will pass the time series from Python to R
  using the rpy2 module

• R has a great package, "quantmod", for
  importing financial market data.

• Can also import other time series very easily,
  and R has many other great libraries.
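
Before anything is handed to R, the scored tweets have to be turned into a regular series. The sketch below shows one plausible way to do that on the Python side: average the sentiment per day, then check how well the series lines up with a target series shifted by a lag. The function names, the scoring scale and the sample values are assumptions for illustration; no rpy2 calls are shown.

```python
from collections import defaultdict
from datetime import date
from math import sqrt


def daily_sentiment(scored_tweets):
    """Average (day, score) pairs into one sentiment value per day."""
    totals = defaultdict(lambda: [0.0, 0])
    for day, score in scored_tweets:
        totals[day][0] += score
        totals[day][1] += 1
    return {day: s / n for day, (s, n) in sorted(totals.items())}


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def lagged_correlation(sentiment, target, lag=1):
    """Correlate sentiment at time t with the target at time t + lag."""
    xs = sentiment[: len(sentiment) - lag]
    ys = target[lag:]
    return pearson(xs, ys)


# Made-up scores in [-1, 1] for illustration.
tweets = [
    (date(2011, 5, 1), 1.0), (date(2011, 5, 1), -1.0),
    (date(2011, 5, 2), 1.0),
    (date(2011, 5, 3), 1.0), (date(2011, 5, 3), 0.0),
]
series = daily_sentiment(tweets)
```

For a per-user series, the same aggregation could be keyed on (user, day) instead of day alone.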

Built With

• Python - For majority of code
  Packages: numpy, scipy, matplotlib,
  networkx, graphviz, rpy2,
  django, twython, nltk
• R - For time series analysis
• PostgreSQL - SQL database
• Java - Twitter POS tagger
• C/C++ - GraphLab

End Product

[Diagram: a stream of tweets with boxes labelled "IMDB Movie Review Corpora", "Tweet Sentiment Analysis" and "Double-Back Prop. Algo".]

Thank You
• Mike Davies
• Documented at www.m1ked.com

Notes: Vowpal Wabbit LDA

Vowpal Wabbit is an open source library
for fast online learning (mostly SGD),
mainly developed by a guy at Yahoo.
• Optimised for speed
• LDA uses clever tricks like vectorisation and
  exploiting the floating point representation
  to avoid using the pow() and exp() functions.
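
To give a flavour of what "exploiting the floating point representation" can mean, here is one well-known trick of this kind: approximating exp(x) by writing a scaled copy of x straight into the exponent bits of an IEEE 754 double. This is shown as a general illustration and is an assumption about the style of trick involved, not a copy of Vowpal Wabbit's code. The result is only accurate to within a few percent, and the input must stay roughly within -700 to 700.

```python
import math
import struct

_A = 2 ** 20 / math.log(2)    # scales x into the exponent field of a double
_B = 1023 * 2 ** 20 - 60801   # exponent bias, minus an error-reducing constant


def fast_exp(x):
    """Approximate exp(x) by building the high 32 bits of a double directly."""
    i = int(_A * x + _B)
    # Little-endian double: low 32 bits are zero, high 32 bits hold i.
    return struct.unpack("<d", struct.pack("<ii", 0, i))[0]
```

In Python this is slower than `math.exp`; the trick only pays off in compiled code, where it replaces a library call with a multiply, an add and a store.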

Notes: Label Propagation

• Label Propagation has been proven to be an
effective semi-supervised learning approach in
many applications. The key idea behind label
propagation is to first construct a graph in which
each node represents a data point and each
edge is assigned a weight often computed as
the similarity between data points, then
propagate the class labels of labeled data to
neighbors in the constructed graph in order to
make predictions.
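
The description above can be sketched as follows: nodes are data points (for example tweets), edge weights are similarities, labelled nodes are held fixed, and every unlabelled node repeatedly takes the weighted average of its neighbours' scores. The graph, the weights and the scoring scale (+1 positive, -1 negative) are made up for illustration.

```python
def propagate_labels(weights, seeds, sweeps=100):
    """Spread seed scores over a weighted graph; seeds stay clamped."""
    scores = {node: seeds.get(node, 0.0) for node in weights}
    for _ in range(sweeps):
        for node, neighbours in weights.items():
            if node in seeds:
                continue  # labelled nodes keep their known label
            total = sum(neighbours.values())
            if total > 0:
                # Weighted average of the neighbours' current scores.
                scores[node] = sum(
                    w * scores[n] for n, w in neighbours.items()
                ) / total
    return scores


# Chain a - b - c - d with equal similarity weights (assumption).
weights = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0, "d": 1.0},
    "d": {"c": 1.0},
}
scores = propagate_labels(weights, seeds={"a": 1.0, "d": -1.0})
```

The unlabelled node next to the positive seed ends up positive and the one next to the negative seed ends up negative, which is the prediction step described on the slide.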
