Detecting Trends
Stanislav Nikolov §,†
Devavrat Shah §
Source: http://twoinformcanada.ca/wp-content/uploads/2012/07/barclays.jpg
The Barclays Libor scandal
12:49: “#Barclays” is listed as a trending topic on Twitter
•  Is there enough information before the
   “jump”?
•  Can we predict which topics will trend in
   advance?
Yes.
•  Trends detected before Twitter lists them in 79% of cases
•  1.43 hours mean early detection
•  95% true positive rate (TPR), 4% false positive rate (FPR)
   (best parameter setting)
What are Trending Topics?
•  Twitter: a global communication network.
•  Tweet: a short, public message.
•  Topic: a phrase in a tweet.
•  Trending topic (a “trend”): a topic that
   becomes popular.
A Parametric Model
•  Expect a certain type of pattern (e.g.
   constant + jumps).
•  Fit parameters to data (e.g. how much of
   a jump).
[Figure: activity vs. time, with fitted parameter p = 0.1, p = 0.6 and p = 4.1 in successive examples]
•  Decide if jump is big enough.
[Figure: activity vs. time with p = 4.1, labeled “trend detected!”]
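The three steps above can be sketched in a few lines. This is only an illustration under stated assumptions, not the model from the slides: it assumes a single jump, fits the two constant levels by least squares, and treats the fitted jump size as the parameter p. The function names and the threshold value are hypothetical.

```python
from statistics import mean


def fit_constant_plus_jump(activity):
    """Fit a piecewise-constant model with one jump by least squares.

    Tries every split point k, models activity[:k] and activity[k:] by
    their means, and keeps the split with the smallest squared error.
    Returns (jump_index, jump_size). Assumes len(activity) >= 2.
    """
    best = None
    for k in range(1, len(activity)):
        before, after = activity[:k], activity[k:]
        mu_before, mu_after = mean(before), mean(after)
        sse = (sum((x - mu_before) ** 2 for x in before)
               + sum((x - mu_after) ** 2 for x in after))
        if best is None or sse < best[0]:
            best = (sse, k, mu_after - mu_before)
    _, k, p = best
    return k, p


def is_trend(activity, threshold):
    """Decide whether the fitted jump is big enough (threshold is assumed)."""
    _, p = fit_constant_plus_jump(activity)
    return p >= threshold


jump_index, p = fit_constant_plus_jump([1, 1, 1, 1, 5, 5, 5])
print(jump_index, p, is_trend([1, 1, 1, 1, 5, 5, 5], threshold=3.0))
```

A model like this only recognizes the one pattern it was built for, which is the limitation the next slide points at.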
Parametric Models are Inadequate
[Figure: activity vs. time, labeled “trend detected!”]
A Data-Driven Approach
•  All of the information is in the data.
•  Hypothesis
  –  Tweets are written by people.
  –  People are simple.
     •  In how they spread information.
     •  In how they connect to one another.
  –  Small number of distinct “ways” in which a
     topic can become trending.
Classification by Experts
[Figure: diagram labeled “observation”, s, r and “vote”]
Properties
•  Simple (just compute distances)
•  Scalable (can compute distances in
   parallel)
•  Non-parametric – model “parameters”
   scale with the data
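As a rough illustration of "just compute distances", the sketch below lets every labeled reference signal vote on an observation, with closer references counting more. The squared Euclidean distance, the exponential weighting and the `gamma` parameter are assumptions made for this example; the slides do not specify them.

```python
import math


def distance(a, b):
    """Squared Euclidean distance between two equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(observation, trends, non_trends, gamma=1.0):
    """Weighted vote of reference signals (a sketch, not the authors' exact rule).

    Each reference r votes for its own class with weight
    exp(-gamma * distance(observation, r)); the larger total wins.
    """
    vote_trend = sum(math.exp(-gamma * distance(observation, r)) for r in trends)
    vote_non = sum(math.exp(-gamma * distance(observation, r)) for r in non_trends)
    return "trend" if vote_trend > vote_non else "non-trend"


# Hypothetical reference signals, made up for illustration only.
trends = [[1, 2, 4, 8], [1, 3, 5, 9]]
non_trends = [[1, 1, 1, 1], [2, 2, 2, 2]]
print(classify([1, 2, 5, 8], trends, non_trends))
print(classify([1, 1, 2, 1], trends, non_trends))
```

Each distance is independent of the others, so the two sums can be split across workers, and adding reference signals grows the "model" without any refitting.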
Experimental Results
Experiment
•  500 trends.
•  500 non-trends.
•  Run trend detection on a 50% hold-out set.
•  Online signal classification.
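For reference, the true positive rate (TPR) and false positive rate (FPR) on a hold-out set can be computed as below. This is a generic sketch; the labels and predictions are made-up values, not data from the experiment.

```python
def tpr_fpr(labels, predictions):
    """Return (TPR, FPR) for boolean labels and predictions.

    TPR = TP / (TP + FN): fraction of true trends that were detected.
    FPR = FP / (FP + TN): fraction of non-trends wrongly flagged.
    Assumes both classes occur at least once in labels.
    """
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)
    fn = sum(1 for y, p in pairs if y and not p)
    fp = sum(1 for y, p in pairs if not y and p)
    tn = sum(1 for y, p in pairs if not y and not p)
    return tp / (tp + fn), fp / (fp + tn)


# Hypothetical hold-out labels (True = trend) and detector outputs.
labels = [True, True, True, True, False, False, False, False]
predictions = [True, True, True, False, True, False, False, False]
print(tpr_fpr(labels, predictions))
```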
Results – Early Detection


          (best parameter setting)
Results – FPR / TPR Tradeoff
Results – Early / Late Tradeoff
Concluding Remarks
•  Algorithm to detect trends early
•  Scalable nonparametric time series
   analysis
   (classification, anomaly detection, prediction)
