Towards Context-Aware Search and Analysis on Social Media Data
Leon Derczynski
Bin Yang 杨彬
Christian S. Jensen
Evolution of communication

Functional utterances

Vowels

Velar closure: consonants

Speech

New modality: writing

Digital text

E-mail

Social media

→ Increased machine-readable information

?
Social Media = Big Data
Gartner ''3V'' definition:

1. Volume

2. Velocity

3. Variety

High volume & velocity of messages:

   Twitter has     ~20 000 000 users per month
   They write     ~500 000 000 messages per day

Massive variety:
  Stock markets;
  Earthquakes;
  Social arrangements;
  … Bieber
What is machine-readable now?
Messages now contain

-   not only linguistic content

-   but also:
       Links (e.g. URI)
       Topic markers (e.g. hashtags)
       Meta-information

What kind of meta-information?

    User profile (including home location)
    Images
    Messages replied to
    Message language

    Time of message
    Location of message
What resources do we have now?


Large, content-rich, linked, digital streams of human communication

We transfer knowledge via communication

Sampling communication gives a sample of human knowledge


          ''You've only done that which you can communicate''


The metadata (time – place – imagery) gives a richer resource:


      → A sampling of human behaviour
What can we do with this resource?
Context increases the data's richness

Increased richness enables novel applications

Time and Place are interesting parts of message context




1. What kinds of applications are there?

2. What are the practical challenges?
Temporal Context
Messages have timestamps:




Two temporal retrieval scenarios:

      1. Historical analyses

      2. Emerging data
Historical search
Ability to retrieve from archives: Longitudinal query mode 0

Retrieve information on:

      ●   Lifecycle of socially connected groups

      ●   Precursors to events, analysed post hoc




      [Figure: timeline, 2008 to 2011]

0. Weikum et al. 2011: ''Longitudinal analytics on web archive data: It’s about time'', Proc. CIDR
Historical search
Retrospective analyses into cause and effect




      ''There's a dead crow in my garden''



Social media mentions of dead crows predict West Nile Virus (WNV) in humans 1




1. Sugumaran & Voss 2012: ''Real-time spatio-temporal analysis of West Nile Virus using Twitter Data'', Proc. Int'l Conference on Computing for Geospatial Research and Applications
Emerging search
Data emerging at high velocity:

      185 000 documents per minute

Gives a high temporal density




Search over this info enables:

      ●   Live coverage of events

      ●   Realtime identification of emerging events 2



2. Cohen et al. 2011: ''Computational journalism: A call to arms to database researchers'', Proc. CIDR
Temporal indexing
What are our requirements?

   ●   High-frequency document creation

   ●   Temporal cross-sections of varying size

   ●   Time-sensitive TF/IDF: stopwords are fluid
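One way to picture "stopwords are fluid" is to compute IDF over a temporal cross-section instead of over the whole collection. The following is a minimal sketch under stated assumptions, not a method from the talk: the function name `windowed_idf`, the smoothed IDF formula, the integer timestamps, and the toy messages are all hypothetical.

```python
import math
from collections import Counter

def windowed_idf(docs, start, end):
    """Smoothed IDF over documents whose timestamp t satisfies start <= t < end.

    docs is a list of (timestamp, text) pairs; timestamps are plain numbers here.
    """
    window = [set(text.lower().split()) for t, text in docs if start <= t < end]
    n = len(window)
    # Document frequency: in how many messages of the window each term occurs.
    df = Counter(term for terms in window for term in terms)
    # Smoothed variant (an assumption): log((1 + n) / (1 + df)) + 1.
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

# Toy stream (hypothetical): "earthquake" is rare in hour 0 but ubiquitous in hour 1.
docs = [
    (0, "coffee time"),
    (0, "earthquake drill today"),
    (0, "lunch time"),
    (1, "earthquake now"),
    (1, "big earthquake"),
    (1, "earthquake felt downtown"),
]

idf_before = windowed_idf(docs, 0, 1)
idf_during = windowed_idf(docs, 1, 2)
```

In the second window "earthquake" appears in every message, so it receives the lowest weight of any term there and behaves like a stopword, although it was discriminative an hour earlier.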



How can we do this? - Open challenge

   ●   Tree indexing hard to distribute

   ●   Maybe with adaptive multi-resolution grids?
Spatial Context
Demand for spatial information:

      20% of all Google searches

      53% of Bing mobile searches

Heterogeneous spatial context sources

      GPS locations (most reliable)

      Origin bounding boxes (e.g. city)

      User profile text??? 3

      Author's friends' locations 4

3. Hecht et al. 2011: ''Tweets from Justin Bieber’s Heart: The Dynamics of the “Location” Field in User Profiles'', Proc. ACM CHI
4. Rout et al. 2013: ''Where's @wally? A Graph Based Method for Geolocating Users in Social Networks'', Proc. ACM Hypertext
Spatial Keyword Search
How can we query a set of social media messages?

   Treat as a set of objects, each having
      Text           
      Location       

   Query parameters:
     Query text
     Query location

Given query and set of messages, rank by similarity:

   Text similarity (Cosine, Siamese Learning Net, Oriented PCA)
   Separating distance (Haversine, Manhattan, Eco-routed)
   Blend these with a balancing coefficient


   (just like conventional spatial keyword search)
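The ranking described above can be sketched with two of the measures the slide names, cosine similarity for text and Haversine distance for location. This is an illustrative sketch only: the names `blended_score`, `alpha` and `max_km`, the bag-of-words representation, and the linear conversion of distance into a proximity score are assumptions, not details from the talk.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words term-count vectors."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def blended_score(query_text, query_loc, msg_text, msg_loc, alpha=0.5, max_km=10.0):
    """alpha weights spatial proximity; (1 - alpha) weights text similarity."""
    dist = haversine_km(*query_loc, *msg_loc)
    # Assumed mapping: proximity is 1 at the query point and 0 at or beyond max_km.
    proximity = max(0.0, 1.0 - dist / max_km)
    return alpha * proximity + (1 - alpha) * cosine_similarity(query_text, msg_text)
```

Setting `alpha` close to 1 favours nearby messages regardless of content; setting it close to 0 reduces the query to plain text retrieval.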
Spatial Keyword Search
Query:
  ''good bar in north copenhagen''

Issued from location 

Five candidate messages

Query region established

[Figure: query location and candidate messages A to E]

Rank by blend of location and textual similarity

           Message                                          Location  Text
       A   So drunk last night at @BarSyv                   0.7       0.6
       B   Out shoe shopping!!! #louboutintime              0.9       0.0
       C   Who pays $9 for a beer?!                         0.6       0.5
       D   wow found cph's greatest cocktail bar lol        0.1       1.0
       E   Traffic. Traffic everywhere. Need a drink.       0.4       0.2
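As a worked example, the five candidates can be ranked by a linear blend of the two scores in the table. The slide does not state the balancing coefficient, so the value `ALPHA = 0.3` (weight on location) and the linear form of the blend are assumptions chosen purely for illustration.

```python
# (location score, text score) per message, copied from the table above.
scores = {
    "A": (0.7, 0.6),
    "B": (0.9, 0.0),
    "C": (0.6, 0.5),
    "D": (0.1, 1.0),
    "E": (0.4, 0.2),
}

ALPHA = 0.3  # assumed weight on location; (1 - ALPHA) goes to text

def blend(loc, text, alpha=ALPHA):
    """Linear blend of location and text similarity."""
    return alpha * loc + (1 - alpha) * text

# Sort message labels from highest to lowest blended score.
ranking = sorted(scores, key=lambda m: blend(*scores[m]), reverse=True)
print(ranking)
```

With this weighting the order is D, A, C, B, E. Raising `ALPHA` towards 1 would promote B, the closest message, even though its text is irrelevant to the query.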
Continuous Spatial Queries
Social media scenario characterised by:

   Streaming data

   New spatial objects constantly appearing

Two new spatial keyword query types:

   Static Continuous (SCSKQ)
      - Fixed query location
      - Tracks newly appearing objects

   Moving Continuous (MCSKQ)
     - Query location moves along a path
     - Result updated with new objects

Novel part: fresh objects continuously introduced
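A static continuous query can be pictured as a standing top-k result that is revised each time a message arrives. The sketch below is hypothetical: the class name, the planar coordinates, the keyword-overlap text score, and the parameters `k`, `alpha` and `max_dist` are assumptions made to keep the example short, and it assumes a non-empty keyword set.

```python
import heapq
import math

class StaticContinuousQuery:
    """Keeps the top-k messages for a fixed query location as new messages stream in."""

    def __init__(self, keywords, location, k=2, alpha=0.5, max_dist=10.0):
        self.keywords = set(keywords)   # assumed non-empty
        self.location = location        # (x, y) in planar units, e.g. km
        self.k = k
        self.alpha = alpha              # weight on spatial proximity
        self.max_dist = max_dist        # proximity falls to 0 at this distance
        self._heap = []                 # min-heap of (score, arrival_no, text)
        self._arrivals = 0

    def _score(self, text, location):
        words = set(text.lower().split())
        text_sim = len(words & self.keywords) / len(self.keywords)
        proximity = max(0.0, 1.0 - math.dist(self.location, location) / self.max_dist)
        return self.alpha * proximity + (1 - self.alpha) * text_sim

    def offer(self, text, location):
        """Process one newly arrived message; returns the current top-k, best first."""
        self._arrivals += 1
        entry = (self._score(text, location), self._arrivals, text)
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, entry)
        elif entry > self._heap[0]:
            # The new message beats the weakest current result, so swap it in.
            heapq.heapreplace(self._heap, entry)
        return [t for _, _, t in sorted(self._heap, reverse=True)]

q = StaticContinuousQuery({"bar", "cocktail"}, (0.0, 0.0), k=2)
q.offer("shoe shopping", (1.0, 0.0))
q.offer("great cocktail bar", (2.0, 0.0))
top = q.offer("need a bar", (0.0, 1.0))
```

A moving continuous query would also have to rescore retained messages whenever the query location changes, which this sketch does not attempt.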
Location Diversity
Location data unreliable

Reliability of location data... is also unreliable

''There are known knowns... we also know there are known unknowns...
            but there are also unknown unknowns'' – Donald Rumsfeld

Text mentions require disambiguation


   ●   In profile
   ●   In messages
   ●   In queries




The requirement is to rank vague points given a vague query
Willingness to travel
Determines useful search radius

Based on mode of transport:
   14.9 km; 22.0 km; 40.6 km; 61.5 km; >100 km

Different for varying classes of Point Of Interest?


Spatio-temporal (ST) social media = huge dataset

   Easy data collection

   Useful for e.g. town planning
Spatio-temporal Challenges
We've seen temporal and spatial challenges; let's combine!

Given all these spatio-temporal utterances, what can we do?

   - Spatial gives relevance from physical or travel proximity

   - Temporal gives relevance from recency and history
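To make the two relevance signals concrete, here is a minimal sketch that multiplies a distance-based weight by a recency weight. It covers only the recency side of the temporal dimension, and every specific in it is an assumption for illustration: the exponential half-life, the linear spatial fall-off, the default parameter values, and the function names.

```python
def recency_weight(age_hours, half_life_hours=6.0):
    """Exponential decay: the weight halves every half_life_hours."""
    return 0.5 ** (age_hours / half_life_hours)

def proximity_weight(distance_km, radius_km=15.0):
    """Linear fall-off from 1 at the query point to 0 at radius_km."""
    return max(0.0, 1.0 - distance_km / radius_km)

def st_relevance(distance_km, age_hours):
    """Spatio-temporal relevance as the product of the two weights."""
    return proximity_weight(distance_km) * recency_weight(age_hours)
```

Under these defaults a message posted 6 hours ago from 7.5 km away scores 0.25, while anything at or beyond 15 km scores 0 however recent it is. A product is only one possible combination; a weighted sum would let a very recent message compensate for distance.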



Adding text to the spatio-temporal points gives


             explicit semantic context


Not only are ST patterns in the data, we are told what they mean!
Topic-based Retrieval
Retrieving results on a topic is useful; ''Tell me about X''

Specific terms vary between places and over time



   Over time (2007 vs. 2011): en.wikipedia.org/wiki/President_of_the_United_States

   Between places (England English vs. US English): ''Jelly''




    … Spatio-temporally sensitive indexing?
Sentiment Monitoring
Measure how attitudes change over time and over location

Business uses:      where to send marketing

Political uses:     data-driven democratic... campaigning

Governance uses: what are citizen priorities in a region

Temporal dimension enables tracking of trends and reactions



   [Figure legend: red = upbeat; blue = complaint; no normalisation for vocality!]
Local Computational Journalism
Social media is quick

Social media is uncurated

''Citizen Journalism''


News has relevance scope:
  Recency
  Proximity


Different events relevant in different contexts:
    Rain in London
    Rain in Addis Ababa

Automatic event detection 5 - and also reporting!
5. Ritter et al. 2012: ''Open domain event extraction from Twitter'', Proc. ACM SIGKDD
Summary

Social media is a rich source of ''big data''

A small sampling of all human discourse

It comes with temporal and spatial context


Context-aware search and analysis is very demanding!

   - Novel, powerful applications

   - Wide variety of domains

   - An open set of challenges
Thank you!


Thank you for listening!

   Do you have any questions?

Weitere ähnliche Inhalte

Andere mochten auch

Introduction to Social Media in Asia
Introduction to Social Media in AsiaIntroduction to Social Media in Asia
Introduction to Social Media in AsiaGaurav Mishra
 
Surrounded By Genius: Practical Advice On Creative Leadership
Surrounded By Genius: Practical Advice On Creative LeadershipSurrounded By Genius: Practical Advice On Creative Leadership
Surrounded By Genius: Practical Advice On Creative LeadershipKelsey Ruger
 
Media Research - Research Hypothesis
Media Research- Research HypothesisMedia Research- Research Hypothesis
Media Research - Research HypothesisTrinity Dwarka
 
The Conversation - An Introduction to Social Media
The Conversation - An Introduction to Social MediaThe Conversation - An Introduction to Social Media
The Conversation - An Introduction to Social MediaTactica Interactive
 
Social Media Measurement
Social Media MeasurementSocial Media Measurement
Social Media MeasurementKelsey Ruger
 
Introduction to Social Media
Introduction to Social MediaIntroduction to Social Media
Introduction to Social MediaKelsey Ruger
 

Andere mochten auch (6)

Introduction to Social Media in Asia
Introduction to Social Media in AsiaIntroduction to Social Media in Asia
Introduction to Social Media in Asia
 
Surrounded By Genius: Practical Advice On Creative Leadership
Surrounded By Genius: Practical Advice On Creative LeadershipSurrounded By Genius: Practical Advice On Creative Leadership
Surrounded By Genius: Practical Advice On Creative Leadership
 
Media Research - Research Hypothesis
Media Research- Research HypothesisMedia Research- Research Hypothesis
Media Research - Research Hypothesis
 
The Conversation - An Introduction to Social Media
The Conversation - An Introduction to Social MediaThe Conversation - An Introduction to Social Media
The Conversation - An Introduction to Social Media
 
Social Media Measurement
Social Media MeasurementSocial Media Measurement
Social Media Measurement
 
Introduction to Social Media
Introduction to Social MediaIntroduction to Social Media
Introduction to Social Media
 

Ähnlich wie Towards Context-Aware Search and Analysis on Social Media Data

Phd Colloquium Spatial Analysis
Phd Colloquium Spatial AnalysisPhd Colloquium Spatial Analysis
Phd Colloquium Spatial Analysisalistairleak
 
Rogers digitalmethodsaftersocialmedia nov2013_optimized_
Rogers digitalmethodsaftersocialmedia nov2013_optimized_Rogers digitalmethodsaftersocialmedia nov2013_optimized_
Rogers digitalmethodsaftersocialmedia nov2013_optimized_Digital Methods Initiative
 
From Research to Applications: What Can We Extract with Social Media Sensing?
From Research to Applications: What Can We Extract with Social Media Sensing?From Research to Applications: What Can We Extract with Social Media Sensing?
From Research to Applications: What Can We Extract with Social Media Sensing?Yiannis Kompatsiaris
 
Augmenting offical datasets with volunteered geographic information a case ...
Augmenting offical datasets with volunteered geographic information   a case ...Augmenting offical datasets with volunteered geographic information   a case ...
Augmenting offical datasets with volunteered geographic information a case ...Institute for Transport Studies (ITS)
 
Geographic Information Management Transformation
Geographic Information Management TransformationGeographic Information Management Transformation
Geographic Information Management TransformationPat Kenny
 
Real World Internet, Smart Cities and Linked Data: Mirko Presser (Alexandrea ...
Real World Internet, Smart Cities and Linked Data: Mirko Presser (Alexandrea ...Real World Internet, Smart Cities and Linked Data: Mirko Presser (Alexandrea ...
Real World Internet, Smart Cities and Linked Data: Mirko Presser (Alexandrea ...FIA2010
 
Geo-Humanities 2017 Keynote at SIGSPATIAL 2017
Geo-Humanities 2017 Keynote at SIGSPATIAL 2017Geo-Humanities 2017 Keynote at SIGSPATIAL 2017
Geo-Humanities 2017 Keynote at SIGSPATIAL 2017kjanowicz
 
Open Grid Forum workshop on Social Networks, Semantic Grids and Web
Open Grid Forum workshop on Social Networks, Semantic Grids and WebOpen Grid Forum workshop on Social Networks, Semantic Grids and Web
Open Grid Forum workshop on Social Networks, Semantic Grids and WebNoshir Contractor
 
APLIC 2014 - Social Observatories Coordinating Network
APLIC 2014 - Social Observatories Coordinating NetworkAPLIC 2014 - Social Observatories Coordinating Network
APLIC 2014 - Social Observatories Coordinating NetworkAPLICwebmaster
 
Big Data in the Arts and Humanities: Stirling presentation
Big Data in the Arts and Humanities: Stirling presentationBig Data in the Arts and Humanities: Stirling presentation
Big Data in the Arts and Humanities: Stirling presentationAndrew Prescott
 
Words and More Words: Challenges of Big Data by Prof. Edie Rasmussen
Words and More Words: Challenges of Big Data by Prof. Edie RasmussenWords and More Words: Challenges of Big Data by Prof. Edie Rasmussen
Words and More Words: Challenges of Big Data by Prof. Edie Rasmussenwkwsci-research
 
How to utilize ‘big data’ on SNS for academic purpose?
How to utilize ‘big data’ on SNS  for academic purpose?How to utilize ‘big data’ on SNS  for academic purpose?
How to utilize ‘big data’ on SNS for academic purpose?Han Woo PARK
 
Citizen Sensing: Opportunities and Challenges in Mining Social Signals and Pe...
Citizen Sensing: Opportunities and Challenges in Mining Social Signals and Pe...Citizen Sensing: Opportunities and Challenges in Mining Social Signals and Pe...
Citizen Sensing: Opportunities and Challenges in Mining Social Signals and Pe...Artificial Intelligence Institute at UofSC
 
Digital Humanities and “Digital” Social Sciences
Digital Humanities and “Digital” Social SciencesDigital Humanities and “Digital” Social Sciences
Digital Humanities and “Digital” Social SciencesChantal van Son
 

Ähnlich wie Towards Context-Aware Search and Analysis on Social Media Data (20)

Phd Colloquium Spatial Analysis
Phd Colloquium Spatial AnalysisPhd Colloquium Spatial Analysis
Phd Colloquium Spatial Analysis
 
Rogers digitalmethodsaftersocialmedia nov2013_optimized_
Rogers digitalmethodsaftersocialmedia nov2013_optimized_Rogers digitalmethodsaftersocialmedia nov2013_optimized_
Rogers digitalmethodsaftersocialmedia nov2013_optimized_
 
From Research to Applications: What Can We Extract with Social Media Sensing?
From Research to Applications: What Can We Extract with Social Media Sensing?From Research to Applications: What Can We Extract with Social Media Sensing?
From Research to Applications: What Can We Extract with Social Media Sensing?
 
ICAME 2010
ICAME 2010ICAME 2010
ICAME 2010
 
Augmenting offical datasets with volunteered geographic information a case ...
Augmenting offical datasets with volunteered geographic information   a case ...Augmenting offical datasets with volunteered geographic information   a case ...
Augmenting offical datasets with volunteered geographic information a case ...
 
Geographic Information Management Transformation
Geographic Information Management TransformationGeographic Information Management Transformation
Geographic Information Management Transformation
 
ICCM 2014 -- Ignite Talks -- Session 2
ICCM 2014 -- Ignite Talks -- Session 2ICCM 2014 -- Ignite Talks -- Session 2
ICCM 2014 -- Ignite Talks -- Session 2
 
Real World Internet, Smart Cities and Linked Data: Mirko Presser (Alexandrea ...
Real World Internet, Smart Cities and Linked Data: Mirko Presser (Alexandrea ...Real World Internet, Smart Cities and Linked Data: Mirko Presser (Alexandrea ...
Real World Internet, Smart Cities and Linked Data: Mirko Presser (Alexandrea ...
 
Geo-Humanities 2017 Keynote at SIGSPATIAL 2017
Geo-Humanities 2017 Keynote at SIGSPATIAL 2017Geo-Humanities 2017 Keynote at SIGSPATIAL 2017
Geo-Humanities 2017 Keynote at SIGSPATIAL 2017
 
Open Grid Forum workshop on Social Networks, Semantic Grids and Web
Open Grid Forum workshop on Social Networks, Semantic Grids and WebOpen Grid Forum workshop on Social Networks, Semantic Grids and Web
Open Grid Forum workshop on Social Networks, Semantic Grids and Web
 
APLIC 2014 - Social Observatories Coordinating Network
APLIC 2014 - Social Observatories Coordinating NetworkAPLIC 2014 - Social Observatories Coordinating Network
APLIC 2014 - Social Observatories Coordinating Network
 
Big Data in the Arts and Humanities: Stirling presentation
Big Data in the Arts and Humanities: Stirling presentationBig Data in the Arts and Humanities: Stirling presentation
Big Data in the Arts and Humanities: Stirling presentation
 
Words and More Words: Challenges of Big Data by Prof. Edie Rasmussen
Words and More Words: Challenges of Big Data by Prof. Edie RasmussenWords and More Words: Challenges of Big Data by Prof. Edie Rasmussen
Words and More Words: Challenges of Big Data by Prof. Edie Rasmussen
 
Big Data Challenges and Trust Management at CTS -2016
Big Data Challenges and Trust Management at CTS -2016Big Data Challenges and Trust Management at CTS -2016
Big Data Challenges and Trust Management at CTS -2016
 
Our World is Socio-technical
Our World is Socio-technicalOur World is Socio-technical
Our World is Socio-technical
 
How to utilize ‘big data’ on SNS for academic purpose?
How to utilize ‘big data’ on SNS  for academic purpose?How to utilize ‘big data’ on SNS  for academic purpose?
How to utilize ‘big data’ on SNS for academic purpose?
 
Citizen Sensing: Opportunities and Challenges in Mining Social Signals and Pe...
Citizen Sensing: Opportunities and Challenges in Mining Social Signals and Pe...Citizen Sensing: Opportunities and Challenges in Mining Social Signals and Pe...
Citizen Sensing: Opportunities and Challenges in Mining Social Signals and Pe...
 
History of hci
History of hciHistory of hci
History of hci
 
Digital Humanities and “Digital” Social Sciences
Digital Humanities and “Digital” Social SciencesDigital Humanities and “Digital” Social Sciences
Digital Humanities and “Digital” Social Sciences
 
Digital Methods by Richard Rogers
Digital Methods by Richard RogersDigital Methods by Richard Rogers
Digital Methods by Richard Rogers
 

Mehr von Leon Derczynski

Joint Rumour Stance and Veracity
Joint Rumour Stance and VeracityJoint Rumour Stance and Veracity
Joint Rumour Stance and VeracityLeon Derczynski
 
State of Tools for NLP in Danish: 2018
State of Tools for NLP in Danish: 2018State of Tools for NLP in Danish: 2018
State of Tools for NLP in Danish: 2018Leon Derczynski
 
Broad Twitter Corpus: A Diverse Named Entity Recognition Resource
Broad Twitter Corpus: A Diverse Named Entity Recognition ResourceBroad Twitter Corpus: A Diverse Named Entity Recognition Resource
Broad Twitter Corpus: A Diverse Named Entity Recognition ResourceLeon Derczynski
 
Handling and Mining Linguistic Variation in UGC
Handling and Mining Linguistic Variation in UGCHandling and Mining Linguistic Variation in UGC
Handling and Mining Linguistic Variation in UGCLeon Derczynski
 
Efficient named entity annotation through pre-empting
Efficient named entity annotation through pre-emptingEfficient named entity annotation through pre-empting
Efficient named entity annotation through pre-emptingLeon Derczynski
 
Leveraging the Power of Social Media
Leveraging the Power of Social MediaLeveraging the Power of Social Media
Leveraging the Power of Social MediaLeon Derczynski
 
Corpus Annotation through Crowdsourcing: Towards Best Practice Guidelines
Corpus Annotation through Crowdsourcing: Towards Best Practice GuidelinesCorpus Annotation through Crowdsourcing: Towards Best Practice Guidelines
Corpus Annotation through Crowdsourcing: Towards Best Practice GuidelinesLeon Derczynski
 
Passive-Aggressive Sequence Labeling with Discriminative Post-Editing for Rec...
Passive-Aggressive Sequence Labeling with Discriminative Post-Editing for Rec...Passive-Aggressive Sequence Labeling with Discriminative Post-Editing for Rec...
Passive-Aggressive Sequence Labeling with Discriminative Post-Editing for Rec...Leon Derczynski
 
Starting to Process Social Media
Starting to Process Social MediaStarting to Process Social Media
Starting to Process Social MediaLeon Derczynski
 
Christmas Presentation at Aarhus: What I do
Christmas Presentation at Aarhus: What I doChristmas Presentation at Aarhus: What I do
Christmas Presentation at Aarhus: What I doLeon Derczynski
 
Recognising and Interpreting Named Temporal Expressions
Recognising and Interpreting Named Temporal ExpressionsRecognising and Interpreting Named Temporal Expressions
Recognising and Interpreting Named Temporal ExpressionsLeon Derczynski
 
TwitIE: An Open-Source Information Extraction Pipeline for Microblog Text
TwitIE: An Open-Source Information Extraction Pipeline for Microblog TextTwitIE: An Open-Source Information Extraction Pipeline for Microblog Text
TwitIE: An Open-Source Information Extraction Pipeline for Microblog TextLeon Derczynski
 
Twitter Part-of-Speech Tagging for All: Overcoming Sparse and Noisy Data
 Twitter Part-of-Speech Tagging for All:  Overcoming Sparse and Noisy Data Twitter Part-of-Speech Tagging for All:  Overcoming Sparse and Noisy Data
Twitter Part-of-Speech Tagging for All: Overcoming Sparse and Noisy DataLeon Derczynski
 
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...Leon Derczynski
 
Determining the Types of Temporal Relations in Discourse
Determining the Types of Temporal Relations in DiscourseDetermining the Types of Temporal Relations in Discourse
Determining the Types of Temporal Relations in DiscourseLeon Derczynski
 
Microblog-genre noise and its impact on semantic annotation accuracy
Microblog-genre noise and its impact on semantic annotation accuracyMicroblog-genre noise and its impact on semantic annotation accuracy
Microblog-genre noise and its impact on semantic annotation accuracyLeon Derczynski
 
Empirical Validation of Reichenbach’s Tense Framework
Empirical Validation of Reichenbach’s Tense FrameworkEmpirical Validation of Reichenbach’s Tense Framework
Empirical Validation of Reichenbach’s Tense FrameworkLeon Derczynski
 
Determining the Types of Temporal Relations in Discourse
Determining the Types of Temporal Relations in DiscourseDetermining the Types of Temporal Relations in Discourse
Determining the Types of Temporal Relations in DiscourseLeon Derczynski
 
TIMEN: An Open Temporal Expression Normalisation Resource
TIMEN: An Open Temporal Expression Normalisation ResourceTIMEN: An Open Temporal Expression Normalisation Resource
TIMEN: An Open Temporal Expression Normalisation ResourceLeon Derczynski
 

Mehr von Leon Derczynski (20)

Joint Rumour Stance and Veracity
Joint Rumour Stance and VeracityJoint Rumour Stance and Veracity
Joint Rumour Stance and Veracity
 
State of Tools for NLP in Danish: 2018
State of Tools for NLP in Danish: 2018State of Tools for NLP in Danish: 2018
State of Tools for NLP in Danish: 2018
 
RumourEval
RumourEvalRumourEval
RumourEval
 
Broad Twitter Corpus: A Diverse Named Entity Recognition Resource
Broad Twitter Corpus: A Diverse Named Entity Recognition ResourceBroad Twitter Corpus: A Diverse Named Entity Recognition Resource
Broad Twitter Corpus: A Diverse Named Entity Recognition Resource
 
Handling and Mining Linguistic Variation in UGC
Handling and Mining Linguistic Variation in UGCHandling and Mining Linguistic Variation in UGC
Handling and Mining Linguistic Variation in UGC
 
Efficient named entity annotation through pre-empting
Efficient named entity annotation through pre-emptingEfficient named entity annotation through pre-empting
Efficient named entity annotation through pre-empting
 
Leveraging the Power of Social Media
Leveraging the Power of Social MediaLeveraging the Power of Social Media
Leveraging the Power of Social Media
 
Corpus Annotation through Crowdsourcing: Towards Best Practice Guidelines
Corpus Annotation through Crowdsourcing: Towards Best Practice GuidelinesCorpus Annotation through Crowdsourcing: Towards Best Practice Guidelines
Corpus Annotation through Crowdsourcing: Towards Best Practice Guidelines
 
Passive-Aggressive Sequence Labeling with Discriminative Post-Editing for Rec...
Passive-Aggressive Sequence Labeling with Discriminative Post-Editing for Rec...Passive-Aggressive Sequence Labeling with Discriminative Post-Editing for Rec...
Passive-Aggressive Sequence Labeling with Discriminative Post-Editing for Rec...
 
Starting to Process Social Media
Starting to Process Social MediaStarting to Process Social Media
Starting to Process Social Media
 
Christmas Presentation at Aarhus: What I do
Christmas Presentation at Aarhus: What I doChristmas Presentation at Aarhus: What I do
Christmas Presentation at Aarhus: What I do
 
Recognising and Interpreting Named Temporal Expressions
Recognising and Interpreting Named Temporal ExpressionsRecognising and Interpreting Named Temporal Expressions
Recognising and Interpreting Named Temporal Expressions
 
TwitIE: An Open-Source Information Extraction Pipeline for Microblog Text
TwitIE: An Open-Source Information Extraction Pipeline for Microblog TextTwitIE: An Open-Source Information Extraction Pipeline for Microblog Text
TwitIE: An Open-Source Information Extraction Pipeline for Microblog Text
 
Twitter Part-of-Speech Tagging for All: Overcoming Sparse and Noisy Data
 Twitter Part-of-Speech Tagging for All:  Overcoming Sparse and Noisy Data Twitter Part-of-Speech Tagging for All:  Overcoming Sparse and Noisy Data
Twitter Part-of-Speech Tagging for All: Overcoming Sparse and Noisy Data
 
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
 
Determining the Types of Temporal Relations in Discourse
Determining the Types of Temporal Relations in DiscourseDetermining the Types of Temporal Relations in Discourse
Determining the Types of Temporal Relations in Discourse
 
Microblog-genre noise and its impact on semantic annotation accuracy
Microblog-genre noise and its impact on semantic annotation accuracyMicroblog-genre noise and its impact on semantic annotation accuracy
Microblog-genre noise and its impact on semantic annotation accuracy
 
Empirical Validation of Reichenbach’s Tense Framework
Empirical Validation of Reichenbach’s Tense FrameworkEmpirical Validation of Reichenbach’s Tense Framework
Empirical Validation of Reichenbach’s Tense Framework
 
Determining the Types of Temporal Relations in Discourse
Determining the Types of Temporal Relations in DiscourseDetermining the Types of Temporal Relations in Discourse
Determining the Types of Temporal Relations in Discourse
 
TIMEN: An Open Temporal Expression Normalisation Resource
TIMEN: An Open Temporal Expression Normalisation ResourceTIMEN: An Open Temporal Expression Normalisation Resource
TIMEN: An Open Temporal Expression Normalisation Resource
 

Kürzlich hochgeladen

call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfSpandanaRallapalli
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxCarlos105
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptxSherlyMaeNeri
 
Q4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptxQ4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptxnelietumpap1
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxDr.Ibrahim Hassaan
 
Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxMaryGraceBautista27
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfTechSoup
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4MiaBumagat1
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...Postal Advocate Inc.
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomnelietumpap1
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Celine George
 

Towards Context-Aware Search and Analysis on Social Media Data

  • 1. Towards Context-Aware Search and Analysis on Social Media Data Leon Derczynski Bin Yang 杨彬 Christian S. Jensen
  • 2. Evolution of communication Functional utterances; Vowels; Velar closure: consonants; Speech; New modality: writing; Digital text; E-mail; Social media; ? Side label: increased machine-readable information
  • 3. Social Media = Big Data Gartner ''3V'' definition: 1. Volume 2. Velocity 3. Variety High volume & velocity of messages: Twitter has ~20 000 000 users per month; they write ~500 000 000 messages per day Massive variety: stock markets; earthquakes; social arrangements; … Bieber
  • 4. What is machine-readable now? Messages now contain - not only linguistic content - but also: Links (e.g. URI) Topic markers (e.g. hashtags) Meta-information What kind of meta-information? User profile (including home location) Images Messages replied to Message language Time of message Location of message
  • 5. What resources do we have now? Large, content-rich, linked, digital streams of human communication We transfer knowledge via communication Sampling communication gives a sample of human knowledge ''You've only done that which you can communicate'' The metadata (time – place – imagery) gives a richer resource: → A sampling of human behaviour
  • 6. What can we do with this resource? Context increases the data's richness Increased richness enables novel applications Time and Place are interesting parts of message context 1.What kinds of applications are there? 2.What are the practical challenges?
  • 7. Temporal Context Messages have timestamps: + Two temporal retrieval scenarios: 1. Historical analyses 2. Emerging data
  • 8. Historical search Ability to retrieve from archives: Longitudinal query mode 0 Retrieve information on: ● Lifecycle of socially connected groups ● Precursors to events, analysed post-hoc (Timeline figure: 2008 to 2011) 0. Weikum et al. 2011: ''Longitudinal analytics on web archive data: It’s about time'', Proc. CIDR
  • 9. Historical search Retrospective analyses into cause and effect ''There's a dead crow in my garden'' Social media mentions of dead crows predict WNV in humans 1 1. Sugumaran & Voss 2012: ''Real-time spatio-temporal analysis of West Nile Virus using Twitter Data'', Proc. Int'l conference on Computing for Geospatial Research and Applications
  • 10. Emerging search Data emerging at high velocity: 185 000 documents per minute Gives a high temporal density Search over this info enables: ● Live coverage of events ● Realtime identification of emerging events 2 2. Cohen et al. 2011: ''Computational journalism: A call to arms to database researchers'', Proc. CIDR
  • 11. Temporal indexing What are our requirements? ● High-frequency document creation ● Temporal cross-sections of varying size ● Time-sensitive TF/IDF: stopwords are fluid How can we do this? - Open challenge ● Tree indexing hard to distribute ● Maybe with adaptive multi-resolution grids?
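One way to read the "time-sensitive TF/IDF" requirement on slide 11 is that document frequencies should be computed over a temporal cross-section rather than the whole archive, so a term that is a stopword in one window can be informative in another. The following is only a minimal sketch under that assumption; the function name `windowed_idf`, the integer timestamps, and the smoothed IDF formula are illustrative choices, not something the slides specify, and it does not address the distribution problem the slide raises.

```python
import math
from collections import Counter

def windowed_idf(docs, start, end):
    """Smoothed IDF over documents whose timestamp t satisfies start <= t < end.

    docs is an iterable of (timestamp, list_of_tokens) pairs (hypothetical layout).
    """
    # Keep only documents inside the requested temporal cross-section.
    window = [set(tokens) for ts, tokens in docs if start <= ts < end]
    n = len(window)
    # Document frequency: in how many windowed documents does each term occur?
    df = Counter(term for tokens in window for term in tokens)
    # Smoothed IDF; terms present in every windowed document score 1.0.
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}

# Toy stream: "rain" is ubiquitous early on, "earthquake" is ubiquitous later.
docs = [
    (1, ["rain", "london"]),
    (2, ["rain", "traffic"]),
    (3, ["rain", "again"]),
    (11, ["earthquake", "now"]),
    (12, ["earthquake", "rain"]),
    (13, ["earthquake", "felt"]),
]
early = windowed_idf(docs, 0, 10)
late = windowed_idf(docs, 10, 20)
print(early["rain"], late["rain"], late["earthquake"])
```

In the toy stream, "rain" carries no discriminating weight in the early window but regains weight in the later one, which is the sense in which stopwords are "fluid".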
  • 12. Spatial Context Demand for spatial information: 20% of all Google searches 53% of Bing mobile searches Heterogeneous spatial context sources GPS locations (most reliable) Origin bounding boxes (e.g. city) User profile text??? 3 Author's friends' locations 4 3. Hecht et al. 2011: ''Tweets from Justin Bieber’s Heart: The Dynamics of the “Location” Field in User Profiles'', Proc. ACM CHI ; 4. Rout et al. 2013: ''Where's @wally? A Graph Based Method for Geolocating Users in Social Networks'', Proc. ACM Hypertext
  • 13. Spatial Keyword Search How can we query a set of social media messages? Treat as a set of objects, each having: Text; Location. Query parameters: Query text; Query location. Given query and set of messages, rank by similarity: Text similarity (Cosine, Siamese Learning Net, Oriented PCA); Separating distance (Haversine, Manhattan, Eco-routed). Blend these with a balancing coefficient (just like conventional spatial keyword search)
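The ranking described on slide 13 can be sketched with the two simplest options it lists: cosine similarity for text and haversine distance for location. This is a rough illustration, not the authors' implementation. The names `alpha` (the balancing coefficient) and `max_km` (a cut-off that turns distance into a 0 to 1 proximity score), the bag-of-words tokenisation, and the linear blend are all assumptions made here.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine similarity between bag-of-words vectors of two strings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def score(query_text, query_loc, msg_text, msg_loc, alpha=0.5, max_km=10.0):
    """Blend spatial proximity and text similarity; alpha weights proximity."""
    # Assumed normalisation: proximity falls linearly to 0 at max_km.
    proximity = max(0.0, 1.0 - haversine_km(query_loc, msg_loc) / max_km)
    return alpha * proximity + (1 - alpha) * cosine_similarity(query_text, msg_text)

# Hypothetical coordinates near Copenhagen, for illustration only.
print(score("good bar", (55.70, 12.57), "found a good cocktail bar", (55.71, 12.58)))
```

Setting `alpha` close to 1 favours nearby messages regardless of content, while values close to 0 favour textual matches regardless of distance.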
  • 14. Spatial Keyword Search Query: ''good bar in north copenhagen'' Issued from a query location Five candidate messages (A to E, shown on a map) Query region established Rank by blend of location and textual similarity
      Message                                        location  text
      A  So drunk last night at @BarSyv              0.7       0.6
      B  Out shoe shopping!!! #louboutintime         0.9       0.0
      C  Who pays $9 for a beer?!                    0.6       0.5
      D  wow found cph's greatest cocktail bar lol   0.1       1.0
      E  Traffic. Traffic everywhere. Need a drink.  0.4       0.2
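The slide-14 example gives a location score and a text score for each of the five messages but no value for the balancing coefficient. As a worked illustration only, the sketch below assumes a linear blend and tries two hypothetical weights (`alpha` is the weight on location) to show how the ranking shifts.

```python
# (location score, text score) per message, taken from the slide-14 table.
messages = {
    "A": (0.7, 0.6),
    "B": (0.9, 0.0),
    "C": (0.6, 0.5),
    "D": (0.1, 1.0),
    "E": (0.4, 0.2),
}

def rank(messages, alpha):
    """Order message ids by alpha * location + (1 - alpha) * text, best first."""
    def blended(msg_id):
        loc, text = messages[msg_id]
        return alpha * loc + (1 - alpha) * text
    return sorted(messages, key=blended, reverse=True)

print(rank(messages, 0.3))  # text-heavy blend
print(rank(messages, 0.7))  # location-heavy blend
```

With the text-heavy weight, message D (the distant but perfectly matching cocktail bar) ranks first; with the location-heavy weight, the nearby but off-topic message B rises to second place behind A.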
  • 15. Continuous Spatial Queries Social media scenario characterised by: Streaming data New spatial objects constantly appearing Two new spatial keyword query types: Static Continuous (SCSKQ) - Fixed query location - Tracks newly appearing objects Moving Continuous (MCSKQ) - Query location transits locus - Result updated with new objects Novel part: fresh objects continuously introduced
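A static continuous query as outlined on slide 15 keeps a fixed query location and updates its result as new objects arrive. The sketch below is one possible reading, not the method from the talk: the class name `StaticContinuousQuery`, the keyword-overlap score, the hard radius filter, and the top-k heap are all assumptions chosen to keep the example small.

```python
import heapq
import math

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

class StaticContinuousQuery:
    """Fixed-location keyword query whose top-k result tracks a message stream."""

    def __init__(self, keywords, location, radius_km, k=2):
        self.keywords = set(keywords)  # must be non-empty
        self.location = location
        self.radius_km = radius_km
        self.k = k
        self._heap = []  # min-heap of (score, msg_id); weakest result on top

    def offer(self, msg_id, text, location):
        """Process one newly arrived message; return True if the result changed."""
        if haversine_km(self.location, location) > self.radius_km:
            return False  # outside the query region
        tokens = set(text.lower().split())
        score = len(self.keywords & tokens) / len(self.keywords)
        if score == 0:
            return False  # no keyword overlap
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, (score, msg_id))
            return True
        if score > self._heap[0][0]:
            heapq.heapreplace(self._heap, (score, msg_id))  # evict the weakest
            return True
        return False

    def results(self):
        """Current top-k message ids, best first."""
        return [msg_id for _, msg_id in sorted(self._heap, reverse=True)]

# Hypothetical stream; coordinates are illustrative only.
q = StaticContinuousQuery({"cocktail", "bar"}, (55.70, 12.57), radius_km=5, k=2)
changes = [
    q.offer("m1", "great cocktail bar here", (55.71, 12.58)),
    q.offer("m2", "nice bar", (55.69, 12.56)),
    q.offer("m3", "cocktail bar in aarhus", (56.16, 10.20)),
    q.offer("m4", "shoe shopping", (55.70, 12.57)),
]
print(changes, q.results())
```

A moving continuous query could reuse the same structure but would also need to re-evaluate retained results whenever the query location changes, which this sketch does not attempt.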
  • 16. Location Diversity Location data unreliable Reliability of location data... is also unreliable ''There are known knowns.. we also know there are known unknowns.. but there are also unknown unknowns'' – Donald Rumsfeld Text mentions require disambiguation ● In profile ● In messages ● In queries Requirement is to rank vague points given vague query
  • 17. Willingness to travel Determines useful search radius Based on mode of transport: 14.9km 22.0km 40.6km 61.5km >100km Different for varying classes of Point Of Interest? ST Social media = huge dataset Easy data collection Useful for e.g. town planning
  • 18. Spatio-temporal Challenges We've seen temporal and spatial challenges; let's combine! Given all these spatio-temporal utterances, what can we do? - Spatial gives relevance from physical or travel proximity - Temporal gives relevance from recency and historical Adding text to the spatio-temporal points gives explicit semantic context Not only are ST patterns in the data, we are told what they mean!
  • 19. Topic-based Retrieval Retrieving results on a topic is useful; ''Tell me about X'' Specific terms vary between places and over time 2007 England English en.wikipedia.org/wiki/President_of_the_United_States ''Jelly'' 2011 US English … Spatio-temporally sensitive indexing?
  • 20. Sentiment Monitoring Measure how attitudes change over time and over location Business uses: where to send marketing Political uses: data-driven democratic.. campaigning Governance uses: what are citizen priorities in a region Temporal dimension enables tracking of trends and reactions red = upbeat; blue = complaint. - no normalisation for vocality!
  • 21. Local Computational Journalism Social media is quick Social media is uncurated ''Citizen Journalism'' News has relevance scope: Recency Proximity Different events relevant in different contexts: Rain in London Rain in Addis Ababa Automatic event detection 5 - and also reporting! 5. Ritter et al. 2012: ''Open domain event extraction from Twitter'', Proc. ACM SIGKDD
  • 22. Summary Social media is a rich source of ''big data'' A small sampling of all human discourse It comes with temporal and spatial context Context-aware search and analysis is very demanding! - Novel, powerful applications - Wide variety of domains - An open set of challenges
  • 23. Thank you! Thank you for listening! Do you have any questions?