Identifying Content for
Planned Events Across
Social Media Sites
Hila Becker, Dan Iter, Mor Naaman, Luis Gravano
Event Content in Social Media




MIKE CLARKE/AFP/Getty Images

Source: Tweets from Tahrir, edited by Nadia Idle and Alex Nunns
Event Content in Social Media
 Challenges:
  Wide variety of topics, not all related to events
   (e.g., personal status updates, everyday mundane
   conversations)
  Unconventional text: abbreviations, typos
  Large-scale, rapidly produced content

 Opportunities:
  Content generated in real-time, as events happen
  Rich context features (e.g., time, location)
  Users’ perspective


Event Content in Social Media


 Content discovery scenarios (diagram): Known, Unknown
 A planned event is a real-world occurrence with a corresponding published
  event record consisting of:
   Title, describing the subject of the event
   The time at which the event is planned to occur
Identifying Content for Planned
   Events
 Identify planned event documents given
 known event information
  User-contributed planned event records
    LastFM Events
    EventBrite
    Facebook Events
  Structured features (e.g., title, time, location)

 Challenging identification scenario
  Known event information is often inaccurate or
   incomplete
  Social media documents are brief and noisy

Planned Event Record

 Fields shown in the example record: Title, Description, Date/Time, Venue, City
Approach for Planned Event Content
   Identification
 Two-step query formulation strategy
  Precision-oriented queries using known event
   features
  Recall-oriented queries using retrieved content
   from precision-oriented queries

 Leverage cross-site content
  Identify event documents on each site
   individually
  Use event documents on one site to retrieve
   additional event documents on a different site

Query Formulation Strategies:
   Precision-oriented Queries
 Combined event record features
   Phrase, bag-of-words, stop word elimination
    Examples: [“title” + “venue”], [title-no-stopwords + “city”]

 Restricted document creation time
 Why is this hard?
   Specific titles: “Celebrate Brooklyn! Opening Night
    Gala & Concert with Andrew Bird”
   General titles: “Opening Night Concert”
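
A minimal sketch of how the two example query shapes above might be built from an event record. The stop-word list, the record layout, the venue and city values, the helper names, and the one-day creation-time window are all assumptions made for illustration; only the title comes from the slide.

```python
import re
from datetime import datetime, timedelta

# Assumption: a tiny illustrative stop-word list, not the one used in the talk.
STOP_WORDS = {"a", "an", "and", "at", "in", "of", "the", "with"}


def strip_stop_words(text):
    """Lowercase the text, drop punctuation, and remove stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(t for t in tokens if t not in STOP_WORDS)


def precision_queries(record):
    """Build the two example query shapes from an event record dict."""
    title, venue, city = record["title"], record["venue"], record["city"]
    return [
        f'"{title}" "{venue}"',                 # ["title" + "venue"]
        f'{strip_stop_words(title)} "{city}"',  # [title-no-stopwords + "city"]
    ]


def created_near_event(created, event_time, window_days=1):
    """Restrict documents to those created close to the event time.

    The window size is an assumption; the slide only says the creation
    time is restricted.
    """
    return abs(created - event_time) <= timedelta(days=window_days)


record = {
    "title": "Celebrate Brooklyn! Opening Night Gala & Concert with Andrew Bird",
    "venue": "Example Venue",  # hypothetical value
    "city": "Example City",    # hypothetical value
}
queries = precision_queries(record)
```

With a general title such as “Opening Night Concert”, the second query shape would keep only very common words, which illustrates why general titles are harder to match precisely.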


Query Formulation Strategies:
Precision-oriented Queries Demo




Query Formulation Strategies: Recall-oriented Queries
 Generated using “high-precision” results
 from precision-oriented queries
 Frequency Analysis
   Frequent terms in the event’s retrieved content
   Infrequent terms in Web documents
   Limited to 100 candidate queries

 Term Extraction
  Identify meaningful event-related concepts
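
The frequency analysis step can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: it works on single terms rather than multi-word queries, and the thresholds, the `background_df` mapping (share of Web documents containing each term), and the function names are assumptions. Only the cap of 100 candidates comes from the slide.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def candidate_terms(event_docs, background_df,
                    min_event_share=0.5, max_background_share=0.1, limit=100):
    """Return terms frequent in the event's documents but rare on the Web.

    background_df maps a term to the (assumed) share of Web documents
    that contain it; terms missing from the mapping count as share 0.
    """
    n_docs = len(event_docs)
    doc_freq = Counter()
    for doc in event_docs:
        doc_freq.update(set(tokenize(doc)))  # count each term once per document

    candidates = [
        (term, count)
        for term, count in doc_freq.items()
        if count / n_docs >= min_event_share
        and background_df.get(term, 0.0) <= max_background_share
    ]
    # Most frequent first; ties broken alphabetically for a stable order.
    candidates.sort(key=lambda tc: (-tc[1], tc[0]))
    return [term for term, _ in candidates[:limit]]


# Hypothetical documents retrieved by precision-oriented queries.
docs = [
    "Andrew Bird was amazing at the concert tonight",
    "andrew bird concert in the park",
    "the line for the concert is long",
]
# Hypothetical background statistics.
background = {"the": 0.9, "at": 0.8, "in": 0.85, "concert": 0.05}
terms = candidate_terms(docs, background)
```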



Query Selection Strategies
 Problem: potentially large set of generated
 queries
 Select top candidate queries
  Specificity: favor longer queries
  Temporal profile:
   Figure: temporal profiles of the queries [andrew bird concert] and
   [state farm insurance] from 6/7/11 through 6/13/11 (y-axis from 0 to 120)
Leveraging Cross-Site Content
 Build precision-oriented
  queries using planned event
  features
                                  …
 Use precision-oriented
  queries to retrieve data
  from:
   Twitter
   Flickr
   YouTube

 Build recall-oriented queries
  using data from:
   Each site individually
   All sites collectively
Experimental Settings
 60 planned events from
  EventBrite, LastFM, LinkedIn, and Facebook
 Corresponding social media documents
   Retrieved from Twitter, Flickr, and YouTube
   Ranked according to similarity to event record

 Techniques
   Precision: only precision-oriented queries
   MS: precision- and recall-oriented queries selected
    using Microsoft n-gram probability score
   TR/RTR: precision- and recall-oriented queries selected
    using ratio of document frequency around the time of
    the event to document frequency in larger time
    window
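
As a rough illustration of the ratio described for TR/RTR, the sketch below divides the number of matching documents created near the event by the number created in a larger window. The window sizes, the function name, the event date, and the sample dates are assumptions; the slides do not give the exact formula or explain how TR and RTR differ.

```python
from datetime import date


def temporal_ratio(doc_dates, event_date, near_days=1, wide_days=7):
    """Share of a query's documents that fall close to the event date.

    near_days and wide_days are assumed window sizes, not values from the talk.
    """
    near = sum(1 for d in doc_dates if abs((d - event_date).days) <= near_days)
    wide = sum(1 for d in doc_dates if abs((d - event_date).days) <= wide_days)
    return near / wide if wide else 0.0


event_date = date(2011, 6, 10)  # hypothetical event date

# A query whose documents cluster around the event (hypothetical data).
bursty = [date(2011, 6, 9), date(2011, 6, 10), date(2011, 6, 10),
          date(2011, 6, 10), date(2011, 6, 11), date(2011, 6, 13)]
# A query matched at a steady rate all week (hypothetical data).
flat = [date(2011, 6, day) for day in range(7, 14)]

bursty_score = temporal_ratio(bursty, event_date)
flat_score = temporal_ratio(flat, event_date)
```

A query with a peak around the event scores higher than one that is matched at a steady rate, which is the behavior a temporal selection strategy would favor.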

Evaluation
 How do our queries compare with human-generated
 queries for the event?
 How good are our queries?
 How good are the results retrieved by our
 queries?




How good are our queries?
Would the query match documents related
 to the event? 1 = not likely, 5 = certainly
 Figure: query ratings (y-axis, 1 to 5) for the strategies MS, TR, RTR,
 MS-TR, MS-RTR, and Precision; x-axis groups: Twitter, Flickr, YouTube,
 All, Precision
Can our queries retrieve relevant
    results?
 Rank retrieved results
   Based on similarity to event record
   Using multi-feature similarity metric (Becker et al.
    WSDM’10)

 Evaluate relevance of documents
   NDCG
   Averaged over all events that had some retrieved
    results

 Consider event coverage
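
For reference, a common formulation of NDCG at cutoff k is sketched below, together with an average that skips events with no retrieved results, as the slide describes. The gain and discount choices (linear gain, log2 discount), the function names, and the sample relevance lists are assumptions; the ideal ranking is computed from the retrieved list only, which is a simplification.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k relevance labels."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the best possible ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def mean_ndcg(per_event_relevances, k):
    """Average NDCG over events that had at least one retrieved result."""
    scores = [ndcg_at_k(rels, k) for rels in per_event_relevances if rels]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical relevance labels (1 = relevant, 0 = not) in ranked order,
# one list per event; the empty list is an event with no results.
events = [[1, 0], [], [0, 1]]
average = mean_ndcg(events, k=2)
```

Because events without results are left out of the average, a strategy that covers few events can still show a high NDCG, which is why event coverage has to be considered alongside it.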

NDCG Performance on Twitter

 Figure: NDCG (y-axis, 0.6 to 1) against number of documents k (x-axis,
 5 to 20) for Twitter-MS, Twitter-RTR, and Precision.

 NDCG scores for top-k Twitter documents retrieved by
 Precision-oriented queries (Precision), and query strategies
 using Twitter data (Twitter-RTR, Twitter-MS).
Cross-Site NDCG Performance
 Figure: NDCG (y-axis, 0 to 1.1) against number of documents k (x-axis,
 0 to 25) for Precision, Twitter-MS, and YouTube-MS. Point labels on the
 plot: 5, 5, 4, 4; 39, 36, 34, 34; 9, 8, 8, 7.

 NDCG scores for top-k YouTube documents retrieved by
 Precision-oriented queries (Precision), and query strategies
 using data from Twitter (Twitter-MS) and YouTube (YouTube-MS).
Conclusions
 Developed a two-step query-oriented
 solution for planned event content
 identification
    User-contributed event records
   Multiple social media sites

 Identified diverse event content:
 photos, videos, and tweets
 Showed how event content from one site
 can be used to enhance event content
 identification on other sites

Future Work
 Leverage explicit links
   From event records to documents
   Between documents from different social media
    sites

 Sub-event content analysis
 Event timeline construction






Editor's notes

  1. Users often share information about events in a variety of forms on different social media sites.
  2. Social media provides many challenges and opportunities for identifying event information.
  3. Explain that we work in real time (for the most part) and say that we divide the space into unknown and known identification scenarios, then mention the type of event we focus on for each. Also briefly mention that, as we discuss in the thesis, these are not disjoint.
  4. On average, queries generated by this strategy are expected to retrieve some results for their associated event.
  5. This is averaged over all events that had some results. How many events had some results? Precision: 22% of events; Twitter-RTR: 76% of events.