Discovering and Navigating Memes
               in Social Media
                              Matt Lease
                         School of Information
                      University of Texas at Austin
                        ml@ischool.utexas.edu
                              @mattlease


                            Joint Work with
                    Hohyon Ryu & Nicholas Woodward


Paper to appear at HyperText 2012: 23rd ACM Conference on Hypertext and Social Media
April 3, 2012   SBP 2012: Intl. Conf. on Social Computing, Behavioral-Cultural Modeling, & Prediction   2
Critical Reading (Literacy)
      • Context-awareness (how work is situated)
                – Related works, Time/Place, Author…
      • Recognizing & questioning
                – Sources of Influence
                – Positions, Assumptions, Bias, …
      • New challenges online
                – Scale, authorship, citing of sources, borrowing…
      • Traditional approach: education
Inspiration #1: Living Stories




                     livingstories.googlelabs.com
Memes
• Similar phrases found across multiple sources
      – Includes multiple phrasings of same idea
• Re-use reveals implicit network
      – Sources, Individuals, Communities
      – Patterns of re-use reinforce links
• Questions
      – Re-use?
      – Intended re-use?
      – Visible (quoted)?
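
The idea that re-use reveals an implicit network can be sketched in a few lines. This is a minimal illustration only, not the method used in this work: the source names, the phrases, and the helper `reuse_network` are all hypothetical. Two sources are linked whenever they share a phrase, and the link weight counts the shared phrases, so repeated patterns of re-use strengthen the link.

```python
from collections import defaultdict
from itertools import combinations


def reuse_network(phrases_by_source):
    """Return {(source_a, source_b): number_of_shared_phrases}.

    phrases_by_source maps a source name to the set of phrases it contains.
    Hypothetical helper, for illustration only.
    """
    # Invert the mapping: phrase -> set of sources that use it.
    sources_by_phrase = defaultdict(set)
    for source, phrases in phrases_by_source.items():
        for phrase in phrases:
            sources_by_phrase[phrase].add(source)

    # Every pair of sources sharing a phrase gains one unit of edge weight.
    edges = defaultdict(int)
    for sources in sources_by_phrase.values():
        for a, b in combinations(sorted(sources), 2):
            edges[(a, b)] += 1
    return dict(edges)


# Toy data (made up): three blogs and the phrases found in each.
toy = {
    "blog_a": {"yes we can", "lipstick on a pig"},
    "blog_b": {"yes we can", "lipstick on a pig", "all rights reserved"},
    "blog_c": {"all rights reserved"},
}
network = reuse_network(toy)
```

In this toy data the link between `blog_b` and `blog_c` rests only on boilerplate, which is why the questions above (re-use? intended? visible?) matter before a link is read as influence.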
Inspiration #2: Meme Tracker




Where Repeated Text Occurs
      • Intended Re-use
                – Visible (Quotation): “to be or not to be”
                    • Leskovec et al., KDD’09 ( memetracker.org )
                – Hidden: e.g. plagiarism, false plurality
                – Unmarked
                    •   Near-Duplicate documents
                    •   Boilerplate: All rights reserved
                    •   Common adage: …a penny saved…
                    •   Style, genre, laziness, …
      • Accidental borrowing
      • Shared context (e.g. named entities)
                – E.g. named-entities: S. Skiena et al., Stony Brook ( textmap.com )
      • Chance (e.g. …then he said…)
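
Near-duplicate documents are one of the unmarked cases listed above. A common way to flag them is to compare sets of word n-grams ("shingles") with Jaccard similarity. The sketch below assumes that technique; the slides do not say which near-duplicate detector was used, and the function names, shingle size, and threshold are illustrative choices.

```python
def shingles(text, n=3):
    """Return the set of word n-grams (shingles) in text."""
    words = text.lower().split()
    if len(words) < n:
        # Short texts become a single shingle; empty texts have none.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_near_duplicate(doc_a, doc_b, n=3, threshold=0.8):
    """Flag two documents as near-duplicates (threshold is an arbitrary choice)."""
    return jaccard(shingles(doc_a, n), shingles(doc_b, n)) >= threshold


# Toy documents (made up).
original = "the quick brown fox jumps over the lazy dog"
unrelated = "a penny saved is a penny earned"
same_text = is_near_duplicate(original, original)
different_text = is_near_duplicate(original, unrelated)
```

Exact pairwise comparison like this is quadratic in the number of documents, so at the scale of the collections in this talk it would need hashing or indexing on top.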
Data
      • TREC Blogs08 Collection
                – http://ir.dcs.gla.ac.uk/test_collections/blogs08info.html
                – 28M permalinks (January 2008 – January 2009)
                – 250 GB compressed
      • ICWSM 2009 Spinn3r Blog Dataset
                – http://www.icwsm.org/data/
                – 44 million blog posts (August - September, 2008)
                – 27 GB compressed
      • ICWSM 2011 Spinn3r Blog Dataset

Inspiration #3: Popular Passages
      • Kolak & Schilit, HyperText’08
      • Find re-use in scanned books
                – Find repeated phrases
                – Group related phrases
                – Rank passages
                – MapReduce processing architecture
      • Browsing interface with generated links
      • Issues: data/task, locality, details, scalability
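
The "find repeated phrases" step fits the MapReduce pattern naturally. The sketch below simulates one map, shuffle, and reduce pass in memory. It is an assumption-laden illustration, not the algorithm of Kolak & Schilit or of this work: the function names, the fixed n-gram length, and the `min_docs` cutoff are all hypothetical.

```python
from collections import defaultdict


def map_phase(doc_id, text, n=4):
    """Emit (phrase, doc_id) for every word n-gram in the document."""
    words = text.lower().split()
    for i in range(len(words) - n + 1):
        yield " ".join(words[i:i + n]), doc_id


def shuffle(pairs):
    """Group emitted values by key, as a MapReduce framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(phrase, doc_ids, min_docs=2):
    """Keep a phrase only if it occurs in at least min_docs distinct documents."""
    distinct = sorted(set(doc_ids))
    if len(distinct) >= min_docs:
        yield phrase, distinct


def repeated_phrases(docs, n=4, min_docs=2):
    """Run map -> shuffle -> reduce over {doc_id: text}."""
    pairs = (pair for doc_id, text in docs.items()
             for pair in map_phase(doc_id, text, n))
    result = {}
    for phrase, doc_ids in shuffle(pairs).items():
        for key, value in reduce_phase(phrase, doc_ids, min_docs):
            result[key] = value
    return result


# Toy documents (made up).
docs = {
    "d1": "he said to be or not to be today",
    "d2": "to be or not to be is the question",
    "d3": "nothing in common here at all",
}
found = repeated_phrases(docs)
```

The three overlapping 4-grams found here belong to one longer shared passage, which is where the "group related phrases" step comes in.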
Processing Architecture
                                                                               Blogs08 Test Collection
                                                                                  28M posts, 1.4TB
                Preprocessing (Pseudo-MapReduce)
                Decruft & Language Identification
                HTML Strip & Near-Duplicate Detection                            16M posts, 960GB



                Common Phrase Extraction
                                                                                  15K posts, 43GB
                3 MapReduce Stages

                Common Phrase Ranking
                Daily Top 200 Phrases                                            6.2M phrases, 2GB
                1 MapReduce Process

                Common Phrase Clustering
                                                                                75K phrases, 2.6MB
                1 MapReduce Process

                Meme Browser                                                        68K memes
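
The "Daily Top 200 Phrases" ranking stage can be pictured as a per-day top-k selection. The sketch below ranks by raw occurrence count, which is an assumption: the slides do not state the scoring function. The function name `daily_top_phrases` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def daily_top_phrases(occurrences, k=200):
    """Return {day: [(phrase, count), ...]} with the k most frequent phrases per day.

    occurrences is an iterable of (day, phrase) pairs. Ranking by raw count
    is an assumption made for this sketch.
    """
    counts_by_day = defaultdict(Counter)
    for day, phrase in occurrences:
        counts_by_day[day][phrase] += 1
    # Sort by descending count, then alphabetically so ties are deterministic.
    return {
        day: sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:k]
        for day, counts in counts_by_day.items()
    }


# Toy occurrences (made up): (day, phrase) pairs.
toy_occurrences = [
    ("2008-09-10", "lipstick on a pig"),
    ("2008-09-10", "lipstick on a pig"),
    ("2008-09-10", "yes we can"),
    ("2008-09-11", "yes we can"),
]
top = daily_top_phrases(toy_occurrences, k=1)
```

In a MapReduce setting the day would serve as the key, so each reducer ranks one day's phrases independently.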



Meme Browser




Efficiency: Meme Clustering



 • From WEKA ARFF format to sparse representation
        – From ~96 hours → 11 hours
 • Indexed vs. un-indexed
        – From 11 hours → 16 minutes (single core)
        – From 34 minutes → 3 minutes (136 cores)
 • Distributed vs. single core
        – From 11 hours → 34 minutes (un-indexed)
        – From 16 minutes → 3 minutes (indexed)
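
The timings above are easier to compare as speedup ratios. The short calculation below uses only the numbers on this slide. Two readings are assumptions: that "distributed" means the 136-core runs, and that the 11-hour sparse run is the single-core, un-indexed one. The variable names are illustrative.

```python
# Timings reported on the slide, converted to minutes.
ARFF_SINGLE = 96 * 60        # ~96 hours: WEKA ARFF format
SPARSE_SINGLE = 11 * 60      # 11 hours: sparse, un-indexed, single core
INDEXED_SINGLE = 16          # 16 minutes: indexed, single core
SPARSE_DISTRIBUTED = 34      # 34 minutes: un-indexed, 136 cores
INDEXED_DISTRIBUTED = 3      # 3 minutes: indexed, 136 cores


def speedup(before, after):
    """Ratio of two running times (how many times faster 'after' is)."""
    return before / after


speedups = {
    "ARFF -> sparse": speedup(ARFF_SINGLE, SPARSE_SINGLE),
    "index (single core)": speedup(SPARSE_SINGLE, INDEXED_SINGLE),
    "index (136 cores)": speedup(SPARSE_DISTRIBUTED, INDEXED_DISTRIBUTED),
    "distribute (un-indexed)": speedup(SPARSE_SINGLE, SPARSE_DISTRIBUTED),
    "distribute (indexed)": speedup(INDEXED_SINGLE, INDEXED_DISTRIBUTED),
    "overall": speedup(ARFF_SINGLE, INDEXED_DISTRIBUTED),
}

for label, ratio in speedups.items():
    print(f"{label}: {ratio:.1f}x")
```

Read this way, indexing on a single core (about 41x) gains more than distributing the un-indexed job across 136 cores (about 19x), and the combined changes take clustering from roughly 96 hours to 3 minutes.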
Thank You!
Matt Lease
ml@ischool.utexas.edu
www.ischool.utexas.edu/~ml
@mattlease

Joint Work with
– Hohyon (Will) Ryu
– Nicholas Woodward

Support
• FCT of Portugal / UT CoLab
• Amazon Web Services
• UT Austin LIFT Award
• John P. Commons Fellowship

Meme Browser: odyssey.ischool.utexas.edu/mb
