Characterising the Emergent
 Semantics in Twitter Lists
 Andrés García-Silva †, Jeon-Hyung Kang*, Kristina Lerman*,
                        Oscar Corcho †
                   † {hgarcia, ocorcho}@fi.upm.es
                       Facultad de Informática
               Universidad Politécnica de Madrid, Spain

                       *{jeonhyuk,lerman}@isi.edu
                      Information Sciences Institute,
                  University of Southern California, USA
Introduction

     Twitter Lists




Introduction

         Curators
            and
        List Names




Introduction

        Members
           and
       List Names




Introduction

       Subscribers
           and
       List Names




Introduction




    • The previous examples showed individual uses of lists
          • Some list names were related to one another


    • What if we group the lists?




Introduction
Lists where the Yahoo!Finance user is a member
grouped by frequency of membership

Lists where the NASDAQ user is a member
grouped by number of subscriptions




Introduction: Research questions
• Is it possible to identify related keywords from list names according to
  the use given by the different user roles?
     • Are two list names related if they have been used by a similar set of
       curators?
     • Are two list names related if a similar set of users have subscribed to the
       corresponding lists?
     • Are two list names related if their corresponding lists have a similar set of
       members?
• What kind of user roles will generate more related keywords?
• What types of relations between keywords can we obtain?
     •   Synonyms, is-a, siblings..?



[Figure: example with the list names Stocks, Investment, PersonalBanking and Banks, two curators (Curator 1, Curator 2), the list members, and Subscriber 1]

Approach



    • Elicit related keywords from Twitter lists
          • Input: Twitter lists
          • Schema representation of keywords
                • Based on curators
                • Based on subscribers
                • Based on members
          • Model to identify similar keywords
                • Vector Space Model
                • Latent Dirichlet Allocation
          • Output: pairs of related keywords per schema representation and model
    • Characterise the semantics of the relations
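As a rough illustration of the vector space model over one schema representation, the sketch below represents each list-name keyword as a vector indexed by the members of the lists carrying that name, and compares keywords with cosine similarity. The toy data, the raw-count weighting, and the function names are assumptions made for this example; the slides do not state which weighting scheme was used.

```python
import math
from collections import Counter

# Hypothetical toy data: (list-name keyword, member of a list with that name).
# The same construction applies to curators or subscribers.
MEMBERSHIPS = [
    ("stocks", "YahooFinance"), ("stocks", "NASDAQ"), ("stocks", "WSJ"),
    ("investment", "YahooFinance"), ("investment", "NASDAQ"),
    ("banks", "NASDAQ"), ("banks", "WSJ"),
    ("comedy", "funland"),
]


def build_vectors(pairs):
    """Map each keyword to a sparse vector: user -> number of co-occurrences."""
    vectors = {}
    for keyword, user in pairs:
        vectors.setdefault(keyword, Counter())[user] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_related(keyword, vectors, k=5):
    """Return up to k other keywords ranked by decreasing cosine similarity."""
    scores = [(other, cosine(vectors[keyword], vec))
              for other, vec in vectors.items() if other != keyword]
    scores = [(other, score) for other, score in scores if score > 0]
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(scores, key=lambda item: (-item[1], item[0]))[:k]


vectors = build_vectors(MEMBERSHIPS)
ranking = most_related("stocks", vectors)
```

Swapping the second element of each pair for a curator or a subscriber gives the other two schema representations without changing the rest of the sketch.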




Approach


    • Elicit related keywords from Twitter lists
          • Output: pairs of related keywords per schema representation and model
    • Characterise the semantics of the relations
          • Similarity based on WordNet
                • Path length
                • Wu & Palmer (hierarchical information)
                • Jiang & Conrath (distributional information)
                • Relations found: synonyms, is-a, siblings, indirect is-a; specificity of relations
          • SPARQL queries over general KBs published as Linked Data
                • DBpedia, OpenCyc, and UMBEL
                • Relations found: synonyms (sameAs), binary relations (TypeOf, BT), object properties (Occupation)
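The hierarchy-based measures can be sketched on a small hand-made is-a taxonomy. The taxonomy below is hypothetical (it is not WordNet data), and the sketch assumes that path length is counted in nodes, so that identical concepts have length 1, a direct is-a link has length 2, and siblings have length 3. Wu & Palmer similarity is computed with its usual definition, 2 * depth(LCS) / (depth(a) + depth(b)).

```python
# Hypothetical toy is-a taxonomy (child -> parent); not actual WordNet data.
PARENT = {
    "entity": None,
    "person": "entity",
    "communicator": "person",
    "writer": "communicator",
    "poet": "writer",
    "novelist": "writer",
    "organization": "entity",
    "company": "organization",
}


def ancestors(concept):
    """Return the chain from the concept up to the root, concept first."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def depth(concept):
    """Depth counted in nodes, so the root has depth 1."""
    return len(ancestors(concept))


def lcs(a, b):
    """Least common subsumer: the deepest concept that subsumes both a and b."""
    seen = set(ancestors(a))
    for candidate in ancestors(b):
        if candidate in seen:
            return candidate
    return None


def path_length(a, b):
    """Number of nodes on the is-a path between a and b (same concept = 1)."""
    return depth(a) + depth(b) - 2 * depth(lcs(a, b)) + 1


def wu_palmer(a, b):
    """Wu & Palmer similarity: 2 * depth(LCS) / (depth(a) + depth(b))."""
    return 2 * depth(lcs(a, b)) / (depth(a) + depth(b))
```

In this toy hierarchy, poet and novelist are siblings under writer, while poet and company only share the root, so the second pair has a much shallower least common subsumer and a lower Wu & Palmer score.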


Experiment: Setup



    • Data set
          • Total
             • 297,521 lists, 2,171,140 members, 215,599 curators, and
                616,662 subscribers

          • We extracted 5,932 unique keywords from list names; 55% of them
            were found in WordNet.
             • We used approximate matching of the list names against
               dictionary entries
             • The dictionary was created from Wikipedia article titles
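One way to picture the approximate matching step is with Python's standard `difflib` module. This is only a sketch: the slides do not say which matching algorithm was used, and the dictionary entries, the normalisation rules, and the 0.85 cutoff below are assumptions chosen for illustration.

```python
import difflib
import re

# Hypothetical dictionary; in the slides it is built from Wikipedia article titles.
DICTIONARY = ["personal banking", "stocks", "investment", "life science"]


def normalise(list_name):
    """Lower-case a list name and turn separators and camel case into spaces."""
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", list_name)
    return re.sub(r"[-_\s]+", " ", spaced).strip().lower()


def match_keyword(list_name, dictionary=DICTIONARY, cutoff=0.85):
    """Return the closest dictionary entry, or None if nothing is close enough."""
    candidates = difflib.get_close_matches(
        normalise(list_name), dictionary, n=1, cutoff=cutoff
    )
    return candidates[0] if candidates else None
```

A stricter cutoff trades recall for precision: fewer list names are mapped to a keyword, but the mapped ones are less likely to be wrong.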




Experiment: Execution

    • Elicit related keywords from Twitter lists
          • Input: data set
          • Schema representation of keywords: based on curators, subscribers, and members
          • Model to identify similar keywords: Vector Space Model, Latent Dirichlet Allocation
          • Output: pairs of related keywords per schema representation and model
                • Each keyword with its 5 most related keywords
    • Characterise the semantics of the relations
          • WordNet similarity
                • Path length
                • Wu & Palmer (hierarchical information)
                • Jiang & Conrath (distributional information)
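Jiang & Conrath is a distance rather than a similarity, so lower values mean more closely related concepts. It combines the hierarchy with information content, IC(c) = -log p(c), as IC(a) + IC(b) - 2 * IC(LCS(a, b)). The sketch below uses a hypothetical taxonomy and made-up corpus counts (not WordNet data or real frequencies) purely to show how the quantities fit together.

```python
import math

# Hypothetical toy taxonomy (child -> parent) and corpus counts; not WordNet data.
PARENT = {"entity": None, "person": "entity", "writer": "person",
          "poet": "writer", "novelist": "writer", "company": "entity"}
OWN_COUNT = {"entity": 0, "person": 10, "writer": 20,
             "poet": 20, "novelist": 20, "company": 5}


def ancestors(concept):
    """Return the chain from the concept up to the root, concept first."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def cumulative_counts():
    """A concept's count includes the counts of everything it subsumes."""
    totals = dict.fromkeys(PARENT, 0)
    for concept, count in OWN_COUNT.items():
        for subsumer in ancestors(concept):
            totals[subsumer] += count
    return totals


TOTALS = cumulative_counts()


def information_content(concept):
    """IC(c) = -log p(c), with p(c) estimated from the cumulative counts."""
    return -math.log(TOTALS[concept] / TOTALS["entity"])


def lcs(a, b):
    """Least common subsumer: the deepest concept that subsumes both a and b."""
    seen = set(ancestors(a))
    return next(c for c in ancestors(b) if c in seen)


def jiang_conrath_distance(a, b):
    """IC(a) + IC(b) - 2 * IC(LCS(a, b)); 0 means identical concepts."""
    return (information_content(a) + information_content(b)
            - 2 * information_content(lcs(a, b)))
```

With these counts the siblings poet and novelist end up closer to each other than poet and company, whose only common subsumer is the root and therefore carries no information.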



Experiment: Data Analysis
  Pearson's correlation coefficient
     [Figure: correlation values (-1 to 1)]
     [Figure: average J&C distance and W&P similarity]
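For reference, Pearson's coefficient can be computed as follows. The two score lists are hypothetical values for four keyword pairs, invented for this example; they are not results from the experiment.

```python
import math


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for four keyword pairs: similarity from a list-based
# model versus a WordNet-based similarity for the same pairs.
model_similarity = [0.82, 0.40, 0.65, 0.10]
wordnet_similarity = [0.80, 0.35, 0.70, 0.20]
r = pearson(model_similarity, wordnet_similarity)
```

Because J&C is a distance, a model that agrees with WordNet would be expected to correlate negatively with it and positively with a similarity such as W&P.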




Experiment: Data Analysis
  Path Length in WordNet

  Path length                    Members             Subscribers         Curators
                                 VSM       LDA       VSM       LDA       VSM       LDA
  1 (synonyms)                   8.58%     10.87%    3.97%     3.24%     1.24%     0.50%
  2 (is-a)                       3.42%     3.08%     1.93%     0.47%     0.70%     0.00%
  3 (siblings, indirect is-a)    2.37%     3.77%     2.96%     2.06%     2.38%     4.03%
  >3                             67.61%    65.5%     67.2%     67.5%     77.8%     75.8%

  % of relations found by each schema representation and model



    On average, 97.65% of the relations with a path length greater than 3
    involve a common subsumer




Experiment: Data Analysis
 Depth of the least common subsumer (LCS) and path length as indicators of specificity

   [Figure: depth of the least common subsumer; relations in WordNet]
   [Figure: length of the path setting up the relation; relations with depth(LCS) >= 5]




Experiment: Findings
  Summary
  •    Similarity models based on members
        •   produce the results most correlated with those of the WordNet-based
            similarity measures
        •   find more synonyms and direct is-a relations than the other
            models (path length).


  •    The majority of relations found by any model have a path length >= 3 and
       involve a common subsumer.
        •   Depth of LCS
             • VSM based on subscribers produces the highest number of specific
               relations (depth of LCS >= 5 or 6).


  •    Similarity models based on curators produce a lower number of relations.




Experiment: Execution

    • Elicit related keywords from Twitter lists
          • Input: data set
          • Schema representation of keywords: based on curators, subscribers, and members
          • Model to identify similar keywords: Vector Space Model, Latent Dirichlet Allocation
          • Output: pairs of related keywords per schema representation and model
                • Each keyword with its 5 most related keywords
    • Characterise the semantics of the relations
          • Ontological relations between keywords
                • SPARQL queries over general KBs published as Linked Data
                • DBpedia, OpenCyc, and UMBEL
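A query for direct relations between two anchored keywords might look like the one built below. This is a sketch rather than the queries used in the study: the function name is hypothetical, it assumes each keyword has already been anchored to a DBpedia resource name, and it only builds the SPARQL text without sending it to an endpoint.

```python
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def direct_relation_query(x, y):
    """Build a SPARQL query listing properties that link x and y in either direction."""
    sx = f"<{DBPEDIA_RESOURCE}{x}>"
    sy = f"<{DBPEDIA_RESOURCE}{y}>"
    return (
        "SELECT DISTINCT ?p ?direction WHERE {\n"
        f'  {{ {sx} ?p {sy} . BIND("x->y" AS ?direction) }}\n'
        "  UNION\n"
        f'  {{ {sy} ?p {sx} . BIND("y->x" AS ?direction) }}\n'
        "}"
    )


# Resource names here are illustrative; the anchoring step decides the real ones.
query = direct_relation_query("Houston", "Texas")
print(query)
```

Checking both directions matters because a relation between two keywords can be stated from either resource.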




Experiment

    • We anchor 63.77% of the keywords extracted from
      Twitter lists to DBpedia resources




Experiment
Vector-space model based on members (direct relations)

  Relation type    Share    Example keywords
  Broader Term     26%      life-science, biotech
  subClassOf       26%      writers, authors
  developer        11%      google, google_apps
  genre            11%      funland, comedy
  largest city     6%       houston, texas
  Others           20%      -

Vector-space model based on subscribers (relations of length 3)

Linked data pattern (54.73%): x -> object <- y

  Relations                       Share     Object             Keywords
  type            type            67.35%    company            nokia, intel
  subClassOf      subClassOf      30.61%    activities         philanthropy, fundraising

Linked data pattern (43.49%): x <- object -> y

  Relations                       Share     Object             Keywords
  genre           genre           12.43%    Aesthetica         theater, film
  occupation      genre           10.27%    Adam Maxwell       fiction, writer
  occupation      occupation      8.11%     Alina Tugend       poet, writer
  product         product         7.57%     ChenOne            clothes, fashion
  industry        product         9.73%     UserLand Softw.    blogs, internet
  known for       occupation      5.41%     Adeline Yen Mah    author, writing
  known for       known for       3.78%     Rebecca Watson     skeptics, atheist
  main interest   main interest   3.24%     Aristotle          politics, government
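The two length-3 patterns can be illustrated on a handful of in-memory triples. The triples below are hypothetical, loosely modelled on the examples in the tables above, and the function names are made up for this sketch; the study itself queried Linked Data sets with SPARQL rather than a Python list.

```python
# Hypothetical triples (subject, property, object); not actual DBpedia data.
TRIPLES = [
    ("nokia", "type", "company"),
    ("intel", "type", "company"),
    ("Aristotle", "main interest", "politics"),
    ("Aristotle", "main interest", "government"),
    ("google_apps", "developer", "google"),
]


def shared_object(x, y, triples):
    """Pattern x -> object <- y: both keywords point to the same object."""
    return sorted(
        (p1, obj, p2)
        for s1, p1, obj in triples if s1 == x
        for s2, p2, obj2 in triples if s2 == y and obj2 == obj
    )


def shared_subject(x, y, triples):
    """Pattern x <- object -> y: one resource points to both keywords."""
    return sorted(
        (p1, subj, p2)
        for subj, p1, o1 in triples if o1 == x
        for subj2, p2, o2 in triples if subj2 == subj and o2 == y
    )
```

Each result is a tuple of the two relations and the intermediate resource, which mirrors the Relations and Object columns of the tables.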

Conclusions




 •   Different models to elicit related keywords from Twitter lists.
      • Curators, subscribers and members - VSM and LDA
 •   Characterise the semantics of relations: WordNet-based similarity
     measures and SPARQL queries over linked data sets




Conclusions


  •   Vector-space and LDA models based on members produce the results most
      correlated with those of WordNet-based metrics.
       • Shortest JC distance and highest WP similarities
  •   According to the path length in WordNet
       • Models based on members produce more synonyms and direct is-a
       • Most of the relations have path length ≥ 3 and have a common subsumer
           • Depth of LCS
               • Vector-space model based on subscribers finds highest
                  number of relations (depth LCS ≥ 5 and 4 ≤ path length ≤ 0)
  •   We confirm these results according to linked data sets





Slow-cooked data and APIs in the world of Big Data: the view from a city per...Oscar Corcho
 

More from Oscar Corcho (20)

Organisational Interoperability in Practice at Universidad Politécnica de Madrid
Organisational Interoperability in Practice at Universidad Politécnica de MadridOrganisational Interoperability in Practice at Universidad Politécnica de Madrid
Organisational Interoperability in Practice at Universidad Politécnica de Madrid
 
Introducción a los Datos Abiertos - Open Data Day 2020
Introducción a los Datos Abiertos - Open Data Day 2020Introducción a los Datos Abiertos - Open Data Day 2020
Introducción a los Datos Abiertos - Open Data Day 2020
 
Open Data (and Software, and other Research Artefacts) - A proper management
Open Data (and Software, and other Research Artefacts) -A proper managementOpen Data (and Software, and other Research Artefacts) -A proper management
Open Data (and Software, and other Research Artefacts) - A proper management
 
Adiós a los ficheros, hola a los grafos de conocimientos estadísticos
Adiós a los ficheros, hola a los grafos de conocimientos estadísticosAdiós a los ficheros, hola a los grafos de conocimientos estadísticos
Adiós a los ficheros, hola a los grafos de conocimientos estadísticos
 
Ontology Engineering at Scale for Open City Data Sharing
Ontology Engineering at Scale for Open City Data SharingOntology Engineering at Scale for Open City Data Sharing
Ontology Engineering at Scale for Open City Data Sharing
 
Situación de las iniciativas de Open Data internacionales (y algunas recomen...
Situación de las iniciativas de Open Data internacionales (y algunas recomen...Situación de las iniciativas de Open Data internacionales (y algunas recomen...
Situación de las iniciativas de Open Data internacionales (y algunas recomen...
 
STARS4ALL - Contaminación Lumínica
STARS4ALL - Contaminación LumínicaSTARS4ALL - Contaminación Lumínica
STARS4ALL - Contaminación Lumínica
 
Towards Reproducible Science: a few building blocks from my personal experience
Towards Reproducible Science: a few building blocks from my personal experienceTowards Reproducible Science: a few building blocks from my personal experience
Towards Reproducible Science: a few building blocks from my personal experience
 
Publishing Linked Statistical Data: Aragón, a case study
Publishing Linked Statistical Data: Aragón, a case studyPublishing Linked Statistical Data: Aragón, a case study
Publishing Linked Statistical Data: Aragón, a case study
 
An initial analysis of topic-based similarity among scientific documents base...
An initial analysis of topic-based similarity among scientific documents base...An initial analysis of topic-based similarity among scientific documents base...
An initial analysis of topic-based similarity among scientific documents base...
 
Linked Statistical Data 101
Linked Statistical Data 101Linked Statistical Data 101
Linked Statistical Data 101
 
Aplicando los principios de Linked Data en AEMET
Aplicando los principios de Linked Data en AEMETAplicando los principios de Linked Data en AEMET
Aplicando los principios de Linked Data en AEMET
 
Ojo Al Data 100 - Call for sharing session at IODC 2016
Ojo Al Data 100 - Call for sharing session at IODC 2016Ojo Al Data 100 - Call for sharing session at IODC 2016
Ojo Al Data 100 - Call for sharing session at IODC 2016
 
Educando sobre datos abiertos: desde el colegio a la universidad
Educando sobre datos abiertos: desde el colegio a la universidadEducando sobre datos abiertos: desde el colegio a la universidad
Educando sobre datos abiertos: desde el colegio a la universidad
 
STARS4ALL general presentation at ALAN2016
STARS4ALL general presentation at ALAN2016STARS4ALL general presentation at ALAN2016
STARS4ALL general presentation at ALAN2016
 
Generación de datos estadísticos enlazados del Instituto Aragonés de Estadística
Generación de datos estadísticos enlazados del Instituto Aragonés de EstadísticaGeneración de datos estadísticos enlazados del Instituto Aragonés de Estadística
Generación de datos estadísticos enlazados del Instituto Aragonés de Estadística
 
Presentación de la red de excelencia de Open Data y Smart Cities
Presentación de la red de excelencia de Open Data y Smart CitiesPresentación de la red de excelencia de Open Data y Smart Cities
Presentación de la red de excelencia de Open Data y Smart Cities
 
Why do they call it Linked Data when they want to say...?
Why do they call it Linked Data when they want to say...?Why do they call it Linked Data when they want to say...?
Why do they call it Linked Data when they want to say...?
 
Linked Statistical Data: does it actually pay off?
Linked Statistical Data: does it actually pay off?Linked Statistical Data: does it actually pay off?
Linked Statistical Data: does it actually pay off?
 
Slow-cooked data and APIs in the world of Big Data: the view from a city per...
Slow-cooked data and APIs in the world of Big Data: the view from a city per...Slow-cooked data and APIs in the world of Big Data: the view from a city per...
Slow-cooked data and APIs in the world of Big Data: the view from a city per...
 

Recently uploaded

Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 

Recently uploaded (20)

Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 

Characterising the Emergent Semantics in Twitter Lists

  • 1. Characterising the Emergent Semantics in Twitter Lists Andrés García-Silva †, Jeon-Hyung Kang*, Kristina Lerman*, Oscar Corcho † † {hgarcia, ocorcho}@fi.upm.es Facultad de Informática Universidad Politécnica de Madrid, Spain *{jeonhyuk,lerman}@isi.edu Information Sciences Institute, University of Southern California, USA
  • 2. Introduction: Twitter Lists
  • 3. Introduction: Curators and List Names
  • 4. Introduction: Members and List Names
  • 5. Introduction: Subscribers and List Names
  • 6. Introduction
      • The previous examples showed individual uses of lists
          • Some list names were related to one another
      • What if we group the lists?
  • 7. Introduction
      • Lists where the Yahoo!Finance user is a member, grouped by frequency of membership
      • Lists where the NASDAQ user is a member, grouped by number of subscriptions
  • 8. Introduction: Research questions
      • Is it possible to identify related keywords from list names according to the use given by the different user roles?
          • Are two list names related if they have been used by a similar set of curators?
          • Are two list names related if a similar set of users have subscribed to the corresponding lists?
          • Are two list names related if their corresponding lists have a similar set of members?
      • Which user roles generate more related keywords?
      • What types of relations between keywords can we obtain?
          • Synonyms, is-a, siblings...?
      • [Figure: example lists named Stocks, Investment, PersonalBanking and Banks, connected to Curator 1, Curator 2, the list members and Subscriber 1]
  • 9. Approach
      • Elicit related keywords from Twitter lists
          • Input: Twitter Lists
          • Schema representation of keywords: based on curators, based on subscribers, or based on members
          • Model to identify similar keywords: Vector Space Model or Latent Dirichlet Allocation
          • Output: pairs of related keywords per schema representation and model
      • Characterise the semantics of the relations
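The vector space model named on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each keyword is represented by raw counts of the users (here, list members) associated with lists carrying that keyword, and that keywords are compared with cosine similarity. The slides do not state the weighting scheme, and the toy data and function names are hypothetical.

```python
from collections import Counter
from math import sqrt

def keyword_vectors(list_memberships):
    """Build one sparse vector per keyword.

    Each vector maps a user id to the number of lists carrying that
    keyword in which the user appears (raw counts; an assumption).
    """
    vectors = {}
    for keyword, users in list_memberships:
        vectors.setdefault(keyword, Counter()).update(users)
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(weight * v[user] for user, weight in u.items() if user in v)
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def most_related(keyword, vectors, k=5):
    """Return the k keywords most similar to `keyword`, best first."""
    scores = [
        (other, cosine(vectors[keyword], vec))
        for other, vec in vectors.items()
        if other != keyword
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]

# Hypothetical toy data: (keyword taken from a list name, members of that list).
toy_lists = [
    ("stocks", ["yahoofinance", "nasdaq", "wsj"]),
    ("investment", ["yahoofinance", "nasdaq"]),
    ("banks", ["bankofamerica", "wsj"]),
    ("comedy", ["funnyordie"]),
]
vectors = keyword_vectors(toy_lists)
ranking = most_related("stocks", vectors)
```

Swapping the member ids for curator or subscriber ids would give the other two schema representations under the same assumptions.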
  • 10. Approach
      • Elicit related keywords from Twitter lists
      • Characterise the semantics of the relations
          • Input: pairs of related keywords per schema representation and model
          • Similarity based on WordNet
              • Path Length: synonyms, is-a, siblings, indirect is-a
              • Wu & Palmer (hierarchical information)
              • Jiang & Conrath (distributional information)
              • Specificity of relations
          • SPARQL queries over general KBs published as Linked Data (DBpedia, OpenCyc, and UMBEL)
              • Synonyms (sameAs)
              • Binary relations (TypeOf, BT)
              • Object properties (Occupation)
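As an illustration of the Wu & Palmer measure mentioned on this slide, the sketch below computes 2 · depth(LCS) / (depth(a) + depth(b)) over a small hand-made is-a hierarchy. The hierarchy is hypothetical and not taken from WordNet, and depth is counted in nodes (root = 1), which is one common convention; the slides do not say which convention was used.

```python
# Hypothetical toy is-a hierarchy (child -> parent); not taken from WordNet.
PARENT = {
    "person": "entity",
    "communicator": "person",
    "writer": "communicator",
    "poet": "writer",
    "novelist": "writer",
    "organization": "entity",
    "company": "organization",
}

def ancestors(node):
    """Return the chain [node, parent, ..., root]."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def depth(node):
    """Depth counted in nodes, so the root has depth 1 (an assumption)."""
    return len(ancestors(node))

def least_common_subsumer(a, b):
    """Deepest node that is an ancestor of (or equal to) both a and b."""
    chain_b = set(ancestors(b))
    for node in ancestors(a):
        if node in chain_b:
            return node
    return None

def wu_palmer(a, b):
    """Wu & Palmer similarity: 2 * depth(LCS) / (depth(a) + depth(b))."""
    lcs = least_common_subsumer(a, b)
    if lcs is None:
        return 0.0
    return 2 * depth(lcs) / (depth(a) + depth(b))

siblings_score = wu_palmer("poet", "novelist")   # LCS is "writer"
distant_score = wu_palmer("poet", "company")     # LCS is the root "entity"
```

A deeper least common subsumer yields a higher score, which is why the measure is described as relying on hierarchical information.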
  • 11. Experiment: Setup
      • Data set
          • Total: 297,521 lists, 2,171,140 members, 215,599 curators, and 616,662 subscribers
      • We extracted 5,932 unique keywords from list names; 55% of them were found in WordNet.
          • We use approximate matching of the list names with dictionary entries
          • The dictionary was created from Wikipedia article titles
  • 12. Experiment: Execution
      • Elicit related keywords from Twitter lists
          • Input: data set
          • Schema representation of keywords: based on curators, based on subscribers, or based on members
          • Model to identify similar keywords: Vector Space Model or Latent Dirichlet Allocation
          • Output: pairs of related keywords per schema representation and model
      • Characterise the semantics of the relations
          • Input: each keyword with its 5 most related keywords
          • Similarity based on WordNet: Path Length, Wu & Palmer (hierarchical information), Jiang & Conrath (distributional information)
          • Output: WordNet similarity
  • 13. Experiment: Data Analysis
      • Pearson's correlation coefficient
      • [Figures: correlation values (-1 to 1); average J&C distance and W&P similarity]
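For reference, Pearson's correlation coefficient can be computed as in the sketch below. The two score lists are hypothetical placeholders standing in for a list-based similarity and a WordNet-based similarity over the same keyword pairs; they are not values from the experiment.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for the same four keyword pairs (illustration only).
list_based_similarity = [0.9, 0.7, 0.4, 0.1]
wordnet_similarity = [0.8, 0.6, 0.5, 0.2]
r = pearson(list_based_similarity, wordnet_similarity)
```

The coefficient ranges from -1 (perfect inverse linear relation) to 1 (perfect direct linear relation), with values near 0 indicating no linear relation.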
  • 14. Experiment: Data Analysis
      • Path length in WordNet: % of relations found by each schema representation and model

                                        Members            Subscribers        Curators
        Path length                     VSM      LDA       VSM      LDA       VSM      LDA
        1 (synonyms)                    8.58%    10.87%    3.97%    3.24%     1.24%    0.50%
        2 (is-a)                        3.42%    3.08%     1.93%    0.47%     0.70%    0.00%
        3 (siblings, indirect is-a)     2.37%    3.77%     2.96%    2.06%     2.38%    4.03%
        >3                              67.61%   65.5%     67.2%    67.5%     77.8%    75.8%

      • On average, 97.65% of the relations with a path length greater than 3 involve a common subsumer
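The bucketing behind this table can be sketched as below, assuming path length is counted in nodes as the row labels suggest (1 = same synset, 2 = direct is-a, 3 = siblings or indirect is-a). The function names, the "not found" bucket for keyword pairs missing from WordNet, and the sample input are hypothetical.

```python
from collections import Counter

# Labels follow the rows of the table; the mapping itself is an assumption.
LABELS = {1: "synonyms", 2: "is-a", 3: "siblings or indirect is-a"}

def relation_label(path_length):
    """Map a WordNet path length (counted in nodes) to a relation bucket."""
    if path_length is None:
        return "not found"
    if path_length < 1:
        raise ValueError("path length must be at least 1")
    return LABELS.get(path_length, "longer path (>3)")

def share_by_label(path_lengths):
    """Percentage of keyword pairs that fall in each bucket."""
    counts = Counter(relation_label(p) for p in path_lengths)
    total = len(path_lengths)
    return {label: 100.0 * n / total for label, n in counts.items()}

# Hypothetical path lengths for eight keyword pairs (None = no path found).
shares = share_by_label([1, 2, 3, 5, 7, None, 1, 4])
```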
  • 15. Experiment: Data Analysis
      • Depth of the least common subsumer (LCS) and path length as indicators of specificity
      • [Figures: relations in WordNet by depth of the LCS and by length of the path setting up the relation; relations with depth(LCS) >= 5]
  • 16. Experiment: Findings Summary
      • Similarity models based on members
          • produce the results that are most correlated with those of the WordNet-based similarity measures
          • find more synonyms and direct is-a relations than the other models (path length)
      • The majority of relations found by any model have a path length >= 3 and involve a common subsumer.
      • Depth of LCS
          • VSM based on subscribers produces the highest number of specific relations (depth of LCS >= 5 or 6).
      • Similarity models based on curators produce a lower number of relations.
  • 17. Experiment: Execution
      • Elicit related keywords from Twitter lists
          • Input: data set
          • Schema representation of keywords: based on curators, based on subscribers, or based on members
          • Model to identify similar keywords: Vector Space Model or Latent Dirichlet Allocation
          • Output: pairs of related keywords per schema representation and model
      • Characterise the semantics of the relations
          • Input: each keyword with its 5 most related keywords
          • SPARQL queries over general KBs published as Linked Data: DBpedia, OpenCyc, and UMBEL
          • Output: ontological relations between keywords
  • 18. Experiment
      • We anchor 63.77% of the keywords extracted from Twitter Lists to DBpedia resources
  • 19. Experiment
      • Vector-space model based on members (direct relations)

        Relation type     %       Example of keywords
        Broader Term      26%     life-science, biotech
        subClassOf        26%     writers, authors
        developer         11%     google, google_apps
        genre             11%     funland, comedy
        largest city      6%      houston, texas
        Others            20%     -

      • Vector-space model based on subscribers (relations of length 3)

        Linked data pattern (54.73%): x -> object <- y
        Relations                        %         Object             Keywords
        type, type                       67.35%    company            nokia, intel
        subClassOf, subClassOf           30.61%    activities         philanthropy, fundraising

        Linked data pattern (43.49%): x <- object -> y
        Relations                        %         Object             Keywords
        genre, genre                     12.43%    Aesthetica         theater, film
        occupation, genre                10.27%    Adam Maxwell       fiction, writer
        occupation, occupation           8.11%     Alina Tugend       poet, writer
        product, product                 7.57%     ChenOne            clothes, fashion
        industry, product                9.73%     UserLand Softw.    blogs, internet
        known for, occupation            5.41%     Adeline Yen Mah    author, writing
        known for, known for             3.78%     Rebecca Watson     skeptics, atheist
        main interest, main interest     3.24%     Aristotle          politics, government
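The direct relations and the two length-3 linked data patterns on this slide could be queried with SPARQL along the following lines. This is a sketch only: the slides do not show the queries actually used, the function names are hypothetical, and the resource URIs are illustrative. The code only builds query strings and makes no network calls.

```python
def direct_relation_query(x, y):
    """Properties that directly link resources x and y, in either direction."""
    return (
        "SELECT DISTINCT ?p WHERE {\n"
        f"  {{ <{x}> ?p <{y}> }} UNION {{ <{y}> ?p <{x}> }}\n"
        "}"
    )

def shared_object_query(x, y, limit=50):
    """Pattern x -> object <- y: both resources point to the same object."""
    return (
        "SELECT DISTINCT ?p1 ?object ?p2 WHERE {\n"
        f"  <{x}> ?p1 ?object .\n"
        f"  <{y}> ?p2 ?object .\n"
        f"}} LIMIT {limit}"
    )

def shared_subject_query(x, y, limit=50):
    """Pattern x <- object -> y: one resource points to both x and y."""
    return (
        "SELECT DISTINCT ?p1 ?object ?p2 WHERE {\n"
        f"  ?object ?p1 <{x}> .\n"
        f"  ?object ?p2 <{y}> .\n"
        f"}} LIMIT {limit}"
    )

# Illustrative URIs for the "nokia" / "intel" example row (assumed anchoring).
NOKIA = "http://dbpedia.org/resource/Nokia"
INTEL = "http://dbpedia.org/resource/Intel"
query = shared_object_query(NOKIA, INTEL)
```

In the first pattern the bindings of `?p1` and `?p2` correspond to the "Relations" column (for example type, type) and `?object` to the "Object" column (for example company).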
  • 20. Conclusions
      • Different models to elicit related keywords from Twitter lists
          • Curators, subscribers and members; VSM and LDA
      • Characterise the semantics of relations: WordNet-based similarity measures and SPARQL queries over linked data sets
  • 21. Conclusions
      • Vector-space and LDA models based on members produce the results most correlated with those of WordNet-based metrics.
          • Shortest JC distance and highest WP similarities
      • According to the path length in WordNet
          • Models based on members produce more synonyms and direct is-a relations
          • Most of the relations have path length ≥ 3 and have a common subsumer
      • Depth of LCS
          • Vector-space model based on subscribers finds the highest number of specific relations (depth LCS ≥ 5 and 4 ≤ path length ≤ 0)
      • We confirm these results according to linked data sets
  • 22. Characterising the Emergent Semantics in Twitter Lists Andrés García-Silva †, Jeon-Hyung Kang*, Kristina Lerman*, Oscar Corcho † † {hgarcia, ocorcho}@fi.upm.es Facultad de Informática Universidad Politécnica de Madrid, Spain *{jeonhyuk,lerman}@isi.edu Information Sciences Institute, University of Southern California, USA