An Integrated Approach
       to Discover Tag Semantics
            SAC 2011, Web Technologies Track, March 24th 2011



   Antonina Dattolo              Davide Eynard                  Luca Mazzola
    University of Udine       USI - University of Lugano    USI - University of Lugano
Department of Mathematics         ITC - Institute for           ITC - Institute for
  and Computer Science       Communication Technologies    Communication Technologies
 antonina.dattolo@uniud.it     davide.eynard@usi.ch           luca.mazzola@usi.ch
Talk outline

    Properties of tags
    Folksonomies as edge-colored multigraphs
    Framework design and implementation
    Tests and evaluations
    Conclusions




24/03/2011           An integrated approach to discover tag semantics   2/27
Properties of tags

     Tags:
            are democratic and bottom-up (vs hierarchical)
            are inclusive and current
            follow desire lines
            are easy to use




Drawbacks of tags

    Lexical ambiguities:
            Synonyms
                 game and juego, or web2.0 and web_2
            Homonyms
                 check as in chess and in “to check” (polysemous)
                 sf as scifi or san_francisco
            Basic level variations
                 dog and poodle
    Ambiguities due to different purposes:
                 blog used to tag blog software (e.g. WordPress), a blog service, a
                   blog post, something to blog about later, ...

Advantages of disambiguation

    Synonym detection:
            increases recall
            allows for better recommendation systems
    Homonym detection:
            makes it possible to find different contexts of use
            increases precision
    Basic level variations detection:
            identifies a hierarchy
            increases recall (e.g. by automatically searching for subclasses)
            provides a means to browse search results
Approaches to tag disambiguation

    Roughly two main families of approaches
            Theoretical ones, aiming at describing the system as a
             whole
            More practical, ad-hoc ones (often addressing one or a few
             issues at a time)
    Our approach
            Main assumption: lexical ambiguities are not independent
             from each other
            Solution based on
                 a theoretical framework
                 a modular, extensible analysis tool

Folksonomies as edge-colored multigraphs
    Def.1: An edge-colored multigraph is a triple
             ECMG = (MG, C, c)
     where:
            MG = (V,E,f) is a multigraph
            C is a set of colors
            c : E→C is an assignment of colors to multigraph edges

    Def.2: A personomy related to user u is a non-directed
     edge-colored graph of color Cu:
             Pu = (T, R, E, Cu)

Folksonomies as edge-colored multigraphs
    Def.3: Given a set of users U and the family of
     personomies Pu (u ∈ U), a folksonomy is defined as the union of
     the personomies of all users, F = ⋃ Pu (u ∈ U),
     that is, an edge-colored multigraph where:
            vertices are tags + resources
            edges are tag assignments made on
             resources by each user
            every color is a different user
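
A minimal Python sketch of this definition, assuming the tagging data arrives as (user, tag, resource) triples. The class and method names (`Folksonomy`, `add_assignment`, `personomy`) are illustrative and are not taken from the authors' tool.

```python
from collections import defaultdict


class Folksonomy:
    """Edge-colored multigraph: vertices are tags and resources,
    each edge is one tag assignment, and its color is the user."""

    def __init__(self):
        self.tags = set()
        self.resources = set()
        # (tag, resource) -> set of colors (users); parallel edges between
        # the same two vertices differ only by their color
        self.edges = defaultdict(set)

    def add_assignment(self, user, tag, resource):
        self.tags.add(tag)
        self.resources.add(resource)
        self.edges[(tag, resource)].add(user)

    def personomy(self, user):
        """Return the single-color subgraph Pu as a set of (tag, resource) edges."""
        return {edge for edge, users in self.edges.items() if user in users}


f = Folksonomy()
f.add_assignment("alice", "sf", "r1")
f.add_assignment("bob", "sf", "r1")
f.add_assignment("bob", "scifi", "r2")
print(sorted(f.personomy("bob")))  # [('scifi', 'r2'), ('sf', 'r1')]
```

Storing the set of users per (tag, resource) pair keeps one entry per pair of vertices while still recording every colored edge between them.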


First simplification step

    As we are only interested in relationships between
     tags, we need to perform two simplification steps on
     the edge-colored multigraph
    Step 1: colored edges are collapsed and substituted
     by weighted edges
        potentially, every color (user) might be
         assigned a different weight wu
        the weight w of the collapsed edge is the sum
         of all the wu linking the same two vertices
        when wu = 1 for each user, w is the number of times a tag
         has been used on a resource
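
Step 1 can be sketched in Python as follows, assuming the input is a list of (user, tag, resource) triples. The function name `collapse_colors` and the `user_weight` dictionary are hypothetical; users without an explicit weight are assumed to have wu = 1.

```python
from collections import defaultdict


def collapse_colors(assignments, user_weight=None):
    """Collapse colored (per-user) edges into weighted tag-resource edges.

    assignments: iterable of (user, tag, resource) triples.
    user_weight: optional dict user -> wu; users not listed get weight 1.
    """
    user_weight = user_weight or {}
    weights = defaultdict(float)
    # set() drops repeated triples, so each user contributes at most one
    # edge between a given tag and resource
    for user, tag, resource in set(assignments):
        weights[(tag, resource)] += user_weight.get(user, 1.0)
    return dict(weights)


triples = [
    ("alice", "sf", "r1"),
    ("bob", "sf", "r1"),
    ("bob", "scifi", "r1"),
]
uniform = collapse_colors(triples)
weighted = collapse_colors(triples, {"bob": 0.5})
print(uniform[("sf", "r1")], weighted[("sf", "r1")])  # 2.0 1.5
```

With uniform weights the edge (sf, r1) gets weight 2, the number of users who tagged r1 with sf; lowering bob's weight to 0.5 reduces it to 1.5.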

Second simplification step
    Step 2:
            a link is created between ta and tb if they
             share a resource
            resource nodes are dropped
    Edges' weights can be calculated
     in different ways:
            number of triples (ti, r, tj) where (ti, r), (r, tj) ∈ E
             => co-occurrence
            normalized co-occurrence (e.g. using the
             Jaccard index)
            distributional measures
            custom metrics (e.g. sum of products of
             connecting edges' weights)
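
The first two weighting options can be illustrated with a short Python sketch. It assumes unweighted (tag, resource) edges, i.e. it ignores the weights produced by Step 1, and the function names `tag_cooccurrence` and `jaccard` are illustrative.

```python
from collections import defaultdict
from itertools import combinations


def tag_cooccurrence(tag_resource_edges):
    """Count, for each pair of tags, the resources they share.

    tag_resource_edges: iterable of (tag, resource) pairs.
    Returns (co, resources_of), where co[(ta, tb)] with ta < tb is the
    number of triples (ta, r, tb).
    """
    resources_of = defaultdict(set)
    for tag, resource in tag_resource_edges:
        resources_of[tag].add(resource)
    co = {}
    for ta, tb in combinations(sorted(resources_of), 2):
        shared = len(resources_of[ta] & resources_of[tb])
        if shared:  # no shared resource means no link between ta and tb
            co[(ta, tb)] = shared
    return co, resources_of


def jaccard(ta, tb, resources_of):
    """Normalized co-occurrence: |R(ta) ∩ R(tb)| / |R(ta) ∪ R(tb)|."""
    union = resources_of[ta] | resources_of[tb]
    return len(resources_of[ta] & resources_of[tb]) / len(union) if union else 0.0


edges = [("sf", "r1"), ("scifi", "r1"), ("sf", "r2"),
         ("san_francisco", "r2"), ("sf", "r3")]
co, resources_of = tag_cooccurrence(edges)
print(co)  # {('san_francisco', 'sf'): 1, ('scifi', 'sf'): 1}
```

In this toy data sf is linked to both scifi and san_francisco, while the latter two share no resource and therefore get no edge.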
The whole process at a glance
    [Figure: four panels, numbered 1 to 4, illustrating the whole process]
System architecture

    Basic assumption:
            ambiguous tags should be related (either by co-occurrence or
             by presence in the same context)
    Three main components:
            tag analysis tool
            disambiguation tool
            front-end




Synonym detection / 1

    Natural text 

            Two words are considered synonyms if they can be replaced
             by each other without affecting the meaning of a sentence
    
    vs. tag-based systems
            It is possible to swap two tags within a “sentence” (i.e. a
             tagging action) without affecting its meaning when we have:
                 variations of a word (e.g. blog, blogs, blogging)
                 translations into other languages (e.g. game, juego, spiel)
                 terms joined by non-alphabetic characters (e.g. web2, web_2)
            No “one size fits all” solution


Synonym detection / 2

    A modular solution for synonyms detection:
            different heuristics, each returning the likelihood that two tags
             are synonyms
            the results are weighted to obtain an overall likelihood


    Suggested heuristics:
            an edit distance such as Levenshtein's (normalized to account for short
             strings);
            synonym search in WordNet (good precision, low recall);
            online translation bases (top-down, such as dictionaries, or bottom-up,
             collaboratively grown vocabs like Wikipedia)
            stemming with NLP algorithms
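
As an illustration of the modular scheme, the Python sketch below combines two heuristics into one likelihood. The weighted average, the weights, and the helper `same_after_cleanup` are assumptions made for this example; the slides do not say how the tool combines its results.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def edit_similarity(a, b):
    """1 - distance / max length, so that one edit on a short string
    counts more than one edit on a long string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def same_after_cleanup(a, b):
    """Toy heuristic: 1.0 if the tags are equal once case and
    non-alphanumeric characters are ignored, else 0.0."""
    def clean(s):
        return "".join(ch for ch in s.lower() if ch.isalnum())
    return 1.0 if clean(a) == clean(b) else 0.0


def synonym_likelihood(a, b, heuristics):
    """Weighted average of heuristic scores; heuristics is a list of (fn, weight)."""
    total = sum(weight for _, weight in heuristics)
    return sum(weight * fn(a, b) for fn, weight in heuristics) / total


heuristics = [(edit_similarity, 0.5), (same_after_cleanup, 0.5)]
score = synonym_likelihood("web2", "web_2", heuristics)
print(round(score, 2))  # 0.9
```

A WordNet lookup, a translation lookup, or a stemmer could be added as further `(function, weight)` pairs without changing `synonym_likelihood`.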

Homonym detection

    Check if the tag t has been used in different contexts
            cluster tags related to t in groups
            the most frequent tags in these groups are used to name
             and disambiguate the contexts
    Clustering algorithm:
            an overlapping one, also used in social network analysis*
            a cluster is a subgraph G identified by the maximization of a
             fitness property, which in the cited paper* has the form
                 f_G = s_in / (s_in + s_out)^α
             where s = strength of internal (in) or external (out) links
             and α = tweaking parameter


     * A. Lancichinetti et al.: “Detecting the overlapping and hierarchical community structure in complex networks”
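
A small Python sketch of the fitness computation, assuming the form used in the cited paper, f_G = s_in / (s_in + s_out)^α, applied to edge weights (strengths) instead of plain degrees. The graph representation and the function name `fitness` are illustrative.

```python
def fitness(cluster, weights, alpha=1.0):
    """f_G = s_in / (s_in + s_out) ** alpha for a set of nodes `cluster`.

    weights: dict mapping frozenset({a, b}) -> edge weight (undirected graph).
    s_in counts each internal edge twice (once per endpoint), following the
    degree-based definition; s_out sums the edges that leave the cluster.
    """
    s_in = s_out = 0.0
    for edge, w in weights.items():
        inside = len(edge & cluster)
        if inside == 2:
            s_in += 2 * w
        elif inside == 1:
            s_out += w
    total = s_in + s_out
    return s_in / total ** alpha if total else 0.0


weights = {
    frozenset({"sf", "scifi"}): 3,
    frozenset({"scifi", "books"}): 2,
    frozenset({"sf", "california"}): 4,
}
print(fitness({"sf", "scifi"}, weights))  # 0.5
```

Larger values of α penalize the total strength more heavily and so favor smaller clusters, which is why the choice of α changes the contexts that are found.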


Hierarchy detection

    Hierarchy is a specific case of basic level variation
    A possible approach: Hearst patterns on the Web,
     such as:
            C1 (and|or) other C2 (e.g. “poodles and other dogs”)
            C1 such as I (e.g. “cities such as San Francisco”)
             (note: Ci are concepts, I is a concept instance)

    Search for the patterns, and use the number of results
     as an indicator for their strength
    Pros: the Web is as up-to-date as folksonomies
    Cons: O(n²) complexity, not really scalable
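
The quadratic cost is easy to see when the queries are generated explicitly. The Python sketch below only builds the query strings for the first pattern; it does not call any search engine, does not handle pluralization, and the function name `hearst_queries` is hypothetical.

```python
from itertools import permutations


def hearst_queries(tags):
    """Yield (narrower, broader, query) for every ordered pair of tags."""
    for narrow, broad in permutations(tags, 2):
        for joiner in ("and", "or"):
            yield narrow, broad, f'"{narrow} {joiner} other {broad}"'


queries = list(hearst_queries(["poodles", "dogs", "cats"]))
print(len(queries))   # 12: n * (n - 1) ordered pairs, two patterns each
print(queries[0][2])  # "poodles and other dogs"
```

Each query would then be sent to a search engine and its result count used as the strength of the candidate relation, so the number of requests grows with the square of the number of tags.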
Prototype development

       Dataset
               Data from more than 30K users of
                http://www.delicious.com
               Ignored the system:unfiled tag
               For the calculation of Tag Context Similarity,
                we only took into account the top 10K tags



       Prototype
               Tag analysis tool, calculating co-occurrence (CO), normalized co-occurrence (NCO), and
                Tag Context Similarity (TCS); it takes time, so it runs as a batch job and saves the matrices in the DB
               Disambiguation with homonyms plugin, implementing the overlapping
                clustering algorithm, and Wikipedia synonym discovery
               Front-end is currently a command-line application
Experimental results / 1

    System tested against three different sets of tags:
            Top 20 tags in delicious
            A group of tags known to be ambiguous (apple, cambridge, sf,
             stream, turkey, tube)
            A set of subjective tags, chosen among the most popular ones in
             delicious (cool, fun, funny, interesting, toread)
    For each tag:
            we calculated the top n (with n = 50) related tags with the three metrics
             (CO, NCO, TCS)
            we performed synonym and homonym analyses
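
The ranking step can be sketched in Python as follows. The slides do not define TCS, so this example assumes it is the cosine similarity between the tags' co-occurrence vectors, a common formulation of tag context similarity; the data and the names `cosine` and `top_related` are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


def top_related(tag, vectors, n=50):
    """Rank the other tags by the similarity of their co-occurrence vectors."""
    scores = [(other, cosine(vectors[tag], vec))
              for other, vec in vectors.items() if other != tag]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:n]


# Hypothetical co-occurrence vectors: tag -> {co-occurring tag: count}
vectors = {
    "toread": {"article": 3, "later": 2},
    "to_read": {"article": 2, "later": 1},
    "funny": {"video": 5},
}
ranking = top_related("toread", vectors, n=2)
print(ranking[0][0])  # to_read
```

Replacing `cosine` with a raw or normalized co-occurrence count gives the corresponding CO and NCO rankings.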




Experimental results / 2
    Tag Context Similarity already tends to provide
     synonyms as top-related tags
            e.g. tags related to toread: read, read_later, to_read, etc.
    Analyzing a less popular synonym (@readit):
            9 out of the top 10 (and 17 out of the top 50) related tags are synonyms
            reason: as less popular tags are less spread across contexts, they tend
             to have a higher similarity with other less popular synonyms
    Wikipedia results:
            analyzing the 31 tags in our three sets, we got 215 new words;
            of those 215, only 83 are valid tags in our delicious dataset;
            of those 83, only 20 belong to the 10K most-used tags;
            only 2 belong to the set of the top-related tags of their English synonym.
Experimental results / 3

    Homonym detection:
            we tested the algorithm with
             different values of α
            meaningful results in a relatively
             short time (but we are working
             only on the top related tags...)
            limit: the graphs of top related
             tags differ in connectivity, so
             no single value of α works well
             for all of them (α_sf = 1.4,
             α_stream = 1.74).




Conclusions

    Model
            Flexible enough to support other kinds of metrics
            Multigraph can be simplified in other ways
            User-related weights still have to be taken into account
    Tool
            Still in the prototype phase, but it has already provided useful
             results and allowed us to compare
                 metrics: different metrics provide very different results, which
                   may be more or less useful depending on the user's needs
                 tag behaviors: these differ depending on the tags' popularity and
                    on how people use them


Conclusions

    Ongoing work
            Clustering evaluation metrics to find best α
            Applications (e.g. for tag grouping and visualization*)
            User- and resource-specific projections**
    Future work
            Development of other plugins and front-end
            Play with user-related weights to focus on specific
             communities / filter spam

     * Mazzola, Eynard, Mazza: "GVIS: a framework for graphical mashups of heterogeneous sources to support data interpretation"

     ** Dattolo, Ferrara, Tasso: "On social semantic relations for recommending tags and resources using folksonomies"

Thank you!



             Thanks for your attention!

                        Questions?




toread top 20 related tags




@readit top 20 related tags




sf top 20 related tags




stream top 20 related tags




 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 

KĂŒrzlich hochgeladen (20)

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 

An integrated approach to discover tag semantics

  • 1. An Integrated Approach to Discover Tag Semantics
      SAC 2011, Web Technologies Track, March 24th 2011
      Antonina Dattolo, University of Udine, Department of Mathematics and Computer Science, antonina.dattolo@uniud.it
      Davide Eynard, USI - University of Lugano, ITC - Institute for Communication Technologies, davide.eynard@usi.ch
      Luca Mazzola, USI - University of Lugano, ITC - Institute for Communication Technologies, luca.mazzola@usi.ch
  • 2. Talk outline
      Properties of tags
      Folksonomies as edge-colored multigraphs
      Framework design and implementation
      Tests and evaluations
      Conclusions
  • 3. Tag properties
      Tags:
          are democratic and bottom-up (vs hierarchical)
          are inclusive and current
          follow desire lines
          are easy to use
  • 4. Tag cons
      Lexical ambiguities:
          Synonyms
              game and juego, or web2.0 and web_2
          Homonyms
              check as in chess and as in “to check” (polysemous)
              sf as scifi or san_francisco
          Basic level variations
              dog and poodle
      Ambiguities due to different purposes:
          blog used to tag blog software (e.g. WordPress), a blog service, a blog post, something to blog about later, ...
  • 5. Advantages of disambiguation
      Synonym detection:
          increases recall
          allows for better recommendation systems
      Homonym detection:
          makes it possible to find different contexts of use
          increases precision
      Basic level variation detection:
          identifies a hierarchy
          increases recall (e.g. by automatically searching for subclasses)
          provides a means to browse search results
  • 6. Approaches to tag disambiguation
      Roughly two main families of approaches:
          theoretical ones, aiming at describing the system as a whole
          more practical, ad-hoc ones (often addressing one or a few issues at a time)
      Our approach:
          main assumption: lexical ambiguities are not independent of each other
          solution based on
              a theoretical framework
              a modular, extensible analysis tool
  • 7. Folksonomies as edge-colored multigraphs
      Def. 1: An edge-colored multigraph is a triple $ECMG = (MG, C, c)$ where:
          $MG = (V, E, f)$ is a multigraph
          $C$ is a set of colors
          $c : E \to C$ is an assignment of colors to the multigraph edges
      Def. 2: A personomy related to user $u$ is an undirected edge-colored graph of color $C_u$: $P_u = (T, R, E, C_u)$
  • 8. Folksonomies as edge-colored multigraphs
      Def. 3: Given a set of users $U$ and the family of personomies $P_u$ ($u \in U$), a folksonomy is defined as the union of these personomies, $F = \bigcup_{u \in U} P_u$, that is, an edge-colored multigraph where:
          vertices are tags and resources
          edges are the tag assignments made on resources by each user
          every color is a different user
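The definitions above can be made concrete with a small sketch. It assumes a personomy is given as a set of (tag, resource) pairs per user; the `Edge` tuple, the `build_folksonomy` helper, and the sample users are hypothetical names introduced only for illustration, not part of the authors' prototype.

```python
from collections import namedtuple

# One colored edge: `user` plays the role of the color,
# (tag, resource) are the two endpoints.
Edge = namedtuple("Edge", ["tag", "resource", "user"])


def build_folksonomy(personomies):
    """Merge per-user personomies into one edge-colored multigraph.

    `personomies` maps a user to a set of (tag, resource) pairs.
    Returns the tag vertices, the resource vertices, and the edge list.
    Parallel edges are kept: they differ only by color (user).
    """
    tags, resources, edges = set(), set(), []
    for user, assignments in personomies.items():
        for tag, resource in sorted(assignments):
            tags.add(tag)
            resources.add(resource)
            edges.append(Edge(tag, resource, user))
    return tags, resources, edges


# Toy data (made up for the example).
personomies = {
    "alice": {("sf", "r1"), ("scifi", "r1")},
    "bob": {("sf", "r1"), ("san_francisco", "r2"), ("sf", "r2")},
}
tags, resources, edges = build_folksonomy(personomies)
```

Here the pair (sf, r1) is linked by two edges of different colors, one for each user who made that tag assignment.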
  • 9. First simplification step
      As we are only interested in relationships between tags, we perform two simplification steps on the edge-colored multigraph
      Step 1: colored edges are collapsed and replaced by weighted edges
          potentially, every color (user) can be assigned a different weight $w_u$
          the weight $w$ of the collapsed edge is the sum of all the $w_u$ linking the same two vertices
          when $w_u = 1$ for each user, $w$ is the number of times a tag has been used on a resource
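A minimal sketch of Step 1, assuming edges are stored as (tag, resource, user) triples. The function name `collapse_colors` and the default weight of 1 for users without an explicit weight are assumptions made for the example.

```python
from collections import defaultdict


def collapse_colors(edges, user_weights=None):
    """Collapse parallel colored edges into one weighted edge.

    `edges` is an iterable of (tag, resource, user) triples.
    `user_weights` optionally maps a user to its weight w_u;
    users not listed (or all users, if None) get w_u = 1.
    """
    weights = defaultdict(float)
    for tag, resource, user in edges:
        w_u = 1.0 if user_weights is None else user_weights.get(user, 1.0)
        weights[(tag, resource)] += w_u
    return dict(weights)


# Toy data (made up for the example).
edges = [
    ("sf", "r1", "alice"),
    ("sf", "r1", "bob"),
    ("scifi", "r1", "alice"),
    ("san_francisco", "r2", "bob"),
    ("sf", "r2", "bob"),
]
weighted = collapse_colors(edges)
downweighted = collapse_colors(edges, {"bob": 0.5})
```

With unit weights, `weighted[("sf", "r1")]` counts how many users tagged r1 with sf. Lowering a user's weight, as in `downweighted`, is one way to reduce the influence of that user on the collapsed graph.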
  • 10. Second simplification step
      Step 2:
          a link is created between $t_a$ and $t_b$ if they share a resource
          resource nodes are dropped
      Edge weights can be calculated in different ways:
          number of triples $(t_i, r, t_j)$ where $(t_i, r), (r, t_j) \in E$ (co-occurrence)
          normalized co-occurrence (e.g. using the Jaccard index)
          distributional measures
          custom metrics (e.g. the sum of the products of the connecting edges' weights)
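The first two weighting options can be sketched as follows, assuming the collapsed graph is available as a list of (tag, resource) edges. The helper names are hypothetical, and the Jaccard index is computed over the sets of resources each tag is attached to, which is one common way to normalize co-occurrence.

```python
from itertools import combinations


def resources_by_tag(tag_resource_edges):
    """Index the bipartite graph: tag -> set of resources it annotates."""
    index = {}
    for tag, resource in tag_resource_edges:
        index.setdefault(tag, set()).add(resource)
    return index


def cooccurrence(index, ta, tb):
    """Number of resources r such that (ta, r) and (r, tb) are both edges."""
    return len(index[ta] & index[tb])


def jaccard(index, ta, tb):
    """Co-occurrence normalized by the size of the union of resources."""
    union = index[ta] | index[tb]
    return len(index[ta] & index[tb]) / len(union) if union else 0.0


# Toy data (made up for the example).
edges = [
    ("sf", "r1"), ("scifi", "r1"),
    ("sf", "r2"), ("san_francisco", "r2"),
    ("sf", "r3"),
]
index = resources_by_tag(edges)

# Tag-tag graph: resource nodes are dropped, only linked pairs are kept.
tag_graph = {
    (ta, tb): cooccurrence(index, ta, tb)
    for ta, tb in combinations(sorted(index), 2)
    if cooccurrence(index, ta, tb) > 0
}
```

In this toy graph scifi and san_francisco never share a resource, so no edge is created between them, while both are linked to sf.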
  • 11. The whole process at a glance
      [Figure: the process in four numbered steps]
  • 12. System architecture
      Basic assumption:
          ambiguous tags should be related (either by co-occurrence or by presence in the same context)
      Three main components:
          tag analysis tool
          disambiguation tool
          front-end
  • 13. Synonym detection / 1
      Natural text:
          two words are considered synonyms if they can replace each other without affecting the meaning of a sentence
      vs. tag-based systems:
          two tags can be swapped within a “sentence” (i.e. a tagging action) without affecting its meaning when we have:
              variations of a word (e.g. blog, blogs, blogging)
              translations into other languages (e.g. game, juego, spiel)
              terms joined by non-alphabetic characters (e.g. web2, web_2)
      No “one size fits all” solution
  • 14. Synonym detection / 2
      A modular solution for synonym detection:
          different heuristics, each one returning the likelihood that two tags are synonyms
          results are weighted to obtain an overall likelihood
      Suggested heuristics:
          an edit distance such as Levenshtein's (normalized to account for short strings)
          synonym search in WordNet (good precision, low recall)
          online translation bases (top-down, such as dictionaries, or bottom-up, such as collaboratively grown vocabularies like Wikipedia)
          stemming with NLP algorithms
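The modular scheme above can be illustrated with two simple heuristics combined by a weighted average. This is only a sketch: the equal weights, the normalization by the longer string, and the second heuristic (comparing tags after stripping non-alphanumeric characters) are assumptions chosen for the example, not the plugins of the actual prototype.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def edit_similarity(a, b):
    """Edit distance turned into a [0, 1] score, normalized by length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def same_alnum_form(a, b):
    """1.0 if the tags match once non-alphanumeric characters are dropped."""
    def strip(s):
        return "".join(ch for ch in s.lower() if ch.isalnum())
    return 1.0 if strip(a) == strip(b) else 0.0


def synonym_likelihood(a, b, heuristics):
    """Weighted average of the scores returned by each heuristic."""
    total = sum(weight for _, weight in heuristics)
    return sum(weight * h(a, b) for h, weight in heuristics) / total


# (heuristic, weight) pairs; the weights are arbitrary for the example.
HEURISTICS = [(edit_similarity, 0.5), (same_alnum_form, 0.5)]
score = synonym_likelihood("web2", "web_2", HEURISTICS)
```

Further heuristics (a WordNet lookup, a translation base, a stemmer) would plug in as additional `(function, weight)` pairs with the same two-argument signature.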
  • 15. Homonym detection
      Check whether the tag t has been used in different contexts:
          cluster the tags related to t into groups
          the most frequent tags in these groups are used to name and disambiguate the contexts
      Clustering algorithm:
          an overlapping one, also used in social network analysis*
          a cluster is a subgraph G identified by maximizing a fitness function, where s is the strength of internal (in) or external (out) links and α is a tuning parameter
      * A. Lancichinetti et al.: “Detecting the overlapping and hierarchical community structure of complex networks”
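The fitness formula itself is not preserved in the slide text. In the cited paper by Lancichinetti et al. the fitness of a subgraph G is f_G = k_in / (k_in + k_out)^α; the sketch below assumes the same form with link strengths s in place of degrees k, and counts each internal link once from each of its endpoints. The graph representation and the function name are hypothetical.

```python
def community_fitness(graph, community, alpha=1.0):
    """Fitness of a candidate cluster: s_in / (s_in + s_out) ** alpha.

    `graph` maps each node to a dict {neighbor: weight} and is assumed
    symmetric. `community` is a set of nodes. s_in sums the weights of
    links that stay inside the community (each link is seen from both
    endpoints); s_out sums the weights of links leaving it.
    """
    s_in = s_out = 0.0
    for node in community:
        for neighbor, weight in graph[node].items():
            if neighbor in community:
                s_in += weight
            else:
                s_out += weight
    total = s_in + s_out
    return s_in / total ** alpha if total > 0 else 0.0


# Toy graph (made up): a triangle a-b-c plus a pendant node d attached to c.
graph = {
    "a": {"b": 1.0, "c": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"a": 1.0, "b": 1.0, "d": 1.0},
    "d": {"c": 1.0},
}
fitness_triangle = community_fitness(graph, {"a", "b", "c"})
fitness_strict = community_fitness(graph, {"a", "b", "c"}, alpha=2.0)
```

For a given community whose total strength exceeds 1, a larger α lowers the fitness, which is why α acts as a resolution knob: higher values favor smaller, tighter clusters.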
  • 16. Hierarchy detection
      Hierarchy is a specific case of basic level variation
      A possible approach: Hearst patterns on the Web, such as:
          C1 (and|or) other C2 (e.g. “poodles and other dogs”)
          C1 such as I (e.g. “cities such as San Francisco”)
          (note: the Ci are concepts, I is a concept instance)
      Search for the patterns, and use the number of results as an indicator of their strength
      Pros: the Web is as up-to-date as folksonomies
      Cons: $O(n^2)$ complexity, not really scalable
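A rough sketch of how such pattern queries could be generated and scored. The naive pluralization (appending "s") and the `count_hits` callable, which stands in for whatever search backend returns a result count, are assumptions; no real search API is called here.

```python
from itertools import permutations


def hearst_queries(narrow, broad):
    """Build quoted search phrases testing whether `narrow` is a kind of `broad`."""
    def plural(word):
        return word + "s"  # naive pluralization, good enough for the sketch
    return [
        f'"{plural(narrow)} and other {plural(broad)}"',
        f'"{plural(narrow)} or other {plural(broad)}"',
        f'"{plural(broad)} such as {narrow}"',
    ]


def hierarchy_strength(tags, count_hits):
    """Score every ordered pair of tags: n * (n - 1) pairs, hence O(n^2)."""
    scores = {}
    for narrow, broad in permutations(tags, 2):
        scores[(narrow, broad)] = sum(
            count_hits(query) for query in hearst_queries(narrow, broad)
        )
    return scores


# Stand-in for a search engine, with a made-up hit count.
fake_hits = {'"poodles and other dogs"': 120}
scores = hierarchy_strength(["dog", "poodle"], lambda q: fake_hits.get(q, 0))
```

Because every ordered pair needs its own set of queries, the number of lookups grows quadratically with the number of tags, which is the scalability concern noted above.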
  • 17. Prototype development
      Dataset:
          data from more than 30K users of http://www.delicious.com
          the system:unfiled tag was ignored
          for the calculation of Tag Context Similarity, only the top 10K tags were taken into account
      Prototype:
          tag analysis tool, calculating co-occurrence (CO), normalized co-occurrence (NCO), and Tag Context Similarity (TCS); it takes time, so it runs as a batch job and saves the matrices in the DB
          disambiguation tool with a homonym plugin, implementing the overlapping clustering algorithm, and Wikipedia-based synonym discovery
          the front-end is currently a command-line application
  • 18. Experimental results / 1
      System tested against three different sets of tags:
          the top 20 tags in delicious
          a group of tags known to be ambiguous (apple, cambridge, sf, stream, turkey, tube)
          a set of subjective tags, chosen among the most popular ones in delicious (cool, fun, funny, interesting, toread)
      For each tag:
          we calculated the top n (with n = 50) related tags with the three metrics (CO, NCO, TCS)
          we performed synonym and homonym analyses
  • 19. Experimental results / 2
      Tag Context Similarity already tends to return synonyms as top-related tags
          e.g. tags related to toread: read, read_later, to_read, etc.
      Analyzing a less popular synonym (@readit):
          9 out of the top 10 (and 17 out of the top 50) related tags are synonyms
          reason: as less popular tags are spread across fewer contexts, they tend to have a higher similarity with other less popular synonyms
      Wikipedia results:
          analyzing the 31 tags in our three sets, we got 215 new words
          of those 215, only 83 are valid tags in our delicious dataset
          of those 83, only 20 belong to the 10K most-used tags
          only 2 belong to the set of top-related tags of their English synonym
  • 20. Experimental results / 3
      Homonym detection:
          we tested the algorithm with different values of α
          meaningful results in a relatively short time (but we are working only on the top related tags...)
          limitation: the graphs of top related tags differ in connectivity, so no single value of α is good for all of them ($\alpha_{sf} = 1.4$, $\alpha_{stream} = 1.74$)
  • 21. Conclusions
      Model:
          flexible enough to support other kinds of metrics
          the multigraph can be simplified in other ways
          user-related weights still have to be taken into account
      Tool:
          still in the prototype phase, but it has already provided useful results and allowed us to compare
              metrics: different metrics provide very different results, which might be more or less useful according to the user's needs
              tag behaviors: they differ depending on the tags' popularity and the use people make of them
  • 22. Conclusions
      Ongoing work:
          clustering evaluation metrics to find the best α
          applications (e.g. tag grouping and visualization*)
          user- and resource-specific projections**
      Future work:
          development of other plugins and of the front-end
          experimenting with user-related weights to focus on specific communities or to filter spam
      * Mazzola, Eynard, Mazza: “GVIS: a framework for graphical mashups of heterogeneous sources to support data interpretation”
      ** Dattolo, Ferrara, Tasso: “On social semantic relations for recommending tags and resources using folksonomies”
  • 23. Thank you!
      Thanks for your attention! Questions?
  • 24. toread: top 20 related tags
  • 25. @readit: top 20 related tags
  • 26. sf: top 20 related tags
  • 27. stream: top 20 related tags