SlideShare a Scribd company logo
1 of 58
Download to read offline
Knowledge Management Institute




                    Extracting Semantics from Crowds

                     International Summer School on Semantic Computing (SSSC 2011)
                                                                             (SSSC’2011)
                                   August 8 - 12, 2011, Berkeley, California


                                                   Markus Strohmaier

                                                     Assistant Professor
                                            Graz University of Technology, Austria

                                                        Visiting Scientist
                                                      (XEROX) PARC, USA




                with collaborators D. Helic, C. Körner, J. Pöschko, R. Kern, C. Trattner, C. Wagner (Graz University of
                Technology, Austria), D. Benz, G. Stumme (U. of Kassel, Germany), A. Hotho (U. of Würzburg, Germany), S.
                             Hellmann, J. Lehmann, C. Stadler, J. Unbehauen (U. of Leipzig, Germany)
 Markus Strohmaier                                             2011
                                                                                                                           1
Knowledge Management Institute



                                  Egypt 2011
                                 Twitter Trends      place
                                                             event




                                            person




 Markus Strohmaier                   2011
                                                                2
Knowledge Management Institute




              Semantic Structures: The DMOZ Project




 Markus Strohmaier               2011
                                                      3
Knowledge Management Institute



                            Classification Systems
                     in Information and Library Sciences

                                 Usually produced and maintained by few
                                    (e.g.
                                    (e g dozens of) domain experts
                                                           experts.

                                 but: used by many (potentially millions).




                             Can a very large group (a crowd) of users
                             contribute to ontology engineering efforts?




 Markus Strohmaier                                 2011
                                                                             4
Knowledge Management Institute




                                           Objectives
            Provide (some) answers to the following questions:
            P   id (     )         t th f ll i          ti

            •    What is the difference between extracting semantics from text
                 vs. extracting semantics from crowds?
            •    Why should we study crowds and crowd behavior from a
                 semantic computing perspective?
                        ti        ti          ti ?
            •    How can semantics be extracted from online crowd behavior,
                 such as
                  •    …from Social Labeling
                  •    …from Social Tagging
                  •    …from Soc a Navigation
                          o Social a ga o
            •    What are the implications for semantic computing research?



 Markus Strohmaier                              2011
                                                                                 5
Knowledge Management Institute




     Extracting Semantics from …

     Motivation

     Social Labeling
     • Hashtag Semantics
     Social Tagging              Extraction
     • Tag Relatedness
     • Tag Generality
     • Tag Hierarchies
     Social Navigation
     • Navigational Knowledge    Interaction
     Engineering
       g       g

 Markus Strohmaier                2011
                                               6
Knowledge Management Institute




                       Ontology engineering vs. learning
            • O t l
              Ontology engineering
                          i    i
                  •    Expert-driven, small-scale
                  •    Knowledge and concept identification
                  •    Preliminary informal representation
                  •    Formalization (RDF, OWL, etc)
                  •    Evaluation d Maintenance
                       E l ti and M i t
            • Ontology learning
                  •    Data-driven, large-scale
                       Data driven large scale
                  •    Source selection and data sampling
                  •    Data exploration and probing
                  •    Concept and link learning
                  •    Evaluation and Updating


 Markus Strohmaier                             2011
                                                              7
Knowledge Management Institute




                                  Distributional
                                  Distrib tional Semantics
                                                        [Hovy 2011]

            • I recent years, people in NLP and IR h
              In      t             l i          d     have started
                                                              t t d
               using a different representation for all kinds of
               problems: word distributions                  [Harris 1954]:                                    “words
                                                                                                  that occur in the
                                                                                                  same contexts tend
                                                                                                  to   have    similar
                                                                                                  meanings“




            • Statistical NLP operates at word level, frequency distributions are
               used as the (de facto) semantics of a word
                             (        )
            • Not actual semantics, but captures something of contents
            • Not compositional: how to ‘add’ two distributions?
            • No explicit theory of semantics
 Markus Strohmaier                                        2011
                  [Hovy 2011] Based on slides by Ed Hovy, USC, Reasoning with Text Workshop 2011 (link)              8
Knowledge Management Institute




                                             Why
                                             Wh apple is similiar to pear
                                                     [based on slides by Hovy 2011 / Pantel 02]




    Markus Strohmaier                                                                    2011
Patrick Pantel and Dekang Lin. 2002. Discovering word senses from text. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining   9
(KDD '02). ACM, New York, NY, USA, 613-619.
Knowledge Management Institute




                           Why
                           Wh apple is not similiar to toothbr sh
                                                       toothbrush
                                                     [based on slides by Hovy 2011 / Pantel 02]




    Markus Strohmaier                                                                    2011
Patrick Pantel and Dekang Lin. 2002. Discovering word senses from text. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining   10
(KDD '02). ACM, New York, NY, USA, 613-619.
Knowledge Management Institute




                                                     Distributional
                                                     Distrib tional Semantics
                                                     [based on slides by Hovy 2011 / Pantel 02]




    Markus Strohmaier                                                                    2011
Patrick Pantel and Dekang Lin. 2002. Discovering word senses from text. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining   11
(KDD '02). ACM, New York, NY, USA, 613-619.
Knowledge Management Institute




                                 Hearst Patterns
                                 H    t P tt
      M. Hearst, Automatic Acquisition of Hyponyms from Large Text Corpora 1992


       (S1) The bow lute, such as the
       Bambara ndang, is plucked and has
       an individual
       curved neck for each string.




 Markus Strohmaier                      2011
                                                                                  12
Knowledge Management Institute




                             Ontology Learning From Text




 Markus Strohmaier                      2011
                                                           13
Knowledge Management Institute




                                                   Evaluation




              A SURVEY OF ONTOLOGY EVALUATION TECHNIQUES, Janez Brank, Marko Grobelnik, Dunja Mladenić,
              Proceedings of the Conference on Data Mining and Data Warehouses (SiKDD 2005), 2005
 Markus Strohmaier                                        2011
                                                                                                          14
Knowledge Management Institute



             Limitations of Knowledge Extraction from Text
                     (or why we want to study semantics in crowds)


            • Delay
                  •Time it t k t write (hi h quality) t t on a t i
                   Ti      takes to it (high    lit ) text     topic
            • Population bias and dependence
                  • Authors´ demographics
                    Authors
                  • study language of target groups (e.g. software developers)
            • Style
                y
                  •Internal monologue vs. conversation with others
            • Topical bias
                  • General Fiction, Mystery, Science Fiction, Romance, Humor, ..
                  • Books, newspapers, magazines [Brown Corpus]


 Markus Strohmaier                                        2011
             Brown Corpus: http://icame.uib.no/brown/bcm.html                       15
Knowledge Management Institute



                      Extracting Semantics from Crowds
                               (i.e. Social Media)




     How can we tap into users interaction with data and with each
     other for the extraction of semantic structures?

 Markus Strohmaier                    2011
                                                                     16
Knowledge Management Institute




                                 Crowdsourcing


            “Crowdsourcing represents the act of a company or
            institution taking a function once performed by
            employees and outsourcing it to an undefined (and
            generally large) network of people in the form of an
            open call.”
              p

                                              [J. Howe 2006]



 Markus Strohmaier                   2011
                                                                   17
Knowledge Management Institute



                                 Crowdsourcing Semantics:
                                   An Experiment (2008)




                                                      LoC
                                                      L C posted 3000 photos
                                                                 t d   h t
                                                      to flickr:

                                                      24 hours after launch:
                                                      • over 4,000 unique tags
                                                      • about 19,000 tags added

                                                      Output? noisy,
                How can this data contribute to the
                                                      unstructured, unqualified,
                induction of semantic structures?     weak semantics

 Markus Strohmaier                         2011
                                                                              18
Knowledge Management Institute




                      Extracting Semantics from Crowds
       Vision:
       Vi i
         Utilizing online behavior of crowds for
         the construction, maintenance and enrichment
              construction
         of large-scale semantic structures.

       Mission:
       •    to model behavior of large numbers (millions) of users online
       •    to develop techniques and algorithms that acquire semantic structures
            from users‘ interaction with data and with each other
       •    to influence user behavior and emerging semantics
       •    to evaluate results


 Markus Strohmaier                      2011
                                                                               19
Knowledge Management Institute



                                 Activities of Crowds Online
     Users engage in…

     Labeling                              Tagging                    Navigation




                        Can we tap into the outcome of these activities
                        to extract and evaluate semantic structures?
 Markus Strohmaier                            2011
                                                                                   20
Knowledge Management Institute




     Extracting Semantics from Crowds

     Motivation

     Social Labeling
     • Hashtag Semantics
     Social Tagging
     • Tag Relatedness
     • Tag Generality
     • Tag Hierarchies
     Social Navigation
     • Navigational Knowledge
     Engineering
       g       g

 Markus Strohmaier               2011
                                        21
Knowledge Management Institute



                                        Social Labeling
                                        Example: Twitter
                                 users label short messages
                                  with concepts (hashtags)




            Whether hashtags behave as strong identifiers, and if they could be
              mapped to concept identifiers in the Semantic Web (URIs)?
                                                     [Laniado and Mika 2010]


Craig s
Craig‘s talk this morning: Background knowledge for URLs to explore other URLs
                                                    URLs,

 Markus Strohmaier                                2011
                                                                                  22
Markus Strohmaier
                                                                                                                                                                          Knowledge Management Institute




               with
and C.Wagner
               2011 J. Pöschko
                                                                                                                                                       ConceptNet
                                                                                                                                                                                                           developed by




                                   C. Wagner, M. Strohmaie The Wisdom in Tw
                                      W                    er,               weetonomies: Acquiri   ing Latent Conceptua Structures from So
                                                                                                                           al                  ocial Awareness Stre
                                                                                                                                                                  eams,
23




                                   Semmantic Search 2010 Workshop (SemSearch2
                                                         W                    2010), in conjunction with the 19th Internatio
                                                                                                    w                      onal World Wide Web Conference (WWW20
                                                                                                                                               C                  010),
                                   Rale
                                      eigh, NC, USA, April 26-30, ACM, 2010.
                                                           2
Markus Strohmaier
                                                                                                                                                                          Knowledge Management Institute




                                                                      such a network?


                                                         p
               with
                                          X is a subconcept of Y
                                     e.g. X is more general than Y,


and C.Wagner
               2011 J. Pöschko
                                                                                                                                                       ConceptNet




                                                                      Can we qualify semantic associations in
                                                                                                                ent
                                                                                                                Ev-
                                                                                                                    ent
                                                                                                                    Ev-
                                                                                                                                                                                                           developed by




                                   C. Wagner, M. Strohmaie The Wisdom in Tw
                                      W                    er,               weetonomies: Acquiri   ing Latent Conceptua Structures from So
                                                                                                                           al                  ocial Awareness Stre
                                                                                                                                                                  eams,
24




                                   Semmantic Search 2010 Workshop (SemSearch2
                                                         W                    2010), in conjunction with the 19th Internatio
                                                                                                    w                      onal World Wide Web Conference (WWW20
                                                                                                                                               C                  010),
                                   Rale
                                      eigh, NC, USA, April 26-30, ACM, 2010.
                                                           2
Knowledge Management Institute




     Extracting Semantics from Crowds

     Motivation

     Social Labeling
     • Hashtag Semantics
     Social Tagging
     • Tag Relatedness
     • Tag Generality
     • Tag Hierarchies
     Social Navigation
     • Navigational Knowledge
     Engineering
       g       g

 Markus Strohmaier               2011
                                        25
Knowledge Management Institute



                                       Social Tagging
                                     Example: Delicious
                                                            Users label and categorize
                    Resources                             resources with concepts (tags)

                                                                               Tags

              Users
              U

        is a tuple F:= (U, T, R, Y) where
        • th th
             the three di j i t fi it sets U T R correspond t
                       disjoint, finite t U, T,           d to                       user 1

               –    a set of persons or users u ∈ U
               –    a set of tags t ∈ T and
               –    a set of resources or objects r ∈ R                      tag 1            res. 1
        •      Y ⊆ U ×T ×R, called set of tag assignments


 Markus Strohmaier                                 2011
                                                                                                   26
Knowledge Management Institute




                                                        Tag Relatedness
                                                                                                                                                ResCont

                                                                                                                                           t1      t2            t3



                                                                                                                                                 r1         r2
              Different
         tag similiarity
             measures
  applied to del.icio.us




 Markus Strohmaier                                                          2011
 C. Cattuto, D. Benz, A. Hotho, G. Stumme, Semantic Grounding of Tag Relatedness in Social Bookmarking Systems, 7th International Semantic Web Conference
                                                                                                                                                                  27
 ISWC2008, LNCS 5318, 615-631 (2008).
Knowledge Management Institute



                                               Intuition:
                                     Latent Hierachical Structures
                                              high centrality:
                                               more abstract
                                                                                                                        [
                                                                                                                        [Strohmaier et al 2011]
                                                                                                                                              ]




                                                                                              [Heyman and Garcia-Molina 2006]


                                            low centrality:
                                             more specific


    Note: Betweeness centrality is usually difficult to calculate. Calculating all shortest
       p
       path is usually O(n2). For all nodes O(n3). We use an approximation.
                     y ( )                   ( )                 pp

 Markus Strohmaier                                                            2011
 M. Strohmaier, D. Helic, D. Benz. Evaluation of Folksonomy Induction Algorithms. Submitted to the ACM Transactions on Intelligent Systems and Technology, (2011). 28
Knowledge Management Institute




                                                       Tag Abstractness

                                                                                                           frequency




                                                                               Del.icio.us




 with D. Benz
 et al.

 Markus Strohmaier                                                           2011
 D. Benz, C. Körner, A. Hotho, G. Stumme, M. Strohmaier, One Tag to bind them all: Measuring Term Abstractness in Social Metadata, Submitted to ESWC 2011.   29
Knowledge Management Institute



                                                Emergent semantics
                                           through hierarchical clustering
                                                                                                                                             Approaches:
                                                                                                                                             • k-means
                                                                                                                                             • Affinity propagation
                                                                                                                                             • Tag generality

                                                                                                                                             Applications:
                                                                                                                                             • user navigation
                                                                                                                                                       i ti
                                                                                                                                             • ontology learning
                                                                                                                                             • disambiguation
                                                            on a delicious dataset
                                                                                                                                             Evaluation:
                                                                                                                                             • Semantic grounding to
                                                                                                                                             Golden Standards (e.g.
                                                                                                                                             WordNet)
                                                                                                                                             W dN t)
    Study of different tagging systems:
    BibSonomy, CiteULike ,Delicious, Flickr, LastFM                                                                      Semantic and Pragmatic Quality
                                                                                                                                varying greatly.
                                                                                                                                    i       tl
    Markus Strohmaier                                                                      2011
Anon Plangprasopchok, Kristina Lerman, and Lise Getoor (2010), Integrating Structured Metadata via Relational Affinity Propagation, in Proceedings of AAAI Workshop on Statistical
Relational AI
                                                                                                                                                                                     30
Knowledge Management Institute



               Semantic Validation of Folksonomies
     Semantic Networks                                     Semantic Groundingg
     (Emergent)                                            (Golden Standard)
     via e.g. hierarchical clustering                      WordNet: a lexical DB for English

                                                                                  computers

                                                Map-                                Synset Hierarchy
       Programming                              ping
                                                                    programming
                             distance d1 = 1                                                distance
                                                                                             d2 = 2
                          Python
                                                              Design
                                                                   g            languages
                                                                                   g g
                                                              patterns
 abs. difference |d1 - d2| a                   Semantic
 simple p y for the q
     p proxy           quality
                             y                 grounding                 j
                                                                         java               python
 of emergent semantics
 Markus Strohmaier                                  2011
                                         Based on slides by A. Hotho                                 31
Knowledge Management Institute



                                              Semantically Evaluating
                                                 Tag Hierarchies

          [Dellschaft, Staab 2006]
                                                                                              TP..Taxonomic    TR..Taxonomic     TF..Taxonomic TO…Taxonomic
                                                                                              precision        recall            F measure     overlap




                                                                                                         Different folksonomy algorithms
       Holds with other knowledge bases (Yago, Wikitaxonomy) and datasets (bibsonomy,
       citeulike, lastfm less so)
 [Dellschaft, Staab 2006] On How to Perform a Gold Standard Based Evaluation of Ontology Learning (2006), by Klaas Dellschaft , Steffen Staab, In Proceedings of the
 5th International Semantic Web Conference (ISWC’06)

 Markus Strohmaier                                                            2011
 M. Strohmaier, D. Helic, D. Benz. Evaluation of Folksonomy Induction Algorithms. Submitted to the ACM Transactions on Intelligent Systems and Technology, (2011). 32
Knowledge Management Institute




                             Limitations and Opportunities
            Pragmatics i fl
            P     ti influence semantics:
                                    ti

            1. Tagging behavior effects preciseness of emerging
               semantics [Körner et al. 2010]
            2. Social t
            2 S i l networks i fl
                           k influence resulting semantic networks
                                              lti       ti   t  k
               [Wang and Groth 2010]
            3.
            3 Stream types effect stream semantics
               [Wagner and Strohmaier 2010]




 Markus Strohmaier                       2011
                                                                     33
Knowledge Management Institute




                                 Different motivations for tagging

                             User A                                     User B




      This                                                             These
    seems to                                                          seem to
      be a                                                              be
    category!                                                        keywords!




 Markus Strohmaier                            2011
                                                                                 34
Knowledge Management Institute



                                                 How do Semantics Emerge?
                        Are they Influenced by the Pragmatics of Tagging?
                Different styles of tagging:
                Diff    t t l      ft   i
                To categorize or to describe resources [Strohmaier et al. 2010]
                                                                                 Categorizer (C)                                           Describer (D)

                       Goal                                                       later browsing                                             later search
                       Change of vocabulary                                            costly                                                   cheap
                       Size f
                       Si of vocabulary
                                  b l                                                 limited
                                                                                      li it d                                                   Open
                                                                                                                                                O
                       Tags                                                         subjective                                                objective


                       Example tag clouds




                                      Semantic Assumption:
                Categorizers produce more precise emergent semantics than Describers.

 Markus Strohmaier                                                                      2011
 [Strohmaier et al. 2010] M. Strohmaier, C. Koerner, R. Kern, Why do Users Tag? Detecting Users' Motivation for Tagging in Social Tagging Systems, 4th International AAAI Conference on
                                                                                                                                                                                          35
 Weblogs and Social Media (ICWSM2010), Washington, DC, USA, May 23-26, 2010.
Knowledge Management Institute


                                                                           Example 1
                                                                               p
                        Describers outperform categorizers on precision of
                                    emergent tag semantics
                          Categorizers perform                                                                                              Describers perform
                           worse than random                                                                                                better than random

                                            Random                                                                                Random
                                              users                                                                                 users




                (C)                                                                                     (D)



C. Körner, D. Benz, A. Hotho, M. Strohmaier, G. Stumme, Stop Thinking, Start Tagging: Tag Semantics Emerge From Collaborative Verbosity, 19th International World Wide Web Conference
 Markus Strohmaier
(WWW2010), Raleigh, NC, USA, April 26-30, ACM, 2010.
                                                                                        2011
                                                                                                                                                                                   36
Knowledge Management Institute


                                                                       Example 2
                                                                           p
          Categorizers outperform describers on social classification
                                  accuracy




                                                                                                  Describers perform worse
                                                                                                     than Categorizers




                                                                                                          Supervised multi-class SVM




 Markus Strohmaier                                                                 2011
 A. Zubiaga, C. Koerner, and M. Strohmaier. Tags vs. shelves: From social tagging to social classification. In 22nd ACM SIGWEB Conference on Hypertext and Hypermedia   37
 (HT 2011), Eindhoven, Netherlands, ACM, 2011.
Knowledge Management Institute



                              Example 3
              Type of stream aggregation effects semantics
         … the type of aggregation:
            Hashtag stream aggregations are more robust against external
            disturbances than user list streams




 Hashtag Stream                         User List Stream
 OR(RUa)S(Rh)                           OR(RUa)S(RUL)
 Markus Strohmaier                    2011
                                                                           38
Knowledge Management Institute




     Extracting Semantics from Crowds

     Motivation

     Social Labeling
     • Hashtag Semantics
     Social Tagging
     • Tag Relatedness
     • Tag Generality
     • Tag Hierarchies
     Social Navigation
     • Navigational Knowledge
     Engineering
       g       g

 Markus Strohmaier               2011
                                        39
Knowledge Management Institute




                                 Social Navigation
                                                 Users search for and navigate
                                                  to certain sets of resources




                   Can we produce ontological constructs as a
                   byproduct of navigational activities of users ?
                   b    d t f      i ti    l ti iti      f
                   Navigational Knowledge Engineering:
                    A light-weight methodology for low-cost
                knowledge engineering by a massive user base.

 Markus Strohmaier                      2011
                                                                                 40
Knowledge Management Institute




                                 Prototype: HANNE
            HANNE: Holistic Application for Navigational
            HANNE H li ti A li ti f N i ti             l
               Knowledge Engineering
            http://aksw.org/Projects/NKE
            http://aksw org/Projects/NKE




            with S. Hellmann, J. Lehmann, C. Stadler, J. Unbehauen, University of Leipzig
 Markus Strohmaier                          2011
                                                                                       41
Knowledge Management Institute




                   Navigational Knowledge Engineering
                                                      http://aksw.org/Projects/NKE

            Example: Extending DBPedia with NKE




 Markus Strohmaier                                                       2011
                  S. Hellmann, J. Lehmann, C. Stadler, J. Unbehauen, M. Strohmaier, Navigational Knowledge Engineering, (in preparation)   42
Knowledge Management Institute




                   Navigational Knowledge Engineering




                                        Choose initial positive and
                                        negative examples from the
                                        search result.
                                        Here we are looking for
                                        Football Clubs in Saxony, a
                                        region in German
                                                  Germany.



 Markus Strohmaier               2011
                                            http://aksw.org/Projects/NKE
                                                                           43
Knowledge Management Institute




                 The Problem of Inductive Concept Learning

            U i a universal set of objects
              is     i     l t f bj t
            C is a concept: a subset of objects in U

            To learn a concept C means to learn to recognize
              objects in C t be able t t ll whether
                bj t i C, to b bl to tell h th



            Example:
            U may be the set of all patients in a register, and
                                f
            The set of all patients having a particular disease

 Markus Strohmaier                   2011
                                                                  44
Knowledge Management Institute




                                  Inductive Concept Learning




            To test the coverage, the function



            Returns the value true if e is covered by H, and false otherwise.


 Markus Strohmaier                                        2011
                Nada Lavrac and Saso Dzeroski, Inductive Logic Programming: Techniques and Applications, 1994   45
Knowledge Management Institute



                                 HANNE &
                 The Problem of Inductive Concept Learning

            given:
            • Background knowledge (OWL/DL knowledge base)
            • positive and negative examples of a concept
              (instances)
              (i t       )

            find:
            fi d
            • A hypothesis (expressed as OWL class descriptions)
               that covers all positive and no negative examples



 Markus Strohmaier                 2011
                                                                   46
Knowledge Management Institute




                                                                                                                           Based on the
                                                                                                                           extension, ICL
                                                                                                                           searches for
                                                                                                                           suitable
                                                                                                                           hypotheses.




 Markus Strohmaier                                                       2011
                  S. Hellmann, J. Lehmann, C. Stadler, J. Unbehauen, M. Strohmaier, Navigational Knowledge Engineering, (in preparation)   47
Knowledge Management Institute




                                                  Concepts Learned




         • The Learned C
                       Concept is shown in Manchester O    OWL SSyntax
         • The user can retain the concept for later retrieval.
         • Saved concepts are displayed as social navigation
           suggestions. Can be used to enrich existing knowledge
           base.
           base
 Markus Strohmaier                                                       2011
                  S. Hellmann, J. Lehmann, C. Stadler, J. Unbehauen, M. Strohmaier, Navigational Knowledge Engineering, (in preparation)   48
Knowledge Management Institute




                                                                      Result
                                                                                  Useful properties:
                                                                                  • Biased towards high recall

                                                                                  • Scales well: Number of training
                                                                                  examples is more important than the
                                                                                  size of the background knowledge




                                                                                          With only 2 positives and 4 negatives,
                                                                                          it is possible to find 13 more
                                                                                          instances, which are f tb ll clubs
                                                                                          i t           hi h      football l b
                                                                                          situated close to Saxony, Germany




 Markus Strohmaier                                                       2011
                  S. Hellmann, J. Lehmann, C. Stadler, J. Unbehauen, M. Strohmaier, Navigational Knowledge Engineering, (in preparation)   49
Knowledge Management Institute
                                 http://aksw.org/Projects/NKE


                                   Mock up (1)




 Markus Strohmaier                       2011
                                                                50
Knowledge Management Institute
                                 http://aksw.org/Projects/NKE


                                   Mock up (2)




 Markus Strohmaier                       2011
                                                                51
Knowledge Management Institute




     Extracting Semantics from …

     Motivation

     Social Labeling
     • Hashtag Semantics
     Social Tagging
     • Tag Relatedness
     • Tag Generality
     • Tag Hierarchies
     Social Navigation
     • Navigational Knowledge
     Engineering
       g       g                        Conclusions
 Markus Strohmaier               2011
                                                      52
Knowledge Management Institute




                                           Objectives
            Provide (some) answers to the following questions:
            P   id (     )         t th f ll i          ti

            •    What is the difference between extracting semantics from text
                 vs. extracting semantics from crowds?
            •    Why should we study crowds and crowd behavior from a
                 semantic computing perspective?
                        ti        ti          ti ?
            •    How can semantics be extracted from online crowd behavior,
                 such as
                  •    …from Social Labeling
                  •    …from Social Tagging
                  •    …from Soc a Navigation
                          o Social a ga o
            •    What are the implications for semantic computing research?



 Markus Strohmaier                              2011
                                                                                 53
Knowledge Management Institute




               Social Computation for the Web of Data

            Crowds provide some advantages over semantic extraction from text.

            It is through the process of social computation, i.e.
                  the combination of social behavior and algorithmic computation,
                  that
                  th t semantic structures emerge.
                             ti t t

            In order to extract semantics from crowds understanding users
                                               crowds,               users’
                behavior and its impact on emerging semantic structures is
                important.

            Shaping or classifying certain users’ behavior are ways of
               increasing p
                        g preciseness of emerging semantic structures.
                                             g g

 Markus Strohmaier                        2011
                                                                                    54
Knowledge Management Institute




                                          Crowd S
                                          C   d Semantics
                                                     ti
                                                   based on [Hovy 2011]




 Markus Strohmaier                                        2011
                  [Hovy 2011] Based on slides by Ed Hovy, USC, Reasoning with Text Workshop 2011 (link)   55
Knowledge Management Institute




                                 Summary
            We
            W can observe that:
                      b       th t
            • semantic structures can be obtained as a byproduct
              of online crowd behavior (l b li t i navigation)
                                       (labeling, tagging, i ti )
            • these structures can approximate structures in
              reference knowledge bases (DBPedia WordNet etc)
                                           (DBPedia, WordNet,
            but:
            • pragmatics influences resulting semantics
            • semantic preciseness remains a challenge




 Markus Strohmaier                 2011
                                                                    56
Knowledge Management Institute
 An Agenda for Semantic Computing
 Research s
 R What‘s the knowledge of SAS?
     What
        h




                                                                                                   10 August, 2011
                                                                                                    0
                        Web Resources
                                         Natural Language Constructs
              Real-World Happenings
 Utilizing online behavior of crowds for




                                                                                                       57
 the construction, maintenance and enrichment of
 large-scale semantic structures
 Markus Strohmaier                                                     2011
                                                 http://www.flickr.com/photos/waldoj/722508166/
 http://www.flickr.com/photos/matthewfield/2306001896/                                            57
Knowledge Management Institute




                                             Thank You.



                                        Acknowledgements

                                         co-authors and collaborators

                            H.P. Grahsl, D. Helic, C. Körner, R. Kern, C. Trattner (TU Graz)
                                           D. Benz, G. Stumme (U. Kassel)
                                               A. Hotho (U. Würzburg)
                    S. H ll
                    S Hellmann, J. Lehmann, C. Stadler, J. Unbehauen (U. of Leipzig, Germany)
                                  J L h         C St dl J U b h            (U f L i i G      )




 Markus Strohmaier                                    2011
                                                                                                 58

More Related Content

Similar to Extracting semantics from crowds

Multidisciplinarity vs. Multivocality, the case of “Learning Analytics"
Multidisciplinarity vs. Multivocality, the case of “Learning Analytics"Multidisciplinarity vs. Multivocality, the case of “Learning Analytics"
Multidisciplinarity vs. Multivocality, the case of “Learning Analytics"Nicolas Balacheff
 
Representation of ontology by Classified Interrelated object model
Representation of ontology by Classified Interrelated object modelRepresentation of ontology by Classified Interrelated object model
Representation of ontology by Classified Interrelated object modelMihika Shah
 
Objective Fiction, i-semantics keynote
Objective Fiction, i-semantics keynoteObjective Fiction, i-semantics keynote
Objective Fiction, i-semantics keynoteAldo Gangemi
 
Open hpi semweb-06-part2
Open hpi semweb-06-part2Open hpi semweb-06-part2
Open hpi semweb-06-part2Nadine Ludwig
 
Universal design for learners
Universal design for learnersUniversal design for learners
Universal design for learnerssterrone
 
Learning Analytics – Opportunities for ISO/IEC JTC 1/SC36 standardisation
Learning Analytics – Opportunities for ISO/IEC JTC 1/SC36 standardisationLearning Analytics – Opportunities for ISO/IEC JTC 1/SC36 standardisation
Learning Analytics – Opportunities for ISO/IEC JTC 1/SC36 standardisationTore Hoel
 
Universal design for learners
Universal design for learnersUniversal design for learners
Universal design for learnerssterrone
 
LAK13: Epistemology, Pedagogy, Assessment and Learning Analytics
LAK13: Epistemology, Pedagogy, Assessment and Learning AnalyticsLAK13: Epistemology, Pedagogy, Assessment and Learning Analytics
LAK13: Epistemology, Pedagogy, Assessment and Learning AnalyticsSimon Knight
 
LAK13: Epistemology, Pedagogy, Assessment and Learning Analytics
LAK13: Epistemology, Pedagogy, Assessment and Learning AnalyticsLAK13: Epistemology, Pedagogy, Assessment and Learning Analytics
LAK13: Epistemology, Pedagogy, Assessment and Learning AnalyticsSoLARTalks
 
Lecture: Semantic Word Clouds
Lecture: Semantic Word CloudsLecture: Semantic Word Clouds
Lecture: Semantic Word CloudsMarina Santini
 
ESWC SS 2012 - Tuesday Tutorial Elena Simperl: Creating and Using Ontologies
ESWC SS 2012 - Tuesday Tutorial Elena Simperl: Creating and Using OntologiesESWC SS 2012 - Tuesday Tutorial Elena Simperl: Creating and Using Ontologies
ESWC SS 2012 - Tuesday Tutorial Elena Simperl: Creating and Using Ontologieseswcsummerschool
 
Coast to Coast March 2013
Coast to Coast March 2013Coast to Coast March 2013
Coast to Coast March 2013Brian Fisher
 
types of analyses of research
types of analyses of researchtypes of analyses of research
types of analyses of researchssuser1ee781
 
Personality Assessment using Twitter Tweets
Personality Assessment using Twitter TweetsPersonality Assessment using Twitter Tweets
Personality Assessment using Twitter TweetsNadeem Ahmad Ch
 
DH Tools Workshop #1: Text Analysis
DH Tools Workshop #1:  Text AnalysisDH Tools Workshop #1:  Text Analysis
DH Tools Workshop #1: Text Analysiscjbuckner
 
Jana Diesner, "Words and Networks: Considering the Content of Text Data for N...
Jana Diesner, "Words and Networks: Considering the Content of Text Data for N...Jana Diesner, "Words and Networks: Considering the Content of Text Data for N...
Jana Diesner, "Words and Networks: Considering the Content of Text Data for N...summersocialwebshop
 
Using the Framework of Networks to Enhance Learning and Social Interactions
Using the Framework of Networks to Enhance Learning and Social InteractionsUsing the Framework of Networks to Enhance Learning and Social Interactions
Using the Framework of Networks to Enhance Learning and Social InteractionsDmitry Paranyushkin
 

Similar to Extracting semantics from crowds (20)

Multidisciplinarity vs. Multivocality, the case of “Learning Analytics"
Multidisciplinarity vs. Multivocality, the case of “Learning Analytics"Multidisciplinarity vs. Multivocality, the case of “Learning Analytics"
Multidisciplinarity vs. Multivocality, the case of “Learning Analytics"
 
Representation of ontology by Classified Interrelated object model
Representation of ontology by Classified Interrelated object modelRepresentation of ontology by Classified Interrelated object model
Representation of ontology by Classified Interrelated object model
 
Objective Fiction, i-semantics keynote
Objective Fiction, i-semantics keynoteObjective Fiction, i-semantics keynote
Objective Fiction, i-semantics keynote
 
Ontology
OntologyOntology
Ontology
 
Open hpi semweb-06-part2
Open hpi semweb-06-part2Open hpi semweb-06-part2
Open hpi semweb-06-part2
 
Universal design for learners
Universal design for learnersUniversal design for learners
Universal design for learners
 
Learning Analytics – Opportunities for ISO/IEC JTC 1/SC36 standardisation
Learning Analytics – Opportunities for ISO/IEC JTC 1/SC36 standardisationLearning Analytics – Opportunities for ISO/IEC JTC 1/SC36 standardisation
Learning Analytics – Opportunities for ISO/IEC JTC 1/SC36 standardisation
 
Contested Modelling
Contested ModellingContested Modelling
Contested Modelling
 
Universal design for learners
Universal design for learnersUniversal design for learners
Universal design for learners
 
LAK13: Epistemology, Pedagogy, Assessment and Learning Analytics
LAK13: Epistemology, Pedagogy, Assessment and Learning AnalyticsLAK13: Epistemology, Pedagogy, Assessment and Learning Analytics
LAK13: Epistemology, Pedagogy, Assessment and Learning Analytics
 
LAK13: Epistemology, Pedagogy, Assessment and Learning Analytics
LAK13: Epistemology, Pedagogy, Assessment and Learning AnalyticsLAK13: Epistemology, Pedagogy, Assessment and Learning Analytics
LAK13: Epistemology, Pedagogy, Assessment and Learning Analytics
 
Lecture: Semantic Word Clouds
Lecture: Semantic Word CloudsLecture: Semantic Word Clouds
Lecture: Semantic Word Clouds
 
ESWC SS 2012 - Tuesday Tutorial Elena Simperl: Creating and Using Ontologies
ESWC SS 2012 - Tuesday Tutorial Elena Simperl: Creating and Using OntologiesESWC SS 2012 - Tuesday Tutorial Elena Simperl: Creating and Using Ontologies
ESWC SS 2012 - Tuesday Tutorial Elena Simperl: Creating and Using Ontologies
 
Coast to Coast March 2013
Coast to Coast March 2013Coast to Coast March 2013
Coast to Coast March 2013
 
types of analyses of research
types of analyses of researchtypes of analyses of research
types of analyses of research
 
The impact of standardized terminologies and domain-ontologies in multilingua...
The impact of standardized terminologies and domain-ontologies in multilingua...The impact of standardized terminologies and domain-ontologies in multilingua...
The impact of standardized terminologies and domain-ontologies in multilingua...
 
Personality Assessment using Twitter Tweets
Personality Assessment using Twitter TweetsPersonality Assessment using Twitter Tweets
Personality Assessment using Twitter Tweets
 
DH Tools Workshop #1: Text Analysis
DH Tools Workshop #1:  Text AnalysisDH Tools Workshop #1:  Text Analysis
DH Tools Workshop #1: Text Analysis
 
Jana Diesner, "Words and Networks: Considering the Content of Text Data for N...
Jana Diesner, "Words and Networks: Considering the Content of Text Data for N...Jana Diesner, "Words and Networks: Considering the Content of Text Data for N...
Jana Diesner, "Words and Networks: Considering the Content of Text Data for N...
 
Using the Framework of Networks to Enhance Learning and Social Interactions
Using the Framework of Networks to Enhance Learning and Social InteractionsUsing the Framework of Networks to Enhance Learning and Social Interactions
Using the Framework of Networks to Enhance Learning and Social Interactions
 

Recently uploaded

How to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseHow to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseCeline George
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4JOYLYNSAMANIEGO
 
How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17Celine George
 
Mental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young mindsMental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young mindsPooky Knightsmith
 
Using Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea DevelopmentUsing Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea Developmentchesterberbo7
 
4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptxmary850239
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfVanessa Camilleri
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxlancelewisportillo
 
Congestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationCongestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationdeepaannamalai16
 
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQ-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQuiz Club NITW
 
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...Nguyen Thanh Tu Collection
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptxmary850239
 
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDecoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDhatriParmar
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
Measures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataMeasures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataBabyAnnMotar
 
Scientific Writing :Research Discourse
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research DiscourseAnita GoswamiGiri
 
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
Unraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptxUnraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptx
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptxDhatriParmar
 
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxBIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxSayali Powar
 

Recently uploaded (20)

How to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseHow to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 Database
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 
How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17
 
Mental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young mindsMental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young minds
 
Using Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea DevelopmentUsing Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea Development
 
4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdf
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
 
Paradigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTAParadigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTA
 
Congestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationCongestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentation
 
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptxINCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
 
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQ-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
 
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx
 
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDecoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
Measures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataMeasures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped data
 
Scientific Writing :Research Discourse
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research Discourse
 
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
Unraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptxUnraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptx
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
 
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxBIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
 

Extracting semantics from crowds

  • 1. Knowledge Management Institute Extracting Semantics from Crowds International Summer School on Semantic Computing (SSSC 2011) (SSSC’2011) August 8 - 12, 2011, Berkeley, California Markus Strohmaier Assistant Professor Graz University of Technology, Austria Visiting Scientist (XEROX) PARC, USA with collaborators D. Helic, C. Körner, J. Pöschko, R. Kern, C. Trattner, C. Wagner (Graz University of Technology, Austria), D. Benz, G. Stumme (U. of Kassel, Germany), A. Hotho (U. of Würzburg, Germany), S. Hellmann, J. Lehmann, C. Stadler, J. Unbehauen (U. of Leipzig, Germany) Markus Strohmaier 2011 1
  • 2. Knowledge Management Institute Egypt 2011 Twitter Trends place event person Markus Strohmaier 2011 2
  • 3. Knowledge Management Institute Semantic Structures: The DMOZ Project Markus Strohmaier 2011 3
  • 4. Knowledge Management Institute Classification Systems in Information and Library Sciences Usually produced and maintained by few (e.g. (e g dozens of) domain experts experts. but: used by many (potentially millions). Can a very large group (a crowd) of users contribute to ontology engineering efforts? Markus Strohmaier 2011 4
  • 5. Knowledge Management Institute Objectives Provide (some) answers to the following questions: P id ( ) t th f ll i ti • What is the difference between extracting semantics from text vs. extracting semantics from crowds? • Why should we study crowds and crowd behavior from a semantic computing perspective? ti ti ti ? • How can semantics be extracted from online crowd behavior, such as • …from Social Labeling • …from Social Tagging • …from Soc a Navigation o Social a ga o • What are the implications for semantic computing research? Markus Strohmaier 2011 5
  • 6. Knowledge Management Institute Extracting Semantics from … Motivation Social Labeling • Hashtag Semantics Social Tagging Extraction • Tag Relatedness • Tag Generality • Tag Hierarchies Social Navigation • Navigational Knowledge Interaction Engineering g g Markus Strohmaier 2011 6
  • 7. Knowledge Management Institute Ontology engineering vs. learning • O t l Ontology engineering i i • Expert-driven, small-scale • Knowledge and concept identification • Preliminary informal representation • Formalization (RDF, OWL, etc) • Evaluation d Maintenance E l ti and M i t • Ontology learning • Data-driven, large-scale Data driven large scale • Source selection and data sampling • Data exploration and probing • Concept and link learning • Evaluation and Updating Markus Strohmaier 2011 7
  • 8. Knowledge Management Institute Distributional Distrib tional Semantics [Hovy 2011] • I recent years, people in NLP and IR h In t l i d have started t t d using a different representation for all kinds of problems: word distributions [Harris 1954]: “words that occur in the same contexts tend to have similar meanings“ • Statistical NLP operates at word level, frequency distributions are used as the (de facto) semantics of a word ( ) • Not actual semantics, but captures something of contents • Not compositional: how to ‘add’ two distributions? • No explicit theory of semantics Markus Strohmaier 2011 [Hovy 2011] Based on slides by Ed Hovy, USC, Reasoning with Text Workshop 2011 (link) 8
  • 9. Knowledge Management Institute Why Wh apple is similiar to pear [based on slides by Hovy 2011 / Pantel 02] Markus Strohmaier 2011 Patrick Pantel and Dekang Lin. 2002. Discovering word senses from text. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining 9 (KDD '02). ACM, New York, NY, USA, 613-619.
  • 10. Knowledge Management Institute Why Wh apple is not similiar to toothbr sh toothbrush [based on slides by Hovy 2011 / Pantel 02] Markus Strohmaier 2011 Patrick Pantel and Dekang Lin. 2002. Discovering word senses from text. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining 10 (KDD '02). ACM, New York, NY, USA, 613-619.
  • 11. Knowledge Management Institute Distributional Distrib tional Semantics [based on slides by Hovy 2011 / Pantel 02] Markus Strohmaier 2011 Patrick Pantel and Dekang Lin. 2002. Discovering word senses from text. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining 11 (KDD '02). ACM, New York, NY, USA, 613-619.
  • 12. Knowledge Management Institute Hearst Patterns H t P tt M. Hearst, Automatic Acquisition of Hyponyms from Large Text Corpora 1992 (S1) The bow lute, such as the Bambara ndang, is plucked and has an individual curved neck for each string. Markus Strohmaier 2011 12
  • 13. Knowledge Management Institute Ontology Learning From Text Markus Strohmaier 2011 13
  • 14. Knowledge Management Institute Evaluation A SURVEY OF ONTOLOGY EVALUATION TECHNIQUES, Janez Brank, Marko Grobelnik, Dunja Mladenić, Proceedings of the Conference on Data Mining and Data Warehouses (SiKDD 2005), 2005 Markus Strohmaier 2011 14
  • 15. Knowledge Management Institute Limitations of Knowledge Extraction from Text (or why we want to study semantics in crowds) • Delay •Time it t k t write (hi h quality) t t on a t i Ti takes to it (high lit ) text topic • Population bias and dependence • Authors´ demographics Authors • study language of target groups (e.g. software developers) • Style y •Internal monologue vs. conversation with others • Topical bias • General Fiction, Mystery, Science Fiction, Romance, Humor, .. • Books, newspapers, magazines [Brown Corpus] Markus Strohmaier 2011 Brown Corpus: http://icame.uib.no/brown/bcm.html 15
  • 16. Knowledge Management Institute Extracting Semantics from Crowds (i.e. Social Media) How can we tap into users interaction with data and with each other for the extraction of semantic structures? Markus Strohmaier 2011 16
  • 17. Knowledge Management Institute Crowdsourcing “Crowdsourcing represents the act of a company or institution taking a function once performed by employees and outsourcing it to an undefined (and generally large) network of people in the form of an open call.” p [J. Howe 2006] Markus Strohmaier 2011 17
  • 18. Knowledge Management Institute Crowdsourcing Semantics: An Experiment (2008) LoC L C posted 3000 photos t d h t to flickr: 24 hours after launch: • over 4,000 unique tags • about 19,000 tags added Output? noisy, How can this data contribute to the unstructured, unqualified, induction of semantic structures? weak semantics Markus Strohmaier 2011 18
  • 19. Knowledge Management Institute Extracting Semantics from Crowds Vision: Vi i Utilizing online behavior of crowds for the construction, maintenance and enrichment construction of large-scale semantic structures. Mission: • to model behavior of large numbers (millions) of users online • to develop techniques and algorithms that acquire semantic structures from users‘ interaction with data and with each other • to influence user behavior and emerging semantics • to evaluate results Markus Strohmaier 2011 19
  • 20. Knowledge Management Institute Activities of Crowds Online Users engage in… Labeling Tagging Navigation Can we tap into the outcome of these activities to extract and evaluate semantic structures? Markus Strohmaier 2011 20
  • 21. Knowledge Management Institute Extracting Semantics from Crowds Motivation Social Labeling • Hashtag Semantics Social Tagging • Tag Relatedness • Tag Generality • Tag Hierarchies Social Navigation • Navigational Knowledge Engineering g g Markus Strohmaier 2011 21
  • 22. Knowledge Management Institute Social Labeling Example: Twitter users label short messages with concepts (hashtags) Whether hashtags behave as strong identifiers, and if they could be mapped to concept identifiers in the Semantic Web (URIs)? [Laniado and Mika 2010] Craig s Craig‘s talk this morning: Background knowledge for URLs to explore other URLs URLs, Markus Strohmaier 2011 22
  • 23. Markus Strohmaier Knowledge Management Institute with and C.Wagner 2011 J. Pöschko ConceptNet developed by C. Wagner, M. Strohmaie The Wisdom in Tw W er, weetonomies: Acquiri ing Latent Conceptua Structures from So al ocial Awareness Stre eams, 23 Semmantic Search 2010 Workshop (SemSearch2 W 2010), in conjunction with the 19th Internatio w onal World Wide Web Conference (WWW20 C 010), Rale eigh, NC, USA, April 26-30, ACM, 2010. 2
  • 24. Markus Strohmaier Knowledge Management Institute such a network? p with X is a subconcept of Y e.g. X is more general than Y, and C.Wagner 2011 J. Pöschko ConceptNet Can we qualify semantic associations in ent Ev- ent Ev- developed by C. Wagner, M. Strohmaie The Wisdom in Tw W er, weetonomies: Acquiri ing Latent Conceptua Structures from So al ocial Awareness Stre eams, 24 Semmantic Search 2010 Workshop (SemSearch2 W 2010), in conjunction with the 19th Internatio w onal World Wide Web Conference (WWW20 C 010), Rale eigh, NC, USA, April 26-30, ACM, 2010. 2
  • 25. Knowledge Management Institute Extracting Semantics from Crowds Motivation Social Labeling • Hashtag Semantics Social Tagging • Tag Relatedness • Tag Generality • Tag Hierarchies Social Navigation • Navigational Knowledge Engineering g g Markus Strohmaier 2011 25
  • 26. Knowledge Management Institute Social Tagging Example: Delicious Users label and categorize Resources resources with concepts (tags) Tags Users U is a tuple F:= (U, T, R, Y) where • th th the three di j i t fi it sets U T R correspond t disjoint, finite t U, T, d to user 1 – a set of persons or users u ∈ U – a set of tags t ∈ T and – a set of resources or objects r ∈ R tag 1 res. 1 • Y ⊆ U ×T ×R, called set of tag assignments Markus Strohmaier 2011 26
  • 27. Knowledge Management Institute Tag Relatedness ResCont t1 t2 t3 r1 r2 Different tag similiarity measures applied to del.icio.us Markus Strohmaier 2011 C. Cattuto, D. Benz, A. Hotho, G. Stumme, Semantic Grounding of Tag Relatedness in Social Bookmarking Systems, 7th International Semantic Web Conference 27 ISWC2008, LNCS 5318, 615-631 (2008).
  • 28. Knowledge Management Institute Intuition: Latent Hierachical Structures high centrality: more abstract [ [Strohmaier et al 2011] ] [Heyman and Garcia-Molina 2006] low centrality: more specific Note: Betweeness centrality is usually difficult to calculate. Calculating all shortest p path is usually O(n2). For all nodes O(n3). We use an approximation. y ( ) ( ) pp Markus Strohmaier 2011 M. Strohmaier, D. Helic, D. Benz. Evaluation of Folksonomy Induction Algorithms. Submitted to the ACM Transactions on Intelligent Systems and Technology, (2011). 28
  • 29. Knowledge Management Institute Tag Abstractness frequency Del.icio.us with D. Benz et al. Markus Strohmaier 2011 D. Benz, C. Körner, A. Hotho, G. Stumme, M. Strohmaier, One Tag to bind them all: Measuring Term Abstractness in Social Metadata, Submitted to ESWC 2011. 29
  • 30. Knowledge Management Institute Emergent semantics through hierarchical clustering Approaches: • k-means • Affinity propagation • Tag generality Applications: • user navigation i ti • ontology learning • disambiguation on a delicious dataset Evaluation: • Semantic grounding to Golden Standards (e.g. WordNet) W dN t) Study of different tagging systems: BibSonomy, CiteULike ,Delicious, Flickr, LastFM Semantic and Pragmatic Quality varying greatly. i tl Markus Strohmaier 2011 Anon Plangprasopchok, Kristina Lerman, and Lise Getoor (2010), Integrating Structured Metadata via Relational Affinity Propagation, in Proceedings of AAAI Workshop on Statistical Relational AI 30
  • 31. Knowledge Management Institute Semantic Validation of Folksonomies Semantic Networks Semantic Groundingg (Emergent) (Golden Standard) via e.g. hierarchical clustering WordNet: a lexical DB for English computers Map- Synset Hierarchy Programming ping programming distance d1 = 1 distance d2 = 2 Python Design g languages g g patterns abs. difference |d1 - d2| a Semantic simple p y for the q p proxy quality y grounding j java python of emergent semantics Markus Strohmaier 2011 Based on slides by A. Hotho 31
  • 32. Knowledge Management Institute Semantically Evaluating Tag Hierarchies [Dellschaft, Staab 2006] TP..Taxonomic TR..Taxonomic TF..Taxonomic TO…Taxonomic precision recall F measure overlap Different folksonomy algorithms Holds with other knowledge bases (Yago, Wikitaxonomy) and datasets (bibsonomy, citeulike, lastfm less so) [Dellschaft, Staab 2006] On How to Perform a Gold Standard Based Evaluation of Ontology Learning (2006), by Klaas Dellschaft , Steffen Staab, In Proceedings of the 5th International Semantic Web Conference (ISWC’06) Markus Strohmaier 2011 M. Strohmaier, D. Helic, D. Benz. Evaluation of Folksonomy Induction Algorithms. Submitted to the ACM Transactions on Intelligent Systems and Technology, (2011). 32
  • 33. Knowledge Management Institute Limitations and Opportunities Pragmatics i fl P ti influence semantics: ti 1. Tagging behavior effects preciseness of emerging semantics [Körner et al. 2010] 2. Social t 2 S i l networks i fl k influence resulting semantic networks lti ti t k [Wang and Groth 2010] 3. 3 Stream types effect stream semantics [Wagner and Strohmaier 2010] Markus Strohmaier 2011 33
  • 34. Knowledge Management Institute Different motivations for tagging User A User B This These seems to seem to be a be category! keywords! Markus Strohmaier 2011 34
  • 35. Knowledge Management Institute How do Semantics Emerge? Are they Influenced by the Pragmatics of Tagging? Different styles of tagging: Diff t t l ft i To categorize or to describe resources [Strohmaier et al. 2010] Categorizer (C) Describer (D) Goal later browsing later search Change of vocabulary costly cheap Size f Si of vocabulary b l limited li it d Open O Tags subjective objective Example tag clouds Semantic Assumption: Categorizers produce more precise emergent semantics than Describers. Markus Strohmaier 2011 [Strohmaier et al. 2010] M. Strohmaier, C. Koerner, R. Kern, Why do Users Tag? Detecting Users' Motivation for Tagging in Social Tagging Systems, 4th International AAAI Conference on 35 Weblogs and Social Media (ICWSM2010), Washington, DC, USA, May 23-26, 2010.
  • 36. Knowledge Management Institute Example 1 p Describers outperform categorizers on precision of emergent tag semantics Categorizers perform Describers perform worse than random better than random Random Random users users (C) (D) C. Körner, D. Benz, A. Hotho, M. Strohmaier, G. Stumme, Stop Thinking, Start Tagging: Tag Semantics Emerge From Collaborative Verbosity, 19th International World Wide Web Conference Markus Strohmaier (WWW2010), Raleigh, NC, USA, April 26-30, ACM, 2010. 2011 36
  • 37. Knowledge Management Institute Example 2 p Categorizers outperform describers on social classification accuracy Describers perform worse than Categorizers Supervised multi-class SVM Markus Strohmaier 2011 A. Zubiaga, C. Koerner, and M. Strohmaier. Tags vs. shelves: From social tagging to social classification. In 22nd ACM SIGWEB Conference on Hypertext and Hypermedia 37 (HT 2011), Eindhoven, Netherlands, ACM, 2011.
  • 38. Knowledge Management Institute Example 3 Type of stream aggregation effects semantics … the type of aggregation: Hashtag stream aggregations are more robust against external disturbances than user list streams Hashtag Stream User List Stream OR(RUa)S(Rh) OR(RUa)S(RUL) Markus Strohmaier 2011 38
  • 39. Knowledge Management Institute Extracting Semantics from Crowds Motivation Social Labeling • Hashtag Semantics Social Tagging • Tag Relatedness • Tag Generality • Tag Hierarchies Social Navigation • Navigational Knowledge Engineering g g Markus Strohmaier 2011 39
  • 40. Knowledge Management Institute Social Navigation Users search for and navigate to certain sets of resources Can we produce ontological constructs as a byproduct of navigational activities of users ? b d t f i ti l ti iti f Navigational Knowledge Engineering: A light-weight methodology for low-cost knowledge engineering by a massive user base. Markus Strohmaier 2011 40
  • 41. Knowledge Management Institute Prototype: HANNE HANNE: Holistic Application for Navigational HANNE H li ti A li ti f N i ti l Knowledge Engineering http://aksw.org/Projects/NKE http://aksw org/Projects/NKE with S. Hellmann, J. Lehmann, C. Stadler, J. Unbehauen, University of Leipzig Markus Strohmaier 2011 41
  • 42. Knowledge Management Institute Navigational Knowledge Engineering http://aksw.org/Projects/NKE Example: Extending DBPedia with NKE Markus Strohmaier 2011 S. Hellmann, J. Lehmann, C. Stadler, J. Unbehauen, M. Strohmaier, Navigational Knowledge Engineering, (in preparation) 42
  • 43. Knowledge Management Institute Navigational Knowledge Engineering Choose initial positive and negative examples from the search result. Here we are looking for Football Clubs in Saxony, a region in German Germany. Markus Strohmaier 2011 http://aksw.org/Projects/NKE 43
  • 44. Knowledge Management Institute The Problem of Inductive Concept Learning U i a universal set of objects is i l t f bj t C is a concept: a subset of objects in U To learn a concept C means to learn to recognize objects in C t be able t t ll whether bj t i C, to b bl to tell h th Example: U may be the set of all patients in a register, and f The set of all patients having a particular disease Markus Strohmaier 2011 44
  • 45. Knowledge Management Institute Inductive Concept Learning To test the coverage, the function Returns the value true if e is covered by H, and false otherwise. Markus Strohmaier 2011 Nada Lavrac and Saso Dzeroski, Inductive Logic Programming: Techniques and Applications, 1994 45
  • 46. Knowledge Management Institute HANNE & The Problem of Inductive Concept Learning given: • Background knowledge (OWL/DL knowledge base) • positive and negative examples of a concept (instances) (i t ) find: fi d • A hypothesis (expressed as OWL class descriptions) that covers all positive and no negative examples Markus Strohmaier 2011 46
  • 47. Knowledge Management Institute Based on the extension, ICL searches for suitable hypotheses. Markus Strohmaier 2011 S. Hellmann, J. Lehmann, C. Stadler, J. Unbehauen, M. Strohmaier, Navigational Knowledge Engineering, (in preparation) 47
  • 48. Knowledge Management Institute Concepts Learned • The Learned C Concept is shown in Manchester O OWL SSyntax • The user can retain the concept for later retrieval. • Saved concepts are displayed as social navigation suggestions. Can be used to enrich existing knowledge base. base Markus Strohmaier 2011 S. Hellmann, J. Lehmann, C. Stadler, J. Unbehauen, M. Strohmaier, Navigational Knowledge Engineering, (in preparation) 48
  • 49. Knowledge Management Institute Result Useful properties: • Biased towards high recall • Scales well: Number of training examples is more important than the size of the background knowledge With only 2 positives and 4 negatives, it is possible to find 13 more instances, which are f tb ll clubs i t hi h football l b situated close to Saxony, Germany Markus Strohmaier 2011 S. Hellmann, J. Lehmann, C. Stadler, J. Unbehauen, M. Strohmaier, Navigational Knowledge Engineering, (in preparation) 49
  • 50. Knowledge Management Institute http://aksw.org/Projects/NKE Mock up (1) Markus Strohmaier 2011 50
  • 51. Knowledge Management Institute http://aksw.org/Projects/NKE Mock up (2) Markus Strohmaier 2011 51
  • 52. Knowledge Management Institute Extracting Semantics from … Motivation Social Labeling • Hashtag Semantics Social Tagging • Tag Relatedness • Tag Generality • Tag Hierarchies Social Navigation • Navigational Knowledge Engineering g g Conclusions Markus Strohmaier 2011 52
  • 53. Knowledge Management Institute Objectives Provide (some) answers to the following questions: P id ( ) t th f ll i ti • What is the difference between extracting semantics from text vs. extracting semantics from crowds? • Why should we study crowds and crowd behavior from a semantic computing perspective? ti ti ti ? • How can semantics be extracted from online crowd behavior, such as • …from Social Labeling • …from Social Tagging • …from Soc a Navigation o Social a ga o • What are the implications for semantic computing research? Markus Strohmaier 2011 53
  • 54. Knowledge Management Institute Social Computation for the Web of Data Crowds provide some advantages over semantic extraction from text. It is through the process of social computation, i.e. the combination of social behavior and algorithmic computation, that th t semantic structures emerge. ti t t In order to extract semantics from crowds understanding users crowds, users’ behavior and its impact on emerging semantic structures is important. Shaping or classifying certain users’ behavior are ways of increasing p g preciseness of emerging semantic structures. g g Markus Strohmaier 2011 54
  • 55. Knowledge Management Institute Crowd S C d Semantics ti based on [Hovy 2011] Markus Strohmaier 2011 [Hovy 2011] Based on slides by Ed Hovy, USC, Reasoning with Text Workshop 2011 (link) 55
  • 56. Knowledge Management Institute Summary We W can observe that: b th t • semantic structures can be obtained as a byproduct of online crowd behavior (l b li t i navigation) (labeling, tagging, i ti ) • these structures can approximate structures in reference knowledge bases (DBPedia WordNet etc) (DBPedia, WordNet, but: • pragmatics influences resulting semantics • semantic preciseness remains a challenge Markus Strohmaier 2011 56
  • 57. Knowledge Management Institute An Agenda for Semantic Computing Research s R What‘s the knowledge of SAS? What h 10 August, 2011 0 Web Resources Natural Language Constructs Real-World Happenings Utilizing online behavior of crowds for 57 the construction, maintenance and enrichment of large-scale semantic structures Markus Strohmaier 2011 http://www.flickr.com/photos/waldoj/722508166/ http://www.flickr.com/photos/matthewfield/2306001896/ 57
  • 58. Knowledge Management Institute Thank You. Acknowledgements co-authors and collaborators H.P. Grahsl, D. Helic, C. Körner, R. Kern, C. Trattner (TU Graz) D. Benz, G. Stumme (U. Kassel) A. Hotho (U. Würzburg) S. H ll S Hellmann, J. Lehmann, C. Stadler, J. Unbehauen (U. of Leipzig, Germany) J L h C St dl J U b h (U f L i i G ) Markus Strohmaier 2011 58