Distributional Semantic Models
Tutorial at NAACL-HLT 2010, Los Angeles, CA — part 1

Stefan Evert (University of Osnabrück, Germany)
with contributions from Marco Baroni (University of Trento, Italy)
and Alessandro Lenci (University of Pisa, Italy)

Los Angeles, 1 June 2010
Copyright © 2009–2010 Baroni, Evert & Lenci | Licensed under CC-by-sa version 3.0
Outline

     Introduction
         The distributional hypothesis
         General overview
         Three famous DSM examples

     Taxonomy of DSM parameters
        Definition & overview
        DSM parameters
        Examples

     Usage and evaluation of DSM
        What to do with DSM distances
        Evaluation: semantic similarity and relatedness
        Attributional similarity
        Relational similarity

Meaning & distribution

    “Die Bedeutung eines Wortes liegt in seinem Gebrauch.”
    (“The meaning of a word is its use.”)
                                          — Ludwig Wittgenstein

    “You shall know a word by the company it keeps!”
                                          — J. R. Firth (1957)

    Distributional hypothesis (Zellig Harris 1954)
What is the meaning of “bardiwac”?

    He handed her her glass of bardiwac.
    Beef dishes are made to complement the bardiwacs.
    Nigel staggered to his feet, face flushed from too much bardiwac.
    Malbec, one of the lesser-known bardiwac grapes, responds well to Australia’s sunshine.
    I dined off bread and cheese and this excellent bardiwac.
    The drinks were delicious: blood-red bardiwac as well as light, sweet Rhenish.

⇒ bardiwac is a heavy red alcoholic beverage made from grapes
Real-life concordance & word sketch
http://beta.sketchengine.co.uk/

[Figures: Sketch Engine concordance and word sketch — screenshots not preserved in this transcript]
A thought experiment: deciphering hieroglyphs

                     get   sij   ius   hir   iit   kil
    (knife)   naif    51    20    84     0     3     0
    (cat)     ket     52    58     4     4     6    26
    ???       dog    115    83    10    42    33    17
    (boat)    beut    59    39    23     4     0     0
    (cup)     kap     98    14     6     2     1     0
    (pig)     pigij   12    17     3     2     9    27
    (banana)  nana    11     2     2     0    18     0

    sim(dog, naif)  = 0.770
    sim(dog, pigij) = 0.939
    sim(dog, ket)   = 0.961
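The slide does not state how sim(·, ·) is computed; a minimal sketch that reproduces all three printed values is cosine similarity over log-transformed counts log(1 + f) (an inference from the numbers, not something the slide spells out):

    import numpy as np

    # co-occurrence counts from the hieroglyph table
    # (columns: get, sij, ius, hir, iit, kil)
    rows = {
        "naif":  [51, 20, 84, 0, 3, 0],
        "ket":   [52, 58, 4, 4, 6, 26],
        "dog":   [115, 83, 10, 42, 33, 17],
        "pigij": [12, 17, 3, 2, 9, 27],
    }

    def sim(a, b):
        """Cosine similarity of log-transformed count vectors."""
        x = np.log1p(np.array(rows[a], dtype=float))  # log(1 + f)
        y = np.log1p(np.array(rows[b], dtype=float))
        return x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

    for w in ["naif", "pigij", "ket"]:
        print(f"sim(dog, {w}) = {sim('dog', w):.3f}")
    # sim(dog, naif) = 0.770
    # sim(dog, pigij) = 0.939
    # sim(dog, ket) = 0.961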
English as seen by the computer . . .

                     get    see    use    hear   eat    kill
                    (get)  (sij)  (ius)  (hir)  (iit)  (kil)
    knife  (naif)     51     20     84      0      3      0
    cat    (ket)      52     58      4      4      6     26
    dog    (dog)     115     83     10     42     33     17
    boat   (beut)     59     39     23      4      0      0
    cup    (kap)      98     14      6      2      1      0
    pig    (pigij)    12     17      3      2      9     27
    banana (nana)     11      2      2      0     18      0

    verb–object counts from the British National Corpus
Geometric interpretation

    row vector x_dog describes usage of the word dog in the corpus
    can be seen as coordinates of a point in n-dimensional Euclidean space

                 get    see    use    hear   eat    kill
    knife         51     20     84      0      3      0
    cat           52     58      4      4      6     26
    dog          115     83     10     42     33     17
    boat          59     39     23      4      0      0
    cup           98     14      6      2      1      0
    pig           12     17      3      2      9     27
    banana        11      2      2      0     18      0

                 co-occurrence matrix M
Geometric interpretation

    illustrated for two dimensions: get and use
    x_dog = (115, 10)
    similarity = spatial proximity (Euclidean distance):
    d(dog, boat) = 57.5, d(dog, cat) = 63.3
    location depends on frequency of the noun (f_dog ≈ 2.7 · f_cat)
    direction is more important than location
    normalise the “length” ‖x_dog‖ of each vector
    or use the angle α as a distance measure

[Figure: “Two dimensions of English V−Obj DSM” — knife, cat, dog and boat plotted in the get/use plane, with Euclidean distances d = 57.5 (dog–boat) and d = 63.3 (dog–cat), length-normalised vectors, and an angle α = 54.3°]
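These quantities are easy to recompute from the table. A minimal sketch in NumPy (the pairing of α with the knife and cat vectors is an assumption; it is the pairing that reproduces the printed 54.3°):

    import numpy as np

    # (get, use) coordinates from the co-occurrence matrix above
    dog   = np.array([115., 10.])
    cat   = np.array([ 52.,  4.])
    boat  = np.array([ 59., 23.])
    knife = np.array([ 51., 84.])

    # similarity as spatial proximity: Euclidean distance
    print(round(np.linalg.norm(dog - boat), 1))   # 57.5
    print(round(np.linalg.norm(dog - cat), 1))    # 63.3

    # direction matters more than location: compare unit-length vectors ...
    unit = lambda v: v / np.linalg.norm(v)
    print(round(np.linalg.norm(unit(dog) - unit(cat)), 3))  # 0.01 -> nearly identical direction

    # ... or measure the angle between two vectors directly
    alpha = np.degrees(np.arccos(np.clip(unit(knife) @ unit(cat), -1.0, 1.0)))
    print(round(alpha, 1))  # 54.3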
Semantic distances

    main result of distributional analysis: “semantic” distances between words
    typical applications
        nearest neighbours (sketched in code below)
        clustering of related words
        construct semantic map

[Figure: “Word space clustering of concrete nouns (V−Obj from BNC)” — dendrogram grouping foods, animals, vehicles and tools]
[Figure: “Semantic map (V−Obj from BNC)” — two-dimensional map of the same nouns, with categories bird, groundAnimal, fruitTree, green, tool, vehicle]
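As an illustration of the nearest-neighbour application, a minimal sketch over the toy verb–object matrix from the earlier slides (log-transformed and row-normalised, so the dot product equals the cosine similarity; the transformation is the same one that reproduces the sim values above):

    import numpy as np

    nouns = ["knife", "cat", "dog", "boat", "cup", "pig", "banana"]
    counts = np.array([
        [ 51, 20, 84,  0,  3,  0],
        [ 52, 58,  4,  4,  6, 26],
        [115, 83, 10, 42, 33, 17],
        [ 59, 39, 23,  4,  0,  0],
        [ 98, 14,  6,  2,  1,  0],
        [ 12, 17,  3,  2,  9, 27],
        [ 11,  2,  2,  0, 18,  0],
    ], dtype=float)

    M = np.log1p(counts)                           # dampen large counts
    M /= np.linalg.norm(M, axis=1, keepdims=True)  # unit rows: dot product = cosine

    def nearest_neighbours(word, k=3):
        """k nearest neighbours of a target word by cosine similarity."""
        i = nouns.index(word)
        sims = M @ M[i]
        order = np.argsort(-sims)
        return [(nouns[j], round(float(sims[j]), 3)) for j in order if j != i][:k]

    print(nearest_neighbours("dog"))
    # [('cat', 0.961), ('pig', 0.939), ('cup', 0.85)]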
Tutorial overview

    1. Introduction & examples
    2. Taxonomy of DSM parameters
    3. Usage and evaluation of DSM spaces
    4. Elements of matrix algebra
    5. Making sense of DSM
    6. Current research topics & future directions

Realistically, we’ll get through parts 1–3 today. But you can find out about matrix algebra and the other advanced topics in the handouts available from the course Web site.
Further information

    Handouts & other materials available from the homepage at
    http://wordspace.collocations.de/
        will be extended during the next few months
    Tutorial is open source (CC), and can be downloaded from
    http://r-forge.r-project.org/projects/wordspace/
    Compact DSM textbook in preparation for Synthesis Lectures
    on Human Language Technologies (Morgan & Claypool)

This tutorial is based on joint work with Marco Baroni and Alessandro Lenci
A very brief history of DSM

    Introduced to computational linguistics in the early 1990s,
    following the probabilistic revolution (Schütze 1992, 1998)
    Other early work in psychology (Landauer and Dumais 1997;
    Lund and Burgess 1996)
        influenced by Latent Semantic Indexing (Dumais et al. 1988)
        and efficient software implementations (Berry 1992)
    Renewed interest in recent years
        2007: CoSMo Workshop (at Context ’07)
        2008: ESSLLI Lexical Semantics Workshop & Shared Task,
        Special Issue of the Italian Journal of Linguistics
        2009: GeMS Workshop (EACL 2009), DiSCo Workshop
        (CogSci 2009), ESSLLI Advanced Course on DSM
        2010: 2nd GeMS Workshop (ACL 2010), ESSLLI Workshop on
        Compositionality & DSM, Special Issue of JNLE (in prep.),
        Computational Neurolinguistics Workshop (NAACL-HLT 2010
        — don’t miss it this Sunday!)
Some applications in computational linguistics

            Unsupervised part-of-speech induction (Schütze 1995)
            Word sense disambiguation (Schütze 1998)
            Query expansion in information retrieval (Grefenstette 1994)
            Synonym tasks & other language tests
             (Landauer and Dumais 1997; Turney et al. 2003)
            Thesaurus compilation (Lin 1998a; Rapp 2004)
            Ontology & wordnet expansion (Pantel et al. 2009)
            Attachment disambiguation (Pantel 2000)
            Probabilistic language models (Bengio et al. 2003)
            Subsymbolic input representation for neural networks
            Many other tasks in computational semantics:
            entailment detection, noun compound interpretation,
            identification of noncompositional expressions, . . .

Latent Semantic Analysis (Landauer and Dumais 1997)

    Corpus: 30,473 articles from Grolier’s Academic American
    Encyclopedia (4.6 million words in total)
        articles were limited to first 2,000 characters
    Word-article frequency matrix for 60,768 words
        row vector shows frequency of word in each article
    Logarithmic frequencies scaled by word entropy
    Reduced to 300 dim. by singular value decomposition (SVD)
        borrowed from LSI (Dumais et al. 1988)
        central claim: SVD reveals latent semantic features,
        not just a data reduction technique
    Evaluated on TOEFL synonym test (80 items)
        LSA model achieved 64.4% correct answers
        also simulation of learning rate based on TOEFL results
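The weighting scheme and SVD step can be written down compactly. A minimal sketch in NumPy, assuming a small dense word-by-article count matrix C (the real model used a large sparse 60,768-word matrix; the log-entropy formula below is the standard one associated with LSA, not a verbatim reproduction of this paper's code):

    import numpy as np

    def lsa_transform(C, k=300):
        """Log-entropy weighting + rank-k SVD, in the style of classic LSA.

        C: word-by-document count matrix (one row per word).
        Returns k-dimensional coordinates for the words.
        """
        C = np.asarray(C, dtype=float)
        log_freq = np.log(C + 1)                                  # logarithmic frequencies
        P = C / np.maximum(C.sum(axis=1, keepdims=True), 1e-12)   # P(document | word)
        with np.errstate(divide="ignore", invalid="ignore"):
            plogp = np.where(P > 0, P * np.log(P), 0.0)
        weight = 1 + plogp.sum(axis=1) / np.log(C.shape[1])       # low for uninformative words
        W = log_freq * weight[:, None]                            # scale each row by word entropy
        U, s, Vt = np.linalg.svd(W, full_matrices=False)          # singular value decomposition
        return U[:, :k] * s[:k]                                   # latent semantic coordinates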
Word Space (Schütze 1992, 1993, 1998)

    Corpus: ≈ 60 million words of news messages (New York
    Times News Service)
    Word-word co-occurrence matrix
        20,000 target words & 2,000 context words as features
        row vector records how often each context word occurs close
        to the target word (co-occurrence)
        co-occurrence window: left/right 50 words (Schütze 1998)
        or ≈ 1000 characters (Schütze 1992)
    Rows weighted by inverse document frequency (tf.idf)
    Context vector = centroid of word vectors (bag-of-words)
        goal: determine “meaning” of a context
    Reduced to 100 SVD dimensions (mainly for efficiency)
    Evaluated on unsupervised word sense induction by clustering
    of context vectors (for an ambiguous word)
        induced word senses improve information retrieval performance
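The centroid construction is simple: a context vector is the average of the vectors of the words occurring in that context. A minimal sketch (word_vectors is a hypothetical mapping from word to its row in the reduced matrix):

    import numpy as np

    def context_vector(context_words, word_vectors):
        """Centroid of the vectors of all known words in a context (bag-of-words)."""
        vecs = [word_vectors[w] for w in context_words if w in word_vectors]
        return np.mean(vecs, axis=0) if vecs else None

Word sense induction then clusters the context vectors collected for one ambiguous target word (e.g. with k-means); each cluster corresponds to an induced sense.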
HAL (Lund and Burgess 1996)

    HAL = Hyperspace Analogue to Language
    Corpus: 160 million words from newsgroup postings
    Word-word co-occurrence matrix
        same 70,000 words used as targets and features
        co-occurrence window of 1 – 10 words
    Separate counts for left and right co-occurrence
        i.e. the context is structured
    In later work, co-occurrences are weighted by (inverse)
    distance (Li et al. 2000)
    Applications include construction of semantic vocabulary
    maps by multidimensional scaling to 2 dimensions
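A minimal sketch of the structured (direction-aware) counting step; the 1/d inverse-distance weight is an assumed reading of the later weighted variant, not necessarily HAL's exact scheme:

    from collections import Counter

    def hal_counts(tokens, window=10):
        """Directional co-occurrence counts in the style of HAL: left and right
        contexts are kept apart, and closer words contribute more."""
        counts = Counter()
        for i, target in enumerate(tokens):
            for d in range(1, window + 1):
                weight = 1.0 / d                       # assumed inverse-distance weight
                if i - d >= 0:                         # word d positions to the left
                    counts[(target, "L:" + tokens[i - d])] += weight
                if i + d < len(tokens):                # word d positions to the right
                    counts[(target, "R:" + tokens[i + d])] += weight
        return counts

    print(hal_counts("the dog bit the man".split(), window=2)[("dog", "R:bit")])  # 1.0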
Many parameters . . .

    Enormous range of DSM parameters and applications
    Examples showed three entirely different models, each tuned
    to its particular application

⇒ Need an overview of DSM parameters & an understanding of their effects
General definition of DSMs

A distributional semantic model (DSM) is a scaled and/or
transformed co-occurrence matrix M, such that each row x
represents the distribution of a target term across contexts.

                 get      see      use      hear     eat      kill
    knife       0.027   -0.024    0.206   -0.022   -0.044   -0.042
    cat         0.031    0.143   -0.243   -0.015   -0.009    0.131
    dog        -0.026    0.021   -0.212    0.064    0.013    0.014
    boat       -0.022    0.009   -0.044   -0.040   -0.074   -0.042
    cup        -0.014   -0.173   -0.249   -0.099   -0.119   -0.042
    pig        -0.069    0.094   -0.158    0.000    0.094    0.265
    banana      0.047   -0.139   -0.104   -0.022    0.267   -0.042

Term = word, lemma, phrase, morpheme, . . .
General definition of DSMs

Mathematical notation:
    m × n co-occurrence matrix M (example: 7 × 6 matrix)
        m rows = target terms
        n columns = features or dimensions

    M = \begin{pmatrix}
        x_{11} & x_{12} & \cdots & x_{1n} \\
        x_{21} & x_{22} & \cdots & x_{2n} \\
        \vdots & \vdots &        & \vdots \\
        x_{m1} & x_{m2} & \cdots & x_{mn}
        \end{pmatrix}

    distribution vector x_i = i-th row of M, e.g. x_3 = x_dog
    components x_i = (x_{i1}, x_{i2}, . . . , x_{in}) = features of the i-th term:

    x_3 = (−0.026, 0.021, −0.212, 0.064, 0.013, 0.014)
        = (x_{31}, x_{32}, x_{33}, x_{34}, x_{35}, x_{36})
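In code, M is simply a 2-D array whose rows are indexed by the target terms. A minimal sketch with the example values from the table above:

    import numpy as np

    terms = ["knife", "cat", "dog", "boat", "cup", "pig", "banana"]
    features = ["get", "see", "use", "hear", "eat", "kill"]

    # the scaled/transformed 7 x 6 co-occurrence matrix M from the slide
    M = np.array([
        [ 0.027, -0.024,  0.206, -0.022, -0.044, -0.042],
        [ 0.031,  0.143, -0.243, -0.015, -0.009,  0.131],
        [-0.026,  0.021, -0.212,  0.064,  0.013,  0.014],
        [-0.022,  0.009, -0.044, -0.040, -0.074, -0.042],
        [-0.014, -0.173, -0.249, -0.099, -0.119, -0.042],
        [-0.069,  0.094, -0.158,  0.000,  0.094,  0.265],
        [ 0.047, -0.139, -0.104, -0.022,  0.267, -0.042],
    ])

    x_dog = M[terms.index("dog")]   # distribution vector x_3 (a row of M)
    print(x_dog)                    # [-0.026  0.021 -0.212  0.064  0.013  0.014]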
Overview of DSM parameters

    Linguistic pre-processing (definition of terms)
                        ⇓
    Term-context vs. term-term matrix
                        ⇓
    Size & type of context / structured vs. unstructured
                        ⇓
    Geometric vs. probabilistic interpretation
                        ⇓
    Feature scaling
                        ⇓
    Normalisation of rows and/or columns
                        ⇓
    Similarity / distance measure
                        ⇓
    Compression
Taxonomy of DSM parameters   DSM parameters


Corpus pre-processing

            Minimally, corpus must be tokenised § identify terms
            Linguistic annotation
                    part-of-speech tagging
                    lemmatisation / stemming
                    word sense disambiguation (rare)
                    shallow syntactic patterns
                    dependency parsing
            Generalisation of terms
                    often lemmatised to reduce data sparseness:
                    go, goes, went, gone, going § go
                    POS disambiguation (light/N vs. light/A vs. light/V)
                    word sense disambiguation (bankriver vs. bankfinance )
            Trade-off between deeper linguistic analysis and
                    need for language-specific resources
                    possible errors introduced at each stage of the analysis
                    even more parameters to optimise / cognitive plausibility
 © Evert/Baroni/Lenci (CC-by-sa)                 DSM Tutorial             wordspace.collocations.de   31 / 107
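
As a minimal sketch of term generalisation (the tiny lexicon below is an illustrative stand-in for a real tagger and lemmatiser, not the tutorial's toolchain), inflected tokens can be mapped to lemma/POS terms:

```python
# Toy stand-in for a tagger + lemmatiser: token -> (lemma, POS).
# A real pipeline would disambiguate POS and word sense in context.
LEXICON = {
    "go": ("go", "V"), "goes": ("go", "V"), "went": ("go", "V"),
    "gone": ("go", "V"), "going": ("go", "V"),
    "light": ("light", "N"),
}

def to_terms(tokens):
    """Map each token to a generalised term of the form 'lemma/POS'."""
    terms = []
    for tok in tokens:
        lemma, pos = LEXICON.get(tok.lower(), (tok.lower(), "X"))
        terms.append(f"{lemma}/{pos}")
    return terms

print(to_terms("went going goes light".split()))
# ['go/V', 'go/V', 'go/V', 'light/N']
```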
Taxonomy of DSM parameters   DSM parameters


Effects of pre-processing
                               Nearest neighbours of walk (BNC)

         word forms                                             lemmatised corpus
                 stroll                                             hurry
                 walking                                            stroll
                 walked                                             stride
                 go                                                 trudge
                 path                                               amble
                 drive                                              wander
                 ride                                               walk-nn
                 wander                                             walking
                 sprinted                                           retrace
                 sauntered                                          scuttle

 © Evert/Baroni/Lenci (CC-by-sa)                 DSM Tutorial                 wordspace.collocations.de   32 / 107
Taxonomy of DSM parameters   DSM parameters


Effects of pre-processing
                        Nearest neighbours of arrivare (Repubblica)

         word forms                                             lemmatised corpus
                 giungere                                           giungere
                 raggiungere                                        aspettare
                 arrivi                                             attendere
                 raggiungimento                                     arrivo-nn
                 raggiunto                                          ricevere
                 trovare                                            accontentare
                 raggiunge                                          approdare
                 arrivasse                                          pervenire
                 arriverà                                           venire
                 concludere                                         piombare

 © Evert/Baroni/Lenci (CC-by-sa)                 DSM Tutorial                wordspace.collocations.de   33 / 107
Taxonomy of DSM parameters   DSM parameters


Overview of DSM parameters

                    Linguistic pre-processing (definition of terms)
                                           ⇓
                        Term-context vs. term-term matrix
                                           ⇓
                  Size & type of context / structured vs. unstructured
                                           ⇓
                       Geometric vs. probabilistic interpretation
                                           ⇓
                                    Feature scaling
                                           ⇓
                        Normalisation of rows and/or columns
                                           ⇓
                            Similarity / distance measure
                                           ⇓
                                     Compression

 © Evert/Baroni/Lenci (CC-by-sa)                 DSM Tutorial             wordspace.collocations.de   34 / 107
Taxonomy of DSM parameters   DSM parameters


Term-context vs. term-term matrix

     Term-context matrix records frequency of term in each individual
     context (typically a sentence or document)

                                          doc1       doc2       doc3      ···
                                   boat    1          3          0        ···
                                   cat     0          0          2        ···
                                   dog     1          0          1        ···

            Typical contexts are non-overlapping textual units (Web page,
            encyclopaedia article, paragraph, sentence, . . . )
            Contexts can also be generalised, e.g.
                    bag of content words
                    specific pattern of POS tags
                    subcategorisation pattern of target verb
            Term-context matrix is usually very sparse

 © Evert/Baroni/Lenci (CC-by-sa)                 DSM Tutorial               wordspace.collocations.de   35 / 107
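
As a minimal sketch (toy documents chosen to reproduce the table above), the term-context matrix is just a table of per-document frequency counts:

```python
# Build a term-context (here: term-document) matrix by counting
# how often each term occurs in each document.
from collections import Counter

docs = {
    "doc1": ["boat", "dog"],
    "doc2": ["boat", "boat", "boat"],
    "doc3": ["cat", "cat", "dog"],
}

counts = {doc: Counter(tokens) for doc, tokens in docs.items()}
terms = sorted({tok for tokens in docs.values() for tok in tokens})

for term in terms:
    print(term, [counts[doc][term] for doc in sorted(docs)])
# boat [1, 3, 0]
# cat [0, 0, 2]
# dog [1, 0, 1]
```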
Taxonomy of DSM parameters    DSM parameters


Term-context vs. term-term matrix

     Term-term matrix records co-occurrence frequencies of context
     terms for each target term (often target terms = context terms)

                                             see      use         hear     ···
                                   boat      39       23            4      ···
                                   cat       58        4            4      ···
                                   dog       83       10           42      ···

            Different types of contexts (Evert 2008)
                    surface context (word or character window)
                    textual context (non-overlapping segments)
                    syntactic context (specific syntagmatic relation)
            Can be seen as smoothing of term-context matrix
                    average over similar contexts (with same context terms)
                    data sparseness reduced, except for small windows

 © Evert/Baroni/Lenci (CC-by-sa)                   DSM Tutorial                  wordspace.collocations.de   36 / 107
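
A corresponding sketch for the term-term case, counting context terms within a symmetric surface window of k tokens (toy sentence; the target and context vocabularies are illustrative assumptions):

```python
# Term-term co-occurrence counts from a symmetric k-token window.
from collections import defaultdict, Counter

def term_term_counts(tokens, targets, contexts, k=2):
    counts = defaultdict(Counter)
    for i, w in enumerate(tokens):
        if w in targets:
            window = tokens[max(0, i - k):i] + tokens[i + 1:i + 1 + k]
            for c in window:
                if c in contexts:
                    counts[w][c] += 1
    return counts

tokens = "we see the dog and hear the dog see the cat".split()
m = term_term_counts(tokens, targets={"dog", "cat"},
                     contexts={"see", "use", "hear"}, k=2)
print(dict(m))
# {'dog': Counter({'see': 2, 'hear': 2}), 'cat': Counter({'see': 1})}
```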
Taxonomy of DSM parameters   DSM parameters


Overview of DSM parameters

                 Linguistic pre-processing (definition of terms)
                                        ⇓
                      Term-context vs. term-term matrix
                                        ⇓
             Size & type of context / structured vs. unstructured
                                        ⇓
                   Geometric vs. probabilistic interpretation
                                        ⇓
                                 Feature scaling
                                        ⇓
                    Normalisation of rows and/or columns
                                        ⇓
                         Similarity / distance measure
                                        ⇓
                                  Compression

 © Evert/Baroni/Lenci (CC-by-sa)                 DSM Tutorial             wordspace.collocations.de   37 / 107
Taxonomy of DSM parameters   DSM parameters


Surface context

       Context term occurs within a window of k words around target.

     The silhouette of the sun beyond a wide-open bay on the lake; the
     sun still glitters although evening has arrived in Kuhmo. It’s
     midsummer; the living room has its instruments and other objects
     in each of its corners.

     Parameters:
         window size (in words or characters)
            symmetric vs. one-sided window
            uniform or “triangular” (distance-based) weighting
            window clamped to sentences or other textual units?



 © Evert/Baroni/Lenci (CC-by-sa)                 DSM Tutorial             wordspace.collocations.de   38 / 107
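
One of these parameters in code: a sketch of “triangular” (distance-based) weighting, assuming the simple linear scheme where a context token at distance d within a k-word window contributes weight (k − d + 1)/k instead of 1:

```python
# Linear distance-based weights for a window of size k:
# distance 1 gets weight 1.0, the window edge gets weight 1/k.
def window_weights(k):
    return {d: (k - d + 1) / k for d in range(1, k + 1)}

print(window_weights(4))
# {1: 1.0, 2: 0.75, 3: 0.5, 4: 0.25}
```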
Taxonomy of DSM parameters   DSM parameters


Effect of different window sizes
                               Nearest neighbours of dog (BNC)

         2-word window                                          30-word window
                 cat                                                kennel
                 horse                                              puppy
                 fox                                                pet
                 pet                                                bitch
                 rabbit                                             terrier
                 pig                                                rottweiler
                 animal                                             canine
                 mongrel                                            cat
                 sheep                                              to bark
                 pigeon                                             Alsatian


 © Evert/Baroni/Lenci (CC-by-sa)                 DSM Tutorial                 wordspace.collocations.de   39 / 107
Taxonomy of DSM parameters   DSM parameters


Textual context

                Context term is in the same linguistic unit as target.

     The silhouette of the sun beyond a wide-open bay on the lake; the
     sun still glitters although evening has arrived in Kuhmo. It’s
     midsummer; the living room has its instruments and other objects
     in each of its corners.

     Parameters:
         type of linguistic unit
                    sentence
                    paragraph
                    turn in a conversation
                    Web page




 © Evert/Baroni/Lenci (CC-by-sa)                 DSM Tutorial             wordspace.collocations.de   40 / 107
Taxonomy of DSM parameters   DSM parameters


Syntactic context

           Context term is linked to target by a syntactic dependency
                          (e.g. subject, modifier, . . . ).

     The silhouette of the sun beyond a wide-open bay on the lake; the
     sun still glitters although evening has arrived in Kuhmo. It’s
     midsummer; the living room has its instruments and other objects
     in each of its corners.

     Parameters:
         types of syntactic dependency (Padó and Lapata 2007)
         direct vs. indirect dependency paths
                    direct dependencies
                    direct + indirect dependencies
            homogeneous data (e.g. only verb-object) vs.
            heterogeneous data (e.g. all children and parents of the verb)
            maximal length of dependency path
 © Evert/Baroni/Lenci (CC-by-sa)                 DSM Tutorial             wordspace.collocations.de   41 / 107
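
A small sketch of the filtering this involves: given parser output as (head, relation, dependent) triples (hand-made here), only co-occurrences licensed by the chosen relation inventory are counted:

```python
# Count (dependent, head) co-occurrences restricted to direct
# dependencies of selected types; triples are hand-annotated.
from collections import Counter

deps = [
    ("glitter", "subj", "sun"),
    ("arrive", "subj", "evening"),
    ("have", "subj", "room"),
    ("have", "obj", "instrument"),
]

RELATIONS = {"subj", "obj"}   # the relation inventory is a parameter

counts = Counter((dep, head) for head, rel, dep in deps if rel in RELATIONS)
print(counts[("sun", "glitter")])   # 1
```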
Taxonomy of DSM parameters   DSM parameters


“Knowledge pattern” context

         Context term is linked to target by a lexico-syntactic pattern
      (text mining, cf. Hearst 1992, Pantel & Pennacchiotti 2008, etc.).

     In Provence, Van Gogh painted with bright colors such as red and
     yellow. These colors produce incredible effects on anybody looking
     at his paintings.

     Parameters:
         inventory of lexical patterns
                    lots of research to identify semantically interesting patterns
                    (cf. Almuhareb & Poesio 2004, Veale & Hao 2008, etc.)
            fixed vs. flexible patterns
                    patterns are mined from large corpora and automatically
                    generalised (optional elements, POS tags or semantic classes)


 © Evert/Baroni/Lenci (CC-by-sa)                 DSM Tutorial             wordspace.collocations.de   42 / 107
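
A toy illustration of one fixed Hearst-style pattern (“X such as Y and Z”), far simpler than the mined and generalised patterns referred to above:

```python
import re

# Extract hyponym candidates from "<class> such as <x> and <y>".
PATTERN = re.compile(r"(\w+) such as (\w+) and (\w+)")

text = "Van Gogh painted with bright colors such as red and yellow."
for cls, x, y in PATTERN.findall(text):
    print((x, cls), (y, cls))
# ('red', 'colors') ('yellow', 'colors')
```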
Taxonomy of DSM parameters   DSM parameters


Structured vs. unstructured context




            In unstructured models, the context specification acts as a filter
                    it determines whether a context token counts as a co-occurrence
                    e.g. whether it is linked by a specific syntactic relation such as verb-object

            In structured models, context words are subtyped
                    depending on their position in the context
                    e.g. left vs. right context, type of syntactic relation, etc.




 © Evert/Baroni/Lenci (CC-by-sa)                 DSM Tutorial             wordspace.collocations.de   43 / 107
Taxonomy of DSM parameters   DSM parameters


Structured vs. unstructured surface context

     A dog bites a man. The man’s dog bites a dog. A dog bites a man.

                                        unstructured              bite
                                                 dog               4
                                                man                3



     A dog bites a man. The man’s dog bites a dog. A dog bites a man.

                                    structured           bite-l    bite-r
                                           dog             3         1
                                          man              1         2




 © Evert/Baroni/Lenci (CC-by-sa)                 DSM Tutorial               wordspace.collocations.de   44 / 107
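
The toy counts above can be reproduced with a few lines of Python (target verb and context nouns are hard-coded for illustration):

```python
# Structured surface context subtypes each context word by its
# position (left vs. right of the target verb); the unstructured
# model merges the two.
from collections import Counter

sents = ["a dog bites a man", "the man 's dog bites a dog",
         "a dog bites a man"]

structured, unstructured = Counter(), Counter()
for s in sents:
    toks = s.split()
    i = toks.index("bites")
    for w in toks[:i]:                      # left context
        if w in ("dog", "man"):
            structured[(w, "bite-l")] += 1
            unstructured[w] += 1
    for w in toks[i + 1:]:                  # right context
        if w in ("dog", "man"):
            structured[(w, "bite-r")] += 1
            unstructured[w] += 1

print(unstructured)   # Counter({'dog': 4, 'man': 3})
print(structured)     # dog: bite-l 3, bite-r 1; man: bite-l 1, bite-r 2
```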
Taxonomy of DSM parameters   DSM parameters


Structured vs. unstructured dependency context

     A dog bites a man. The man’s dog bites a dog. A dog bites a man.

                                        unstructured            bite
                                                 dog             4
                                                man              2



     A dog bites a man. The man’s dog bites a dog. A dog bites a man.

                                   structured       bite-subj      bite-obj
                                          dog           3             1
                                         man            0             2




 © Evert/Baroni/Lenci (CC-by-sa)                 DSM Tutorial             wordspace.collocations.de   45 / 107
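
The same contrast with hand-annotated dependency triples; man's drops out of the unstructured dependency count (man: 2) because the possessive is not a dependent of bite:

```python
# (verb, relation, noun) triples for the three toy sentences.
from collections import Counter

deps = [("bite", "subj", "dog"), ("bite", "obj", "man"),   # sentence 1
        ("bite", "subj", "dog"), ("bite", "obj", "dog"),   # sentence 2
        ("bite", "subj", "dog"), ("bite", "obj", "man")]   # sentence 3

structured, unstructured = Counter(), Counter()
for verb, rel, noun in deps:
    structured[(noun, f"{verb}-{rel}")] += 1
    unstructured[noun] += 1

print(unstructured)   # Counter({'dog': 4, 'man': 2})
print(structured)     # dog: bite-subj 3, bite-obj 1; man: bite-obj 2
```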
Taxonomy of DSM parameters   DSM parameters


Comparison
            Unstructured context
                    data less sparse (e.g. man kills and kills man both map to the
                    kill dimension of the vector xman )

            Structured context
                    more sensitive to semantic distinctions
                    (kill-subj and kill-obj are rather different things!)
                    dependency relations provide a form of syntactic “typing” of
                    the DSM dimensions (the “subject” dimensions, the
                    “recipient” dimensions, etc.)
                    important to account for word-order and compositionality




 © Evert/Baroni/Lenci (CC-by-sa)                 DSM Tutorial             wordspace.collocations.de   46 / 107
Taxonomy of DSM parameters   DSM parameters


Overview of DSM parameters

                    Linguistic pre-processing (definition of terms)
                                           ⇓
                         Term-context vs. term-term matrix
                                           ⇓
                  Size & type of context / structured vs. unstructured
                                           ⇓
                     Geometric vs. probabilistic interpretation
                                           ⇓
                                    Feature scaling
                                           ⇓
                       Normalisation of rows and/or columns
                                           ⇓
                            Similarity / distance measure
                                           ⇓
                                     Compression

 © Evert/Baroni/Lenci (CC-by-sa)                 DSM Tutorial             wordspace.collocations.de   47 / 107
Taxonomy of DSM parameters   DSM parameters


Geometric vs. probabilistic interpretation
            Geometric interpretation
                    row vectors as points or arrows in n-dim. space
                    very intuitive, good for visualisation
                    use techniques from geometry and linear algebra

            Probabilistic interpretation
                    co-occurrence matrix as observed sample statistic
                    “explained” by generative probabilistic model
                    recent work focuses on hierarchical Bayesian models
                    probabilistic LSA (Hofmann 1999), Latent Semantic
                    Clustering (Rooth et al. 1999), Latent Dirichlet Allocation
                    (Blei et al. 2003), etc.
                    explicitly accounts for random variation of frequency counts
                    intuitive and plausible as topic model
 ⇒ We focus exclusively on the geometric interpretation in this tutorial.

 © Evert/Baroni/Lenci (CC-by-sa)                 DSM Tutorial             wordspace.collocations.de   48 / 107
Taxonomy of DSM parameters   DSM parameters


Overview of DSM parameters

                    Linguistic pre-processing (definition of terms)
                                           ⇓
                         Term-context vs. term-term matrix
                                           ⇓
                  Size & type of context / structured vs. unstructured
                                           ⇓
                       Geometric vs. probabilistic interpretation
                                           ⇓
                                   Feature scaling
                                           ⇓
                        Normalisation of rows and/or columns
                                           ⇓
                            Similarity / distance measure
                                           ⇓
                                     Compression

 © Evert/Baroni/Lenci (CC-by-sa)                 DSM Tutorial             wordspace.collocations.de   49 / 107
Taxonomy of DSM parameters   DSM parameters


Feature scaling

     Feature scaling is used to “discount” less important features:
             Logarithmic scaling: x′ = log(x + 1)
            (cf. Weber-Fechner law for human perception)
            Relevance weighting, e.g. tf.idf (information retrieval)
            Statistical association measures (Evert 2004, 2008) take
            frequency of target word and context feature into account
                    the less frequent the target word and (more importantly) the
                    context feature are, the higher the weight given to their
                    observed co-occurrence count should be (because their
                    expected chance co-occurrence frequency is low)
                    different measures – e.g., mutual information, log-likelihood
                    ratio – differ in how they balance observed and expected
                    co-occurrence frequencies




 © Evert/Baroni/Lenci (CC-by-sa)                 DSM Tutorial             wordspace.collocations.de   50 / 107
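
A sketch of the first two schemes on a toy count matrix; the tf.idf variant tf · log(N/df) used here is one common choice among several:

```python
import math

# Rows = terms, columns = documents (counts from the earlier toy matrix).
M = {"boat": [1, 3, 0], "cat": [0, 0, 2], "dog": [1, 0, 1]}
n_docs = 3

# Logarithmic scaling: x' = log(x + 1), applied cell-wise.
log_scaled = {t: [math.log(x + 1) for x in row] for t, row in M.items()}

# tf.idf with document frequency df = number of documents containing t.
df = {t: sum(x > 0 for x in row) for t, row in M.items()}
tfidf = {t: [x * math.log(n_docs / df[t]) for x in row]
         for t, row in M.items()}

print(log_scaled["boat"])   # [0.693..., 1.386..., 0.0]
print(tfidf["cat"])         # [0.0, 0.0, 2.197...] = [0, 0, 2 * log(3/1)]
```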
Taxonomy of DSM parameters    DSM parameters


Association measures: Mutual Information (MI)

                    word1          word2                 fobs            f1         f2
                    dog            small                 855        33,338    490,580
                    dog            domesticated           29        33,338        918

      Expected co-occurrence frequency:

                          fexp = (f1 · f2) / N

      Mutual Information compares observed vs. expected frequency:

              MI(w1, w2) = log2 (fobs / fexp) = log2 (N · fobs / (f1 · f2))

     Disadvantage: MI overrates combinations of rare terms.


 © Evert/Baroni/Lenci (CC-by-sa)                 DSM Tutorial                 wordspace.collocations.de   51 / 107
Taxonomy of DSM parameters   DSM parameters


Other association measures

      Log-likelihood ratio (Dunning 1993) has a more complex form, but
     its “core” is known as local MI (Evert 2004).

                              local-MI(w1 , w2 ) = fobs · MI(w1 , w2 )

                     word1         word2                 fobs      MI     local-MI
                     dog           small                 855      3.96     3382.87
                     dog           domesticated           29      6.85      198.76
                     dog           sgjkj                    1    10.31       10.31

     The t-score measure (Church and Hanks 1990) is popular in
     lexicography:
                   t-score(w1, w2) = (fobs − fexp) / √fobs
     Details & many more measures: http://www.collocations.de/

 © Evert/Baroni/Lenci (CC-by-sa)                 DSM Tutorial             wordspace.collocations.de   52 / 107
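
The measures above as plain Python functions. The corpus size N is not given on the slides, so the value below is an assumption and the scores will not exactly match the printed tables:

```python
import math

def f_exp(f1, f2, N):
    """Expected co-occurrence frequency under independence."""
    return f1 * f2 / N

def mi(fobs, f1, f2, N):
    """(Pointwise) Mutual Information: log2 of observed/expected."""
    return math.log2(fobs / f_exp(f1, f2, N))

def local_mi(fobs, f1, f2, N):
    """Local MI, the 'core' of the log-likelihood ratio."""
    return fobs * mi(fobs, f1, f2, N)

def t_score(fobs, f1, f2, N):
    """t-score measure (Church and Hanks 1990)."""
    return (fobs - f_exp(f1, f2, N)) / math.sqrt(fobs)

N = 100_000_000                            # assumed corpus size
print(mi(855, 33_338, 490_580, N))         # MI(dog, small)
print(local_mi(29, 33_338, 918, N))        # local-MI(dog, domesticated)
```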
Taxonomy of DSM parameters   DSM parameters


Overview of DSM parameters

                    Linguistic pre-processing (definition of terms)
                                           ⇓
                         Term-context vs. term-term matrix
                                           ⇓
                  Size & type of context / structured vs. unstructured
                                           ⇓
                       Geometric vs. probabilistic interpretation
                                           ⇓
                                    Feature scaling
                                           ⇓
                      Normalisation of rows and/or columns
                                           ⇓
                            Similarity / distance measure
                                           ⇓
                                     Compression

 © Evert/Baroni/Lenci (CC-by-sa)                 DSM Tutorial             wordspace.collocations.de   53 / 107
Taxonomy of DSM parameters         DSM parameters


Normalisation of row vectors

            geometric distances only make sense if vectors are
            normalised to unit length
            divide vector by its length: x′ = x / ‖x‖
            normalisation depends on distance measure!
            special case: scale to relative frequencies with
            ‖x‖1 = |x1| + · · · + |xn|

      [Figure: “Two dimensions of English V−Obj DSM”: the get/use
       plane, showing the vectors for knife, boat, dog and cat
       together with their unit-length normalisations]

 © Evert/Baroni/Lenci (CC-by-sa)                 DSM Tutorial                          wordspace.collocations.de           54 / 107
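
Both normalisations as a sketch, applied to toy get/use counts for dog:

```python
import math

def l2_normalise(x):
    """Scale x to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]

def l1_normalise(x):
    """Scale x to relative frequencies (unit L1 norm)."""
    norm = sum(abs(v) for v in x)
    return [v / norm for v in x]

x_dog = [115, 10]
print(l2_normalise(x_dog))   # [0.996..., 0.086...] -> unit length
print(l1_normalise(x_dog))   # [0.92, 0.08]         -> relative frequencies
```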
Taxonomy of DSM parameters   DSM parameters


Scaling of column vectors

            In statistical analysis and machine learning, features are
            usually centred and scaled so that

                                                   mean        µ = 0
                                                   variance    σ² = 1

            In DSM research, this step is less common for columns of M
                    centring is a prerequisite for certain dimensionality reduction
                    and data analysis techniques (esp. PCA)
                    scaling may give too much weight to rare features
            M cannot be row-normalised and column-scaled at the same
            time (result depends on ordering of the two steps)




 © Evert/Baroni/Lenci (CC-by-sa)                 DSM Tutorial             wordspace.collocations.de   55 / 107
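
A sketch of centring and scaling the columns of a small count matrix, as one would before PCA (population standard deviation is assumed here):

```python
import statistics

rows = {"boat": [1, 3, 0], "cat": [0, 0, 2], "dog": [1, 0, 1]}

def standardise(col):
    """Centre a column to mean 0 and scale it to unit variance."""
    mu = statistics.mean(col)
    sd = statistics.pstdev(col)
    return [(v - mu) / sd for v in col]

columns = [list(c) for c in zip(*rows.values())]   # transpose rows -> columns
scaled = [standardise(col) for col in columns]
print([[round(v, 2) for v in col] for col in scaled])
# e.g. first column (1, 0, 1) -> [0.71, -1.41, 0.71]
```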
Taxonomy of DSM parameters   DSM parameters


Overview of DSM parameters

                    Linguistic pre-processing (definition of terms)
                                           ⇓
                         Term-context vs. term-term matrix
                                           ⇓
                  Size & type of context / structured vs. unstructured
                                           ⇓
                       Geometric vs. probabilistic interpretation
                                           ⇓
                                    Feature scaling
                                           ⇓
                        Normalisation of rows and/or columns
                                           ⇓
                           Similarity / distance measure
                                           ⇓
                                     Compression

 © Evert/Baroni/Lenci (CC-by-sa)                 DSM Tutorial             wordspace.collocations.de   56 / 107
Taxonomy of DSM parameters   DSM parameters


Geometric distance

           Distance between vectors u, v ∈ Rⁿ ⇒ (dis)similarity
                   u = (u1, . . . , un)
                   v = (v1, . . . , vn)
           Euclidean distance d2 (u, v)

      [Figure: points u and v in the (x1, x2) plane, with Manhattan
       distance d1 (u, v) = 5 and Euclidean distance d2 (u, v) = 3.6]

                  d2 (u, v) := √((u1 − v1)² + · · · + (un − vn)²)

 © Evert/Baroni/Lenci (CC-by-sa)                 DSM Tutorial                   wordspace.collocations.de            57 / 107
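
Both distances from the figure as functions; the points below are chosen to be consistent with the figure's d1(u, v) = 5 and d2(u, v) ≈ 3.6:

```python
import math

def d1(u, v):
    """Manhattan distance."""
    return sum(abs(a - b) for a, b in zip(u, v))

def d2(u, v):
    """Euclidean distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

u, v = (3, 6), (5, 3)
print(d1(u, v))   # 5
print(d2(u, v))   # 3.605... ≈ 3.6
```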
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models
Distributional semantic models

Weitere ähnliche Inhalte

Was ist angesagt?

πως λειτουργούν οι μυες
πως λειτουργούν οι μυεςπως λειτουργούν οι μυες
πως λειτουργούν οι μυεςKostas Chorianopoulos
 
Orishas que Hablan por Ifá y la Búsqueda del Caracol de Oggún: Dos Tareas Tit...
Orishas que Hablan por Ifá y la Búsqueda del Caracol de Oggún: Dos Tareas Tit...Orishas que Hablan por Ifá y la Búsqueda del Caracol de Oggún: Dos Tareas Tit...
Orishas que Hablan por Ifá y la Búsqueda del Caracol de Oggún: Dos Tareas Tit...Sociedad Yoruba Mexico
 
Η γιορτή των Τριών Ιεραρχών
Η γιορτή των Τριών ΙεραρχώνΗ γιορτή των Τριών Ιεραρχών
Η γιορτή των Τριών ΙεραρχώνDimitra Mylonaki
 
ιαπωνικα θρησκευματα
ιαπωνικα θρησκευματαιαπωνικα θρησκευματα
ιαπωνικα θρησκευματαEleni Athanasiou
 
Φυσική Ε΄. Ενότητα 3. Κεφάλαιο 1: Η ενέργεια έχει πολλά ΄΄πρόσωπα΄΄
Φυσική Ε΄. Ενότητα 3. Κεφάλαιο 1: Η ενέργεια έχει πολλά ΄΄πρόσωπα΄΄Φυσική Ε΄. Ενότητα 3. Κεφάλαιο 1: Η ενέργεια έχει πολλά ΄΄πρόσωπα΄΄
Φυσική Ε΄. Ενότητα 3. Κεφάλαιο 1: Η ενέργεια έχει πολλά ΄΄πρόσωπα΄΄Χρήστος Χαρμπής
 

Was ist angesagt? (10)

Παραδοσιακές συνταγές της Πρέβεζας
 Παραδοσιακές  συνταγές  της Πρέβεζας Παραδοσιακές  συνταγές  της Πρέβεζας
Παραδοσιακές συνταγές της Πρέβεζας
 
petrelaio
petrelaiopetrelaio
petrelaio
 
πως λειτουργούν οι μυες
πως λειτουργούν οι μυεςπως λειτουργούν οι μυες
πως λειτουργούν οι μυες
 
Orishas que Hablan por Ifá y la Búsqueda del Caracol de Oggún: Dos Tareas Tit...
Orishas que Hablan por Ifá y la Búsqueda del Caracol de Oggún: Dos Tareas Tit...Orishas que Hablan por Ifá y la Búsqueda del Caracol de Oggún: Dos Tareas Tit...
Orishas que Hablan por Ifá y la Búsqueda del Caracol de Oggún: Dos Tareas Tit...
 
Moxloi
MoxloiMoxloi
Moxloi
 
Η γιορτή των Τριών Ιεραρχών
Η γιορτή των Τριών ΙεραρχώνΗ γιορτή των Τριών Ιεραρχών
Η γιορτή των Τριών Ιεραρχών
 
αγιο ορος
αγιο οροςαγιο ορος
αγιο ορος
 
ιαπωνικα θρησκευματα
ιαπωνικα θρησκευματαιαπωνικα θρησκευματα
ιαπωνικα θρησκευματα
 
Φυσική Ε΄. Ενότητα 3. Κεφάλαιο 1: Η ενέργεια έχει πολλά ΄΄πρόσωπα΄΄
Φυσική Ε΄. Ενότητα 3. Κεφάλαιο 1: Η ενέργεια έχει πολλά ΄΄πρόσωπα΄΄Φυσική Ε΄. Ενότητα 3. Κεφάλαιο 1: Η ενέργεια έχει πολλά ΄΄πρόσωπα΄΄
Φυσική Ε΄. Ενότητα 3. Κεφάλαιο 1: Η ενέργεια έχει πολλά ΄΄πρόσωπα΄΄
 
ΤΑ ΠΟΤΑΜΙΑ ΤΗΣ ΓΑΛΛΙΑΣ!
 ΤΑ ΠΟΤΑΜΙΑ ΤΗΣ ΓΑΛΛΙΑΣ! ΤΑ ΠΟΤΑΜΙΑ ΤΗΣ ΓΑΛΛΙΑΣ!
ΤΑ ΠΟΤΑΜΙΑ ΤΗΣ ΓΑΛΛΙΑΣ!
 

Kürzlich hochgeladen

Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONHumphrey A Beña
 
Activity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationActivity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationRosabel UA
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYKayeClaireEstoconing
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfErwinPantujan2
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...JojoEDelaCruz
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parentsnavabharathschool99
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxCarlos105
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4JOYLYNSAMANIEGO
 
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptxMusic 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptxleah joy valeriano
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Celine George
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4MiaBumagat1
 

Kürzlich hochgeladen (20)

Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 
Activity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationActivity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translation
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
 
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parents
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptxYOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptxMusic 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4
 

Distributional semantic models

  • 1. Distributional Semantic Models Tutorial at NAACL-HLT 2010, Los Angeles, CA — part 1 — Stefan Evert1 with contributions from Marco Baroni2 and Alessandro Lenci3 1 University of Osnabrück, Germany 2 University of Trento, Italy 3 University of Pisa, Italy Los Angeles, 1 June 2010 Copyright © 2009–2010 Baroni, Evert & Lenci | Licensed under CC-by-sa version 3.0 © Evert/Baroni/Lenci (CC-by-sa) DSM Tutorial wordspace.collocations.de 1 / 107
  • 2. Outline Outline Introduction The distributional hypothesis General overview Three famous DSM examples Taxonomy of DSM parameters Definition & overview DSM parameters Examples Usage and evaluation of DSM What to do with DSM distances Evaluation: semantic similarity and relatedness Attributional similarity Relational similarity © Evert/Baroni/Lenci (CC-by-sa) DSM Tutorial wordspace.collocations.de 2 / 107
  • 3. Introduction The distributional hypothesis Outline Introduction The distributional hypothesis General overview Three famous DSM examples Taxonomy of DSM parameters Definition & overview DSM parameters Examples Usage and evaluation of DSM What to do with DSM distances Evaluation: semantic similarity and relatedness Attributional similarity Relational similarity © Evert/Baroni/Lenci (CC-by-sa) DSM Tutorial wordspace.collocations.de 3 / 107
  • 4. Introduction The distributional hypothesis Meaning & distribution “Die Bedeutung eines Wortes liegt in seinem Gebrauch.” — Ludwig Wittgenstein © Evert/Baroni/Lenci (CC-by-sa) DSM Tutorial wordspace.collocations.de 4 / 107
  • 5. Introduction The distributional hypothesis Meaning & distribution “Die Bedeutung eines Wortes liegt in seinem Gebrauch.” — Ludwig Wittgenstein “You shall know a word by the company it keeps!” — J. R. Firth (1957) © Evert/Baroni/Lenci (CC-by-sa) DSM Tutorial wordspace.collocations.de 4 / 107
  • 6. Introduction The distributional hypothesis Meaning & distribution “Die Bedeutung eines Wortes liegt in seinem Gebrauch.” — Ludwig Wittgenstein “You shall know a word by the company it keeps!” — J. R. Firth (1957) Distributional hypothesis (Zellig Harris 1954) © Evert/Baroni/Lenci (CC-by-sa) DSM Tutorial wordspace.collocations.de 4 / 107
  • 7. Introduction The distributional hypothesis What is the meaning of “bardiwac”? © Evert/Baroni/Lenci (CC-by-sa) DSM Tutorial wordspace.collocations.de 5 / 107
  • 8. Introduction The distributional hypothesis What is the meaning of “bardiwac”? He handed her her glass of bardiwac. © Evert/Baroni/Lenci (CC-by-sa) DSM Tutorial wordspace.collocations.de 5 / 107
  • 9. Introduction The distributional hypothesis What is the meaning of “bardiwac”? He handed her her glass of bardiwac. Beef dishes are made to complement the bardiwacs. © Evert/Baroni/Lenci (CC-by-sa) DSM Tutorial wordspace.collocations.de 5 / 107
  • 10. Introduction The distributional hypothesis What is the meaning of “bardiwac”? He handed her her glass of bardiwac. Beef dishes are made to complement the bardiwacs. Nigel staggered to his feet, face flushed from too much bardiwac. © Evert/Baroni/Lenci (CC-by-sa) DSM Tutorial wordspace.collocations.de 5 / 107
  • 11. Introduction The distributional hypothesis What is the meaning of “bardiwac”? He handed her her glass of bardiwac. Beef dishes are made to complement the bardiwacs. Nigel staggered to his feet, face flushed from too much bardiwac. Malbec, one of the lesser-known bardiwac grapes, responds well to Australia’s sunshine. © Evert/Baroni/Lenci (CC-by-sa) DSM Tutorial wordspace.collocations.de 5 / 107
  • 12. Introduction The distributional hypothesis What is the meaning of “bardiwac”? He handed her her glass of bardiwac. Beef dishes are made to complement the bardiwacs. Nigel staggered to his feet, face flushed from too much bardiwac. Malbec, one of the lesser-known bardiwac grapes, responds well to Australia’s sunshine. I dined off bread and cheese and this excellent bardiwac. © Evert/Baroni/Lenci (CC-by-sa) DSM Tutorial wordspace.collocations.de 5 / 107
  • 13. Introduction The distributional hypothesis What is the meaning of “bardiwac”? He handed her her glass of bardiwac. Beef dishes are made to complement the bardiwacs. Nigel staggered to his feet, face flushed from too much bardiwac. Malbec, one of the lesser-known bardiwac grapes, responds well to Australia’s sunshine. I dined off bread and cheese and this excellent bardiwac. The drinks were delicious: blood-red bardiwac as well as light, sweet Rhenish. © Evert/Baroni/Lenci (CC-by-sa) DSM Tutorial wordspace.collocations.de 5 / 107
  • 14. Introduction The distributional hypothesis What is the meaning of “bardiwac”? He handed her her glass of bardiwac. Beef dishes are made to complement the bardiwacs. Nigel staggered to his feet, face flushed from too much bardiwac. Malbec, one of the lesser-known bardiwac grapes, responds well to Australia’s sunshine. I dined off bread and cheese and this excellent bardiwac. The drinks were delicious: blood-red bardiwac as well as light, sweet Rhenish.
  • 15. bardiwac is a heavy red alcoholic beverage made from grapes © Evert/Baroni/Lenci (CC-by-sa) DSM Tutorial wordspace.collocations.de 5 / 107
  • 16. Introduction The distributional hypothesis Real-life concordance & word sketch http://beta.sketchengine.co.uk/ © Evert/Baroni/Lenci (CC-by-sa) DSM Tutorial wordspace.collocations.de 6 / 107
  • 17. Introduction The distributional hypothesis Real-life concordance & word sketch http://beta.sketchengine.co.uk/ © Evert/Baroni/Lenci (CC-by-sa) DSM Tutorial wordspace.collocations.de 7 / 107
  • 18. Introduction The distributional hypothesis A thought experiment: deciphering hieroglyphs get sij ius hir iit kil (knife) naif 51 20 84 0 3 0 (cat) ket 52 58 4 4 6 26 ??? dog 115 83 10 42 33 17 (boat) beut 59 39 23 4 0 0 (cup) kap 98 14 6 2 1 0 (pig) pigij 12 17 3 2 9 27 (banana) nana 11 2 2 0 18 0 © Evert/Baroni/Lenci (CC-by-sa) DSM Tutorial wordspace.collocations.de 8 / 107
  • 19. Introduction The distributional hypothesis A thought experiment: deciphering hieroglyphs get sij ius hir iit kil (knife) naif 51 20 84 0 3 0 (cat) ket 52 58 4 4 6 26 ??? dog 115 83 10 42 33 17 (boat) beut 59 39 23 4 0 0 (cup) kap 98 14 6 2 1 0 (pig) pigij 12 17 3 2 9 27 (banana) nana 11 2 2 0 18 0 sim(dog, naif ) = 0.770 © Evert/Baroni/Lenci (CC-by-sa) DSM Tutorial wordspace.collocations.de 8 / 107
  • 20. Introduction The distributional hypothesis A thought experiment: deciphering hieroglyphs get sij ius hir iit kil (knife) naif 51 20 84 0 3 0 (cat) ket 52 58 4 4 6 26 ??? dog 115 83 10 42 33 17 (boat) beut 59 39 23 4 0 0 (cup) kap 98 14 6 2 1 0 (pig) pigij 12 17 3 2 9 27 (banana) nana 11 2 2 0 18 0 sim(dog, pigij ) = 0.939 © Evert/Baroni/Lenci (CC-by-sa) DSM Tutorial wordspace.collocations.de 8 / 107
  • 21. Introduction The distributional hypothesis A thought experiment: deciphering hieroglyphs get sij ius hir iit kil (knife) naif 51 20 84 0 3 0 (cat) ket 52 58 4 4 6 26 ??? dog 115 83 10 42 33 17 (boat) beut 59 39 23 4 0 0 (cup) kap 98 14 6 2 1 0 (pig) pigij 12 17 3 2 9 27 (banana) nana 11 2 2 0 18 0 sim(dog, ket ) = 0.961 © Evert/Baroni/Lenci (CC-by-sa) DSM Tutorial wordspace.collocations.de 8 / 107
• 22. Introduction / The distributional hypothesis: English as seen by the computer ...

                  get   see   use   hear   eat   kill
                  get   sij   ius   hir    iit   kil
  knife   naif     51    20    84     0      3     0
  cat     ket      52    58     4     4      6    26
  dog     dog     115    83    10    42     33    17
  boat    beut     59    39    23     4      0     0
  cup     kap      98    14     6     2      1     0
  pig     pigij    12    17     3     2      9    27
  banana  nana     11     2     2     0     18     0

  verb-object counts from the British National Corpus
• 23. Introduction / The distributional hypothesis: Geometric interpretation
  - row vector x_dog describes the usage of the word dog in the corpus
  - it can be seen as the coordinates of a point in n-dimensional Euclidean space

  co-occurrence matrix M:
            get   see   use   hear   eat   kill
  knife      51    20    84     0      3     0
  cat        52    58     4     4      6    26
  dog       115    83    10    42     33    17
  boat       59    39    23     4      0     0
  cup        98    14     6     2      1     0
  pig        12    17     3     2      9    27
  banana     11     2     2     0     18     0
• 24–28. Introduction / The distributional hypothesis: Geometric interpretation
  - illustrated for two dimensions, get and use: x_dog = (115, 10)
  - similarity = spatial proximity (Euclidean distance)
  - but location depends on the frequency of the noun (f_dog ≈ 2.7 · f_cat), so dog ends up closer to boat (d = 57.5) than to cat (d = 63.3)
  - direction is more important than location
  - therefore normalise the "length" ‖x_dog‖ of the vector, or use the angle α as a distance measure (α = 54.3° in the illustration)
  [Plot: two dimensions of the English V-Obj DSM, showing knife, boat, dog and cat as points in the get/use plane]
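A quick numerical check of the two-dimensional illustration; only the get/use counts come from the matrix above, the helper names are mine.

```python
import numpy as np

dog, cat, boat, knife = map(np.array,
    ([115., 10.], [52., 4.], [59., 23.], [51., 84.]))

dist = lambda u, v: np.linalg.norm(u - v)        # Euclidean distance
print(round(dist(dog, boat), 1))                 # 57.5 -> dog is closer to boat ...
print(round(dist(dog, cat), 1))                  # 63.3 ... than to cat!

# After normalising to unit length, only the direction matters:
unit = lambda u: u / np.linalg.norm(u)
print(round(dist(unit(dog), unit(cat)), 2))      # ~0.01: nearly identical direction

# Or use the angle between vectors as a distance measure:
angle = lambda u, v: np.degrees(np.arccos(unit(u) @ unit(v)))
print(round(angle(dog, knife), 1))               # ~53.8 degrees, close to the alpha in the figure
```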
• 29. Introduction / The distributional hypothesis: Semantic distances
  - the main result of distributional analysis are "semantic" distances between words
  - typical applications: nearest neighbours, clustering of related words, constructing a semantic map
  [Figure: word space clustering of concrete nouns (V-Obj from BNC), dendrogram]
  [Figure: semantic map (V-Obj from BNC), grouping birds, ground animals, fruit, greens, tools and vehicles]
• 30. Outline — Introduction: General overview
• 31. Introduction / General overview: Tutorial overview
  1. Introduction & examples
  2. Taxonomy of DSM parameters
  3. Usage and evaluation of DSM spaces
  4. Elements of matrix algebra
  5. Making sense of DSM
  6. Current research topics & future directions
  Realistically, we'll get through parts 1–3 today. But you can find out about matrix algebra and the other advanced topics in the handouts available from the course Web site.
• 32–33. Introduction / General overview: Further information
  - Handouts & other materials available from the homepage at http://wordspace.collocations.de/ (will be extended during the next few months)
  - Tutorial is open source (CC) and can be downloaded from http://r-forge.r-project.org/projects/wordspace/
  - Compact DSM textbook in preparation for Synthesis Lectures on Human Language Technologies (Morgan & Claypool)
  - This tutorial is based on joint work with Marco Baroni and Alessandro Lenci
• 34–43. Introduction / General overview: A very brief history of DSM
  - Introduced to computational linguistics in the early 1990s, following the probabilistic revolution (Schütze 1992, 1998)
  - Other early work in psychology (Landauer and Dumais 1997; Lund and Burgess 1996), influenced by Latent Semantic Indexing (Dumais et al. 1988) and efficient software implementations (Berry 1992)
  - Renewed interest in recent years:
    2007: CoSMo Workshop (at Context '07)
    2008: ESSLLI Lexical Semantics Workshop & Shared Task, Special Issue of the Italian Journal of Linguistics
    2009: GeMS Workshop (EACL 2009), DiSCo Workshop (CogSci 2009), ESSLLI Advanced Course on DSM
    2010: 2nd GeMS Workshop (ACL 2010), ESSLLI Workshop on Compositionality & DSM, Special Issue of JNLE (in prep.), Computational Neurolinguistics Workshop (NAACL-HLT 2010 — don't miss it this Sunday!)
• 44. Introduction / General overview: Some applications in computational linguistics
  - Unsupervised part-of-speech induction (Schütze 1995)
  - Word sense disambiguation (Schütze 1998)
  - Query expansion in information retrieval (Grefenstette 1994)
  - Synonym tasks & other language tests (Landauer and Dumais 1997; Turney et al. 2003)
  - Thesaurus compilation (Lin 1998a; Rapp 2004)
  - Ontology & wordnet expansion (Pantel et al. 2009)
  - Attachment disambiguation (Pantel 2000)
  - Probabilistic language models (Bengio et al. 2003)
  - Subsymbolic input representation for neural networks
  - Many other tasks in computational semantics: entailment detection, noun compound interpretation, identification of non-compositional expressions, ...
• 45. Outline — Introduction: Three famous DSM examples
• 46–48. Introduction / Three famous DSM examples: Latent Semantic Analysis (Landauer and Dumais 1997)
  - Corpus: 30,473 articles from Grolier's Academic American Encyclopedia (4.6 million words in total); articles were limited to their first 2,000 characters
  - Word-article frequency matrix for 60,768 words; each row vector shows the frequency of a word in each article
  - Logarithmic frequencies scaled by word entropy
  - Reduced to 300 dimensions by singular value decomposition (SVD), borrowed from LSI (Dumais et al. 1988); central claim: SVD reveals latent semantic features and is not just a data reduction technique
  - Evaluated on the TOEFL synonym test (80 items): the LSA model achieved 64.4% correct answers; also a simulation of learning rate based on the TOEFL results
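A minimal LSA-style sketch under stated assumptions: a small random matrix stands in for the 60,768 × 30,473 word-article matrix, and the log-entropy weighting shown is one common variant of the scheme described above, not necessarily Landauer & Dumais's exact formula.

```python
import numpy as np

def lsa(M, k):
    """M: term-document count matrix; returns k-dimensional latent term vectors."""
    P = M / np.maximum(M.sum(axis=1, keepdims=True), 1)           # p(doc | term)
    with np.errstate(divide="ignore", invalid="ignore"):
        H = np.where(P > 0, -P * np.log(P), 0).sum(axis=1)        # term entropy
    W = np.log(M + 1) / (1 + H)[:, None]                          # log / entropy weighting
    U, s, Vt = np.linalg.svd(W, full_matrices=False)              # "latent semantic features"
    return U[:, :k] * s[:k]                                       # truncated term coordinates

M = np.random.poisson(0.5, size=(60, 30)).astype(float)  # toy: 60 terms x 30 articles
vecs = lsa(M, k=5)                                       # the real model used k = 300
print(vecs.shape)                                        # (60, 5)
```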
• 49–50. Introduction / Three famous DSM examples: Word Space (Schütze 1992, 1993, 1998)
  - Corpus: ≈ 60 million words of news messages (New York Times News Service)
  - Word-word co-occurrence matrix: 20,000 target words & 2,000 context words as features; each row records how often each context word occurs close to the target word (co-occurrence)
  - Co-occurrence window: 50 words left/right (Schütze 1998) or ≈ 1,000 characters (Schütze 1992)
  - Rows weighted by inverse document frequency (tf.idf)
  - Context vector = centroid of word vectors (bag-of-words); goal: determine the "meaning" of a context
  - Reduced to 100 SVD dimensions (mainly for efficiency)
  - Evaluated on unsupervised word sense induction by clustering the context vectors of an ambiguous word; the induced word senses improve information retrieval performance
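A sketch of the bag-of-words context vector described above: the centroid of the vectors of the surrounding words. `word_vecs` is a hypothetical lookup of (possibly SVD-reduced) word vectors; the function name and defaults are mine.

```python
import numpy as np

def context_vector(tokens, position, word_vecs, window=50):
    """Centroid of word vectors in a +/-window around `position` (bag-of-words)."""
    lo, hi = max(0, position - window), position + window + 1
    ctx = [word_vecs[t] for i, t in enumerate(tokens[lo:hi], start=lo)
           if i != position and t in word_vecs]
    dim = next(iter(word_vecs.values())).shape
    return np.mean(ctx, axis=0) if ctx else np.zeros(dim)

# Word sense induction then clusters these context vectors for an ambiguous word.
```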
• 51. Introduction / Three famous DSM examples: HAL (Lund and Burgess 1996)
  - HAL = Hyperspace Analogue to Language
  - Corpus: 160 million words from newsgroup postings
  - Word-word co-occurrence matrix: the same 70,000 words used as targets and features; co-occurrence window of 1–10 words
  - Separate counts for left and right co-occurrence, i.e. the context is structured
  - In later work, co-occurrences are weighted by (inverse) distance (Li et al. 2000)
  - Applications include the construction of semantic vocabulary maps by multidimensional scaling to 2 dimensions
• 52. Introduction / Three famous DSM examples: Many parameters ...
  - Enormous range of DSM parameters and applications
  - The examples showed three entirely different models, each tuned to its particular application
  ⇒ We need an overview of DSM parameters and an understanding of their effects
• 53. Outline — Taxonomy of DSM parameters: Definition & overview
• 54. Taxonomy of DSM parameters / Definition & overview: General definition of DSMs
  A distributional semantic model (DSM) is a scaled and/or transformed co-occurrence matrix M, such that each row x represents the distribution of a target term across contexts.

             get     see     use     hear    eat     kill
  knife     0.027  -0.024   0.206  -0.022  -0.044  -0.042
  cat       0.031   0.143  -0.243  -0.015  -0.009   0.131
  dog      -0.026   0.021  -0.212   0.064   0.013   0.014
  boat     -0.022   0.009  -0.044  -0.040  -0.074  -0.042
  cup      -0.014  -0.173  -0.249  -0.099  -0.119  -0.042
  pig      -0.069   0.094  -0.158   0.000   0.094   0.265
  banana    0.047  -0.139  -0.104  -0.022   0.267  -0.042

  Term = word, lemma, phrase, morpheme, ...
• 55. Taxonomy of DSM parameters / Definition & overview: General definition of DSMs
  Mathematical notation: m × n co-occurrence matrix M (example: 7 × 6 matrix), with m rows = target terms and n columns = features or dimensions

        | x_11  x_12  ...  x_1n |
    M = | x_21  x_22  ...  x_2n |
        |  ...   ...  ...   ... |
        | x_m1  x_m2  ...  x_mn |

  - distribution vector x_i = i-th row of M, e.g. x_3 = x_dog
  - components x_i = (x_i1, x_i2, ..., x_in) = features of the i-th term:
    x_3 = (−0.026, 0.021, −0.212, 0.064, 0.013, 0.014) = (x_31, x_32, x_33, x_34, x_35, x_36)
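To make the notation concrete, the same 7 × 6 matrix in code (a direct transcription of the table above, with rows indexed by target terms and columns by features):

```python
import numpy as np

targets  = ["knife", "cat", "dog", "boat", "cup", "pig", "banana"]
features = ["get", "see", "use", "hear", "eat", "kill"]
M = np.array([
    [ 0.027, -0.024,  0.206, -0.022, -0.044, -0.042],
    [ 0.031,  0.143, -0.243, -0.015, -0.009,  0.131],
    [-0.026,  0.021, -0.212,  0.064,  0.013,  0.014],
    [-0.022,  0.009, -0.044, -0.040, -0.074, -0.042],
    [-0.014, -0.173, -0.249, -0.099, -0.119, -0.042],
    [-0.069,  0.094, -0.158,  0.000,  0.094,  0.265],
    [ 0.047, -0.139, -0.104, -0.022,  0.267, -0.042],
])
x_dog = M[targets.index("dog")]   # distribution vector x_3 = row of M
print(dict(zip(features, x_dog)))
```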
• 56–63. Taxonomy of DSM parameters / Definition & overview: Overview of DSM parameters
  Linguistic pre-processing (definition of terms)
  ⇓ Term-context vs. term-term matrix
  ⇓ Size & type of context / structured vs. unstructured
  ⇓ Geometric vs. probabilistic interpretation
  ⇓ Feature scaling
  ⇓ Normalisation of rows and/or columns
  ⇓ Similarity / distance measure
  ⇓ Compression
• 64. Outline — Taxonomy of DSM parameters: DSM parameters
• 65–67. Taxonomy of DSM parameters / DSM parameters: Corpus pre-processing
  - Minimally, the corpus must be tokenised → identify terms (see the sketch below)
  - Linguistic annotation: part-of-speech tagging, lemmatisation / stemming, word sense disambiguation (rare), shallow syntactic patterns, dependency parsing
  - Generalisation of terms: often lemmatised to reduce data sparseness (go, goes, went, gone, going → go); POS disambiguation (light/N vs. light/A vs. light/V); word sense disambiguation (bank_river vs. bank_finance)
  - Trade-off between deeper linguistic analysis and the need for language-specific resources: possible errors are introduced at each stage of the analysis, and there are even more parameters to optimise / questions of cognitive plausibility
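As an illustration of this step, a hedged pre-processing sketch using spaCy, one of many possible toolchains (the tutorial itself is tool-agnostic); it assumes the small English model is installed and builds POS-disambiguated lemma terms:

```python
import spacy

nlp = spacy.load("en_core_web_sm")   # assumes: python -m spacy download en_core_web_sm
doc = nlp("The dogs went walking past the old mill.")

# terms = POS-disambiguated lemmas, e.g. light/NOUN vs. light/ADJ vs. light/VERB
terms = [f"{tok.lemma_}/{tok.pos_}" for tok in doc if tok.is_alpha]
print(terms)  # e.g. ['the/DET', 'dog/NOUN', 'go/VERB', 'walk/VERB', ...]
```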
• 68. Taxonomy of DSM parameters / DSM parameters: Effects of pre-processing
  Nearest neighbours of walk (BNC):

  word forms    lemmatised corpus
  stroll        hurry
  walking       stroll
  walked        stride
  go            trudge
  path          amble
  drive         wander
  ride          walk-nn
  wander        walking
  sprinted      retrace
  sauntered     scuttle
• 69. Taxonomy of DSM parameters / DSM parameters: Effects of pre-processing
  Nearest neighbours of arrivare 'to arrive' (Repubblica):

  word forms        lemmatised corpus
  giungere          giungere
  raggiungere       aspettare
  arrivi            attendere
  raggiungimento    arrivo-nn
  raggiunto         ricevere
  trovare           accontentare
  raggiunge         approdare
  arrivasse         pervenire
  arriverà          venire
  concludere        piombare
• 70. Overview of DSM parameters (pipeline recap) → next step: term-context vs. term-term matrix
• 71–72. Taxonomy of DSM parameters / DSM parameters: Term-context vs. term-term matrix
  A term-context matrix records the frequency of each term in each individual context (typically a sentence or document):

          doc1  doc2  doc3  ...
  boat      1     3     0   ...
  cat       0     0     2   ...
  dog       1     0     1   ...

  - Typical contexts are non-overlapping textual units (Web page, encyclopaedia article, paragraph, sentence, ...)
  - Contexts can also be generalised, e.g. a bag of content words, a specific pattern of POS tags, or the subcategorisation pattern of a target verb
  - The term-context matrix is usually very sparse (see the sketch below)
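A sketch of how such a matrix can be built with scikit-learn: CountVectorizer produces a sparse document-term matrix, which we transpose so that terms index the rows as in the table above; the toy documents are mine.

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the boat and the dog",   # doc1 (toy contexts)
        "boat boat boat",         # doc2
        "cat cat dog"]            # doc3

vec = CountVectorizer()
D = vec.fit_transform(docs)            # sparse document x term counts
M = D.T                                # term x context matrix, rows = terms
print(vec.get_feature_names_out())     # row labels (alphabetical)
print(M.toarray())                     # boat: [1 3 0], cat: [0 0 2], dog: [1 0 1]
```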
• 73–74. Taxonomy of DSM parameters / DSM parameters: Term-context vs. term-term matrix
  A term-term matrix records co-occurrence frequencies of context terms for each target term (often target terms = context terms):

         see  use  hear  ...
  boat    39   23    4   ...
  cat     58    4    4   ...
  dog     83   10   42   ...

  - Different types of contexts (Evert 2008): surface context (word or character window), textual context (non-overlapping segments), syntactic context (specific syntagmatic relation)
  - Can be seen as a smoothing of the term-context matrix: averaging over similar contexts (with the same context terms) reduces data sparseness, except for small windows
• 75. Overview of DSM parameters (pipeline recap) → next step: size & type of context
• 76. Taxonomy of DSM parameters / DSM parameters: Surface context
  A context term occurs within a window of k words around the target:
  "The silhouette of the sun beyond a wide-open bay on the lake; the sun still glitters although evening has arrived in Kuhmo. It's midsummer; the living room has its instruments and other objects in each of its corners."
  Parameters (see the sketch below):
  - window size (in words or characters)
  - symmetric vs. one-sided window
  - uniform or "triangular" (distance-based) weighting
  - window clamped to sentences or other textual units?
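A minimal sketch of surface co-occurrence counting with the parameters just listed: window size k, symmetric vs. one-sided (here: left-only), and optional "triangular" 1/distance weighting; all names are illustrative.

```python
from collections import defaultdict

def window_cooc(sentences, k=2, symmetric=True, triangular=False):
    """Count (target, context) co-occurrences within a k-word surface window."""
    counts = defaultdict(float)
    for sent in sentences:                       # window clamped to sentences
        for i, target in enumerate(sent):
            lo = i - k
            hi = i + k if symmetric else i       # one-sided = left window only
            for j in range(max(0, lo), min(len(sent), hi + 1)):
                if j == i:
                    continue
                w = 1.0 / abs(i - j) if triangular else 1.0
                counts[(target, sent[j])] += w
    return counts

sents = [["a", "dog", "bites", "a", "man"]]
print(window_cooc(sents, k=2)[("bites", "dog")])  # 1.0
```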
• 77. Taxonomy of DSM parameters / DSM parameters: Effect of different window sizes
  Nearest neighbours of dog (BNC):

  2-word window    30-word window
  cat              kennel
  horse            puppy
  fox              pet
  pet              bitch
  rabbit           terrier
  pig              rottweiler
  animal           canine
  mongrel          cat
  sheep            to bark
  pigeon           Alsatian
• 78. Taxonomy of DSM parameters / DSM parameters: Textual context
  The context term is in the same linguistic unit as the target (same example passage as above).
  Parameters: type of linguistic unit — sentence, paragraph, turn in a conversation, Web page
• 79. Taxonomy of DSM parameters / DSM parameters: Syntactic context
  The context term is linked to the target by a syntactic dependency, e.g. subject, modifier, ... (same example passage as above).
  Parameters:
  - types of syntactic dependency (Padó and Lapata 2007)
  - direct vs. indirect dependency paths (direct dependencies only, or direct + indirect dependencies)
  - homogeneous data (e.g. only verb-object) vs. heterogeneous data (e.g. all children and parents of the verb)
  - maximal length of the dependency path
• 80. Taxonomy of DSM parameters / DSM parameters: "Knowledge pattern" context
  The context term is linked to the target by a lexico-syntactic pattern (text mining, cf. Hearst 1992, Pantel & Pennacchiotti 2008, etc.):
  "In Provence, Van Gogh painted with bright colors such as red and yellow. These colors produce incredible effects on anybody looking at his paintings."
  Parameters:
  - inventory of lexical patterns; lots of research to identify semantically interesting patterns (cf. Almuhareb & Poesio 2004, Veale & Hao 2008, etc.)
  - fixed vs. flexible patterns; patterns are mined from large corpora and automatically generalised (optional elements, POS tags or semantic classes)
• 81–82. Taxonomy of DSM parameters / DSM parameters: Structured vs. unstructured context
  - In unstructured models, the context specification acts as a filter: it determines whether a context token counts as a co-occurrence, e.g. whether it is linked to the target by a specific syntactic relation such as verb-object
  - In structured models, context words are subtyped depending on their position in the context, e.g. left vs. right context, type of syntactic relation, etc.
• 83–84. Taxonomy of DSM parameters / DSM parameters: Structured vs. unstructured surface context
  Example corpus: "A dog bites a man. The man's dog bites a dog. A dog bites a man."

  unstructured      bite
  dog                 4
  man                 3

  structured      bite-l   bite-r
  dog                3        1
  man                1        2

  (a counting sketch follows below)
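A sketch reproducing the structured counts above: context words are subtyped by whether they occur left or right of the target "bite" within the sentence. The hand-coded lemma table keeps the example self-contained.

```python
from collections import Counter

lemma = {"A": None, "a": None, "The": None, "the": None,
         "dog": "dog", "man": "man", "man's": "man", "bites": "bite"}
corpus = ["A dog bites a man", "The man's dog bites a dog", "A dog bites a man"]

counts = Counter()
for sent in corpus:
    toks = [lemma.get(w) for w in sent.split()]
    b = toks.index("bite")                       # position of the target
    for i, t in enumerate(toks):
        if t in ("dog", "man"):
            counts[(t, "bite-l" if i < b else "bite-r")] += 1

print(counts)
# ('dog','bite-l'): 3, ('dog','bite-r'): 1, ('man','bite-l'): 1, ('man','bite-r'): 2
```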
• 85–86. Taxonomy of DSM parameters / DSM parameters: Structured vs. unstructured dependency context
  Same example corpus as above.

  unstructured      bite
  dog                 4
  man                 2

  structured      bite-subj   bite-obj
  dog                 3           1
  man                 0           2
• 87. Taxonomy of DSM parameters / DSM parameters: Comparison
  - Unstructured context: data are less sparse (e.g. "man kills" and "kills man" both map to the kill dimension of the vector x_man)
  - Structured context: more sensitive to semantic distinctions (kill-subj and kill-obj are rather different things!); dependency relations provide a form of syntactic "typing" of the DSM dimensions (the "subject" dimensions, the "recipient" dimensions, etc.); important to account for word order and compositionality
• 88. Overview of DSM parameters (pipeline recap) → next step: geometric vs. probabilistic interpretation
• 89–92. Taxonomy of DSM parameters / DSM parameters: Geometric vs. probabilistic interpretation
  - Geometric interpretation: row vectors as points or arrows in n-dimensional space; very intuitive and good for visualisation; uses techniques from geometry and linear algebra
  - Probabilistic interpretation: the co-occurrence matrix is an observed sample statistic that is "explained" by a generative probabilistic model; recent work focuses on hierarchical Bayesian models — probabilistic LSA (Hofmann 1999), Latent Semantic Clustering (Rooth et al. 1999), Latent Dirichlet Allocation (Blei et al. 2003), etc.; explicitly accounts for random variation of frequency counts; intuitive and plausible as a topic model
  ⇒ this tutorial focuses exclusively on the geometric interpretation
• 93. Overview of DSM parameters (pipeline recap) → next step: feature scaling
• 94–96. Taxonomy of DSM parameters / DSM parameters: Feature scaling
  Feature scaling is used to "discount" less important features:
  - Logarithmic scaling: x' = log(x + 1) (cf. the Weber-Fechner law for human perception)
  - Relevance weighting, e.g. tf.idf (information retrieval)
  - Statistical association measures (Evert 2004, 2008) take the frequency of the target word and the context feature into account: the less frequent the target word and (more importantly) the context feature are, the higher the weight given to their observed co-occurrence count should be, because their expected chance co-occurrence frequency is low; different measures — e.g. mutual information, log-likelihood ratio — differ in how they balance observed and expected co-occurrence frequencies
• 97–100. Taxonomy of DSM parameters / DSM parameters: Association measures — Mutual Information (MI)

  word1   word2          f_obs     f1        f2
  dog     small            855   33,338   490,580
  dog     domesticated      29   33,338       918

  Expected co-occurrence frequency:
    f_exp = (f1 · f2) / N
  Mutual Information compares observed vs. expected frequency:
    MI(w1, w2) = log2 (f_obs / f_exp) = log2 (N · f_obs / (f1 · f2))
  Disadvantage: MI overrates combinations of rare terms.
• 101–103. Taxonomy of DSM parameters / DSM parameters: Other association measures
  The log-likelihood ratio (Dunning 1993) has a more complex form, but its "core" is known as local MI (Evert 2004):
    local-MI(w1, w2) = f_obs · MI(w1, w2)

  word1   word2          f_obs     MI     local-MI
  dog     small            855    3.96    3382.87
  dog     domesticated      29    6.85     198.76
  dog     sgjkj              1   10.31      10.31

  The t-score measure (Church and Hanks 1990) is popular in lexicography:
    t-score(w1, w2) = (f_obs − f_exp) / √f_obs
  Details & many more measures: http://www.collocations.de/
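The three association measures in code, exactly as defined above. The sample size N is not given on the slides, so the value below is a placeholder (the outputs will therefore not necessarily match the table):

```python
import math

def mi(f_obs, f1, f2, N):
    """Mutual Information: log2 of observed vs. expected co-occurrence frequency."""
    f_exp = f1 * f2 / N
    return math.log2(f_obs / f_exp)

def local_mi(f_obs, f1, f2, N):
    """Core of the log-likelihood ratio: observed frequency times MI."""
    return f_obs * mi(f_obs, f1, f2, N)

def t_score(f_obs, f1, f2, N):
    """t-score measure (Church and Hanks 1990)."""
    f_exp = f1 * f2 / N
    return (f_obs - f_exp) / math.sqrt(f_obs)

N = 100_000_000                          # placeholder sample size
print(mi(855, 33_338, 490_580, N))       # dog + small
print(local_mi(29, 33_338, 918, N))      # dog + domesticated
print(t_score(855, 33_338, 490_580, N))
```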
• 104. Overview of DSM parameters (pipeline recap) → next step: normalisation of rows and/or columns
• 105. Taxonomy of DSM parameters / DSM parameters: Normalisation of row vectors
  - geometric distances only make sense if the vectors are normalised to unit length
  - divide each vector by its length: x / ‖x‖
  - the normalisation depends on the distance measure!
  - special case: scale to relative frequencies with the L1 norm ‖x‖₁ = |x_1| + · · · + |x_n|
  [Plot: two dimensions of the English V-Obj DSM with row vectors scaled to unit length]
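Both normalisations in numpy (a sketch; the two-row matrix is just dog and cat on the get/use dimensions from earlier):

```python
import numpy as np

M = np.array([[115., 10.], [52., 4.]])                 # dog and cat on (get, use)
M_l2 = M / np.linalg.norm(M, axis=1, keepdims=True)    # each row has Euclidean length 1
M_l1 = M / np.abs(M).sum(axis=1, keepdims=True)        # each row sums to 1 (rel. frequencies)
print(M_l2)
print(M_l1)
```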
• 106–107. Taxonomy of DSM parameters / DSM parameters: Scaling of column vectors
  - In statistical analysis and machine learning, features are usually centred and scaled so that mean μ = 0 and variance σ² = 1
  - In DSM research, this step is less common for the columns of M: centring is a prerequisite for certain dimensionality reduction and data analysis techniques (esp. PCA), but scaling may give too much weight to rare features
  - M cannot be row-normalised and column-scaled at the same time: the result depends on the ordering of the two steps (see the sketch below)
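A short demonstration that the two operations do not commute (toy matrix and helper names are mine):

```python
import numpy as np

M = np.array([[115., 10.], [52., 4.], [59., 23.]])

def scale_cols(X):    # z-scores: mean 0, variance 1 per column
    return (X - X.mean(axis=0)) / X.std(axis=0)

def norm_rows(X):     # unit-length rows
    return X / np.linalg.norm(X, axis=1, keepdims=True)

# Row normalisation then column scaling != column scaling then row normalisation:
print(np.allclose(scale_cols(norm_rows(M)), norm_rows(scale_cols(M))))  # False
```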
• 108. Overview of DSM parameters (pipeline recap) → next step: similarity / distance measure
• 109–110. Taxonomy of DSM parameters / DSM parameters: Geometric distance
  - The distance between vectors u, v ∈ R^n measures their (dis)similarity, with u = (u_1, ..., u_n) and v = (v_1, ..., v_n)
  - Manhattan distance: d_1(u, v) := |u_1 − v_1| + · · · + |u_n − v_n|
  - Euclidean distance: d_2(u, v) := √( (u_1 − v_1)² + · · · + (u_n − v_n)² )
  [Plot: two points u and v in the plane, with d_1(u, v) = 5 and d_2(u, v) ≈ 3.6]
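Both distances as special cases of the Minkowski distance; the example points are made-up coordinates whose difference vector (2, 3) reproduces the distances in the figure:

```python
import numpy as np

def minkowski(u, v, p):
    """Minkowski distance d_p; p=1 is Manhattan, p=2 is Euclidean."""
    return float((np.abs(np.asarray(u) - np.asarray(v)) ** p).sum() ** (1 / p))

u, v = (2, 6), (4, 3)                 # assumed example points, difference (2, 3)
print(minkowski(u, v, 1))             # 5.0 = |2-4| + |6-3|
print(round(minkowski(u, v, 2), 1))   # 3.6 = sqrt(2^2 + 3^2)
```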