High Tech Campus, Philips Research
                         Eindhoven, Netherlands




Random Indexing and Quantum
   Negation for TV-Shows
  Retrieval and Classification
             Cataldo Musto, Ph.D. Student
  cataldomusto@di.uniba.it - cataldo.musto@philips.com
           University of Bari “Aldo Moro” (Italy), SWAP Research Group
          Philips Research Center - Eindhoven (Netherlands) - HI&E Group
                                  14.07.11
outline
                •     part 1:    introduction
                     •      information overload, personalization, information filtering, recommender
                            systems

                •     part 2:    approaches
                     •      vector space model, random indexing, quantum negation

                •     part 3:    scenario
                     •      tv-show recommendation, description of the data, description of the tasks

                •     part 4:    experimental evaluation
                     •      results, discussion, future work



C.Musto: Random Indexing and Quantum Negation for TV shows Retrieval and Classification - Philips Research , Eindhoven (The Netherlands) - 14.07.11
part 1: introduction
                                       what are we talking about?




TV
text messages
phone calls
internet navigation
scenario
                •     Daily interaction with electronic
                      devices

                     •     E-mail, Web navigation, social
                           media, instant messaging


                •     Continuous flow of
                      information

                     •     In 2007, 500,000 terabytes of
                           information were produced
                           on the Web in a single year

                     •     Including telephone, radio,
                           TV and other media, the total
                           reaches 18 exabytes of data!



information overload
                •     Consequences:
                      cognitive overload

                     •     It is impossible to
                           effectively deal with
                           this surplus of
                           information

                     •     It is difficult to quickly
                           find the information
                           we really need

Solution: personalization
information filtering
                     “An information filtering system is a
                   system that removes redundant or
                unwanted information from an information
                        stream using automated methods”
                                                                                                      Wikipedia


information filtering systems

                • How do they work?
                 • Usually, in three steps
                   • Training Step
                   • User Modeling
                   • Filtering
Step 1: Training
Step 2: User Modeling
Step 3: Filtering
recommender systems
                •     A specific type of information filtering system
                      that attempts to recommend
                      information items (films, television, video on
                      demand, music, books, etc.) that are likely to be of
                      interest to the user


                     •      Every day we interact with recommender
                            systems, even if we are not aware of it!

Amazon
YouTube
recommendation approaches
                •     Content-based filtering
                     •       No interactions between users: each user is an atomic entity
                     •       Prerequisite: each item to be recommended has to be described through a set of
                             textual features
                     •       We store in the user profile the features that often
                             occur in the items she likes
                •     Assumption: if a user usually likes items whose descriptions often contain a specific feature,
                      we can assume that she will also like items with that feature in the future

                •     e.g.
                     •       If User_A likes a news item containing the features “Football” and “Internazionale FC”,
                     •       we can recommend her other news about Football or Internazionale
                             FC
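The content-based idea above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system described in these slides: the profile is assumed to be a plain count of how many liked items contain each feature, and `build_profile`, `score`, and the example news items are hypothetical.

```python
from collections import Counter

def build_profile(liked_items):
    """Count, for each feature, how many liked items contain it (hypothetical profile model)."""
    profile = Counter()
    for features in liked_items:
        profile.update(set(features))
    return profile

def score(item_features, profile):
    """Score an unseen item by summing the profile counts of its features."""
    return sum(profile[feature] for feature in set(item_features))

# Hypothetical data: User_A liked one news item.
liked = [{"Football", "Internazionale FC"}]
profile = build_profile(liked)

candidates = {
    "news_1": {"Football", "Serie A"},
    "news_2": {"Internazionale FC", "Transfer market"},
    "news_3": {"Cooking", "Recipes"},
}
# Items sharing features with the profile rank above unrelated ones (ties keep insertion order).
ranking = sorted(candidates, key=lambda name: score(candidates[name], profile), reverse=True)
print(ranking)  # ['news_1', 'news_2', 'news_3']
```

Both football-related items share one feature with the profile, so they outrank the cooking item.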



part 2: approaches
  vector space model, random indexing, quantum negation




vector space model
                •     Introduced by Salton in
                      1975

                     •     Given a set of M documents
                           (items) D = (d1, ..., dM)

                     •     Given N features describing
                           the documents

                     •     Each document (item) is
                           represented as a vector in an
                           N-dimensional vector space

                     •     The whole corpus is
                           represented as an N×M matrix,
                           called the term/document
                           matrix
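A small sketch of how such an N×M term/document matrix can be built. It assumes whitespace tokenization and raw term counts as weights; the documents and the function name `term_document_matrix` are hypothetical.

```python
def term_document_matrix(documents):
    """Build an N x M term/document matrix of raw term counts.

    Rows are the N distinct terms (features), columns are the M documents.
    """
    vocabulary = sorted({term for doc in documents for term in doc.split()})
    row_of = {term: i for i, term in enumerate(vocabulary)}
    matrix = [[0] * len(documents) for _ in vocabulary]
    for j, doc in enumerate(documents):
        for term in doc.split():
            matrix[row_of[term]][j] += 1
    return vocabulary, matrix

docs = ["football match tonight", "cooking show tonight", "football news"]
vocabulary, A = term_document_matrix(docs)
for term, row in zip(vocabulary, A):
    print(f"{term:>8} {row}")
```

Here N = 6 distinct terms and M = 3 documents, so `A` has 6 rows and 3 columns; each column is the vector of one document.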



vector space model
                     •      VSM in a recommendation scenario
                          •      Document: point in the vector space
                          •      User profile: point in the vector space
                               •      e.g. built as the sum of the vector space representations of the documents
                                      liked in the past by the user
                          •      Goal: to find the documents that are most relevant to that user profile
                          •      Assumption

                               •      the documents most similar to the profile in the vector space are the most
                                      relevant ones

                               •      Cosine similarity is used to compute the similarity between the query (the
                                      user profile) and the documents
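A minimal sketch of this recommendation loop, assuming a dense toy vector space: the profile is the component-wise sum of the liked documents, and documents are ranked by cosine similarity, cos(p, d) = p·d / (|p| |d|). The 4-term space and the document ids are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_profile(liked_vectors):
    """User profile = component-wise sum of the vectors of the liked documents."""
    return [sum(column) for column in zip(*liked_vectors)]

def rank(profile, documents):
    """Sort document ids by decreasing cosine similarity to the profile."""
    return sorted(documents, key=lambda d: cosine(profile, documents[d]), reverse=True)

# Hypothetical 4-term space: [football, inter, cooking, recipe]
liked = [[1, 1, 0, 0], [1, 0, 0, 0]]
documents = {"d1": [0, 1, 0, 0], "d2": [0, 0, 1, 1], "d3": [2, 1, 0, 0]}
profile = build_profile(liked)   # [2, 1, 0, 0]
print(rank(profile, documents))  # ['d3', 'd1', 'd2']
```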




vsm analysis (2)
                     •      Weak Points
                          •      Not incremental

                               •      The whole vector space has to be generated from scratch
                                      whenever a new item is added to the repository
                          •      High dimensionality
                               •      NLP operations (stopword elimination, stemming and so on) are required to keep it under control
                          •      Does not manage negative evidence
                               •      The vector space representation depends only on the features that occur in
                                      the document; no assumptions are made about the features that do not occur
                          •      Does not manage the latent semantics of documents

                               •      Any permutation of the terms in a document has the same
                                      VSM representation!
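The permutation problem is easy to demonstrate with a bag-of-words vector, here represented sparsely as a `collections.Counter` (the sentences are arbitrary examples):

```python
from collections import Counter

def bag_of_words(text):
    """Term-frequency vector of a text, represented sparsely as a Counter."""
    return Counter(text.lower().split())

first = bag_of_words("Dog bites man")
second = bag_of_words("Man bites dog")
# Word order is discarded, so both sentences get exactly the same representation.
print(first == second)  # True
```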


idea
                    • To introduce tools and techniques
                          able to overcome these drawbacks
                         • Random Indexing
                          • Dimensionality reduction technique (Sahlgren, 2005)

                         • Quantum Negation
                          • Based on quantum logic (Widdows, 2007)



random indexing
                     •      Random Indexing (RI) is an incremental and effective
                            technique for dimensionality reduction
                     •      Distributional Models
                          •      Assumption: we can infer information about terms
                                 by analyzing how they are used in large corpora of data


                     •      Based on the so-called “Distributional Hypothesis”
                          •      “Words that occur in the same context tend to have
                                 similar meanings”
                          •      “Meaning is its use” (Wittgenstein)


how does it work?

                                         Random Indexing reduces the original high-dimensional
                                      term/document matrix to a new, lower-dimensional matrix




how does it work?
          •     How?
               •      By multiplying the original
                      matrix by a random
                      one, built in an incremental
                      way
                    •      formally: $A_{n,m} \cdot R_{m,k} = B_{n,k}$
                    •      with $k \ll m$
               •      After projection, the
                      distances between points in
                      the vector space are approximately preserved
                    •      Johnson-Lindenstrauss
                           lemma
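A sketch of the projection $A_{n,m} \cdot R_{m,k} = B_{n,k}$ in its batch form, assuming a sparse ternary random matrix R (each row has s entries equal to +1 or -1, the rest 0). All sizes and the sparsity level of A are hypothetical. Since each row of R has squared norm s, projected distances grow by roughly √s, so the check rescales by 1/√s before comparing pairwise distances.

```python
import math
import random

def sparse_ternary_matrix(m, k, nonzeros, rng):
    """Random m x k matrix: each row has `nonzeros` entries set to +1 or -1, all others 0."""
    R = [[0] * k for _ in range(m)]
    for row in R:
        for col in rng.sample(range(k), nonzeros):
            row[col] = rng.choice((-1, 1))
    return R

def project(A, R):
    """Compute B = A * R with plain lists, skipping zero entries (A is n x m, R is m x k)."""
    k = len(R[0])
    B = []
    for a_row in A:
        b_row = [0.0] * k
        for a_val, r_row in zip(a_row, R):
            if a_val:
                for j, r_val in enumerate(r_row):
                    if r_val:
                        b_row[j] += a_val * r_val
        B.append(b_row)
    return B

def distance(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

rng = random.Random(42)
n, m, k, s = 20, 2000, 300, 10  # hypothetical sizes: k << m
# Sparse non-negative "documents" in the original m-dimensional space.
A = [[rng.random() if rng.random() < 0.05 else 0.0 for _ in range(m)] for _ in range(n)]
R = sparse_ternary_matrix(m, k, s, rng)
B = project(A, R)

# Rescale by 1/sqrt(s), then compare every pairwise distance before and after projection.
scale = 1 / math.sqrt(s)
ratios = [
    scale * distance(B[i], B[j]) / distance(A[i], A[j])
    for i in range(n) for j in range(i + 1, n)
]
print(f"distance ratio after projection: min={min(ratios):.3f}, max={max(ratios):.3f}")
```

The ratios cluster around 1: distances are not preserved exactly, only approximately, as the Johnson-Lindenstrauss lemma predicts.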
random matrix
                      •      How is the random matrix built?
                      •      The whole process is based on the concept of “context”
                           •      Given a term, its “context” could be the whole document, a
                                  paragraph, a sentence, a sliding window of words and so on.

                           •      The definition of the context influences the structure of the
                                  matrix


                      •      The matrix is built in an iterative and incremental way

                           •      The vector representing each document depends on the terms
                                  that occur in it

                           •      The vector representing each term depends on its context




item representation
                •     A context vector is assigned to each context (for simplicity, we
                      take the whole document as the context)
                     •      This vector has a fixed dimension (k) and can contain only the values
                            -1, 0 and 1. Values are distributed randomly, but the number of
                            non-zero elements is much smaller than k.

                •     The vector space representation of a term is obtained by summing the
                      context vectors of all its contexts (the documents it occurs in).

                •     The vector space representation of a document (item) is
                      obtained by summing the vectors of the terms that occur in it


                •     Output: lower-dimensional vector space representation
                      based on random context vectors
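A minimal sketch of the incremental construction just described, with the whole document as the context. The values of `K` and `NONZEROS` and the class name `RandomIndex` are hypothetical; each term is counted once per document, which is a simplifying assumption.

```python
import random

K = 1000       # dimension k of the context vectors (hypothetical value)
NONZEROS = 10  # number of +1/-1 entries per context vector (hypothetical value)

def random_context_vector(rng):
    """Sparse ternary vector: NONZEROS random positions set to +1 or -1, the rest 0."""
    vector = [0] * K
    for position in rng.sample(range(K), NONZEROS):
        vector[position] = rng.choice((-1, 1))
    return vector

def add_into(target, source):
    """In-place component-wise sum: target += source."""
    for i, value in enumerate(source):
        target[i] += value

class RandomIndex:
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.context_vectors = {}  # document id -> its random context vector
        self.term_vectors = {}     # term -> sum of the context vectors of its documents

    def add_document(self, doc_id, terms):
        """Incremental step: only the terms of the new document are updated."""
        context = random_context_vector(self.rng)
        self.context_vectors[doc_id] = context
        for term in set(terms):
            add_into(self.term_vectors.setdefault(term, [0] * K), context)

    def document_vector(self, terms):
        """Document (item) vector = sum of the vectors of the terms it contains."""
        vector = [0] * K
        for term in terms:
            if term in self.term_vectors:
                add_into(vector, self.term_vectors[term])
        return vector

ri = RandomIndex(seed=1)
ri.add_document("d1", ["football", "inter", "match"])
ri.add_document("d2", ["football", "league"])
show_vector = ri.document_vector(["football", "league"])
```

Adding a new document never forces a rebuild: its context vector is simply added to the vectors of the terms it contains, which addresses the "not incremental" weakness of the VSM.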


quantum negation
                •     Random Indexing alone still cannot manage negative evidence
                •     RI can be coupled with the Quantum Negation (QN) operator
                     •      Definition inherited from quantum logic

                     •      Negation as a form of orthogonality between
                            vectors
                     •      Given two vectors A and B, we can define the vector A NOT B
                          •      It represents the projection of vector A onto the subspace
                                 orthogonal to the one generated by vector B
                          •      In a recommendation scenario, this operator could be applied to
                                 two vectors, the first representing positive
                                 evidence and the second representing negative evidence
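For a single vector B, the projection onto its orthogonal complement is A NOT B = A − ((A·B) / (B·B)) B, which removes from A its component along B. A minimal sketch with hypothetical positive and negative vectors:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def quantum_not(a, b):
    """A NOT B: project a onto the subspace orthogonal to b."""
    bb = dot(b, b)
    if bb == 0:
        return list(a)  # nothing to negate
    coefficient = dot(a, b) / bb
    return [x - coefficient * y for x, y in zip(a, b)]

positive = [3.0, 1.0, 2.0]  # hypothetical positive-evidence vector
negative = [1.0, 1.0, 0.0]  # hypothetical negative-evidence vector
result = quantum_not(positive, negative)
print(result)                 # [1.0, -1.0, 2.0]
print(dot(result, negative))  # 0.0
```

The result is orthogonal to the negative vector, so any document aligned only with the negative evidence gets zero similarity with it.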


...summing up
              •      VSM is an effective model for document retrieval

              •      It can be exploited in recommendation scenarios
              •      It suffers from some well-known drawbacks
              •      Solutions
                    •     Random Indexing is an incremental and effective approach
                          that can tackle the high-dimensionality problem
                    •     Quantum Negation can effectively model negative evidence

                    •     The combined use of RI and QN is a good
                          alternative to VSM, especially for real-life scenarios


part 3: scenario
                                       tv-shows recommendation




Scenario:
                              EPG (Electronic Program Guides)
                                      personalization
scenario

                •     Given a set of TV shows,
                      we want to provide the
                      user with a set of
                      suggestions about the
                      shows she should
                      watch, according to her
                      preferences




approach

            Currently the recommendation
            model is implemented through
            the Vector Space Model (VSM)


data
              •     TV shows gathered from a set of
                    47 German-language broadcast
                    channels

              •     Each TV show is described
                    through a set of textual
                    features (title, synopsis,
                    description, etc.) gathered from an
                    XML feed

              •     Each TV-Show is mapped to a fixed
                    program type (Movie, Sport,
                    Documentary, Magazine, etc.)



problems
              •      How to represent the data?

                    •     We compared two approaches
                         •     Bag of Words (BOW)
                         •     Tag.me

              •      What are the typical use cases?
                    •     We identified two tasks
                         •     Classification Task
                         •     Retrieval Task


data representation
                    • Bag of Words
                     • Each item i is described through the
                               words that appear in the text

                         • Weighting of the words
                          • Counting of the occurrences,
                                     normalization, TF-IDF weighting, etc.
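As an illustration of the weighting step, here is one common TF-IDF variant: term frequency normalized by document length, times log(number of documents / document frequency). The slides do not say which variant was used, so this choice is an assumption; the tokenized German example documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Weight each term of each tokenized document by tf * idf.

    tf  = occurrences of the term / length of the document
    idf = log(number of documents / number of documents containing the term)
    """
    n = len(documents)
    document_frequency = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n / document_frequency[term])
            for term, count in counts.items()
        })
    return weights

docs = [
    ["fussball", "bundesliga", "fussball"],
    ["krimi", "tatort"],
    ["fussball", "nachrichten"],
]
for weights in tf_idf(docs):
    print({term: round(w, 3) for term, w in weights.items()})
```

Terms that occur in many documents receive a low idf, so they contribute little to the similarity between shows.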



BOW representation
                    • To improve BOW representation
                       • Textual descriptions are usually very noisy
                       • Full of uninformative words
                       • Further processing can improve
                                     the classical BOW representation
                                   •      Stopword removal: filtering out all the
                                          uninformative words (articles, adverbs,
                                          adjectives and so on)
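A sketch of tokenization followed by stopword removal. The stopword set below is a tiny hypothetical sample (the experiments described later use a fixed list of 996 German stopwords), and the regular-expression tokenizer is an assumption.

```python
import re

# Tiny illustrative sample; the experiments described here use a fixed list of 996 German stopwords.
STOPWORDS = {"der", "die", "das", "und", "ein", "eine", "einem", "in", "mit", "von", "sehr"}

def tokenize(text):
    """Lower-case the text and split it into word tokens (Unicode-aware, keeps umlauts)."""
    return re.findall(r"\w+", text.lower())

def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Drop every token that appears in the stopword list."""
    return [token for token in tokens if token not in stopwords]

tokens = remove_stopwords(tokenize("Der Kommissar ermittelt in einem sehr spannenden Fall"))
print(tokens)  # ['kommissar', 'ermittelt', 'spannenden', 'fall']
```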



data representation
                     • Tag.me
                      • Online tool developed by the University
                                 of Pisa (Italy)
                          •      Goal: to identify Wikipedia concepts that
                                 occur in the text
                          •      Idea: to process the original text through Tag.me
                                 in order to reduce noise and provide a novel
                                 representation based on high-level
                                 Wikipedia concepts


tag.me web interface




final output
   Bow




  Tag.me



description of the tasks
                •     task 1: classification

                     •      Given a flow of TV shows, we want to classify
                            each of them into one of the program types

                •     task 2: retrieval

                     •      Given a set of program types and a repository
                            of TV shows, we want to retrieve the shows
                            that belong to a specific program type

VSM for TV shows classification

                • Steps
                 • 1) Build a vector space for the tv shows
                 • 2) Build a vector for each program type
                 • 3) Use cosine similarity to compare tv shows
                            and program types
                 • 4) Assign the TV show to the program type that got
                            the highest cosine similarity
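The four steps above amount to a nearest-centroid classifier under cosine similarity. A minimal sketch, assuming the show vectors from step 1 are already available; the 3-term space, the training pairs and the function names are hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def program_type_vectors(training):
    """Step 2: sum the vectors of the training shows of each program type.

    training: list of (show_vector, program_type) pairs.
    """
    types = {}
    for vector, program_type in training:
        total = types.setdefault(program_type, [0.0] * len(vector))
        for i, value in enumerate(vector):
            total[i] += value
    return types

def classify(show_vector, types):
    """Steps 3-4: return the program type with the highest cosine similarity."""
    return max(types, key=lambda t: cosine(show_vector, types[t]))

# Hypothetical 3-term space: [goal, league, detective]
training = [([1, 1, 0], "Sport"), ([2, 1, 0], "Sport"), ([0, 0, 3], "Movie"), ([0, 1, 2], "Movie")]
types = program_type_vectors(training)  # Sport: [3.0, 2.0, 0.0], Movie: [0.0, 1.0, 5.0]
print(classify([1, 0, 0], types))  # Sport
print(classify([0, 0, 1], types))  # Movie
```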
VSM for TV shows classification

                • Step 1: build a vector space
                      representation of the TV-shows
                     •      For each TV show we collected a set of words
                            from the synopsis and the title of the show
                     •      We filtered the words through a
                            fixed list of 996 stopwords for the
                            German language
                     •      We calculated the TF-IDF score of each term
                            in each document

VSM for TV shows classification

                • Step 2: build a vector for each
                      program type
                     • Given the vector space representation of
                            each document
                     • The vector space representation of each
                            program type is the sum of the
                            vector space representations of each tv-
                            show that belongs to that program type

VSM for TV shows classification

                •     Given a set of TV shows

                     •      $T = (s_1, \dots, s_n)$
                •     Given a set of program types

                     •      $P = (t_1, \dots, t_m)$
                •     We define a function $pt: T \to P$
                     •      It returns the program type of a TV show
                •     We can build the set $S(t_i)$ as the set of the TV shows that belong to $t_i$
                     •      $S(t_i) = \{ s \in T : pt(s) = t_i \}$

VSM for TV shows classification

                • Given the set
                      S(t_i) with a
                      cardinality of k,
                      the vector space
                      representation of
                      the program
                      type is simply
                      given by
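Reading Step 2 literally, i.e. as a plain sum of the vectors of the k shows in S(t_i), the representation of program type t_i can be written as below. This is a reconstruction under that assumption; dividing by k (a centroid) would leave every cosine-based comparison unchanged.

```latex
\vec{t_i} \;=\; \sum_{j=1}^{k} \vec{s_j},
\qquad
S(t_i) = \{ s_1, \dots, s_k \} = \{ s \in T : \mathit{pt}(s) = t_i \}
```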

VSM for TV shows classification

                • Step 3 and Step 4
                • Given the vector space representation of both
                      program types and tv shows
                     •      Use cosine similarity to compare each TV
                            show against the set of program types
                     •      We assigned the TV show to the program type
                            that got the highest cosine similarity

RI for TV shows classification

                •     Steps
                     •      1) Build a vector space for the tv shows
                     •      2) Reduce the vector space through the
                            Random Indexing algorithm
                     •      3) Build a vector for each program type on the (reduced)
                            vector space
                     •      4) Use cosine similarity to compare tv shows and
                            program types
                     •      5) Assign the TV show to the program type that got the
                            highest cosine similarity


RI for TV shows retrieval

                •     Steps
                     •      1) Build a vector space for the tv shows
                     •      2) Reduce the vector space through the Random
                            Indexing algorithm
                     •      3) Build a positive vector for each program type on the
                            (reduced) vector space
                     •      4) Use cosine similarity to compare tv shows and
                            program types
                     •      5) Rank the tv shows and assign the first N to
                            the program type


RI+QN for TV shows retrieval

                •     Steps
                     •      1) Build a vector space for the tv shows
                     •      2) Reduce the vector space through the Random Indexing
                            algorithm
                     •      3) Build a positive vector for each program type on the
                            (reduced) vector space
                     •      4) Build a negative vector for each program type
                            on the (reduced) vector space
                     •      5) Use cosine similarity to compare tv shows with
                            both positive and negative program types vectors
                     •      6) Rank the tv shows and assign the first N to the program type


RI+QN for TV shows retrieval

                •     Given a set of TV shows

                     •      $T = (s_1, \dots, s_n)$
                •     Given a set of program types

                     •      $P = (t_1, \dots, t_m)$
                •     We define a function $pt: T \to P$
                     •      It returns the program type of a TV show
                •     We can build the set $S(t_i)$ as the set of the TV shows that belong to $t_i$
                     •      $S(t_i) = \{ s \in T : pt(s) = t_i \}$

RI+QN for TV shows retrieval
                •     Given the set S(t_i), with
                      cardinality k, and its
                      complement, with cardinality z,
                      the vector space
                      representation of the
                      program type is simply
                      given by
                •     The positive and negative
                      vectors are combined in
                      order to emphasize the
                      features that occur in the
                      positive vector and to penalize
                      those that occur in
                      the negative one


...summing up
                •     Classification task
                     •      Comparison of VSM and RI
                     •      We built a vector space
                     •      We applied RI to reduce the vector space
                     •      We tried to classify TV shows in the complete vector space and in the reduced
                            one, comparing the accuracy
                •     Retrieval task
                     •      Comparison of RI and RI+QN
                     •      We built a vector space
                     •      We applied RI to reduce the vector space
                     •      We built both positive and negative program type vectors and applied QN
                     •      We tried to retrieve TV shows and compared RI without negation with
                            RI with negation



part 4: experimental evaluation
                                  results, discussion, future work




dataset
                •     tv shows: 133,579
                •     program types: 17
                •     features (BOW): 306,006
                •     features (Tag.me): 74,599
                •     avg features (BOW): 42.11
                •     avg features (Tag.me): 9.21



experimental design
                • 10-fold cross-validation
                 • Dataset split into 10 partitions
                 • 9 partitions used to train the models, the
                            remaining one for testing

                 • Results averaged over all the
                            partitions
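A minimal sketch of the 10-fold protocol described above, in plain Python. The helper names (`k_fold_splits`, `cross_validate`) and the shuffling with a fixed seed are assumptions. The slides do not say how the partitions were formed.

```python
import random


def k_fold_splits(items, k=10, seed=0):
    """Yield (train, test) pairs for k-fold cross-validation: the items are
    shuffled, split into k partitions, and each partition is used once for
    testing while the remaining k-1 partitions are used for training."""
    items = list(items)
    random.Random(seed).shuffle(items)  # shuffling is an assumption
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]


def cross_validate(items, evaluate, k=10):
    """Average the score returned by `evaluate(train, test)` over the k folds."""
    scores = [evaluate(train, test) for train, test in k_fold_splits(items, k)]
    return sum(scores) / len(scores)
```

Here `evaluate` would train a model on `train` and return, for example, its precision on `test`. Every TV show is tested exactly once, so the averaged score covers the whole dataset.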

metrics

                • classification task
                 • precision =
                • retrieval task
                • precision @n =
                • precision @k% =
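The formulas on the metrics slide did not survive extraction. The sketch below uses the standard definitions, which are an assumption here. Classification precision is the share of test shows assigned the correct program type (single-label setting). P@n is the fraction of relevant shows among the top n results. P@k% is taken as P@n with n set to k% of the ranked list, rounded up. The function names are hypothetical.

```python
import math


def precision(predicted, actual):
    """Classification: share of test items whose predicted program type is correct."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(predicted)


def precision_at_n(ranking, relevant, n):
    """Retrieval: fraction of the top-n ranked items that are relevant."""
    return sum(1 for item in ranking[:n] if item in relevant) / n


def precision_at_k_percent(ranking, relevant, k):
    """Retrieval: precision over the top k% of the ranking (cut-off rounded up)."""
    n = max(1, math.ceil(len(ranking) * k / 100))
    return precision_at_n(ranking, relevant, n)
```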
tuning of parameters
                •     Random Indexing algorithm
                     •      Dimension of the vectors
                          •      Classification task: 500, 700
                          •      Retrieval task: 500, 1000, 1500, 2000
                     •      Minimum number of occurrences
                          •      Classification task: 2
                          •      Retrieval task: 1, 3
                     •      Training Cycles
                          •      Classification task: 1, 2
                          •      Retrieval task: 1
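To show where the three parameters above enter, here is a small Random Indexing sketch in Python with NumPy. It illustrates the technique under stated assumptions and is not the implementation used in the experiments:

* Documents (TV shows) act as contexts and receive sparse ternary random index vectors.
* The minimum number of occurrences is counted over the whole corpus.
* Additional training cycles are interpreted as reflective random indexing.

The number of non-zero entries per index vector (`nonzero`) and all function names are hypothetical.

```python
from collections import Counter

import numpy as np


def index_vector(dim, nonzero, rng):
    """Sparse ternary random index vector: `nonzero` entries set to +1 or -1."""
    v = np.zeros(dim)
    positions = rng.choice(dim, size=nonzero, replace=False)
    v[positions] = rng.choice([-1.0, 1.0], size=nonzero)
    return v


def normalize(v):
    """Scale `v` to unit length (zero vectors are returned unchanged)."""
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


def random_indexing(docs, dim=500, min_occurrences=2, cycles=1, nonzero=10, seed=0):
    """Build `dim`-dimensional term and document vectors from tokenized docs.

    In the first cycle a term vector is the sum of the random index vectors
    of the documents it occurs in. At the end of every cycle the document
    vectors are rebuilt from the term vectors; with cycles > 1 (reflective
    RI) the next cycle recomputes the term vectors from those rebuilt
    document vectors instead of the random index vectors.
    """
    if cycles < 1:
        raise ValueError("cycles must be at least 1")
    rng = np.random.default_rng(seed)
    # Drop terms occurring fewer than `min_occurrences` times in the corpus.
    counts = Counter(t for doc in docs for t in doc)
    vocab = {t for t, c in counts.items() if c >= min_occurrences}
    doc_vecs = [index_vector(dim, nonzero, rng) for _ in docs]
    for _ in range(cycles):
        term_vecs = {t: np.zeros(dim) for t in vocab}
        for doc, dvec in zip(docs, doc_vecs):
            for t in doc:
                if t in vocab:
                    term_vecs[t] += dvec
        # Document vector = normalized sum of the vectors of its kept terms.
        doc_vecs = [
            normalize(sum((term_vecs[t] for t in doc if t in vocab), np.zeros(dim)))
            for doc in docs
        ]
    return term_vecs, doc_vecs
```

The reduced space has `dim` components (500 to 2000 in the tuning above) regardless of how many TV shows are indexed. This is what makes the approach incremental: a new show only needs a new random index vector.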


classification task - results
                   vector size    min. occur.    cycles    tag.me    bow
                   500            2              1         37.38     42.91
                   700            2              1         40.28     47.76
                   500            2              2         44.61     54.32
                   700            2              2         45.33     54.33




classification task: comparison
                [bar chart; recoverable values, left to right: 42.9, 47.7, 54.3, 54.3, 68.7]




classification - results per program type




classification task - outcomes
                •     BOW better than Tag.me
                     •      Tag.me representation too poor

                     •      Difficult to learn a solid and effective model for text classification
                •     The dimension of the vector space and the second training cycle affect the
                      predictive accuracy
                •     RI does not outperform the baseline

                     •      Vector space reduced by over 99% (from 133,579 to 500 or 700 dimensions)
                     •      Too much information is lost

                     •      but
                          •       When the results are split by program type, Random Indexing obtains better
                                  results in 10 out of 17 program types
                          •       The reasons for this need to be investigated
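A quick check of the reduction reported above, taking 133,579 as the original dimensionality:

```latex
1 - \frac{500}{133\,579} \approx 0.9963 = 99.63\%, \qquad
1 - \frac{700}{133\,579} \approx 0.9948 = 99.48\%
```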



retrieval task - bow - p@n

                [chart; recoverable values: 82.6%, 66.3%]




retrieval task - bow - p@n


                [chart; recoverable values: 65.9%, 45.2%]



retrieval task - bow - p@n



                [chart; recoverable values: 58.1%, 36.5%]


retrieval task - bow - p@k%
                [chart; recoverable values: 86.0%, 58.1%]




retrieval task - bow - p@k%


                [chart; recoverable values: 55.4%, 35.4%]




retrieval task - tagme - p@n
                [chart; recoverable values: 61.9%, 47.9%]




retrieval task - tagme - p@n


                [chart; recoverable values: 53.7%, 40.9%]



retrieval task - tagme - p@n


                [chart; recoverable values: 51.6%, 39.0%]



retrieval task - tagme - p@k%
                [chart; recoverable values: 76.6%, 57.9%]




retrieval task - tagme - p@k%



                [chart; recoverable values: 49.6%, 35.4%]


retrieval task - overview
                [chart; recoverable values: 82.6%, 61.9%]




retrieval task - overview



                [chart; recoverable values: 65.0%, 53.0%]


retrieval task - overview




                [chart; recoverable values: 58.3%, 53.2%]


retrieval task - outcomes

                • BOW always better than Tag.me
                 • Between 5 and 20% difference
                • Parameters do not affect the accuracy
                • QN operator improves the retrieval
                      accuracy by almost 20%


conclusions & future work
                •     In scenarios where the recommender system has to deal with a continuous flow of
                      information, the VSM is not suitable

                     •      RI is able to effectively address typical VSM drawbacks
                               •      Classification task

                                     •     Even though its accuracy is lower, these preliminary results need to be
                                           further investigated, for example by testing the algorithm with
                                           different parameter values

                                     •     Is a loss in precision acceptable for an algorithm that provides a big
                                           improvement in scalability and efficiency?
                               •      Retrieval task

                                     •     QN improves the predictive accuracy of the model in the
                                           retrieval task

                                     •     QN is a novel operator: this is an important outcome with good
                                           scientific impact

Thanks for your
                                     attention.




                               Cataldo Musto, Ph.D. Student
                    cataldomusto@di.uniba.it - cataldo.musto@philips.com
                                       University of Bari “Aldo Moro” (Italy), SWAP Research Group
                                      Philips Research Center - Eindhoven (Netherlands) - HI&E Group
                                                                   14.07.11



Similar to Random Indexing and Quantum Negation for TV-Shows Retrieval and Classification

Maduf10 Mobile Tv Or Tv On Mobile An Jacobs En Dirk Bollen
Huawei STW 2018 public
Deep Video Object Tracking - Xavier Giro - UPC Barcelona 2019
The Opportunities and Challenges of Putting the Latest Computer Vision and De...
Mobile Visual Search: Object Re-Identification Against Large Repositories
Deep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN Barcelona
Deep Learning @ ZHAW Datalab (with Mark Cieliebak & Yves Pauchard)
Multimedia Information Retrieval: Bytes and pixels meet the challenges of hum...
Break out: Project Communication and Dissemination - Fabian Di Fiore
The Concurrent Constraint Programming Research Programmes -- Redux
T bc(김은희)
histoGraph: a case study in Digital Humanities
Visual Object Tracking: review


More from Cataldo Musto

MyrrorBot: a Digital Assistant Based on Holistic User Models for Personalize...
Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation
Intelligenza Artificiale e Social Media - Monitoraggio della Farnesina e La M...
Exploring the Effects of Natural Language Justifications in Food Recommender ...
Exploiting Distributional Semantics Models for Natural Language Context-aware...
Towards a Knowledge-aware Food Recommender System Exploiting Holistic User Mo...
Towards Queryable User Profiles: Introducing Conversational Agents in a Platf...
Hybrid Semantics aware Recommendations Exploiting Knowledge Graph Embeddings
Natural Language Justifications for Recommender Systems Exploiting Text Summa...
L'IA per l'Empowerment del Cittadino: Hate Map, Myrror, PA Risponde
Explanation Strategies - Advances in Content-based Recommender System
Justifying Recommendations through Aspect-based Sentiment Analysis of Users R...
ExpLOD: un framework per la generazione di spiegazioni per recommender system...
Myrror: una piattaforma per Holistic User Modeling e Quantified Self
Semantic Holistic User Modeling for Personalized Access to Digital Content an...
Holistic User Modeling for Personalized Services in Smart Cities
A Framework for Holistic User Modeling Merging Heterogeneous Digital Footprints
eHealth, mHealth in Otorinolaringoiatria: innovazioni dirompenti o disastrose?
Semantics-aware Recommender Systems Exploiting Linked Open Data and Graph-bas...
Il Linguaggio dell'Odio sui Social Network


Random Indexing and Quantum Negation for TV-Shows Retrieval and Classification

  • 1. High Tech Campus, Philips Research Eindhoven, Netherlands Random Indexing and Quantum Negation for TV-Shows Retrieval and Classification Cataldo Musto, Ph.D. Student cataldomusto@di.uniba.it - cataldo.musto@philips.com University of Bari “Aldo Moro” (Italy), SWAP Research Group Philips Research Center - Eindhoven (Netherlands) - HI&E Group 14.07.11
  • 2. outline • part 1: introduction • information overload, personalization, information filtering, recommender systems • part 2: approaches • vector space model, random indexing, quantum negation • part 3: scenario • tv-show recommendation, description of the data, description of the tasks • part 4: experimental evaluation • results, discussion, future work C.Musto: Random Indexing and Quantum Negation for TV shows Retrieval and Classification - Philips Research , Eindhoven (The Netherlands) - 14.07.11
  • 3. part 1: introduction what are we talking about? C.Musto: Random Indexing and Quantum Negation for TV shows Retrieval and Classification - Philips Research , Eindhoven (The Netherlands) - 14.07.11
  • 4. TV C.Musto: Random Indexing and Quantum Negation for TV shows Retrieval and Classification - Philips Research , Eindhoven (The Netherlands) - 14.07.11
  • 5. text messages C.Musto: Random Indexing and Quantum Negation for TV shows Retrieval and Classification - Philips Research , Eindhoven (The Netherlands) - 14.07.11
  • 6. phone calls C.Musto: Random Indexing and Quantum Negation for TV shows Retrieval and Classification - Philips Research , Eindhoven (The Netherlands) - 14.07.11
  • 7. internet navigation C.Musto: Random Indexing and Quantum Negation for TV shows Retrieval and Classification - Philips Research , Eindhoven (The Netherlands) - 14.07.11
  • 8. scenario • Daily interaction with electronic devices • eMail, Web navigation, Social media, instant messaging • Continuous flow of information • in 2007, 500.000 terabyte of information have been produced on the Web in one year • By including also telephone, radio, TV and so on we reach 18 exabytes of data! C.Musto: Random Indexing and Quantum Negation for TV shows Retrieval and Classification - Philips Research , Eindhoven (The Netherlands) - 14.07.11
  • 9. information overload • Consequences: cognitive overload • It is impossible to effectively deal with this surplus of information • It is difficult to quickly find the information we really need C.Musto: Random Indexing and Quantum Negation for TV shows Retrieval and Classification - Philips Research , Eindhoven (The Netherlands) - 14.07.11
  • 11. information filtering ” An information filtering system is a system that removes redundant of unwanted information from an information stream using automated methods ” Wikipedia. C.Musto: Random Indexing and Quantum Negation for TV shows Retrieval and Classification - Philips Research , Eindhoven (The Netherlands) - 14.07.11
  • 12. information filtering systems • How do they work? • Usually, in three steps • Training Step • User Modeling • Filtering C.Musto: Random Indexing and Quantum Negation for TV shows Retrieval and Classification - Philips Research , Eindhoven (The Netherlands) - 14.07.11
  • 13. Step 1: Training C.Musto: Random Indexing and Quantum Negation for TV shows Retrieval and Classification - Philips Research , Eindhoven (The Netherlands) - 14.07.11
  • 14. Step 2: User Modeling C.Musto: Random Indexing and Quantum Negation for TV shows Retrieval and Classification - Philips Research , Eindhoven (The Netherlands) - 14.07.11
  • 15. Step 3: Filtering C.Musto: Random Indexing and Quantum Negation for TV shows Retrieval and Classification - Philips Research , Eindhoven (The Netherlands) - 14.07.11
  • 16. recommender systems • A specific type of Information Filtering system that attempts to recommend information items (films, television, video on demand, music, books,  etc) that are likely to be of interest to the user • Everyday we interact with recommender systems, even if we do not know it! C.Musto: Random Indexing and Quantum Negation for TV shows Retrieval and Classification - Philips Research , Eindhoven (The Netherlands) - 14.07.11
  • 17. Amazon C.Musto: Random Indexing and Quantum Negation for TV shows Retrieval and Classification - Philips Research , Eindhoven (The Netherlands) - 14.07.11
  • 18. YouTube C.Musto: Random Indexing and Quantum Negation for TV shows Retrieval and Classification - Philips Research , Eindhoven (The Netherlands) - 14.07.11
  • 19. recommendation approaches • Content-based filtering • No interactions between users. Each user is an atomic entity • Prerequisite: each item to be recommended has to be described through a set of textual features • We store in a user profile the features that often occur in the items she like • Assumption: if a user usually likes items in whose description often occurs a specific feature we can assume that he will like that items also in the future • e.g. • If User_A likes a news with the features “Football” and “Internazionale FC” inside • We can recommend her other news about both Football or Internazionale FC C.Musto: Random Indexing and Quantum Negation for TV shows Retrieval and Classification - Philips Research , Eindhoven (The Netherlands) - 14.07.11
  • 20. part 2: approaches vector space model, random indexing,quantum negation C.Musto: Random Indexing and Quantum Negation for TV shows Retrieval and Classification - Philips Research , Eindhoven (The Netherlands) - 14.07.11
  • 21. vector space model • Introduced by Salton in 1975 • Given a set of M documents (items) d = (d1.....dM) • Given N features describing the documents • Each document (item) is represented in a an N- dimensional vector space • The whole corpus is represented in a N*M matrix called term/document matrix C.Musto: Random Indexing and Quantum Negation for TV shows Retrieval and Classification - Philips Research , Eindhoven (The Netherlands) - 14.07.11
  • 22. vector space model • VSM in a recommendation scenario • Document: a point in the vector space • User profile: a point in the vector space • e.g. built as the sum of the vector space representations of the documents the user liked in the past • Goal: to find the documents that are most relevant to that user profile • Assumption • the documents most similar to the profile in the vector space are the most relevant ones • Cosine similarity is used to compute the similarity between the profile (the query) and the documents
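Here is a minimal sketch of the VSM recommendation loop described above. It assumes vectors are dense Python lists and that the profile is the plain sum of the liked documents' vectors. The helper names and toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_profile(liked_vectors):
    """User profile = element-wise sum of the vectors of the liked documents."""
    return [sum(column) for column in zip(*liked_vectors)]


def recommend(profile, candidates, top_n):
    """Return the names of the top_n candidates most similar to the profile."""
    ranked = sorted(candidates, key=lambda name: cosine(profile, candidates[name]), reverse=True)
    return ranked[:top_n]


# Hypothetical toy vectors over the terms [football, cooking, show]
liked = [[2, 0, 1], [1, 0, 0]]
candidates = {
    "match_report": [1, 0, 0],
    "cooking_show": [0, 1, 1],
    "sports_show": [1, 0, 1],
}
profile = build_profile(liked)  # [3, 0, 1]
print(recommend(profile, candidates, top_n=2))  # ['match_report', 'sports_show']
```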
  • 23. vsm analysis (2) • Weak points • Not incremental • The whole vector space has to be regenerated from scratch whenever a new item is added to the repository • High dimensionality • Only partially handled through NLP operations (stopword elimination, stemming, and so on) • Does not manage negative evidence • The vector space representation depends only on the features that occur in the document; no assumptions are made about the features that do not occur • Does not manage the latent semantics of documents • Any permutation of the terms in a document has the same VSM representation!
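To make the last point concrete: a bag-of-words representation discards word order, so two sentences with opposite meanings can map to the same vector. The sketch assumes simple lowercasing and whitespace tokenization.

```python
from collections import Counter


def bag_of_words(text):
    """Term-frequency representation of a text; word order is discarded."""
    return Counter(text.lower().split())


a = bag_of_words("The dog bites the man")
b = bag_of_words("The man bites the dog")
print(a == b)  # True: both sentences map to the same vector
```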
  • 24. idea • To introduce tools and techniques able to overcome these drawbacks • Random Indexing • Dimensionality reduction technique (Sahlgren, 2005) • Quantum Negation • Based on quantum logic (Widdows, 2007)
  • 25. random indexing • Random Indexing (RI) is an incremental and effective technique for dimensionality reduction • Distributional models • Assumption: we can infer information about terms by analyzing how they are used in a large corpus of data • Based on the so-called “Distributional Hypothesis” • “Words that occur in the same context tend to have similar meanings” • “Meaning is its use” (Wittgenstein)
  • 26. how does it work? Random Indexing reduces the original high-dimensional term/document matrix to a new, lower-dimensional matrix
  • 27. how does it work? • How? • By multiplying the original matrix by a random one, built in an incremental way • formally: $A_{n,m} \cdot R_{m,k} = B_{n,k}$, with $k \ll m$ • After the projection, the distances between points in the vector space are approximately preserved • Johnson-Lindenstrauss lemma
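The projection of an n×m matrix A onto k << m dimensions through a random m×k matrix R can be illustrated with a small pure-Python experiment. This sketch builds R in one go rather than incrementally. Its entries are sparse ternary values scaled by sqrt(3/k), an Achlioptas-style construction chosen as an assumption so that squared distances are preserved in expectation. The sizes are arbitrary toy values.

```python
import math
import random


def random_projection_matrix(m, k, rng):
    """Sparse ternary m x k matrix (Achlioptas-style): entries +s, 0, -s with s = sqrt(3/k)."""
    s = math.sqrt(3.0 / k)
    choices = (s, 0.0, 0.0, 0.0, 0.0, -s)  # P(+s) = 1/6, P(0) = 2/3, P(-s) = 1/6
    return [[rng.choice(choices) for _ in range(k)] for _ in range(m)]


def project(A, R):
    """Compute B = A R for A (n x m) and R (m x k), skipping zero entries of A."""
    k = len(R[0])
    B = []
    for row in A:
        out = [0.0] * k
        for i, a in enumerate(row):
            if a:
                r = R[i]
                for c in range(k):
                    out[c] += a * r[c]
        B.append(out)
    return B


def distance(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


rng = random.Random(42)
n, m, k = 4, 1000, 300  # k << m
A = [[rng.random() if rng.random() < 0.05 else 0.0 for _ in range(m)] for _ in range(n)]
R = random_projection_matrix(m, k, rng)
B = project(A, R)
for i in range(n):
    for j in range(i + 1, n):
        print(f"d(A{i}, A{j}) = {distance(A[i], A[j]):.2f}   d(B{i}, B{j}) = {distance(B[i], B[j]):.2f}")
```

The pairwise distances after projection stay close to the original ones, as the Johnson-Lindenstrauss lemma predicts. The error shrinks as k grows.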
  • 28. random matrix • How is the random matrix built? • The whole process is based on the concept of “context” • Given a term, its “context” could be the whole document, a paragraph, a sentence, a sliding window of words, and so on • The definition of the context influences the structure of the matrix • The matrix is built in an iterative and incremental way • The vector representing each document depends on the terms that occur in it • The vector representing each term depends on its contexts
  • 29. item representation • A context vector is assigned to each context (for simplicity, we take the whole document as the context) • This vector has a fixed dimension (k) and can contain only the values -1, 0 and 1; the values are randomly distributed, but the non-zero elements are only a small fraction of the k entries • The vector space representation of a term is obtained by summing the context vectors of all its contexts (the documents it occurs in) • The vector space representation of a document (item) is obtained by summing the vectors of the terms that occur in it • Output: a lower-dimensional vector space representation based on random context vectors
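The three steps above (random context vectors, term vectors, document vectors) can be sketched as follows. The sketch takes the whole document as the context, as on the slide, and uses hypothetical values of k = 500 and 10 non-zero entries per context vector. It is not the implementation used in the experiments.

```python
import random
from collections import defaultdict


def context_vector(k, nonzero, rng):
    """Random context vector of dimension k: `nonzero` entries are +1 or -1, the rest 0."""
    vec = [0] * k
    for pos in rng.sample(range(k), nonzero):
        vec[pos] = rng.choice((-1, 1))
    return vec


def random_indexing(docs, k=500, nonzero=10, seed=0):
    """Return (term_vectors, doc_vectors), using each whole document as the context."""
    rng = random.Random(seed)
    contexts = [context_vector(k, nonzero, rng) for _ in docs]

    # A term's vector is the sum of the context vectors of the documents it occurs in.
    term_vectors = defaultdict(lambda: [0] * k)
    for doc, ctx in zip(docs, contexts):
        for term in set(doc.split()):
            tv = term_vectors[term]
            for i, x in enumerate(ctx):
                tv[i] += x

    # A document's vector is the sum of the vectors of the terms it contains.
    doc_vectors = []
    for doc in docs:
        dv = [0] * k
        for term in set(doc.split()):
            for i, x in enumerate(term_vectors[term]):
                dv[i] += x
        doc_vectors.append(dv)
    return dict(term_vectors), doc_vectors


docs = ["football match report", "football transfer news", "cooking show recipes"]
terms, doc_vecs = random_indexing(docs, k=500, nonzero=10)
print(len(terms), len(doc_vecs), len(doc_vecs[0]))  # 8 3 500
```

New documents only require drawing one more context vector and updating the affected sums. This is why RI is incremental, unlike rebuilding a full VSM.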
  • 30. quantum negation • Random Indexing is still not able to manage negative evidence • RI can be coupled with the Quantum Negation (QN) operator • Definition inherited from quantum logic • Negation as a form of orthogonality between vectors • Given two vectors A and B, we can define the vector A NOT B • It represents the projection of vector A onto the subspace orthogonal to the one generated by vector B • In a recommendation scenario, this operator could be used to combine two vectors: the first one representing positive evidence and the second one representing negative evidence
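For a single vector B, the negation has a closed form: a NOT b = a - ((a·b) / (b·b)) b, that is, a minus its projection onto b. Negating a subspace spanned by several vectors additionally requires an orthonormal basis of that subspace. The sketch below covers only the one-vector case, with made-up positive and negative vectors.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def negate(a, b):
    """Quantum negation 'a NOT b': project a onto the subspace orthogonal to b."""
    bb = dot(b, b)
    if bb == 0:
        return list(a)  # b is the zero vector: nothing to remove
    coeff = dot(a, b) / bb
    return [x - coeff * y for x, y in zip(a, b)]


positive = [3.0, 1.0, 2.0]  # e.g. features of liked shows (toy values)
negative = [0.0, 1.0, 1.0]  # e.g. features of disliked shows (toy values)
result = negate(positive, negative)
print(result)                 # [3.0, -0.5, 0.5]
print(dot(result, negative))  # 0.0
```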
  • 31. ...summing up • VSM is an effective model for document retrieval • It can be exploited in recommendation scenarios • It suffers from some well-known drawbacks • Solutions • Random Indexing is an incremental and effective approach that can tackle the high-dimensionality problem • Quantum Negation can effectively model negative evidence • The combined use of RI and QN is a good alternative to VSM, especially in real-life scenarios
  • 32. part 3: scenario tv-shows recommendation
  • 33. Scenario: EPG (Electronic Program Guides) personalization
  • 34. scenario • Given a set of TV shows, we want to provide the user with a set of suggestions about the shows she should watch, according to her preferences
  • 35. approach Currently the recommendation model is implemented through the Vector Space Model (VSM)
  • 36. data • TV shows gathered from a set of 47 German-language broadcast channels • Each TV show is described through a set of textual features (title, synopsis, description, etc.) gathered from an XML feed • Each TV show is mapped to a fixed program type (Movie, Sport, Documentary, Magazine, etc.)
  • 37. problems • How should the data be represented? • We compared two approaches • Bag of Words (BOW) • Tag.me • What are the typical use cases? • We identified two tasks • Classification task • Retrieval task
  • 38. data representation • Bag of Words • Each item i is described through the words that appear in its text • Weighting of the words • Occurrence counting, normalization, TF-IDF weighting, etc.
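One common TF-IDF variant, weight = tf × log(N / df), can be sketched as below. The exact weighting used in the experiments is not specified here, so this formula, the tokenization, and the toy documents are assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """One {term: weight} dict per document, with weight = tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))  # document frequency
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term frequency within the document
        weights.append({term: tf[term] * math.log(n_docs / df[term]) for term in tf})
    return weights


docs = ["football league football", "league table", "cooking recipes"]
weights = tf_idf(docs)
print(weights[0])  # 'football' outweighs 'league', which occurs in two documents
```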
  • 39. BOW representation • Improving the BOW representation • Textual descriptions are usually very noisy • Full of uninformative words • Further processing can improve the classical BOW representation • Stopword removal: filtering out all the uninformative words (articles, adverbs, adjectives and so on)
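Stopword removal reduces to a set lookup. The sketch assumes lowercasing and whitespace tokenization, and ignores punctuation. The eight-word German stopword set is a tiny hypothetical stand-in for a full list (the experiments used 996 German stopwords).

```python
# Tiny hypothetical subset of a German stopword list
STOPWORDS = {"der", "die", "das", "und", "ein", "eine", "mit", "im"}


def remove_stopwords(text, stopwords=STOPWORDS):
    """Lowercase, split on whitespace, and drop every token found in the stopword set."""
    return [word for word in text.lower().split() if word not in stopwords]


tokens = remove_stopwords("Der Krimi mit Tatort Kommissar und eine Leiche im Hafen")
print(tokens)  # ['krimi', 'tatort', 'kommissar', 'leiche', 'hafen']
```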
  • 40. data representation • Tag.me • Online tool developed by the University of Pisa (Italy) • Goal: to identify the Wikipedia concepts that occur in a text • Idea: to process the original text through Tag.me in order to reduce noise and provide a novel representation based on high-level Wikipedia concepts
  • 41. tag.me web interface
  • 42. final output: BOW vs. Tag.me
  • 43. description of the tasks • task 1: classification • Given a flow of TV shows, we want to classify them into the set of program types • task 2: retrieval • Given a set of program types and a repository of TV shows, we want to retrieve the shows that belong to a specific program type
  • 44. VSM for TV shows classification • Steps • 1) Build a vector space for the tv shows • 2) Build a vector for each program type • 3) Use cosine similarity to compare tv shows and program types • 4) Assign the TV show to the program type that got the highest cosine similarity
  • 45. VSM for TV shows classification • Step 1: build a vector space representation of the TV shows • For each TV show, we collected a set of words from its title and synopsis • We filtered the words through a fixed list of 996 German stopwords • We calculated the TF-IDF weight of each word in each document
  • 46. VSM for TV shows classification • Step 2: build a vector for each program type • Given the vector space representation of each document • The vector space representation of each program type is the sum of the vector space representations of the TV shows that belong to that program type
  • 47. VSM for TV shows classification • Given a set of TV shows $T = \{s_1, \dots, s_n\}$ • Given a set of program types $P = \{t_1, \dots, t_m\}$ • We define a function $pt: T \to P$ • It returns the program type of a TV show • We can build the set $S(t_i) = \{ s \in T : pt(s) = t_i \}$ of the TV shows that belong to $t_i$
  • 48. VSM for TV shows classification • Given the set $S(t_i)$ with cardinality $k$, the vector space representation of the program type is simply the sum of the vectors of its TV shows: $\vec{t}_i = \sum_{s \in S(t_i)} \vec{s}$
  • 49. VSM for TV shows classification • Steps 3 and 4 • Given the vector space representations of both program types and TV shows • We used cosine similarity to compare each TV show against the set of program types • We assigned the TV show to the program type that got the highest cosine similarity
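Steps 1 to 4 of the VSM classifier amount to a nearest-centroid rule. The following is a minimal illustration under assumptions. Vectors are plain Python lists, the toy terms and program types are hypothetical, and each program-type vector is the plain sum of the vectors of its shows.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def program_type_vectors(shows, labels):
    """Vector of each program type t_i = sum of the vectors of the shows in S(t_i)."""
    sums = {}
    for vec, label in zip(shows, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
    return sums


def classify(show, type_vectors):
    """Assign the show to the program type with the highest cosine similarity."""
    return max(type_vectors, key=lambda t: cosine(show, type_vectors[t]))


# Hypothetical toy data over the terms [goal, match, murder, detective]
train = [[2, 1, 0, 0], [1, 2, 0, 0], [0, 0, 2, 1], [0, 0, 1, 2]]
labels = ["Sport", "Sport", "Movie", "Movie"]
types = program_type_vectors(train, labels)
print(classify([1, 1, 0, 1], types))  # Sport
```

Cosine similarity ignores vector length. Using the sum or the average of the shows' vectors as the program-type vector therefore leads to the same assignment.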
  • 50. RI for TV shows classification • Steps • 1) Build a vector space for the tv shows • 2) Reduce the vector space through the Random Indexing algorithm • 3) Build a vector for each program type on the (reduced) vector space • 4) Use cosine similarity to compare tv shows and program types • 5) Assign the TV show to the program type that got the highest cosine similarity
  • 51. RI for TV shows retrieval • Steps • 1) Build a vector space for the tv shows • 2) Reduce the vector space through the Random Indexing algorithm • 3) Build a positive vector for each program type on the (reduced) vector space • 4) Use cosine similarity to compare tv shows and program types • 5) Rank the tv shows and assign the first N to the program type
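Step 5 of the retrieval pipeline (rank, then keep the first N) amounts to sorting by cosine similarity. The sketch below uses hypothetical toy vectors and assumes the positive program-type vector has already been built in the (reduced) space.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve(type_vector, shows, top_n):
    """Rank the shows by cosine similarity to the program-type vector and keep the first N."""
    ranked = sorted(shows, key=lambda s: cosine(shows[s], type_vector), reverse=True)
    return ranked[:top_n]


# Hypothetical positive vector of a "Sport" type over the terms [goal, match, murder, detective]
sport = [3.0, 3.0, 0.0, 0.0]
shows = {"a": [1, 1, 0, 0], "b": [0, 0, 1, 1], "c": [1, 0, 1, 0]}
print(retrieve(sport, shows, top_n=2))  # ['a', 'c']
```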
  • 52. RI+QN for TV shows retrieval • Steps • 1) Build a vector space for the TV shows • 2) Reduce the vector space through the Random Indexing algorithm • 3) Build a positive vector for each program type on the (reduced) vector space • 4) Build a negative vector for each program type on the (reduced) vector space • 5) Use cosine similarity to compare TV shows with both the positive and the negative program-type vectors • 6) Rank the TV shows and assign the first N to the program type
  • 53. RI+QN for TV shows retrieval • Given a set of TV shows $T = \{s_1, \dots, s_n\}$ • Given a set of program types $P = \{t_1, \dots, t_m\}$ • We define a function $pt: T \to P$ • It returns the program type of a TV show • We can build the set $S(t_i) = \{ s \in T : pt(s) = t_i \}$ of the TV shows that belong to $t_i$
  • 54. RI+QN for TV shows retrieval • Given the set $S(t_i)$, with cardinality $k$, and its complement $\overline{S(t_i)} = T \setminus S(t_i)$, with cardinality $z$, the positive and negative vector space representations of the program type are given by $\vec{t}_i^{+} = \sum_{s \in S(t_i)} \vec{s}$ and $\vec{t}_i^{-} = \sum_{s \in \overline{S(t_i)}} \vec{s}$ • The positive and negative vectors are combined through the QN operator ($\vec{t}_i^{+}$ NOT $\vec{t}_i^{-}$) in order to emphasize the features that occur in the positive vector and suppress the ones that occur in the negative one
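Combining the positive and negative program-type vectors with QN and ranking by cosine can be sketched as follows. The sketch makes three assumptions:

* QN takes its single-vector form (positive minus its projection onto negative).
* The positive and negative vectors are plain sums.
* The toy data are hypothetical, built so that the term "studio" also occurs in non-sport shows.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def vector_sum(vectors, dim):
    total = [0.0] * dim
    for vec in vectors:
        for i, x in enumerate(vec):
            total[i] += x
    return total


def negate(a, b):
    """Quantum negation 'a NOT b': the component of a orthogonal to b."""
    bb = sum(x * x for x in b)
    if bb == 0:
        return list(a)
    coeff = sum(x * y for x, y in zip(a, b)) / bb
    return [x - coeff * y for x, y in zip(a, b)]


def retrieve_with_negation(train, labels, target, candidates, top_n):
    dim = len(train[0])
    positive = vector_sum([v for v, lab in zip(train, labels) if lab == target], dim)
    negative = vector_sum([v for v, lab in zip(train, labels) if lab != target], dim)
    query = negate(positive, negative)  # positive NOT negative
    ranked = sorted(candidates, key=lambda c: cosine(candidates[c], query), reverse=True)
    return ranked[:top_n]


# Hypothetical toy data over the terms [goal, match, murder, studio]
train = [[1, 0, 0, 1], [0, 1, 0, 1], [0, 0, 1, 2], [0, 0, 1, 2]]
labels = ["Sport", "Sport", "Movie", "Movie"]
candidates = {"studio_talk": [0, 0, 0, 1], "goal_highlights": [1, 0, 0, 0]}
print(retrieve_with_negation(train, labels, "Sport", candidates, top_n=1))  # ['goal_highlights']
```

With the positive vector alone, `studio_talk` would rank first (cosine about 0.82 versus 0.41). Removing the component along the negative vector reverses the order.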
  • 55. ...summing up • Classification task • Comparison of VSM and RI • We built a vector space • We applied RI to reduce the vector space • We tried to classify TV shows in both the complete and the reduced vector space, comparing the accuracy • Retrieval task • Comparison of RI and RI+QN • We built a vector space • We applied RI to reduce the vector space • We built both positive and negative program-type vectors and applied QN • We tried to retrieve TV shows, comparing RI without negation and RI with negation
  • 56. part 4: experimental evaluation results, discussion, future work
  • 57. dataset • TV shows: 133,579 • program types: 17 • features: 306,006 (BOW), 74,599 (Tag.me) • average features per TV show: 42.11 (BOW), 9.21 (Tag.me)
  • 58. experimental design • 10-fold cross-validation • Dataset split into 10 partitions • 9 partitions used to train the models, the remaining one for testing • Results averaged over all the partitions
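A 10-fold split can be sketched as below. Shuffling with a fixed seed and assigning items to folds round-robin are assumptions. The evaluation step is left to a hypothetical `evaluate(train, test)` function whose scores would be averaged.

```python
import random


def k_fold_splits(items, k=10, seed=0):
    """Yield (train, test) pairs: each fold is the test set once, the other k-1 folds train."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # round-robin assignment to k folds
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]


shows = list(range(100))  # stand-ins for TV show ids
splits = list(k_fold_splits(shows, k=10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 90 10
# scores = [evaluate(train, test) for train, test in splits]  # hypothetical evaluate()
# mean_score = sum(scores) / len(scores)
```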
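  • 59. metrics • classification task • $\text{precision} = \frac{|\text{TV shows assigned to the correct program type}|}{|\text{classified TV shows}|}$ • retrieval task • $\text{precision@}n = \frac{|\text{shows of the target program type among the first } n \text{ retrieved}|}{n}$ • $\text{precision@}k\% = \frac{|\text{shows of the target program type in the first } k\% \text{ of the ranking}|}{|\text{shows in the first } k\% \text{ of the ranking}|}$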
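These metrics can be written as short Python functions. Precision and precision@n follow their standard definitions. Reading precision@k% as precision over the first k% of the ranked list is an assumption, as are the function names and toy data.

```python
def precision(predicted, actual):
    """Share of classified shows whose predicted program type is correct."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(predicted)


def precision_at_n(ranked, relevant, n):
    """Share of the first n retrieved shows that belong to the target program type."""
    return sum(1 for show in ranked[:n] if show in relevant) / n


def precision_at_percent(ranked, relevant, percent):
    """Precision over the first `percent` % of the ranked list (assumed reading of p@k%)."""
    n = max(1, round(len(ranked) * percent / 100))
    return precision_at_n(ranked, relevant, n)


print(precision(["Sport", "Movie", "News"], ["Sport", "Movie", "Sport"]))  # 2 of 3 correct
ranked = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
relevant = {"a", "c", "d", "h"}
print(precision_at_n(ranked, relevant, 5))         # 0.6
print(precision_at_percent(ranked, relevant, 20))  # 0.5
```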
  • 60. tuning of parameters • Random Indexing algorithm • Dimension of the vectors • Classification task: 500, 700 • Retrieval task: 500, 1000, 1500, 2000 • Minimum number of occurrences • Classification task: 2 • Retrieval task: 1, 3 • Training Cycles • Classification task: 1, 2 • Retrieval task: 1
  • 61. classification task - results (precision, %) • vector size 500, min. occurrences 2, training cycles 1: Tag.me 37.38, BOW 42.91 • vector size 700, min. occurrences 2, training cycles 1: Tag.me 40.28, BOW 47.76 • vector size 500, min. occurrences 2, training cycles 1: Tag.me 44.61, BOW 54.32 • vector size 700, min. occurrences 2, training cycles 1: Tag.me 45.33, BOW 54.33
  • 62. classification task: comparison (chart; values shown: 68.7, 54.3, 54.3, 47.7, 42.9)
  • 63. classification - results per program type
  • 64. classification task - outcomes • BOW better than Tag.me • The Tag.me representation is too poor • It is difficult to learn a solid and effective text classification model from it • The dimension of the vector space and a second training cycle affect the predictive accuracy • RI does not overcome the baseline • Vector space reduced by over 99% (from 133,579 to 500 or 700 dimensions) • Too much information is lost • but • When the results are split by program type, Random Indexing got better results in 10 out of 17 program types • The reasons for this need to be investigated
  • 65. retrieval task - bow - p@n 82.6% 66.3%
  • 66. retrieval task - bow - p@n 65.9% 45.2%
  • 67. retrieval task - bow - p@n 58.1% 36.5%
  • 68. retrieval task - bow - p@k% 86.0% 58.1%
  • 69. retrieval task - bow - p@k% 55.4% 35.4%
  • 70. retrieval task - tagme - p@n 61.9% 47.9%
  • 71. retrieval task - tagme - p@n 53.7% 40.9%
  • 72. retrieval task - tagme - p@n 51.6% 39.0%
  • 73. retrieval task - tagme - p@k% 76.6% 57.9%
  • 74. retrieval task - tagme - p@k% 49.6% 35.4%
  • 75. retrieval task - overview 82.6% 61.9%
  • 76. retrieval task - overview 65.0% 53.0%
  • 77. retrieval task - overview 58.3% 53.2%
  • 78. retrieval task - outcomes • BOW always better than Tag.me • Between 5 and 20 percentage points of difference • The RI parameters do not affect the accuracy • The QN operator improves the retrieval accuracy by almost 20%
  • 79. conclusions & future work • In scenarios where the recommender system has to deal with a continuous flow of information, the VSM is not suitable • RI is able to effectively address typical VSM drawbacks • Classification task • Even though its accuracy is lower, these preliminary results need further investigation, for example by testing the algorithm with different parameter values • Is a loss in precision acceptable for an algorithm that provides a big improvement in scalability and efficiency? • Retrieval task • QN improves the predictive accuracy of the model in the retrieval task • QN is a novel operator, so this is an important outcome with good scientific impact
  • 80. Thanks for your attention. Cataldo Musto, Ph.D. Student cataldomusto@di.uniba.it - cataldo.musto@philips.com University of Bari “Aldo Moro” (Italy), SWAP Research Group Philips Research Center - Eindhoven (Netherlands) - HI&E Group 14.07.11
