SlideShare ist ein Scribd-Unternehmen logo
1 von 60
Kavita Ganesan, ChengXiang Zhai & Evelyne Viegas
Go to project page
Opinion Summary for iPod Touch




Current methods: Focus on
generating aspect based ratings
for an entity
[Lu et al., 2009; Lerman et al., 2009;..]
Opinion Summary for iPod Touch


To know more: read many
redundant sentences
Opinion Summary for iPod Touch


To know more: read many
redundant sentences




         Structured summaries useful, but
                   insufficient!
Textual opinion summaries:
       • big and clear screen
       • battery lasts long without wifi
       • great selection of apps
       ………………


Textual summaries can provide more
           information…
   Represent the major opinions
     Key complaints/praise in text
     Important critical information

   Readable/well formed
     Easily understood by readers

   Compact/Concise
     Can be viewed on all screen sizes
      ▪ e.g. phone, pda, tablets, dekstops
     Maximize information conveyed
   Generate a set of non-redundant phrases:
     Summarizing key opinions in text
     Short (2-5 words)            Micropinions
     Readable
           Micropinion summary for a restaurant:
                  “Good service”
                  “Delicious soup dishes”

   Ultra-concise nature of phrases to allow flexible
    adjustment of summaries according to display
    constraints
   Has been widely studied
    [Radev et al.2000; Erkan & Radev, 2004; Mihalcea & Tarau, 2004…]

   Fairly easy to implement

   Problem: Not suitable for concise summaries
     Bias – with limit on summary size
      ▪ selected sentence/phrase may have missed critical info
     Verbose - may contain irrelevant information
      ▪ not suitable for smaller devices
 Understand the original text and “re-tell” story in a
    fewer words  Hard to achieve!

 Some methods require manual effort
    [DeJong 1982; Radev & McKeown 1998; Finley & Harabagiu 2002]
     Need to define templates
     Later filled with info using IE techniques

   Some methods rely on deep NL understanding
    [Saggion & Lapalme 2002; Jing & McKeown 2000]
     Domain dependent
     Impractical – high computational costs
   Goal is to extract important phrases from text
     Traditionally used to characterize documents
     Can be potentially used to select key opinion phrases
     Closest to our goal!

   Problem:
     Only topic phrases may be selected
      ▪ E.g. battery life, screen size  candidate phrases
      ▪ Enough to characterize documents, but we need more info!
     Readability aspect is not much of a concern
      ▪ E.g. “Battery short life” == “Short battery life”
     Most methods are supervised – need training data
      [Tomokiyo & Hurst 2003; Witten et al. 1999; Medelyan & Witten 2008]
   Unsupervised, lightweight & general approach to
    generating ultra-concise summaries of opinions
   Idea is to use existing words in original text to
    compose meaningful summaries
   Emphasis on 3 aspects:
     Compactness
      ▪ summaries should use as few words as possible
     Representativeness
      ▪ summaries should reflect major opinions in text
     Readability
      ▪ summaries should be fairly well formed
k
M   arg max      mi...mk           Srep (mi)     Sread (mi)
                             i 1

                 subject to
                  k
                        mi         ss
                  i 1

               Srep (mi)           rep

              Sread (mi)           read

        sim( mi , mj )   sim       i, j   1, k
k
M     arg max        mi...mk           Srep (mi)     Sread (mi)
                               i 1
Micropinion Summary, M subject     to
2.3 very clean rooms
2.1 friendly service       k
1.8 dirty lobby and pool      mi       ss
1.3 nice and polite staff i 1

                   Srep (mi)           rep

                  Sread (mi)           read

            sim( mi , mj )   sim       i, j   1, k
k
  M      arg max        mi...mk           Srep (mi)     Sread (mi)
                                  i 1
  Micropinion Summary, M subject      to
m1 2.3 very clean rooms
m2 2.1 friendly service       k
m3 1.8 dirty lobby and pool      mi       ss
mk 1.3 nice and polite staff
                             i 1

                      Srep (mi)           rep

                     Sread (mi)           read

               sim( mi , mj )   sim       i, j   1, k
Representativeness score of mi
                             k
M   arg max      mi...mk           Srep (mi)          Sread (mi)
                             i 1

                 subject to Readability score of mi
                  k
                        mi         ss
                  i 1

               Srep (mi)           rep

              Sread (mi)           read

        sim( mi , mj )   sim       i, j      1, k
Objective function

                              k
M   arg max      mi...mk           Srep (mi)     Sread (mi)
                             i 1

                 subject to
                  k
                        mi         ss
                  i 1

               Srep (mi)           rep

              Sread (mi)           read

        sim( mi , mj )   sim       i, j   1, k
k
M   arg max      mi...mk           Srep (mi)       Sread (mi)
                             i 1
                                                 constraints
                 subject to
                  k
                        mi         ss
                  i 1

               Srep (mi)           rep

              Sread (mi)           read

        sim( mi , mj )   sim       i, j   1, k
k
M       arg max              mi...mk             Srep (mi)         Sread (mi)
                                           i 1

Objective function: Optimize
                              subject to m
                                                    1 2.3   very clean rooms
representativeness & k                            m2 2.1    friendly service
readability scores         mi                     m
                                                 ss 3 1.8   dirty lobby and pool
Ensure: summaries reflect key opinions &          mk 1.3    nice and polite staff
                               i 1
reasonably well formed
                        S (m ) rep     i         rep Objective function value

                       Sread (mi)                read

                 sim( mi , mj )   sim            i, j   1, k
k
M   arg max     mi...mk           Srep (mi) Sread (mi)
                                     Constraint 1: Maximum
                            i 1
                                         length of summary.
                 subject to              •User adjustable
                  k                      •Captures compactness.
                       mi         ss
                 i 1

               Srep (mi)          rep

             Sread (mi)           read

        sim(mi , mj )   sim       i, j     1, k
k
M   arg max     mi...mk         Srep (mi)     Sread (mi)
                            i 1Constraint 2 & 3: Min
                               representativeness &
                 subject      to
                               readability.
                  k            •Helps improve efficiency
                       mi      •Does not affect performance
                                ss
                 i 1

               Srep (mi)        rep

             Sread (mi)         read

        sim(mi , mj )   sim     i, j   1, k
k
M   arg max     mi...mk           Srep (mi)     Sread (mi)
                            i 1

                 subject to
                  k
                       mi          Constraint 4: Max
                                  ss
                 i 1               similarity between phrases
               Srep (mi)           • User adjustable
                                  rep
                                   • Captures compactness by
             Sread (mi)              minimizing redundancies
                                  read

        sim(mi , mj )   sim       i, j   1, k
Ensures representativeness &
readability                         k
  M     arg max        mi...mk            Srep (mi)     Sread (mi)
                                    i 1

                        subject to Capture compactness
                         k
                               mi         ss
                        i 1

                     Srep (mi)            rep

                    Sread (mi)            Capture compactness
                                          read

              sim( mi , mj )   sim        i, j   1, k
   Now, we need to know how to compute:
     Similarity scores: sim(mi, mj) ?
     Readability scores: Sread(mi, mj) ?
     Representativeness scores: Srep(mi, mj) ?
   Score similarity between 2 phrases
   User adjustable parameter
   Why important?
     Allows user to control redundancy
     E.g. With small devices users may desire summaries with
      good coverage of information less redundancies !

   Measure used:
     Standard Jaccard Similarity Measure
     Can use other measures (e.g. cosine) – not our focus
   Purpose: Measure how well a phrase
    represents opinions from the original text?

   2 properties of a highly representative phrase:
    1. Words should be strongly associated in text
    2. Words should be sufficiently frequent in text


   Captured by a modified pointwise mutual
    information (PMI) function
Captures property 1:
                                        Measures strength of
                                        association between
   Original PMI function,          pmi(wi,wj)
                                        words
                               p( wi, wj )
    pmi ( wi , wj )   log 2
                            p( wi ) p( wj )
   Original PMI function, pmi(wi,wj)
                                 p( wi, wj )
    pmi ( wi , wj )     log 2                 Captures property 2:
                              p( wi ) p( wj ) Rewards well
                                              associated words with
   Modified PMI function,                    high co-occurrences
                                        pmi’(wi,wj)
                                                        Add frequency of
                              p( wi, wj ) c( wi, wj )   occurrence within
    pmi ' ( wi , wj )    log2
                                 p( wi ) p( wj )        a window
To compute representativeness of a phrase:
                        n           Take average strength of
                    1
 Srep (w1.. wn)                     association (pmi’) of each
                              pmilocal ( wi )
                    n   i 1         word, wi in phrase with C
                                    neighboring words
                              i C
                       1
  pmilocal ( wi )   [                 pmi ' ( wi, wj )]
                      2C      j i C
Gives a good estimate of
        how strongly associated
To   compute representativeness of
        the words are in a phrase                            a phrase:
                           n
                       1
 Srep (w1.. wn)                  pmilocal ( wi )
                       n   i 1


                                 i C
                          1
     pmilocal ( wi )   [                 pmi ' ( wi, wj )]
                         2C      j i C
   Purpose: Measure well-formedness of phrases
     Phrases are constructed from seed words
      ▪ can have new phrases not in original text
   No guarantee phrases would be well-formed
                                      chain rule to compute
                                      joint probability in terms of
 Our readability scoring:            conditional probabilities
                                      (averaged)
   Based on Microsoft's Web N-gram model (publicly available)
     N-gram model used to obtain conditional probabilities of phrases
                                        n
                             1
      S read ( wk ... wn )     log 2         p ( wk |wk   q   1...   wk   )
                                                                          1
                             K         k q
     Intuition: A phrase is more readable if it occurs more frequently on
      the web
Example: Readability scores of phrases using tri-gram LM

         Ungrammatical                  Grammatical
  “sucks life battery”   -4.51   “battery life sucks”   -2.93

  “life battery is poor” -3.66   “battery life is poor” -2.37
   Scoring each candidate phrase is not practical!

   Why?
     Phrases composed using words from original text
     Potential solution space would be Huge!


   Our solution: greedy summarization algorithm
     Explore solution space with heuristic pruning
     Touch only the most promising candidates
Input          Unigrams           Seed Bigrams
                 ….
                 very              very +    nice
                 nice              very +    clean
                                   very +    dirty
                 place
                                   clean +   place Srep > σ
                 clean             clean +   room           rep
 Text to be      problem
                 dirty
                                   dirty +   place …
summarized
                 room …
              Step 1: Shortlist    Step 2: Form seed
              high freq unigrams   bigrams by pairing
              (count > median)     unigrams. Shortlist
                                   by Srep. (Srep > σrep)
Higher order n-grams                                            Summary

Candidates + Seed Bi-grams = New Candidates
                   clean rooms   =    very clean rooms           0.9   very clean rooms
  very clean   +   clean bed          very clean bed             0.8   friendly service
                   dirty room          very dirty room           0.7   dirty lobby and pool
  very dirty   +                 =                               0.5   nice and polite staff
                   dirty pool          very dirty pool
                                                                 …..
   very nice   +   nice place    =     very nice place           …..
                   nice room           very nice room
                                                                       Sorted Candidates
                                     Srep<σrep ; Sread<σread
Step 3: Generate higher order n-grams.                         Step 4: Final summary.
• Concatenate existing candidates + seed bigrams               Sort by objective
• Prune non-promising candidates (Srep & Sread)                function value. Add
• Eliminate redundancies (sim(mi,mj))                          phrases until |M|< σss
• Repeat process on shortlisted candidates
  (until no possbility of expansion)
   Product reviews from CNET (330 products)
     Each product has minimum of 5 reviews
                                   CNET REVIEW
                xxxxxx


                                                 PROS


        Content Summarized
                                                 CONS

                                                 FULL REVIEW
Given


          Textual reviews about a product


                                        Constraints
             Micropinion Summary, M
           m1 2.3 easy to use           summary size, σss
Task       m2 2.1 lense is not clear    redundancy, σsim
           m3 1.8 too big for pocket
           mk 1.3 expensive batteries   representativeness, σrep
                                        readability , σread
        argmax Srep(mi) + Sread(mi)
   Human composed summaries
     Two human summarizers
      ▪ Each summarize 165 product reviews (total 330)


     Top 10 phrases from pros & cons provided as hints
      ▪ Hints help with topic coverage  reduce bias


     Summarizers asked to compose a set of short
      phrases (2-5 words) summarizing key opinions on
      the product
   3 representative baselines:
     Tf-Idf – unsupervised method commonly used for key phrase
      extraction tasks
      ▪ Selected only adjective containing n-grams (performance reasons)
      ▪ Redundancy removal used to generate non-redundant phrases

     KEA – state of the art supervised key phrase extraction
      model [witten et al. 1999]
      ▪ Uses a Naive Bayes model
      ▪ Trained using 100 review documents withheld from dataset

     Opinosis – unsupervised abstractive summarizer designed to
      generate textual opinion summaries [ganesan et al. 2010]
      ▪ Shown to be effective in generating concise opinion summaries
      ▪ Designed for highly redundant text
   ROUGE - to determine quality of summary
    (ROUGE-1 & ROUGE-2)
     Standard measure for summarization tasks
     Measures precision & recall of overlapping units
     between computer generated summary & human
     summaries
   Questionnaire – to assess potential utility of generated
    summaries to users
     Answered by 2 assessors
     Original reviews provided as reference
                               Questionnaire
                        [1] Very Poor - [5] Very Good

      Grammaticality          Non-redundancy           Informativeness
          [DUC 2005]             [DUC 2005]              [Filippova 2010]


                                                              Do the phrases
            Are the phrases       Are the phrases in         convey important
              readable?             the summary             information about
                                       unique?                 the product?
   Our approach is referred to as WebNGram
WebNgram: Performs
                 0.09                            the best for this task
                 0.08
                 0.07                          Opinosis: much better
ROUGE-2 RECALL




                 0.06                          than KEA & tfidf
                 0.05                      KEA: slightly        KEA
                 0.04                      better than tfidf Worst
                                                         Tfidf: Tfidf
                 0.03                                    performance
                                                                Opinosis
                 0.02                                            WebNGram
                 0.01
                 0.00
                        5   10     15     20      25     30
                            Summary Size (max words)
   Intuition: Well-formed phrases tend to be
    longer in general
          very clear screen vs. very clear
          good battery life vs. good battery
          screen is bright vs. screen is

   A few longer phrases is more desirable than
    many fragmented (i.e. short) phrases
KEA: Generates most # of
                       12       phrases (i.e. favors short
                                                                 11.75
                                phrases)
                       11                                     WebNGram: Generates
                                                              fewest phrases on average.
                       10
Average # of Phrases




                                                       9.86   (each phrase is longer)
                       9                                                    WebNgram
                                                                 8.74
                       8                                                    Tfidf
                                                       7.36                 Opinosis
                       7
                                                                            Kea
                       6           6.04

                       5
                                  4.53
                       4    WebNGram phrases are generally more
                               15          well-formed 30
                                                 25
                                       Summary Size (max words)
   In our algorithm: n-grams are generated from
    seed words
     potential of forming new phrases not in original text


   Why not use existing n-grams?
     With redundant opinions, using seen n-grams may
      be sufficient
     Performed a run by forcing only seen n-grams to
      appear as candidate phrases.
0.09        System generated
                 0.08             n-grams
                 0.07
                            Seen n-grams
ROUGE-2 Recall




                 0.06
                 0.05
                 0.04                               WebNGram
                 0.03                               WebNGramseen
                 0.02
                 0.01
                 0.00
                        Our search 15 20 25 30 discover
                         5    10    algorithm helps
                                 Summary Size (max words)
                                 useful new phrases
Example:
Unseen N-Gram (Acer AL2216 Monitor)
“wide screen lcd monitor is bright”
readability : -1.88
representativeness: 4.25


         “…plus the monitor is very bright…”                 Related
         “…it is a wide screen, great color, great quality…” snippets in
         “…this lcd monitor is quite bright and clear…”      original text
   Purpose of σrep & σread:
     Control minimum representativeness & readability
     Helps prune non-promising candidates  improves
     efficiency of algorithm

   Without σrep & σread - we would still arrive at a
    solution, however:
     Time to convergence would be much longer
     Results could be skewed
     These parameters need to be set correctly!
ROUGE-2 scores at                              ROUGE-2 scores at
various σread settings                         various σrep settings
                              0.06                        0.040


                              0.04




                                                ROUGE-2
                                     ROUGE-2
                                                          0.035
                              0.02


                              0.00                        0.030
     -4     -3      -2
                 σread   -1                                       1   2     3
                                                                          σrep   4   5


             Performance is stable except in extreme
            conditions - thresholds are too restrictive
ROUGE-2 scores at                                ROUGE-2 scores at
various σread settings                           various σrep settings
                                0.06                        0.040


                                0.04




                                                  ROUGE-2
                                       ROUGE-2
                                                            0.035
                                0.02


                                0.00                        0.030
     -4     -3      -2
                 σread     -1                                       1   2     3
                                                                            σrep   4   5
                         Ideal setting:
                         σread : between -2 and -4
                         σrep : between 1 and 4
Questionnaire [Score 1-5]
                         Score < 3  poor
                         Score > 3  good


   Grammaticality      Non-redundancy            Informativeness
        [DUC 2005]            [DUC 2005]            [Filippova 2010]


WebNgram 4.2/5        WebNgram 3.9/5             WebNgram 3.2/5
TfIDF        2.0/5    TfIDF          2.3/5       TfIDF         1.7/5
Human        4.7/5    Human          4.5/5       Human         3.6/5
Questionnaire [Score 1-5]
                          Score < 3  poor
                          Score > 3  good


  Grammaticality         Non-redundancy           Informativeness
        [DUC 2005]              [DUC 2005]          [Filippova 2010]


WebNgram 4.2/5          WebNgram 3.9/5            WebNgram 3.2/5
TfIDF        2.0/5      TfIDF          2.3/5      TfIDF        1.7/5
Human        4.7/5      Human          4.5/5      Human        3.6/5



                 TFIDF: Poor scores on all 3 aspects
                           (all below 3.0)
Questionnaire [Score 1-5]
                             Score < 3  poor
                             Score > 3  good


   Grammaticality          Non-redundancy            Informativeness
        [DUC 2005]                [DUC 2005]            [Filippova 2010]


WebNgram 4.2/5            WebNgram 3.9/5             WebNgram 3.2/5
TfIDF        2.0/5        TfIDF          2.3/5       TfIDF         1.7/5
Human        4.7/5        Human          4.5/5       Human         3.6/5



                     WebNGram: All scores above 3.0
                        Close to human scores
Questionnaire [Score 1-5]
                         Score < 3  poor
                         Score > 3  good


   Grammaticality      Non-redundancy            Informativeness
        [DUC 2005]            [DUC 2005]            [Filippova 2010]


WebNgram 4.2/5        WebNgram 3.9/5             WebNgram 3.2/5
TfIDF        2.0/5    TfIDF          2.3/5       TfIDF         1.7/5
Human        4.7/5    Human          4.5/5       Human         3.6/5


                                           Subjective aspect. Human
                                           summaries scored slightly
                                           better than WebNgram
   Semantic redundancies
    “Good sound quality” ≠ “Excellent audio”  semantically similar
     Due to directly using surface words
     Solution: opinion/phrase normalization before
      summarizing

   Some phrases are not opinion phrases
    “I bought this for Christmas”  grammatical & representative
     Due to our candidate selection strategy
      ▪ Plus Side: Very general approach, can summarize any text
      ▪ Down Side: Summaries may be too general
     Solution: stricter selection of phrases
      ▪ E.g. Select only opinion containing phrases
Canon Powershot SX120 IS
Easy to use
Good picture quality
Crisp and clear
Good video quality



Useful for pushing opinions
to devices where the screen
is small
                              E-reader/    Smart   Cell Phones
                                Tablet    Phones
   Optimization framework to generate ultra-concise
    summaries of opinions
     Emphasis: representativeness, readability & compactness

   Evaluation shows our summaries are:
     Well-formed and convey essential information
     More effective than other competing methods

   Our approach is unsupervised, lightweight &
    general
     Can summarize any other textual content
      (e.g. news articles, tweets, user comments, etc.)
Key Opinion Phrases for iPod Touch

Weitere ähnliche Inhalte

Andere mochten auch

AI & NLP pada @begobet
AI & NLP pada @begobetAI & NLP pada @begobet
AI & NLP pada @begobetJim Geovedi
 
Ai for games seminar: N-Grams prediction + intro to bayes inference
Ai for games seminar:  N-Grams prediction + intro to bayes inferenceAi for games seminar:  N-Grams prediction + intro to bayes inference
Ai for games seminar: N-Grams prediction + intro to bayes inferenceAndrea Tucci
 
Presente continúo
Presente continúoPresente continúo
Presente continúoNini Paz
 
Docentes I. E. D. El Tequendama
Docentes I. E. D. El TequendamaDocentes I. E. D. El Tequendama
Docentes I. E. D. El TequendamaIED_EL_TEQUENDAMA
 
Listening 3 activity
Listening 3 activityListening 3 activity
Listening 3 activityNini Paz
 
Beer baron presentation
Beer baron presentationBeer baron presentation
Beer baron presentationthebeerbaron
 
Home pageslides revised v2
Home pageslides   revised v2Home pageslides   revised v2
Home pageslides revised v2fooserv
 
Enabling Opinion-Driven Decision Making - Sentiment Analysis Innovation Summit
Enabling Opinion-Driven Decision Making - Sentiment Analysis Innovation Summit Enabling Opinion-Driven Decision Making - Sentiment Analysis Innovation Summit
Enabling Opinion-Driven Decision Making - Sentiment Analysis Innovation Summit Kavita Ganesan
 
Interactive tv fri123 7
Interactive tv fri123 7Interactive tv fri123 7
Interactive tv fri123 7설란 문
 
BLOQUES DE REALIDAD NACIONAL
BLOQUES DE REALIDAD NACIONALBLOQUES DE REALIDAD NACIONAL
BLOQUES DE REALIDAD NACIONALJuan Wong
 
Defesa Dr Kleber Amancio
Defesa Dr Kleber AmancioDefesa Dr Kleber Amancio
Defesa Dr Kleber AmancioDaniela Bercot
 
Listening exercise ted 2
Listening exercise ted 2Listening exercise ted 2
Listening exercise ted 2Nini Paz
 

Andere mochten auch (19)

AI & NLP pada @begobet
AI & NLP pada @begobetAI & NLP pada @begobet
AI & NLP pada @begobet
 
Ai for games seminar: N-Grams prediction + intro to bayes inference
Ai for games seminar:  N-Grams prediction + intro to bayes inferenceAi for games seminar:  N-Grams prediction + intro to bayes inference
Ai for games seminar: N-Grams prediction + intro to bayes inference
 
Automatic Summarizaton Tutorial
Automatic Summarizaton TutorialAutomatic Summarizaton Tutorial
Automatic Summarizaton Tutorial
 
Flessas c.a 2011
Flessas c.a 2011Flessas c.a 2011
Flessas c.a 2011
 
Diversity in Army Recruiting
Diversity in Army RecruitingDiversity in Army Recruiting
Diversity in Army Recruiting
 
Presente continúo
Presente continúoPresente continúo
Presente continúo
 
Feelings
FeelingsFeelings
Feelings
 
Mars
MarsMars
Mars
 
Hansen
HansenHansen
Hansen
 
Docentes I. E. D. El Tequendama
Docentes I. E. D. El TequendamaDocentes I. E. D. El Tequendama
Docentes I. E. D. El Tequendama
 
Listening 3 activity
Listening 3 activityListening 3 activity
Listening 3 activity
 
Beer baron presentation
Beer baron presentationBeer baron presentation
Beer baron presentation
 
Home pageslides revised v2
Home pageslides   revised v2Home pageslides   revised v2
Home pageslides revised v2
 
Enabling Opinion-Driven Decision Making - Sentiment Analysis Innovation Summit
Enabling Opinion-Driven Decision Making - Sentiment Analysis Innovation Summit Enabling Opinion-Driven Decision Making - Sentiment Analysis Innovation Summit
Enabling Opinion-Driven Decision Making - Sentiment Analysis Innovation Summit
 
Interactive tv fri123 7
Interactive tv fri123 7Interactive tv fri123 7
Interactive tv fri123 7
 
Clothes
ClothesClothes
Clothes
 
BLOQUES DE REALIDAD NACIONAL
BLOQUES DE REALIDAD NACIONALBLOQUES DE REALIDAD NACIONAL
BLOQUES DE REALIDAD NACIONAL
 
Defesa Dr Kleber Amancio
Defesa Dr Kleber AmancioDefesa Dr Kleber Amancio
Defesa Dr Kleber Amancio
 
Listening exercise ted 2
Listening exercise ted 2Listening exercise ted 2
Listening exercise ted 2
 

Kürzlich hochgeladen

Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 

Kürzlich hochgeladen (20)

Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 

Key Opinion Phrases for iPod Touch

  • 1. Kavita Ganesan, ChengXiang Zhai & Evelyne Viegas Go to project page
  • 2. Opinion Summary for iPod Touch Current methods: Focus on generating aspect based ratings for an entity [Lu et al., 2009; Lerman et al., 2009;..]
  • 3. Opinion Summary for iPod Touch To know more: read many redundant sentences
  • 4. Opinion Summary for iPod Touch To know more: read many redundant sentences Structured summaries useful, but insufficient!
  • 5. Textual opinion summaries: • big and clear screen • battery lasts long without wifi • great selection of apps ……………… Textual summaries can provide more information…
  • 6. Represent the major opinions  Key complaints/praise in text  Important critical information  Readable/well formed  Easily understood by readers  Compact/Concise  Can be viewed on all screen sizes ▪ e.g. phone, pda, tablets, dekstops  Maximize information conveyed
  • 7. Generate a set of non-redundant phrases:  Summarizing key opinions in text  Short (2-5 words) Micropinions  Readable Micropinion summary for a restaurant: “Good service” “Delicious soup dishes”  Ultra-concise nature of phrases to allow flexible adjustment of summaries according to display constraints
  • 8.
  • 9. Has been widely studied [Radev et al.2000; Erkan & Radev, 2004; Mihalcea & Tarau, 2004…]  Fairly easy to implement  Problem: Not suitable for concise summaries  Bias – with limit on summary size ▪ selected sentence/phrase may have missed critical info  Verbose - may contain irrelevant information ▪ not suitable for smaller devices
  • 10.  Understand the original text and “re-tell” story in a fewer words  Hard to achieve!  Some methods require manual effort [DeJong 1982; Radev & McKeown 1998; Finley & Harabagiu 2002]  Need to define templates  Later filled with info using IE techniques  Some methods rely on deep NL understanding [Saggion & Lapalme 2002; Jing & McKeown 2000]  Domain dependent  Impractical – high computational costs
  • 11. Goal is to extract important phrases from text  Traditionally used to characterize documents  Can be potentially used to select key opinion phrases  Closest to our goal!  Problem:  Only topic phrases may be selected ▪ E.g. battery life, screen size  candidate phrases ▪ Enough to characterize documents, but we need more info!  Readability aspect is not much of a concern ▪ E.g. “Battery short life” == “Short battery life”  Most methods are supervised – need training data [Tomokiyo & Hurst 2003; Witten et al. 1999; Medelyan & Witten 2008]
  • 12. Unsupervised, lightweight & general approach to generating ultra-concise summaries of opinions  Idea is to use existing words in original text to compose meaningful summaries  Emphasis on 3 aspects:  Compactness ▪ summaries should use as few words as possible  Representativeness ▪ summaries should reflect major opinions in text  Readability ▪ summaries should be fairly well formed
  • 13. k M arg max mi...mk Srep (mi) Sread (mi) i 1 subject to k mi ss i 1 Srep (mi) rep Sread (mi) read sim( mi , mj ) sim i, j 1, k
  • 14. k M arg max mi...mk Srep (mi) Sread (mi) i 1 Micropinion Summary, M subject to 2.3 very clean rooms 2.1 friendly service k 1.8 dirty lobby and pool mi ss 1.3 nice and polite staff i 1 Srep (mi) rep Sread (mi) read sim( mi , mj ) sim i, j 1, k
  • 15. k M arg max mi...mk Srep (mi) Sread (mi) i 1 Micropinion Summary, M subject to m1 2.3 very clean rooms m2 2.1 friendly service k m3 1.8 dirty lobby and pool mi ss mk 1.3 nice and polite staff i 1 Srep (mi) rep Sread (mi) read sim( mi , mj ) sim i, j 1, k
  • 16. Representativeness score of mi k M arg max mi...mk Srep (mi) Sread (mi) i 1 subject to Readability score of mi k mi ss i 1 Srep (mi) rep Sread (mi) read sim( mi , mj ) sim i, j 1, k
  • 17. Objective function k M arg max mi...mk Srep (mi) Sread (mi) i 1 subject to k mi ss i 1 Srep (mi) rep Sread (mi) read sim( mi , mj ) sim i, j 1, k
  • 18. k M arg max mi...mk Srep (mi) Sread (mi) i 1 constraints subject to k mi ss i 1 Srep (mi) rep Sread (mi) read sim( mi , mj ) sim i, j 1, k
  • 19. k M arg max mi...mk Srep (mi) Sread (mi) i 1 Objective function: Optimize subject to m 1 2.3 very clean rooms representativeness & k m2 2.1 friendly service readability scores mi m ss 3 1.8 dirty lobby and pool Ensure: summaries reflect key opinions & mk 1.3 nice and polite staff i 1 reasonably well formed S (m ) rep i rep Objective function value Sread (mi) read sim( mi , mj ) sim i, j 1, k
  • 20. k M arg max mi...mk Srep (mi) Sread (mi) Constraint 1: Maximum i 1 length of summary. subject to •User adjustable k •Captures compactness. mi ss i 1 Srep (mi) rep Sread (mi) read sim(mi , mj ) sim i, j 1, k
  • 21. k M arg max mi...mk Srep (mi) Sread (mi) i 1Constraint 2 & 3: Min representativeness & subject to readability. k •Helps improve efficiency mi •Does not affect performance ss i 1 Srep (mi) rep Sread (mi) read sim(mi , mj ) sim i, j 1, k
  • 22. k M arg max mi...mk Srep (mi) Sread (mi) i 1 subject to k mi Constraint 4: Max ss i 1 similarity between phrases Srep (mi) • User adjustable rep • Captures compactness by Sread (mi) minimizing redundancies read sim(mi , mj ) sim i, j 1, k
  • 23. Ensures representativeness & readability k M arg max mi...mk Srep (mi) Sread (mi) i 1 subject to Capture compactness k mi ss i 1 Srep (mi) rep Sread (mi) Capture compactness read sim( mi , mj ) sim i, j 1, k
  • 24. Now, we need to know how to compute:  Similarity scores: sim(mi, mj) ?  Readability scores: Sread(mi, mj) ?  Representativeness scores: Srep(mi, mj) ?
  • 25. Score similarity between 2 phrases  User adjustable parameter  Why important?  Allows user to control redundancy  E.g. With small devices users may desire summaries with good coverage of information less redundancies !  Measure used:  Standard Jaccard Similarity Measure  Can use other measures (e.g. cosine) – not our focus
  • 26. Purpose: Measure how well a phrase represents opinions from the original text?  2 properties of a highly representative phrase: 1. Words should be strongly associated in text 2. Words should be sufficiently frequent in text  Captured by a modified pointwise mutual information (PMI) function
  • 27. Captures property 1: Measures strength of association between  Original PMI function, pmi(wi,wj) words p( wi, wj ) pmi ( wi , wj ) log 2 p( wi ) p( wj )
  • 28. Original PMI function, pmi(wi,wj) p( wi, wj ) pmi ( wi , wj ) log 2 Captures property 2: p( wi ) p( wj ) Rewards well associated words with  Modified PMI function, high co-occurrences pmi’(wi,wj) Add frequency of p( wi, wj ) c( wi, wj ) occurrence within pmi ' ( wi , wj ) log2 p( wi ) p( wj ) a window
  • 29. To compute representativeness of a phrase: n Take average strength of 1 Srep (w1.. wn) association (pmi’) of each pmilocal ( wi ) n i 1 word, wi in phrase with C neighboring words i C 1 pmilocal ( wi ) [ pmi ' ( wi, wj )] 2C j i C
  • 30. Gives a good estimate of how strongly associated To compute representativeness of the words are in a phrase a phrase: n 1 Srep (w1.. wn) pmilocal ( wi ) n i 1 i C 1 pmilocal ( wi ) [ pmi ' ( wi, wj )] 2C j i C
  • 31. Purpose: Measure well-formedness of phrases  Phrases are constructed from seed words ▪ can have new phrases not in original text  No guarantee phrases would be well-formed chain rule to compute joint probability in terms of  Our readability scoring: conditional probabilities (averaged)  Based on Microsoft's Web N-gram model (publicly available)  N-gram model used to obtain conditional probabilities of phrases n 1 S read ( wk ... wn ) log 2 p ( wk |wk q 1... wk ) 1 K k q  Intuition: A phrase is more readable if it occurs more frequently on the web
  • 32. Example: Readability scores of phrases using tri-gram LM Ungrammatical Grammatical “sucks life battery” -4.51 “battery life sucks” -2.93 “life battery is poor” -3.66 “battery life is poor” -2.37
  • 33. Scoring each candidate phrase is not practical!  Why?  Phrases composed using words from original text  Potential solution space would be Huge!  Our solution: greedy summarization algorithm  Explore solution space with heuristic pruning  Touch only the most promising candidates
  • 34. Input Unigrams Seed Bigrams …. very very + nice nice very + clean very + dirty place clean + place Srep > σ clean clean + room rep Text to be problem dirty dirty + place … summarized room … Step 1: Shortlist Step 2: Form seed high freq unigrams bigrams by pairing (count > median) unigrams. Shortlist by Srep. (Srep > σrep)
  • 35. Higher order n-grams Summary Candidates + Seed Bi-grams = New Candidates clean rooms = very clean rooms 0.9 very clean rooms very clean + clean bed very clean bed 0.8 friendly service dirty room very dirty room 0.7 dirty lobby and pool very dirty + = 0.5 nice and polite staff dirty pool very dirty pool ….. very nice + nice place = very nice place ….. nice room very nice room Sorted Candidates Srep<σrep ; Sread<σread Step 3: Generate higher order n-grams. Step 4: Final summary. • Concatenate existing candidates + seed bigrams Sort by objective • Prune non-promising candidates (Srep & Sread) function value. Add • Eliminate redundancies (sim(mi,mj)) phrases until |M|< σss • Repeat process on shortlisted candidates (until no possbility of expansion)
  • 36.
  • 37. Product reviews from CNET (330 products)  Each product has minimum of 5 reviews CNET REVIEW xxxxxx PROS Content Summarized CONS FULL REVIEW
  • 38. Given Textual reviews about a product Constraints Micropinion Summary, M m1 2.3 easy to use summary size, σss Task m2 2.1 lense is not clear redundancy, σsim m3 1.8 too big for pocket mk 1.3 expensive batteries representativeness, σrep readability , σread argmax Srep(mi) + Sread(mi)
  • 39. Human composed summaries  Two human summarizers ▪ Each summarize 165 product reviews (total 330)  Top 10 phrases from pros & cons provided as hints ▪ Hints help with topic coverage  reduce bias  Summarizers asked to compose a set of short phrases (2-5 words) summarizing key opinions on the product
  • 40. 3 representative baselines:  Tf-Idf – unsupervised method commonly used for key phrase extraction tasks ▪ Selected only adjective containing n-grams (performance reasons) ▪ Redundancy removal used to generate non-redundant phrases  KEA – state of the art supervised key phrase extraction model [witten et al. 1999] ▪ Uses a Naive Bayes model ▪ Trained using 100 review documents withheld from dataset  Opinosis – unsupervised abstractive summarizer designed to generate textual opinion summaries [ganesan et al. 2010] ▪ Shown to be effective in generating concise opinion summaries ▪ Designed for highly redundant text
  • 41. ROUGE - to determine quality of summary (ROUGE-1 & ROUGE-2)  Standard measure for summarization tasks  Measures precision & recall of overlapping units between computer generated summary & human summaries
  • 42. Questionnaire – to assess potential utility of generated summaries to users  Answered by 2 assessors  Original reviews provided as reference Questionnaire [1] Very Poor - [5] Very Good Grammaticality Non-redundancy Informativeness [DUC 2005] [DUC 2005] [Filippova 2010] Do the phrases Are the phrases Are the phrases in convey important readable? the summary information about unique? the product?
  • 43. Our approach is referred to as WebNGram
  • 44. WebNgram: Performs 0.09 the best for this task 0.08 0.07 Opinosis: much better ROUGE-2 RECALL 0.06 than KEA & tfidf 0.05 KEA: slightly KEA 0.04 better than tfidf Worst Tfidf: Tfidf 0.03 performance Opinosis 0.02 WebNGram 0.01 0.00 5 10 15 20 25 30 Summary Size (max words)
  • 45. Intuition: Well-formed phrases tend to be longer in general very clear screen vs. very clear good battery life vs. good battery screen is bright vs. screen is  A few longer phrases is more desirable than many fragmented (i.e. short) phrases
  • 46. KEA: Generates most # of 12 phrases (i.e. favors short 11.75 phrases) 11 WebNGram: Generates fewest phrases on average. 10 Average # of Phrases 9.86 (each phrase is longer) 9 WebNgram 8.74 8 Tfidf 7.36 Opinosis 7 Kea 6 6.04 5 4.53 4 WebNGram phrases are generally more 15 well-formed 30 25 Summary Size (max words)
  • 47. In our algorithm: n-grams are generated from seed words  potential of forming new phrases not in original text  Why not use existing n-grams?  With redundant opinions, using seen n-grams may be sufficient  Performed a run by forcing only seen n-grams to appear as candidate phrases.
  • 48. 0.09 System generated 0.08 n-grams 0.07 Seen n-grams ROUGE-2 Recall 0.06 0.05 0.04 WebNGram 0.03 WebNGramseen 0.02 0.01 0.00 Our search 15 20 25 30 discover 5 10 algorithm helps Summary Size (max words) useful new phrases
  • 49. Example: Unseen N-Gram (Acer AL2216 Monitor) “wide screen lcd monitor is bright” readability : -1.88 representativeness: 4.25 “…plus the monitor is very bright…” Related “…it is a wide screen, great color, great quality…” snippets in “…this lcd monitor is quite bright and clear…” original text
  • 50. Purpose of σrep & σread:  Control minimum representativeness & readability  Helps prune non-promising candidates  improves efficiency of algorithm  Without σrep & σread - we would still arrive at a solution, however:  Time to convergence would be much longer  Results could be skewed  These parameters need to be set correctly!
  • 51. ROUGE-2 scores at ROUGE-2 scores at various σread settings various σrep settings 0.06 0.040 0.04 ROUGE-2 ROUGE-2 0.035 0.02 0.00 0.030 -4 -3 -2 σread -1 1 2 3 σrep 4 5 Performance is stable except in extreme conditions - thresholds are too restrictive
  • 52. ROUGE-2 scores at ROUGE-2 scores at various σread settings various σrep settings 0.06 0.040 0.04 ROUGE-2 ROUGE-2 0.035 0.02 0.00 0.030 -4 -3 -2 σread -1 1 2 3 σrep 4 5 Ideal setting: σread : between -2 and -4 σrep : between 1 and 4
  • 53. Questionnaire [Score 1-5] Score < 3  poor Score > 3  good Grammaticality Non-redundancy Informativeness [DUC 2005] [DUC 2005] [Filippova 2010] WebNgram 4.2/5 WebNgram 3.9/5 WebNgram 3.2/5 TfIDF 2.0/5 TfIDF 2.3/5 TfIDF 1.7/5 Human 4.7/5 Human 4.5/5 Human 3.6/5
  • 54. Questionnaire [Score 1-5] Score < 3  poor Score > 3  good Grammaticality Non-redundancy Informativeness [DUC 2005] [DUC 2005] [Filippova 2010] WebNgram 4.2/5 WebNgram 3.9/5 WebNgram 3.2/5 TfIDF 2.0/5 TfIDF 2.3/5 TfIDF 1.7/5 Human 4.7/5 Human 4.5/5 Human 3.6/5 TFIDF: Poor scores on all 3 aspects (all below 3.0)
  • 55. Questionnaire [Score 1-5] Score < 3  poor Score > 3  good Grammaticality Non-redundancy Informativeness [DUC 2005] [DUC 2005] [Filippova 2010] WebNgram 4.2/5 WebNgram 3.9/5 WebNgram 3.2/5 TfIDF 2.0/5 TfIDF 2.3/5 TfIDF 1.7/5 Human 4.7/5 Human 4.5/5 Human 3.6/5 WebNGram: All scores above 3.0 Close to human scores
  • 56. Questionnaire [Score 1-5] Score < 3  poor Score > 3  good Grammaticality Non-redundancy Informativeness [DUC 2005] [DUC 2005] [Filippova 2010] WebNgram 4.2/5 WebNgram 3.9/5 WebNgram 3.2/5 TfIDF 2.0/5 TfIDF 2.3/5 TfIDF 1.7/5 Human 4.7/5 Human 4.5/5 Human 3.6/5 Subjective aspect. Human summaries scored slightly better than WebNgram
  • 57. Semantic redundancies “Good sound quality” ≠ “Excellent audio”  semantically similar  Due to directly using surface words  Solution: opinion/phrase normalization before summarizing  Some phrases are not opinion phrases “I bought this for Christmas”  grammatical & representative  Due to our candidate selection strategy ▪ Plus Side: Very general approach, can summarize any text ▪ Down Side: Summaries may be too general  Solution: stricter selection of phrases ▪ E.g. Select only opinion containing phrases
  • 58. Canon Powershot SX120 IS Easy to use Good picture quality Crisp and clear Good video quality Useful for pushing opinions to devices where the screen is small E-reader/ Smart Cell Phones Tablet Phones
  • 59. Optimization framework to generate ultra-concise summaries of opinions  Emphasis: representativeness, readability & compactness  Evaluation shows our summaries are:  Well-formed and convey essential information  More effective than other competing methods  Our approach is unsupervised, lightweight & general  Can summarize any other textual content (e.g. news articles, tweets, user comments, etc.)

Hinweis der Redaktion

  1. Most current work in opinion summarization focus on predictingthe aspect based ratings for an entity.For example, for an ipod, you may predict that the appearance is 5 stars, ease of use is three stars..etc.
  2. Most current work in opinion summarization focus on predictingthe aspect based ratings for an entity.For example, for an ipod, you may predict that the appearance is 5 stars, ease of use is three stars..etc.
  3. Most current work in opinion summarization focus on predictingthe aspect based ratings for an entity.For example, for an ipod, you may predict that the appearance is 5 stars, ease of use is three stars..etc.
  4. The problem is that non-informative phrases may be selected as candidate phrases. Phrases such as battery life and screen size may become candidate phrases since traditionally it is used only to characterize or tag documents.This is however non informative for an opinion summary.
  5. We try to capture these three criteria using the following optimzation framework.
  6. M represents a micropinion summary consisting of a set of short phrases
  7. Mi to mk represents each short phrase in the summary
  8. Srep(mi) is the representativeness score of miSread(mi) is the readability score of mi
  9. This equation is the objective function
  10. These are the constraints of the optimization framework
  11. Objective function ensure that the summaries reflect opinions from the original text and is reasonably well formed. This is an example of the objective function values
  12. -The first threshold constraint that controls the maximum length of the summary.-This is a user adjustable parameter and helps captures compactness.
  13. This constraint controls the similarity between phrases which is a user adjustable parameters.Italso captures compactness by minimizing the amount redundancies.
  14. To score similarity between phrases, we simply use the jaccard similarity measure
  15. Then the next is representativeness…Then in scoring representativeness, we have defined two properties of highly representative phrases:The words in each phrase should be strongly associated within a narrow window in the original textThe words in each phrase should be sufficiently frequent in the original textThese two properties is capture by a modified pointwise mutual information functionThen to scoreThe intuition is that if a generated phrase occurs frequently onthe web, then this phrase is readable.
  16. This is the original PMI formula. It captures property 1 by measuring the strength of association between two words.
  17. This is the modified PMI formula. We add the frequency of occurrence of words within a narrow window.This would capture property 2 by rewarding highly co-occuring word pairs. This also addresses the problem ofsparseness, where low frequency words tend to yield in high PMI scores.
  18. So, to actually score the representativeness of a phrase, we take the average strength of association which is pmi’ of a given word wi with C of its neighboring words. C is a window size that is empirically set. We will later show influence of C on the generated summaries.
  19. So, to actually score the representativeness of a phrase, we take the average strength of association which is pmi’ of a given word wi with C of its neighboring words. C is a window size that is empirically set. We will later show influence of C on the generated summaries.
  20. Readability scoring is to determine how well formed the constructed phrases are.Since the phrases are constructed from seed words, we can have new phrases that may not have occurredin the original text.
  21. Now lets look at some examples of readability scores, using the tri-gram lm. Let us say we want to summarize some facts about the battery life….
  22. Now we have all the scoring components that we need for the optimization frameworkBut we cannot score each candidate phrase because we are composing phrases using words from the original text….the potential solution space can be huge. Our solution is to use a greedy algorithm where we explore the solution with some heuristics pruning so that we only touch the most promising candidates.
  23. From the input text to be summarized, which is basically..some review texts,We shortlist a set of high freq unigrams where the count is larger than the median count (not counting 1’s). This acts as an initial reduction in search space. Then from these unigrams, we generate seed bigrams by pairing one unigram wiith every other unigram. We compute representativeness of the seed bigrams and keep only those that meet the minimum rep requirement. Pairunigram with every other unigram. Compute the representativeness scores and keep only those where the Srep &gt;σrepThe next step is trying to grow the seed bigrams into higher order promising n-grams. This is done by concatenating existing candidates with bigrams that share an overlapping word. At any point, if the representativeness or readability scores are below the corresponding thresholds, these n-grams are pruned.At this point wel aslo check redundancy of the generated candidates, and if two phrases have a similarity higher than the sim threshold, we would discard the one with a lower combined scoreof Srep + Sreadnal step is
  24. The third step is where we generate higher order n-grams that act as candidate summaries. This is done by concatenating existing candidates with seed bigrams that share an overlapping word. This gives you the new candidates. Then non-promising candidates are pruned based on their representativeness and readability scores. This further reduces the overall candidate search space. Then we also check for redundancy. If two candidates have high similarity scores, we would discard one these candidates.And this process repeats for the shortlisted candidates until there is no other expansion possibilitiesThen to get the final summary, we sort the candidate n-grams based on their objective function values (i.e., sum of Srep and Sread) and we gradually add phrases until the max summary length is reached
  25. Moving into the evaluation part….
  26. To evaluate this summarization task we use product reviews from CNET for 330 diff products for products that have a minimum of 5 reviews. The CNET review, has 3 parts to it. It has pros, cons and the full review. We ignored the pros and cons and only focused on summarizing the full reviews.
  27. As our first baseline, we use TF-IDF, an unsupervised language-independent method commonly used for keyphrase extraction tasks. To make the TF-IDF baseline competitive, welimit the n-grams in consideration to those that contain at least one adjective (i.e. favoring opinion containing n-grams). The performance is much worse without this selection step. Then, we used the same redundancy removal technique used in our approach to allow a set ofnon-redundant phrases to be generated.For our second baseline, we use KEA a state-of-the-art keyphrase extraction method. KEA buildsa Naive Bayes model with known keyphrases, and then uses the model to find keyphrases in new documents. The KEA model was trained using 100 review documents withheldfrom our dataset . With KEA as a baseline, we would gain insights into the applicabilityof existing keyphrase extraction approaches for the task of micropinion generation. The comparison is strictly speaking an unfair comparison.As our final representative baseline, we use Opinosis, an abstractive summarizer designed to generate textual summary of opinions.
  28. 3 key questions rated on a scale from 1 (Very Poor) to 5 (Very Good). Grammaticality: Are the phrases in the summary readable with no obvious errors? Non-redundancy: Are the phrases in the summary unique with unnecessary repetitions such as repeated facts or repeated use of noun phrases? Informativeness: Do the phrases convey important information regarding the product? This can be positive/negative opinions about the product or some critical facts about the product. Note that the first two aspects, grammaticality and non-redundancy are linguistic questions used at the 2005 Document Understanding Conference (DUC) [4]. The last aspect, informativeness, has been used in other summarization tasks and is key in measuring how much users would learn fromthe summaries.
  29. -This graph shows the rouge scores of thedifferent summarization methods for different summary lengths.-We have KEA is a supervised keyphrase extraction model-We have a simple tfidf based keyphrase extraction method-Then we have Opinosis-TF-IDF performs the worst. This is likely due to insucient redundancies of meaningful n-grams.-KEA does only slightly better than TF-IDF even though KEA is a supervised approach. While KEA is suitable for characterizing documents, such a supervised approach proves to be insufcient for the task of generating micropinions. The model may actually need a more sophisticated set of features for it to generate more meaningful summaries.-Opinosis performs better than both KEA and TFIDF-For this task, WebNgram performs the best
  30. Phrases that are well-formed tend to be in general longer…Thus, a few readable phrases is more desirable than many fragmented (or very short) phrases. So for example if we have two phrases very clear screen and very clear. Here it is obvious that the more readable phrase is the first one as it is not fragmented and is well formed. This well-formed phrase is also longer than the short fragmented one.So now we will look into the average number of phrases generated by each summarization method.
  31. -This graph shows the average number of phrases generated for different summary lengths using different summarization methods-Here we can see that KEA generates the most # of phrases, so it tends to favor very short phrases. On the other hand While the phrases may represent characteristics of the document, the phrases are not well formed enough to convey essential opinions.
  32. In our candidate generation approach, we form n-grams from seed words from the original text. So there is a potential of forming new phrases not in the text. It is reasonable to think that with sufficiently redundant opinions, searching the restricted space of all seen n-grams may be enough for generating reasonable summaries. To test this hypothesis, we performed a run by forcing onnly seen n-grams to appear as candidate summaries.
  33. -This graph shows the ROUGE-2 recall scores using seen n-grams vs. system generated n-grams for different summary lengths. -From these results it is clear that our search algorithm actually helps discover useful new phrases
  34. -Now we will look into the stability of the non-user dependent parameters, sigma rep and sigma read.-the purpose of these two parameters is to control the min rep and readability scores. -This actually helps us prune non-promising candidates which eventually helps improve the efficiency of the algorithm.-Without these two parameters we would still arrive at a reasonable solution but the time to convergence would be much longer and the results could be skewed…e.g. very representative but not readable…so these parameters need to be correctly set.
  35. -This first graph shows the ROUGE-2 recall scores at different sigma read settings-This second graph shows the ROUGE-2 recall scores at different sigma rep settings-So here we can see that the performance is actually stable at different settings except when the conditions are extreme – when the sigma read or sigma rep scores are too restrictive.
  36. The ideal setting is thus…
  37. -To assess potential utility of the generated summaries to real users, a subset of the summaries were manually evaluated.-As mentioned earlier, we look at 3 aspects: grammaticality, non-redundancy and informativeness scores -Each scored on a scale from 1 to 5-A score below 3 is considered `poor&apos; and a score above 3 is considered `good&apos;.-These are the results for the 3 aspects-The results on human summaries serves as an upper bound for comparison.
  38. -Earlier we showed that TF-IDF had lowest ROUGE scores amongst all the approaches, indicating that the summaries may not be very useful. -Human assessors agree with this conclusion as TF-IDF summaries received poor scores (all below 3) on all three dimensions compared to WebNgram and human summaries.
  39. In summary, in this work we proposed an optimi….with special emphasis on…of the generated summaries.Our evaluation shows that the …We give special emphasis on….