SlideShare a Scribd company logo
1 of 22
Download to read offline
Ravali Pochampally
Kamal Karlapalem
 [IIIT Hyderabad]
• WWW
  - diverse content
  - 100+ articles on major
    topics



• Google News/Amazon
  - organized
  - (yet) too much text




                             2
• Condenses information

          salient points
          length α (1/content)
          user-specified parameters


• Lacks Organization

         × delineation of issues
         × model diversity
         × too long (?)




                                       3
• A view intends to represent an issue pertaining to a
     set of related1 articles
      - organized (multiple concise views)
      - information exploration
      - detailed snapshot


  • Example2 : review dataset (hotel)
      - views               [positive, negative, food, facilities]
      - summary             [unorganized]


1. articles concerning a common topic (FIFA 2010, swine flu in India etc.)
2. http://sites.google.com/site/diverseviews/comparison
                                                                             4
• Related Work
• Problem


• Extraction of views
  • Ranking




• Results
  • Discussion


                        7
• Allison et. al
     • idea of multiple view-points
     • framework


   • Tombros et. al
     • clustering of top-ranking sentences


   • TextTiling 1
     • divide text into multi-paragraph units
     • unit represents a sub-topic

1. M. A. Hearst 1997                            8
Related
Articles


                  Problem
           [Mining Diverse Views]




                                    Ranked set of views


                                                      9
* http://sites.google.com/site/diverseviews/datasets

                                                       10
• Datasets
     • sources: google news, amazon.com, tripadvisor.com
     • crawling + parsing [html and rss]




   • Data cleaning & pre-processing
     • stopwords, stemming and duplicates
     • word-frequency, TF-IDF 1




1. http://nlp.stanford.edu/IR-book/html/htmledition/tf-idf-weighting-1.html   11
• Main idea
  • We score each sentence in our dataset and extract the
    top-ranked ones. These sentences are used to generate
    views [Pruning]

  • We assign a Importance (I) score to each sentence


  • Importance Ik of a sentence Sk belonging to article dj of
    length r is

                            Ik = Πr Ti,j
                                    r
                  Ti,j = TF-IDF of wi є (Sk Λ dj)
                                                                12
• A measure of similarity is required to extract views
  from sentences
  • Semantic similarity : likeness of meaning


• Mihalcea et. al
  • specificity of a word can be determined by its idf
  • we use
       word-to-word similarity &
       specificity - to calculate semantic similarity




                                                         13
• Semantic similarity between sentences Si and Sj
      where w represents a word in a sentence is



                (symmetric relation & range Є [0,1])

  • Need to define maxSim(w, Sj)
    • Wordnet : sets of cognitive synonyms (synsets)
    • wup1 : based on path length between synsets

1. Z. Wu 1994                                          14
• Clustering is used to extract views from the set of
 important sentences

• Hierarchical Agglomerative Clustering (HAC) was
 used
  • upper triangle [symmetric matrix]
  • no restrictions on # of clusters
  • terminate clustering when scoring parameter converges


• We treat clusters as views discussing similar content
                                                            15
• Focus on average pair-wise similarity between the
 sentences in a view

• Cohesion

                      C = Σi,j Є V Si,j
                              len(v)
                     Si,j = sim(Ti , Tj )

              V = set of sentences (Ti) in the view
              len(v) = # of sentences in the view


                                                      16
• Most relevant view (MRV)
  • preference to views discussing similar content [greater
    cohesion]
  • top-ranked view


• Outlier view (OV)
  • single sentence
  • low semantic similarity with other sentences
  • Cohesion = 0 [ordered by importance]




                                                              17
HTML + Text                   Raw Text
Related                   Data Cleaning                      Extract
                          and                                Top-ranked
Articles
                          Preprocessing                      sentences




                                                     Top-n
                                                     Sentences




 Ranked                   Ranking                            Clustering
 Views                    (Cohesion)                         Engine
 MRV & OV
             Ranked
             Views                        Views



                                                                          18
• Number of top-ranking sentences (n) vs. cohesion
         • n which can maximize cohesion
         • median cohesion >= mean [outliers]
         • 20 <= n <= 35
         • incremental clustering   1




     • More top-ranking sentences need not necessarily lead
        to views with better cohesion

1. M. Charikar STOC 1997

                                                              19
20
• IR model as an alternate to summarization
        multiple diverse views
        easily navigable
        browse top x views
        detailed (yet organized) snapshot of a ToI
        clustering at sentence/phrase level*


   • Future work
       • polarity of a view
       • user feedback
         • Implicit [clicks, time-spent]
         • Explicit [user-ratings]
* as opposed to document clustering
                                                      21
Thanks!

More Related Content

Similar to Mining Diverse Views from Related Articles

Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com
Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com
Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com Lucidworks
 
Vectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic MatchingVectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic MatchingSimon Hughes
 
ESWC 2011 BLOOMS+
ESWC 2011 BLOOMS+ ESWC 2011 BLOOMS+
ESWC 2011 BLOOMS+ Prateek Jain
 
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...Lucidworks
 
Big Data Palooza Talk: Aspects of Semantic Processing
Big Data Palooza Talk: Aspects of Semantic ProcessingBig Data Palooza Talk: Aspects of Semantic Processing
Big Data Palooza Talk: Aspects of Semantic ProcessingNa'im Tyson
 
Math-Bridge Edit Authoring
Math-Bridge Edit AuthoringMath-Bridge Edit Authoring
Math-Bridge Edit Authoringmetamath
 
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...Joaquin Delgado PhD.
 
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...S. Diana Hu
 
IASSIST 2011 - Representation of the Data Documentation Initiative using Sema...
IASSIST 2011 - Representation of the Data Documentation Initiative using Sema...IASSIST 2011 - Representation of the Data Documentation Initiative using Sema...
IASSIST 2011 - Representation of the Data Documentation Initiative using Sema...Dr.-Ing. Thomas Hartmann
 
Open IE tutorial 2018
Open IE tutorial 2018Open IE tutorial 2018
Open IE tutorial 2018Andre Freitas
 
Introduction to Software - Coder Forge - John Mulhall
Introduction to Software - Coder Forge - John MulhallIntroduction to Software - Coder Forge - John Mulhall
Introduction to Software - Coder Forge - John MulhallJohn Mulhall
 
DODDLE-OWL: A Domain Ontology Construction Tool with OWL
DODDLE-OWL: A Domain Ontology Construction Tool with OWLDODDLE-OWL: A Domain Ontology Construction Tool with OWL
DODDLE-OWL: A Domain Ontology Construction Tool with OWLTakeshi Morita
 
Effective Semantics for Engineering NLP Systems
Effective Semantics for Engineering NLP SystemsEffective Semantics for Engineering NLP Systems
Effective Semantics for Engineering NLP SystemsAndre Freitas
 
A Graph is a Graph is a Graph: Equivalence, Transformation, and Composition o...
A Graph is a Graph is a Graph: Equivalence, Transformation, and Composition o...A Graph is a Graph is a Graph: Equivalence, Transformation, and Composition o...
A Graph is a Graph is a Graph: Equivalence, Transformation, and Composition o...Joshua Shinavier
 
Introduction into Search Engines and Information Retrieval
Introduction into Search Engines and Information RetrievalIntroduction into Search Engines and Information Retrieval
Introduction into Search Engines and Information RetrievalA. LE
 
Intro to nlp
Intro to nlpIntro to nlp
Intro to nlpankit_ppt
 
Ontology Engineering: Introduction
Ontology Engineering: IntroductionOntology Engineering: Introduction
Ontology Engineering: IntroductionGuus Schreiber
 

Similar to Mining Diverse Views from Related Articles (20)

Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com
Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com
Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com
 
Vectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic MatchingVectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic Matching
 
ESWC 2011 BLOOMS+
ESWC 2011 BLOOMS+ ESWC 2011 BLOOMS+
ESWC 2011 BLOOMS+
 
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
 
Big Data Palooza Talk: Aspects of Semantic Processing
Big Data Palooza Talk: Aspects of Semantic ProcessingBig Data Palooza Talk: Aspects of Semantic Processing
Big Data Palooza Talk: Aspects of Semantic Processing
 
Math-Bridge Edit Authoring
Math-Bridge Edit AuthoringMath-Bridge Edit Authoring
Math-Bridge Edit Authoring
 
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
 
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 
IASSIST 2011 - Representation of the Data Documentation Initiative using Sema...
IASSIST 2011 - Representation of the Data Documentation Initiative using Sema...IASSIST 2011 - Representation of the Data Documentation Initiative using Sema...
IASSIST 2011 - Representation of the Data Documentation Initiative using Sema...
 
Open IE tutorial 2018
Open IE tutorial 2018Open IE tutorial 2018
Open IE tutorial 2018
 
NLP & DBpedia
 NLP & DBpedia NLP & DBpedia
NLP & DBpedia
 
Introduction to Software - Coder Forge - John Mulhall
Introduction to Software - Coder Forge - John MulhallIntroduction to Software - Coder Forge - John Mulhall
Introduction to Software - Coder Forge - John Mulhall
 
Final presentation
Final presentationFinal presentation
Final presentation
 
DODDLE-OWL: A Domain Ontology Construction Tool with OWL
DODDLE-OWL: A Domain Ontology Construction Tool with OWLDODDLE-OWL: A Domain Ontology Construction Tool with OWL
DODDLE-OWL: A Domain Ontology Construction Tool with OWL
 
Effective Semantics for Engineering NLP Systems
Effective Semantics for Engineering NLP SystemsEffective Semantics for Engineering NLP Systems
Effective Semantics for Engineering NLP Systems
 
A Graph is a Graph is a Graph: Equivalence, Transformation, and Composition o...
A Graph is a Graph is a Graph: Equivalence, Transformation, and Composition o...A Graph is a Graph is a Graph: Equivalence, Transformation, and Composition o...
A Graph is a Graph is a Graph: Equivalence, Transformation, and Composition o...
 
Introduction into Search Engines and Information Retrieval
Introduction into Search Engines and Information RetrievalIntroduction into Search Engines and Information Retrieval
Introduction into Search Engines and Information Retrieval
 
Intro to nlp
Intro to nlpIntro to nlp
Intro to nlp
 
Ontology Engineering: Introduction
Ontology Engineering: IntroductionOntology Engineering: Introduction
Ontology Engineering: Introduction
 
MongoDB for Genealogy
MongoDB for GenealogyMongoDB for Genealogy
MongoDB for Genealogy
 

More from RENDER project

Text Stream Processing Tutorial @WIMS 2012
Text Stream Processing Tutorial @WIMS 2012Text Stream Processing Tutorial @WIMS 2012
Text Stream Processing Tutorial @WIMS 2012RENDER project
 
Internals Of An Aggregated Web News Feed
Internals Of An Aggregated Web News FeedInternals Of An Aggregated Web News Feed
Internals Of An Aggregated Web News FeedRENDER project
 
Unterstützungswerkzeuge für Wikipedia
Unterstützungswerkzeuge für WikipediaUnterstützungswerkzeuge für Wikipedia
Unterstützungswerkzeuge für WikipediaRENDER project
 
Render Review: Wikipedia Case Study, Year 1
Render Review: Wikipedia Case Study, Year 1Render Review: Wikipedia Case Study, Year 1
Render Review: Wikipedia Case Study, Year 1RENDER project
 
Wiki case study - Review year 1
Wiki case study  - Review year 1Wiki case study  - Review year 1
Wiki case study - Review year 1RENDER project
 
Towards a diversity-minded Wikipedia
Towards a diversity-minded WikipediaTowards a diversity-minded Wikipedia
Towards a diversity-minded WikipediaRENDER project
 
Diversiweb2011 07 Approximate subgraph matching - Mitja Trampus
Diversiweb2011 07 Approximate subgraph matching - Mitja TrampusDiversiweb2011 07 Approximate subgraph matching - Mitja Trampus
Diversiweb2011 07 Approximate subgraph matching - Mitja TrampusRENDER project
 
Diversiweb2011 06 Faceted Approach To Diverse Query Processing - Devika P. Ma...
Diversiweb2011 06 Faceted Approach To Diverse Query Processing - Devika P. Ma...Diversiweb2011 06 Faceted Approach To Diverse Query Processing - Devika P. Ma...
Diversiweb2011 06 Faceted Approach To Diverse Query Processing - Devika P. Ma...RENDER project
 
Diversiweb2011 05 Scalable Detection of Sentiment-Based Contradictions - Mika...
Diversiweb2011 05 Scalable Detection of Sentiment-Based Contradictions - Mika...Diversiweb2011 05 Scalable Detection of Sentiment-Based Contradictions - Mika...
Diversiweb2011 05 Scalable Detection of Sentiment-Based Contradictions - Mika...RENDER project
 
Diversiweb2011 04 Expressing Opinion Diversity - Delia Rusu
Diversiweb2011 04 Expressing Opinion Diversity - Delia RusuDiversiweb2011 04 Expressing Opinion Diversity - Delia Rusu
Diversiweb2011 04 Expressing Opinion Diversity - Delia RusuRENDER project
 
Diversiweb2011 03 Towards a Knowledge Diversity Model - Denny Vrandecic
Diversiweb2011 03 Towards a Knowledge Diversity Model - Denny VrandecicDiversiweb2011 03 Towards a Knowledge Diversity Model - Denny Vrandecic
Diversiweb2011 03 Towards a Knowledge Diversity Model - Denny VrandecicRENDER project
 
Diversiweb2011 01 Opening - Elena Simperl
Diversiweb2011 01 Opening - Elena SimperlDiversiweb2011 01 Opening - Elena Simperl
Diversiweb2011 01 Opening - Elena SimperlRENDER project
 
Data Collection and Integration, Linked Data Management
Data Collection and Integration, Linked Data ManagementData Collection and Integration, Linked Data Management
Data Collection and Integration, Linked Data ManagementRENDER project
 
Render Project introduction and overview
Render Project introduction and overviewRender Project introduction and overview
Render Project introduction and overviewRENDER project
 

More from RENDER project (17)

Text Stream Processing Tutorial @WIMS 2012
Text Stream Processing Tutorial @WIMS 2012Text Stream Processing Tutorial @WIMS 2012
Text Stream Processing Tutorial @WIMS 2012
 
Internals Of An Aggregated Web News Feed
Internals Of An Aggregated Web News FeedInternals Of An Aggregated Web News Feed
Internals Of An Aggregated Web News Feed
 
Unterstützungswerkzeuge für Wikipedia
Unterstützungswerkzeuge für WikipediaUnterstützungswerkzeuge für Wikipedia
Unterstützungswerkzeuge für Wikipedia
 
Render Review: Wikipedia Case Study, Year 1
Render Review: Wikipedia Case Study, Year 1Render Review: Wikipedia Case Study, Year 1
Render Review: Wikipedia Case Study, Year 1
 
Wiki case study - Review year 1
Wiki case study  - Review year 1Wiki case study  - Review year 1
Wiki case study - Review year 1
 
Towards a diversity-minded Wikipedia
Towards a diversity-minded WikipediaTowards a diversity-minded Wikipedia
Towards a diversity-minded Wikipedia
 
Diversiweb2011 07 Approximate subgraph matching - Mitja Trampus
Diversiweb2011 07 Approximate subgraph matching - Mitja TrampusDiversiweb2011 07 Approximate subgraph matching - Mitja Trampus
Diversiweb2011 07 Approximate subgraph matching - Mitja Trampus
 
Diversiweb2011 06 Faceted Approach To Diverse Query Processing - Devika P. Ma...
Diversiweb2011 06 Faceted Approach To Diverse Query Processing - Devika P. Ma...Diversiweb2011 06 Faceted Approach To Diverse Query Processing - Devika P. Ma...
Diversiweb2011 06 Faceted Approach To Diverse Query Processing - Devika P. Ma...
 
Diversiweb2011 05 Scalable Detection of Sentiment-Based Contradictions - Mika...
Diversiweb2011 05 Scalable Detection of Sentiment-Based Contradictions - Mika...Diversiweb2011 05 Scalable Detection of Sentiment-Based Contradictions - Mika...
Diversiweb2011 05 Scalable Detection of Sentiment-Based Contradictions - Mika...
 
Diversiweb2011 04 Expressing Opinion Diversity - Delia Rusu
Diversiweb2011 04 Expressing Opinion Diversity - Delia RusuDiversiweb2011 04 Expressing Opinion Diversity - Delia Rusu
Diversiweb2011 04 Expressing Opinion Diversity - Delia Rusu
 
Diversiweb2011 03 Towards a Knowledge Diversity Model - Denny Vrandecic
Diversiweb2011 03 Towards a Knowledge Diversity Model - Denny VrandecicDiversiweb2011 03 Towards a Knowledge Diversity Model - Denny Vrandecic
Diversiweb2011 03 Towards a Knowledge Diversity Model - Denny Vrandecic
 
Diversiweb2011 01 Opening - Elena Simperl
Diversiweb2011 01 Opening - Elena SimperlDiversiweb2011 01 Opening - Elena Simperl
Diversiweb2011 01 Opening - Elena Simperl
 
Data Collection and Integration, Linked Data Management
Data Collection and Integration, Linked Data ManagementData Collection and Integration, Linked Data Management
Data Collection and Integration, Linked Data Management
 
Diversity toolkit
Diversity toolkitDiversity toolkit
Diversity toolkit
 
RENDER Telefonica
RENDER TelefonicaRENDER Telefonica
RENDER Telefonica
 
Defining Diversity
Defining DiversityDefining Diversity
Defining Diversity
 
Render Project introduction and overview
Render Project introduction and overviewRender Project introduction and overview
Render Project introduction and overview
 

Recently uploaded

08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 

Recently uploaded (20)

08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 

Mining Diverse Views from Related Articles

  • 2. • WWW - diverse content - 100+ articles on major topics • Google News/Amazon - organized - (yet) too much text 2
  • 3. • Condenses information  salient points  length α (1/content)  user-specified parameters • Lacks Organization × delineation of issues × model diversity × too long (?) 3
  • 4. • A view intends to represent an issue pertaining to a set of related1 articles - organized (multiple concise views) - information exploration - detailed snapshot • Example2 : review dataset (hotel) - views [positive, negative, food, facilities] - summary [unorganized] 1. articles concerning a common topic (FIFA 2010, swine flu in India etc.) 2. http://sites.google.com/site/diverseviews/comparison 4
  • 5.
  • 6.
  • 7. • Related Work • Problem • Extraction of views • Ranking • Results • Discussion 7
  • 8. • Allison et. al • idea of multiple view-points • framework • Tombros et. al • clustering of top-ranking sentences • TextTiling 1 • divide text into multi-paragraph units • unit represents a sub-topic 1. M. A. Hearst 1997 8
  • 9. Related Articles Problem [Mining Diverse Views] Ranked set of views 9
  • 11. • Datasets • sources: google news, amazon.com, tripadvisor.com • crawling + parsing [html and rss] • Data cleaning & pre-processing • stopwords, stemming and duplicates • word-frequency, TF-IDF 1 1. http://nlp.stanford.edu/IR-book/html/htmledition/tf-idf-weighting-1.html 11
  • 12. • Main idea • We score each sentence in our dataset and extract the top-ranked ones. These sentences are used to generate views [Pruning] • We assign a Importance (I) score to each sentence • Importance Ik of a sentence Sk belonging to article dj of length r is Ik = Πr Ti,j r Ti,j = TF-IDF of wi є (Sk Λ dj) 12
  • 13. • A measure of similarity is required to extract views from sentences • Semantic similarity : likeness of meaning • Mihalcea et. al • specificity of a word can be determined by its idf • we use  word-to-word similarity &  specificity - to calculate semantic similarity 13
  • 14. • Semantic similarity between sentences Si and Sj where w represents a word in a sentence is (symmetric relation & range Є [0,1]) • Need to define maxSim(w, Sj) • Wordnet : sets of cognitive synonyms (synsets) • wup1 : based on path length between synsets 1. Z. Wu 1994 14
  • 15. • Clustering is used to extract views from the set of important sentences • Hierarchical Agglomerative Clustering (HAC) was used • upper triangle [symmetric matrix] • no restrictions on # of clusters • terminate clustering when scoring parameter converges • We treat clusters as views discussing similar content 15
  • 16. • Focus on average pair-wise similarity between the sentences in a view • Cohesion C = Σi,j Є V Si,j len(v) Si,j = sim(Ti , Tj ) V = set of sentences (Ti) in the view len(v) = # of sentences in the view 16
  • 17. • Most relevant view (MRV) • preference to views discussing similar content [greater cohesion] • top-ranked view • Outlier view (OV) • single sentence • low semantic similarity with other sentences • Cohesion = 0 [ordered by importance] 17
  • 18. HTML + Text Raw Text Related Data Cleaning Extract and Top-ranked Articles Preprocessing sentences Top-n Sentences Ranked Ranking Clustering Views (Cohesion) Engine MRV & OV Ranked Views Views 18
  • 19. • Number of top-ranking sentences (n) vs. cohesion • n which can maximize cohesion • median cohesion >= mean [outliers] • 20 <= n <= 35 • incremental clustering 1 • More top-ranking sentences need not necessarily lead to views with better cohesion 1. M. Charikar STOC 1997 19
  • 20. 20
  • 21. • IR model as an alternate to summarization  multiple diverse views  easily navigable  browse top x views  detailed (yet organized) snapshot of a ToI  clustering at sentence/phrase level* • Future work • polarity of a view • user feedback • Implicit [clicks, time-spent] • Explicit [user-ratings] * as opposed to document clustering 21