Text Summarization using Clustering and Submodular Functions
Introduction
Automatic summarization of multiple documents into a single summary using various clustering algorithms and submodular functions.
We use a greedy approach to generate the summary.
We calculate the ROUGE score for each generated summary by comparing it with human-generated summaries.
Challenges
If we have n sentences, there are 2^n possible candidate summaries.
We cannot enumerate this exponential search space and need a better approach.
For this we use a greedy approach, which has a complexity of O(n^2).
Adding a sentence may increase the diversity of the summary but decrease its coverage, and vice versa.
The number of clusters is not known in advance for the clustering algorithms.
Approach
We use a greedy approach: at each iteration we add the sentence that gives the best score when appended to the candidate summary from the previous iteration.
We use various clustering algorithms such as k-means, Chinese Whispers, and hierarchical clustering.
At each iteration we use submodular functions to compute a coverage score L(S) and a diversity score R(S) for each candidate sentence, and we choose the sentence with the maximum combined score. We use the following formula for scoring: F(S) = L(S) + λR(S).
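A minimal sketch of this scoring and greedy selection. The slides do not spell out L(S) and R(S); the saturated-coverage and clustered square-root-reward forms below follow the Lin & Bilmes (2011) paper cited in the references, and the names (make_objective, greedy_summary, alpha, lam, budget) are illustrative, not taken from the slides:

```python
import math

def make_objective(sim, clusters, alpha=0.1):
    """Build coverage L(S) and diversity R(S) over sets of sentence indices.

    sim      -- n x n matrix of sentence similarities (e.g. tf-idf cosine)
    clusters -- list of sets of sentence indices, one set per cluster
    """
    n = len(sim)
    total_sim = [sum(sim[i][j] for j in range(n)) for i in range(n)]  # row sums

    def coverage(S):
        # L(S): how well S covers every sentence, saturated at alpha * total similarity
        return sum(min(sum(sim[i][j] for j in S), alpha * total_sim[i]) for i in range(n))

    def diversity(S):
        # R(S): per-cluster reward; the square root discourages picking from one cluster
        selected = set(S)
        return sum(
            math.sqrt(sum(total_sim[j] / n for j in cluster & selected))
            for cluster in clusters
        )

    return coverage, diversity


def greedy_summary(sentences, coverage, diversity, lam=1.0, budget=5):
    """Greedily grow the summary, maximizing F(S) = L(S) + lam * R(S)."""
    selected, remaining = [], set(range(len(sentences)))

    def objective(indices):
        return coverage(indices) + lam * diversity(indices)

    while remaining and len(selected) < budget:
        # score every remaining sentence when appended to the current summary
        best = max(remaining, key=lambda i: objective(selected + [i]))
        if objective(selected + [best]) <= objective(selected):
            break  # no remaining sentence improves the objective
        selected.append(best)
        remaining.remove(best)
    return [sentences[i] for i in selected]
```

Each greedy step evaluates the objective for every remaining sentence, so with n sentences the loop performs O(n^2) evaluations, matching the complexity stated in the Challenges slide.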
Approach
Basic NLP tasks such as tokenization, stemming, and stop-word removal are applied for all the methods.
We calculate the coverage and diversity scores for a candidate summary using these clusters.
For similarity between sentences we use the inverse document frequency (idf) of each token together with the term frequency of the tokens in the respective sentences, i.e. a tf-idf weighting (see the sketch below).
Using these methods we generate our summary and calculate the ROUGE score.
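A minimal sketch of this tf-idf sentence similarity, assuming the sentences are already tokenized, stemmed, and stop-word-filtered. The slides do not name the exact similarity measure, so the cosine form and the helper names below are assumptions:

```python
import math
from collections import Counter

def tfidf_vectors(sentences):
    """Turn tokenized sentences into sparse tf-idf vectors (dicts token -> weight)."""
    n = len(sentences)
    doc_freq = Counter(tok for sent in sentences for tok in set(sent))
    idf = {tok: math.log(n / df) for tok, df in doc_freq.items()}
    return [
        {tok: tf * idf[tok] for tok, tf in Counter(sent).items()}
        for sent in sentences
    ]

def similarity(u, v):
    """Cosine similarity between two sparse tf-idf vectors."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

These pairwise similarities can serve as the `sim` matrix used by the coverage and diversity functions sketched earlier.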
Advantages of this approach
Decreases the complexity by a large amount.
Very efficient and gives a reasonably good ROUGE score.
Helpful in other tasks such as information retrieval and web search.
Dataset and Results
Average ROUGE-1 score:
K-Means: 0.3071840
Chinese Whispers: 0.306831
Hierarchical: 0.303747
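For reference, a minimal sketch of a unigram ROUGE-1 recall computation; the slides do not state which ROUGE variant or toolkit produced the numbers above, so the function below is illustrative only:

```python
from collections import Counter

def rouge1_recall(candidate_tokens, reference_tokens):
    """ROUGE-1 recall: fraction of reference unigrams also present in the candidate."""
    cand = Counter(candidate_tokens)
    ref = Counter(reference_tokens)
    overlap = sum(min(count, cand[tok]) for tok, count in ref.items())
    return overlap / sum(ref.values()) if ref else 0.0
```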
References
http://melodi.ee.washington.edu/~bilmes/mypubs/lin2011-class-submod-sum.pdf
http://www.slideshare.net/kareemhashem/text-summarization
https://web.stanford.edu/class/cs345a/slides/12-clustering.pdf
Thanks!
Team-39
Aishwary Gupta (201302216)
B Prabhakar (201505618)
Sahil Swami (201302071)
YouTube