Analytical methods at a glance
Extractive Methods
Selecting a set of sentences from the source text,
then arranging them to form a summary
Methods
• Luhn
• Edmundson
• TextRank
• LexRank
• SumBasic
• LSA
Abstractive Methods
Using natural language generation techniques
to write novel sentences
Methods
• Sequence to Sequence
• Sequence to Sequence with attention
• Pointer-Generator Network
• Fast abstractive summarization with reinforcement learning
Extractive – Luhn Method
What is the Luhn method?
Sentences are scored using:
• Frequency of the word in the text
• Relative position of the word in the sentence
• Significance factor
• Bracketing
Pros
• Can highlight the important topics of the
document
Limitations
• Very few features of the text are taken into
consideration, so it might not give accurate results
• Gives more weight to sentences at the beginning of
the document or a paragraph
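The bracketing and significance-factor ideas can be illustrated with a minimal sketch. It assumes the usual textbook reading of Luhn's method: a bracket is a run of words that starts and ends with a significant word, and its factor is the squared count of significant words divided by the bracket length. The stop list, the `min_count` cutoff and the `max_gap` of 4 are illustrative assumptions, not values taken from the slides.

```python
import re
from collections import Counter

# Illustrative stop list (assumption); a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "in", "is", "are", "and", "to", "on", "it"}

def tokenize(sentence):
    return re.findall(r"[a-z']+", sentence.lower())

def significant_words(sentences, min_count=2):
    """Non-stop words that occur at least min_count times in the text."""
    counts = Counter(w for s in sentences for w in tokenize(s) if w not in STOP_WORDS)
    return {w for w, c in counts.items() if c >= min_count}

def luhn_score(sentence, significant, max_gap=4):
    """Best significance factor over all brackets in the sentence."""
    words = tokenize(sentence)
    positions = [i for i, w in enumerate(words) if w in significant]
    if not positions:
        return 0.0
    best = 0.0
    start = prev = positions[0]
    count = 1
    for pos in positions[1:]:
        if pos - prev - 1 <= max_gap:
            count += 1  # still inside the same bracket
        else:
            # Close the current bracket and open a new one at pos.
            best = max(best, count ** 2 / (prev - start + 1))
            start, count = pos, 1
        prev = pos
    return max(best, count ** 2 / (prev - start + 1))

def luhn_summary(sentences, k=1):
    """Return the k best-scoring sentences in their original order."""
    significant = significant_words(sentences)
    ranked = sorted(sentences, key=lambda s: luhn_score(s, significant), reverse=True)
    chosen = set(ranked[:k])
    return [s for s in sentences if s in chosen]
```

Calling `luhn_summary(sentences, k)` on a list of sentence strings returns the `k` highest-scoring sentences in document order.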
Extractive – Edmundson Method
What is the Edmundson method?
Cue Method
• The cue dictionary comprises Stigma, Bonus and Null words; a final cue weight
is generated for each sentence
• A concordance program provides the frequency, dispersion and selection ratio
Key Method
• Non-cue words are selected as key words based on their frequency, and
positive weights are assigned based on that frequency
• The final key weight of a sentence is the sum of its key words’ weights
Title Method
• A title glossary is formed from the non-null words of the title, subtitle and headings
• Positive weights are assigned to the title glossary words and the sentence
weight is calculated
Location Method
• The probable relevance is calculated from 2 hypotheses: sentences following headings are more
relevant, and topic sentences occur in the first or last lines of a paragraph
• Weights are assigned based on the results of testing these hypotheses
The final score of each sentence is the weighted score of the 4 methods
Pros
• Takes user-defined features of the sentences
into consideration
Limitations
• Designed for well-formatted
documents
• Performance declines for
disorganized documents
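How the four method scores combine into one sentence score can be sketched as a weighted sum. The function names, the +1/-1 cue values and the default weights below are hypothetical choices for illustration; the slides do not give concrete values.

```python
def cue_weight(words, bonus, stigma):
    """Bonus words add 1, stigma words subtract 1, null words count 0 (assumed values)."""
    return sum((w in bonus) - (w in stigma) for w in words)

def title_weight(words, title_glossary):
    """Number of sentence words that also appear in the title glossary."""
    return sum(1 for w in words if w in title_glossary)

def edmundson_score(cue, key, title, location, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted linear combination of the Cue, Key, Title and Location scores."""
    a_cue, a_key, a_title, a_loc = weights
    return a_cue * cue + a_key * key + a_title * title + a_loc * location
```

Sentences would then be ranked by `edmundson_score`, with the weights tuned on documents for which reference extracts exist.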
Extractive – Latent Semantic Analysis (LSA)
What is the LSA method?
• A sentence matrix is created by applying TF-IDF
(Term Frequency – Inverse Document
Frequency) to the document
• Singular Value Decomposition is applied, which
decomposes this matrix into 3 different matrices
• They represent the interrelations of words
and topics and their importance
Pros
• Captures salient and recurring
word combination patterns
• Suitable for Big Data
Limitations
• LSA is very sensitive to the stop list
and the lemmatization process, so
performance differs a lot across
languages
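The TF-IDF and decomposition steps for LSA can be sketched in plain Python. This is a deliberate simplification: instead of a full SVD it uses power iteration to approximate only the leading right singular vector (the dominant topic) and ranks sentences by their weight in it. Treating each sentence as a "document" for the IDF term, and all function names, are assumptions made for the example.

```python
import math
import re

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def tfidf_matrix(sentences):
    """Rows are terms, columns are sentences; each sentence acts as a document."""
    docs = [tokenize(s) for s in sentences]
    vocab = sorted({w for d in docs for w in d})
    n = len(docs)
    matrix = []
    for term in vocab:
        df = sum(1 for d in docs if term in d)
        idf = math.log(n / df)
        matrix.append([d.count(term) * idf for d in docs])
    return matrix

def leading_topic_weights(matrix, iterations=100):
    """Power iteration on A^T A, approximating the first right singular vector of A."""
    n = len(matrix[0])
    gram = [[sum(row[i] * row[j] for row in matrix) for j in range(n)] for i in range(n)]
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # all-zero matrix: keep the uniform vector
        v = [x / norm for x in w]
    return [abs(x) for x in v]

def lsa_summary(sentences, k=1):
    """Pick the k sentences with the largest weight in the dominant topic."""
    weights = leading_topic_weights(tfidf_matrix(sentences))
    top = sorted(range(len(sentences)), key=lambda i: weights[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

A production system would normally compute the full decomposition with a numerical library and select sentences across several topics rather than only the first.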
Extractive – TextRank Method
What is the TextRank method?
• Similar to Google’s famous PageRank
method (graph based)
• Data is preprocessed to remove the
irrelevant data
• Words are vectorised and a ‘Cosine
Similarity Score’ is calculated
• A ‘Similarity matrix’ is created, holding the
similarity between any two sentences
Pros
• It takes into consideration both the
number of links and their
weights
Limitations
• Can only be used for single
document summarization
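The similarity-matrix and ranking steps can be sketched as follows. The sketch assumes plain bag-of-words vectors and a weighted PageRank iteration with a damping factor of 0.85; preprocessing is reduced to lowercasing and keeping alphabetic tokens, and all names are illustrative.

```python
import math
import re
from collections import Counter

def bag_of_words(sentence):
    return Counter(re.findall(r"[a-z]+", sentence.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def similarity_matrix(sentences):
    """Pairwise cosine similarity between sentences, with a zero diagonal."""
    vectors = [bag_of_words(s) for s in sentences]
    n = len(vectors)
    return [[0.0 if i == j else cosine(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]

def pagerank(sim, damping=0.85, iterations=50):
    """Weighted PageRank: each sentence passes score along its edges in proportion to edge weight."""
    n = len(sim)
    scores = [1.0 / n] * n
    out_weight = [sum(row) for row in sim]
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(sim[j][i] / out_weight[j] * scores[j]
                            for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]
    return scores

def textrank_summary(sentences, k=1):
    """Return the k highest-ranked sentences in their original order."""
    scores = pagerank(similarity_matrix(sentences))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Sentences that are similar to many other sentences accumulate a high score, while a sentence unrelated to the rest receives only the baseline `(1 - damping) / n`.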
Extractive – LexRank Method
What is the LexRank method?
• Based on Google’s PageRank method
• Connectivity is based on cosine similarity
• The concept of eigenvector centrality in a graph is used to set
sentence importance
Pros
• Can be used for multi-document
summarization
• It takes into consideration both the
number of links and their weights
Limitations
• Documents might have conflicting
ideas, which might lead to an incorrect
summary
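Eigenvector centrality on a similarity graph can be sketched with a thresholded adjacency matrix and a power iteration. The function takes a precomputed cosine-similarity matrix as input; the threshold of 0.1 and the damping factor of 0.85 are illustrative assumptions, not values from the slides.

```python
def lexrank_scores(sim, threshold=0.1, damping=0.85, iterations=100):
    """sim: square matrix of pairwise cosine similarities, with sim[i][i] == 1."""
    n = len(sim)
    # Keep an edge only where the similarity exceeds the threshold.
    adj = [[1.0 if sim[i][j] > threshold else 0.0 for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in adj]
    scores = [1.0 / n] * n
    # Power iteration towards the stationary distribution of the damped random walk.
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(adj[j][i] / degree[j] * scores[j]
                            for j in range(n) if degree[j] > 0)
            for i in range(n)
        ]
    return scores
```

For multi-document input, the sentences of all documents would be pooled before the similarity matrix is built, so a sentence echoed across documents gains many edges and a high centrality.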
Extractive – SumBasic, KLSum
What is the SumBasic method?
• SumBasic is a way of computing sentence scores from a
multi-document dataset
• Computes the probability distribution over the words in the
input
• Assigns weights to sentences as the average probability of
the words in them
• Picks the best-scoring sentence
• Recalculates the probabilities if the summary length has not been
reached
Pros
• It gives the summarizer sensitivity
to context, depending on what
has already been included
• Natural way to deal with
redundancy
Limitations
• It gives information about word
frequency, but does not capture
all the topics accurately
What is the KLSum method?
• Improves on SumBasic by minimizing the Kullback-Leibler (KL)
divergence between the probability distributions of the summary and the
document
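The SumBasic loop can be sketched in a few lines. The sketch follows the simplified steps listed on the slide (score, pick, recalculate); the probability update squares the probability of every word in the chosen sentence, which is the update used in the SumBasic paper. Function and parameter names are illustrative.

```python
import re
from collections import Counter

def tokenize(sentence):
    return re.findall(r"[a-z]+", sentence.lower())

def sumbasic(sentences, max_sentences=2):
    tokens = [tokenize(s) for s in sentences]
    counts = Counter(w for t in tokens for w in t)
    total = sum(counts.values())
    # Probability distribution over the words in the input.
    prob = {w: c / total for w, c in counts.items()}
    summary = []
    remaining = [i for i, t in enumerate(tokens) if t]
    while remaining and len(summary) < max_sentences:
        # Sentence weight = average probability of its words.
        best = max(remaining, key=lambda i: sum(prob[w] for w in tokens[i]) / len(tokens[i]))
        summary.append(best)
        remaining.remove(best)
        # Squaring lowers the probability of words that are already covered.
        for w in set(tokens[best]):
            prob[w] *= prob[w]
    return [sentences[i] for i in summary]
```

With the input `["cats chase mice", "cats chase birds", "dogs sleep"]` and two sentences requested, the first pick is "cats chase mice"; after the update the near-duplicate "cats chase birds" scores below "dogs sleep", which shows how the recalculation handles redundancy.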
Abstractive – RNN and Seq2Seq
What is an RNN?
• An RNN (recurrent neural network) is a multi-layer NN (neural
network) that works on the principle of predicting the current event
based on both the recent and the long-term past
• The RNN state holds information about sequential events
• LSTM (long short-term memory) is a variant of RNN that
improves the RNN's predictive ability for very long sequences
• LSTMs have a selective memory: each LSTM cell can ‘forget’ and
‘update’ information, which refines the cell state at every
step
Abstractive – Sequence to Sequence Method
What is the Sequence to Sequence method?
• Text is fed to the encoder units and the
intermediate form (hidden state) is fed to the
decoder
• Stacked RNN/LSTM units are used for
encoding and decoding
Pros
• Works in an abstractive sense, performing similarly to a
human
• Based on machine learning
• Accuracy can be improved through optimization
• The same network can be trained for any other
language or for translation
Limitations
• Training time increases as the size of the input data
increases (number of encoders)
• The summary is generated by lookup from a
vocabulary (limited abstraction)
• An <UNK> tag is generated for names, places etc.
that are not in the vocabulary
• No coverage of what has already been decoded
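The <UNK> limitation comes from the fixed vocabulary lookup and can be shown without any neural network. The helper names and the toy vocabulary below are hypothetical.

```python
def encode(tokens, vocab, unk="<UNK>"):
    """Map tokens to ids; anything outside the fixed vocabulary becomes the <UNK> id."""
    unk_id = vocab[unk]
    return [vocab.get(t, unk_id) for t in tokens]

def decode(ids, vocab):
    """Map ids back to tokens using the same fixed vocabulary."""
    inverse = {i: t for t, i in vocab.items()}
    return [inverse[i] for i in ids]

vocab = {"<UNK>": 0, "the": 1, "ceo": 2, "resigned": 3}
round_trip = decode(encode(["the", "ceo", "kumar", "resigned"], vocab), vocab)
```

Here `round_trip` is `["the", "ceo", "<UNK>", "resigned"]`: the name is lost as soon as the text is mapped to vocabulary ids, so the decoder can never reproduce it.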
Abstractive – Seq2Seq with Attention Method
What is the Seq2Seq with Attention method?
• The attention distribution is used to prepare a weighted sum of the
encoder hidden states, known as the context vector
• The context vector can be regarded as “what has been
read from the source text” on this step of the decoder
• The context vector and the decoder hidden state are used to generate the
vocabulary distribution
Pros
• The decoder is able to freely generate
words in any order
Limitations
• Can represent factual information incorrectly
• The summary sometimes repeats itself
• An <UNK> tag is generated for names, places
etc. that are not in the vocabulary
• No coverage of what has been decoded
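The context vector computation can be sketched with plain lists. Note the swap: this example scores each encoder state with a simple dot product, whereas the referenced summarization models use a learned additive scoring function; the softmax and weighted-sum steps are the same in both. All names are illustrative.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_context(decoder_state, encoder_states):
    """Return (attention distribution, context vector) for one decoder step."""
    # Dot-product score between the decoder state and every encoder state (assumed scoring).
    scores = [sum(d * h for d, h in zip(decoder_state, enc)) for enc in encoder_states]
    attn = softmax(scores)
    dim = len(encoder_states[0])
    # Context vector: attention-weighted sum of the encoder hidden states.
    context = [sum(a * enc[k] for a, enc in zip(attn, encoder_states)) for k in range(dim)]
    return attn, context
```

In a full model the context vector is concatenated with the decoder hidden state and passed through linear layers and a softmax to produce the vocabulary distribution.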
Abstractive – Pointer-generator Network Method
What is the Pointer-Generator Network method?
• Hybrid network that can copy words from the source via pointing,
while retaining the ability to generate words from the fixed
vocabulary
• The generation probability is used to determine whether to copy the
word from the source or to generate it from the vocabulary
• The generation probability is used to weigh the attention distribution
and the vocabulary distribution to produce the final distribution
• Coverage is used to avoid repetition; it tracks what has been
decoded so far
Pros
• No repetition of words
• Words outside the source text can be
generated
• Rare words (names/factual information) are
represented correctly
Limitations
• Higher-level abstraction is missing;
the wording is usually close to the original text
• Sentences may be composed incorrectly
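The mixing of the two distributions can be written out directly. The sketch follows the formulation in the pointer-generator paper: the final probability of a word is `p_gen` times its vocabulary probability plus `(1 - p_gen)` times the attention mass on its occurrences in the source, and the coverage loss sums the element-wise minimum of the current attention and the accumulated coverage. Function names and the dictionary representation are illustrative.

```python
def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix the vocabulary distribution with the copy (attention) distribution."""
    final = {w: p_gen * p for w, p in vocab_dist.items()}
    for a, tok in zip(attention, source_tokens):
        # Source words outside the fixed vocabulary get probability only from copying.
        final[tok] = final.get(tok, 0.0) + (1 - p_gen) * a
    return final

def coverage_loss(attention, coverage):
    """Penalise attending again to source positions that were already covered."""
    return sum(min(a, c) for a, c in zip(attention, coverage))
```

For example, with `p_gen = 0.5`, a vocabulary distribution of `{"the": 0.6, "<UNK>": 0.4}` and attention `[0.3, 0.7]` over the source tokens `["the", "kumar"]`, the out-of-vocabulary name "kumar" receives probability 0.35 and can be emitted.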
Abstractive – Reinforce-Selected Sentence Rewriting (fast_abs_rl)
What is the fast_abs_rl network method?
• Fast summarization method that first selects salient
sentences and then rewrites them as an abstractive summary
• A pipeline of extractive and abstractive summarizers is used, which
is optimized using reinforcement learning
• The extractive part uses a convolutional encoder
• The abstractive part uses a Seq2Seq pipeline with reranking
• Based on the decoder output, the extractor parameters are
optimized using a POMDP
Pros
• 4x improvement in training speed and 10-20x
improvement in inference speed
• The whole text is considered for abstractive
summarization
Limitations
• A high level of abstraction is still missing, as
the vocabulary size is limited
Benchmark metrics
Three major metrics were considered: ROUGE (Lin 2004), METEOR and BLEU, all of which are string-matching metrics
Brief:
• ROUGE-N measures the overlap of N-grams between the system and reference summaries
• ROUGE-L is based on the longest common subsequence and so takes sentence-level similarity into account
• ROUGE-S is the skip-bigram variant
• The METEOR score matches unigrams between the system and reference summaries, with explicit attention to word ordering
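ROUGE-N and ROUGE-L can be sketched on tokenized summaries. This is a simplified illustration: ROUGE-N is computed as recall against a single reference, and ROUGE-L as a plain F1 over the longest common subsequence, whereas the official toolkit adds options such as stemming, multiple references and a weighted F-measure.

```python
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(system, reference, n=1):
    """Fraction of reference n-grams that also occur in the system summary."""
    sys_ngrams, ref_ngrams = ngrams(system, n), ngrams(reference, n)
    total = sum(ref_ngrams.values())
    overlap = sum(min(c, sys_ngrams[g]) for g, c in ref_ngrams.items())
    return overlap / total if total else 0.0

def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def rouge_l_f1(system, reference):
    """F1 of LCS-based precision and recall (simplified ROUGE-L)."""
    lcs = lcs_length(system, reference)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(system), lcs / len(reference)
    return 2 * precision * recall / (precision + recall)
```

For the system summary "the cat sat on the mat" and the reference "the cat lay on the mat", ROUGE-1 recall is 5/6, ROUGE-2 recall is 3/5, and the simplified ROUGE-L F1 is 5/6.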
Summarization method    ROUGE-L score    METEOR score
TextRank                0.500            -
LexRank                 0.469            -
SumBasic                0.484            -
KLSum                   -                -
LSA                     0.432            -
Seq2Seq                 31.2             -
Seq2Seq attention       33.8             -
Pointer generator       36.38            18.72
Fast_abs_rl             38.54            20.38
References
Method: Source
Luhn: https://ieeexplore.ieee.org/document/5392672/
Edmundson: http://courses.ischool.berkeley.edu/i256/f06/papers/edmonson69.pdf
LexRank: https://www.cs.cmu.edu/afs/cs/project/jair/pub/volume22/erkan04a-html/erkan04a.html
TextRank: https://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf
LSA: www.kiv.zcu.cz/~jstein/publikace/isim2004.pdf
SumBasic: http://www.cis.upenn.edu/~nenkova/papers/ipm.pdf
KLSum: http://www.aclweb.org/anthology/N09-1041
Reduction: https://github.com/adamfabish/Reduction/blob/master/Source/reduction.py
Seq2Seq: https://arxiv.org/pdf/1409.3215.pdf
Seq2Seq with attention: http://www.aclweb.org/anthology/N16-1012
Seq2Seq with attention and pointer generation: Get To The Point: Summarization with Pointer-Generator Networks
Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting: https://arxiv.org/pdf/1805.11080.pdf
Use Case: Planning and Procurements Influenced by Trends
Summarizing and tagging the current trends for any business can provide valuable insights and improve the planning and
procurement process
Examples:
• Fashion articles can augment the Design Success Probability project (illustrated below)
• Latest auto news can support Novelis R&D and customer negotiations
• News of shutdowns in factories which supply raw materials can help in taking pre-emptive measures and protect against increased raw material prices
Fashion Articles
News and magazine
articles showing the latest
fashion trends and top-selling
designs
Text Summarizer
The articles are passed
through the summarizer,
which is trained to pick up
specific design elements
Tagging
The various design
elements are tagged
according to clothing
category
Design
The inputs are provided to
the planning team to
help them arrive at better
decisions
Use Case: Competition Benchmarking
Analysis of news articles can help
businesses consolidate all competition-related
information and plan reactive
actions to competition news
Examples:
• Jio launching a new campaign or expanding
services to new circles
• Aditya Birla Capital could leverage the
information provided in advertisements and
news articles regarding competition products
Repository of all competition data
• News and announcements
• Ads on products and sales
• Industry reports
Steps
1. Text Summarization
2. Tagging
Use Case: Creation and Updating of the Legal Matrix
The Government publishes amendments to existing laws in
gazettes, which contain information about multiple
industries
Each plant has, or should have, a legal matrix
which contains information about mandatory
checks, the concerned authorities and the last date to
complete them
Text Summarizer
• Relevant text extraction is carried out
• Text tagging is carried out
• Rules are compared and updated
General laws captured in the Gazette
A few of the major acts covered in the Gazette:
1. Customs Act
2. Central Goods and Services Tax Act
3. Major Port Trust Act
4. Mines and Minerals (Development and
Regulation) Act
5. Bureau of Indian Standards Act
(necessary for lab equipment)
6. Special Economic Zones Act
7. Industrial Boilers and Pressure Vessel Act
8. Labor law
Use Case: Increasing customer care efficiency
Current customer journey while calling customer care
Time is spent filling in the
information via the keypad
The in-call option is not great
on mobile, as callers have to
remove the handset from
their ear and then input
details, which further irritates
them
Proposed journey
1. Speech is converted to text
2. Text Summarizer
3. Auto-filling of information
Cons:
1. Added complexity of the speech-to-text converter
2. Indian dialects and local languages are extremely
hard to train for in full
Pros:
1. More interactive
2. Better consumer experience
3. Multiple verifications can be done earlier
Text Summarizer – Business applications
• Summarizing noisy, unstructured, ungrammatical,
high-volume data for Pantaloons & MFL
• Storylines of events: identifying and summarizing the
events of Idea that lead to the event of interest
• Sentence compression from news articles & stock market
reports for Aditya Birla Capital
• Summarizing internal sales reports at various levels
for UltraTech Cement
Text Summarizer – Function applications
• Legal and employee document screening and
summarization for the HR Dept
• Capturing customer care voice calls of Aditya
Birla Capital
• Summarization of bill, contract and order details using vendor
documents for ABMCPL
• Summarizing news articles for the central economic
cell, covering regulation and economic conditions
Thank you

Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptx
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
SR-101-01012024-EN.docx  Federal Constitution  of the Swiss ConfederationSR-101-01012024-EN.docx  Federal Constitution  of the Swiss Confederation
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
 
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATIONCapstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
 

Text summarization

  • 2. Analytical methods at a glance
    Extractive methods: select a set of sentences from the source text, then arrange them to form a summary.
    Methods: Luhn, Edmundson, TextRank, LexRank, SumBasic, LSA
    Abstractive methods: use natural language generation techniques to write novel sentences.
    Methods: Sequence to Sequence, Sequence to Sequence with attention, Pointer-Generator Network, Fast abstractive reinforcement
  • 3. Extractive – Luhn Method
    What is the Luhn method?
    • Frequency of the word in the text
    • Relative position of the word in the sentence
    • Significance factor
    • Bracketing
    Pros
    • Can highlight the important topics of the document
    Limitations
    • Takes very few features of the text into consideration, so results might not be accurate
    • Gives more weight to sentences at the beginning of the document or a paragraph
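The Luhn slide names word frequency, bracketing and a significance factor without showing how they combine. The following is a minimal sketch of one common reading: frequent non-stop words count as significant, significant words that sit close together are bracketed into a cluster, and a cluster scores (number of significant words)² divided by the cluster length. The stop list, the `min_count` threshold and the `max_gap` of 4 are assumptions chosen for illustration, not values taken from the slides.

```python
import re
from collections import Counter

# Assumed, deliberately tiny stop list; a real one would be much larger.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "on", "for"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def significant_words(text, min_count=2):
    """Words that are not stop words and occur at least min_count times."""
    counts = Counter(w for w in tokenize(text) if w not in STOPWORDS)
    return {w for w, c in counts.items() if c >= min_count}


def significance_factor(sentence, significant, max_gap=4):
    """Best cluster score in the sentence: (significant words)^2 / cluster length.

    A cluster (bracket) is a run of words that starts and ends with a
    significant word and never has more than max_gap other words in a row.
    """
    words = tokenize(sentence)
    positions = [i for i, w in enumerate(words) if w in significant]
    if not positions:
        return 0.0
    best = 0.0
    start = prev = positions[0]
    count = 1
    for pos in positions[1:]:
        if pos - prev - 1 <= max_gap:
            count += 1  # still inside the current bracket
        else:
            best = max(best, count ** 2 / (prev - start + 1))
            start, count = pos, 1  # open a new bracket
        prev = pos
    return max(best, count ** 2 / (prev - start + 1))
```

Sentences would then be ranked by `significance_factor`, and the top-scoring ones kept as the summary.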
  • 4. Extractive – Edmundson Method
    What is the Edmundson method?
    Cue Method
    • The cue dictionary comprises stigma, bonus and null words; a final cue weight is generated for each sentence
    • A concordance program provides frequency, dispersion and selection ratio
    Key Method
    • Non-cue words are selected as key words based on their frequency, and positive weights are assigned based on that frequency
    • The final key weight of a sentence is the sum of its key words' weights
    Title Method
    • A title glossary is formed from the non-null words of the title, subtitle and headings
    • Positive weights are assigned to the title glossary words and the sentence weight is calculated
    Location Method
    • The probable relevance is calculated from 2 hypotheses: sentences following headings are more relevant, and topic sentences occur in the initial or last lines of a paragraph
    • Weights are assigned based on the hypothesis test results
    The final score of each sentence is the weighted score of the 4 methods.
    Pros
    • Takes into consideration user-defined features of the sentences
    Limitations
    • Designed for well-formatted documents
    • Performance declines for disorganized documents
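The weighted combination of the four Edmundson methods can be sketched as a linear sum of a cue, key, title and location weight per sentence. Everything concrete below is an assumption made for illustration: the small `BONUS`, `STIGMA` and `NULL` word sets, the rule that a key word must occur more than once, the location rule (first or last sentence only), and the equal default weights.

```python
import re
from collections import Counter

# Hypothetical cue dictionary; a real one is built from a training corpus.
BONUS = {"significant", "important"}
STIGMA = {"hardly", "perhaps"}
NULL = {"the", "a", "of", "and", "is", "in", "to"}


def tokens(text):
    return re.findall(r"[a-z]+", text.lower())


def score_sentences(title, sentences, weights=(1.0, 1.0, 1.0, 1.0)):
    """Return one score per sentence: a*cue + b*key + c*title + d*location."""
    a, b, c, d = weights
    title_words = set(tokens(title)) - NULL
    cue_words = NULL | BONUS | STIGMA
    # Key method: frequency of non-cue words over the whole text.
    freq = Counter(w for s in sentences for w in tokens(s) if w not in cue_words)
    last = len(sentences) - 1
    scores = []
    for i, sentence in enumerate(sentences):
        ws = tokens(sentence)
        cue = sum(w in BONUS for w in ws) - sum(w in STIGMA for w in ws)
        key = sum(freq[w] for w in ws if freq[w] > 1)
        title_weight = sum(w in title_words for w in ws)
        location = 1 if i in (0, last) else 0  # first or last sentence
        scores.append(a * cue + b * key + c * title_weight + d * location)
    return scores
```

With the title "Text summarization" and the sentences "Summarization is important.", "Perhaps cats sleep." and "Text summarization helps readers.", the middle sentence scores lowest because it contains a stigma word and no key or title words.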
  • 5. Extractive – Latent Semantic Analysis (LSA)
    What is the LSA method?
    • A sentence matrix is created by applying TF-IDF (Term Frequency – Inverse Document Frequency) to the document
    • Singular Value Decomposition (SVD) is applied, which decomposes this matrix into 3 different matrices
    • These matrices represent the interrelations of words and topics and their importance
    Pros
    • Captures salient and recurring word combination patterns
    • Suitable for Big Data
    Limitations
    • LSA is very sensitive to the stop list and the lemmatization process, so performance differs a lot across languages
  • 6. Extractive – TextRank Method
    What is the TextRank method?
    • Graph-based method, similar to Google's famous PageRank
    • Data is preprocessed to remove irrelevant data
    • Words are vectorised and a 'cosine similarity score' is calculated
    • A 'similarity matrix' is created, holding the similarity between any two sentences
    Pros
    • Takes into consideration both the number of links and their weightage
    Limitations
    • Can only be used for single-document summarization
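The TextRank steps listed on the slide (vectorise, cosine similarity, similarity matrix, PageRank-style ranking) can be sketched in plain Python. This is an illustrative sketch under stated assumptions: sentences are vectorised as simple bag-of-words counts, and the damping factor of 0.85 and the fixed 50 iterations are conventional PageRank choices rather than values from the slides.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def textrank(sentences, damping=0.85, iterations=50):
    """Return one rank per sentence from a weighted PageRank iteration."""
    vectors = [bag_of_words(s) for s in sentences]
    n = len(vectors)
    # Similarity matrix: similarity between any two distinct sentences.
    sim = [[cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    out_weight = [sum(row) for row in sim]
    rank = [1.0 / n] * n
    for _ in range(iterations):
        rank = [
            (1 - damping) / n
            + damping * sum(sim[j][i] / out_weight[j] * rank[j]
                            for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]
    return rank
```

The summary is formed from the highest-ranked sentences; a sentence that shares no words with any other keeps only the baseline rank `(1 - damping) / n`.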
  • 7. Extractive – LexRank Method
    What is the LexRank method?
    • Based on Google's PageRank method
    • Connectivity is based on cosine similarity
    • The concept of eigenvector centrality in a graph is used to set sentence importance
    Pros
    • Can be used for multi-document summarization
    • Takes into consideration both the number of links and their weightage
    Limitations
    • Documents might have conflicting ideas, which might lead to an incorrect summary
  • 8. Extractive – SumBasic, KLSum
    What is the SumBasic method?
    • SumBasic is a way of computing sentence scores from a multi-document dataset
    • Computes the probability distribution over the words in the input
    • Assigns weights to sentences as the average probability of the words in them
    • Picks the best-scoring sentence
    • Recalculates the probabilities if the summary length has not been reached
    What is the KLSum method?
    • Improves on SumBasic by minimizing the Kullback-Leibler (KL) divergence between the probability distributions of the summary and the document
    Pros
    • Gives the summarizer sensitivity to context, depending on what has already been included
    • Natural way to deal with redundancy
    Limitations
    • Gives information about word frequency, but does not capture all the topics accurately
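The SumBasic loop described above can be sketched as follows. The slide does not say how probabilities are recalculated; this sketch assumes the usual SumBasic update, in which the probability of every word in the chosen sentence is squared so that already-covered words count for less. The simplified selection rule (take the sentence with the highest average probability) follows the slide, and `max_sentences` is a hypothetical stand-in for the summary length limit.

```python
import re
from collections import Counter


def words(sentence):
    return re.findall(r"[a-z]+", sentence.lower())


def sumbasic(sentences, max_sentences=2):
    """Greedily pick sentences by average word probability."""
    tokenized = [words(s) for s in sentences]
    counts = Counter(w for ws in tokenized for w in ws)
    total = sum(counts.values())
    prob = {w: c / total for w, c in counts.items()}

    chosen = []
    remaining = [i for i, ws in enumerate(tokenized) if ws]
    while remaining and len(chosen) < max_sentences:
        # Sentence weight = average probability of its words.
        best = max(remaining,
                   key=lambda i: sum(prob[w] for w in tokenized[i]) / len(tokenized[i]))
        chosen.append(best)
        remaining.remove(best)
        # Assumed update rule: square the probability of each used word.
        for w in set(tokenized[best]):
            prob[w] **= 2
    return [sentences[i] for i in chosen]
```

For the input "apple apple banana", "apple apple cherry", "dog cat dog", the first pick is an "apple" sentence; after the update, the second "apple" sentence scores below "dog cat dog", which is how the method avoids redundancy.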
  • 9. Abstractive – RNN and Seq2Seq
    What is an RNN?
    • An RNN (recurrent neural network) is a multi-layer neural network (NN) that works on the principle of predicting the current event based on the recent and also the long-term past
    • RNN states hold information about sequential events
    • LSTM (long short-term memory) is a variant of RNN that improves the RNN's predictive ability for very long sequences
    • LSTMs have a selective memory: each LSTM cell can 'forget' and 'update' parts of the state, which improves the RNN state at every cell
  • 10. Abstractive – Sequence to Sequence Method
    What is the Sequence to Sequence method?
    • Text is fed to encoder units and the intermediate form (hidden state) is fed to the decoder
    • Stacked RNN/LSTM units are used for encoding and decoding
    Pros
    • Works in an abstractive sense and performs similarly to a human
    • Based on machine learning optimization, so accuracy can be improved
    • The same network can be trained for any other language or for translation
    Limitations
    • Training time increases as the size of the input data increases (number of encoders)
    • The summary is generated by lookup from the vocabulary (limited abstraction)
    • An <UNK> tag is generated for names, places, etc. that are not in the vocabulary
    • No coverage of what has already been decoded
  • 11. Abstractive – Seq2Seq with Attention Method
    What is the Seq2Seq with Attention method?
    • The attention distribution is used to prepare a weighted sum of the encoder hidden states, known as the context vector
    • The context vector can be regarded as "what has been read from the source text" at this step of the decoder
    • The context vector and the decoder hidden state are used to generate the vocabulary distribution
    Pros
    • The decoder is able to freely generate words in any order
    Limitations
    • Can represent factual information incorrectly
    • The summary sometimes repeats itself
    • An <UNK> tag is generated for names, places, etc. that are not in the vocabulary
    • No coverage of what has been decoded
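The context vector described above is a weighted sum of encoder hidden states, with weights given by the attention distribution. The sketch below shows only that arithmetic on plain Python lists. The dot-product scoring function is an assumption made to keep the example short; the slides do not state which scoring function is used, and trained models typically learn one.

```python
import math


def softmax(xs):
    """Turn raw scores into a probability distribution."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def context_vector(encoder_states, decoder_state):
    """Return (attention distribution, context vector) for one decoder step."""
    # Assumed scoring: dot product of each encoder state with the decoder state.
    scores = [sum(h * d for h, d in zip(state, decoder_state))
              for state in encoder_states]
    attention = softmax(scores)
    dim = len(decoder_state)
    context = [sum(a * state[k] for a, state in zip(attention, encoder_states))
               for k in range(dim)]
    return attention, context
```

When the decoder state is all zeros, every encoder state gets the same score, so the attention is uniform and the context vector is the plain average of the encoder states.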
  • 12. Abstractive – Pointer-Generator Network Method
    What is the Pointer-Generator Network method?
    • Hybrid network that can copy words from the source via pointing, while retaining the ability to generate words from the fixed vocabulary
    • The generation probability determines whether to copy a word from the source or to generate it from the vocabulary
    • The generation probability is used to weigh the attention distribution and the vocabulary distribution to produce the final distribution
    • Coverage is used to avoid repetition; it tracks what has been decoded so far
    Pros
    • No repetition of words
    • Out-of-vocabulary words from the source text can be generated
    • Rare words (names, factual information) are represented correctly
    Limitations
    • Higher-level abstraction is missing; wording is usually close to the original text
    • Sentences are sometimes composed incorrectly
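The way the generation probability weighs the two distributions can be made concrete. In the pointer-generator formulation, the final probability of a word is `p_gen` times its vocabulary probability plus `(1 - p_gen)` times the total attention on every source position holding that word. The sketch below uses plain dictionaries and lists; the function and argument names are illustrative, not taken from any library.

```python
def final_distribution(p_gen, vocab_dist, attention, source_words):
    """Mix the vocabulary and attention (copy) distributions.

    p_gen        -- generation probability in [0, 1]
    vocab_dist   -- dict mapping vocabulary word to probability
    attention    -- attention weight per source position
    source_words -- source word at each position (may be out of vocabulary)
    """
    final = {w: p_gen * p for w, p in vocab_dist.items()}
    for weight, word in zip(attention, source_words):
        # Copy mass is added even for words missing from the vocabulary.
        final[word] = final.get(word, 0.0) + (1 - p_gen) * weight
    return final
```

For example, with `p_gen = 0.5`, a vocabulary of "the" and "cat" at 0.5 each, and equal attention on the source words "cat" and "Zurich", the out-of-vocabulary word "Zurich" receives probability 0.25 instead of becoming an <UNK> tag.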
  • 13. Abstractive – Reinforce-Selected Sentence Rewriting (fast_abs_rl)
    What is the fast_abs_rl network method?
    • Fast summarization method that first selects salient sentences and then rewrites them as an abstractive summary
    • A pipeline of extractive and abstractive summarizers is used, optimized using reinforcement learning
    • The extractive stage uses a convolutional encoder
    • The abstractive stage uses a Seq2Seq pipeline with reranking
    • Based on the decoder output, the extractor parameters are optimized using POMDP
    Pros
    • 4x improvement in training speed and 10-20x improvement in inference speed
    • The whole text is considered for abstractive summarization
    Limitations
    • A high level of abstraction is still missing, as the vocabulary size is limited
  • 14. Benchmark metrics
    Three major metrics were considered: ROUGE (Lin 2004), METEOR and BLEU. All are string-matching metrics.
    Brief:
    • ROUGE-N measures the overlap of N-grams between the system and reference summary
    • ROUGE-L is based on longest common subsequences and takes sentence-level similarity into account
    • ROUGE-S is the skip-gram variant
    • METEOR matches the unigrams between the system and reference summary, with explicit care for sentence ordering

    Summarization method   ROUGE-L score   METEOR score
    TextRank               0.500           -
    LexRank                0.469           -
    SumBasic               0.484           -
    KLSum                  -               -
    LSA                    0.432           -
    Seq2Seq                31.2            -
    Seq2Seq attention      33.8            -
    Pointer generator      36.38           18.72
    Fast_abs_rl            38.54           20.38

    Scores are listed as given in the source; the extractive rows appear to use a 0-1 scale and the abstractive rows a 0-100 scale.
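The ROUGE definitions above can be illustrated with a small recall-only sketch: ROUGE-N counts overlapping N-grams and ROUGE-L uses the length of the longest common subsequence (LCS). This is a simplified illustration assuming whitespace tokenization and a single reference; published ROUGE scores, including those in the table, also involve precision, F-measure and preprocessing that are not reproduced here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(system, reference, n=1):
    """Fraction of reference n-grams that also appear in the system summary."""
    sys_ngrams = ngrams(system.split(), n)
    ref_ngrams = ngrams(reference.split(), n)
    total = sum(ref_ngrams.values())
    if total == 0:
        return 0.0
    overlap = sum((sys_ngrams & ref_ngrams).values())  # clipped counts
    return overlap / total


def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l_recall(system, reference):
    """LCS length divided by the reference length."""
    ref = reference.split()
    return lcs_length(system.split(), ref) / len(ref) if ref else 0.0
```

For the system summary "the cat sat" against the reference "the cat sat down", ROUGE-1 recall and ROUGE-L recall are both 3/4, and ROUGE-2 recall is 2/3.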
  • 15. References
    Method: Source
    • Luhn: https://ieeexplore.ieee.org/document/5392672/
    • Edmundson: http://courses.ischool.berkeley.edu/i256/f06/papers/edmonson69.pdf
    • LexRank: https://www.cs.cmu.edu/afs/cs/project/jair/pub/volume22/erkan04a-html/erkan04a.html
    • TextRank: https://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf
    • LSA: www.kiv.zcu.cz/~jstein/publikace/isim2004.pdf
    • SumBasic: http://www.cis.upenn.edu/~nenkova/papers/ipm.pdf
    • KLSum: http://www.aclweb.org/anthology/N09-1041
    • Reduction: https://github.com/adamfabish/Reduction/blob/master/Source/reduction.py
    • Seq2Seq: https://arxiv.org/pdf/1409.3215.pdf
    • Seq2Seq with attention: http://www.aclweb.org/anthology/N16-1012
    • Seq2Seq with attention and pointer generation: Get To The Point: Summarization with Pointer-Generator Networks
    • Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting: https://arxiv.org/pdf/1805.11080.pdf
  • 16. Use Case: Planning and Procurements Influenced by Trends
    Summarizing and tagging the current trends for any business can provide valuable insights and make the planning and procurement process better.
    Flow:
    • Fashion articles: news and magazine articles showing the latest fashion trends and top-selling designs
    • Text summarizer: the articles are passed through a summarizer that is trained to pick up specific design elements
    • Tagging: the various design elements are tagged according to clothing category
    • Design: the inputs are provided to the planning team to help them arrive at better decisions
    Examples:
    • Fashion articles can augment the Design Success Probability project (the flow above)
    • Latest auto news can support Novelis R&D and customer negotiations
    • News of shutdowns in factories which supply raw material can help in taking pre-emptive measures and protecting against increased raw material prices
  • 17. Use Case: Competition Benchmarking
    Analysis of news articles can help businesses consolidate all competition-related information and plan reactive actions to competition news.
    Examples:
    • Jio launching a new campaign or extending services to new circles
    • Aditya Birla Capital could leverage the information provided in advertisements and news articles regarding competition products
    Flow:
    • Inputs: news and announcements, ads on products and sales, industry reports
    • Step 1: text summarization
    • Step 2: tagging
    • Output: repository of all competition data
  • 18. Use Case: Creation and Updation of Legal Matrix
    The government publishes amendments to existing laws in gazettes, which contain information about multiple industries.
    Each plant has, or should have, a legal matrix which contains information about mandatory checks, the concerned authorities and the last date to complete them.
    Flow:
    • General laws captured in the Gazette
    • Text summarizer
    • Relevant text extraction is carried out
    • Text tagging is carried out
    • Rules comparison and updation
    A few of the major acts covered in the Gazette:
    1. Customs Act
    2. Central Goods and Services Tax Act
    3. Major Port Trust Act
    4. Mines and Minerals Development Regulation Act
    5. Bureau of Indian Standards Act (necessary for lab equipment)
    6. Special Economic Zones Act
    7. Industrial Boilers and Pressure Vessel Act
    8. Labor law
  • 19. Use Case: Increasing Customer Care Efficiency
    Current customer journey while calling customer care:
    • Time is spent filling in information via the keypad
    • The in-call option is not great on mobile, as callers have to remove the handset from their ear and then input details, which further irritates them
    Proposed flow:
    • Speech is converted to text
    • Text summarizer
    • Auto-filling of information
    Pros:
    1. More interactive
    2. Better consumer experience
    3. Multiple verifications can be done earlier
    Cons:
    1. Added complexity of the speech-to-text converter
    2. Indian dialects and local languages are extremely hard to train for
  • 20. Text Summarizer – Business Applications
    • Summarizing noisy, unstructured, ungrammatical, high-volume data for Pantaloons & MFL
    • Storylines of events: identifying and summarizing the events of Idea that lead to the event of interest
    • Sentence compression from news articles and stock market reports for Aditya Birla Capital
    • Summarizing internal sales reports at various levels for UltraTech Cement
  • 21. Text Summarizer – Function Applications
    • Legal and employee document screening and summarization for the HR department
    • Capturing customer care voice calls of Aditya Birla Capital
    • Summarization of bill, contract and order details using vendor documents for ABMCPL
    • Summarizing news articles covering regulation and economic conditions for the central economic cell