SlideShare a Scribd company logo
1 of 12
Download to read offline
AUTOMATIC DOCUMENT SUMMARIZATION
             FINDWISE
Single document summarization




Proposed use for Findwise:
 • Meta data for indexing service
Unsupervised:
 • No need for trainingset
 • Relative domain independence
 • Relative language independence
Preprocessing




      Mandatory              Additional
      • Sentence splitting   •  Named Entity Recognition
      • Tokenization         •  Keyword extraction
      • Stemming             •  tfidf term weighting
      • PoS-tagging
Sentence extraction

      Sentence ranking
      •  Real value ranking
      •  Relevance ordering

      Sentence selection
      •  Desired summary length

      Sentence ordering
      •  Final presentation
TextRank

     Graph based
     •  Sentences as vertices
     •  Similarity as edges

     Iterative ranking
     •   PageRank
Sentence Similarity

     What makes two sentences similar?




      Explored variations
      •  Shared words
      •  Word importance
      •  Lexical filtering
      •  Length normalization
      •  Advanced analysis
K-means clustering




     Approach:
      • Sentences as points
      • Divide into clusters
      • Select sentences from each cluster
      • Diverse summaries
Domain customization




      Domain: short news articles in English
      • Sentence position important
      • Use domain knowledge to improve performance
      • Other boosting for other domains
Multi document summarization




      Sentence Ranking       Sentence selection
      • TextRank             • Similarity threshold
      • K-Means clustering
Sentence Ordering




     Paragraph selection      Paragraph merging
      • Topical closeness     • Date of publication
      • Sentence Similarity   • Original position
Results single document

                          Algorithm          ROUGE
                                             Ngram(1,1)
                          TextRank           0.4797

                          K-means            0.4680

                          One-class SVM      0.4343

                          TextRank           0.4708
                          Original
                          K-means Original   0.4791

                          Baseline 1         0.4649

                          Baseline 2         0.3998
Results multi document



                         Algorithm    ROUGE
                                      Ngram(1,1)
                         TextRank     0.2537

                         K-means      0.2400

                         MetaRank     0.2561

                         Baseline 1   0.2317

                         Baseline 2   0.2054

More Related Content

Viewers also liked

Text summarization
Text summarizationText summarization
Text summarizationkareemhashem
 
Textrank algorithm
Textrank algorithmTextrank algorithm
Textrank algorithmAndrew Koo
 
Document Summarization
Document SummarizationDocument Summarization
Document SummarizationPratik Kumar
 
Drawing word2vec
Drawing word2vecDrawing word2vec
Drawing word2vecKai Sasaki
 
Tutorial on automatic summarization
Tutorial on automatic summarizationTutorial on automatic summarization
Tutorial on automatic summarizationConstantin Orasan
 
Automatic Text Summarization
Automatic Text SummarizationAutomatic Text Summarization
Automatic Text SummarizationHimanshuPu
 
Word2vec algorithm
Word2vec algorithmWord2vec algorithm
Word2vec algorithmAndrew Koo
 
SF Python Meetup: TextRank in Python
SF Python Meetup: TextRank in PythonSF Python Meetup: TextRank in Python
SF Python Meetup: TextRank in PythonPaco Nathan
 
Extraction Based automatic summarization
Extraction Based automatic summarizationExtraction Based automatic summarization
Extraction Based automatic summarizationAbdelaziz Al-Rihawi
 
SE@M 2010: Automatic Keywords Extraction - a Basis for Content Recommendation
SE@M 2010: Automatic Keywords Extraction - a Basis for Content RecommendationSE@M 2010: Automatic Keywords Extraction - a Basis for Content Recommendation
SE@M 2010: Automatic Keywords Extraction - a Basis for Content RecommendationIvana Bosnic
 
Clusterrank
ClusterrankClusterrank
Clusterranknikgarg
 
Indianapolis - Wikipedia and the Cultural Sector
Indianapolis - Wikipedia and the Cultural SectorIndianapolis - Wikipedia and the Cultural Sector
Indianapolis - Wikipedia and the Cultural Sectorwittylama
 
Semantic Text Processing Powered by Wikipedia
Semantic Text Processing Powered by WikipediaSemantic Text Processing Powered by Wikipedia
Semantic Text Processing Powered by WikipediaMaxim Grinev
 
Effective Approach for Disambiguating Chinese Polyphonic Ambiguity
Effective Approach for Disambiguating Chinese Polyphonic AmbiguityEffective Approach for Disambiguating Chinese Polyphonic Ambiguity
Effective Approach for Disambiguating Chinese Polyphonic AmbiguityIDES Editor
 
Natural Language Generation: New Automation and Personalization Opportunities
Natural Language Generation: New Automation and Personalization OpportunitiesNatural Language Generation: New Automation and Personalization Opportunities
Natural Language Generation: New Automation and Personalization OpportunitiesAutomated Insights
 
Online Character Recognition
Online Character RecognitionOnline Character Recognition
Online Character RecognitionKamakhya Gupta
 

Viewers also liked (20)

Text summarization
Text summarizationText summarization
Text summarization
 
Textrank algorithm
Textrank algorithmTextrank algorithm
Textrank algorithm
 
Document Summarization
Document SummarizationDocument Summarization
Document Summarization
 
Drawing word2vec
Drawing word2vecDrawing word2vec
Drawing word2vec
 
Tutorial on automatic summarization
Tutorial on automatic summarizationTutorial on automatic summarization
Tutorial on automatic summarization
 
Automatic Text Summarization
Automatic Text SummarizationAutomatic Text Summarization
Automatic Text Summarization
 
Word2vec algorithm
Word2vec algorithmWord2vec algorithm
Word2vec algorithm
 
Text Summarization
Text SummarizationText Summarization
Text Summarization
 
SF Python Meetup: TextRank in Python
SF Python Meetup: TextRank in PythonSF Python Meetup: TextRank in Python
SF Python Meetup: TextRank in Python
 
Extraction Based automatic summarization
Extraction Based automatic summarizationExtraction Based automatic summarization
Extraction Based automatic summarization
 
SE@M 2010: Automatic Keywords Extraction - a Basis for Content Recommendation
SE@M 2010: Automatic Keywords Extraction - a Basis for Content RecommendationSE@M 2010: Automatic Keywords Extraction - a Basis for Content Recommendation
SE@M 2010: Automatic Keywords Extraction - a Basis for Content Recommendation
 
Clusterrank
ClusterrankClusterrank
Clusterrank
 
Deposition Summary
Deposition SummaryDeposition Summary
Deposition Summary
 
Text Summarization
Text SummarizationText Summarization
Text Summarization
 
Indianapolis - Wikipedia and the Cultural Sector
Indianapolis - Wikipedia and the Cultural SectorIndianapolis - Wikipedia and the Cultural Sector
Indianapolis - Wikipedia and the Cultural Sector
 
Semantic Text Processing Powered by Wikipedia
Semantic Text Processing Powered by WikipediaSemantic Text Processing Powered by Wikipedia
Semantic Text Processing Powered by Wikipedia
 
Effective Approach for Disambiguating Chinese Polyphonic Ambiguity
Effective Approach for Disambiguating Chinese Polyphonic AmbiguityEffective Approach for Disambiguating Chinese Polyphonic Ambiguity
Effective Approach for Disambiguating Chinese Polyphonic Ambiguity
 
Natural Language Generation: New Automation and Personalization Opportunities
Natural Language Generation: New Automation and Personalization OpportunitiesNatural Language Generation: New Automation and Personalization Opportunities
Natural Language Generation: New Automation and Personalization Opportunities
 
Online Character Recognition
Online Character RecognitionOnline Character Recognition
Online Character Recognition
 
Deviant Sex PPT
Deviant Sex PPTDeviant Sex PPT
Deviant Sex PPT
 

More from Findwise

White Arkitekter - Findability Day Roadshow 2017
White Arkitekter - Findability Day Roadshow 2017White Arkitekter - Findability Day Roadshow 2017
White Arkitekter - Findability Day Roadshow 2017Findwise
 
AI och maskininlärning - Findability Day Roadshow 2017
AI och maskininlärning - Findability Day Roadshow 2017AI och maskininlärning - Findability Day Roadshow 2017
AI och maskininlärning - Findability Day Roadshow 2017Findwise
 
De kognitiva eran med IBM Watson - Findability Day Roadshow 2017
De kognitiva eran med IBM Watson - Findability Day Roadshow 2017De kognitiva eran med IBM Watson - Findability Day Roadshow 2017
De kognitiva eran med IBM Watson - Findability Day Roadshow 2017Findwise
 
Findwise and IBM Watson
Findwise and IBM WatsonFindwise and IBM Watson
Findwise and IBM WatsonFindwise
 
Findability Day 2016 - Enterprise Search and Findability Survey 2016
Findability Day 2016 - Enterprise Search and Findability Survey 2016Findability Day 2016 - Enterprise Search and Findability Survey 2016
Findability Day 2016 - Enterprise Search and Findability Survey 2016Findwise
 
Findability Day 2016 - Enterprise Search and Findability Survey 2016
Findability Day 2016 - Enterprise Search and Findability Survey 2016Findability Day 2016 - Enterprise Search and Findability Survey 2016
Findability Day 2016 - Enterprise Search and Findability Survey 2016Findwise
 
Findability Day 2016 - Big data analytics and machine learning
Findability Day 2016 - Big data analytics and machine learningFindability Day 2016 - Big data analytics and machine learning
Findability Day 2016 - Big data analytics and machine learningFindwise
 
Findability Day 2016 - Enterprise social collaboration
Findability Day 2016 - Enterprise social collaborationFindability Day 2016 - Enterprise social collaboration
Findability Day 2016 - Enterprise social collaborationFindwise
 
Findability Day 2016 - SKF case study
Findability Day 2016 - SKF case studyFindability Day 2016 - SKF case study
Findability Day 2016 - SKF case studyFindwise
 
Findability Day 2016 - Structuring content for user experience
Findability Day 2016 - Structuring content for user experienceFindability Day 2016 - Structuring content for user experience
Findability Day 2016 - Structuring content for user experienceFindwise
 
Findability Day 2016 - Augmented intelligence
Findability Day 2016 - Augmented intelligenceFindability Day 2016 - Augmented intelligence
Findability Day 2016 - Augmented intelligenceFindwise
 
Findability Day 2016 - What is GDPR?
Findability Day 2016 - What is GDPR?Findability Day 2016 - What is GDPR?
Findability Day 2016 - What is GDPR?Findwise
 
Findability Day 2016 - Get started with GDPR
Findability Day 2016 - Get started with GDPRFindability Day 2016 - Get started with GDPR
Findability Day 2016 - Get started with GDPRFindwise
 
Digital workplace och informationshantering i office 365
Digital workplace och informationshantering i office 365Digital workplace och informationshantering i office 365
Digital workplace och informationshantering i office 365Findwise
 
Findability Day 2015 - Mickel Grönroos - Findwise - How to increase safety on...
Findability Day 2015 - Mickel Grönroos - Findwise - How to increase safety on...Findability Day 2015 - Mickel Grönroos - Findwise - How to increase safety on...
Findability Day 2015 - Mickel Grönroos - Findwise - How to increase safety on...Findwise
 
Findability Day 2015 - Abby Covert - Keynote - How to make sense of any mess
Findability Day 2015 - Abby Covert - Keynote - How to make sense of any messFindability Day 2015 - Abby Covert - Keynote - How to make sense of any mess
Findability Day 2015 - Abby Covert - Keynote - How to make sense of any messFindwise
 
Findability Day 2015 - Noel Garry - IBM - Information governance and a 360 de...
Findability Day 2015 - Noel Garry - IBM - Information governance and a 360 de...Findability Day 2015 - Noel Garry - IBM - Information governance and a 360 de...
Findability Day 2015 - Noel Garry - IBM - Information governance and a 360 de...Findwise
 
Findability Day 2015 Mattias Ellison - Findwise - Enterprise Search and fin...
Findability Day 2015   Mattias Ellison - Findwise - Enterprise Search and fin...Findability Day 2015   Mattias Ellison - Findwise - Enterprise Search and fin...
Findability Day 2015 Mattias Ellison - Findwise - Enterprise Search and fin...Findwise
 
Findability Day 2015 - Martin White - The future is search!
Findability Day 2015 - Martin White - The future is search!Findability Day 2015 - Martin White - The future is search!
Findability Day 2015 - Martin White - The future is search!Findwise
 
Findability Day 2015 Liam Holley - Dassault systems - Insight and discovery...
Findability Day 2015   Liam Holley - Dassault systems - Insight and discovery...Findability Day 2015   Liam Holley - Dassault systems - Insight and discovery...
Findability Day 2015 Liam Holley - Dassault systems - Insight and discovery...Findwise
 

More from Findwise (20)

White Arkitekter - Findability Day Roadshow 2017
White Arkitekter - Findability Day Roadshow 2017White Arkitekter - Findability Day Roadshow 2017
White Arkitekter - Findability Day Roadshow 2017
 
AI och maskininlärning - Findability Day Roadshow 2017
AI och maskininlärning - Findability Day Roadshow 2017AI och maskininlärning - Findability Day Roadshow 2017
AI och maskininlärning - Findability Day Roadshow 2017
 
De kognitiva eran med IBM Watson - Findability Day Roadshow 2017
De kognitiva eran med IBM Watson - Findability Day Roadshow 2017De kognitiva eran med IBM Watson - Findability Day Roadshow 2017
De kognitiva eran med IBM Watson - Findability Day Roadshow 2017
 
Findwise and IBM Watson
Findwise and IBM WatsonFindwise and IBM Watson
Findwise and IBM Watson
 
Findability Day 2016 - Enterprise Search and Findability Survey 2016
Findability Day 2016 - Enterprise Search and Findability Survey 2016Findability Day 2016 - Enterprise Search and Findability Survey 2016
Findability Day 2016 - Enterprise Search and Findability Survey 2016
 
Findability Day 2016 - Enterprise Search and Findability Survey 2016
Findability Day 2016 - Enterprise Search and Findability Survey 2016Findability Day 2016 - Enterprise Search and Findability Survey 2016
Findability Day 2016 - Enterprise Search and Findability Survey 2016
 
Findability Day 2016 - Big data analytics and machine learning
Findability Day 2016 - Big data analytics and machine learningFindability Day 2016 - Big data analytics and machine learning
Findability Day 2016 - Big data analytics and machine learning
 
Findability Day 2016 - Enterprise social collaboration
Findability Day 2016 - Enterprise social collaborationFindability Day 2016 - Enterprise social collaboration
Findability Day 2016 - Enterprise social collaboration
 
Findability Day 2016 - SKF case study
Findability Day 2016 - SKF case studyFindability Day 2016 - SKF case study
Findability Day 2016 - SKF case study
 
Findability Day 2016 - Structuring content for user experience
Findability Day 2016 - Structuring content for user experienceFindability Day 2016 - Structuring content for user experience
Findability Day 2016 - Structuring content for user experience
 
Findability Day 2016 - Augmented intelligence
Findability Day 2016 - Augmented intelligenceFindability Day 2016 - Augmented intelligence
Findability Day 2016 - Augmented intelligence
 
Findability Day 2016 - What is GDPR?
Findability Day 2016 - What is GDPR?Findability Day 2016 - What is GDPR?
Findability Day 2016 - What is GDPR?
 
Findability Day 2016 - Get started with GDPR
Findability Day 2016 - Get started with GDPRFindability Day 2016 - Get started with GDPR
Findability Day 2016 - Get started with GDPR
 
Digital workplace och informationshantering i office 365
Digital workplace och informationshantering i office 365Digital workplace och informationshantering i office 365
Digital workplace och informationshantering i office 365
 
Findability Day 2015 - Mickel Grönroos - Findwise - How to increase safety on...
Findability Day 2015 - Mickel Grönroos - Findwise - How to increase safety on...Findability Day 2015 - Mickel Grönroos - Findwise - How to increase safety on...
Findability Day 2015 - Mickel Grönroos - Findwise - How to increase safety on...
 
Findability Day 2015 - Abby Covert - Keynote - How to make sense of any mess
Findability Day 2015 - Abby Covert - Keynote - How to make sense of any messFindability Day 2015 - Abby Covert - Keynote - How to make sense of any mess
Findability Day 2015 - Abby Covert - Keynote - How to make sense of any mess
 
Findability Day 2015 - Noel Garry - IBM - Information governance and a 360 de...
Findability Day 2015 - Noel Garry - IBM - Information governance and a 360 de...Findability Day 2015 - Noel Garry - IBM - Information governance and a 360 de...
Findability Day 2015 - Noel Garry - IBM - Information governance and a 360 de...
 
Findability Day 2015 Mattias Ellison - Findwise - Enterprise Search and fin...
Findability Day 2015   Mattias Ellison - Findwise - Enterprise Search and fin...Findability Day 2015   Mattias Ellison - Findwise - Enterprise Search and fin...
Findability Day 2015 Mattias Ellison - Findwise - Enterprise Search and fin...
 
Findability Day 2015 - Martin White - The future is search!
Findability Day 2015 - Martin White - The future is search!Findability Day 2015 - Martin White - The future is search!
Findability Day 2015 - Martin White - The future is search!
 
Findability Day 2015 Liam Holley - Dassault systems - Insight and discovery...
Findability Day 2015   Liam Holley - Dassault systems - Insight and discovery...Findability Day 2015   Liam Holley - Dassault systems - Insight and discovery...
Findability Day 2015 Liam Holley - Dassault systems - Insight and discovery...
 

Recently uploaded

Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 

Recently uploaded (20)

Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 

Automatic Document Summarization

  • 2. Single document summarization Proposed use for Findwise: • Meta data for indexing service Unsupervised: • No need for trainingset • Relative domain independence • Relative language independence
  • 3. Preprocessing Mandatory Additional • Sentence splitting • Named Entity Recognition • Tokenization • Keyword extraction • Stemming • tfidf term weighting • PoS-tagging
  • 4. Sentence extraction Sentence ranking • Real value ranking • Relevance ordering Sentence selection • Desired summary length Sentence ordering • Final presentation
  • 5. TextRank Graph based • Sentences as vertices • Similarity as edges Iterative ranking • PageRank
  • 6. Sentence Similarity What makes two sentences similar? Explored variations • Shared words • Word importance • Lexical filtering • Length normalization • Advanced analysis
  • 7. K-means clustering Approach: • Sentences as points • Divide into clusters • Select sentences from each cluster • Diverse summaries
  • 8. Domain customization Domain: short news articles in English • Sentence position important • Use domain knowledge to improve performance • Other boosting for other domains
  • 9. Multi document summarization Sentence Ranking Sentence selection • TextRank • Similarity threshold • K-Means clustering
  • 10. Sentence Ordering Paragraph selection Paragraph merging • Topical closeness • Date of publication • Sentence Similarity • Original position
  • 11. Results single document Algorithm ROUGE Ngram(1,1) TextRank 0.4797 K-means 0.4680 One-class SVM 0.4343 TextRank 0.4708 Original K-means Original 0.4791 Baseline 1 0.4649 Baseline 2 0.3998
  • 12. Results multi document Algorithm ROUGE Ngram(1,1) TextRank 0.2537 K-means 0.2400 MetaRank 0.2561 Baseline 1 0.2317 Baseline 2 0.2054