Automatic text summarization is the process of reducing the text content and retaining the
important points of the document. Generally, there are two approaches for automatic text summarization:
Extractive and Abstractive. The process of extractive based text summarization can be divided into two
phases: pre-processing and processing. In this paper, we discuss some of the extractive based text
summarization approaches used by researchers. We also provide the features for extractive based text
summarization process. We also present the available linguistic preprocessing tools with their features,
which are used for automatic text summarization. The tools and parameters useful for evaluating the
generated summary are also discussed in this paper. Moreover, we explain our proposed lexical chain
analysis approach, with sample generated lexical chains, for extractive based automatic text summarization.
We also provide the evaluation results of our system generated summary. The proposed lexical chain
analysis approach can be used to solve different text mining problems like topic classification, sentiment
analysis, and summarization.
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Text summarization
1. Text Summarization Using NLP
Presented by,
Akash N. Karwande
(2016MNS011)
Guided by
Prof. R.K. Chavan
2. Introduction
The goal of summarization is to produce a shorter version of a source text by
preserving the meaning and the key contents of the original document.
A well written summary can significantly reduce the amount of work
needed to digest large amounts of text.
3. Types of Text summarization
There are two types summaries
1. Extractive summaries
2. Abstractive summaries
4. Extractive summaries
Extractive summaries are created by reusing portions (words,
sentences, etc.) of the input text document
The system extracts text from the entire collection, without modifying
the text document.
Most of the summarization research today is on extractive
summarization.
5. Abstractive summaries
Requires deep understanding and reasoning over the text
It Provides own summary over input text without using same word or
sentence in the input text
Determines the actual and short meaning of each element, such as
words,sentences and paragraphs
6. Natural Language Toolkit
leading platform for building Python programs to work with human
language data
NLP is a field of computer science, artificial intelligence (also called
machine learning), and linguistics processing
Interactions between computers and human (natural) languages
It provides suite of text processing libraries for classification, tokenization,
stemming, tagging, parsing, and semantic reasoning
7. Continued…
Following are the NLTK Libraries used in text summarization
Word tokenizer
Sentence tokenizer
stopwords
BeautifulSoup
numpy library
Tagging
Parsing
8. Linguistic Preprocessing for Automatic Summarization
Fig.Pipeline architecture of an information extraction process
9. Sentence Segmentation
Converts raw text into sentences
List of strings
Sentence tokenizer
Input Text: John owns a car. It is a Toyota.
Output: Segm1: John owns a car.
Segm2: It is a Toyota.
10. Tokenization
Identifies the word tokens from given sentence
Provides a list of tokens as output
Word tokenizer
Input: John owns a car.
Output: [[John], [owns], [a], [car], [.]]
11. Part of speech tagging (POS Tagging)
Assigns appropriate part of speech tag to each word
POS is useful in extraction of nouns, adverbs, adjective, which provide
some meaningful information about text
Generates a list of tuples with POS annotation
Input: [[John], [owns], [a], [car], [.]]
Output: (NP (NNP John)) (VP (VBZ owns) (NP (DT a) (NN car))) (. .)
12. Entity detection
Identification of predefined categories such as person, location,
quantities, organizations etc
NER provides the entity detection for linguistic processing
NER system uses linguistic grammar-based techniques and also
statistical model to identify the entity
Input: (NP (NNP John)) (VP (VBZ owns) (NP (DT a) (NN car))) (. .)
Output: John->Person
13. Relation detection
Identifies the possible relation between two or more chunked sentences
Co-reference chain provides a relation between two or more sentences
Provides the link between pronouns and its corresponding nouns
Replacement of the pronouns with proper nouns
Input Text: John owns a car. It is a Toyota. (In form of parse tree)
Output: "a car" -> "a Toyota"; "It" -> "a Toyota"
14. Conclusion
Automatic Text Summarization has been shown to be useful for Natural
Language Processing tasks such as Question Answering or Text
Classification and other related fields of computer science such as
Information Retrieval. And the access time for information searching will
be improved.
15. Future work
From our summarization result we have found that by reducing all
sentences that do not contain any geographic information may lead to a
loss of information, since there may exist links between that reduced
sentences. Therefore, we will analyse this issue in detail, by studying
graph based algorithms that capture the relationship between
sentences.
16. References
https://www.researchgate.net/publication/315667326 Extractive Based Automatic
Text Summarization
https://github.com/shreyans29/ The semicolon Data Analytics youtube tutorials on The
Semicolon
https://gist.github.com/shlomibabluki/5473521 summary_tool.py
https://thetokenizer.com/2013/04/28/build-your-own-summary-tool/
https://glowingpython.blogspot.in/2014/09/text-summarization-with-nltk.html
http://www.nltk.org/ NLTK 3.2.5 documentation