Natural Language Processing
(NLP)
Introduction
Presented by,
Venkatesh Murugadas
Image source : www.google.com
Problem of Natural Language
“Human language is highly ambiguous … It is also ever changing
and evolving. People are great at producing language and
understanding language, and are capable of expressing,
perceiving, and interpreting very elaborate and nuanced
meanings. At the same time, while we humans are great users of
language, we are also very poor at formally understanding and
describing the rules that govern language.”
- Page 1, Neural Network Methods in Natural Language Processing, 2017.
Source : https://machinelearningmastery.com/natural-language-processing/
“It is hard from the standpoint of the child, who must spend many
years acquiring a language … it is hard for the adult language
learner, it is hard for the scientist who attempts to model the
relevant phenomena, and it is hard for the engineer who attempts
to build systems that deal with natural language input or output.
These tasks are so hard that Turing could rightly make fluent
conversation in natural language the centrepiece of his test for
intelligence.”
- Page 248, Mathematical Linguistics, 2010.
Source : https://machinelearningmastery.com/natural-language-processing/
Computational Linguistics
Linguistics is the scientific study of language, including its grammar,
semantics, and phonetics.
Computational linguistics is the modern study of linguistics using the
tools of computer science. Yesterday’s linguist may be today’s
computational linguist as the use of computational tools and thinking
has overtaken most fields of study.
Source : https://machinelearningmastery.com/natural-language-processing/
Statistical NLP
Statistical NLP aims to do statistical inference for the field
of natural language. Statistical inference in general
consists of taking some data (generated in accordance
with some unknown probability distribution) and then
making some inference about this distribution.
— Page 191, Foundations of Statistical Natural Language Processing, 1999.
Source : https://machinelearningmastery.com/natural-language-processing/
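As a small illustration of this view (a minimal sketch, not taken from the cited book): treat a text as a sample from an unknown word distribution and estimate that distribution by relative frequency, which is the maximum-likelihood estimate for a unigram model. The function name `unigram_mle` and the naive lowercase whitespace tokenisation are assumptions made for the example.

```python
from collections import Counter
from fractions import Fraction

def unigram_mle(text):
    """Estimate P(word) by relative frequency (maximum-likelihood estimate).

    Tokenisation is a naive lowercase whitespace split, for illustration only.
    """
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens)
    return {word: Fraction(count, total) for word, count in counts.items()}

probs = unigram_mle("The cat sat on the mat")
print(probs["the"])  # 1/3
```

Words that never occur in the sample receive probability zero under this estimate, which is why smoothing is normally added in practice.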
Natural language processing (NLP) is a collective term
referring to automatic computational processing of
human languages. This includes both algorithms that
take human-produced text as input, and algorithms
that produce natural looking text as outputs.
— Page xvii, Neural Network Methods in Natural Language Processing, 2017.
Source : https://machinelearningmastery.com/natural-language-processing/
Natural language processing is a subfield of computer science, information engineering, and
artificial intelligence concerned with the interactions between computers and human languages, in
particular how to program computers to process and analyze large amounts of natural language data.
We will take Natural Language Processing — or NLP for short
– in a wide sense to cover any kind of computer manipulation
of natural language. At one extreme, it could be as simple as
counting word frequencies to compare different writing
styles. At the other extreme, NLP involves “understanding”
complete human utterances, at least to the extent of being
able to give useful responses to them.
— Page ix, Natural Language Processing with Python, 2009.
Source : https://machinelearningmastery.com/natural-language-processing/
Areas of NLP
• Natural Language Understanding
• Natural Language Search
• Natural Language Generation
• Natural Language Interface
Applications of NLP
1. Text classification and categorisation
2. Named entity recognition
3. Conversational AI
4. Paraphrase detection
5. Language generation and multi-document summarisation
6. Machine Translation
7. Speech recognition
8. Spell Checking
Corpus
• “A corpus is a large body of natural language text used for
accumulating statistics on natural language text. The plural
is corpora. Corpora often include extra information such as
a tag for each word indicating its part-of-speech, and
perhaps the parse tree for each sentence.”
Source : https://www.quora.com/In-NLP-what-is-the-difference-between-a-Lexicon-and-a-Corpus
NLP Pipeline
• Word Tokenisation
• Sentence Segmentation
• Parts of Speech Tagging
• Dependency Parsing
• Named Entity Recognition
• Relation Extraction
Word Tokenisation
Token:
“A token is an instance of a sequence of characters in some particular document that are grouped
together as a useful semantic unit for processing.” E.g. “To sleep perhaps to dream.”
Type:
“A type is the class of all tokens containing the same character sequence.”
Term:
“A term is a (perhaps normalized) type that is included in the dictionary.”
Text Normalisation:
“Token normalization is the process of canonicalizing tokens so that matches occur despite superficial
differences in the character sequences of the tokens.”
Source: nlp.anirbansaha.com
Tokenization is the identification of the basic units to be processed.

A tokenizer must often be customised to the data in question.
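To make the token/type distinction concrete, the sketch below counts tokens and types for the example sentence. It assumes a crude `\w+` token pattern and uses lowercasing as a deliberately simple normalisation step; the helper name `tokens_and_types` is hypothetical.

```python
import re

def tokens_and_types(text, normalise=str.lower):
    """Return the list of tokens and the set of (normalised) types."""
    tokens = re.findall(r"\w+", text)           # crude token boundaries
    types = {normalise(tok) for tok in tokens}  # identical sequences collapse into one type
    return tokens, types

tokens, types = tokens_and_types("To sleep perhaps to dream.")
print(len(tokens), len(types))  # 5 4
```

With lowercasing, “To” and “to” are two tokens of one type; without normalisation (`normalise=lambda s: s`) the same sentence yields five types.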
How is tokenisation done?
• NLTK (Natural Language Toolkit) and spaCy create tokens from running text using rules that rely largely on regular expressions (regex).
• NLTK tokenizers include the Penn Treebank tokenizer (TreebankWordTokenizer), WordPunctTokenizer, TweetTokenizer and MWETokenizer (multi-word expression tokenizer).
• Tokenisation is language-dependent.
• Languages that do not mark word boundaries with whitespace, such as Chinese and Japanese, need a separate step called word segmentation (Korean uses spaces, but between units larger than single words).
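The regex approach can be sketched in a few lines. The pattern `\w+|[^\w\s]+` is the one NLTK documents for its WordPunctTokenizer (runs of word characters, or runs of other non-space characters); the function below is a stand-alone sketch using only Python's `re` module, not NLTK's own code.

```python
import re

# Runs of word characters, or runs of punctuation/symbols; whitespace is dropped.
WORD_PUNCT = re.compile(r"\w+|[^\w\s]+")

def word_punct_tokenize(text):
    return WORD_PUNCT.findall(text)

print(word_punct_tokenize("Good muffins cost $3.88 in New York."))
```

This splits prices and contractions (“Can't” becomes `Can`, `'`, `t`) into pieces, which is one reason a tokenizer often has to be customised.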
• Problems
• Hyphenated words - co-operative, self-esteem
• URLs - “https://www.google.com/”
• Phone numbers - (541) 754-3010
• Compound nouns (names, places) - New York
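One way to handle the first three problem cases is to try more specific patterns before the general word pattern. The tokenizer below is a hypothetical, simplified sketch built on plain `re` (not NLTK or spaCy); its URL and phone-number patterns are deliberately naive and assume US-style numbers like the example above.

```python
import re

# Alternatives are tried left to right, so the specific patterns come first.
TOKEN_PATTERN = re.compile(
    r"""
    https?://\S+              # URLs (naive: up to the next whitespace)
  | \(\d{3}\)\s?\d{3}-\d{4}   # US-style phone numbers such as (541) 754-3010
  | \w+(?:-\w+)*              # words, keeping internal hyphens (self-esteem)
  | [^\w\s]                   # any other single non-space character
    """,
    re.VERBOSE,
)

def tokenize(text):
    return TOKEN_PATTERN.findall(text)

print(tokenize("Call (541) 754-3010 or see https://www.google.com/ about self-esteem."))
```

Multi-word names such as New York still come out as two tokens; joining them needs a separate multi-word expression step, which is what NLTK's MWETokenizer provides.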
Sentence Segmentation
• Sentence segmentation splits running text into sentences by detecting sentence boundaries.
• The task is also called Sentence Boundary Detection.
• NLTK's sentence tokenizer is the PunktSentenceTokenizer class, one of the most widely used sentence tokenizers.
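A minimal rule-based splitter shows why boundary detection is not trivial: a period after an abbreviation such as “Dr.” does not end a sentence. The hand-written abbreviation list below is a hypothetical stand-in for what Punkt learns from unlabelled text; the function is an illustration, not NLTK's algorithm.

```python
import re

# Hypothetical abbreviation list (lowercase, without the final period).
ABBREVIATIONS = {"dr", "mr", "mrs", "e.g", "i.e", "etc"}

def split_sentences(text, abbreviations=ABBREVIATIONS):
    """Split after '.', '!' or '?' when followed by whitespace, unless the
    period closes a known abbreviation."""
    sentences, start = [], 0
    for match in re.finditer(r"[.!?](?=\s)", text):
        words_before = text[start:match.start()].split()
        last_word = words_before[-1].lower() if words_before else ""
        if match.group() == "." and last_word in abbreviations:
            continue  # e.g. "Dr." is not the end of a sentence
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    rest = text[start:].strip()
    if rest:
        sentences.append(rest)  # final sentence, which has no trailing whitespace
    return sentences

print(split_sentences("Dr. Smith arrived. He sat down! Did he stay?"))
```

Without the abbreviation check, “Dr.” would wrongly be treated as a complete sentence.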
Punkt Architecture
Source: Unsupervised Multilingual Sentence Boundary Detection (Tibor Kiss, Jan Strunk)
Type-based classification (initials, ordinal numbers, texts)
1. Strong collocation
2. Internal periods
3. Penalty
Token-based classification
1. Orthographic heuristic (word shape)
2. Collocation heuristic
3. Frequent sentence starter heuristic
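The type-based idea can be illustrated with a toy scorer: a short word type that almost always carries a final period is probably an abbreviation. This sketch loosely follows the factors listed above (collocation with the period, internal periods, length and penalty terms), but it replaces Punkt's log-likelihood ratio with a plain ratio, so it is not the Kiss and Strunk formula; the function name is hypothetical and the scores are only meaningful relative to each other.

```python
import math
from collections import Counter

def abbreviation_scores(tokens):
    """Score word types seen with a final period as abbreviation candidates."""
    with_period, without_period = Counter(), Counter()
    for tok in tokens:
        if tok.endswith(".") and len(tok) > 1:
            with_period[tok[:-1].lower()] += 1
        else:
            without_period[tok.lower()] += 1

    scores = {}
    for typ, n_with in with_period.items():
        n_without = without_period[typ]
        # Collocation: share of occurrences that carry a final period.
        collocation = n_with / (n_with + n_without)
        # Internal periods (as in "u.s") raise the score.
        f_periods = typ.count(".") + 1
        # Length in non-period characters (at least 1 to avoid 0 ** -k).
        length = max(1, len(typ) - typ.count("."))
        f_length = math.exp(-length)       # long words are unlikely abbreviations
        f_penalty = length ** (-n_without)  # penalise occurrences without a period
        scores[typ] = collocation * f_periods * f_length * f_penalty
    return scores

tokens = ["Dr.", "Smith", "met", "Dr.", "Jones", "in", "the", "lab.",
          "The", "lab", "was", "small."]
print(max(abbreviation_scores(tokens), key=abbreviation_scores(tokens).get))  # dr
```

In the real system, types scoring above a threshold are classified as abbreviations, and the token-based stage then revisits individual occurrences that remain ambiguous.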
Problem
Ordinal numbers
Parts of Speech Tagging
• Part-of-speech tagging by itself may not solve any particular NLP problem. It is, however, often done as a prerequisite that simplifies many different problems.
• Traditional grammar distinguishes 8 parts of speech in English.
• There are open classes, which readily admit new words, and closed classes, whose membership is largely fixed (e.g. pronouns, prepositions, conjunctions).
• Open classes - noun, verb, adverb, adjective
• Not all languages draw the same distinctions: Riau Indonesian is often cited as making hardly any part-of-speech distinctions, and in Korean the words corresponding to English adjectives behave as a subclass of verbs.
NOUN
PRONOUN
VERB
ADJECTIVE
ADVERB
PREPOSITION
CONJUNCTION
INTERJECTION
• Tagsets vary in size, from the 8 traditional classes to the 45 tags of the Penn Treebank tagset (the Brown corpus tagset has 87).
• The most widely used tagged corpora:
• Brown corpus - one million words
• Wall Street Journal corpus - one million words
• Switchboard telephone speech corpus - two million words
Source: Speech and Language Processing, Daniel Jurafsky and James H. Martin
Parts of Speech Tagging algorithm
Generative Hidden Markov Model
Source: Speech and Language Processing, Daniel Jurafsky and James H. Martin
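An HMM tagger chooses the tag sequence that maximises the product of transition probabilities P(tag | previous tag) and emission probabilities P(word | tag); the standard decoder for this is the Viterbi algorithm. The sketch below uses a three-tag tagset and made-up toy probabilities, not values estimated from any corpus.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence for `words` under an HMM.

    start_p[t]    = P(t at position 0)
    trans_p[s][t] = P(t | previous tag s)
    emit_p[t][w]  = P(w | t); unseen words get probability 0
    Scores are kept in log space to avoid numerical underflow.
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[i][t]: log-probability of the best path ending in tag t at word i
    best = [{t: log(start_p[t]) + log(emit_p[t].get(words[0], 0)) for t in tags}]
    backpointer = [{}]
    for i in range(1, len(words)):
        best.append({})
        backpointer.append({})
        for t in tags:
            prev = max(tags, key=lambda s: best[i - 1][s] + log(trans_p[s][t]))
            best[i][t] = (best[i - 1][prev] + log(trans_p[prev][t])
                          + log(emit_p[t].get(words[i], 0)))
            backpointer[i][t] = prev

    # Follow the backpointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(backpointer[i][path[-1]])
    return path[::-1]

# Toy, hand-picked probabilities (hypothetical).
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS = {
    "DET":  {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1,  "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5,  "NOUN": 0.3, "VERB": 0.2},
}
EMIT = {
    "DET":  {"the": 0.7, "a": 0.3},
    "NOUN": {"dog": 0.4, "park": 0.5, "barks": 0.1},
    "VERB": {"barks": 0.6, "runs": 0.4},
}
print(viterbi(["the", "dog", "barks"], TAGS, START, TRANS, EMIT))
```

The ambiguous word “barks” is tagged VERB after a noun but NOUN directly after a determiner, because the transition probabilities outweigh the emission preference.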
Parts of Speech Tagging algorithm
Discriminative Maximum Entropy Markov Model
Source: Speech and Language Processing, Daniel Jurafsky and James H. Martin
A discriminative model can incorporate many features of the word and its context, which can improve the classification.

The features are generated from a feature template.
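A feature template is a pattern that is filled in at each position to produce concrete features. As a hedged illustration (the templates are a hypothetical and much smaller set than a real tagger would use), this function turns the current word, its neighbours and the previously assigned tag into a sparse feature dictionary of the kind a maximum-entropy classifier in an MEMM consumes.

```python
def extract_features(words, i, prev_tag):
    """Instantiate a small set of feature templates for the word at position i.

    Template names (w_i, w_i-1, suffix3, ...) are illustrative only.
    """
    word = words[i]
    return {
        "w_i=" + word.lower(): 1,
        "w_i-1=" + (words[i - 1].lower() if i > 0 else "<s>"): 1,
        "w_i+1=" + (words[i + 1].lower() if i + 1 < len(words) else "</s>"): 1,
        "t_i-1=" + prev_tag: 1,             # previous tag: the Markov part of an MEMM
        "suffix3=" + word[-3:].lower(): 1,  # e.g. "ing", "ion"
        "is_capitalized": int(word[:1].isupper()),
        "has_digit": int(any(ch.isdigit() for ch in word)),
    }

print(extract_features(["Janet", "will", "back", "the", "bill"], 2, "MD"))
```

Because the features may overlap freely (the word, its suffix and its neighbours all fire at once), this is exactly the flexibility a discriminative model offers over a generative HMM.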
• Modern POS tagging algorithms often use bidirectional methods, conditioning on both the preceding and the following context.
• Stanford CoreNLP uses a log-linear part-of-speech tagger.
• Based on the paper: https://nlp.stanford.edu/~manning/papers/tagging.pdf
Dependency Parser
• Dependency syntax postulates that syntactic structure consists
of lexical items linked by binary asymmetric relations (“arrows”)
called dependencies.
• The arrows are commonly typed with the name of a grammatical relation (e.g. subject, object).
• Typically, the dependencies form a tree (connected, acyclic, single-head).
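Because each word has exactly one head, a dependency parse can be stored as a list of head indices, with 0 standing for an artificial ROOT (the convention used in CoNLL-style files). The tree conditions then become easy to check. The function below is a minimal, hypothetical sketch of that check.

```python
def is_dependency_tree(heads):
    """Check that a head list encodes a well-formed dependency tree.

    heads[k] is the 1-based position of the head of word k+1, and 0 means
    the word attaches to the artificial ROOT. Storing one head per word
    already enforces the single-head condition.
    """
    n = len(heads)
    if any(h < 0 or h > n for h in heads):
        return False
    if heads.count(0) != 1:
        return False  # exactly one word must attach to ROOT
    for word in range(1, n + 1):
        seen, node = set(), word
        while node != 0:          # climb head links towards ROOT
            if node in seen:
                return False      # cycle: ROOT is never reached
            seen.add(node)
            node = heads[node - 1]
    return True                   # every word reaches ROOT without cycles

# "She ate fish": She -> ate (nsubj), ate -> ROOT, fish -> ate (obj)
print(is_dependency_tree([2, 0, 2]))
```

Connectedness and acyclicity are checked together: if every word reaches ROOT by following its head links, there can be no cycle and no disconnected fragment.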
• The dependency representation used by Stanford CoreNLP is based on the paper: https://nlp.stanford.edu/~sebschu/pubs/schuster-manning-lrec2016.pdf
Information Extraction Architecture
Source: https://www.nltk.org/book/ch07.html
Named Entity Recognition
Named-entity recognition (NER) (also known as entity identification, entity
chunking and entity extraction) is a subtask of information extraction that
seeks to locate and classify named entity mentions in unstructured text into
pre-defined categories such as the person names, organisations, locations,
medical codes, time expressions, quantities, monetary values, percentages, etc.
Source: https://en.wikipedia.org/wiki/Named-entity_recognition
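The simplest way to locate and classify entity mentions is a gazetteer (dictionary) lookup. The sketch below uses a tiny hypothetical gazetteer and greedy longest-match search; it is a baseline illustration only, not how statistical NER systems such as spaCy's work.

```python
# Hypothetical gazetteer: entity token sequences mapped to categories.
GAZETTEER = {
    ("New", "York"): "LOCATION",
    ("Stanford", "University"): "ORGANISATION",
    ("Alan", "Turing"): "PERSON",
}

def tag_entities(tokens, gazetteer=GAZETTEER):
    """Return (mention, category, start, end) tuples using longest match first."""
    max_len = max(len(key) for key in gazetteer)
    entities, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + n])
            if span in gazetteer:
                entities.append((" ".join(span), gazetteer[span], i, i + n))
                i += n
                break
        else:
            i += 1  # no entity starts here; move to the next token
    return entities

sentence = "Reporters from New York asked Stanford University about Alan Turing"
print(tag_entities(sentence.split()))
```

A lookup like this cannot recognise names it has never seen or resolve ambiguous ones (e.g. a person named like a place), which is why NER is usually learned from annotated corpora.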
Noun Phrase Chunking
• This is a basic technique used for entity detection.
• Each of the larger units (drawn as boxes in the NLTK book figure) is called a chunk.
• Chunking is done with regular expressions over part-of-speech tags (tag patterns).
Source: https://www.nltk.org/book/ch07.html
Shallow parsing (also chunking, "light parsing") is an analysis of a sentence which
first identifies constituent parts of sentences (nouns, verbs, adjectives, etc.) and
then links them to higher order units that have discrete grammatical meanings
(noun groups or phrases, verb groups, etc.).
Image Source: https://www.nltk.org/book/ch07.html
Source: https://en.wikipedia.org/wiki/Shallow_parsing
• Noun chunking using regular expressions
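A regex chunker can be sketched with the standard library alone by writing the tag sequence as a string such as `<DT><JJ><NN>` and matching a tag pattern over it. The pattern below mirrors the NLTK book's `<DT>?<JJ>*<NN>` grammar, extended (as an assumption) to `NN.*` so plural and proper nouns also count; the function is a sketch, not NLTK's RegexpParser.

```python
import re

# Optional determiner, any adjectives, then one or more noun tags (NN, NNS, NNP, ...).
NP_PATTERN = re.compile(r"(?:<DT>)?(?:<JJ>)*(?:<NN[^>]*>)+")

def np_chunk(tagged):
    """Group (word, tag) pairs into noun-phrase chunks."""
    tag_string = "".join(f"<{tag}>" for _, tag in tagged)
    chunks = []
    for match in NP_PATTERN.finditer(tag_string):
        # Each tag contributes exactly one "<", so counting them maps
        # character offsets back to token positions.
        start = tag_string[:match.start()].count("<")
        end = start + match.group().count("<")
        chunks.append([word for word, _ in tagged[start:end]])
    return chunks

sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
            ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]
print(np_chunk(sentence))
```

Matching over tags rather than words is what makes the pattern general: any determiner-adjective-noun sequence is chunked, whatever the actual words are.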
• spaCy-based Named Entity Recognition
• spaCy's English models are trained on the OntoNotes 5 dataset.
• OntoNotes 5 defines 18 entity categories (e.g. PERSON, ORG, GPE, DATE).
Applications of NER
1. Natural Language Understanding (NLU)
2. Natural Language Search (NLS)
Discussion!
For further queries, contact me at
edu.venkateshdas@gmail.com

Weitere ähnliche Inhalte

Was ist angesagt?

Natural Language Processing
Natural Language Processing Natural Language Processing
Natural Language Processing Adarsh Saxena
 
Natural Language Processing seminar review
Natural Language Processing seminar review Natural Language Processing seminar review
Natural Language Processing seminar review Jayneel Vora
 
Natural language processing
Natural language processingNatural language processing
Natural language processingYogendra Tamang
 
Natural language processing
Natural language processingNatural language processing
Natural language processingAbash shah
 
Natural Language Processing (NLP)
Natural Language Processing (NLP)Natural Language Processing (NLP)
Natural Language Processing (NLP)Yuriy Guts
 
Natural lanaguage processing
Natural lanaguage processingNatural lanaguage processing
Natural lanaguage processinggulshan kumar
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingRishikese MR
 
Natural language processing
Natural language processing Natural language processing
Natural language processing Md.Sumon Sarder
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingToine Bogers
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingMariana Soffer
 
Natural language processing (NLP)
Natural language processing (NLP) Natural language processing (NLP)
Natural language processing (NLP) ASWINKP11
 
Introduction to natural language processing (NLP)
Introduction to natural language processing (NLP)Introduction to natural language processing (NLP)
Introduction to natural language processing (NLP)Alia Hamwi
 
Natural language processing
Natural language processingNatural language processing
Natural language processingprashantdahake
 
Natural language processing
Natural language processingNatural language processing
Natural language processingKarenVacca
 
Natural Language processing
Natural Language processingNatural Language processing
Natural Language processingSanzid Kawsar
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language ProcessingPranav Gupta
 

Was ist angesagt? (20)

Natural Language Processing
Natural Language Processing Natural Language Processing
Natural Language Processing
 
Natural Language Processing seminar review
Natural Language Processing seminar review Natural Language Processing seminar review
Natural Language Processing seminar review
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
NLP
NLPNLP
NLP
 
Natural Language Processing (NLP)
Natural Language Processing (NLP)Natural Language Processing (NLP)
Natural Language Processing (NLP)
 
Natural lanaguage processing
Natural lanaguage processingNatural lanaguage processing
Natural lanaguage processing
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Nlp
NlpNlp
Nlp
 
Natural language processing
Natural language processing Natural language processing
Natural language processing
 
NLP
NLPNLP
NLP
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Natural language processing (NLP)
Natural language processing (NLP) Natural language processing (NLP)
Natural language processing (NLP)
 
Introduction to natural language processing (NLP)
Introduction to natural language processing (NLP)Introduction to natural language processing (NLP)
Introduction to natural language processing (NLP)
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
NLP
NLPNLP
NLP
 
Natural Language processing
Natural Language processingNatural Language processing
Natural Language processing
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processing
 

Ähnlich wie Introduction to Natural Language Processing (NLP)

Natural language processing (nlp)
Natural language processing (nlp)Natural language processing (nlp)
Natural language processing (nlp)Kuppusamy P
 
Natural Language Processing (NLP).pptx
Natural Language Processing (NLP).pptxNatural Language Processing (NLP).pptx
Natural Language Processing (NLP).pptxSHIBDASDUTTA
 
Natural Language Processing: State of The Art, Current Trends and Challenges
Natural Language Processing: State of The Art, Current Trends and ChallengesNatural Language Processing: State of The Art, Current Trends and Challenges
Natural Language Processing: State of The Art, Current Trends and Challengesantonellarose
 
Corpus study design
Corpus study designCorpus study design
Corpus study designbikashtaly
 
Introduction to natural language processing, history and origin
Introduction to natural language processing, history and originIntroduction to natural language processing, history and origin
Introduction to natural language processing, history and originShubhankar Mohan
 
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGESA SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGEScsandit
 
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGESA SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGESLinda Garcia
 
Domain Specific Terminology Extraction (ICICT 2006)
Domain Specific Terminology Extraction (ICICT 2006)Domain Specific Terminology Extraction (ICICT 2006)
Domain Specific Terminology Extraction (ICICT 2006)IT Industry
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processingdhruv_chaudhari
 
Natural Language Processing_in semantic web.pptx
Natural Language Processing_in semantic web.pptxNatural Language Processing_in semantic web.pptx
Natural Language Processing_in semantic web.pptxAlyaaMachi
 
Natural Language Processing (NLP)
Natural Language Processing (NLP)Natural Language Processing (NLP)
Natural Language Processing (NLP)Abdullah al Mamun
 
Natural language processing
Natural language processingNatural language processing
Natural language processingSaurav Aryal
 
DataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
DataFest 2017. Introduction to Natural Language Processing by Rudolf EremyanDataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
DataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyanrudolf eremyan
 
A Comprehensive Study On Natural Language Processing And Natural Language Int...
A Comprehensive Study On Natural Language Processing And Natural Language Int...A Comprehensive Study On Natural Language Processing And Natural Language Int...
A Comprehensive Study On Natural Language Processing And Natural Language Int...Scott Bou
 
NLP-ppt.pptx nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
NLP-ppt.pptx nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnNLP-ppt.pptx nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
NLP-ppt.pptx nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnRAtna29
 

Ähnlich wie Introduction to Natural Language Processing (NLP) (20)

REPORT.doc
REPORT.docREPORT.doc
REPORT.doc
 
Natural language processing (nlp)
Natural language processing (nlp)Natural language processing (nlp)
Natural language processing (nlp)
 
Natural Language Processing (NLP).pptx
Natural Language Processing (NLP).pptxNatural Language Processing (NLP).pptx
Natural Language Processing (NLP).pptx
 
Natural Language Processing: State of The Art, Current Trends and Challenges
Natural Language Processing: State of The Art, Current Trends and ChallengesNatural Language Processing: State of The Art, Current Trends and Challenges
Natural Language Processing: State of The Art, Current Trends and Challenges
 
Corpus study design
Corpus study designCorpus study design
Corpus study design
 
NLP todo
NLP todoNLP todo
NLP todo
 
Introduction to natural language processing, history and origin
Introduction to natural language processing, history and originIntroduction to natural language processing, history and origin
Introduction to natural language processing, history and origin
 
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGESA SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
 
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGESA SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
 
Domain Specific Terminology Extraction (ICICT 2006)
Domain Specific Terminology Extraction (ICICT 2006)Domain Specific Terminology Extraction (ICICT 2006)
Domain Specific Terminology Extraction (ICICT 2006)
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processing
 
Natural Language Processing_in semantic web.pptx
Natural Language Processing_in semantic web.pptxNatural Language Processing_in semantic web.pptx
Natural Language Processing_in semantic web.pptx
 
NLPinAAC
NLPinAACNLPinAAC
NLPinAAC
 
Natural Language Processing (NLP)
Natural Language Processing (NLP)Natural Language Processing (NLP)
Natural Language Processing (NLP)
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
1 Introduction.ppt
1 Introduction.ppt1 Introduction.ppt
1 Introduction.ppt
 
L1 nlp intro
L1 nlp introL1 nlp intro
L1 nlp intro
 
DataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
DataFest 2017. Introduction to Natural Language Processing by Rudolf EremyanDataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
DataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
 
A Comprehensive Study On Natural Language Processing And Natural Language Int...
A Comprehensive Study On Natural Language Processing And Natural Language Int...A Comprehensive Study On Natural Language Processing And Natural Language Int...
A Comprehensive Study On Natural Language Processing And Natural Language Int...
 
NLP-ppt.pptx nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
NLP-ppt.pptx nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnNLP-ppt.pptx nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
NLP-ppt.pptx nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
 

Kürzlich hochgeladen

Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...only4webmaster01
 
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...gajnagarg
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraGovindSinghDasila
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...karishmasinghjnh
 
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...gajnagarg
 
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men 🔝Sambalpur🔝 Esc...
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men  🔝Sambalpur🔝   Esc...➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men  🔝Sambalpur🔝   Esc...
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men 🔝Sambalpur🔝 Esc...amitlee9823
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...gajnagarg
 

Kürzlich hochgeladen (20)

Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Anomaly detection and data imputation within time series

Introduction to Natural Language Processing (NLP)

  • 1. Natural Language Processing (NLP): Introduction. Presented by Venkatesh Murugadas
  • 2. Image source: www.google.com
  • 3. The Problem of Natural Language: “Human language is highly ambiguous … It is also ever changing and evolving. People are great at producing language and understanding language, and are capable of expressing, perceiving, and interpreting very elaborate and nuanced meanings. At the same time, while we humans are great users of language, we are also very poor at formally understanding and describing the rules that govern language.” - Page 1, Neural Network Methods in Natural Language Processing, 2017. Source: https://machinelearningmastery.com/natural-language-processing/
  • 4. “It is hard from the standpoint of the child, who must spend many years acquiring a language … it is hard for the adult language learner, it is hard for the scientist who attempts to model the relevant phenomena, and it is hard for the engineer who attempts to build systems that deal with natural language input or output. These tasks are so hard that Turing could rightly make fluent conversation in natural language the centrepiece of his test for intelligence.” - Page 248, Mathematical Linguistics, 2010. Source: https://machinelearningmastery.com/natural-language-processing/
  • 5. Computational Linguistics: Linguistics is the scientific study of language, including its grammar, semantics, and phonetics. Computational linguistics is the modern study of linguistics using the tools of computer science. Yesterday’s linguist may be today’s computational linguist, as the use of computational tools and thinking has overtaken most fields of study. Source: https://machinelearningmastery.com/natural-language-processing/
  • 6. Statistical NLP: “Statistical NLP aims to do statistical inference for the field of natural language. Statistical inference in general consists of taking some data (generated in accordance with some unknown probability distribution) and then making some inference about this distribution.” - Page 191, Foundations of Statistical Natural Language Processing, 1999. Source: https://machinelearningmastery.com/natural-language-processing/
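A minimal illustration of “making some inference about this distribution”: the sketch below assumes the simplest possible model, a unigram model in which every word is drawn independently from one unknown distribution, and computes the maximum-likelihood estimate of each word's probability from a toy sentence. The function name `unigram_mle` and the sample text are illustrative only.

```python
from collections import Counter


def unigram_mle(tokens):
    """Maximum-likelihood estimate of a unigram distribution: P(w) = count(w) / N."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


# Toy "observed data"; a real estimate would use a large corpus.
tokens = "the cat sat on the mat".split()
probs = unigram_mle(tokens)
print(probs["the"])  # 2 occurrences out of 6 tokens
```

Any word absent from the sample gets probability zero under this estimate, which is one reason statistical NLP systems apply smoothing in practice.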
  • 7. “Natural language processing (NLP) is a collective term referring to automatic computational processing of human languages. This includes both algorithms that take human-produced text as input, and algorithms that produce natural looking text as outputs.” - Page xvii, Neural Network Methods in Natural Language Processing, 2017. Source: https://machinelearningmastery.com/natural-language-processing/ Natural language processing is a subfield of computer science, information engineering, and artificial intelligence concerned with the interactions between computers and human languages, in particular how to program computers to process and analyze large amounts of natural language data.
  • 8. “We will take Natural Language Processing - or NLP for short - in a wide sense to cover any kind of computer manipulation of natural language. At one extreme, it could be as simple as counting word frequencies to compare different writing styles. At the other extreme, NLP involves ‘understanding’ complete human utterances, at least to the extent of being able to give useful responses to them.” - Page ix, Natural Language Processing with Python, 2009. Source: https://machinelearningmastery.com/natural-language-processing/
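The “simple extreme” mentioned in the quote, counting word frequencies to compare writing styles, fits in a few lines. This is a sketch under stated assumptions: the two text snippets are invented stand-ins for two styles, and words are approximated by runs of lowercase letters and apostrophes.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Lowercase the text, extract word-like runs and count each word."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


def relative_frequency(counts, word):
    """Share of all word tokens that are `word` (0.0 for an empty text)."""
    total = sum(counts.values())
    return counts[word] / total if total else 0.0


# Invented snippets standing in for two writing styles.
style_a = "The rain fell. The wind blew. The night was long."
style_b = "Rain fell and wind blew and night went on and on."

freq_a = word_frequencies(style_a)
freq_b = word_frequencies(style_b)
for word in ("the", "and"):
    print(word, relative_frequency(freq_a, word), relative_frequency(freq_b, word))
```

Style A leans on “the” while style B leans on “and”; stylometry compares such function-word profiles over much longer texts.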
  • 9. Areas of NLP • Natural Language Understanding • Natural Language Search • Natural Language Generation • Natural Language Interface
  • 10. Applications of NLP 1. Text classification and Categorisation 2. Named Entity Recognition 3. Conversational AI 4. Paraphrase detection 5. Language generation and Multi-document Summarisation 6. Machine Translation 7. Speech recognition 8. Spell Checking
  • 11. Corpus • “A corpus is a large body of natural language text used for accumulating statistics on natural language text. The plural is corpora. Corpora often include extra information such as a tag for each word indicating its part-of-speech, and perhaps the parse tree for each sentence.” Source: https://www.quora.com/In-NLP-what-is-the-difference-between-a-Lexicon-and-a-Corpus
  • 12. NLP Pipeline • Word Tokenisation • Sentence Segmentation • Parts of Speech Tagging • Dependency Parsing • Named Entity Recognition • Relation Extraction
  • 13. Word Tokenisation • Token: “A token is an instance of a sequence of characters in some particular document that are grouped together as a useful semantic unit for processing.” E.g. “To sleep perhaps to dream.” contains five tokens. • Type: “A type is the class of all tokens containing the same character sequence.” • Term: “A term is a (perhaps normalized) type that is included in the dictionary.” • Text Normalisation: “Token normalization is the process of canonicalizing tokens so that matches occur despite superficial differences in the character sequences of the tokens.” For example, after lowercasing, “To” and “to” fall into the same type, so the example sentence has four types. Source: nlp.anirbansaha.com • Tokenisation identifies the basic units to be processed, and a tokenizer must often be customised to the data in question.
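The token/type distinction and the effect of normalisation can be checked directly on the slide's example sentence. This sketch assumes that tokens are simply runs of word characters and that normalisation means lowercasing only; real normalisers may also fold accents or hyphens, and lowercasing can conflate distinct words (e.g. “US” and “us”).

```python
import re

sentence = "To sleep perhaps to dream."

# Tokens: every occurrence of a word-like character sequence.
tokens = re.findall(r"\w+", sentence)
# Types: the distinct character sequences among those tokens.
types = set(tokens)
# A very simple normalisation (lowercasing) lets "To" and "to" match.
normalised_types = {token.lower() for token in tokens}

print(len(tokens), len(types), len(normalised_types))  # 5 5 4
```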
  • 14. How is tokenisation done? • NLTK (Natural Language Toolkit) and spaCy tokenize running text with rule-based tokenizers that rely heavily on regular expressions (regex). • NLTK tokenizers include the Penn Treebank tokenizer (TreebankWordTokenizer), WordPunctTokenizer, TweetTokenizer and MWETokenizer (Multi-Word Expression Tokenizer). • Tokenisation rules are language dependent. • Languages written without spaces between words, such as Chinese and Japanese, need a separate step called word segmentation. Korean does use spaces, but its space-separated units often combine a word with particles, so it also benefits from segmentation.
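To make “regex-based tokenisation” concrete, here is a dependency-free sketch using the pattern `\w+|[^\w\s]+` (alphanumeric runs, or runs of punctuation), which to our knowledge is the pattern NLTK's `WordPunctTokenizer` is built on. The function name `word_punct_tokenize` is our own, not an NLTK API.

```python
import re

# Alphanumeric runs, or runs of characters that are neither alphanumeric nor whitespace.
WORD_PUNCT = re.compile(r"\w+|[^\w\s]+")


def word_punct_tokenize(text):
    """Return all non-overlapping matches of the pattern, left to right."""
    return WORD_PUNCT.findall(text)


print(word_punct_tokenize("Good muffins cost $3.88 in New York."))
```

Note how the price “$3.88” is split into four tokens and “New York” into two: exactly the kind of output that shows why a tokenizer must often be customised to the data.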
  • 15. • Problems in tokenisation • Hyphenated words: co-operative, self-esteem • URLs: https://www.google.com/ • Phone numbers: (541) 754-3010 • Compound nouns (names, places): New York
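One way to handle the problem cases listed above is an ordered regex alternation in which the more specific patterns (URLs, phone numbers, hyphenated words) are tried before plain words, followed by a merge step for known multi-word expressions, similar in spirit to what NLTK's `MWETokenizer` does. This is a sketch, not any library's tokenizer: the phone pattern assumes US-style numbers, and the `MULTI_WORD` list is hypothetical.

```python
import re

# Alternatives are tried left to right, so the more specific patterns come first.
TOKEN_PATTERN = re.compile(r"""
      https?://\S+?(?=[.,;:!?]?(?:\s|$))   # URLs, leaving off sentence-final punctuation
    | \(\d{3}\)\s?\d{3}-\d{4}              # US-style phone numbers such as (541) 754-3010
    | \w+(?:-\w+)+                         # hyphenated words such as co-operative
    | \w+                                  # ordinary words and numbers
    | [^\w\s]                              # any other single punctuation mark
""", re.VERBOSE)

# Hypothetical two-word expressions to merge into a single token.
MULTI_WORD = {("New", "York")}


def tokenize(text):
    tokens = TOKEN_PATTERN.findall(text)
    merged, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in MULTI_WORD:
            merged.append("_".join(pair))  # e.g. "New_York"
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


print(tokenize("The co-operative in New York lists (541) 754-3010 and https://www.google.com/."))
```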
  • 16. Sentence Segmentation • Sentence segmentation splits running text into sentences by detecting sentence boundaries. • The task is also called Sentence Boundary Detection (SBD). • NLTK's default sentence tokenizer uses the PunktSentenceTokenizer class, a widely used unsupervised sentence splitter.
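Sentence boundary detection is harder than splitting at every period, because periods also end abbreviations. The sketch below is a deliberately simple rule-based splitter, not Punkt: it assumes a hand-written abbreviation list (`ABBREVIATIONS`, hypothetical), whereas Punkt learns abbreviations from unannotated text.

```python
import re

# Hypothetical abbreviation list; Punkt learns such types from data instead.
ABBREVIATIONS = {"dr", "mr", "mrs", "e.g", "i.e", "etc"}


def split_sentences(text):
    """Split after '.', '!' or '?' followed by whitespace, unless a period ends a known abbreviation."""
    sentences, start = [], 0
    for match in re.finditer(r"[.!?]\s+", text):
        end = match.start()  # index of the punctuation mark
        words = text[start:end].split()
        last_word = words[-1].lower() if words else ""
        if text[end] == "." and last_word in ABBREVIATIONS:
            continue  # the period belongs to an abbreviation, not a sentence boundary
        sentences.append(text[start:end + 1].strip())
        start = match.end()
    if text[start:].strip():
        sentences.append(text[start:].strip())  # final sentence without trailing whitespace
    return sentences


for s in split_sentences("Dr. Smith arrived. He said hello, e.g. to Mr. Jones! Then he left."):
    print(s)
```

A fixed list fails on unseen abbreviations and on abbreviations that genuinely end a sentence, which is what motivates the data-driven heuristics in Punkt.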
  • 17. Punkt Architecture. Source: Unsupervised Multilingual Sentence Boundary Detection (Tibor Kiss, Jan Strunk) • Type-based classification (initials, ordinal numbers, texts): 1. Strong collocation 2. Internal periods 3. Penalty • Token-based classification: 1. Orthographic heuristic (word shape) 2. Collocation heuristic 3. Frequent sentence starter heuristic
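The type-based stage decides, for each word type, whether it behaves like an abbreviation. The toy scorer below combines factors loosely modelled on the ones named on the slide: how strongly a type co-occurs with a final period, a bonus for internal periods, a length factor, and a penalty for occurrences without a period. It is an illustration under those assumptions only; it does not use the log-likelihood statistic from the Kiss and Strunk paper and is not NLTK's implementation.

```python
import math
from collections import Counter


def abbreviation_scores(text):
    """Score each word type seen with a final period; higher means more abbreviation-like."""
    with_period, without_period = Counter(), Counter()
    for token in text.split():
        word = token.lower()
        if word.endswith("."):
            with_period[word[:-1]] += 1
        else:
            without_period[word] += 1

    scores = {}
    for word, n_with in with_period.items():
        n_without = without_period[word]
        length = max(1, len(word.replace(".", "")))  # characters, ignoring periods
        collocation = n_with / (n_with + n_without)  # how strongly the type attracts a final period
        internal_periods = word.count(".") + 1       # e.g. "e.g" has one internal period
        length_factor = math.exp(-length)            # long words are rarely abbreviations
        penalty = length ** -n_without               # assumed penalty for period-less occurrences
        scores[word] = collocation * internal_periods * length_factor * penalty
    return scores


sample = "Dr. Smith met Dr. Jones. Smith said hello. Jones met Mr. Brown."
scores = abbreviation_scores(sample)
for word, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{word:6} {score:.4f}")
```

Short types that always carry a period (“dr”, “mr”) rank far above ordinary sentence-final words such as “jones” or “brown”.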
  • 19. Parts of Speech Tagging • Part-of-speech tagging by itself may not solve any particular NLP problem. However, it is often done as a prerequisite that simplifies many different problems. • Traditional English grammar has 8 parts of speech: noun, pronoun, verb, adjective, adverb, preposition, conjunction and interjection. • There are open classes and closed classes. • Open classes (which readily admit new words): noun, verb, adverb, adjective. Closed classes, such as pronouns, prepositions and conjunctions, rarely gain new members. • Some linguists argue that certain languages, such as Riau Indonesian, have no clear part-of-speech classification. Korean has no separate adjective class; words that translate English adjectives behave like verbs.
  • 20. • Tagsets vary in size, from the 8 traditional parts of speech up to the 45 tags of the Penn Treebank tagset (larger tagsets exist, e.g. the 87-tag Brown corpus tagset). • The most commonly used tagged corpora: • Brown corpus: one million words • Wall Street Journal corpus: one million words • Switchboard telephone speech corpus: two million words. Source: Speech and Language Processing, Daniel Jurafsky and James H. Martin
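  • 21. Parts of Speech Tagging algorithm: Generative Hidden Markov Model (HMM). An HMM tagger chooses the tag sequence that maximizes the product of tag-transition probabilities P(t_i | t_(i-1)) and word-emission probabilities P(w_i | t_i), typically decoded with the Viterbi algorithm. Source: Speech and Language Processing, Daniel Jurafsky and James H. Martin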
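HMM decoding can be sketched with the Viterbi algorithm, which keeps, for every position and tag, the probability of the best path ending there plus a back-pointer. All probabilities below are invented for illustration; a real tagger estimates them from a tagged corpus such as those listed earlier and works with log probabilities to avoid numerical underflow.

```python
def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence for `words` under a first-order HMM."""
    # best[i][t] = (probability of the best path ending in tag t at word i, previous tag)
    best = [{t: (start_p[t] * emit_p[t].get(words[0], 0.0), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            column[t] = max(
                (best[i - 1][s][0] * trans_p[s][t] * emit_p[t].get(words[i], 0.0), s)
                for s in tags
            )
        best.append(column)
    # Follow the back-pointers from the most probable final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return list(reversed(path))


# Invented toy model: "barks" can be a noun or a verb; transitions disambiguate it.
TAGS = ("DET", "NOUN", "VERB")
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.3, "VERB": 0.2},
}
EMIT = {
    "DET": {"the": 0.7, "a": 0.3},
    "NOUN": {"dog": 0.4, "barks": 0.1, "walk": 0.5},
    "VERB": {"barks": 0.6, "walk": 0.4},
}

print(viterbi("the dog barks".split(), TAGS, START, TRANS, EMIT))
```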
  • 22. Parts of Speech Tagging algorithm: Discriminative Maximum Entropy Markov Model (MEMM). Because it is discriminative, an MEMM can incorporate many (possibly overlapping) features of the word and its context, which improves classification. The features are generated from a feature template. Source: Speech and Language Processing, Daniel Jurafsky and James H. Martin
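A feature template is a pattern such as “current word”, “previous tag” or “last three letters”, instantiated for each position in the sentence. The sketch below assumes a small, hypothetical template set and scores tags with a log-linear (softmax) local classifier using made-up weights; in a real MEMM the weights are learned from a tagged corpus and Viterbi decoding runs over these local distributions.

```python
import math


def extract_features(words, i, prev_tag):
    """Instantiate a small, hypothetical set of feature templates for the word at position i."""
    word = words[i]
    previous_word = words[i - 1].lower() if i > 0 else "<s>"
    next_word = words[i + 1].lower() if i + 1 < len(words) else "</s>"
    return {
        f"w_i={word.lower()}",
        f"w_i-1={previous_word}",
        f"w_i+1={next_word}",
        f"t_i-1={prev_tag}",
        f"suffix3={word[-3:].lower()}",
        f"capitalized={word[:1].isupper()}",
        f"has_digit={any(ch.isdigit() for ch in word)}",
    }


def tag_distribution(features, weights, tags):
    """P(t | features) is proportional to exp(sum of the weights of active (feature, t) pairs)."""
    scores = {t: sum(weights.get((f, t), 0.0) for f in features) for t in tags}
    norm = sum(math.exp(s) for s in scores.values())
    return {t: math.exp(s) / norm for t, s in scores.items()}


# Made-up weights over Penn Treebank tags; a real MEMM learns these from data.
weights = {
    ("suffix3=ing", "VBG"): 2.0,
    ("t_i-1=VBZ", "VBG"): 1.0,
    ("t_i-1=DT", "NN"): 1.5,
}
features = extract_features(["She", "is", "running"], 2, prev_tag="VBZ")
print(tag_distribution(features, weights, ["NN", "VBG"]))
```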
  • 23. • Modern POS taggers often use bidirectional methods, drawing on both the preceding and the following context of a word. • Stanford CoreNLP uses a log-linear part-of-speech tagger. • Based on the paper: https://nlp.stanford.edu/~manning/papers/tagging.pdf
  • 24. Dependency Parser • Dependency syntax postulates that syntactic structure consists of lexical items linked by binary asymmetric relations (“arrows”) called dependencies. • The arrows are commonly typed with the name of a grammatical relation (e.g. subject, object). • Dependencies are usually required to form a tree: connected, acyclic and single-headed (each word has exactly one head).
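A common way to store a dependency analysis is a head list: for each word, the index of its head, with 0 standing for an artificial ROOT. A head list gives every word exactly one head by construction, so checking the tree property reduces to checking that there are no cycles. The sketch also assumes the widespread convention (used, for example, in Universal Dependencies) that exactly one word attaches to ROOT; the relation labels are illustrative.

```python
def is_dependency_tree(heads):
    """heads[k] is the head of word k+1 (words are numbered from 1; 0 is ROOT).

    Checks: valid head indices, exactly one word attached to ROOT, and no cycles
    (every word reaches ROOT by following heads)."""
    n = len(heads)
    if any(h < 0 or h > n for h in heads):
        return False
    if sum(1 for h in heads if h == 0) != 1:
        return False
    for word in range(1, n + 1):
        seen, node = set(), word
        while node != 0:
            if node in seen:
                return False  # following heads led back to a visited word: a cycle
            seen.add(node)
            node = heads[node - 1]
    return True


words = ["She", "ate", "fish"]
heads = [2, 0, 2]
labels = ["nsubj", "root", "obj"]  # UD-style relation names, for illustration
for word, head, label in zip(words, heads, labels):
    print(f"{label}({'ROOT' if head == 0 else words[head - 1]}, {word})")
print(is_dependency_tree(heads))
```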
  • 25. • Shallow parsing (also chunking, "light parsing") is an analysis of a sentence which first identifies constituent parts of sentences (nouns, verbs, adjectives, etc.) and then links them to higher order units that have discrete grammatical meanings (noun groups or phrases, verb groups, etc.).
  • 26. • Stanford CoreNLP's dependency parsing is based on the paper: https://nlp.stanford.edu/~sebschu/pubs/schuster-manning-lrec2016.pdf
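  • 27. Information Extraction Architecture (a pipeline: raw text → sentence segmentation → tokenisation → part-of-speech tagging → entity detection → relation detection). Source: https://www.nltk.org/book/ch07.html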
  • 28. Named Entity Recognition: Named-entity recognition (NER) (also known as entity identification, entity chunking and entity extraction) is a subtask of information extraction that seeks to locate and classify named entity mentions in unstructured text into pre-defined categories such as person names, organisations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc. Source: https://en.wikipedia.org/wiki/Named-entity_recognition
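Because NER is also called entity chunking, its output is often encoded token by token with BIO (also called IOB) tags: B- marks the first token of an entity, I- a token inside it, and O a token outside any entity. The sketch below converts entity spans into such tags; the span format (start, exclusive end, label), the label names and the example sentence are assumptions for illustration.

```python
def spans_to_bio(tokens, spans):
    """Convert (start, end_exclusive, label) spans over token indices into BIO tags."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


tokens = ["Alice", "flew", "from", "New", "York", "to", "Paris", "on", "Monday"]
spans = [(0, 1, "PER"), (3, 5, "LOC"), (6, 7, "LOC"), (8, 9, "DATE")]
for token, tag in zip(tokens, spans_to_bio(tokens, spans)):
    print(token, tag)
```

The B-/I- distinction is what lets “New York” be read back as one two-token entity rather than two separate locations.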
  • 29. Noun Phrase Chunking • This is a basic technique used for entity detection. • Each multi-token group (drawn as a larger box in the source figure) is called a chunk. • Chunking is done with the help of regular expressions (regex). Source: https://www.nltk.org/book/ch07.html
  • 30. Shallow parsing (also chunking, "light parsing") is an analysis of a sentence which first identifies constituent parts of sentences (nouns, verbs, adjectives, etc.) and then links them to higher order units that have discrete grammatical meanings (noun groups or phrases, verb groups, etc.). Image Source: https://www.nltk.org/book/ch07.html Source: https://en.wikipedia.org/wiki/Shallow_parsing
  • 31. • Noun chunking using regular expressions
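In NLTK, such rules are written as regular expressions over POS tags, e.g. the NLTK book's grammar `NP: {<DT>?<JJ>*<NN>}` (optional determiner, any adjectives, a noun). As a dependency-free sketch of the same idea, the code below turns the tag sequence into a string like `<DT><JJ><NN>`, runs an ordinary regex over it and maps the matches back to tokens. It assumes input already tagged with Penn Treebank tags; `np_chunk` and its output format are our own, not NLTK's `RegexpParser` API.

```python
import re

# Optional determiner, any number of adjectives, then a noun (NN, NNS, NNP or NNPS).
NP_RULE = re.compile(r"(?:<DT>)?(?:<JJ>)*<NN(?:P|S|PS)?>")


def np_chunk(tagged):
    """Group (word, tag) pairs into ("NP", [...]) chunks; tokens outside chunks are kept as-is."""
    tag_string = "".join(f"<{tag}>" for _, tag in tagged)
    # Map the character offset where each token's tag starts back to the token index.
    index_at, pos = {}, 0
    for k, (_, tag) in enumerate(tagged):
        index_at[pos] = k
        pos += len(tag) + 2
    index_at[pos] = len(tagged)  # offset just past the last tag

    result, i = [], 0
    for match in NP_RULE.finditer(tag_string):
        start, end = index_at[match.start()], index_at[match.end()]
        result.extend(tagged[i:start])            # tokens before the chunk
        result.append(("NP", tagged[start:end]))  # the noun-phrase chunk
        i = end
    result.extend(tagged[i:])                     # tokens after the last chunk
    return result


sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
            ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]
for item in np_chunk(sentence):
    print(item)
```

Every match necessarily starts at a `<`, so chunk boundaries always coincide with token boundaries; that is why the offset-to-index lookup is safe.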
  • 32. • spaCy-based Named Entity Recognition • spaCy's English models are trained on the OntoNotes 5 dataset. • They predict 18 pre-defined entity categories (e.g. PERSON, ORG, GPE, DATE, MONEY). • Applications of NER: 1. Natural Language Understanding (NLU) 2. Natural Language Search (NLS)
  • 33. Discussion! For further queries, contact me at edu.venkateshdas@gmail.com