Natural language processing involves parsing text using a lexicon, categorization of parts of speech, and grammar rules. The parsing process involves determining the syntactic tree and label bracketing that represents the grammatical structure of sentences. Evaluation measures for parsing include precision, recall, and F1-score. Ambiguities from multiple word senses, anaphora, indexicality, metonymy, and metaphor make parsing challenging.
2. Natural Language
• Natural Language means any language we
speak
• We need to process natural language (in
text, speech, etc.) so that machines can
exploit it.
• Applications: numerous!
– Watson (Jeopardy)
– MS Word
3. Parsing
• The first task for any NLP-based system is to
read (or to parse) the text
• Parsing depends on three components of a
language:
1. Lexicon
2. Categorization
3. Grammar Rules
4. Lexicon
stench | breeze | glitter | nothing | wumpus | pit | pits | gold | east | ..
is | see | smell | shoot | feel | stinks | go | grab | carry | kill | turn | …
right | left | east | south | back | smelly | …
here | there | nearby | ahead | right | left | east | south | back | …
me | you | I | it | S=HE | Y’ALL …
John | Mary | Boston | UCB | PAJC | …
the | a | an | …
to | in | on | near | …
and | or | but | …
0|1|2|3|4|5|6|7|8|9
Rushdi Shams, Dept of CSE, KUET,
Bangladesh
5. Categorization
Noun > stench | breeze | glitter | nothing | wumpus | pit | pits | gold | east | ..
Verb > is | see | smell | shoot | feel | stinks | go | grab | carry | kill | turn | …
Adjective > right | left | east | south | back | smelly | …
Adverb > here | there | nearby | ahead | right | left | east | south | back | …
Pronoun > me | you | I | it | S=HE | Y’ALL …
Name > John | Mary | Boston | UCB | PAJC | …
Article > the | a | an | …
Preposition > to | in | on | near | …
Conjunction > and | or | but | …
Digit > 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
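Note that the same word can appear under several categories (east is listed as a noun, an adjective, and an adverb), so a lexicon lookup returns a list of categories rather than a single one. A minimal Python sketch, using only a small subset of the word lists above; the names CATEGORIES and categories_of are illustrative assumptions, not part of the slides:

```python
# Small subset of the slide's word lists; not the full lexicon.
CATEGORIES = {
    "Noun": ["stench", "breeze", "wumpus", "pit", "gold", "east"],
    "Verb": ["is", "see", "smell", "shoot", "go", "grab"],
    "Adjective": ["right", "left", "east", "south", "smelly"],
    "Adverb": ["here", "there", "nearby", "right", "left", "east"],
    "Article": ["the", "a", "an"],
}


def categories_of(word):
    """Return every category whose word list contains the word."""
    return [cat for cat, words in CATEGORIES.items() if word.lower() in words]


print(categories_of("east"))    # a word listed under three categories
print(categories_of("wumpus"))  # a word listed under one category
```

A word with more than one category is an early source of ambiguity: the parser has to decide which category applies in the sentence at hand.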
6. Grammar Rules
• “The large cat”
• This phrase can be parsed by an NLP-system if
it has a grammar like
Noun Phrase -> Determiner + Adjective + Noun
• If your system finds a phrase or sentence
whose pattern is not covered by its set of
Grammar Rules, it won’t be able to parse
it.
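The idea can be sketched in a few lines of Python: look up each word's category, then check whether the resulting category sequence matches a rule. This is only an illustration under simplifying assumptions (one category per word, one flat rule); LEXICON, GRAMMAR, and parse_phrase are hypothetical names, not part of the slides.

```python
# Hypothetical mini-lexicon: one category per word, for simplicity.
LEXICON = {
    "the": "Determiner", "a": "Determiner",
    "large": "Adjective", "small": "Adjective",
    "cat": "Noun", "rat": "Noun",
}

# Each phrase label maps to the category sequences it accepts.
GRAMMAR = {
    "Noun Phrase": [["Determiner", "Adjective", "Noun"]],
}


def parse_phrase(phrase):
    """Return the label of the first rule matching the phrase, or None."""
    try:
        tags = [LEXICON[word.lower()] for word in phrase.split()]
    except KeyError:
        return None  # a word missing from the lexicon cannot be categorized
    for label, patterns in GRAMMAR.items():
        if tags in patterns:
            return label
    return None


print(parse_phrase("The large cat"))  # matches Determiner + Adjective + Noun
print(parse_phrase("The cat"))        # no rule covers Determiner + Noun
```

The second call fails even though "The cat" is perfectly good English: the grammar simply has no rule for that pattern, which is the limitation the slide describes.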
7. Therefore...
• Parsing is the process of using grammar
rules to determine whether a sentence is
legal,
• and to obtain its Syntactic Tree
8. Syntactic Tree
‘The large cat eats the small rat’
http://www.digitalenema.com/2012_07_01_archive.html
10. Syntactic Tree
Article    The
Adjective  large
Noun       cat
Verb       eats
Article    the
Adjective  small
Noun       rat
11. Syntactic Tree
Article    The
Adjective  large
Noun       cat
Verb       eats
noun phrase
    Article    the
    Adjective  small
    Noun       rat
12. Syntactic Tree
noun phrase
    Article    The
    Adjective  large
    Noun       cat
Verb       eats
noun phrase
    Article    the
    Adjective  small
    Noun       rat
13. Syntactic Tree
noun phrase
    Article    The
    Adjective  large
    Noun       cat
verb phrase
    Verb       eats
    noun phrase
        Article    the
        Adjective  small
        Noun       rat
14. Syntactic Tree
sentence
    noun phrase
        Article    The
        Adjective  large
        Noun       cat
    verb phrase
        Verb       eats
        noun phrase
            Article    the
            Adjective  small
            Noun       rat
15. Label Bracketing
• Label bracketing represents the syntactic tree in another way: as a single line of nested, labeled brackets.
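As an illustration, the tree for "The large cat eats the small rat" can be turned into a bracketed string with a short recursive function. This is a sketch, not a standard notation defined by the slides: the abbreviated labels (S, NP, VP, Art, Adj, N, V) and the tuple encoding of the tree are assumptions chosen for brevity.

```python
# A node is a tuple (label, child, child, ...); a leaf is (category, word).
TREE = (
    "S",
    ("NP", ("Art", "The"), ("Adj", "large"), ("N", "cat")),
    ("VP",
        ("V", "eats"),
        ("NP", ("Art", "the"), ("Adj", "small"), ("N", "rat"))),
)


def bracket(node):
    """Render a tree as a labeled-bracket string."""
    label, *children = node
    parts = [c if isinstance(c, str) else bracket(c) for c in children]
    return "[" + label + " " + " ".join(parts) + "]"


print(bracket(TREE))
# [S [NP [Art The] [Adj large] [N cat]] [VP [V eats] [NP [Art the] [Adj small] [N rat]]]]
```

Each opening bracket carries the label of a node, and everything up to its matching closing bracket is that node's subtree.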
16. Do it yourself: Label-bracket the tree
24. Ambiguity
• There are 2 types of ambiguity:
1. Lexical Ambiguity: The sentence contains an
idiom/word/term that has more than one
meaning.
"Glasses" means both drinking glasses and
spectacles
25. Ambiguity
2. Structural Ambiguity: The sentence has more
than one syntactic tree
I saw the boy with the telescope
Did you use a telescope to see the boy? Or
did you see a boy who had a
telescope?
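The two readings can be made concrete with a small chart parser that collects every tree a grammar allows. The sketch below uses a CKY-style bottom-up algorithm over a toy grammar written for this one sentence; the rules, category labels, and the function name parse are assumptions for illustration, not a grammar given in the slides.

```python
from collections import defaultdict

# Toy binary rules: (parent, left child, right child).
RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # the prepositional phrase modifies the seeing
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # the prepositional phrase modifies the boy
    ("PP", "P", "NP"),
]
WORDS = {"I": "NP", "saw": "V", "the": "Det",
         "boy": "N", "telescope": "N", "with": "P"}


def parse(words):
    """Return every labeled-bracket tree that covers the sentence as an S."""
    n = len(words)
    chart = defaultdict(list)  # (start, end, label) -> list of trees
    for i, word in enumerate(words):
        if word in WORDS:
            chart[i, i + 1, WORDS[word]].append(f"[{WORDS[word]} {word}]")
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # every split point inside the span
                for parent, left, right in RULES:
                    for lt in chart.get((i, k, left), []):
                        for rt in chart.get((k, j, right), []):
                            chart[i, j, parent].append(f"[{parent} {lt} {rt}]")
    return chart.get((0, n, "S"), [])


for tree in parse("I saw the boy with the telescope".split()):
    print(tree)
```

With this grammar the sentence yields two trees: one attaches "with the telescope" to the noun phrase "the boy", the other attaches it to the verb phrase. A parser alone cannot choose between them; that takes context.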
27. Ambiguity
• Which of the following examples have
lexical ambiguity and which of them carry
structural ambiguity? Justify your answer.
1. The painter put on another coat
2. We like flying planes
3. Visiting relatives can be tiresome
28. Ambiguity
• He wrote the note yesterday
• You mean you carried the information by a
bus?
• Connecting wires are tiring in electronics lab
• Squad helps dog bite victim
29. Word Sense
• Most of the lexical ambiguity arises from the
differences in word sense.
• Word senses vary due to several factors:
– Synonymy
– Antonymy
– Homonymy
– Polysemy and
– Heteronymy
30. Synonymy
• Synonyms are different words (or sometimes
phrases) with identical or very similar
meanings.
• Words that are synonyms are said to
be synonymous, and the state of being a
synonym is called synonymy
31. Synonymy
• student and pupil (noun)
• buy and purchase (verb)
• sick and ill (adjective)
• quickly and speedily (adverb)
• on and upon (preposition)
32. Synonymy is a relation between senses
rather than words
• Note that synonyms are defined with respect
to certain senses of words
• pupil as the "aperture in the iris of the eye" is
not synonymous with student.
• Similarly, he expired means the same as he
died, yet my passport has expired cannot be
replaced by my passport has died.
33. Synonymy is a relation between senses
rather than words
• Consider the words big and large
• Are they synonyms?:
– How big is the plane?
– Are we travelling with a large or small plane?
• How about?:
– Mrs Benjamin became a big sister to him
– Mrs Benjamin became a large sister to him
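One way to picture "synonymy is a relation between senses" is to key the synonym table by (word, sense) pairs rather than by words. The sketch below is a toy model only: the SENSES table, its sense glosses, and both function names are hypothetical, and the entries are limited to the pupil and expire examples from the slides.

```python
# Hypothetical sense inventory: (word, sense gloss) -> synonyms in that sense.
SENSES = {
    ("pupil", "learner"): {"student"},
    ("pupil", "aperture in the iris of the eye"): set(),
    ("expire", "die"): {"die"},
    ("expire", "lose validity"): set(),
}


def synonyms(word, sense):
    """Synonyms of a word under one specific sense."""
    return SENSES.get((word, sense), set())


def interchangeable(word, other):
    """True only if `other` is a synonym under every listed sense of `word`."""
    per_sense = [syns for (w, _), syns in SENSES.items() if w == word]
    return bool(per_sense) and all(other in syns for syns in per_sense)


print(synonyms("pupil", "learner"))         # {'student'}
print(interchangeable("pupil", "student"))  # False: the eye sense blocks it
print(interchangeable("expire", "die"))     # False: passports do not die
```

The lookup by sense succeeds where the word-level check fails, which mirrors the big/large sister example: two words can share one sense without being substitutable everywhere.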
34. Heteronymy
• Heteronyms (also known as heterophones)
are words with
– identical spellings (or characters)
– but different pronunciations and meanings.
• lead (to guide) vs lead (a metal)
35. Antonymy
• Antonyms are words with opposite or nearly
opposite meanings.
• short and tall
• dead and alive
• increase and decrease
36. Homonymy
• A homonym is one of a group of words that
– share the same spelling or pronunciation but
– have distinct, unrelated meanings
• Bank (Financial Institution) vs Bank (Sloping Land)
• Bat (A club for hitting the ball) vs Bat (Mammal)
• Homographs: same spelling (Bank/Bank, Bat/Bat)
• Homophones: same pronunciation (Right/Write, Piece/Peace)
37. Polysemy
• A single word whose several senses are related to
each other
– The bank was constructed in 1971 (building
belonging to a financial institution)
– I withdrew money from the bank (financial institution)
38. Hypernymy and Hyponymy
• Superclass-subclass structure
– Car is a hypernym of Honda
– Honda is a hyponym of Car
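The superclass-subclass structure can be stored as a chain of is-a links and walked upward. A minimal sketch: the link from Car to Vehicle is an added assumption to make the chain longer than one step, and HYPERNYM_OF, hypernym_chain, and is_hyponym_of are hypothetical names.

```python
# Each term maps to its direct hypernym (superclass).
# "Honda" -> "Car" is from the slide; "Car" -> "Vehicle" is an assumption.
HYPERNYM_OF = {"Honda": "Car", "Car": "Vehicle"}


def hypernym_chain(term):
    """List every hypernym of a term, nearest first."""
    chain = []
    while term in HYPERNYM_OF:
        term = HYPERNYM_OF[term]
        chain.append(term)
    return chain


def is_hyponym_of(term, ancestor):
    """True if `ancestor` appears anywhere above `term` in the hierarchy."""
    return ancestor in hypernym_chain(term)


print(hypernym_chain("Honda"))        # ['Car', 'Vehicle']
print(is_hyponym_of("Honda", "Car"))  # True
print(is_hyponym_of("Car", "Honda"))  # False
```

The relation is directional: Honda is a hyponym of Car, but Car is not a hyponym of Honda.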
39. Zeugma Test
• A test to see whether or not two uses of a word have
the same sense
– Which flights serve breakfast?
– Does Lufthansa serve Philadelphia?
• Simply make a conjunction; if it sounds odd, the two uses have different senses:
– Does Lufthansa serve breakfast and Philadelphia?
40. WordNet 3.0
• A hierarchically organized lexical database
• On-line thesaurus + aspects of a dictionary
• Some other languages available or under development
– (Arabic, Finnish, German, Portuguese…)
Category    Unique Strings
Noun        117,798
Verb        11,529
Adjective   22,479
Adverb      4,481
44. WordNet 3.0
• Where it is:
– http://wordnetweb.princeton.edu/perl/webwn
• Libraries
– Python: WordNet from NLTK
• http://www.nltk.org/Home
– Java:
• JWNL, extJWNL on sourceforge
45. Difficulties with Natural Language:
Anaphora
• Using pronouns to refer back to entities already
introduced in the text
– After Mary proposed to John, they found a preacher
and got married. For the honeymoon, they went to
Hawaii
– Mary saw a ring through the window and asked John
for it
– Mary threw a rock at the window and broke it
46. Difficulties with Natural Language:
Indexicality
• Indexical sentences refer to the utterance
situation (place, time, etc.)
– I am over here
– Why did you do that?
47. Difficulties with Natural Language:
Metonymy
• Using one noun phrase to stand for another
– I've read Shakespeare
– Chrysler announced record profits
– The ham sandwich on Table 4 wants another
beer
48. Difficulties with Natural Language:
Metaphor
• "Non-literal" usage of words and phrases,
often systematic.
– I've tried killing the process but it won't die. Its
parent keeps it alive.
49. Summary
• The components of a language
– Lexicon
– Categorization
– Grammar rules
• Syntactic Tree
• Label Bracketing
• Evaluation of Parsing
• Word sense
• Problem of Parsing