2. Introduction
• What is NLP?
• Natural language processing (NLP) can be defined as the automatic (or semi-
automatic) processing of human language.
• Nowadays, alternative terms are often preferred, like `Language
Technology' or `Language Engineering'.
• Language is often used in contrast with speech (e.g., Speech and
Language Technology).
Natural Language Processing, Dr. C. Saravanan, NIT Durgapur.
3. Introduction cont…
• NLP is essentially multidisciplinary: it is closely related to linguistics.
• It also has links to research in cognitive science, psychology,
philosophy and maths (especially logic).
• Within Computer Science, it relates to formal language theory,
compiler techniques, theorem proving, machine learning and human-
computer interaction.
• Of course it is also related to AI, though nowadays it's not generally
thought of as part of AI.
4. Introduction cont…
• The HAL 9000 computer in Stanley Kubrick’s film 2001: A Space Odyssey is
one of the most recognizable characters in twentieth-century cinema.
• HAL is an artificial agent capable of speaking and understanding English,
and even reading lips.
• HAL is able to do information retrieval (finding out where needed textual
resources reside), information extraction (extracting pertinent facts from
those textual resources), and inference (drawing conclusions based on
known facts).
6. Introduction cont…
Other valuable areas of language processing include
spelling correction,
grammar checking,
information retrieval, and
machine translation.
Example: Is the door open? The door is open.
7. Online Applications
• Dictation - Online Speech Recognition https://dictation.io/
• SpeechPad - a new voice notebook https://speechpad.pw/
• Speechlogger - https://speechlogger.appspot.com/en/
• Talktyper https://talktyper.com/
(a microphone and speaker are required)
8. Knowledge in Language Processing
• Requires knowledge about phonetics and phonology, which can help
model how words are pronounced in colloquial speech
• they're- they + are. They're from Canada.
• their- something belongs to "them." This is their car.
• there- in that place. The park is over there.
• two- the number 2. I'll have two hamburgers, please.
• to- this has many meanings. One means "in the direction of." I'm
going to South America.
• too- also. I want to go, too.
9. Knowledge in Language Processing cont…
• Requires knowledge about morphology, which captures information
about the shape and behavior of words in context.
• Singular and plural – man → men, boy → boys
• The knowledge needed to order and group words together comes
under the heading of syntax.
10. Knowledge in Language Processing cont…
• Requires knowledge of the meanings of the component words, the
domain of lexical semantics
• and knowledge of how these components combine to form larger
meanings, compositional semantics.
• No
• No, I won’t open the door.
• I’m sorry, I’m afraid I can’t. (indirect refusal)
11. Knowledge in Language Processing cont…
• The knowledge of language needed to engage in complex language
behavior can be separated into six distinct categories.
• Phonetics and Phonology – The study of linguistic sounds.
• Morphology – The study of the meaningful components of words.
• Syntax – The study of the structural relationships between words.
• Semantics – The study of meaning.
• Pragmatics – The study of how language is used to accomplish goals.
• Discourse – The study of linguistic units larger than a single utterance.
12. AMBIGUITY
• Some input is ambiguous if there are multiple alternative linguistic
structures that can be built for it.
• Look at the dog with one eye.
• Look at the dog using only one of your eyes.
• Look at the dog that only has one eye.
• The phrase with one eye can attach either to look or to the dog, which makes the sentence ambiguous.
13. AMBIGUITY
• In some cases a word is lexically ambiguous: for example, make can mean ‘create’ or ‘cook’.
• Such sense ambiguities can be resolved by word sense disambiguation,
• while the intended function of an utterance can be resolved by speech act interpretation.
• Example: I make a cake.
14. Models and Algorithms
• The most important elements are
• state machines,
• formal rule systems,
• logic,
• probability theory
• Well-known computational algorithms include
• dynamic programming
• state space search
15. State Machines
• State machines are formal models that consist of states, transitions
among states, and an input representation.
• Variations of this basic model include deterministic and non-deterministic
finite-state automata and finite-state transducers, which can produce
output, as well as weighted automata, Markov models and hidden Markov
models, which have a probabilistic component.
16. Formal Rule Systems
• The most important formal rule systems are
• regular grammars
• regular relations
• context-free grammars
• feature-augmented grammars
• probabilistic variants of them all.
• Algorithms often used with these models include depth-first search,
best-first search, and A* search.
17. Logic and Probability Theory
• First-order logic, also known as the predicate calculus, together with
related formalisms such as feature structures, semantic networks, and
conceptual dependency, is used to capture knowledge and meaning.
• Probability theory addresses many kinds of ambiguity problems:
• given N choices for some ambiguous input, choose the most probable one
18. Language, Thought, and Understanding
• Alan Turing (1950) introduced the Turing Test.
• In Turing’s game, there are three participants: two people and a computer.
• One of the people is a contestant and plays the role of an interrogator.
• To win, the interrogator must determine which of the other two
participants is the machine by asking a series of questions via a teletype.
• The task of the machine is to fool the interrogator into believing it is a
person by responding as a person would to the interrogator’s questions.
• The task of the second human participant is to convince the interrogator
that the other participant is the machine, and that they are human.
19. ELIZA
• ELIZA (Weizenbaum, 1966) was an early natural language processing system
that used pattern matching to carry on a limited form of conversation with a
user.
• User1: You are like my father in some ways.
• ELIZA1: WHAT RESEMBLANCE DO YOU SEE
• User2: You are not very aggressive but I think you don’t want me to notice that.
• ELIZA2: WHAT MAKES YOU THINK I AM NOT AGGRESSIVE
• User3: You don’t argue with me.
• ELIZA3: WHY DO YOU THINK I DON’T ARGUE WITH YOU
• User4: You are afraid of me.
• ELIZA4: DOES IT PLEASE YOU TO BELIEVE I AM AFRAID OF YOU
20. HISTORY 1940’s and 1950’s
• Shannon (1948) contributed to automata theory, applying probabilistic
models of discrete Markov processes to automata for language.
• Chomsky (1956) first considered finite-state machines as a way to
characterize a grammar, and defined a finite-state language as a language
generated by a finite-state grammar.
• These early models led to the field of formal language theory, which used
algebra and set theory to define formal languages as sequences of symbols.
• This includes the context-free grammar, first defined by Chomsky (1956) for
natural languages but independently discovered by Backus (1959) and Naur
et al. (1960) in their descriptions of the ALGOL programming language.
• The sound spectrograph was developed (Koenig et al., 1946)
21. 1957–1970
• Speech and Language Processing had split very cleanly into two
paradigms: symbolic and stochastic.
• The first was the work of Chomsky and others on formal language
theory and generative syntax throughout the late 1950’s and early to
mid 1960’s, and the work of many linguists and computer scientists
on parsing algorithms, initially top-down and bottom-up, and then via
dynamic programming.
• One of the earliest complete parsing systems was Zelig Harris’s
Transformations and Discourse Analysis Project (TDAP), which was
implemented between June 1958 and July 1959 at the University of
Pennsylvania (Harris, 1962).
22. 1957–1970 cont…
• The second line of research was the new field of artificial intelligence.
• In 1956, John McCarthy, Marvin Minsky, Claude Shannon, and
Nathaniel Rochester organized a two-month workshop on artificial
intelligence.
• The major focus of the new field was the work on reasoning and logic
typified by Newell and Simon’s work on the Logic Theorist and the
General Problem Solver.
• These were simple systems which worked in single domains mainly by
a combination of pattern matching and key-word search with simple
heuristics for reasoning and question-answering.
23. 1957–1970 cont…
• Bledsoe and Browning (1959) built a Bayesian system for text-recognition
(optical character recognition) that used a large dictionary and computed
the likelihood of each observed letter sequence given each word in the
dictionary by multiplying the likelihoods for each letter.
• Mosteller and Wallace (1964) applied Bayesian methods to the problem of
authorship attribution on The Federalist papers.
• The Brown corpus of American English, a 1 million word collection of
samples from 500 written texts from different genres (newspaper, novels,
non-fiction, academic, etc.), which was assembled in 1963-64 (Kucera and
Francis, 1967; Francis, 1979; Francis and Kucera, 1982), and William S. Y.
Wang’s 1967 DOC (Dictionary on Computer), an on-line Chinese dialect
dictionary.
24. 1970–1983
• The stochastic paradigm was marked by the use of the hidden Markov
model and the metaphors of the noisy channel and decoding, developed
independently by Jelinek, Bahl, Mercer, and colleagues at IBM’s Thomas J.
Watson Research Center, and by Baker at Carnegie Mellon University.
• The logic-based paradigm was begun by the work of Colmerauer and
his colleagues on Q-systems and metamorphosis grammars
(Colmerauer, 1970, 1975), the forerunners of Prolog and Definite
Clause Grammars (Pereira and Warren, 1980).
• Independently, Kay’s (1979) work on functional grammar and, shortly
later, Bresnan and Kaplan’s (1982) work on LFG established the
importance of feature structure unification.
25. 1970–1983 cont…
• Terry Winograd’s SHRDLU system which simulated a robot embedded
in a world of toy blocks (Winograd, 1972a). The program was able to
accept natural language text commands (Move the red block on top of
the smaller green one) of a hitherto unseen complexity and
sophistication.
• Roger Schank and his colleagues and students built a series of
language understanding programs that focused on human conceptual
knowledge such as scripts, plans and goals, and human memory
organization (Schank and Abelson, 1977; Schank and Riesbeck, 1981;
Cullingford, 1981; Wilensky, 1983; Lehnert, 1977).
26. 1970–1983 cont…
• The discourse modeling paradigm focused on four key areas in
discourse. Grosz and her colleagues proposed ideas of discourse
structure and discourse focus (Grosz, 1977a; Sidner, 1983a), a number
of researchers began to work on automatic reference resolution
(Hobbs, 1978a), and the BDI (Belief-Desire-Intention) framework for
logic-based work on speech acts was developed (Perrault and Allen,
1980; Cohen and Perrault, 1979).
27. 1983-1993
• Finite-state models began to receive attention again after work on
finite-state phonology and morphology by Kaplan and Kay (1981) and
finite-state models of syntax by Church (1980).
• Another trend was the rise of probabilistic models throughout
speech and language processing, influenced strongly by the work at
the IBM Thomas J. Watson Research Center on probabilistic models of
speech recognition.
28. 1994-1999
• First, probabilistic and data-driven models had become quite standard
throughout natural language processing.
• Algorithms for parsing, part of speech tagging, reference resolution, and
discourse processing all began to incorporate probabilities, and employ
evaluation methodologies borrowed from speech recognition and
information retrieval.
• Second, the increases in the speed and memory of computers had allowed
commercial exploitation of a number of subareas of speech and language
processing, in particular speech recognition and spelling and grammar
checking.
• Finally, the rise of the Web emphasized the need for language-based
information retrieval and information extraction.
29. NLP applications
• spelling and grammar checking
• optical character recognition (OCR)
• screen readers for blind and partially sighted users
• augmentative and alternative communication
• machine aided translation
• lexicographers' tools
• information retrieval
• document classification (filtering, routing)
• document clustering
• information extraction
• question answering
• summarization
• text segmentation
• exam marking
• report generation (possibly multilingual)
• machine translation
• natural language interfaces to databases
• email understanding
• dialogue systems
30. Some Linguistic Terminology
• Morphology: the structure of words. For instance, unusually can be
thought of as composed of a prefix un-, a stem usual, and an affix -ly.
Composed is compose plus the inflectional affix -ed: a spelling rule
means we end up with composed rather than composeed.
• Syntax: the way words are used to form phrases. e.g., it is part of
English syntax that a determiner such as the will come before a noun,
and also that determiners are obligatory with certain singular nouns.
31. Some Linguistic Terminology
• Semantics. Compositional semantics is the construction of meaning
(generally expressed as logic) based on syntax.
• Pragmatics: meaning in context.
32. Difficulties
• How fast is the 3G?
• How fast will my 3G arrive?
• Please tell me when I can expect the 3G I ordered.
• Has my order number 4291 been shipped yet?
33. Spelling and grammar checking
• All spelling checkers can flag words which aren't in a dictionary.
• * The neccesary steps are obvious.
• The necessary steps are obvious.
• If the user can expand the dictionary, or if the language has complex
productive morphology (see §2.1), then a simple list of words isn't
enough to do this and some morphological processing is needed.
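The dictionary-lookup step can be sketched in a few lines of Python. The word list here is a tiny stand-in for a real lexicon, and the sketch ignores the morphological processing just mentioned:

```python
# Minimal sketch of dictionary-based spelling flagging.
# DICTIONARY is a tiny illustrative word list, not a real lexicon.
DICTIONARY = {"the", "necessary", "steps", "are", "obvious"}

def flag_unknown(sentence):
    """Return the words in the sentence that are not in the dictionary."""
    words = sentence.lower().rstrip(".").split()
    return [w for w in words if w not in DICTIONARY]

print(flag_unknown("The neccesary steps are obvious."))  # ['neccesary']
```

A real checker would also need to accept regularly inflected forms (steps from step), which is where morphological processing comes in.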
34. • * The dog came into the room, it's tail wagging.
• The dog came into the room, its tail wagging.
• For instance, possessive its generally has to be followed by a noun or
an adjective noun combination etc.
35. • But, it sometimes isn't locally clear:
e.g. fair is ambiguous between a noun and an adjective.
• `It's fair', was all Kim said.
• Every village has an annual fair.
36. • The most elaborate spelling/grammar checkers can get some of these cases
right.
• * The tree's bows were heavy with snow.
• The tree's boughs were heavy with snow.
• Getting this right requires an association between tree and bough.
• Attempts might have been made to hand-code this in terms of general
knowledge of trees and their parts.
37. • Checking punctuation can be hard
• Students at Cambridge University, who come from less affluent
backgrounds, are being offered up to £1,000 a year under a bursary scheme.
• The sentence implies that most or all students at Cambridge come from less
affluent backgrounds. The following sentence, without the commas, has a
different meaning:
• Students at Cambridge University who come from less affluent
backgrounds are being offered up to £1,000 a year under a bursary scheme.
38. Information retrieval
• Information retrieval involves returning a set of documents in
response to a user query: Internet search engines are a form of IR.
• However, one change from classical IR is that Internet search now
uses techniques that rank documents according to how many links
there are to them (e.g., Google's PageRank) as well as the presence of
search terms.
39. Information extraction
• Information extraction involves trying to discover specific information
from a set of documents.
• The information required can be described as a template.
• For instance, for company joint ventures, the template might have
slots for the companies, the dates, the products, the amount of
money involved. The slot fillers are generally strings.
40. Question answering
• Attempts to find a specific answer to a specific question from a set of
documents, or at least a short piece of text that contains the answer.
• What is the capital of France?
• Paris has been the French capital for many centuries.
• There are some question-answering systems on the Web, but most
use very basic techniques.
41. Machine translation (MT)
• MT work started in the US in the early fifties, concentrating on
Russian to English. A prototype system was demonstrated in 1954.
• Systran was providing useful translations by the late 60s.
• Until the 80s, the utility of general purpose MT systems was severely
limited by the fact that text was not available in electronic form:
Systran used teams of skilled typists to input Russian documents.
42. • Systran and similar systems are not a substitute for human
translation:
• They are useful because they allow people to get an idea of what a
document is about, and maybe decide whether it is interesting
enough to get translated properly.
• Spoken language translation is viable for limited domains: research
systems include Verbmobil, SLT and CSTAR.
43. Natural language interfaces
• Natural language interfaces were the `classic' NLP problem in the 70s and
80s.
• LUNAR is the classic example of a natural language interface to a database
(NLID):
• Its database concerned lunar rock samples brought back from the Apollo
missions. LUNAR is described by Woods (1978).
• It was capable of translating elaborate natural language expressions into
database queries.
44. SHRDLU
• SHRDLU (Winograd, 1973) was a system capable of participating in a
dialogue about a micro world (the blocks world) and manipulating
this world according to commands issued in English by the user.
• SHRDLU had a big impact on the perception of NLP at the time since it
seemed to show that computers could actually `understand'
language:
• The impossibility of scaling up from the micro world was not realized.
45. • There have been many advances in NLP since these systems were built:
• systems have become much easier to build, and somewhat easier to use,
but they still haven't become ubiquitous.
• Natural Language interfaces to databases were commercially available in
the late 1970s, but largely died out by the 1990s:
• Porting to new databases and especially to new domains requires very
specialist skills and is essentially too expensive.
46. NLP application architecture
• The input to an NLP system could be speech or text.
• It could also be gesture (multimodal input or perhaps a Sign
Language).
• The output might be non-linguistic, but most systems need to give
some sort of feedback to the user, even if they are simply performing
some action.
47. Components
• Input Pre Processing: speech / gesture recogniser or text pre-processor
• Morphological Analysis
• Part-of-Speech Tagging
• Parsing - this includes syntax and compositional semantics
• Disambiguation: this can be done as part of parsing
• Context Module: this maintains information about the context
• Text Planning: the part of language generation / what meaning to convey
• Tactical Generation: converts meaning representations to strings.
• Morphological Generation
• Output Processing: text-to-speech, text formatter, etc.
48. • The following are required in order to construct the standard
components and knowledge sources:
• Lexicon Acquisition
• Grammar Acquisition
• Acquisition of Statistical Information
50. Note
• NLP applications need complex knowledge sources.
• Applications cannot be 100% perfect. Humans aren't 100% perfect
anyway.
• Applications that aid humans are much easier to construct than
applications which replace humans.
• Improvements in NLP techniques are generally incremental rather
than revolutionary.
51. Morphology
• Morphology concerns the structure of words.
• Words are assumed to be made up of morphemes, which are the
minimal information-carrying units.
• Morphemes which can only occur in conjunction with other
morphemes are affixes: words are made up of a stem and zero or
more affixes.
• For instance, dog is a stem which may occur with the plural suffix +s
i.e., dogs.
52. • English only has suffixes (affixes which come after a stem) and
prefixes (which come before the stem).
• Some languages have infixes (affixes which occur inside the stem) and
circumfixes (affixes which go around a stem).
• Some English irregular verbs show a relic of inflection by infixation
(e.g. sing, sang, sung)
53. Inflectional Morphology
• Inflectional morphology concerns properties such as
• tense, aspect, number, person, gender, and case,
• although not all languages code all of these:
• English, for instance, has very little morphological marking of case
and gender.
54. Derivational Morphology
• Derivational affixes, such as un-, re-, anti- etc, have a broader range
of semantic possibilities and don't fit into neat paradigms.
• Antidisestablishmentarianism
• Antidisestablishmentarianismization
• Antiantimissile
55. Zero Derivation
• tango, waltz etc. are words
• which are basically nouns but can be used as verbs.
• An English verb will have a present tense form, a 3rd person singular
present tense form, a past participle and a passive participle.
• e.g., text as a verb - texts, texted.
56. • Stems and affixes can be individually ambiguous.
• There is also potential for ambiguity in how a word form is split into
morphemes.
• For instance, unionised could be union -ise -ed
• or (in chemistry) un- ion -ise -ed.
• Note that un- ion is not a possible form (because un- can't attach to a
noun).
57. • The un- that attaches to verbs nearly always denotes the reversal of a
process (e.g., untie),
• whereas the un- that attaches to adjectives means `not', which is the
meaning in the case of un- ion -ise -ed.
58. Spelling / orthographic rules
• English morphology is essentially concatenative.
• Some words have irregular morphology and their inflectional forms
simply have to be listed.
• Regular forms, however, are subject to spelling rules at affix boundaries.
• For instance, the suffix -s is pronounced differently when it is added
to a stem which ends in s, x or z, and the spelling reflects this with the
addition of an e (boxes etc.).
59. • The `e-insertion' rule can be described as follows:
• ɛ → e / {s, x, z} ^ _ s
• _ indicates the position in question,
• ɛ is used for the empty string, and
• ^ for the affix boundary
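As a rough illustration, the rule above can be approximated in Python. This sketch covers only the s/x/z context given on the slide; real English spelling also needs further contexts such as sh and ch (wishes, churches):

```python
import re

# Sketch of the e-insertion rule:  epsilon -> e / {s, x, z} ^ _ s
# i.e. insert e at the affix boundary (^) when the suffix -s
# attaches to a stem ending in s, x or z.
def attach_s(stem):
    """Attach the suffix -s, applying e-insertion where required."""
    if re.search(r"[sxz]$", stem):
        return stem + "es"
    return stem + "s"

print(attach_s("box"))  # boxes
print(attach_s("dog"))  # dogs
```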
60. • Other spelling rules in English include consonant doubling
• (e.g., rat, ratted; though note, not *auditted)
• and y/ie conversion (party, parties).
61. Lexical requirements for morphological processing
• There are three sorts of lexical information that are needed for full,
high precision morphological processing:
• affixes, plus the associated information conveyed by the affix
• irregular forms, with associated information similar to that for affixes
• stems with syntactic categories
62. • One approach to an affix lexicon is for it to consist of pairings of an
affix with some encoding of the syntactic/semantic effect of that affix.
• For instance, consider the following fragment of a suffix lexicon
• ed PAST_VERB
• ed PSP_VERB
• s PLURAL_NOUN
63. • A lexicon of irregular forms is also needed.
• began PAST_VERB begin
• begun PSP_VERB begin
• English irregular forms are the same for all senses of a word. For
instance, ran is the past of run whether we are talking about athletes,
politicians or noses.
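A lookup procedure over the two lexicon fragments above might be sketched as follows. The data structures are assumptions made for illustration, and the suffix stripping is naive: it ignores spelling rules such as e-insertion:

```python
# Suffix lexicon: affix -> possible syntactic/semantic effects.
SUFFIXES = {"ed": ["PAST_VERB", "PSP_VERB"], "s": ["PLURAL_NOUN"]}
# Irregular-form lexicon: word form -> (effect, stem).
IRREGULAR = {"began": ("PAST_VERB", "begin"), "begun": ("PSP_VERB", "begin")}

def analyse(word):
    """Return candidate (stem, effects) analyses for a word form."""
    if word in IRREGULAR:              # irregular forms take priority
        effect, stem = IRREGULAR[word]
        return [(stem, [effect])]
    return [(word[:-len(suffix)], effects)
            for suffix, effects in SUFFIXES.items()
            if word.endswith(suffix)]

print(analyse("began"))   # [('begin', ['PAST_VERB'])]
print(analyse("walked"))  # [('walk', ['PAST_VERB', 'PSP_VERB'])]
```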
64. • The morphological analyser is split into two stages:
• the first stage concerns only morpheme forms;
• the second stage is closely coupled to the syntactic analysis.
65. Finite state automata for recognition
• The approach to spelling rules described here involves the use of finite
state transducers (FSTs).
• Suppose we want to recognize dates (just day and month pairs)
written in the format day/month. The day and the month may be
expressed as one or two digits (e.g. 11/2, 1/12 etc).
• This format corresponds to the following simple FSA, where each
character corresponds to one transition:
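Since the figure for this FSA is not reproduced in the text, the following Python sketch is one plausible reconstruction, with one transition per input character:

```python
# FSA for day/month dates: each input character is one transition.
# The state numbering is an assumption; the original figure is not shown.
def accepts_date(s):
    """Recognise strings of the form day/month, 1-2 digits each."""
    state = 0
    for ch in s:
        if state == 0 and ch.isdigit():
            state = 1                     # first day digit
        elif state == 1 and ch.isdigit():
            state = 2                     # optional second day digit
        elif state in (1, 2) and ch == "/":
            state = 3                     # separator
        elif state == 3 and ch.isdigit():
            state = 4                     # first month digit
        elif state == 4 and ch.isdigit():
            state = 5                     # optional second month digit
        else:
            return False                  # no valid transition
    return state in (4, 5)                # accepting states

print(accepts_date("11/2"))   # True
print(accepts_date("1/12"))   # True
print(accepts_date("123/4"))  # False
```

Note that, like any FSA over digit characters, this accepts impossible dates such as 99/99; ruling those out needs either many more states or a later check.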
67. Prediction and part-of-speech tagging
• Corpora
• A corpus (corpora is the plural) is simply a body of text that has been
collected for some purpose.
• A balanced corpus contains texts which represent different genres
(newspapers, fiction, textbooks, parliamentary reports, cooking
recipes, scientific papers etc.):
• Early examples were the Brown corpus (US English) and the
Lancaster-Oslo-Bergen (LOB) corpus (British English) which are each
about 1 million words: the more recent British National Corpus (BNC)
contains approx. 100 million words and includes 20 million words of
spoken English.
68. • When implementing an email answering application,
• it is essential to collect samples of representative emails.
• For interface applications in particular, collecting a corpus requires a
simulation of the actual application:
• generally this is done by a Wizard of Oz experiment, where a human
pretends to be a computer.
• Corpora are needed in NLP for two reasons.
• To evaluate algorithms on real language
• Providing the data source for many machine-learning approaches
69. Prediction
• The essential idea of prediction is,
• given a sequence of words,
• to determine what is most likely to come next.
• Used for language modelling for automatic speech recognition.
• Speech recognisers cannot accurately determine a word from the
sound signal.
• They cannot reliably tell where each word starts and finishes.
• The most probable word is chosen on the basis of the language model
70. • The language models which are currently most effective work on the
basis of N-grams (a type of Markov chain), where the sequence of the
prior n-1 words is used to predict the next.
• Trigram models use the preceding 2 words,
• bigram models use the preceding word and
• unigram models use no context at all, but simply work on the basis of
individual word probabilities.
71. • Word prediction is also useful in communication aids:
• Systems for people who can't speak because of some form of
disability.
• People who use text-to-speech systems to talk because of a non-
linguistic disability usually have some form of general motor
impairment which also restricts their ability to type at normal rates
(stroke, ALS, cerebral palsy etc).
72. • Often they use alternative input devices, such as adapted keyboards, puffer
switches, mouth sticks or eye trackers.
• Generally such users can only construct text at a few words a minute,
which is too slow for anything like normal communication to be possible
(normal speech is around 150 words per minute).
• As a partial aid, a word prediction system is sometimes helpful:
• This gives a list of candidate words that changes as the initial letters are
entered by the user.
• The user chooses the desired word from a menu when it appears.
• The main difficulty with using statistical prediction models in such
applications is in finding enough data: to be useful, the model really has to
be trained on an individual speaker's output.
73. • Prediction is important in estimation of entropy, including estimations
of the entropy of English.
• The notion of entropy is important in language modelling because it
gives a metric for the difficulty of the prediction problem.
• For instance, speech recognition is much easier in situations where
the speaker is only saying two easily distinguishable words than when
the vocabulary is unlimited: measurements of entropy can quantify
this.
• Other applications for prediction include optical character recognition
(OCR), spelling correction and text segmentation.
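The contrast can be made concrete with a small entropy calculation. This is only a sketch: real language-model entropy estimates are computed over word sequences, not a single distribution:

```python
import math

# Entropy H(p) = -sum_i p_i * log2(p_i), measured in bits.
def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Two equally likely words: 1 bit -- an easy recognition task.
print(entropy([0.5, 0.5]))    # 1.0
# Four equally likely words: 2 bits -- already harder.
print(entropy([0.25] * 4))    # 2.0
```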
74. Bigrams
• A bigram model assigns a probability to a word based on the previous
word, P(w_n | w_{n-1}), where w_n is the nth word in some string.
• The probability of a string of words P(w_1 ... w_n) is thus approximated
by the product of these conditional probabilities:
• P(w_1 ... w_n) ≈ ∏_{k=1}^{n} P(w_k | w_{k-1})
75. Example
• good morning
• good afternoon
• good afternoon
• it is very good
• it is good
• We'll use the symbol <s> to indicate the start of an utterance, so the
corpus really looks like:
• <s> good morning <s> good afternoon <s> good afternoon <s> it is
very good <s> it is good <s>
76. • The bigram probabilities are given as
• P(w_n | w_n-1) = C(w_n-1 w_n) / Σ_w C(w_n-1 w)
• the count of a particular bigram, normalized by dividing by the total
number of bigrams starting with the same word.
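A hedged sketch of these maximum-likelihood estimates, computed over the toy corpus from slide 75:

```python
from collections import Counter

corpus = ("<s> good morning <s> good afternoon <s> good afternoon "
          "<s> it is very good <s> it is good <s>")
tokens = corpus.split()

bigrams = Counter(zip(tokens, tokens[1:]))
# counts of bigram left-hand words w_{n-1} (the final token starts no bigram)
unigrams = Counter(tokens[:-1])

def p(word, prev):
    """Maximum-likelihood estimate P(word | prev) = C(prev word) / C(prev)."""
    return bigrams[(prev, word)] / unigrams[prev]

print(p("good", "<s>"))        # 3 of the 5 utterances start with 'good': 0.6
print(p("afternoon", "good"))  # 'good' is followed by 'afternoon' 2 times of 5: 0.4

# probability of a whole utterance as a product of bigram probabilities
prob = p("good", "<s>") * p("afternoon", "good")
print(prob)  # P(<s> good afternoon) ≈ 0.24
```

Real language models add smoothing so that unseen bigrams do not get probability zero; this sketch omits that.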
77. Part of Speech (POS) tagging
• One important application is to part-of-speech tagging (POS tagging), where the words in
a corpus are associated with a tag indicating some syntactic information that applies to
that particular use of the word.
• For instance, consider the example sentence below:
• They can fish.
• This has two readings: one (the most likely) about the ability to fish, the other
about putting fish in cans. fish is ambiguous between a singular noun, a plural
noun and a verb, while can is ambiguous between a singular noun, a verb (the
`put in cans' use) and a modal verb. However, they is unambiguously a pronoun.
(I am ignoring some less likely possibilities, such as proper names.)
• These distinctions can be indicated by POS tags:
• they PNP
• can VM0 VVB VVI NN1
• fish NN1 NN2 VVB VVI
78. • The meanings of the tags above are:
• NN1 singular noun
• NN2 plural noun
• PNP personal pronoun
• VM0 modal auxiliary verb
• VVB base form of verb (except infinitive)
• VVI infinitive form of verb (i.e. occurs with `to')
79. • A POS tagger resolves the lexical ambiguities to give the most likely
set of tags for the sentence. In this case, the right tagging is likely to
be:
• They PNP can VM0 fish VVB . PUN
• Note the tag for the full stop: punctuation is treated as unambiguous.
POS tagging can be regarded as a form of very basic word sense
disambiguation.
• The other syntactically possible reading is:
• They PNP can VVB fish NN2 . PUN
80. Stochastic POS tagging
• One form of POS tagging applies the N-gram technique that we saw
above, but in this case it applies to the POS tags rather than the
individual words.
• The most common approaches depend on a small amount of
manually tagged training data from which POS N-grams can be
extracted.
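A minimal sketch of the idea, with a tiny invented hand-tagged corpus. A real stochastic tagger (e.g. an HMM with Viterbi decoding, plus smoothing and unknown-word handling) does a global search; this version greedily picks, word by word, the tag maximizing P(tag | previous tag) × P(word | tag):

```python
from collections import Counter

# toy hand-tagged training data, invented for illustration
tagged = [
    ("<s>", "<s>"), ("they", "PNP"), ("can", "VM0"), ("fish", "VVB"), (".", "PUN"),
    ("<s>", "<s>"), ("they", "PNP"), ("fish", "VVB"), (".", "PUN"),
    ("<s>", "<s>"), ("the", "AT0"), ("can", "NN1"), (".", "PUN"),
    ("<s>", "<s>"), ("fresh", "AJ0"), ("fish", "NN2"), (".", "PUN"),
]

word_tag = Counter(tagged)                   # C(word, tag)
tags = [t for _, t in tagged]
tag_counts = Counter(tags)                   # C(tag)
tag_bigrams = Counter(zip(tags, tags[1:]))   # C(tag_{n-1}, tag_n)

def tag(words):
    """Greedy bigram tagging: argmax P(tag|prev) * P(word|tag) per word.
    No smoothing, so it only handles words seen in training."""
    prev, result = "<s>", []
    for w in words:
        candidates = [t for (w2, t) in word_tag if w2 == w]
        best = max(candidates,
                   key=lambda t: (tag_bigrams[(prev, t)] / tag_counts[prev]) *
                                 (word_tag[(w, t)] / tag_counts[t]))
        result.append(best)
        prev = best
    return result

print(tag(["they", "can", "fish"]))  # ['PNP', 'VM0', 'VVB'], as on slide 79
```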
81. Generative grammar
• concerned with the structure of phrases
• the big dog slept
• can be bracketed
• ((the (big dog)) slept)
82. Equivalent
• Two grammars are said to be weakly-equivalent if they generate the
same strings.
• Two grammars are strongly equivalent if they assign the same
bracketings to all strings they generate.
83. • In most, but not all, approaches, the internal structures are given
labels.
• For instance, the big dog is a noun phrase (abbreviated NP),
• slept, slept in the park and licked Sandy are verb phrases (VPs).
• The labels such as NP and VP correspond to non-terminal symbols in
a grammar.
84. Context free grammars
• A CFG has four components,
1. a set of non-terminal symbols (e.g., S, VP), conventionally written in
uppercase;
2. a set of terminal symbols (i.e., the words), conventionally written in
lowercase;
3. a set of rules (productions), where the left hand side (the mother) is
a single non-terminal and the right hand side is a sequence of one or
more non-terminal or terminal symbols (the daughters);
4. a start symbol, conventionally S, which is a member of the set of
non-terminal symbols.
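The four components can be written down directly as data. The sketch below (toy grammar; the derive helper is invented for illustration) also shows a leftmost derivation of they can fish:

```python
# the four components of a CFG for the 'they can fish' examples
nonterminals = {"S", "NP", "VP", "V"}
terminals = {"they", "can", "fish"}
rules = [("S", ["NP", "VP"]), ("VP", ["V", "VP"]), ("VP", ["V"]),
         ("NP", ["they"]), ("V", ["can"]), ("V", ["fish"])]
start = "S"

def derive(rule_indices):
    """Leftmost derivation: repeatedly rewrite the leftmost non-terminal
    using the rules named by index."""
    string = [start]
    for i in rule_indices:
        mother, daughters = rules[i]
        pos = next(j for j, sym in enumerate(string) if sym in nonterminals)
        assert string[pos] == mother, "rule does not apply here"
        string[pos:pos + 1] = daughters
    return string

# S => NP VP => they VP => they V VP => they can VP => they can V => they can fish
print(derive([0, 3, 1, 4, 2, 5]))  # ['they', 'can', 'fish']
```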
85. • they fish
• (S (NP they) (VP (V fish)))
• they can fish
• (S (NP they) (VP (V can) (VP (V fish))))
• ;;; the modal verb `are able to' reading
• (S (NP they) (VP (V can) (NP fish)))
• ;;; the less plausible, put fish in cans, reading
• they fish in rivers
• (S (NP they) (VP (VP (V fish)) (PP (P in) (NP rivers))))
86. Parse Trees
• Parse trees are equivalent to bracketed structures, but are easier to
read for complex cases.
• A parse tree and bracketed structure for one reading of they can fish
in December is shown below.
88. Chart Parsing
• A chart is a list of edges.
• In the simplest version of chart parsing, each edge records a rule
application and has the following structure:
• [id, left vertex, right vertex, mother category, daughters]
• A vertex is an integer representing a point in the input string, as illustrated
below:
. they . can . fish .
0 1 2 3
89. • mother category is the left hand side (the mother) of the rule that has
been applied to create the edge.
• daughters is a list of the edges that acted as the daughters for this
particular rule application.
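A minimal bottom-up chart parser along these lines (the grammar and helper names are toy choices for illustration); each edge has exactly the [id, left vertex, right vertex, mother, daughters] structure described above:

```python
from collections import namedtuple

# each edge records a rule application
Edge = namedtuple("Edge", "id left right mother daughters")

grammar = {  # mother -> possible daughter sequences (toy grammar)
    "S": [["NP", "VP"]],
    "VP": [["V", "VP"], ["V", "NP"], ["V"]],
    "NP": [["they"], ["fish"]],
    "V": [["can"], ["fish"]],
}

def chains(chart, cats):
    """All contiguous chains of existing edges matching the category list."""
    if not cats:
        yield []
        return
    for e in list(chart):
        if e.mother == cats[0]:
            for rest in chains(chart, cats[1:]):
                if not rest or rest[0].left == e.right:
                    yield [e] + rest

def parse(words):
    # vertices 0..len(words); lexical edges first: .they.can.fish. = 0 1 2 3
    chart = [Edge(i, i, i + 1, w, []) for i, w in enumerate(words)]
    seen = {(e.left, e.right, e.mother) for e in chart}
    added = True
    while added:  # keep applying rules until no new edge can be built
        added = False
        for mother, rhss in grammar.items():
            for rhs in rhss:
                for seq in list(chains(chart, rhs)):
                    left, right = seq[0].left, seq[-1].right
                    if (left, right, mother) not in seen:
                        seen.add((left, right, mother))
                        chart.append(Edge(len(chart), left, right, mother,
                                          [e.id for e in seq]))
                        added = True
    return chart

chart = parse(["they", "can", "fish"])
print(any(e.mother == "S" and e.left == 0 and e.right == 3 for e in chart))
```

For simplicity this chart keeps only one edge per (span, category), so it recognizes but does not enumerate all ambiguous analyses; a full chart parser packs the alternative daughter lists instead.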
90. Lexical Semantics
1. Meaning postulates
2. Classical lexical relations: hyponymy, meronymy, synonymy, antonymy
3. Taxonomies and WordNet
4. Classes of polysemy: homonymy, regular polysemy, vagueness
5. Word sense disambiguation
91. Meaning postulates
• Inference rules can be used to relate open class predicates.
• bachelor(x) -> man(x) ^ unmarried(x)
• But how would one define table, tomato or thought?
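The bachelor postulate can be applied mechanically by forward chaining over a small fact set (a toy sketch; real meaning postulates are stated in logic, not Python):

```python
# meaning postulate as an inference rule: bachelor(x) -> man(x) ^ unmarried(x)
postulates = {"bachelor": ["man", "unmarried"]}

def close(facts):
    """Forward-chain: add every predicate a meaning postulate licenses."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for pred, arg in list(facts):
            for new in postulates.get(pred, []):
                if (new, arg) not in facts:
                    facts.add((new, arg))
                    changed = True
    return facts

facts = close({("bachelor", "kim")})
print(sorted(facts))  # now includes ('man', 'kim') and ('unmarried', 'kim')
```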
92. Hyponymy
• Hyponymy is the classical IS-A relation
• e.g. dog is a hyponym of animal
• That is, the relevant sense of dog is the hyponym of animal
• animal is the hypernym of dog
93. • Classes of words can be categorized by hyponymy
• Some nouns - biological taxonomies, human artefacts, professions etc
• Abstract nouns, such as truth,
• Some verbs can be treated as hyponyms of one another, e.g.
murder is a hyponym of kill
• Hyponymy is essentially useless for adjectives.
• Is coin a hyponym of money?
• coin might be metal and also money - multiple parents might be
possible
94. Meronymy
• The standard examples of meronymy apply to physical relationships
• arm is part of a body (arm is a meronym of body);
• steering wheel is a meronym of car
• a soldier is part of an army
95. Synonymy and Antonymy
• True synonyms are relatively uncommon
• thin, slim, slender, skinny
• Antonymy is mostly discussed with respect to adjectives
• e.g., big/little, like/dislike
96. WordNet
• WordNet is the main resource for lexical semantics for English that is
used in NLP, primarily because of its very large coverage and the fact
that it's freely available.
• The primary organisation of WordNet is into synsets: synonym sets
(near-synonyms). To illustrate this, the following is part of what
WordNet returns as an `overview' of red:
97. • wn red -over
• Overview of adj red The adj red has 6 senses (first 5 from tagged
texts)
• 1. (43) red, reddish, ruddy, blood-red, carmine, cerise, cherry, cherry-
red, crimson, ruby, ruby-red, scarlet -- (having any of numerous bright
or strong colors reminiscent of the color of blood or cherries or
tomatoes or rubies)
• 2. (8) red, reddish -- ((used of hair or fur) of a reddish brown color;
"red deer"; "reddish hair")
• Nouns in WordNet are organized by hyponymy
99. Polysemy
• The standard example is bank (river bank) vs bank (financial
institution) - a case of homonymy.
• teacher is intuitively general between male and female teachers
• Dictionaries are not much help
100. Word sense disambiguation
• Word sense disambiguation (WSD) is needed for most NL applications
that involve semantics.
• What's changed since the 1980s is that various statistical or machine-
learning techniques have been used to avoid hand-crafting rules.
• supervised learning
• unsupervised learning
• Machine readable dictionaries (MRDs)
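As a sketch of MRD-based WSD, a simplified Lesk-style disambiguator (the mini-dictionary below is invented for illustration): choose the sense whose dictionary definition shares the most words with the context.

```python
# invented mini-dictionary: word -> {sense name: definition}
senses = {
    "bank": {
        "bank/river": "sloping land beside a body of water",
        "bank/finance": "an institution that accepts money deposits",
    }
}

def disambiguate(word, context):
    """Pick the sense whose definition overlaps most with the context words."""
    context_words = set(context.lower().split())
    def overlap(sense):
        return len(context_words & set(senses[word][sense].split()))
    return max(senses[word], key=overlap)

print(disambiguate("bank", "they moored the boat by the water"))
print(disambiguate("bank", "she paid the money into her account"))
```

Real Lesk implementations also weight overlaps and expand definitions with related synsets; bag-of-words overlap on raw definitions is the bare minimum.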
101. Discourse
• Utterances are always understood in a particular context. Context-
dependent situations include:
1. Referring expressions: pronouns, definite expressions etc.
2. Universe of discourse: every dog barked, doesn't mean every dog in
the world but only every dog in some explicit or implicit contextual set.
3. Responses to questions, etc: only make sense in a context:
Who came to the party? Not Sandy.
4. Implicit relationships between events: Max fell. John pushed him.
102. Rhetorical relations and coherence
• Consider the following discourse: Max fell. John pushed him.
• This discourse can be interpreted in at least two ways:
1. Max fell because John pushed him.
2. Max fell and then John pushed him.
• In 1 the link is a form of explanation, but 2 is an example of narration.
• because and and then are said to be cue phrases.
103. Coherence
• Discourses have to have connectivity to be coherent:
• Kim got into her car. Sandy likes apples.
• Both of these sentences make perfect sense in isolation, but taken
together they are incoherent.
• Adding context can restore coherence:
• Kim got into her car. Sandy likes apples, so Kim thought she'd go to
the farm shop and see if she could get some.
• The second sentence can be interpreted as an explanation of the first.
104. • For example, consider a system that reports share prices. This might
generate:
• In trading yesterday: Dell was up 4.2%, Safeway was down 3.2%,
Compaq was up 3.1%.
• This is much less acceptable than a connected discourse:
• Computer manufacturers gained in trading yesterday: Dell was up
4.2% and Compaq was up 3.1%. But retail stocks suffered: Safeway
was down 3.2%.
• Here but indicates a Contrast.
105. Factors influencing discourse interpretation
• Cue phrases. These are sometimes unambiguous, but not usually
(e.g., and).
• Punctuation (also prosody) and text structure.
• Max fell (John pushed him) and Kim laughed.
• Max fell, John pushed him and Kim laughed.
• Real world content:
• Max fell. John pushed him as he lay on the ground.
• Tense and aspect.
• Max fell. John had pushed him.
• Max was falling. John pushed him.
106. Referring expressions
• referent: a real world entity that some piece of text (or speech) refers
to.
• referring expression: a bit of language used by a speaker to perform
reference.
• antecedent: the text initially evoking a referent.
• anaphora: the phenomenon of referring back to an antecedent.
107. Pronoun agreement
• Pronouns generally have to agree in number and gender with their
antecedents.
• A little girl is at the door - see what she wants, please?
• My dog has hurt his foot - he is in a lot of pain.
108. Reflexives
• Johni cut himself shaving.
(himself = John, subscript notation used to indicate this)
• reflexive pronouns must be coreferential with a preceding argument
of the same verb.
109. Pleonastic pronouns
• Pleonastic pronouns are semantically empty, and don't refer:
• It is snowing.
• It is not easy to think of good examples.
• It is obvious that Kim snores.
110. Salience
• There are a number of effects which cause particular pronoun
referents to be preferred.
• Recency More recent referents are preferred
• Kim has a fast car. Sandy has an even faster one. Lee likes to drive it.
• it preferentially refers to Sandy's car, rather than Kim's.
• Grammatical role Subjects > objects > everything else:
• Fred went to the Grafton Centre with Bill. He bought a CD.
• he is more likely to be interpreted as Fred than as Bill.
111. • Repeated mention Entities that have been mentioned more
frequently are preferred.
• Fred was getting bored. He decided to go shopping. Bill went to the Grafton
Centre with Fred. He bought a CD.
• He=Fred (maybe) despite the general preference for subjects
• Parallelism Entities which share the same role as the pronoun in the
same sort of sentence are preferred
• Bill went with Fred to the Grafton Centre. Kim went with him to Lion Yard.
• Him=Fred, because the parallel interpretation is preferred.
112. • Coherence effects The pronoun resolution may depend on the
rhetorical/discourse relation that is inferred.
• Bill likes Fred. He has a great sense of humour.
• He = Fred preferentially, possibly because the second sentence is
interpreted as an explanation of the first, and having a sense of
humour is seen as a reason to like someone.
113. Applications
• Machine translation (MT)
• Spoken dialogue systems
• Approaches to MT:
• Direct transfer: map between morphologically analysed structures.
• Syntactic transfer: map between syntactically analysed structures.
• Semantic transfer: map between semantic structures.
• Interlingua: construct a language-neutral representation from parsing and use
this for generation.
114. MT using semantic transfer
Semantic transfer is an approach to MT which involves:
1. Parsing a source language string to produce a meaning
representation.
2. Transforming that representation to one appropriate for the target
language.
3. Generating a target language string from the transformed representation.
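The three stages can be caricatured in a few lines (all representations, names and rule tables below are invented; real systems use rich semantic structures, not flat tuples):

```python
def parse_source(sentence):
    """Stage 1: 'parse' a toy English sentence into a predicate tuple."""
    subj, verb = sentence.split()
    return ("pred", verb, subj)

TRANSFER = {"sleeps": "schlaeft"}  # toy predicate-level transfer rules

def transfer(meaning):
    """Stage 2: map source-language predicates to target-language ones."""
    tag, verb, subj = meaning
    return (tag, TRANSFER[verb], subj)

def generate_target(meaning):
    """Stage 3: generate a German string from the transformed representation."""
    _, verb, subj = meaning
    return f"{subj} {verb}"

print(generate_target(transfer(parse_source("Kim sleeps"))))  # Kim schlaeft
```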
115. Dialogue Systems
Turn-taking: generally there are points where a speaker invites someone else
to take a turn (possibly choosing a specific person), explicitly (e.g., by asking a
question) or otherwise.
Pauses: pauses between turns are generally very short (a few hundred
milliseconds, but highly culture specific).
• Longer pauses are assumed to be meaningful: example from Levinson
(1983: 300)
A: Is there something bothering you or not? (1.0 sec pause)
A: Yes or no? (1.5 sec pause)
A: Eh?
B: No.
116. • Overlap: Utterances can overlap (the acceptability of this is
dialect/culture specific); unfortunately, humans tend to interrupt
automated systems (this is known as barge-in).
• Backchannel: Utterances like Uh-huh, OK can occur during other
speaker's utterance as a sign that the hearer is paying attention.
• Attention: The speaker needs reassurance that the hearer is
understanding/paying attention. Often eye contact is enough, but this
is problematic with telephone conversations, dark sunglasses, etc.
Dialogue systems should give explicit feedback.
• Cooperativity: Because participants assume the others are
cooperative, we get effects such as indirect answers to questions.
• When do you want to leave?
• My meeting starts at 3pm.
117. 1. Single initiative systems (also known as system initiative systems):
system controls what happens when.
• System: Which station do you want to leave from?
• User: King's Cross
2. Mixed initiative dialogue. Both participants can control the dialogue
to some extent.
• System: Which station do you want to leave from?
• User: I don't know, tell me which station I need for Cambridge.
3. Dialogue tracking
118. Email response
• The LinGO ERG constraint-based grammar has been used for parsing in
commercially-deployed email response systems.
Well, I am going on vacation for the next two weeks, so the first day that I
would be able to meet would be the eighteenth.
If you give me your name and your address we will send you the ticket.
The morning is good, but nine o'clock might be a little too late, as I have a
seminar at ten o'clock.
119. References
• James Allen, Natural Language Understanding, Second Edition,
Benjamin-Cummings Publishing Co., Inc. USA, 1995.
• Eugene Charniak, Statistical Language Learning, MIT Press, USA,
1996.
• Daniel Jurafsky and James H. Martin, Speech and Language
Processing, Second Edition, Prentice Hall, 2008.
• Christopher D. Manning and Hinrich Schütze, Foundations of
Statistical Natural Language Processing, MIT Press, 1999.