Prof. Deptii Chaudhari, Department of Computer Engineering, I2IT
Lecture Notes - Are Natural Languages Regular?
This is an important question for two reasons: first, it places an upper bound on the running time of
algorithms that process natural language; second, it may tell us something about human language
processing and language acquisition.
To answer this question let us first understand…
• What is a language (natural language / formal language)?
• What is a regular language?
• What are regular grammars?
What is a natural language?
A natural language is a human communication system. A natural language can be thought of as a
mutually understandable communication system that is used between members of some population.
When communicating, speakers of a natural language are tacitly agreeing on what strings are
allowed (i.e., which strings are grammatical). Dialects and specialized languages (including e.g.,
the language used on social media) are all natural languages in their own right.
Named languages that you are familiar with, such as French, Chinese and English, are usually
labels for populations of speakers, derived historically, politically or geographically.
A natural language has high ambiguity.
Example: I made her duck
1. I cooked waterfowl for her.
2. I cooked waterfowl belonging to her.
3. I created the (plaster?) duck she owns.
4. I caused her to quickly lower her head.
5. I turned her into a duck.
Several types of ambiguity combine to cause many meanings:
• morphological (her can be a dative pronoun or possessive pronoun and duck can be a noun
or a verb)
• syntactic (make can behave both transitively and ditransitively; make can select a direct
object or a verb)
• semantic (make can mean create, cause, cook ...)
What is a formal language?
A formal language is a set of strings over an alphabet.
Alphabet: An alphabet is specified by a finite set, ∑ , whose elements are called symbols. Some
examples are shown below:
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9} the 10-element set of decimal digits.
{a, b, c, …. x, y, z} the 26-element set of lower-case characters of written English.
{aardvark, ….. zebra} the 250,000-element set of words in the Oxford English Dictionary.
The set of natural numbers N = {0, 1, 2, 3, ….} cannot be an alphabet because it is infinite.
Strings: A string of length n over an alphabet ∑ is an ordered n-tuple of elements of ∑.
∑ * denotes the set of all strings over ∑ of finite length.
If ∑ = {a, b} then ∊, ba, bab, aab are examples of strings over ∑.
If ∑ = {a} then ∑ * = {∊, a, aa, aaa, ….}
If ∑ = {cats, dogs, eat} then
∑ * = {∊, cats, cats eat, cats eat dogs, …..}
Languages: Given an alphabet ∑ any subset of ∑ * is a formal language over alphabet ∑.
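The definitions above can be made concrete with a small sketch. It assumes Python, represents a string as a tuple of symbols, and uses a helper name (`strings_up_to`) that is purely illustrative. Since ∑ * is infinite, only the strings up to a chosen length are listed.

```python
from itertools import product

def strings_up_to(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            yield symbols  # a string of length n is an ordered n-tuple of symbols

sigma = ["a", "b"]  # example alphabet
sigma_star_prefix = list(strings_up_to(sigma, 2))

# A formal language is any subset of Sigma*; here, the strings that start with "a".
language = [w for w in sigma_star_prefix if w[:1] == ("a",)]
```

The empty tuple `()` plays the role of the empty string ∊, and any other filter on `sigma_star_prefix` would define a different formal language over the same alphabet.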
What is a regular language?
A language is regular if it is equal to the set of strings accepted by some deterministic finite-state
automaton (DFA).
Regular languages are accepted by DFAs.
Given a DFA M = (Q, ∑, ∆, s, F), the language L(M) of strings accepted by M can be generated by
the regular grammar Greg = (N, ∑, S, P) where:
N = Q the non-terminals are the states of M
∑ = ∑ the terminals are the transition symbols of M
S = s the starting symbol is the starting state of M
P contains qi → a qj whenever ∆(qi, a) = qj
and qi → ∊ whenever qi ∊ F (i.e. when qi is an accepting state)
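The construction can be sketched in a few lines of Python. The function name `dfa_to_right_linear`, the dictionary encoding of the transition function and the example DFA are all assumptions made for illustration, not part of the notes.

```python
def dfa_to_right_linear(delta, finals):
    """Build the productions of a right-linear grammar from a DFA.

    delta  : dict mapping (state, symbol) -> next state
    finals : set of accepting states
    Returns a sorted list of (lhs, rhs) pairs; rhs is a tuple, () meaning epsilon.
    """
    productions = set()
    for (q_i, a), q_j in delta.items():
        productions.add((q_i, (a, q_j)))  # q_i -> a q_j for each transition
    for q in finals:
        productions.add((q, ()))          # q -> epsilon for each accepting state
    return sorted(productions)

# Hypothetical DFA over {a, b} accepting the strings that end in "b".
delta = {
    ("q0", "a"): "q0", ("q0", "b"): "q1",
    ("q1", "a"): "q0", ("q1", "b"): "q1",
}
rules = dfa_to_right_linear(delta, finals={"q1"})
```

Each state becomes a non-terminal, so the grammar has one rule per transition plus one ∊-rule per accepting state.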
In order to derive a string from a grammar
• start with the designated starting symbol
• then non-terminal symbols are repeatedly expanded using the rewrite rules until there is
nothing further left to expand.
The rewrite rules derive the members of a language from their internal structure (or phrase
structure).
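The derivation procedure can be sketched as a breadth-first expansion. The grammar `RULES` below is a hypothetical right-linear grammar for strings over {a, b} that end in b, and the length bound is only there to keep the enumeration finite.

```python
from collections import deque

# Hypothetical right-linear grammar; each right-hand side is
# (terminal, non-terminal) or () for epsilon.
RULES = {
    "S": [("a", "S"), ("b", "T")],
    "T": [("a", "S"), ("b", "T"), ()],
}

def derive(rules, start, max_len):
    """Return all terminal strings of length <= max_len derivable from `start`."""
    results = set()
    queue = deque([("", start)])        # terminals emitted so far + pending non-terminal
    while queue:
        prefix, nonterminal = queue.popleft()
        for rhs in rules[nonterminal]:
            if rhs == ():               # X -> epsilon: nothing left to expand
                results.add(prefix)
            elif len(prefix) < max_len: # X -> a Y: emit a, keep expanding Y
                terminal, nxt = rhs
                queue.append((prefix + terminal, nxt))
    return sorted(results)

derived = derive(RULES, "S", 2)
```

Because every rule has at most one non-terminal and it sits at the right edge, a sentential form is fully described by the pair (terminals so far, pending non-terminal).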
Every regular language has both a right-linear and a left-linear grammar.
For every regular language there is a grammar whose rewrite rules can all be expressed in the form:
X → aY
X → a
or alternatively, they can all be expressed as:
X → Ya
X → a
(with an additional rule S → ∊ if the empty string belongs to the language).
A phrase structure grammar over an alphabet ∑ is defined by a tuple G = (N, ∑, S, P). The language
generated by grammar G is written L(G). The components of G are:
Non-terminals N: Non-terminal symbols (often uppercase letters) may be rewritten using the rules
of the grammar.
Terminals ∑: Terminal symbols (often lowercase letters) are elements of ∑ and cannot be rewritten.
Note that N ∩ ∑ = ∅.
Start Symbol S: A distinguished non-terminal symbol S ∊ N. This non-terminal provides the starting
point for derivations.
Phrase Structure Rules P: Phrase structure rules are pairs of the form (w, v), usually written:
w → v, where w ∊ (∑ ∪ N)*N(∑ ∪ N)* and v ∊ (∑ ∪ N)*
Now let us try to answer the question: can regular grammars model natural language?
It turns out that regular grammars have limitations when modelling natural languages, for the following
reasons:
• Centre Embedding
• Redundancy
• Useful internal structures
Problems using regular grammars for natural language
1. Centre Embedding
In principle, the syntax of natural languages cannot be described by a regular language due to the
presence of centre-embedding; i.e. unboundedly recursive structures described by the rule A → αAβ,
which generate strings of the form aⁿbⁿ.
For instance, the sentences below have a centre-embedded structure.
1. The students the police arrested complained.
2. The luggage that the passengers checked arrived.
3. The luggage that the passengers that the storm delayed checked arrived.
Intuitively, the reason that a regular language cannot describe centre-embedding is that its
associated automaton has only a fixed, finite memory (its current state) of what has occurred previously in a string.
In order to ‘know’ that n verbs were required to match n nominals already seen, an automaton would
need to ‘record’ that n nominals had been seen; but a DFA has no mechanism to do this for arbitrarily large n.
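Formally, we can use the pumping lemma to show that the language of strings of the form aⁿbⁿ
is not regular.
The pumping lemma for regular languages is used to prove that a language is not regular. It states that
for every regular language L there is a length l ≥ 1 such that
all w ∊ L with |w| ≥ l can be expressed as a concatenation of three strings, w = u1 v u2, where u1, v
and u2 satisfy:
|v| ≥ 1 (i.e. v ≠ ∊)
|u1 v| ≤ l
for all n ≥ 0, u1 vⁿ u2 ∊ L (i.e. u1 u2 ∊ L, u1 v u2 ∊ L, u1 v v u2 ∊ L, u1 v v v u2 ∊ L, etc.)
For aⁿbⁿ, take w to be l copies of a followed by l copies of b: the condition |u1 v| ≤ l forces v to
consist only of a's, so pumping v changes the number of a's but not the number of b's, and the result
is no longer in the language.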
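The pumping lemma property can be checked mechanically for one candidate pumping length. This is a minimal sketch: the function names, the choice l = 4 and the bound `max_power` are assumptions, and a run for a single l illustrates the argument without proving it for every l.

```python
def in_anbn(w):
    """Membership test for { a^n b^n : n >= 0 }."""
    n = len(w) // 2
    return len(w) % 2 == 0 and w == "a" * n + "b" * n

def survives_pumping(w, l, member, max_power=3):
    """True if some split w = u1 v u2 with |v| >= 1 and |u1 v| <= l
    keeps u1 v^n u2 in the language for every n in 0..max_power."""
    for i in range(0, l):              # i = |u1|
        for j in range(i + 1, l + 1):  # j = |u1 v|, so v is non-empty
            u1, v, u2 = w[:i], w[i:j], w[j:]
            if all(member(u1 + v * n + u2) for n in range(max_power + 1)):
                return True
    return False

l = 4                                  # candidate pumping length (assumption)
w = "a" * l + "b" * l                  # witness string with |w| >= l
pumpable = survives_pumping(w, l, in_anbn)
```

Here `pumpable` comes out `False`: every admissible v lies inside the block of a's, so already n = 0 produces a string outside the language. A `False` result is conclusive for that l even though only finitely many powers are tried, whereas a `True` result would only be suggestive.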
Regular languages are closed under intersection: if you intersect a regular language with another
regular language, you always get a third regular language.
Lreg1 ∩ Lreg2 = Lreg3
Regular languages are also closed under homomorphism (we can map all nouns to a and all verbs
to b).
So if English is regular and we intersect it with another regular language (e.g. the one described by
/the a (that the a)* b*/) we should get another regular language:
if Leng is regular then Leng ∩ La*b* = Lreg3
However, the intersection of this a*b* language with English is aⁿbⁿ (in our example specifically /the a
(that the a)ⁿ⁻¹ bⁿ/), which is not regular as it fails the pumping lemma property:
but Leng ∩ La*b* = Laⁿbⁿ (which is not regular)
The assumption that English is regular must be incorrect.
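The shape of this argument can be sketched in Python. The regular filter is the one given in the text; `looks_grammatical` is only a hard-coded stand-in for English grammaticality judgments (an assumption, not a parser), so the sketch illustrates the intersection rather than proving anything about English.

```python
import re

# Regular filter from the text, over the homomorphic image (nouns -> a, verbs -> b).
FILTER = re.compile(r"the a( that the a)*( b)*")

def looks_grammatical(sentence):
    """Stand-in for English judgments (assumption): a centre-embedded
    sentence needs exactly one verb per noun."""
    tokens = sentence.split()
    return tokens.count("a") == tokens.count("b")

def candidate(nouns, verbs):
    """Build 'the a (that the a)^(nouns-1) b^verbs'."""
    return " ".join(["the a"] + ["that the a"] * (nouns - 1) + ["b"] * verbs)

# Keep the (nouns, verbs) pairs that pass both the regular filter and the stand-in.
pairs = [(i, j) for i in range(1, 4) for j in range(0, 4)
         if FILTER.fullmatch(candidate(i, j)) and looks_grammatical(candidate(i, j))]
```

Every candidate passes the regular filter, but only those with equal counts survive the second test, which is the aⁿbⁿ pattern that no DFA can accept.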
2. Redundancy
Grammars written using regular grammar rules alone are highly redundant: since the rules are very
simple we need a great many of them to describe the language. This makes regular grammars very
difficult to build and maintain.
3. Useful internal structures
There are instances where a regular grammar can recognize the strings of a language but in doing
so does not provide a structure that is linguistically useful to us. The left-linear or right-linear
internal structures derived by regular grammars are generally not very useful for higher level NLP
applications.
We need informative internal structure so that we can, for example, build up good semantic
representations.
In practice, regular grammars can be useful for partial grammars (i.e. when we don’t need to know
the syntax tree for the whole sentence but rather just some part of it) and also when we don’t care
about derivational structure (i.e. when we just want a Boolean for whether a string is in a language).
For example, in information extraction, we need to recognize named entities.
The internal structure of named entities is normally unimportant to us; we just want to recognize
when we encounter them.
For instance, using rules such as:
NP → nnsb NP
NP → np1 NP
NP → np1
where NP is a non-terminal and nnsb and np1 are terminals representing tags from the large tagset,
you could match a titled name like Prof. Stephen William Hawking.
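Because these rules are right-linear, they are equivalent to the regular expression (nnsb | np1)* np1 over tags. A minimal Python sketch, assuming the tagger has already produced the tag sequence (the tagging of the example name below is an assumption):

```python
import re

# The three rules NP -> nnsb NP | np1 NP | np1 generate (nnsb | np1)* np1.
NP_TAGS = re.compile(r"(?:(?:nnsb|np1) )*np1")

def is_titled_name(tags):
    """True if the tag sequence can be derived from NP."""
    return NP_TAGS.fullmatch(" ".join(tags)) is not None

# Assumed tagging of "Prof. Stephen William Hawking".
tags = ["nnsb", "np1", "np1", "np1"]
matched = is_titled_name(tags)
```

The function returns only a Boolean, which is all that is needed when the internal structure of the entity does not matter.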
For every natural language that exists, can we find a context-free grammar to generate it?
There is some evidence that natural language can contain cross-serial dependencies. A small
number of languages exhibit strings of the form shown below.
There is a Zurich dialect of Swiss German in which constructions like the following are found:
mer d’chind em Hans es huus haend wele laa hälfe aastriiche.
we the children Hans the house have wanted to let help paint.
we have wanted to let the children help Hans paint the house.
Such expressions may not be derivable by a context-free grammar.
Where do natural languages fit in the Chomsky hierarchy?
If we are to use formal grammars to represent natural language, it is useful to know where they
appear in the Chomsky hierarchy. With respect to natural language, it might turn out that the set of
all attested natural languages is actually as depicted in the figure.
The overlap with the context-sensitive languages accounts for those languages that have
cross-serial dependencies.
Natural languages are infinite sets of sentences constructed out of a finite set of characters.
The number of words in a sentence has no defined upper limit either.
When natural languages are analysed into their component parts, they are usually broken down
into four parts: syntax, semantics, morphology and phonology.
Natural languages are believed to be at least context-free. However, Dutch and Swiss German
contain grammatical constructions with cross-serial dependencies, which a context-free grammar
cannot generate and which therefore call for context-sensitive power.
Extensions to the Chomsky hierarchy that find relevance in NLP
There are two extensions to the traditional Chomsky hierarchy that have proved useful in linguistics
and cognitive science:
Mildly context-sensitive languages – CFGs are not adequate (weakly or strongly) to characterize
some aspects of language structure. To derive extra power beyond CFGs, a grammatical formalism
called Tree Adjoining Grammar (TAG) was proposed as an approximate characterization of mildly
context-sensitive grammars. TAG adds a tree composition operation called ‘adjoining’.
Another classification called Minimalist Grammars (MG) describes an even larger class of formal
languages.
Sub-regular languages
A sub-regular language is a set of strings that can be described without employing the full power of
finite state automata. Many aspects of human language are manifestly sub-regular, such as some
‘strictly local’ dependencies.
Example – identifying recurring sub-string patterns within words is one such common application.
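A strictly local dependency can be checked by looking only at adjacent symbols. The sketch below assumes a strictly 2-local language given by a set of forbidden symbol pairs; the function name and the particular constraints are hypothetical.

```python
def is_strictly_2_local(word, forbidden):
    """Accept `word` iff none of its adjacent symbol pairs is forbidden.
    '#' marks the word boundaries so that edges can be constrained too."""
    padded = "#" + word + "#"
    return all(padded[i:i + 2] not in forbidden for i in range(len(padded) - 1))

# Hypothetical constraints: no "bb" inside a word, no word-final "a".
FORBIDDEN = {"bb", "a#"}
ok = is_strictly_2_local("abab", FORBIDDEN)
bad = is_strictly_2_local("abba", FORBIDDEN)
```

The check never needs to remember more than the previous symbol, which is why such patterns require less than the full power of a finite-state automaton.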

Weitere ähnliche Inhalte

Was ist angesagt?

NLP_KASHK:Parsing with Context-Free Grammar
NLP_KASHK:Parsing with Context-Free Grammar NLP_KASHK:Parsing with Context-Free Grammar
NLP_KASHK:Parsing with Context-Free Grammar Hemantha Kulathilake
 
Chapter 3 -Syntax Analyzer.ppt
Chapter 3 -Syntax Analyzer.pptChapter 3 -Syntax Analyzer.ppt
Chapter 3 -Syntax Analyzer.pptFamiDan
 
Syntactic analysis in NLP
Syntactic analysis in NLPSyntactic analysis in NLP
Syntactic analysis in NLPkartikaVashisht
 
Context Free Grammar
Context Free GrammarContext Free Grammar
Context Free GrammarAkhil Kaushik
 
Push Down Automata (PDA) | TOC (Theory of Computation) | NPDA | DPDA
Push Down Automata (PDA) | TOC  (Theory of Computation) | NPDA | DPDAPush Down Automata (PDA) | TOC  (Theory of Computation) | NPDA | DPDA
Push Down Automata (PDA) | TOC (Theory of Computation) | NPDA | DPDAAshish Duggal
 
Introduction to prolog
Introduction to prologIntroduction to prolog
Introduction to prologHarry Potter
 
Suffix Tree and Suffix Array
Suffix Tree and Suffix ArraySuffix Tree and Suffix Array
Suffix Tree and Suffix ArrayHarshit Agarwal
 
Introduction to natural language processing
Introduction to natural language processingIntroduction to natural language processing
Introduction to natural language processingMinh Pham
 
2.1 & 2.2 grammar introduction – types of grammar
2.1 & 2.2 grammar introduction – types of grammar2.1 & 2.2 grammar introduction – types of grammar
2.1 & 2.2 grammar introduction – types of grammarSampath Kumar S
 
Natural Language Processing in AI
Natural Language Processing in AINatural Language Processing in AI
Natural Language Processing in AISaurav Shrestha
 
Natural Language Processing: Parsing
Natural Language Processing: ParsingNatural Language Processing: Parsing
Natural Language Processing: ParsingRushdi Shams
 

Was ist angesagt? (20)

Natural language processing
Natural language processingNatural language processing
Natural language processing
 
NLP_KASHK:Parsing with Context-Free Grammar
NLP_KASHK:Parsing with Context-Free Grammar NLP_KASHK:Parsing with Context-Free Grammar
NLP_KASHK:Parsing with Context-Free Grammar
 
Chapter 3 -Syntax Analyzer.ppt
Chapter 3 -Syntax Analyzer.pptChapter 3 -Syntax Analyzer.ppt
Chapter 3 -Syntax Analyzer.ppt
 
Syntactic analysis in NLP
Syntactic analysis in NLPSyntactic analysis in NLP
Syntactic analysis in NLP
 
Context Free Grammar
Context Free GrammarContext Free Grammar
Context Free Grammar
 
Finite automata
Finite automataFinite automata
Finite automata
 
Push Down Automata (PDA) | TOC (Theory of Computation) | NPDA | DPDA
Push Down Automata (PDA) | TOC  (Theory of Computation) | NPDA | DPDAPush Down Automata (PDA) | TOC  (Theory of Computation) | NPDA | DPDA
Push Down Automata (PDA) | TOC (Theory of Computation) | NPDA | DPDA
 
NLP_KASHK:Minimum Edit Distance
NLP_KASHK:Minimum Edit DistanceNLP_KASHK:Minimum Edit Distance
NLP_KASHK:Minimum Edit Distance
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Introduction to prolog
Introduction to prologIntroduction to prolog
Introduction to prolog
 
NLP
NLPNLP
NLP
 
Suffix Tree and Suffix Array
Suffix Tree and Suffix ArraySuffix Tree and Suffix Array
Suffix Tree and Suffix Array
 
Introduction to natural language processing
Introduction to natural language processingIntroduction to natural language processing
Introduction to natural language processing
 
2.1 & 2.2 grammar introduction – types of grammar
2.1 & 2.2 grammar introduction – types of grammar2.1 & 2.2 grammar introduction – types of grammar
2.1 & 2.2 grammar introduction – types of grammar
 
NLP_KASHK:Text Normalization
NLP_KASHK:Text NormalizationNLP_KASHK:Text Normalization
NLP_KASHK:Text Normalization
 
Recognition-of-tokens
Recognition-of-tokensRecognition-of-tokens
Recognition-of-tokens
 
Natural Language Processing in AI
Natural Language Processing in AINatural Language Processing in AI
Natural Language Processing in AI
 
NLP_KASHK:Morphology
NLP_KASHK:MorphologyNLP_KASHK:Morphology
NLP_KASHK:Morphology
 
Natural Language Processing: Parsing
Natural Language Processing: ParsingNatural Language Processing: Parsing
Natural Language Processing: Parsing
 
Pthread
PthreadPthread
Pthread
 

Ähnlich wie Lecture Notes-Are Natural Languages Regular.pdf

Ch3 4 regular expression and grammar
Ch3 4 regular expression and grammarCh3 4 regular expression and grammar
Ch3 4 regular expression and grammarmeresie tesfay
 
INFO-2950-Languages-and-Grammars.ppt
INFO-2950-Languages-and-Grammars.pptINFO-2950-Languages-and-Grammars.ppt
INFO-2950-Languages-and-Grammars.pptLamhotNaibaho3
 
Types of Language in Theory of Computation
Types of Language in Theory of ComputationTypes of Language in Theory of Computation
Types of Language in Theory of ComputationAnkur Singh
 
Computational linguistics
Computational linguisticsComputational linguistics
Computational linguisticsshrey bhate
 
Free Ebooks Download ! Edhole
Free Ebooks Download ! EdholeFree Ebooks Download ! Edhole
Free Ebooks Download ! EdholeEdhole.com
 
Mba ebooks ! Edhole
Mba ebooks ! EdholeMba ebooks ! Edhole
Mba ebooks ! EdholeEdhole.com
 
01-Introduction&Languages.pdf
01-Introduction&Languages.pdf01-Introduction&Languages.pdf
01-Introduction&Languages.pdfTariqSaeed80
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingSaurabh Kaushik
 
Word level language identification in code-switched texts
Word level language identification in code-switched textsWord level language identification in code-switched texts
Word level language identification in code-switched textsHarsh Jhamtani
 
5810 oral lang anly transcr wkshp (fall 2014) pdf
5810 oral lang anly transcr wkshp (fall 2014) pdf  5810 oral lang anly transcr wkshp (fall 2014) pdf
5810 oral lang anly transcr wkshp (fall 2014) pdf SVTaylor123
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
Stemming algorithms
Stemming algorithmsStemming algorithms
Stemming algorithmsRaghu nath
 
ToC_M1L3_Grammar and Derivation.pdf
ToC_M1L3_Grammar and Derivation.pdfToC_M1L3_Grammar and Derivation.pdf
ToC_M1L3_Grammar and Derivation.pdfjaishreemane73
 
Алексей Чеусов - Расчёсываем своё ЧСВ
Алексей Чеусов - Расчёсываем своё ЧСВАлексей Чеусов - Расчёсываем своё ЧСВ
Алексей Чеусов - Расчёсываем своё ЧСВMinsk Linux User Group
 
Final formal languages
Final formal languagesFinal formal languages
Final formal languagesMegha Khanna
 
Formal language
Formal languageFormal language
Formal languageRajendran
 
7. ku gr.sem 2013: Syntax
7. ku gr.sem 2013: Syntax7. ku gr.sem 2013: Syntax
7. ku gr.sem 2013: SyntaxTikaram Poudel
 

Ähnlich wie Lecture Notes-Are Natural Languages Regular.pdf (20)

Regular expression
Regular expressionRegular expression
Regular expression
 
Ch3 4 regular expression and grammar
Ch3 4 regular expression and grammarCh3 4 regular expression and grammar
Ch3 4 regular expression and grammar
 
INFO-2950-Languages-and-Grammars.ppt
INFO-2950-Languages-and-Grammars.pptINFO-2950-Languages-and-Grammars.ppt
INFO-2950-Languages-and-Grammars.ppt
 
Types of Language in Theory of Computation
Types of Language in Theory of ComputationTypes of Language in Theory of Computation
Types of Language in Theory of Computation
 
Computational linguistics
Computational linguisticsComputational linguistics
Computational linguistics
 
Free Ebooks Download ! Edhole
Free Ebooks Download ! EdholeFree Ebooks Download ! Edhole
Free Ebooks Download ! Edhole
 
Mba ebooks ! Edhole
Mba ebooks ! EdholeMba ebooks ! Edhole
Mba ebooks ! Edhole
 
01-Introduction&Languages.pdf
01-Introduction&Languages.pdf01-Introduction&Languages.pdf
01-Introduction&Languages.pdf
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Word level language identification in code-switched texts
Word level language identification in code-switched textsWord level language identification in code-switched texts
Word level language identification in code-switched texts
 
5810 oral lang anly transcr wkshp (fall 2014) pdf
5810 oral lang anly transcr wkshp (fall 2014) pdf  5810 oral lang anly transcr wkshp (fall 2014) pdf
5810 oral lang anly transcr wkshp (fall 2014) pdf
 
Nlp
NlpNlp
Nlp
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 
Stemming algorithms
Stemming algorithmsStemming algorithms
Stemming algorithms
 
ToC_M1L3_Grammar and Derivation.pdf
ToC_M1L3_Grammar and Derivation.pdfToC_M1L3_Grammar and Derivation.pdf
ToC_M1L3_Grammar and Derivation.pdf
 
NLP-my-lecture (3).ppt
NLP-my-lecture (3).pptNLP-my-lecture (3).ppt
NLP-my-lecture (3).ppt
 
Алексей Чеусов - Расчёсываем своё ЧСВ
Алексей Чеусов - Расчёсываем своё ЧСВАлексей Чеусов - Расчёсываем своё ЧСВ
Алексей Чеусов - Расчёсываем своё ЧСВ
 
Final formal languages
Final formal languagesFinal formal languages
Final formal languages
 
Formal language
Formal languageFormal language
Formal language
 
7. ku gr.sem 2013: Syntax
7. ku gr.sem 2013: Syntax7. ku gr.sem 2013: Syntax
7. ku gr.sem 2013: Syntax
 

Kürzlich hochgeladen

Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...roncy bisnoi
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfKamal Acharya
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdfKamal Acharya
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...ranjana rawat
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college projectTonystark477637
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdfKamal Acharya
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
 

Kürzlich hochgeladen (20)

Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdf
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 

Lecture Notes-Are Natural Languages Regular.pdf

  • 1. Prof. Deptii Chaudhari, Department of Computer Engineering, I2IT
Lecture Notes - Are Natural Languages Regular?
This is an important question for two reasons: first, it places an upper bound on the running time of algorithms that process natural language; second, it may tell us something about human language processing and language acquisition.
To answer this question let us first understand…
• What is a language (natural language / formal language)?
• What is a regular language?
• What are regular grammars?
What is a natural language?
A natural language is a human communication system. A natural language can be thought of as a mutually understandable communication system that is used between members of some population. When communicating, speakers of a natural language are tacitly agreeing on what strings are allowed (i.e., which strings are grammatical). Dialects and specialized languages (including e.g., the language used on social media) are all natural languages in their own right.
Named languages that you are familiar with, such as French, Chinese, English etc., are usually historically, politically or geographically derived labels for populations of speakers.
A natural language has high ambiguity.
Example: I made her duck
1. I cooked waterfowl* for her.
2. I cooked waterfowl* belonging to her.
3. I created the (plaster?) duck she owns.
4. I caused her to quickly lower her head.
5. I turned her into a duck.
Several types of ambiguity combine to cause many meanings:
• morphological (her can be a dative pronoun or possessive pronoun and duck can be a noun or a verb)
• syntactic (make can behave both transitively and ditransitively; make can select a direct object or a verb)
• semantic (make can mean create, cause, cook ...)
What is a formal language?
A formal language is a set of strings over an alphabet.
Alphabet: An alphabet is specified by a finite set, ∑, whose elements are called symbols. Some examples are shown below:
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9} the 10-element set of decimal digits.
  • 2. {a, b, c, …, x, y, z} the 26-element set of lower-case characters of written English.
{aardvark, …, zebra} the 250,000-element set of words in the Oxford English Dictionary.
The set of natural numbers N = {0, 1, 2, 3, …} cannot be an alphabet because it is infinite.
Strings: A string of length n over an alphabet ∑ is an ordered n-tuple of elements of ∑. ∑* denotes the set of all strings over ∑ of finite length.
If ∑ = {a, b} then ∊, ba, bab, aab are examples of strings over ∑.
If ∑ = {a} then ∑* = {∊, a, aa, aaa, …}
If ∑ = {cats, dogs, eat} then ∑* = {∊, cats, cats eat, cats eat dogs, …}
Languages: Given an alphabet ∑, any subset of ∑* is a formal language over the alphabet ∑.
What is a regular language?
A language is regular if it is equal to the set of strings accepted by some deterministic finite-state automaton (DFA).
Given a DFA M = (Q, ∑, ∆, s, F), the language L(M) of strings accepted by M can be generated by the regular grammar Greg = (N, ∑, S, P) where:
N = Q: the non-terminals are the states of M
∑ = ∑: the terminals are the transition symbols of M
S = s: the starting symbol is the starting state of M
  • 3. P contains the rule qi → a qj whenever ∆ maps (qi, a) to qj, and the rule qi → ∊ whenever qi ∊ F (i.e. when qi is an end state).
In order to derive a string from a grammar:
• start with the designated starting symbol
• then repeatedly expand non-terminal symbols using the rewrite rules until there is nothing further left to expand.
The rewrite rules derive the members of a language from their internal structure (or phrase structure).
Every regular language has both a left-linear and a right-linear grammar. For every regular grammar the rewrite rules can all be expressed in the right-linear form:
X → aY
X → a
or alternatively, they can all be expressed in the left-linear form:
X → Ya
X → a
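The construction of a regular grammar from a DFA can be sketched in Python. This is a minimal illustration rather than part of the original notes: the two-state example DFA (accepting strings over {a, b} that end in b) and the names `dfa_to_grammar`, `derive` and `accepts` are assumptions chosen for the example. It builds the rules qi → a qj and qi → ∊ as described, then checks that the grammar derives exactly the strings the DFA accepts, up to length 3.

```python
from itertools import product

def dfa_to_grammar(states, alphabet, delta, start, finals):
    """Build a right-linear grammar (N, Sigma, S, P) from a DFA."""
    productions = []
    # One rule q -> a r for every transition delta(q, a) = r.
    for (q, a), r in sorted(delta.items()):
        productions.append((q, (a, r)))
    # One rule q -> epsilon (empty tuple) for every accepting state q.
    for q in sorted(finals):
        productions.append((q, ()))
    return {"N": set(states), "Sigma": set(alphabet), "S": start, "P": productions}

def derive(grammar, max_len):
    """Return every string of length <= max_len derivable from the start symbol."""
    results = set()
    frontier = [("", grammar["S"])]  # (terminals so far, pending non-terminal)
    while frontier:
        prefix, nonterminal = frontier.pop()
        for lhs, rhs in grammar["P"]:
            if lhs != nonterminal:
                continue
            if rhs == ():
                results.add(prefix)  # nothing left to expand
            elif len(prefix) < max_len:
                symbol, next_nonterminal = rhs
                frontier.append((prefix + symbol, next_nonterminal))
    return results

def accepts(delta, start, finals, word):
    """Run the DFA on word and report whether it ends in an accepting state."""
    state = start
    for symbol in word:
        state = delta[(state, symbol)]
    return state in finals

# Hypothetical example DFA: strings over {a, b} that end in b.
STATES = {"q0", "q1"}
ALPHABET = {"a", "b"}
DELTA = {("q0", "a"): "q0", ("q0", "b"): "q1",
         ("q1", "a"): "q0", ("q1", "b"): "q1"}
START, FINALS = "q0", {"q1"}

GRAMMAR = dfa_to_grammar(STATES, ALPHABET, DELTA, START, FINALS)
GENERATED = derive(GRAMMAR, 3)
ACCEPTED = {
    "".join(symbols)
    for n in range(4)
    for symbols in product(sorted(ALPHABET), repeat=n)
    if accepts(DELTA, START, FINALS, "".join(symbols))
}
```

Here `GENERATED` and `ACCEPTED` should be the same set, which is the sense in which the grammar and the automaton describe the same language.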
  • 4. A phrase structure grammar over an alphabet ∑ is defined by a tuple G = (N, ∑, S, P). The language generated by grammar G is written L(G). The components of G are:
Non-terminals N: Non-terminal symbols (often uppercase letters) may be rewritten using the rules of the grammar.
Terminals ∑: Terminal symbols (often lowercase letters) are elements of ∑ and cannot be rewritten. Note N ∩ ∑ = ∅.
Start Symbol S: A distinguished non-terminal symbol S ∊ N. This non-terminal provides the starting point for derivations.
Phrase Structure Rules P: Phrase structure rules are pairs of the form (w, v), usually written w → v, where w ∊ (∑ ∪ N)*N(∑ ∪ N)* and v ∊ (∑ ∪ N)*.
Now let's try to answer the question: can regular grammars model natural language?
It turns out that regular grammars have limitations when modelling natural languages, for the following reasons:
• Centre Embedding
• Redundancy
• Useful internal structures
Problems using regular grammars for natural language
1. Centre Embedding
  • 5. In principle, the syntax of natural languages cannot be described by a regular language due to the presence of centre-embedding, i.e. unboundedly recursive structures described by a rule of the form A → αAβ, which generates strings of the form a^n b^n.
For instance, the sentences below have a centre-embedded structure.
1. The students the police arrested complained.
2. The luggage that the passengers checked arrived.
3. The luggage that the passengers that the storm delayed checked arrived.
Intuitively, the reason that a regular language cannot describe centre-embedding is that its associated automaton has no unbounded memory of what has occurred previously in a string. In order to ‘know’ that n verbs were required to match n nominals already seen, an automaton would need to ‘record’ that n nominals had been seen; but a DFA has only a fixed, finite number of states and so has no mechanism to count an unbounded n.
Formally, we can use the pumping lemma to show that the language of strings of the form a^n b^n is not regular. The pumping lemma for regular languages states a property that every regular language has, so a language that lacks the property is not regular.
The pumping lemma property is: there is a length l ≥ 1 such that every w ∊ L with |w| ≥ l can be expressed as a concatenation of three strings, w = u1 v u2, where u1, v and u2 satisfy:
• |v| ≥ 1 (i.e. v ≠ ∊)
• |u1 v| ≤ l
• for all n ≥ 0, u1 v^n u2 ∊ L (i.e. u1 u2 ∊ L, u1 v u2 ∊ L, u1 v v u2 ∊ L, u1 v v v u2 ∊ L, etc.)
For a^n b^n, take w = a^l b^l: because |u1 v| ≤ l, v consists only of a's, so pumping v changes the number of a's but not the number of b's, and the result is no longer in the language.
Regular languages are closed under intersection: if you intersect a regular language with another regular language you get a third regular language.
Lreg1 ∩ Lreg2 = Lreg3
Regular languages are also closed under homomorphism (we can map all nouns to a and all verbs to b).
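The pumping lemma property can be explored with a small brute-force search. The sketch below is an illustration under stated assumptions, not a proof: it fixes one pumping length (`L_BOUND = 4`) and tests only n = 0..3, and the function names are hypothetical. It tries every split w = u1 v u2 with |v| ≥ 1 and |u1 v| ≤ l, and reports a split that stays inside the language when v is pumped, if one exists.

```python
import re

def in_anbn(s):
    """Membership test for { a^n b^n : n >= 0 }."""
    half = len(s) // 2
    return len(s) % 2 == 0 and s == "a" * half + "b" * half

def in_astar_bstar(s):
    """Membership test for the regular language a*b*."""
    return re.fullmatch(r"a*b*", s) is not None

def find_pumpable_split(w, l, in_lang, max_n=3):
    """Search for (u1, v, u2) with |v| >= 1 and |u1 v| <= l such that
    u1 v^n u2 is in the language for every n in 0..max_n."""
    limit = min(l, len(w))
    for i in range(limit + 1):
        for j in range(i + 1, limit + 1):
            u1, v, u2 = w[:i], w[i:j], w[j:]
            if all(in_lang(u1 + v * n + u2) for n in range(max_n + 1)):
                return (u1, v, u2)
    return None

L_BOUND = 4                             # assumed pumping length for the demo
WORD = "a" * L_BOUND + "b" * L_BOUND    # w = a^l b^l

SPLIT_ANBN = find_pumpable_split(WORD, L_BOUND, in_anbn)
SPLIT_ASTAR_BSTAR = find_pumpable_split(WORD, L_BOUND, in_astar_bstar)
```

For a^n b^n no split survives, because any v inside the first l symbols contains only a's. For the regular language a*b* the search succeeds immediately with v = a.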
  • 6. So if English is regular and we intersect it with another regular language (e.g. the one generated by /the a (that the a)* b*/) we should get another regular language.
if Leng is regular then Leng ∩ La*b* = Lreg3
However, the intersection of this a*b* language with English is a^n b^n (in our example case specifically /the a (that the a)^(n-1) b^n/), which is not regular as it fails the pumping lemma property.
but Leng ∩ La*b* = La^n b^n (which is not regular)
So the assumption that English is regular must be incorrect.
2. Redundancy
Grammars written using regular grammar rules alone are highly redundant: since the rules are very simple we need a great many of them to describe the language. This makes regular grammars very difficult to build and maintain.
3. Useful internal structures
There are instances where a regular grammar can recognize the strings of a language but in doing so does not provide a structure that is linguistically useful to us. The left-linear or right-linear internal structures derived by regular grammars are generally not very useful for higher-level NLP applications. We need informative internal structure so that we can, for example, build up good semantic representations.
In practice, regular grammars can be useful for partial grammars (i.e. when we don’t need to know the syntax tree for the whole sentence but rather just some part of it) and also when we don’t care about derivational structure (i.e. when we just want a Boolean for whether a string is in a language).
For example, in information extraction, we need to recognize named entities. The internal structure of named entities is normally unimportant to us; we just want to recognize when we encounter them. For instance, using rules such as:
NP → nnsb NP
NP → np1 NP
NP → np1
where NP is a non-terminal and nnsb and np1 are terminals representing tags from the large tagset, you could match a titled name like Prof. Stephen William Hawking.
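The three NP rules above generate exactly the tag sequences made of nnsb and np1 tags that end in np1, so they can be written as a regular expression over tags. The Python sketch below is only an illustration: the function name `is_titled_name` is hypothetical, and the tagging of "Prof. Stephen William Hawking" as nnsb np1 np1 np1 is an assumption about how a tagger would label it.

```python
import re

# NP -> nnsb NP | np1 NP | np1  is equivalent to  (nnsb | np1)* np1
NAME_PATTERN = re.compile(r"(?:(?:nnsb|np1) )*np1")

def is_titled_name(tags):
    """Return True if the whole tag sequence is derivable from NP."""
    return NAME_PATTERN.fullmatch(" ".join(tags)) is not None

# Assumed tagging of "Prof. Stephen William Hawking".
TAGGED = [("Prof.", "nnsb"), ("Stephen", "np1"),
          ("William", "np1"), ("Hawking", "np1")]
IS_NAME = is_titled_name([tag for _, tag in TAGGED])
```

The result is just a Boolean: the recogniser says whether the span is a name, without building any internal structure for it, which is the use case described above.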
  • 7. For every natural language that exists, can we find a context-free grammar to generate it?
There is some evidence that natural language can contain cross-serial dependencies. A small number of languages exhibit strings of the form shown below. There is a Zurich dialect of Swiss German in which constructions like the following are found:
mer d’chind em Hans es huus haend wele laa hälfe aastriiche.
we the children Hans the house have wanted to let help paint.
"We have wanted to let the children help Hans paint the house."
Such expressions may not be derivable by a context-free grammar.
Where do natural languages fit in the Chomsky hierarchy?
If we are to use formal grammars to represent natural language, it is useful to know where natural languages appear in the Chomsky hierarchy. It might turn out that the set of all attested natural languages is actually as depicted in the Figure. The overlap with the context-sensitive languages accounts for those languages that have cross-serial dependencies.
  • 8. A natural language is an infinite set of sentences constructed out of a finite set of characters. The number of words in a sentence has no defined upper limit either. When natural languages are reverse engineered into their component parts, they are broken down into four parts: syntax, semantics, morphology and phonology.
Natural languages are believed to be at least context-free. However, Dutch and Swiss German contain grammatical constructions with cross-serial dependencies, which make them context-sensitive.
Extensions to the Chomsky hierarchy that find relevance in NLP
There are two extensions to the traditional Chomsky hierarchy that have proved useful in linguistics and cognitive science:
Mildly context-sensitive languages: CFGs are not adequate (weakly or strongly) to characterize some aspects of language structure. To derive extra power beyond CFGs, a grammatical formalism called Tree Adjoining Grammars (TAG) was proposed as an approximate characterization of mildly context-sensitive grammars. TAG builds trees using an operation of composition called ‘adjoining’. Another formalism, Minimalist Grammars (MG), describes an even larger class of formal languages.
Sub-regular languages: A sub-regular language is a set of strings that can be described without employing the full power of finite-state automata. Many aspects of human language are manifestly sub-regular, such as some ‘strictly local’ dependencies. Example: identifying recurring sub-string patterns within words is one such common application.
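A ‘strictly local’ pattern can be checked by looking only at fixed-size windows of a word, with no automaton state carried between them. The Python sketch below is a toy illustration, not taken from the notes: the language (words over {a, b} with no two adjacent b's), the boundary markers `>` and `<`, and the function names are all assumptions made for the example.

```python
def two_factors(word):
    """Return the set of adjacent symbol pairs in word, including
    the pairs formed with the start (>) and end (<) markers."""
    padded = ">" + word + "<"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}

# Permitted pairs for the assumed toy language: every pair except "bb".
PERMITTED = {">a", ">b", "aa", "ab", "ba", "a<", "b<", "><"}

def sl2_accepts(word, permitted=PERMITTED):
    """A word is accepted if every adjacent pair in it is permitted."""
    return two_factors(word) <= permitted

RESULTS = {w: sl2_accepts(w) for w in ["", "abab", "abba", "b", "bb"]}
```

Words such as abab are accepted, while abba and bb are rejected because they contain the forbidden pair bb.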