SlideShare ist ein Scribd-Unternehmen logo
1 von 25
Downloaden Sie, um offline zu lesen
Codeco: A Grammar Notation for Controlled
Natural Language in Predictive Editors
Tobias Kuhn
Department of Informatics
University of Zurich
Second Workshop on Controlled Natural Language
15 September 2010
Marettimo (Italy)
Introduction
• Problem: Existing grammar frameworks do not work out
particularly well for CNLs.
• Reason:
• CNLs have essential differences to other languages (natural and
formal ones)
• To solve the writability problem, CNLs have to be embedded in
special tools with very specific requirements.
• Error messages and suggestions
• Predictive editors
• Language generation
• Solution: A new grammar notation that is dedicated to CNLs
and predictive editors.
Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 2 / 25
CNL Grammar Requirements
Concreteness: CNL grammars should be fully formalized and
interpretable by computers.
Declarativeness: CNL grammars should not depend on a concrete
algorithm or implementation.
Lookahead Features: CNL grammars should allow for the retrieval
of possible next tokens for a partial text.
Anaphoric References: CNL grammars should allow for the
definition of nonlocal structures like anaphoric
references.
Implementability: CNL grammars should be easy to implement in
different programming languages.
Expressivity: CNL grammars should be sufficiently expressive to
express CNLs.
Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 3 / 25
Lookahead Features
Predictive editors need to know which words can follow a partial text:
Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 4 / 25
Anaphoric References
• Anaphoric references in CNLs are resolvable in a deterministic
way:
A country contains an area that is not controlled by the country.
If a person X is a relative of a person Y then the person Y is a
relative of the person X.
John protects himself and Mary helps him.
• Anaphoric references that cannot be resolved should be
disallowed:
* Every area is controlled by it.
* The person X is a relative of the person Y.
• Scopes have to be considered too:
Every man protects a house from every enemy and does not
destroy ...
... himself.
... the house.
* ... the enemy.
Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 5 / 25
Existing Grammar Frameworks
• Grammar Frameworks for Natural Languages
• Head-Driven Phrase Structure Grammars
• Lexical-Functional Grammars
• Tree-Adjoining Grammars
• Combinatory Categorial Grammars
• Dependency Grammars
• ... and many more
• Backus-Naur Form (BNF)
• Parser Generators
• Yacc
• GNU Bison
• ANTLR
• Definite Clause Grammars (DCG)
• Grammatical Framework (GF)
Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 6 / 25
How Existing Grammar Frameworks Fulfill our
Requirements for CNL Grammars
NL BNF PG DCG GF
Concreteness + + + + +
Declarativeness +/– + – (+) +
Lookahead Features – + (+) (+) +
Anaphoric References (+) – – (+) –
Implementability – + – – ?
Expressivity + – + + +
Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 7 / 25
The Codeco Notation
Codeco = “Concrete and Declarative Grammar Notation for
Controlled Natural Languages”
• Formal and Declarative
• Easy to implement in different programming languages.
• Expressive enough for common CNLs.
• Lookahead features can be implemented in a practical and
efficient way.
• Deterministic anaphoric references can be defined in an adequate
and simple way.
Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 8 / 25
Grammar Rules in Codeco
• Grammatical categories with flat feature structures
• Category Types:
• non-terminal (e.g. vp)
• pre-terminal (e.g. noun)
• terminal (e.g. [ does not ])
• Grammar Rule Examples:
• vp num: Num
neg: Neg
:
−→ v
num: Num
neg: Neg
type: tr
np case: acc
• v neg: +
type: Type
:
−→ [ does not ] verb type: Type
• np noun: Noun
:
−→ [ a ] noun text: Noun
• noun text: woman
gender: fem
→ [ woman ]
Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 9 / 25
Forward and Backward References
The special categories “>” and “<” can be used to establish
nonlocal dependencies, e.g. for anaphoric references:
np
:
−→ det def: – noun text: Noun > type: noun
noun: Noun
ref
:
−→ det def: + noun text: Noun < type: noun
noun: Noun
s
vp
vp
np
ref
noun
area
det
the
v
tv
control
aux
does not
conj
and
vp
np
noun
area
det
an
v
tv
contains
np
>noun
country
det
A
>
<
Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 10 / 25
Scopes
• Opening of scopes:
• Scopes in (controlled) English usually open at the position of the
scope triggering structure, or nearby.
• Scope opener category “ ” in Codeco:
quant exist: –
:
−→ [ every ]
• Closing of scopes:
• Scopes in (controlled) English usually close at the end of certain
structures like verb phrases, relative clauses, and sentences.
• Scope-closing rules “
∼
−→” in Codeco:
vp num: Num
∼
−−→ v num: Num
type: tr
np case: acc
Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 11 / 25
Position Operators
• How to define reflexive pronouns like “herself ” that can only
attach to the subject?
• With the position operator “#”, position identifiers can be
assigned:
np id: Id
:
−→ # Id prop gender: G >
id: Id
gender: G
type: prop
ref subj: Subj
:
−→ [ herself ] < id: Subj
gender: fem
s
vp
np
ref
herself
v
tv
helps
np
prop
Maryp0 p1 p2 p3
#
#
Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 12 / 25
Negative Backward References
• How to define that the same variable can be introduced only
once?
*A person X knows a person X.
• Negative backward references “ ” succeed only if no matching
antecedent is accessible:
newvar
:
−→ var text: V
type: var
var: V
> type: var
var: V
Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 13 / 25
Complex Backward References
• How to define pronouns like “him” that cannot attach to the
subject?
*John helps him.
• Complex backward references “<+... −...” have one or more
positive feature structure “+” and zero or more negative
ones “−”.
• They succeed if there is an antecedent that matches one of the
positive feature structures but none of the negative ones:
ref subj: Subj
case: acc
:
−→ [ him ] <+ human: +
gender: masc
− id: Subj
• A more complicated (but probably less useful) example:
ref subj: Subj
:
−→ [ this ] <+ hasvar: –
human: –
type: relation − id: Subj type: prop
Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 14 / 25
Strong Forward References
• How to define that propernames like “Bill” are always accessible?
*Mary does not love a man. Mary hates him.
Mary does not love Bill. Mary hates him.
• Strong forward references “ ” are always accessible:
np id: Id
:
−→ prop human: H
gender: G



id: Id
human: H
gender: G
type: prop



Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 15 / 25
Reference Resolution: Accessibility
Forward references are only accessible if they are not within a scope
that has already been closed before the position of the backward
reference:
s ∼
vp
vp ∼
np
ref
...
v
tv
destroy
aux
does not
conj
and
vp
pp
np
>n
enemy
det
every
prep
from
np
n
house
det
a
v
tv
protects
np
n
man
det
Every
∼
( ( ()
>
>
<
Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 16 / 25
Reference Resolution: Accessibility
Strong forward references are always accessible:
s ∼
vp
vp ∼
np
ref
...
v
tv
likes
conj
and
vp
np
pp
np
prop
Bill
prep
of
>n
friend
det
every
v
tv
knows
np
prop
Mary
∼
( )
<
Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 17 / 25
Reference Resolution: Proximity
If a backward reference matches more than one forward reference
then the closest one is taken:
s
s
...
...
np
ref
pn
it
conj
then
s
vp
np
n
error
det
an
v
tv
causes
np
pp
np
n
machine
det
a
prep
of
n
part
det
a
conj
If
>
>
> <
Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 18 / 25
Possible Extensions
• Semantics (e.g. with λ-DRSs)
• General feature structures (instead of flat ones)
• ...
Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 19 / 25
Parsers for Codeco
Two parsers with different parsing approaches exist:
• Transformation into Prolog DCG
• fast (1.5 ms per sentence)
• no lookahead features
• ideal for regression tests and parsing of large texts in batch mode
• Execution in a chart parser (Earley parser) under Java
• slower, but still reasonably fast (130 ms per sentence)
• lookahead features
• ideal for predictive editors in Java
Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 20 / 25
ACE in Codeco
• Large subset of ACE in Codeco
• Includes: countable nouns, proper names, intransitive and
transitive verbs, adjectives, adverbs, prepositions, plurals,
negation, comparative and superlative adjectives and adverbs,
of-phrases, relative clauses, modality, numerical quantifiers,
coordination of sentences / verb phrases / relative clauses,
conditional sentences, questions, and anaphoric references (simple
definite noun phrases, variables, and reflexive and irreflexive
pronouns)
• Excludes: Mass nouns, measurement nouns, ditransitive verbs,
numbers and strings as noun phrases, sentences as verb phrase
complements, Saxon genitive, possessive pronouns, noun phrase
coordination, and commands
• 164 grammar rules
• Used in the ACE Editor:
http://attempto.ifi.uzh.ch/webapps/aceeditor/
Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 21 / 25
Evaluation of ACE Codeco
Exhaustive Language Generation:
• Evaluation subgrammar with 97 grammar rules
• Minimal lexicon
• 2’250’869 sentences with 3–10 tokens:
sentence length number of sentences growth factor
3 6
4 87 14.50
5 385 4.43
6 1’959 5.09
7 11’803 6.03
8 64’691 5.48
9 342’863 5.30
10 1’829’075 5.33
3–10 2’250’869
• All are accepted by the ACE parser
→ ACE Codeco is a subset of ACE
• None is generated more than once
→ ACE Codeco is unambiguous
Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 22 / 25
Evaluation of the Codeco notation and its
implementations
Prolog DCG representation versus Java Earley parser:
• Equivalence of the Implementations:
Generate the same set of sentences up to 8 tokens
→ The two implementations process Codeco in the same way
• Performance Tests:
task grammar implementation seconds/sentence
generation ACE Codeco eval. subset Prolog DCG 0.00286
generation ACE Codeco eval. subset Java Earley parser 0.0730
parsing ACE Codeco eval. subset Prolog DCG 0.000360
parsing ACE Codeco eval. subset Java Earley parser 0.0276
parsing full ACE Codeco Prolog DCG 0.00146
parsing full ACE Codeco Java Earley parser 0.134
parsing full ACE APE 0.0161
Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 23 / 25
Conclusions
Codeco ...
• ... fulfills our requirements for CNLs in predictive editors.
• ... is suitable to describe a large subset of ACE.
• ... allows for automatic tests.
• ... stands for a principled and engineering focused approach to
CNL.
Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 24 / 25
Thank you for your attention!
Questions & Discussion
Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 25 / 25

Weitere ähnliche Inhalte

Was ist angesagt?

OUTDATED Text Mining 3/5: String Processing
OUTDATED Text Mining 3/5: String ProcessingOUTDATED Text Mining 3/5: String Processing
OUTDATED Text Mining 3/5: String ProcessingFlorian Leitner
 
A statistical approach to machine translation
A statistical approach to machine translationA statistical approach to machine translation
A statistical approach to machine translationHiroshi Matsumoto
 
Overview of text mining and NLP (+software)
Overview of text mining and NLP (+software)Overview of text mining and NLP (+software)
Overview of text mining and NLP (+software)Florian Leitner
 
Parallel Corpora in (Machine) Translation: goals, issues and methodologies
Parallel Corpora in (Machine) Translation: goals, issues and methodologiesParallel Corpora in (Machine) Translation: goals, issues and methodologies
Parallel Corpora in (Machine) Translation: goals, issues and methodologiesAntonio Toral
 
NLP_KASHK:Finite-State Morphological Parsing
NLP_KASHK:Finite-State Morphological ParsingNLP_KASHK:Finite-State Morphological Parsing
NLP_KASHK:Finite-State Morphological ParsingHemantha Kulathilake
 
Lecture 7: Definite Clause Grammars
Lecture 7: Definite Clause GrammarsLecture 7: Definite Clause Grammars
Lecture 7: Definite Clause GrammarsCS, NcState
 
Semantic Role Labeling
Semantic Role LabelingSemantic Role Labeling
Semantic Role LabelingMarina Santini
 
Class9
 Class9 Class9
Class9issbp
 

Was ist angesagt? (12)

OUTDATED Text Mining 3/5: String Processing
OUTDATED Text Mining 3/5: String ProcessingOUTDATED Text Mining 3/5: String Processing
OUTDATED Text Mining 3/5: String Processing
 
A statistical approach to machine translation
A statistical approach to machine translationA statistical approach to machine translation
A statistical approach to machine translation
 
Overview of text mining and NLP (+software)
Overview of text mining and NLP (+software)Overview of text mining and NLP (+software)
Overview of text mining and NLP (+software)
 
AINL 2016: Eyecioglu
AINL 2016: EyeciogluAINL 2016: Eyecioglu
AINL 2016: Eyecioglu
 
Dragos Munteanu (SDL) at the Industry Leaders Forum 2015
Dragos Munteanu (SDL) at the Industry Leaders Forum 2015Dragos Munteanu (SDL) at the Industry Leaders Forum 2015
Dragos Munteanu (SDL) at the Industry Leaders Forum 2015
 
Parallel Corpora in (Machine) Translation: goals, issues and methodologies
Parallel Corpora in (Machine) Translation: goals, issues and methodologiesParallel Corpora in (Machine) Translation: goals, issues and methodologies
Parallel Corpora in (Machine) Translation: goals, issues and methodologies
 
Language models
Language modelsLanguage models
Language models
 
NLP_KASHK:Finite-State Morphological Parsing
NLP_KASHK:Finite-State Morphological ParsingNLP_KASHK:Finite-State Morphological Parsing
NLP_KASHK:Finite-State Morphological Parsing
 
Lecture 7: Definite Clause Grammars
Lecture 7: Definite Clause GrammarsLecture 7: Definite Clause Grammars
Lecture 7: Definite Clause Grammars
 
NLP_KASHK:Text Normalization
NLP_KASHK:Text NormalizationNLP_KASHK:Text Normalization
NLP_KASHK:Text Normalization
 
Semantic Role Labeling
Semantic Role LabelingSemantic Role Labeling
Semantic Role Labeling
 
Class9
 Class9 Class9
Class9
 

Andere mochten auch

Semantic Publishing and Nanopublications
Semantic Publishing and NanopublicationsSemantic Publishing and Nanopublications
Semantic Publishing and NanopublicationsTobias Kuhn
 
Image Mining from Gel Diagrams in Biomedical Publications
Image Mining from Gel Diagrams in Biomedical PublicationsImage Mining from Gel Diagrams in Biomedical Publications
Image Mining from Gel Diagrams in Biomedical PublicationsTobias Kuhn
 
The Controlled Natural Language of Randall Munroe’s Thing Explainer
The Controlled Natural Language of Randall Munroe’s Thing Explainer The Controlled Natural Language of Randall Munroe’s Thing Explainer
The Controlled Natural Language of Randall Munroe’s Thing Explainer Tobias Kuhn
 
Improving Text Mining with Controlled Natural Language: A Case Study for Prot...
Improving Text Mining with Controlled Natural Language: A Case Study for Prot...Improving Text Mining with Controlled Natural Language: A Case Study for Prot...
Improving Text Mining with Controlled Natural Language: A Case Study for Prot...Tobias Kuhn
 

Andere mochten auch (10)

Semantic Publishing and Nanopublications
Semantic Publishing and NanopublicationsSemantic Publishing and Nanopublications
Semantic Publishing and Nanopublications
 
Shreya
ShreyaShreya
Shreya
 
Nanopubs
NanopubsNanopubs
Nanopubs
 
Image Mining from Gel Diagrams in Biomedical Publications
Image Mining from Gel Diagrams in Biomedical PublicationsImage Mining from Gel Diagrams in Biomedical Publications
Image Mining from Gel Diagrams in Biomedical Publications
 
Task week 3 design
Task week 3 designTask week 3 design
Task week 3 design
 
Clares work
Clares workClares work
Clares work
 
Vacantions
VacantionsVacantions
Vacantions
 
Kyle
KyleKyle
Kyle
 
The Controlled Natural Language of Randall Munroe’s Thing Explainer
The Controlled Natural Language of Randall Munroe’s Thing Explainer The Controlled Natural Language of Randall Munroe’s Thing Explainer
The Controlled Natural Language of Randall Munroe’s Thing Explainer
 
Improving Text Mining with Controlled Natural Language: A Case Study for Prot...
Improving Text Mining with Controlled Natural Language: A Case Study for Prot...Improving Text Mining with Controlled Natural Language: A Case Study for Prot...
Improving Text Mining with Controlled Natural Language: A Case Study for Prot...
 

Ähnlich wie Codeco: A Grammar Notation for Controlled Natural Language in Predictive Editors

1908 working memory
1908 working memory1908 working memory
1908 working memoryWarNik Chow
 
A Panorama of Natural Language Processing
A Panorama of Natural Language ProcessingA Panorama of Natural Language Processing
A Panorama of Natural Language ProcessingTed Xiao
 
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...Seth Grimes
 
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...Seth Grimes
 
Embedding for fun fumarola Meetup Milano DLI luglio
Embedding for fun fumarola Meetup Milano DLI luglioEmbedding for fun fumarola Meetup Milano DLI luglio
Embedding for fun fumarola Meetup Milano DLI luglioDeep Learning Italia
 
Anthiil Inside workshop on NLP
Anthiil Inside workshop on NLPAnthiil Inside workshop on NLP
Anthiil Inside workshop on NLPSatyam Saxena
 
Representation Learning of Text for NLP
Representation Learning of Text for NLPRepresentation Learning of Text for NLP
Representation Learning of Text for NLPAnuj Gupta
 
Automated Content Analysis of Discussion Transcripts
Automated Content Analysis of Discussion TranscriptsAutomated Content Analysis of Discussion Transcripts
Automated Content Analysis of Discussion TranscriptsVitomir Kovanovic
 
Deep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word EmbeddingsDeep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word EmbeddingsRoelof Pieters
 
Controlled Natural Language and Opportunities for Standardization
Controlled Natural Language and Opportunities for StandardizationControlled Natural Language and Opportunities for Standardization
Controlled Natural Language and Opportunities for StandardizationTobias Kuhn
 
Yves Peirsman - Deep Learning for NLP
Yves Peirsman - Deep Learning for NLPYves Peirsman - Deep Learning for NLP
Yves Peirsman - Deep Learning for NLPHendrik D'Oosterlinck
 
ENeL_WG3_Survey-AKA4Lexicography-TiberiusHeylenKrek (1).pptx
ENeL_WG3_Survey-AKA4Lexicography-TiberiusHeylenKrek (1).pptxENeL_WG3_Survey-AKA4Lexicography-TiberiusHeylenKrek (1).pptx
ENeL_WG3_Survey-AKA4Lexicography-TiberiusHeylenKrek (1).pptxSyedNadeemAbbas6
 
TopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptxTopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptxKalpit Desai
 
A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...
A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...
A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...Tobias Kuhn
 
Deep Learning for Information Retrieval
Deep Learning for Information RetrievalDeep Learning for Information Retrieval
Deep Learning for Information RetrievalRoelof Pieters
 
Lecture: Context-Free Grammars
Lecture: Context-Free GrammarsLecture: Context-Free Grammars
Lecture: Context-Free GrammarsMarina Santini
 

Ähnlich wie Codeco: A Grammar Notation for Controlled Natural Language in Predictive Editors (20)

1908 working memory
1908 working memory1908 working memory
1908 working memory
 
A Panorama of Natural Language Processing
A Panorama of Natural Language ProcessingA Panorama of Natural Language Processing
A Panorama of Natural Language Processing
 
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
 
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
 
haenelt.ppt
haenelt.ppthaenelt.ppt
haenelt.ppt
 
Embedding for fun fumarola Meetup Milano DLI luglio
Embedding for fun fumarola Meetup Milano DLI luglioEmbedding for fun fumarola Meetup Milano DLI luglio
Embedding for fun fumarola Meetup Milano DLI luglio
 
Corpus linguistics
Corpus linguisticsCorpus linguistics
Corpus linguistics
 
Anthiil Inside workshop on NLP
Anthiil Inside workshop on NLPAnthiil Inside workshop on NLP
Anthiil Inside workshop on NLP
 
Representation Learning of Text for NLP
Representation Learning of Text for NLPRepresentation Learning of Text for NLP
Representation Learning of Text for NLP
 
Automated Content Analysis of Discussion Transcripts
Automated Content Analysis of Discussion TranscriptsAutomated Content Analysis of Discussion Transcripts
Automated Content Analysis of Discussion Transcripts
 
Deep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word EmbeddingsDeep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word Embeddings
 
Controlled Natural Language and Opportunities for Standardization
Controlled Natural Language and Opportunities for StandardizationControlled Natural Language and Opportunities for Standardization
Controlled Natural Language and Opportunities for Standardization
 
Yves Peirsman - Deep Learning for NLP
Yves Peirsman - Deep Learning for NLPYves Peirsman - Deep Learning for NLP
Yves Peirsman - Deep Learning for NLP
 
ENeL_WG3_Survey-AKA4Lexicography-TiberiusHeylenKrek (1).pptx
ENeL_WG3_Survey-AKA4Lexicography-TiberiusHeylenKrek (1).pptxENeL_WG3_Survey-AKA4Lexicography-TiberiusHeylenKrek (1).pptx
ENeL_WG3_Survey-AKA4Lexicography-TiberiusHeylenKrek (1).pptx
 
TopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptxTopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptx
 
A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...
A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...
A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...
 
Deep Learning for Information Retrieval
Deep Learning for Information RetrievalDeep Learning for Information Retrieval
Deep Learning for Information Retrieval
 
AINL 2016: Nikolenko
AINL 2016: NikolenkoAINL 2016: Nikolenko
AINL 2016: Nikolenko
 
What is word2vec?
What is word2vec?What is word2vec?
What is word2vec?
 
Lecture: Context-Free Grammars
Lecture: Context-Free GrammarsLecture: Context-Free Grammars
Lecture: Context-Free Grammars
 

Mehr von Tobias Kuhn

Nanopublications and Decentralized Publishing
Nanopublications and Decentralized PublishingNanopublications and Decentralized Publishing
Nanopublications and Decentralized PublishingTobias Kuhn
 
Linked Data Publishing with Nanopublications
Linked Data Publishing with NanopublicationsLinked Data Publishing with Nanopublications
Linked Data Publishing with NanopublicationsTobias Kuhn
 
Genuine semantic publishing
Genuine semantic publishingGenuine semantic publishing
Genuine semantic publishingTobias Kuhn
 
A Decentralized Approach to Dissemination, Retrieval, and Archiving of Data
A Decentralized Approach to Dissemination, Retrieval, and Archiving of DataA Decentralized Approach to Dissemination, Retrieval, and Archiving of Data
A Decentralized Approach to Dissemination, Retrieval, and Archiving of DataTobias Kuhn
 
Publishing without Publishers: a Decentralized Approach to Dissemination, Ret...
Publishing without Publishers: a Decentralized Approach to Dissemination, Ret...Publishing without Publishers: a Decentralized Approach to Dissemination, Ret...
Publishing without Publishers: a Decentralized Approach to Dissemination, Ret...Tobias Kuhn
 
nanopub-java: A Java Library for Nanopublications
nanopub-java: A Java Library for Nanopublicationsnanopub-java: A Java Library for Nanopublications
nanopub-java: A Java Library for NanopublicationsTobias Kuhn
 
Scientific Data Publishing
Scientific Data PublishingScientific Data Publishing
Scientific Data PublishingTobias Kuhn
 
A Decentralized Network for Publishing Linked Data — Nanopublications, Trusty...
A Decentralized Network for Publishing Linked Data — Nanopublications, Trusty...A Decentralized Network for Publishing Linked Data — Nanopublications, Trusty...
A Decentralized Network for Publishing Linked Data — Nanopublications, Trusty...Tobias Kuhn
 
Science Bots: A Model for the Future of Scientific Computation?
Science Bots: A Model for the Future of Scientific Computation?Science Bots: A Model for the Future of Scientific Computation?
Science Bots: A Model for the Future of Scientific Computation?Tobias Kuhn
 
Data Publishing and Post-Publication Reviews
Data Publishing and Post-Publication ReviewsData Publishing and Post-Publication Reviews
Data Publishing and Post-Publication ReviewsTobias Kuhn
 
Semantic Publishing with Nanopublications
Semantic Publishing with Nanopublications Semantic Publishing with Nanopublications
Semantic Publishing with Nanopublications Tobias Kuhn
 
Meme Extraction from Corpora of Scientific Literature using Citation Networks
Meme Extraction from Corpora of Scientific Literature using Citation NetworksMeme Extraction from Corpora of Scientific Literature using Citation Networks
Meme Extraction from Corpora of Scientific Literature using Citation NetworksTobias Kuhn
 
A Multilingual Semantic Wiki Based on Controlled Natural Language
A Multilingual Semantic Wiki Based on Controlled Natural LanguageA Multilingual Semantic Wiki Based on Controlled Natural Language
A Multilingual Semantic Wiki Based on Controlled Natural LanguageTobias Kuhn
 
Citation Graph Analysis to Identify Memes in Scientific Literature
Citation Graph Analysis to Identify Memes in Scientific LiteratureCitation Graph Analysis to Identify Memes in Scientific Literature
Citation Graph Analysis to Identify Memes in Scientific LiteratureTobias Kuhn
 
Citation Graph Analysis to Identify Memes in Scientific Literature
Citation Graph Analysis to Identify Memes in Scientific LiteratureCitation Graph Analysis to Identify Memes in Scientific Literature
Citation Graph Analysis to Identify Memes in Scientific LiteratureTobias Kuhn
 
Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linke...
Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linke...Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linke...
Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linke...Tobias Kuhn
 
Automatische Übersetzung in einem multilingualen, semantischen Wiki
Automatische Übersetzung in einem multilingualen, semantischen WikiAutomatische Übersetzung in einem multilingualen, semantischen Wiki
Automatische Übersetzung in einem multilingualen, semantischen WikiTobias Kuhn
 
A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...
A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...
A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...Tobias Kuhn
 
AceRules: Executing Rules in Controlled Natural Language
AceRules: Executing Rules in Controlled Natural LanguageAceRules: Executing Rules in Controlled Natural Language
AceRules: Executing Rules in Controlled Natural LanguageTobias Kuhn
 
AceWiki: A Natural and Expressive Semantic Wiki
AceWiki: A Natural and Expressive Semantic WikiAceWiki: A Natural and Expressive Semantic Wiki
AceWiki: A Natural and Expressive Semantic WikiTobias Kuhn
 

Mehr von Tobias Kuhn (20)

Nanopublications and Decentralized Publishing
Nanopublications and Decentralized PublishingNanopublications and Decentralized Publishing
Nanopublications and Decentralized Publishing
 
Linked Data Publishing with Nanopublications
Linked Data Publishing with NanopublicationsLinked Data Publishing with Nanopublications
Linked Data Publishing with Nanopublications
 
Genuine semantic publishing
Genuine semantic publishingGenuine semantic publishing
Genuine semantic publishing
 
A Decentralized Approach to Dissemination, Retrieval, and Archiving of Data
A Decentralized Approach to Dissemination, Retrieval, and Archiving of DataA Decentralized Approach to Dissemination, Retrieval, and Archiving of Data
A Decentralized Approach to Dissemination, Retrieval, and Archiving of Data
 
Publishing without Publishers: a Decentralized Approach to Dissemination, Ret...
Publishing without Publishers: a Decentralized Approach to Dissemination, Ret...Publishing without Publishers: a Decentralized Approach to Dissemination, Ret...
Publishing without Publishers: a Decentralized Approach to Dissemination, Ret...
 
nanopub-java: A Java Library for Nanopublications
nanopub-java: A Java Library for Nanopublicationsnanopub-java: A Java Library for Nanopublications
nanopub-java: A Java Library for Nanopublications
 
Scientific Data Publishing
Scientific Data PublishingScientific Data Publishing
Scientific Data Publishing
 
A Decentralized Network for Publishing Linked Data — Nanopublications, Trusty...
A Decentralized Network for Publishing Linked Data — Nanopublications, Trusty...A Decentralized Network for Publishing Linked Data — Nanopublications, Trusty...
A Decentralized Network for Publishing Linked Data — Nanopublications, Trusty...
 
Science Bots: A Model for the Future of Scientific Computation?
Science Bots: A Model for the Future of Scientific Computation?Science Bots: A Model for the Future of Scientific Computation?
Science Bots: A Model for the Future of Scientific Computation?
 
Data Publishing and Post-Publication Reviews
Data Publishing and Post-Publication ReviewsData Publishing and Post-Publication Reviews
Data Publishing and Post-Publication Reviews
 
Semantic Publishing with Nanopublications
Semantic Publishing with Nanopublications Semantic Publishing with Nanopublications
Semantic Publishing with Nanopublications
 
Meme Extraction from Corpora of Scientific Literature using Citation Networks
Meme Extraction from Corpora of Scientific Literature using Citation NetworksMeme Extraction from Corpora of Scientific Literature using Citation Networks
Meme Extraction from Corpora of Scientific Literature using Citation Networks
 
A Multilingual Semantic Wiki Based on Controlled Natural Language
A Multilingual Semantic Wiki Based on Controlled Natural LanguageA Multilingual Semantic Wiki Based on Controlled Natural Language
A Multilingual Semantic Wiki Based on Controlled Natural Language
 
Citation Graph Analysis to Identify Memes in Scientific Literature
Citation Graph Analysis to Identify Memes in Scientific LiteratureCitation Graph Analysis to Identify Memes in Scientific Literature
Citation Graph Analysis to Identify Memes in Scientific Literature
 
Citation Graph Analysis to Identify Memes in Scientific Literature
Citation Graph Analysis to Identify Memes in Scientific LiteratureCitation Graph Analysis to Identify Memes in Scientific Literature
Citation Graph Analysis to Identify Memes in Scientific Literature
 
Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linke...
Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linke...Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linke...
Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linke...
 
Automatische Übersetzung in einem multilingualen, semantischen Wiki
Automatische Übersetzung in einem multilingualen, semantischen WikiAutomatische Übersetzung in einem multilingualen, semantischen Wiki
Automatische Übersetzung in einem multilingualen, semantischen Wiki
 
A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...
A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...
A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...
 
AceRules: Executing Rules in Controlled Natural Language
AceRules: Executing Rules in Controlled Natural LanguageAceRules: Executing Rules in Controlled Natural Language
AceRules: Executing Rules in Controlled Natural Language
 
AceWiki: A Natural and Expressive Semantic Wiki
AceWiki: A Natural and Expressive Semantic WikiAceWiki: A Natural and Expressive Semantic Wiki
AceWiki: A Natural and Expressive Semantic Wiki
 

Kürzlich hochgeladen

SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 

Kürzlich hochgeladen (20)

SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 

Codeco: A Grammar Notation for Controlled Natural Language in Predictive Editors

  • 1. Codeco: A Grammar Notation for Controlled Natural Language in Predictive Editors Tobias Kuhn Department of Informatics University of Zurich Second Workshop on Controlled Natural Language 15 September 2010 Marettimo (Italy)
  • 2. Introduction • Problem: Existing grammar frameworks do not work out particularly well for CNLs. • Reason: • CNLs have essential differences to other languages (natural and formal ones) • To solve the writability problem, CNLs have to be embedded in special tools with very specific requirements. • Error messages and suggestions • Predictive editors • Language generation • Solution: A new grammar notation that is dedicated to CNLs and predictive editors. Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 2 / 25
  • 3. CNL Grammar Requirements Concreteness: CNL grammars should be fully formalized and interpretable by computers. Declarativeness: CNL grammars should not depend on a concrete algorithm or implementation. Lookahead Features: CNL grammars should allow for the retrieval of possible next tokens for a partial text. Anaphoric References: CNL grammars should allow for the definition of nonlocal structures like anaphoric references. Implementability: CNL grammars should be easy to implement in different programming languages. Expressivity: CNL grammars should be sufficiently expressive to express CNLs. Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 3 / 25
  • 4. Lookahead Features Predictive editors need to know which words can follow a partial text: Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 4 / 25
  • 5. Anaphoric References • Anaphoric references in CNLs are resolvable in a deterministic way: A country contains an area that is not controlled by the country. If a person X is a relative of a person Y then the person Y is a relative of the person X. John protects himself and Mary helps him. • Anaphoric references that cannot be resolved should be disallowed: * Every area is controlled by it. * The person X is a relative of the person Y. • Scopes have to be considered too: Every man protects a house from every enemy and does not destroy ... ... himself. ... the house. * ... the enemy. Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 5 / 25
  • 6. Existing Grammar Frameworks • Grammar Frameworks for Natural Languages • Head-Driven Phrase Structure Grammars • Lexical-Functional Grammars • Tree-Adjoining Grammars • Combinatory Categorial Grammars • Dependency Grammars • ... and many more • Backus-Naur Form (BNF) • Parser Generators • Yacc • GNU Bison • ANTLR • Definite Clause Grammars (DCG) • Grammatical Framework (GF) Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 6 / 25
  • 7. How Existing Grammar Frameworks Fulfill our Requirements for CNL Grammars NL BNF PG DCG GF Concreteness + + + + + Declarativeness +/– + – (+) + Lookahead Features – + (+) (+) + Anaphoric References (+) – – (+) – Implementability – + – – ? Expressivity + – + + + Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 7 / 25
  • 8. The Codeco Notation Codeco = “Concrete and Declarative Grammar Notation for Controlled Natural Languages” • Formal and Declarative • Easy to implement in different programming languages. • Expressive enough for common CNLs. • Lookahead features can be implemented in a practical and efficient way. • Deterministic anaphoric references can be defined in an adequate and simple way. Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 8 / 25
  • 9. Grammar Rules in Codeco • Grammatical categories with flat feature structures • Category Types: • non-terminal (e.g. vp) • pre-terminal (e.g. noun) • terminal (e.g. [ does not ]) • Grammar Rule Examples: • vp num: Num neg: Neg : −→ v num: Num neg: Neg type: tr np case: acc • v neg: + type: Type : −→ [ does not ] verb type: Type • np noun: Noun : −→ [ a ] noun text: Noun • noun text: woman gender: fem → [ woman ] Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 9 / 25
  • 10. Forward and Backward References The special categories “>” and “<” can be used to establish nonlocal dependencies, e.g. for anaphoric references: np : −→ det def: – noun text: Noun > type: noun noun: Noun ref : −→ det def: + noun text: Noun < type: noun noun: Noun s vp vp np ref noun area det the v tv control aux does not conj and vp np noun area det an v tv contains np >noun country det A > < Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 10 / 25
  • 11. Scopes • Opening of scopes: • Scopes in (controlled) English usually open at the position of the scope triggering structure, or nearby. • Scope opener category “ ” in Codeco: quant exist: – : −→ [ every ] • Closing of scopes: • Scopes in (controlled) English usually close at the end of certain structures like verb phrases, relative clauses, and sentences. • Scope-closing rules “ ∼ −→” in Codeco: vp num: Num ∼ −−→ v num: Num type: tr np case: acc Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 11 / 25
  • 12. Position Operators • How to define reflexive pronouns like “herself ” that can only attach to the subject? • With the position operator “#”, position identifiers can be assigned: np id: Id : −→ # Id prop gender: G > id: Id gender: G type: prop ref subj: Subj : −→ [ herself ] < id: Subj gender: fem s vp np ref herself v tv helps np prop Maryp0 p1 p2 p3 # # Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 12 / 25
  • 13. Negative Backward References • How to define that the same variable can be introduced only once? *A person X knows a person X. • Negative backward references “ ” succeed only if no matching antecedent is accessible: newvar : −→ var text: V type: var var: V > type: var var: V Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 13 / 25
  • 14. Complex Backward References • How to define pronouns like “him” that cannot attach to the subject? *John helps him. • Complex backward references “<+... −...” have one or more positive feature structure “+” and zero or more negative ones “−”. • They succeed if there is an antecedent that matches one of the positive feature structures but none of the negative ones: ref subj: Subj case: acc : −→ [ him ] <+ human: + gender: masc − id: Subj • A more complicated (but probably less useful) example: ref subj: Subj : −→ [ this ] <+ hasvar: – human: – type: relation − id: Subj type: prop Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 14 / 25
  • 15. Strong Forward References • How to define that propernames like “Bill” are always accessible? *Mary does not love a man. Mary hates him. Mary does not love Bill. Mary hates him. • Strong forward references “ ” are always accessible: np id: Id : −→ prop human: H gender: G    id: Id human: H gender: G type: prop    Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 15 / 25
  • 16. Reference Resolution: Accessibility Forward references are only accessible if they are not within a scope that has already been closed before the position of the backward reference: s ∼ vp vp ∼ np ref ... v tv destroy aux does not conj and vp pp np >n enemy det every prep from np n house det a v tv protects np n man det Every ∼ ( ( () > > < Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 16 / 25
  • 17. Reference Resolution: Accessibility Strong forward references are always accessible: s ∼ vp vp ∼ np ref ... v tv likes conj and vp np pp np prop Bill prep of >n friend det every v tv knows np prop Mary ∼ ( ) < Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 17 / 25
  • 18. Reference Resolution: Proximity If a backward reference matches more than one forward reference then the closest one is taken: s s ... ... np ref pn it conj then s vp np n error det an v tv causes np pp np n machine det a prep of n part det a conj If > > > < Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 18 / 25
  • 19. Possible Extensions • Semantics (e.g. with λ-DRSs) • General feature structures (instead of flat ones) • ... Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 19 / 25
  • 20. Parsers for Codeco Two parsers with different parsing approaches exist: • Transformation into Prolog DCG • fast (1.5 ms per sentence) • no lookahead features • ideal for regression tests and parsing of large texts in batch mode • Execution in a chart parser (Earley parser) under Java • slower, but still reasonably fast (130 ms per sentence) • lookahead features • ideal for predictive editors in Java Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 20 / 25
  • 21. ACE in Codeco • Large subset of ACE in Codeco • Includes: countable nouns, proper names, intransitive and transitive verbs, adjectives, adverbs, prepositions, plurals, negation, comparative and superlative adjectives and adverbs, of-phrases, relative clauses, modality, numerical quantifiers, coordination of sentences / verb phrases / relative clauses, conditional sentences, questions, and anaphoric references (simple definite noun phrases, variables, and reflexive and irreflexive pronouns) • Excludes: Mass nouns, measurement nouns, ditransitive verbs, numbers and strings as noun phrases, sentences as verb phrase complements, Saxon genitive, possessive pronouns, noun phrase coordination, and commands • 164 grammar rules • Used in the ACE Editor: http://attempto.ifi.uzh.ch/webapps/aceeditor/ Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 21 / 25
  • 22. Evaluation of ACE Codeco Exhaustive Language Generation: • Evaluation subgrammar with 97 grammar rules • Minimal lexicon • 2’250’869 sentences with 3–10 tokens: sentence length number of sentences growth factor 3 6 4 87 14.50 5 385 4.43 6 1’959 5.09 7 11’803 6.03 8 64’691 5.48 9 342’863 5.30 10 1’829’075 5.33 3–10 2’250’869 • All are accepted by the ACE parser → ACE Codeco is a subset of ACE • None is generated more than once → ACE Codeco is unambiguous Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 22 / 25
  • 23. Evaluation of the Codeco notation and its implementations Prolog DCG representation versus Java Earley parser: • Equivalence of the Implementations: Generate the same set of sentences up to 8 tokens → The two implementations process Codeco in the same way • Performance Tests: task grammar implementation seconds/sentence generation ACE Codeco eval. subset Prolog DCG 0.00286 generation ACE Codeco eval. subset Java Earley parser 0.0730 parsing ACE Codeco eval. subset Prolog DCG 0.000360 parsing ACE Codeco eval. subset Java Earley parser 0.0276 parsing full ACE Codeco Prolog DCG 0.00146 parsing full ACE Codeco Java Earley parser 0.134 parsing full ACE APE 0.0161 Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 23 / 25
  • 24. Conclusions Codeco ... • ... fulfills our requirements for CNLs in predictive editors. • ... is suitable to describe a large subset of ACE. • ... allows for automatic tests. • ... stands for a principled and engineering focused approach to CNL. Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 24 / 25
  • 25. Thank you for your attention! Questions & Discussion Tobias Kuhn, University of Zurich Codeco CNL 2010, 15 September, Marettimo 25 / 25