SlideShare ist ein Scribd-Unternehmen logo
1 von 1
Downloaden Sie, um offline zu lesen
Classifying Non-Referential It for
Question Answer Pairs
Timothy Lee, Alex Lutz, and Jinho D. Choi

Department of Mathematics and Computer Science, Emory University
• Classifying non-referential it is an important task in
coreference resolution
• But no such attempts in this classification has been done
for question answer pairs
• The style of English used in question answer documents
differs greatly from standard documents such as the Wall
Street Journal
• Our task is to classify non-referential it for question
answer pairs and introduce the dataset
• Initially Evans classified it into seven categories which
effectively captures all types of it
• We propose a more efficient set of rules to classify it for
coreference specifically for question answer pairs
Introduction
• “Politics and Government” and “Society and Culture” had the
highest proportion of non-referential instances due to their
abstract ideas
• “Computers and Internet” and “Science and Mathematics”
had the most referential-nominal cases because these dealt
with tangible objects
• One large problem while annotating was resolving
ambiguous references of it
• If it were $1,700.00 … and let it go but for $170,000...
• Here, it can be either idiomatic, or refer to the “post dated
cheque” or the “process of receiving the post dated cheque”
• Contextual information proved vital to solve this ambiguity
• Q: Regarding IT, what are the fastest ways of getting
superich? A: …. with maintenece or service of systems or
with old programming languages.
• Since Yahoo! Answers isn’t standard english, IT could have
been capitalized for emphasis or to mean Information
Technology. With the help of contextual information, such
ambiguity could be avoided
Corpus Analytics
Experimental Setup
• Used stochastic adaptive gradient descent with mini-batch
and L1 regularization
• Tested new sets of features including brown clusters, word
embeddings, and dependency derived relationships
The following features were used to classify instances of
it:
• POS and dependency of current word
• POS and lemma for dependency head of current word
• POS and lemma for succeeding token and POS of 2nd
succeeding token
• POS of succeeding token with lemma of 2nd succeeding
token
• POS of 1st and 2nd succeeding token with lemma of 3rd
succeeding token
• m0 only used the baseline features
• m1 uses additional features based on the relative position
of it, the relative distance from preceding noun, and relative
position of sentence within document
• m2 discard the annotations for errors
• m3 merges referential-nominal and referential-other during
the training set
• m4 merges referential-nominal and referential-other during
the evaluation set
Results
• We introduced a new corpus from Yahoo! Answers which
classified instances of it into four categories
• Using a mixture of old and new features, we were able to
achieve promising results despite a challenging dataset
• In the future, we plan to increase the size of our dataset by
adding more genres from Yahoo! Answers
• We plan to use a recurrent neural network with our dataset
in the future to see if that will yield better results
Conclusion
Nominal Anaphoric
Clause Anaphoric
Pleonastic
Cataphoric
Idiomatic /
Stereotypic
Proaction
Discourse Topic
Non-Referential
1. Extra positional
2. Cleft
3. Weather/Time

Condition/Place
4. Idiomatic
Referential
Non-Referential
1. Cataphoric
2. Proaction
3. Discourse Topic
4. Clause Anaphoric
5. Pleonastic
6. Idiomatic/

Stereotypic
1. Referential - Nominal
1. Nominal Anaphoric
2. Referential - Others
1. Proaction
2. Discourse Topic
3. Clause Anaphoric
3. Non-Referential
1. Cataphoric

2. Pleonastic
3. Idiomatic/Stereotypic
4. Errors
Evans (2001) Boyd (2005) Bergsma (2008) This Work (2016)
Referential - Noun
1. Nominal Anaphoric
Referential - Nominal
1. Nominal Anaphoric
Non-Referential
1. Cataphoric
2. Proaction
3. Discourse Topic
4. Pleonastic
5. Idiomatic/

Stereotypic
Referential - Clause
1. Clause Anaphoric
Li (2009)
Genre Doc Sen Tok C1 C2 C3 C4 C⇤
1. Computers and Internet 100 918 11,586 222 31 24 3 280
2. Science and Mathematics 100 801 11,589 164 35 18 3 220
3. Yahoo! Products 100 1,027 11,803 176 36 25 3 240
4. Education and Reference 100 831 11,520 148 55 36 2 241
5. Business and Finance 100 817 11,267 139 57 37 0 233
6. Entertainment and Music 100 946 11,656 138 68 30 5 241
7. Society and Culture 100 864 11,589 120 57 47 2 226
8. Health 100 906 11,305 142 97 32 0 271
9. Politics and Government 100 876 11,482 99 81 51 0 231
Total 900 7,986 103,797 1,348 517 300 18 2,183
1
Model Development Set Evaluation Set
ACC C1 C2 C3 C4 ACC C1 C2 C3 C4
M0 72.73 82.43 35.48 57.14 0.00 74.05 82.65 49.20 71.07 0.00
M1 73.21 82.56 50.00 62.50 0.00 74.68 82.93 53.14 73.33 0.00
M2 73.08 82.56 49.41 60.00 - 75.21 83.39 51.23 73.95 -
M3 76.44 82.31 64.75 - 77.14 82.26 67.87 -
M4 76.92 83.45 61.90 - 78.21 83.39 68.32 -
Table 1: Accuracies achieved by each model (in %). ACC: overall accuracy,
C1..4: F1 scores for 4 categories in Section ??. The highest accuracies are
highlighted in bold.
Proportion
0
0.2
0.4
0.6
0.8
Genre
G1 G2 G3 G4 G5 G6 G7 G8 G9
C1 C2 C3
Type Description
C1 Anaphoric instances of it which refer to
nouns, noun phrases, or gerunds
e.g. John bought a book. It was about
space.
C2 Anaphoric instances of it which do not re-
fer to nominals such as proaction, clause
anaphoras, and discourse topic.
e.g. Always use tools correctly. If it feels
very awkward, stop.
C3 Contains the most common instances of
pleonastic it including extraposition, cleft,
atmospheric, and idiomatic
e.g. It is sunny outside.
C4 Non-pronoun forms of it including disflue-
ncies, abbreviations, and misspellings
e.g. Why did my email move it self?

Weitere ähnliche Inhalte

Was ist angesagt?

pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...
pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...
pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...Lifeng (Aaron) Han
 
Word Frequency Dominance and L2 Word Recognition
Word Frequency Dominance and L2 Word RecognitionWord Frequency Dominance and L2 Word Recognition
Word Frequency Dominance and L2 Word RecognitionYu Tamura
 
Evaluation Dataset and System for Japanese Lexical Simplification
Evaluation Dataset and System for Japanese Lexical SimplificationEvaluation Dataset and System for Japanese Lexical Simplification
Evaluation Dataset and System for Japanese Lexical SimplificationTomoyuki Kajiwara
 
Conceptual Plurality in Japanese EFL Learners' Online Sentence Processing: A ...
Conceptual Plurality in Japanese EFL Learners' Online Sentence Processing: A ...Conceptual Plurality in Japanese EFL Learners' Online Sentence Processing: A ...
Conceptual Plurality in Japanese EFL Learners' Online Sentence Processing: A ...Yu Tamura
 
Moore_slides.ppt
Moore_slides.pptMoore_slides.ppt
Moore_slides.pptbutest
 
Coordinated NPs agree with singular or plural in there-constructions?: A comp...
Coordinated NPs agree with singular or plural in there-constructions?: A comp...Coordinated NPs agree with singular or plural in there-constructions?: A comp...
Coordinated NPs agree with singular or plural in there-constructions?: A comp...Yu Tamura
 
Conceptual and Grammatical Plurality of Conjoined NPs in L2 Sentence Comprehe...
Conceptual and Grammatical Plurality of Conjoined NPs in L2 Sentence Comprehe...Conceptual and Grammatical Plurality of Conjoined NPs in L2 Sentence Comprehe...
Conceptual and Grammatical Plurality of Conjoined NPs in L2 Sentence Comprehe...Yu Tamura
 
Mechanics of style
Mechanics of styleMechanics of style
Mechanics of styles-soheili
 
Tweets Classification
Tweets ClassificationTweets Classification
Tweets ClassificationVarun Gupta
 
Learning from Noisy Label Distributions (ICANN2017)
Learning from Noisy Label Distributions (ICANN2017)Learning from Noisy Label Distributions (ICANN2017)
Learning from Noisy Label Distributions (ICANN2017)Yuya Yoshikawa
 
IIT-TUDA at SemEval-2016 Task 5: Beyond Sentiment Lexicon: Combining Domain ...
IIT-TUDA at SemEval-2016 Task 5: Beyond Sentiment Lexicon: Combining Domain ...IIT-TUDA at SemEval-2016 Task 5: Beyond Sentiment Lexicon: Combining Domain ...
IIT-TUDA at SemEval-2016 Task 5: Beyond Sentiment Lexicon: Combining Domain ...Alexander Panchenko
 
Weakly Supervised Machine Reading
Weakly Supervised Machine ReadingWeakly Supervised Machine Reading
Weakly Supervised Machine ReadingIsabelle Augenstein
 
Grade 6 e_parcc_ig_2017
Grade 6 e_parcc_ig_2017Grade 6 e_parcc_ig_2017
Grade 6 e_parcc_ig_2017Steven Jarvis
 

Was ist angesagt? (18)

pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...
pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...
pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...
 
Word Frequency Dominance and L2 Word Recognition
Word Frequency Dominance and L2 Word RecognitionWord Frequency Dominance and L2 Word Recognition
Word Frequency Dominance and L2 Word Recognition
 
Evaluation Dataset and System for Japanese Lexical Simplification
Evaluation Dataset and System for Japanese Lexical SimplificationEvaluation Dataset and System for Japanese Lexical Simplification
Evaluation Dataset and System for Japanese Lexical Simplification
 
Lect03
Lect03Lect03
Lect03
 
Conceptual Plurality in Japanese EFL Learners' Online Sentence Processing: A ...
Conceptual Plurality in Japanese EFL Learners' Online Sentence Processing: A ...Conceptual Plurality in Japanese EFL Learners' Online Sentence Processing: A ...
Conceptual Plurality in Japanese EFL Learners' Online Sentence Processing: A ...
 
Moore_slides.ppt
Moore_slides.pptMoore_slides.ppt
Moore_slides.ppt
 
Coordinated NPs agree with singular or plural in there-constructions?: A comp...
Coordinated NPs agree with singular or plural in there-constructions?: A comp...Coordinated NPs agree with singular or plural in there-constructions?: A comp...
Coordinated NPs agree with singular or plural in there-constructions?: A comp...
 
Conceptual and Grammatical Plurality of Conjoined NPs in L2 Sentence Comprehe...
Conceptual and Grammatical Plurality of Conjoined NPs in L2 Sentence Comprehe...Conceptual and Grammatical Plurality of Conjoined NPs in L2 Sentence Comprehe...
Conceptual and Grammatical Plurality of Conjoined NPs in L2 Sentence Comprehe...
 
Semantic Patterns for Sentiment Analysis of Twitter
Semantic Patterns for Sentiment Analysis of TwitterSemantic Patterns for Sentiment Analysis of Twitter
Semantic Patterns for Sentiment Analysis of Twitter
 
Mechanics of style
Mechanics of styleMechanics of style
Mechanics of style
 
Lect06
Lect06Lect06
Lect06
 
Tweets Classification
Tweets ClassificationTweets Classification
Tweets Classification
 
Eurolan 2005 Pedersen
Eurolan 2005 PedersenEurolan 2005 Pedersen
Eurolan 2005 Pedersen
 
Lect01
Lect01Lect01
Lect01
 
Learning from Noisy Label Distributions (ICANN2017)
Learning from Noisy Label Distributions (ICANN2017)Learning from Noisy Label Distributions (ICANN2017)
Learning from Noisy Label Distributions (ICANN2017)
 
IIT-TUDA at SemEval-2016 Task 5: Beyond Sentiment Lexicon: Combining Domain ...
IIT-TUDA at SemEval-2016 Task 5: Beyond Sentiment Lexicon: Combining Domain ...IIT-TUDA at SemEval-2016 Task 5: Beyond Sentiment Lexicon: Combining Domain ...
IIT-TUDA at SemEval-2016 Task 5: Beyond Sentiment Lexicon: Combining Domain ...
 
Weakly Supervised Machine Reading
Weakly Supervised Machine ReadingWeakly Supervised Machine Reading
Weakly Supervised Machine Reading
 
Grade 6 e_parcc_ig_2017
Grade 6 e_parcc_ig_2017Grade 6 e_parcc_ig_2017
Grade 6 e_parcc_ig_2017
 

Ähnlich wie Classifying Non-Referential It for Question Answer Pairs

VOC real world enterprise needs
VOC real world enterprise needsVOC real world enterprise needs
VOC real world enterprise needsIvan Berlocher
 
Natural Language Processing, Techniques, Current Trends and Applications in I...
Natural Language Processing, Techniques, Current Trends and Applications in I...Natural Language Processing, Techniques, Current Trends and Applications in I...
Natural Language Processing, Techniques, Current Trends and Applications in I...RajkiranVeluri
 
Course Plan: Engg.Maths
Course Plan: Engg.MathsCourse Plan: Engg.Maths
Course Plan: Engg.MathsDr. N. Asokan
 
SMART Seminar Series: "Data is the new water in the digital age"
SMART Seminar Series: "Data is the new water in the digital age"SMART Seminar Series: "Data is the new water in the digital age"
SMART Seminar Series: "Data is the new water in the digital age"SMART Infrastructure Facility
 
2010 PACLIC - pay attention to categories
2010 PACLIC - pay attention to categories2010 PACLIC - pay attention to categories
2010 PACLIC - pay attention to categoriesWarNik Chow
 
(1) assignment 7.1 a identifying letters, phonemes, and grapheme
(1) assignment 7.1 a identifying letters, phonemes, and grapheme(1) assignment 7.1 a identifying letters, phonemes, and grapheme
(1) assignment 7.1 a identifying letters, phonemes, and graphemessuserfa5723
 
(1)assignment 7.1 a identifying letters, phonemes, and graphemes
(1)assignment 7.1 a identifying letters, phonemes, and graphemes (1)assignment 7.1 a identifying letters, phonemes, and graphemes
(1)assignment 7.1 a identifying letters, phonemes, and graphemes ssuserfa5723
 
NLP Bootcamp 2018 : Representation Learning of text for NLP
NLP Bootcamp 2018 : Representation Learning of text for NLPNLP Bootcamp 2018 : Representation Learning of text for NLP
NLP Bootcamp 2018 : Representation Learning of text for NLPAnuj Gupta
 
FriendsQA: Open-domain Question Answering on TV Show Transcripts
FriendsQA: Open-domain Question Answering on TV Show TranscriptsFriendsQA: Open-domain Question Answering on TV Show Transcripts
FriendsQA: Open-domain Question Answering on TV Show TranscriptsJinho Choi
 
Introduction To Applied Machine Learning
Introduction To Applied Machine LearningIntroduction To Applied Machine Learning
Introduction To Applied Machine Learningananth
 
Semantics-based Graph Approach to Complex Question-Answering
Semantics-based Graph Approach to Complex Question-AnsweringSemantics-based Graph Approach to Complex Question-Answering
Semantics-based Graph Approach to Complex Question-AnsweringJinho Choi
 
LEPOR: an augmented machine translation evaluation metric - Thesis PPT
LEPOR: an augmented machine translation evaluation metric - Thesis PPT LEPOR: an augmented machine translation evaluation metric - Thesis PPT
LEPOR: an augmented machine translation evaluation metric - Thesis PPT Lifeng (Aaron) Han
 
Lepor: augmented automatic MT evaluation metric
Lepor: augmented automatic MT evaluation metricLepor: augmented automatic MT evaluation metric
Lepor: augmented automatic MT evaluation metricLifeng (Aaron) Han
 
Sequence to sequence model speech recognition
Sequence to sequence model speech recognitionSequence to sequence model speech recognition
Sequence to sequence model speech recognitionAditya Kumar Khare
 

Ähnlich wie Classifying Non-Referential It for Question Answer Pairs (20)

VOC real world enterprise needs
VOC real world enterprise needsVOC real world enterprise needs
VOC real world enterprise needs
 
Natural Language Processing, Techniques, Current Trends and Applications in I...
Natural Language Processing, Techniques, Current Trends and Applications in I...Natural Language Processing, Techniques, Current Trends and Applications in I...
Natural Language Processing, Techniques, Current Trends and Applications in I...
 
asdrfasdfasdf
asdrfasdfasdfasdrfasdfasdf
asdrfasdfasdf
 
Course Plan: Engg.Maths
Course Plan: Engg.MathsCourse Plan: Engg.Maths
Course Plan: Engg.Maths
 
Alleviating Data Sparsity for Twitter Sentiment Analysis
Alleviating Data Sparsity for Twitter Sentiment AnalysisAlleviating Data Sparsity for Twitter Sentiment Analysis
Alleviating Data Sparsity for Twitter Sentiment Analysis
 
NLP Bootcamp
NLP BootcampNLP Bootcamp
NLP Bootcamp
 
SMART Seminar Series: "Data is the new water in the digital age"
SMART Seminar Series: "Data is the new water in the digital age"SMART Seminar Series: "Data is the new water in the digital age"
SMART Seminar Series: "Data is the new water in the digital age"
 
2010 PACLIC - pay attention to categories
2010 PACLIC - pay attention to categories2010 PACLIC - pay attention to categories
2010 PACLIC - pay attention to categories
 
(1) assignment 7.1 a identifying letters, phonemes, and grapheme
(1) assignment 7.1 a identifying letters, phonemes, and grapheme(1) assignment 7.1 a identifying letters, phonemes, and grapheme
(1) assignment 7.1 a identifying letters, phonemes, and grapheme
 
Examining reading
Examining readingExamining reading
Examining reading
 
(1)assignment 7.1 a identifying letters, phonemes, and graphemes
(1)assignment 7.1 a identifying letters, phonemes, and graphemes (1)assignment 7.1 a identifying letters, phonemes, and graphemes
(1)assignment 7.1 a identifying letters, phonemes, and graphemes
 
4th sem
4th sem4th sem
4th sem
 
2018 syllabus
2018 syllabus2018 syllabus
2018 syllabus
 
NLP Bootcamp 2018 : Representation Learning of text for NLP
NLP Bootcamp 2018 : Representation Learning of text for NLPNLP Bootcamp 2018 : Representation Learning of text for NLP
NLP Bootcamp 2018 : Representation Learning of text for NLP
 
FriendsQA: Open-domain Question Answering on TV Show Transcripts
FriendsQA: Open-domain Question Answering on TV Show TranscriptsFriendsQA: Open-domain Question Answering on TV Show Transcripts
FriendsQA: Open-domain Question Answering on TV Show Transcripts
 
Introduction To Applied Machine Learning
Introduction To Applied Machine LearningIntroduction To Applied Machine Learning
Introduction To Applied Machine Learning
 
Semantics-based Graph Approach to Complex Question-Answering
Semantics-based Graph Approach to Complex Question-AnsweringSemantics-based Graph Approach to Complex Question-Answering
Semantics-based Graph Approach to Complex Question-Answering
 
LEPOR: an augmented machine translation evaluation metric - Thesis PPT
LEPOR: an augmented machine translation evaluation metric - Thesis PPT LEPOR: an augmented machine translation evaluation metric - Thesis PPT
LEPOR: an augmented machine translation evaluation metric - Thesis PPT
 
Lepor: augmented automatic MT evaluation metric
Lepor: augmented automatic MT evaluation metricLepor: augmented automatic MT evaluation metric
Lepor: augmented automatic MT evaluation metric
 
Sequence to sequence model speech recognition
Sequence to sequence model speech recognitionSequence to sequence model speech recognition
Sequence to sequence model speech recognition
 

Mehr von Jinho Choi

Adaptation of Multilingual Transformer Encoder for Robust Enhanced Universal ...
Adaptation of Multilingual Transformer Encoder for Robust Enhanced Universal ...Adaptation of Multilingual Transformer Encoder for Robust Enhanced Universal ...
Adaptation of Multilingual Transformer Encoder for Robust Enhanced Universal ...Jinho Choi
 
Analysis of Hierarchical Multi-Content Text Classification Model on B-SHARP D...
Analysis of Hierarchical Multi-Content Text Classification Model on B-SHARP D...Analysis of Hierarchical Multi-Content Text Classification Model on B-SHARP D...
Analysis of Hierarchical Multi-Content Text Classification Model on B-SHARP D...Jinho Choi
 
Competence-Level Prediction and Resume & Job Description Matching Using Conte...
Competence-Level Prediction and Resume & Job Description Matching Using Conte...Competence-Level Prediction and Resume & Job Description Matching Using Conte...
Competence-Level Prediction and Resume & Job Description Matching Using Conte...Jinho Choi
 
Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-b...
Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-b...Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-b...
Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-b...Jinho Choi
 
The Myth of Higher-Order Inference in Coreference Resolution
The Myth of Higher-Order Inference in Coreference ResolutionThe Myth of Higher-Order Inference in Coreference Resolution
The Myth of Higher-Order Inference in Coreference ResolutionJinho Choi
 
Noise Pollution in Hospital Readmission Prediction: Long Document Classificat...
Noise Pollution in Hospital Readmission Prediction: Long Document Classificat...Noise Pollution in Hospital Readmission Prediction: Long Document Classificat...
Noise Pollution in Hospital Readmission Prediction: Long Document Classificat...Jinho Choi
 
Abstract Meaning Representation
Abstract Meaning RepresentationAbstract Meaning Representation
Abstract Meaning RepresentationJinho Choi
 
Semantic Role Labeling
Semantic Role LabelingSemantic Role Labeling
Semantic Role LabelingJinho Choi
 
CS329 - WordNet Similarities
CS329 - WordNet SimilaritiesCS329 - WordNet Similarities
CS329 - WordNet SimilaritiesJinho Choi
 
CS329 - Lexical Relations
CS329 - Lexical RelationsCS329 - Lexical Relations
CS329 - Lexical RelationsJinho Choi
 
Automatic Knowledge Base Expansion for Dialogue Management
Automatic Knowledge Base Expansion for Dialogue ManagementAutomatic Knowledge Base Expansion for Dialogue Management
Automatic Knowledge Base Expansion for Dialogue ManagementJinho Choi
 
Attention is All You Need for AMR Parsing
Attention is All You Need for AMR ParsingAttention is All You Need for AMR Parsing
Attention is All You Need for AMR ParsingJinho Choi
 
Graph-to-Text Generation and its Applications to Dialogue
Graph-to-Text Generation and its Applications to DialogueGraph-to-Text Generation and its Applications to Dialogue
Graph-to-Text Generation and its Applications to DialogueJinho Choi
 
Real-time Coreference Resolution for Dialogue Understanding
Real-time Coreference Resolution for Dialogue UnderstandingReal-time Coreference Resolution for Dialogue Understanding
Real-time Coreference Resolution for Dialogue UnderstandingJinho Choi
 
Topological Sort
Topological SortTopological Sort
Topological SortJinho Choi
 
Multi-modal Embedding Learning for Early Detection of Alzheimer's Disease
Multi-modal Embedding Learning for Early Detection of Alzheimer's DiseaseMulti-modal Embedding Learning for Early Detection of Alzheimer's Disease
Multi-modal Embedding Learning for Early Detection of Alzheimer's DiseaseJinho Choi
 
Building Widely-Interpretable Semantic Networks for Dialogue Contexts
Building Widely-Interpretable Semantic Networks for Dialogue ContextsBuilding Widely-Interpretable Semantic Networks for Dialogue Contexts
Building Widely-Interpretable Semantic Networks for Dialogue ContextsJinho Choi
 
How to make Emora talk about Sports Intelligently
How to make Emora talk about Sports IntelligentlyHow to make Emora talk about Sports Intelligently
How to make Emora talk about Sports IntelligentlyJinho Choi
 

Mehr von Jinho Choi (20)

Adaptation of Multilingual Transformer Encoder for Robust Enhanced Universal ...
Adaptation of Multilingual Transformer Encoder for Robust Enhanced Universal ...Adaptation of Multilingual Transformer Encoder for Robust Enhanced Universal ...
Adaptation of Multilingual Transformer Encoder for Robust Enhanced Universal ...
 
Analysis of Hierarchical Multi-Content Text Classification Model on B-SHARP D...
Analysis of Hierarchical Multi-Content Text Classification Model on B-SHARP D...Analysis of Hierarchical Multi-Content Text Classification Model on B-SHARP D...
Analysis of Hierarchical Multi-Content Text Classification Model on B-SHARP D...
 
Competence-Level Prediction and Resume & Job Description Matching Using Conte...
Competence-Level Prediction and Resume & Job Description Matching Using Conte...Competence-Level Prediction and Resume & Job Description Matching Using Conte...
Competence-Level Prediction and Resume & Job Description Matching Using Conte...
 
Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-b...
Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-b...Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-b...
Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-b...
 
The Myth of Higher-Order Inference in Coreference Resolution
The Myth of Higher-Order Inference in Coreference ResolutionThe Myth of Higher-Order Inference in Coreference Resolution
The Myth of Higher-Order Inference in Coreference Resolution
 
Noise Pollution in Hospital Readmission Prediction: Long Document Classificat...
Noise Pollution in Hospital Readmission Prediction: Long Document Classificat...Noise Pollution in Hospital Readmission Prediction: Long Document Classificat...
Noise Pollution in Hospital Readmission Prediction: Long Document Classificat...
 
Abstract Meaning Representation
Abstract Meaning RepresentationAbstract Meaning Representation
Abstract Meaning Representation
 
Semantic Role Labeling
Semantic Role LabelingSemantic Role Labeling
Semantic Role Labeling
 
CKY Parsing
CKY ParsingCKY Parsing
CKY Parsing
 
CS329 - WordNet Similarities
CS329 - WordNet SimilaritiesCS329 - WordNet Similarities
CS329 - WordNet Similarities
 
CS329 - Lexical Relations
CS329 - Lexical RelationsCS329 - Lexical Relations
CS329 - Lexical Relations
 
Automatic Knowledge Base Expansion for Dialogue Management
Automatic Knowledge Base Expansion for Dialogue ManagementAutomatic Knowledge Base Expansion for Dialogue Management
Automatic Knowledge Base Expansion for Dialogue Management
 
Attention is All You Need for AMR Parsing
Attention is All You Need for AMR ParsingAttention is All You Need for AMR Parsing
Attention is All You Need for AMR Parsing
 
Graph-to-Text Generation and its Applications to Dialogue
Graph-to-Text Generation and its Applications to DialogueGraph-to-Text Generation and its Applications to Dialogue
Graph-to-Text Generation and its Applications to Dialogue
 
Real-time Coreference Resolution for Dialogue Understanding
Real-time Coreference Resolution for Dialogue UnderstandingReal-time Coreference Resolution for Dialogue Understanding
Real-time Coreference Resolution for Dialogue Understanding
 
Topological Sort
Topological SortTopological Sort
Topological Sort
 
Tries - Put
Tries - PutTries - Put
Tries - Put
 
Multi-modal Embedding Learning for Early Detection of Alzheimer's Disease
Multi-modal Embedding Learning for Early Detection of Alzheimer's DiseaseMulti-modal Embedding Learning for Early Detection of Alzheimer's Disease
Multi-modal Embedding Learning for Early Detection of Alzheimer's Disease
 
Building Widely-Interpretable Semantic Networks for Dialogue Contexts
Building Widely-Interpretable Semantic Networks for Dialogue ContextsBuilding Widely-Interpretable Semantic Networks for Dialogue Contexts
Building Widely-Interpretable Semantic Networks for Dialogue Contexts
 
How to make Emora talk about Sports Intelligently
How to make Emora talk about Sports IntelligentlyHow to make Emora talk about Sports Intelligently
How to make Emora talk about Sports Intelligently
 

Kürzlich hochgeladen

Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 

Kürzlich hochgeladen (20)

Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 

Classifying Non-Referential It for Question Answer Pairs

  • 1. Classifying Non-Referential It for Question Answer Pairs Timothy Lee, Alex Lutz, and Jinho D. Choi
 Department of Mathematics and Computer Science, Emory University • Classifying non-referential it is an important task in coreference resolution • But no such attempts in this classification has been done for question answer pairs • The style of English used in question answer documents differs greatly from standard documents such as the Wall Street Journal • Our task is to classify non-referential it for question answer pairs and introduce the dataset • Initially Evans classified it into seven categories which effectively captures all types of it • We propose a more efficient set of rules to classify it for coreference specifically for question answer pairs Introduction • “Politics and Government” and “Society and Culture” had the highest proportion of non-referential instances due to their abstract ideas • “Computers and Internet” and “Science and Mathematics” had the most referential-nominal cases because these dealt with tangible objects • One large problem while annotating was resolving ambiguous references of it • If it were $1,700.00 … and let it go but for $170,000... • Here, it can be either idiomatic, or refer to the “post dated cheque” or the “process of receiving the post dated cheque” • Contextual information proved vital to solve this ambiguity • Q: Regarding IT, what are the fastest ways of getting superich? A: …. with maintenece or service of systems or with old programming languages. • Since Yahoo! Answers isn’t standard english, IT could have been capitalized for emphasis or to mean Information Technology. With the help of contextual information, such ambiguity could be avoided Corpus Analytics Experimental Setup • Used stochastic adaptive gradient descent with mini-batch and L1 regularization • Tested new sets of features including brown clusters, word embeddings, and dependency derived relationships The following features were used to classify instances of it: • POS and dependency of current word • POS and lemma for dependency head of current word • POS and lemma for succeeding token and POS of 2nd succeeding token • POS of succeeding token with lemma of 2nd succeeding token • POS of 1st and 2nd succeeding token with lemma of 3rd succeeding token • m0 only used the baseline features • m1 uses additional features based on the relative position of it, the relative distance from preceding noun, and relative position of sentence within document • m2 discard the annotations for errors • m3 merges referential-nominal and referential-other during the training set • m4 merges referential-nominal and referential-other during the evaluation set Results • We introduced a new corpus from Yahoo! Answers which classified instances of it into four categories • Using a mixture of old and new features, we were able to achieve promising results despite a challenging dataset • In the future, we plan to increase the size of our dataset by adding more genres from Yahoo! Answers • We plan to use a recurrent neural network with our dataset in the future to see if that will yield better results Conclusion Nominal Anaphoric Clause Anaphoric Pleonastic Cataphoric Idiomatic / Stereotypic Proaction Discourse Topic Non-Referential 1. Extra positional 2. Cleft 3. Weather/Time
 Condition/Place 4. Idiomatic Referential Non-Referential 1. Cataphoric 2. Proaction 3. Discourse Topic 4. Clause Anaphoric 5. Pleonastic 6. Idiomatic/
 Stereotypic 1. Referential - Nominal 1. Nominal Anaphoric 2. Referential - Others 1. Proaction 2. Discourse Topic 3. Clause Anaphoric 3. Non-Referential 1. Cataphoric
 2. Pleonastic 3. Idiomatic/Stereotypic 4. Errors Evans (2001) Boyd (2005) Bergsma (2008) This Work (2016) Referential - Noun 1. Nominal Anaphoric Referential - Nominal 1. Nominal Anaphoric Non-Referential 1. Cataphoric 2. Proaction 3. Discourse Topic 4. Pleonastic 5. Idiomatic/
 Stereotypic Referential - Clause 1. Clause Anaphoric Li (2009) Genre Doc Sen Tok C1 C2 C3 C4 C⇤ 1. Computers and Internet 100 918 11,586 222 31 24 3 280 2. Science and Mathematics 100 801 11,589 164 35 18 3 220 3. Yahoo! Products 100 1,027 11,803 176 36 25 3 240 4. Education and Reference 100 831 11,520 148 55 36 2 241 5. Business and Finance 100 817 11,267 139 57 37 0 233 6. Entertainment and Music 100 946 11,656 138 68 30 5 241 7. Society and Culture 100 864 11,589 120 57 47 2 226 8. Health 100 906 11,305 142 97 32 0 271 9. Politics and Government 100 876 11,482 99 81 51 0 231 Total 900 7,986 103,797 1,348 517 300 18 2,183 1 Model Development Set Evaluation Set ACC C1 C2 C3 C4 ACC C1 C2 C3 C4 M0 72.73 82.43 35.48 57.14 0.00 74.05 82.65 49.20 71.07 0.00 M1 73.21 82.56 50.00 62.50 0.00 74.68 82.93 53.14 73.33 0.00 M2 73.08 82.56 49.41 60.00 - 75.21 83.39 51.23 73.95 - M3 76.44 82.31 64.75 - 77.14 82.26 67.87 - M4 76.92 83.45 61.90 - 78.21 83.39 68.32 - Table 1: Accuracies achieved by each model (in %). ACC: overall accuracy, C1..4: F1 scores for 4 categories in Section ??. The highest accuracies are highlighted in bold. Proportion 0 0.2 0.4 0.6 0.8 Genre G1 G2 G3 G4 G5 G6 G7 G8 G9 C1 C2 C3 Type Description C1 Anaphoric instances of it which refer to nouns, noun phrases, or gerunds e.g. John bought a book. It was about space. C2 Anaphoric instances of it which do not re- fer to nominals such as proaction, clause anaphoras, and discourse topic. e.g. Always use tools correctly. If it feels very awkward, stop. C3 Contains the most common instances of pleonastic it including extraposition, cleft, atmospheric, and idiomatic e.g. It is sunny outside. C4 Non-pronoun forms of it including disflue- ncies, abbreviations, and misspellings e.g. Why did my email move it self?