Tutorial on Coreference
Resolution
by Anirudh Jayakumar (ajayaku2) and
Sili Hui (silihui2)
Prepared as an assignment for CS410: Text Information Systems in Spring 2016
Agenda
We will mainly address three questions:
1. What is the Coreference Resolution problem?
2. What are the existing approaches to it?
3. What are the future directions?
What is the Coreference Resolution problem?
Suppose you are given a sample text:
I did not vote for Donald Trump because I think he is…
How can a program tell that "he" refers to Donald Trump?
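In code, a resolver's output is usually a set of mention clusters. A hand-built toy sketch of that output format (our own representation, not any real library's):

```python
# Toy sketch: a coreference resolver's output is a set of clusters,
# each grouping mentions that refer to the same real-world entity.
# The clusters below are hand-built for the example sentence.
text = "I did not vote for Donald Trump because I think he is..."

clusters = [
    ["I", "I"],              # the speaker (mentioned twice)
    ["Donald Trump", "he"],  # the entity "he" refers to
]

def antecedent_of(mention, clusters):
    """Return the first other mention in the cluster containing `mention`."""
    for cluster in clusters:
        if mention in cluster:
            for other in cluster:
                if other != mention:
                    return other
    return None

print(antecedent_of("he", clusters))  # -> Donald Trump
```

A real system has to build those clusters automatically; that is the task the rest of this tutorial surveys.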
Definition
• Coreference resolution is the task of finding all noun phrases (NPs) that refer to the same real-world entity.
• In the previous example, he == Donald Trump
• One of the classical NLP problems
Why do we care?
• Suppose your boss asks you to gauge the general opinion of Donald Trump from a corpus of collected text data. How would you do it?
• I did not vote for Donald Trump because I think he is…
• What comes after "he is…" provides information about the writer's sentiment towards "he", but what does "he" refer to?
• If we know "he" refers to "Donald Trump", we know more about this person! (namely that they either like or, most likely, dislike Donald Trump)
– A small dataset can be labeled by hand (time-consuming but workable)
– What if we have gigabytes of text data?
Why do we care?
• This is where coreference resolution comes into play
– We learn which entities are associated with which words
• There are many potential real-world use cases:
– information extraction
– information retrieval
– question answering
– machine translation
– text summarization
A brief history of mainstream approaches
• 1970s - 1990s
– Mostly linguistic approaches
– Parse trees, semantic analysis, etc.
• 1990s - 2000s
– More machine learning approaches to the problem
– Mostly supervised machine learning approaches
• Late 2000s - now
– More unsupervised machine learning approaches came out
– Other models (ILP, Markov Logic Networks, etc.) were proposed
How to evaluate?
• How can I tell my approach is better than yours?
• Many well-established datasets and benchmarks
– ACE
– MUC
• Evaluate the performance on these datasets using F1 score, precision, etc.
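As a reminder of how these numbers relate, precision, recall, and F1 can be computed from link counts. A simplified sketch (the real MUC and ACE scorers count links in scheme-specific ways):

```python
def precision_recall_f1(correct, proposed, gold):
    """Precision/recall/F1 from coreference link counts:
    correct  - links the system got right
    proposed - links the system output
    gold     - links in the gold annotation"""
    p = correct / proposed
    r = correct / gold
    return p, r, 2 * p * r / (p + r)

# Example: 40 correct links out of 50 proposed, with 60 gold links.
p, r, f1 = precision_recall_f1(40, 50, 60)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.8 0.667 0.727
```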
Taxonomy of ideas
• In this tutorial, we will focus on two approaches:
– Linguistic approaches
– Machine learning approaches
• Supervised approaches
• Unsupervised approaches
• Other approaches will be briefly addressed towards the end
Linguistic Approach
• Appeared in the 1980s
• One of the very first approaches to the problem
• Takes advantage of the linguistic structure of the text
– parse trees
– syntactic constraints
– semantic analysis
• Requires domain-specific knowledge
Linguistic Approach
• A centering approach to pronouns was proposed by S. E. Brennan, M. W. Friedman, and C. J. Pollard in 1987
• Centering theory was proposed in order to model the relationships among
– a) focus of attention
– b) choice of referring expression
– c) perceived coherence of utterances
• An entity is an object that can be the target of a referring expression
• An utterance is the basic unit of discourse, which could be a sentence, a clause, or a phrase
• Each utterance is assigned a set of forward-looking centers, Cf(U), and a single backward-looking center, Cb(U)
Linguistic Approach
• The algorithm consists of four main steps
– Construct all possible <Cb, Cf> pairs by taking the cross-product of the Cb and Cf lists
– Filter these pairs by applying certain constraints
– Classify each pair based on the transition type, and rank the pairs
– Choose the best-ranked pair
• The goal of the algorithm design was conceptual clarity rather than efficiency
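The four steps can be sketched as a skeleton; the constraint and ranking functions below are placeholders for BFP's actual rules, which we do not reproduce here:

```python
from itertools import product

def centering_step(cb_candidates, cf_candidates, satisfies_constraints, rank):
    # Step 1: construct all <Cb, Cf> pairs via the cross-product.
    pairs = product(cb_candidates, cf_candidates)
    # Step 2: filter the pairs by the constraints.
    filtered = [p for p in pairs if satisfies_constraints(p)]
    # Steps 3-4: rank by transition type and choose the best-ranked pair.
    return max(filtered, key=rank, default=None)

# Toy usage with trivial placeholder constraint/ranking functions:
best = centering_step(
    cb_candidates=["Trump"],
    cf_candidates=[("Trump",), ("Trump", "speaker")],
    satisfies_constraints=lambda pair: True,
    rank=lambda pair: len(pair[1]),  # illustrative ranking only
)
print(best)  # ('Trump', ('Trump', 'speaker'))
```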
Machine Learning Approaches
• More ML approaches have appeared since the 1990s
• We consider two classical categories of ML approaches:
– Supervised learning
• Takes advantage of labeled data (train) and predicts on unlabeled data
– Unsupervised learning
• Feed in unlabeled data and the algorithm will (hopefully) do the right thing for you
Supervised Learning
• Supervised learning is the machine learning task of inferring a function from labeled training data.
• The training data consist of a set of training examples.
• A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples.
Supervised Paper 1
• Evaluating automated and manual acquisition of anaphora resolution strategies - Chinatsu Aone and Scott William Bennett
• This paper describes an approach to building an automatically trainable anaphora resolution system
• Uses Japanese newspaper articles tagged with discourse information as training examples for a C4.5 decision tree algorithm
• The training features include lexical (e.g. category), syntactic (e.g. grammatical role), semantic (e.g. semantic class) and positional (e.g. distance between anaphor and antecedent) features
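A feature vector of this shape might be assembled as follows; the feature names and mention encoding are our own illustrative choices, not the paper's:

```python
def pair_features(anaphor, antecedent):
    """One lexical, syntactic, semantic, and positional feature for an
    anaphor-antecedent pair (illustrative, not Aone & Bennett's exact set)."""
    return {
        "anaphor_category": anaphor["category"],                    # lexical
        "antecedent_role": antecedent["role"],                      # syntactic
        "same_semantic_class":
            anaphor["sem_class"] == antecedent["sem_class"],        # semantic
        "sentence_distance": anaphor["sent"] - antecedent["sent"],  # positional
    }

anaphor = {"category": "pronoun", "role": "subject", "sem_class": "person", "sent": 3}
antecedent = {"category": "name", "role": "subject", "sem_class": "person", "sent": 1}
features = pair_features(anaphor, antecedent)
print(features["same_semantic_class"], features["sentence_distance"])  # True 2
```

Each such vector, paired with a coreferential/not label, becomes one training example for the decision tree.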
Supervised Paper 1 cont.
• The method uses three training techniques with different parameters
– The anaphoric chain parameter is used in selecting positive and negative training examples
– With the anaphoric type identification parameter, the classifier
• answers "no" when a pair of an anaphor and a possible antecedent are not co-referential,
• answers "yes" along with the anaphoric type when they are co-referential
– The confidence factor parameter (0-100) is used in pruning decision trees; a higher confidence factor means less pruning of the tree
• Using anaphoric chains without anaphoric type identification helps improve the learning algorithm
• With a 100% confidence factor, the tree overfits the examples, leading to spurious uses of features
Supervised Paper 2
• A Machine Learning Approach to Coreference Resolution of Noun Phrases - Wee Meng Soon, Hwee Tou Ng and Daniel Chung Yong Lim
• A learning approach for unrestricted text, trained on a small annotated corpus
• All markables in the training set are determined by a pipeline of NLP modules consisting of tokenization, sentence segmentation, morphological processing, part-of-speech tagging, noun phrase identification, named entity recognition, nested noun phrase extraction and semantic class determination
• The feature vector consists of 12 features derived from two extracted markables, i and j, where i is the potential antecedent and j is the anaphor
Supervised Paper 2 cont.
• The learning algorithm used in the coreference engine is C5, an updated version of C4.5
• For each j, the algorithm considers every markable i before j as a potential antecedent. For each pair i and j, a feature vector is generated and given to the decision tree classifier
• The coreference engine achieves a recall of 58.6% and a precision of 67.3%, yielding a balanced F-measure of 62.6% on MUC-6
• On MUC-7, the recall is 56.1%, the precision is 65.5%, and the balanced F-measure is 60.4%
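The resolution loop can be sketched like this, where classify is a stand-in for the trained C5 tree and the anaphor is linked to the closest antecedent the classifier accepts:

```python
def resolve_anaphor(markables, j, classify):
    """For markable j, test every preceding markable i as a potential
    antecedent, closest first, and return the index of the first one the
    classifier accepts; None means j starts a new entity."""
    for i in range(j - 1, -1, -1):
        if classify(markables[i], markables[j]):
            return i
    return None

# Toy stand-in classifier: link the pronoun "he" to a preceding name.
markables = ["Donald Trump", "the election", "he"]
classify = lambda ante, ana: ana == "he" and ante == "Donald Trump"
print(resolve_anaphor(markables, 2, classify))  # -> 0
```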
Supervised Paper 3
• Conditional models of identity uncertainty with application to proper noun coreference - A. McCallum and B. Wellner
• This paper introduces several discriminative, conditional-probability models for coreference analysis
• No assumption that pairwise coreference decisions should be made independently of each other
• Model 1:
– A very general discriminative model where the dependency structure is unrestricted
– The model considers the coreference decisions and the attributes of entities as random variables, conditioned on the entity mentions
– The feature functions depend on the coreference decisions, y, the set of attributes, a, as well as the mentions of the entities, x
Supervised Paper 3 cont.
• Model 2: The authors remove the dependence on the coreference variable, y, by replacing it with a binary-valued random variable, Yij, for every pair of mentions
• Model 3: The third model does not include attributes as a random variable, and is otherwise similar to the second model
• The model performs a little better than the approach by Ng and Cardie (2002)
• The F1 result for NP coreference on the MUC-6 dataset is only about 73%
Supervised Paper 4
• Kernel-Based Pronoun Resolution with Structured Syntactic Knowledge - Xiaofeng Yang, Jian Su and Chew Lim Tan
• A kernel-based method that can automatically mine syntactic information from parse trees for pronoun resolution
• For each pronominal anaphor encountered, a positive instance is created by pairing the anaphor and its closest antecedent
• A set of negative instances is formed by pairing the anaphor with each of the non-coreferential candidates
• The learning algorithm used in this work is an SVM, which allows the use of kernels to incorporate the structured feature
Supervised Paper 4 cont.
• The study examines three possible structured features
• Min-Expansion records the minimal structure covering both the pronoun and the candidate in the parse tree
• Simple-Expansion captures the syntactic properties of the candidate or the pronoun
• Full-Expansion focuses on the whole tree structure between the candidate and the pronoun
• Hobbs' algorithm obtains 66%-72% success rates on the three domains, while the baseline system obtains 74%-77% success rates
Unsupervised learning
• Let it run on top of your data, with no supervision of wrong or right. Most are iterative methods.
• Generally preferred over supervised learning
– Does not generally need labeled data
– Does not generally need prior knowledge
– Is not subject to dataset limitations
– Often scales better than supervised approaches
• Yet, it has come a long way…
Unsupervised Paper 1
• The first notable unsupervised learning algorithm came out in 2007, by Aria Haghighi and Dan Klein
– It presents a generative model
– The objective is to maximize the posterior probability of entities given a collection of variables of the current mention
– It also discusses adding features to the collection, like gender, plurality and entity activation (how often an entity is mentioned)
Unsupervised Paper 1 cont.
• Results in a rather complicated generative model
• Achieves 72.5 F1 on MUC-6
• Set a good standard for later algorithms
Unsupervised Paper 2
• Inspired by the previous paper
• Another unsupervised method was proposed by Vincent Ng in 2008
– Uses a new but simpler generative model
– Considers all pairs of mentions in a document
– Models the probability of a pair of mentions, taking into account 7 context features (gender, plurality, etc.)
– Uses the classical EM algorithm to iteratively update the parameters
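To make the EM idea concrete, here is a drastically simplified sketch: each mention pair is latently coreferent or not, and the only observed "context feature" is whether the two mentions' genders match. Ng's actual model has seven features and a richer structure; this is an illustration of the E-step/M-step loop only.

```python
# Drastically simplified EM over mention pairs: the latent variable is
# "this pair corefers"; the only observed feature is a gender-match flag.
matches = [True, True, False, True, False, True, True, False]

pi, p_c, p_n = 0.5, 0.9, 0.3  # P(coref), P(match|coref), P(match|not coref)
for _ in range(20):
    # E-step: posterior probability of coreference for each pair.
    post = []
    for m in matches:
        like_c = pi * (p_c if m else 1 - p_c)
        like_n = (1 - pi) * (p_n if m else 1 - p_n)
        post.append(like_c / (like_c + like_n))
    # M-step: re-estimate the parameters from the posteriors.
    pi = sum(post) / len(post)
    p_c = sum(g for g, m in zip(post, matches) if m) / sum(post)
    p_n = sum(1 - g for g, m in zip(post, matches) if m) / sum(1 - g for g in post)

print(round(pi, 2), round(p_c, 2), round(p_n, 2))  # 0.52 0.91 0.32
```

With a single binary feature the likelihood has a ridge of optima, so this toy converges after one iteration; the real model's seven features break that symmetry.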
Unsupervised Paper 2 cont.
• Greatly simplifies the previous generative model
• Can be applied at the document level instead of the collection level
• Beat the performance of the previous model on the ACE dataset (by a small margin)
Unsupervised Paper 3
• Previous methods emphasize generative models
• Why not use a Markov net?
• Proposed by Hoifung Poon and Pedro Domingos
– Formulates the problem in a Markov Logic Network (MLN)
– Defines rules and clauses, gradually building from a base model by adding rules
– Leverages sampling algorithms in the training and inference steps
Unsupervised Paper 3 cont.
• Pioneering work in bringing Markov logic to the coreference resolution problem
• Beat the generative model proposed by Haghighi and Klein by a large margin on the MUC-6 dataset
• The authors are pioneers of Markov Logic Networks; this paper may be partly a "showcase" of their work and what MLNs can do
Related Work
• There is much other related work:
– Formulation of an equivalent ILP problem
• Pascal Denis and Jason Baldridge
– Enforcing the transitivity property in ILP
• Jenny Rose Finkel and Christopher D. Manning
– Latent structure prediction approach
• Kai-Wei Chang, Rajhans Samdani, Dan Roth (professor @ UIUC)
Future Directions
• After our study, we think these are the major future directions
• More standardized, updated benchmarks
– Coreference resolution research should converge on a standard set of corpora; that way, results will be comparable
• First-order and cluster-based features will play an important role
– They have given the field a much-needed push, and will likely remain a staple of future state-of-the-art systems
• Combination of linguistic ideas with modern models
– Combining the strengths of the two themes: using more of the richer machine learning models together with the linguistic ideas
Conclusion
• Thanks for going through our tutorial!
• Major take-aways:
– Coreference resolution remains an active research area
– Modern research tends to diverge from pure linguistic analysis
– Generally, the performance (evaluated on well-established datasets) of state-of-the-art algorithms is still not good enough for industrial uses that require precise labels
– For general purposes, modern unsupervised learning approaches can achieve decent accuracy compared to supervised learning approaches
– Future machine learning approaches will leverage more linguistic knowledge (features) in their models
References
• Brennan, Susan E., Marilyn W. Friedman, and Carl J. Pollard. "A centering approach to pronouns." Proceedings of the 25th Annual Meeting of the Association for Computational Linguistics, 1987.
• Aone, Chinatsu, and Scott William Bennett. "Evaluating automated and manual acquisition of anaphora resolution strategies." Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics, 1995.
• Ge, Niyu, John Hale, and Eugene Charniak. "A statistical approach to anaphora resolution." Proceedings of the Sixth Workshop on Very Large Corpora, 1998.
• Soon, Wee Meng, Hwee Tou Ng, and Daniel Chung Yong Lim. "A machine learning approach to coreference resolution of noun phrases." Computational Linguistics 27.4 (2001): 521-544.
• McCallum, Andrew, and Ben Wellner. "Toward conditional models of identity uncertainty with application to proper noun coreference." 2003.
• Yang, Xiaofeng, Jian Su, and Chew Lim Tan. "Kernel-based pronoun resolution with structured syntactic knowledge." Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, 2006.
• Haghighi, Aria, and Dan Klein. "Unsupervised coreference resolution in a nonparametric Bayesian model." Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, 2007.
• Ng, Vincent. "Unsupervised models for coreference resolution." Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2008.
• Poon, Hoifung, and Pedro Domingos. "Joint unsupervised coreference resolution with Markov logic." Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2008.
• Finkel, Jenny Rose, and Christopher D. Manning. "Enforcing transitivity in coreference resolution." Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics: Short Papers, 2008.
• Chang, Kai-Wei, Rajhans Samdani, and Dan Roth. "A constrained latent variable model for coreference resolution." 2013.
• Denis, Pascal, and Jason Baldridge. "Joint determination of anaphoricity and coreference resolution using integer programming." HLT-NAACL, 2007.

Kalyanpur ) Call Girls in Lucknow Finest Escorts Service šŸø 8923113531 šŸŽ° Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service šŸø 8923113531 šŸŽ° Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service šŸø 8923113531 šŸŽ° Avail...
Ā 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Ā 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Ā 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
Ā 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Ā 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
Ā 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Ā 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
Ā 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
Ā 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
Ā 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
Ā 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Ā 
Scaling API-first ā€“ The story of a global engineering organization
Scaling API-first ā€“ The story of a global engineering organizationScaling API-first ā€“ The story of a global engineering organization
Scaling API-first ā€“ The story of a global engineering organization
Ā 
šŸ¬ The future of MySQL is Postgres šŸ˜
šŸ¬  The future of MySQL is Postgres   šŸ˜šŸ¬  The future of MySQL is Postgres   šŸ˜
šŸ¬ The future of MySQL is Postgres šŸ˜
Ā 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
Ā 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
Ā 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Ā 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
Ā 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
Ā 

Tutorial on Coreference Resolution

  • 1. Tutorial on Coreference Resolution by Anirudh Jayakumar (ajayaku2), Sili Hui (silihui2). Prepared as an assignment for CS410: Text Information Systems in Spring 2016
  • 2. Agenda We will mainly address three questions: 1. What is the coreference resolution problem? 2. What are the existing approaches to it? 3. What are the future directions?
  • 3. What is the Coreference Resolution problem? Suppose you are given a sample text: "I did not vote for Donald Trump because I think he is…" How can a program tell that "he" refers to Donald Trump?
  • 4. Definition • Coreference resolution is the task of finding all noun phrases (NPs) that refer to the same real-world entity • In the previous example, "he" == "Donald Trump" • One of the classical NLP problems
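To make the task concrete, here is a toy sketch (not from the slides): resolve each pronoun to the nearest preceding capitalized span. Real resolvers use syntax, semantics, and learned models; this only illustrates what the task's output looks like, and the heuristic is deliberately crude.

```python
# Toy heuristic resolver: link pronouns to the nearest preceding
# capitalized span. Illustrative only; not a real coreference system.

PRONOUNS = {"he", "she", "him", "her", "his", "it", "they", "them"}

def naive_resolve(tokens):
    """Return {pronoun_index: antecedent_text} using a nearest-entity heuristic."""
    links = {}
    last_entity = None
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.lower() in PRONOUNS:
            if last_entity is not None:
                links[i] = last_entity
            i += 1
        elif tok[:1].isupper() and tok != "I" and i > 0:
            # crude proper-noun span: consecutive capitalized tokens,
            # skipping sentence-initial words and the pronoun "I"
            start = i
            while i + 1 < len(tokens) and tokens[i + 1][:1].isupper():
                i += 1
            last_entity = " ".join(tokens[start:i + 1])
            i += 1
        else:
            i += 1
    return links

tokens = "I did not vote for Donald Trump because I think he is great".split()
print(naive_resolve(tokens))  # {10: 'Donald Trump'}
```

The sketch already fails on cataphora, nested NPs, and lowercase mentions, which is exactly why the approaches surveyed below exist.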
  • 5. Why do we care? • Suppose your boss asks you to extract the general opinion of Donald Trump from a corpus of collected text data. How do you finish the job? • "I did not vote for Donald Trump because I think he is…" • What follows "he is…" carries this person's sentiment towards "he", but what does "he" refer to? • If we know "he" refers to "Donald Trump", we know more about this person (who either likes or, most likely, dislikes Donald Trump)! – A small dataset can be labeled by hand (time-consuming but manageable) – What if we have GBs of text data?
  • 6. Why do we care? • This is where coreference resolution comes into play – We learn which entities are associated with which words • There are many potential real-world use cases: – information extraction – information retrieval – question answering – machine translation – text summarization
  • 7. A brief history of the mainstream… • 1970s–1990s – Mostly linguistic approaches – Parse trees, semantic analysis, etc. • 1990s–2000s – More machine learning approaches to the problem – Mostly supervised machine learning approaches • Late 2000s–now – More unsupervised machine learning approaches came out – Other models (ILP, Markov logic networks, etc.) were proposed
  • 8. How to evaluate? • How can we tell one approach is better than another? • Many well-established datasets and benchmarks – ACE – MUC • Evaluate performance on these datasets using F1 score, precision, etc.
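As a rough sketch of how such scores are computed, the snippet below evaluates precision, recall, and F1 over coreference links (mention pairs). This is a simplification in the spirit of the link-based MUC metric; the official scorers count minimal spanning links per entity cluster and also offer B-cubed and CEAF variants.

```python
# Simplified link-based scoring: compare predicted coreference links
# (mention pairs) against gold links and report precision/recall/F1.

def link_prf(gold_links, pred_links):
    gold, pred = set(gold_links), set(pred_links)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = [("he", "Donald Trump"), ("his", "Donald Trump")]
pred = [("he", "Donald Trump"), ("his", "I")]
p, r, f = link_prf(gold, pred)
print(round(p, 2), round(r, 2), round(f, 2))  # 0.5 0.5 0.5
```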
  • 9. Taxonomy of ideas • In this tutorial, we will focus on two families of approaches: – Linguistic approaches – Machine learning approaches • Supervised approaches • Unsupervised approaches • Other approaches will be briefly addressed towards the end
  • 10. Linguistic Approach • Appeared in the 1980s • One of the very first approaches to the problem • Takes advantage of the linguistic structure of the text – parse trees – syntactic constraints – semantic analysis • Requires domain-specific knowledge
  • 11. Linguistic Approach • The centering approach to pronouns was proposed by S. E. Brennan, M. W. Friedman, and C. J. Pollard in 1987 • Centering theory models the relationships among: – a) the focus of attention – b) the choice of referring expression – c) the perceived coherence of utterances • An entity is an object that can be the target of a referring expression • An utterance is the basic unit, which can be a sentence, a clause, or a phrase • Each utterance is assigned a set of forward-looking centers, Cf(U), and a single backward-looking center, Cb(U)
  • 12. Linguistic Approach • The algorithm consists of four main steps – Construct all possible <Cb, Cf> pairs by taking the cross-product of the Cb and Cf lists – Filter these pairs by applying certain constraints – Classify each pair by its transition type and rank the pairs – Choose the best-ranked pair • The goal of the algorithm design was conceptual clarity rather than efficiency
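The four steps can be sketched loosely as follows. This is an illustrative simplification, not Brennan, Friedman, and Pollard's full algorithm: the transition names follow centering theory, but the constraint function and the data shapes here are invented for the example.

```python
from itertools import product

# Loose sketch of the four-step centering pipeline:
# build <Cb, Cf> pairs, filter by a constraint, rank by transition type.

TRANSITION_RANK = {"CONTINUE": 0, "RETAIN": 1, "SHIFT": 2}  # lower is better

def classify(prev_cb, cb, cp):
    """Transition type given the previous Cb, candidate Cb, and preferred center Cp."""
    if cb == prev_cb:
        return "CONTINUE" if cb == cp else "RETAIN"
    return "SHIFT"

def best_pair(cb_candidates, cf_orderings, prev_cb, agrees):
    # Step 1: cross-product of Cb candidates and Cf orderings
    pairs = list(product(cb_candidates, cf_orderings))
    # Step 2: filter by constraints (here: a toy agreement check, Cb must be in Cf)
    pairs = [(cb, cf) for cb, cf in pairs if cb in cf and agrees(cb)]
    if not pairs:
        return None
    # Steps 3-4: rank by transition preference and choose the best pair
    return min(pairs, key=lambda p: TRANSITION_RANK[classify(prev_cb, p[0], p[1][0])])

cb_cands = ["Donald Trump", "the election"]
cf_orders = [("Donald Trump", "the election")]
print(best_pair(cb_cands, cf_orders, "Donald Trump", lambda e: True))
# → ('Donald Trump', ('Donald Trump', 'the election'))
```

The CONTINUE > RETAIN > SHIFT preference ordering is the part taken directly from centering theory; everything else is scaffolding.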
  • 13. Machine Learning Approaches • More ML approaches have appeared since the 1990s • We consider two classical categories of ML approaches: – Supervised learning • Train on labeled data and predict on unlabeled data – Unsupervised learning • Feed in unlabeled data and the algorithm will (hopefully) do the right thing
  • 14. Supervised Learning • Supervised learning is the machine learning task of inferring a function from labeled training data • The training data consist of a set of training examples • A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used to map new examples
  • 15. Supervised Paper 1 • Evaluating automated and manual acquisition of anaphora resolution strategies – Chinatsu Aone and Scott William Bennett • This paper describes an approach to building an automatically trainable anaphora resolution system • It uses Japanese newspaper articles tagged with discourse information as training examples for a C4.5 decision tree algorithm • The training features include lexical (e.g., category), syntactic (e.g., grammatical role), semantic (e.g., semantic class), and positional (e.g., distance between anaphor and antecedent) features
  • 16. Supervised Paper 1 cont. • The method uses three training techniques with different parameters – The anaphoric-chain parameter is used in selecting positive and negative training examples – With the anaphoric-type-identification parameter, • answer "no" when an anaphor and a possible antecedent are not coreferential • answer with the anaphoric type when they are coreferential – The confidence-factor parameter (0–100) is used in pruning decision trees; a higher confidence factor means less pruning • Using anaphoric chains without anaphoric type identification helps improve the learning algorithm • With a 100% confidence factor, the tree overfits the examples, leading to spurious uses of features
  • 17. Supervised Paper 2 • A Machine Learning Approach to Coreference Resolution of Noun Phrases – Wee Meng Soon, Hwee Tou Ng, and Daniel Chung Yong Lim • A learning approach to coreference in unrestricted text, trained on a small annotated corpus • All markables in the training set are determined by a pipeline of NLP modules consisting of tokenization, sentence segmentation, morphological processing, part-of-speech tagging, noun phrase identification, named entity recognition, nested noun phrase extraction, and semantic class determination • The feature vector consists of 12 features derived from two extracted markables, i and j, where i is the potential antecedent and j is the anaphor
  • 18. Supervised Paper 2 cont. • The learning algorithm used in their coreference engine is C5, an updated version of C4.5 • For each j, the algorithm considers every markable i before j as a potential antecedent; for each pair (i, j), a feature vector is generated and given to the decision tree classifier • The coreference engine achieves a recall of 58.6% and a precision of 67.3%, yielding a balanced F-measure of 62.6% on MUC-6 • On MUC-7, the recall is 56.1%, the precision is 65.5%, and the balanced F-measure is 60.4%
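The mention-pair setup used by Soon et al. can be sketched as below: pair each anaphor j with every earlier markable i and derive a feature vector per pair. The feature names here are illustrative placeholders, not the paper's exact 12 features, and the markable dictionaries are an assumed representation.

```python
# Sketch of mention-pair instance generation for a decision-tree classifier.
# Each markable is a dict with toy attributes (assumed representation).

def features(i, j):
    """A small, illustrative feature vector for the pair (i, j)."""
    return {
        "dist_sentences": j["sent"] - i["sent"],
        "i_is_pronoun": i["pos"] == "PRP",
        "j_is_pronoun": j["pos"] == "PRP",
        "string_match": i["text"].lower() == j["text"].lower(),
        "gender_agree": i["gender"] == j["gender"],
    }

def candidate_pairs(markables):
    """Yield (i, j, feature_vector) for every markable i preceding j."""
    for jx, j in enumerate(markables):
        for i in markables[:jx]:
            yield i, j, features(i, j)

markables = [
    {"text": "Donald Trump", "sent": 0, "pos": "NNP", "gender": "m"},
    {"text": "he", "sent": 0, "pos": "PRP", "gender": "m"},
]
for i, j, f in candidate_pairs(markables):
    print(i["text"], "<-", j["text"], f)
```

In the actual system, each such vector (with a coreferent/not-coreferent label at training time) is what gets fed to the C5 decision tree.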
  • 19. Supervised Paper 3 • Conditional models of identity uncertainty with application to proper noun coreference – A. McCallum and B. Wellner • This paper introduces several discriminative, conditional-probability models for coreference analysis • No assumption that pairwise coreference decisions are made independently of each other • Model 1: – A very general discriminative model where the dependency structure is unrestricted – The model treats the coreference decisions and the attributes of entities as random variables, conditioned on the entity mentions – The feature functions depend on the coreference decisions y, the set of attributes a, and the mentions of the entities x
  • 20. Supervised Paper 3 cont. • Model 2: the authors remove the dependence on the coreference variable y by replacing it with a binary-valued random variable Yij for every pair of mentions • Model 3: the third model does not include attributes as a random variable, and is otherwise similar to the second model • The model performs a little better than the approach of Ng and Cardie (2002) • The F1 result for NP coreference on the MUC-6 dataset is only about 73%
  • 21. Supervised Paper 4 • Kernel-Based Pronoun Resolution with Structured Syntactic Knowledge – Xiaofeng Yang, Jian Su, and Chew Lim Tan • A kernel-based method that automatically mines syntactic information from parse trees for pronoun resolution • For each pronominal anaphor encountered, a positive instance is created by pairing the anaphor with its closest antecedent • A set of negative instances is formed by pairing the anaphor with each of the non-coreferential candidates • The learning algorithm used in this work is an SVM, which allows kernels to incorporate the structured features
  • 22. Supervised Paper 4 cont. • The study examines three possible structured features • Min-Expansion records the minimal structure covering both the pronoun and the candidate in the parse tree • Simple-Expansion captures the syntactic properties of the candidate or the pronoun • Full-Expansion covers the whole tree structure between the candidate and the pronoun • Hobbs' algorithm obtains 66%–72% success rates on the three domains, while the baseline system obtains 74%–77%
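The instance-creation scheme described for Paper 4 can be sketched as follows. This is a hedged simplification (names and data shapes are invented for illustration): the closest true antecedent yields one positive instance, and candidates intervening between it and the anaphor yield negative instances.

```python
# Sketch of positive/negative training-instance creation for an
# SVM-based pronoun resolver. Candidates are ordered left to right,
# and the anaphor follows them all in the text.

def make_instances(candidates, anaphor, closest_idx):
    """Return [((candidate, anaphor), label), ...] with +1/-1 labels."""
    # positive instance: the anaphor paired with its closest antecedent
    instances = [((candidates[closest_idx], anaphor), +1)]
    # negative instances: candidates between the antecedent and the anaphor
    for cand in candidates[closest_idx + 1:]:
        instances.append(((cand, anaphor), -1))
    return instances

cands = ["Donald Trump", "the election", "a rally"]
print(make_instances(cands, "he", 0))
```

Each (candidate, anaphor) pair would then be represented by its parse-tree structure and fed to the kernelized SVM, which is the part this sketch leaves out.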
  • 23. Unsupervised learning • Let it run on top of your data, with no supervision of right or wrong; most methods are iterative • Generally preferred over supervised learning – Does not generally need labeled data – Does not generally need prior knowledge – Is not subject to dataset limitations – Often scales better than supervised approaches • Yet, it has come a long way…
  • 24. Unsupervised Paper 1 • The first notable unsupervised learning algorithm came out in 2007, by Aria Haghighi and Dan Klein – It presents a generative model – The objective is to maximize the posterior probability of entities given a collection of variables for the current mention – It also discusses adding features such as gender, plurality, and entity activation (how often the entity is mentioned)
  • 25. Unsupervised Paper 1 cont. • The result is a rather complicated generative model • Achieves 72.5 F1 on MUC-6 • Set a good standard for later algorithms
  • 26. Unsupervised Paper 2 • Inspired by the previous paper • Another unsupervised method was proposed by Vincent Ng in 2008 – Uses a new but simpler generative model – Considers all pairs of mentions in a document – The probability of a pair of mentions takes into account 7 context features (gender, plurality, etc.) – Uses the classical EM algorithm to iteratively update the parameters
  • 27. Unsupervised Paper 2 cont. • Greatly simplified the previous generative model • Can be applied at the document level instead of the collection level • Beat the performance of the previous model on the ACE dataset (by a small margin)
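To show the flavor of the EM loop involved, here is a toy sketch on a deliberately minimal model: one binary feature per mention pair (say, gender agreement) and a latent coreferent/not-coreferent variable. This is a plain two-component Bernoulli mixture, far simpler than Ng's actual seven-feature model; all names and initial values are illustrative.

```python
# Toy EM for a two-component Bernoulli mixture over mention pairs.
# Latent z: coreferent or not; observed a: binary agreement feature.

def em(observations, iters=50, pi=0.5, p_coref=0.9, p_other=0.3):
    """Return (P(coref), P(agree|coref), P(agree|not)) after EM updates."""
    for _ in range(iters):
        # E-step: posterior responsibility that each pair is coreferent
        gammas = []
        for a in observations:
            lc = pi * (p_coref if a else 1 - p_coref)
            ln = (1 - pi) * (p_other if a else 1 - p_other)
            gammas.append(lc / (lc + ln))
        # M-step: re-estimate parameters from the responsibilities
        n = len(observations)
        pi = sum(gammas) / n
        wc = sum(gammas) or 1e-9
        wn = (n - sum(gammas)) or 1e-9
        p_coref = sum(g * a for g, a in zip(gammas, observations)) / wc
        p_other = sum((1 - g) * a for g, a in zip(gammas, observations)) / wn
    return pi, p_coref, p_other

obs = [1, 1, 1, 0, 1, 0, 0, 1]  # agreement indicator per candidate pair
pi, pc, po = em(obs)
print(round(pi, 2), round(pc, 2), round(po, 2))
```

In Ng's model the E-step instead scores full pairwise assignments over a document and the M-step updates per-feature conditional probabilities, but the alternation is the same.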
  • 28. Unsupervised Paper 3 • Previous methods emphasized generative models • Why not use a Markov network? • Proposed by Hoifung Poon and Pedro Domingos – Formulates the problem as a Markov Logic Network (MLN) – Defines rules and clauses, gradually building up from a base model by adding rules – Leverages sampling algorithms in the training and inference steps
  • 29. Unsupervised Paper 3 cont. • Pioneering work in applying Markov logic to the coreference resolution problem • Beat the generative model of Haghighi and Klein by a large margin on the MUC-6 dataset • The authors are pioneers of Markov Logic Networks; this paper may partly be a showcase of what MLNs can do
  • 30. Related Work • There is much other related work: – Formulation as an equivalent ILP problem • Pascal Denis and Jason Baldridge – Enforcing the transitivity property in ILP • Jenny Rose Finkel and Christopher D. Manning – A latent structure prediction approach • Kai-Wei Chang, Rajhans Samdani, and Dan Roth (professor @ UIUC)
  • 31. Future Direction • After these studies, we see several major directions for future work • More standardized, updated benchmarks – Coreference resolution research should use a more standard set of corpora so that results are comparable • First-order and cluster-based features will play an important role – Their use has given the field a much-needed push, and they will likely remain a staple of future state-of-the-art systems • Combining linguistic ideas with modern models – Combine the strengths of the two themes: richer machine learning models together with linguistic ideas
  • 32. Conclusion • Thanks for going through our tutorial! • Major takeaways: – Coreference resolution remains an active research area – Modern research tends to diverge from pure linguistic analysis – Generally, the performance of state-of-the-art algorithms (evaluated on well-established datasets) is still not sufficient for industrial uses that require precise labels – For general purposes, modern unsupervised learning approaches can achieve decent accuracy compared to supervised approaches – Future machine learning approaches will leverage more linguistic knowledge (features) in their models
  • 33. Reference
  • Brennan, Susan E., Marilyn W. Friedman, and Carl J. Pollard. "A centering approach to pronouns." Proceedings of the 25th Annual Meeting of the Association for Computational Linguistics. 1987.
  • Aone, Chinatsu, and Scott William Bennett. "Evaluating automated and manual acquisition of anaphora resolution strategies." Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics. 1995.
  • Ge, Niyu, John Hale, and Eugene Charniak. "A statistical approach to anaphora resolution." Proceedings of the Sixth Workshop on Very Large Corpora. Vol. 71. 1998.
  • Soon, Wee Meng, Hwee Tou Ng, and Daniel Chung Yong Lim. "A machine learning approach to coreference resolution of noun phrases." Computational Linguistics 27.4 (2001): 521-544.
  • McCallum, Andrew, and Ben Wellner. "Toward conditional models of identity uncertainty with application to proper noun coreference." 2003.
  • Yang, Xiaofeng, Jian Su, and Chew Lim Tan. "Kernel-based pronoun resolution with structured syntactic knowledge." Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics. 2006.
  • Haghighi, Aria, and Dan Klein. "Unsupervised coreference resolution in a nonparametric Bayesian model." Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics. 2007.
  • Ng, Vincent. "Unsupervised models for coreference resolution." Proceedings of the Conference on Empirical Methods in Natural Language Processing. 2008.
  • Poon, Hoifung, and Pedro Domingos. "Joint unsupervised coreference resolution with Markov logic." Proceedings of the Conference on Empirical Methods in Natural Language Processing. 2008.
  • Finkel, Jenny Rose, and Christopher D. Manning. "Enforcing transitivity in coreference resolution." Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics: Short Papers. 2008.
  • Chang, Kai-Wei, Rajhans Samdani, and Dan Roth. "A constrained latent variable model for coreference resolution." 2013.
  • Denis, Pascal, and Jason Baldridge. "Joint determination of anaphoricity and coreference resolution using integer programming." HLT-NAACL. 2007.