A2, Coreference Resolution System, CS6340, Fall 2016
Avani Sharma & Aishwarya Asesh
School of Computing, University of Utah
SYSTEM DESIGN AND COMPONENTS
Fig. 1: Overall Basic System Structure
INTRODUCTION
Coreference resolution is the task of determining whether two or more expressions in a text refer to the same object or living entity.
A coreference resolution system is an essential step in many higher-level natural language processing applications, such as summarizing a paragraph, building a question answering system, and extracting information from an available database. Such systems combine concepts from NLP, information retrieval, information extraction, machine learning, knowledge representation, logic and inference, and semantic search.
In this project, we focus on coreference resolution where the search space is limited to a single document.
Emphasis/Originality
Key Observation: Accuracy varies with the order in which the exact, substring, abbreviation, gender, and other matching methods are applied. We fixed the matching order after analysing experimental results.
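
The effect of matching order can be illustrated with a minimal sketch. This is not the system's actual code: the function names (`exact_match`, `substring_match`, `resolve`) and the toy mention list are hypothetical, and only two of the matching methods are shown.

```python
from typing import Callable, List, Optional

# A matcher receives the mention to resolve and the list of earlier mentions,
# and returns the index of the chosen antecedent, or None if it does not fire.
Matcher = Callable[[str, List[str]], Optional[int]]


def exact_match(mention: str, earlier: List[str]) -> Optional[int]:
    """Most recent earlier mention with identical text (case-insensitive)."""
    for i in range(len(earlier) - 1, -1, -1):
        if earlier[i].lower() == mention.lower():
            return i
    return None


def substring_match(mention: str, earlier: List[str]) -> Optional[int]:
    """Most recent earlier mention that contains, or is contained in, the mention."""
    m = mention.lower()
    for i in range(len(earlier) - 1, -1, -1):
        e = earlier[i].lower()
        if m in e or e in m:
            return i
    return None


def resolve(mention: str, earlier: List[str], matchers: List[Matcher]) -> Optional[int]:
    """Apply the matchers in the given order; the first one that fires decides."""
    for matcher in matchers:
        hit = matcher(mention, earlier)
        if hit is not None:
            return hit
    return None


# Hypothetical earlier mentions, in document order.
earlier = ["Ford", "Ford Motor Company", "the company"]

print(resolve("Ford", earlier, [exact_match, substring_match]))  # 0
print(resolve("Ford", earlier, [substring_match, exact_match]))  # 1
```

The same mention is linked to a different antecedent when the two matchers are swapped, which is why the order has to be chosen experimentally.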
SECOND THOUGHTS
Successes - Things that went well
1. The Exact, Substring, and Abbreviation matches are the key strengths of our algorithm.
2. We are most proud of the generalized implementation, which keeps our system balanced regardless of the scope of the given input file.
Regrets - Scope for improvement
1. Because of restrictions in the external libraries available for Python, we were not able to get proper results during implementation.
2. Using WordNet for semantic category decisions did not yield perfect results.
3. The Stanford CoreNLP toolkit appeared to give better results than the NLTK toolkit.
4. We restricted ourselves to Python toolkits only.
COMPONENTS EXPLAINED
1. A word is linked to an exact occurrence of itself in the document.
2. A shortened word or phrase is linked to the full form with the same words or order.
3. A reference is recorded if a partial match of a string exists.
4. A reference exists if a male/female pronoun is found in the list.
5. A reference is made if the word is found in the same semantic class.
6. A capitalized word is linked if it matches an animate/inanimate pronoun.
7. If the pronoun type is the same, a reference is recorded.
8. A match is made if the pattern follows a regular expression (RE) format.
9. If two noun phrases are adjacent, they are treated as appositives.
10. Remaining values are linked to the nearest preceding value.
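
The first three string-based rules in the list above could look roughly like the sketch below. It is illustrative only: the helper names and the tiny `STOPWORDS` set are hypothetical stand-ins (the real stopword list came from NLTK), and the abbreviation test assumes that an abbreviation spells the initials of the capitalized words in the full form.

```python
# Tiny illustrative stopword list; a stand-in for the NLTK stopwords.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "for"}


def content_words(phrase: str) -> list:
    """Lower-cased words of the phrase with stopwords removed."""
    return [w for w in phrase.lower().split() if w not in STOPWORDS]


def is_exact_match(a: str, b: str) -> bool:
    """Rule 1: identical text, ignoring case and surrounding whitespace."""
    return a.strip().lower() == b.strip().lower()


def is_abbreviation(short: str, long: str) -> bool:
    """Rule 2 (assumed form): `short` spells the initials of the
    capitalized words in `long`, e.g. IBM or I.B.M."""
    initials = "".join(w[0] for w in long.split() if w[0].isupper())
    letters = short.replace(".", "").upper()
    return len(letters) > 1 and letters == initials


def is_partial_match(a: str, b: str) -> bool:
    """Rule 3: the two phrases share at least one non-stopword."""
    return bool(set(content_words(a)) & set(content_words(b)))


print(is_exact_match("The Table", "the table"))
print(is_abbreviation("IBM", "International Business Machines"))
print(is_partial_match("the new president", "President of the company"))
```

Removing stopwords before the partial match keeps phrases such as "the table" and "the doughnut" from being linked only because they share "the".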
Fig. 2: Process for Coreference Resolution System
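
Some of the later rules in the process can be sketched in the same spirit. The name sets below are hypothetical stand-ins for the male/female name corpus, `DATE_RE` is an assumed pattern (the poster only says that a regular expression is used, and we assume here that it corresponds to the Date matching listed under team contributions), and the fallback simply links a mention to the one directly before it.

```python
import re
from typing import List, Optional

# Hypothetical stand-ins for the male/female name corpus.
MALE_NAMES = {"john", "david"}
FEMALE_NAMES = {"mary", "susan"}
MALE_PRONOUNS = {"he", "him", "his"}
FEMALE_PRONOUNS = {"she", "her", "hers"}

# Assumed date pattern: 11/05/2016 or November 5, 2016.
DATE_RE = re.compile(
    r"^(\d{1,2}/\d{1,2}/\d{2,4}"
    r"|(January|February|March|April|May|June|July|August"
    r"|September|October|November|December)\s+\d{1,2},\s+\d{4})$"
)


def gender_of(mention: str) -> Optional[str]:
    """Gender from the first word of the mention, or None if unknown."""
    words = mention.lower().split()
    if not words:
        return None
    first = words[0]
    if first in MALE_NAMES or first in MALE_PRONOUNS:
        return "male"
    if first in FEMALE_NAMES or first in FEMALE_PRONOUNS:
        return "female"
    return None


def gender_match(pronoun: str, candidate: str) -> bool:
    """True if both mentions have a known and equal gender."""
    g = gender_of(pronoun)
    return g is not None and g == gender_of(candidate)


def is_date(mention: str) -> bool:
    """True if the whole mention matches the assumed date pattern."""
    return DATE_RE.match(mention.strip()) is not None


def nearest_preceding(index: int, mentions: List[str]) -> Optional[int]:
    """Fallback: link the mention at `index` to the one right before it."""
    return index - 1 if 0 < index < len(mentions) else None


print(gender_match("she", "Mary Smith"))
print(is_date("November 5, 2016"))
print(nearest_preceding(2, ["Ford", "the company", "it"]))
```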
EXTERNAL RESOURCES
1. Thanks to our friends NLTK, WordNet, and lxml.
2. Stopwords for Head Noun/Substring match taken from www.nltk.org/book/ch02.html
3. Male/Female corpus used from www.cs.cmu.edu/Groups/AI/util/areas/nlp/corpora/names/0.html
4. Noun Phrase Coreference as Clustering, Cardie and Wagstaff, EMNLP 2000
5. A Multi-Pass Sieve for Coreference Resolution, Raghunathan et al., EMNLP 2010
6. Deterministic coreference resolution based on entity-centric, precision-ranked rules, Lee et al., Computational Linguistics 39(4), 2013
PERFORMANCE
Because of the generalized implementation strategy, our system performed in a balanced manner across the test sets.
Evaluation Results: TEST SET 2 = 60.46%, TEST SET 3 = 61.11%
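
The poster does not state how these percentages were computed. As an assumption, a score of this kind can be read as the share of mentions linked to the correct antecedent; the sketch below uses a hypothetical `accuracy` function and made-up toy data, not the actual test sets.

```python
from typing import Dict


def accuracy(predicted: Dict[str, str], gold: Dict[str, str]) -> float:
    """Percentage of gold mentions whose predicted antecedent is correct."""
    if not gold:
        return 0.0
    correct = sum(1 for mention, ante in gold.items()
                  if predicted.get(mention) == ante)
    return 100.0 * correct / len(gold)


# Made-up toy data: mention -> antecedent.
gold = {"it": "the doughnut", "he": "John", "the firm": "Ford"}
predicted = {"it": "the table", "he": "John", "the firm": "Ford"}

print(round(accuracy(predicted, gold), 2))  # 66.67
```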
Team Member Contribution:
Aishwarya Asesh - Abbreviations, Pronoun, Date, Appositives, Random Matching.
Avani Sharma - Exact, Head Noun, Gender, Semantic, Proper Name Matching.
Did you know?
The lack of a proper coreference resolution system is a hindrance to the development of super-intelligent robots. Super AI consistently fails at coreference resolution, such as understanding what "it" refers to in the sentence "He saw the doughnut on the table and ate it." This is the reason AI specialists are serious about Elon Musk's prediction of a potentially dangerous encounter with super-intelligent silicon.
Guess the It
Case I: This Sharp TV's picture quality is so bad, our old Sony TV is much better. It is also so expensive.
Case II: This Sharp TV's picture quality is so bad, our old Sony TV is much better. It is also more reliable.
In Case I, "It" refers to the Sharp TV; in Case II, "It" refers to the Sony TV.
Did you know?
The paper "Corpus-Based Identification of Non-Anaphoric Noun Phrases" by Bean, D. and our instructor Riloff, E. is one of the earliest papers to describe a modern approach to the problem of coreference resolution.
Did you know?
Apple's Siri, IBM's Watson, and Google Translate all have running projects for developing a flawless coreference resolution system. People are curious about the data-driven or hybrid approach used in IBM Watson, described as the most advanced coreference resolver. The development will be made public in the coming years.
