SlideShare a Scribd company logo
1 of 37
1
Entity Extraction and Typing by
Relational Graph Construction
and Propagation
JIAWEI HAN
COMPUTER SCIENCE
UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN
JULY 3, 2015
2
3
Recognizing Typed Entities
The best BBQ I’ve tasted in
Phoenix! I had the pulled pork
sandwich with coleslaw and
baked beans for lunch. ... The
owner is very nice. …
The best BBQ:Food I’ve tasted in
Phoenix:LOC ! I had the [pulled pork
sandwich]:Food with coleslaw:Food
and [baked beans]:Food for lunch. …
The owner:JOB_TITLE is very nice. …
FOOD
LOCATION
JOB_TITLE
EVENT
ORGANIZATION
…
Target Types
Identifying token span as entity mentions in documents and labeling their types
Enabling structured analysis of
unstructured text corpus
Plain text Text with typed entities
FOOD LOCATION EVENT
4
 Extracting and linking entities can be used in a variety of ways:
 serve as primitives for information extraction and knowledge base population
 assist question answering,…
 Traditional named entity recognition systems are designed for major types (e.g.,
PER, LOC, ORG) and general domains (e.g., news)
 Require additional steps to adapt to new domains/types
 Expensive human labor on annotation
 500 documents for entity extraction; 20,000 queries for entity linking
 Unsatisfying agreement due to various granularity levels and scopes of types
 Entities obtained by entity linking techniques have limited coverage and freshness
 >50% unlinkable entity mentions in Web corpus [Lin et al., EMNLP’12]
 >90% in our experiment corpora: tweets, Yelp reviews, …
Traditional NLP Approach: Data Annotation
5
 Typical Entity Extraction Features (Li et al., 2012)
 N-gram: Unigram, bigram and trigram token sequences in the
context window
 Part-of-Speech: POS tags of the context words
 Gazetteers: person names, organizations, countries and cities, titles,
idioms, etc.
 Word clusters: word clusters / embeddings
 Case and Shape: Capitalization and morphology analysis based
features
 Chunking: NP and VP Chunking tags
 Global feature: Sentence level and document level structure/position
features
Traditional NLP Approach: Feature Engineering
6
Traditional NLP Approach: Feature Engineering
Mention/Concept Attribute Description
Name Spelling match Exact string match, acronym match, alias match, string matching…
KB link mining Name pairs mined from KB text redirect and disambiguation pages
Name
Gazetteer
Organization and geo-political entity abbreviation gazetteers
Document
surface
Lexical Words in KB facts, KB text, mention name, mention text.
Tf.idf of words and ngrams
Position Mention name appears early in KB text
Genre Genre of the mention text (newswire, blog, …)
Local Context Lexical and part-of-speech tags of context words
Entity
Context
Type Mention concept type, subtype
Relation/Event Concepts co-occurred, attributes/relations/events with mention
Coreference Co-reference links between the source document and the KB text
Profiling Slot fills of the mention, concept attributes stored in KB infobox
Concept Ontology extracted from KB text
Topic Topics (identity and lexical similarity) for the mention text and KB text
KB Link Mining Attributes extracted from hyperlink graphs of the KB text
Popularity Web Top KB text ranked by search engine and its length
Frequency Frequency in KB texts
 Typical Entity Linking Features (Ji et al., 2011)
7
A New Data Mining Solution
 Acquire labels for a small amount of instances
 Construct a relational graph to connect labeled instances and unlabeled instances
 Construct edges based on coarse-grained data-driven statistics instead of fine-
grained linguistic similarity
 mention correlation
 text co-occurrence
 semantic relatedness based on knowledge graph embeddings
 social networks
 Label propagation across the graph
8
Case Study 1: Entity Extraction
 Goal: recognizing entity mentions of target types with minimal/no human
supervision and with no requirement that entities can be found in a KB.
 Two kinds of efforts towards this goal:
 Weak supervision: relies on manually selected seed entities in applying
pattern-based bootstrapping methods or label propagation methods to
identify more entities
 Both assume seeds are unambiguous and sufficiently frequent  requires
careful seed selection by human
 Distant supervision: leverages entity information in KBs to reduce human
supervision (cont.)
9
Typical Workflow of Distant Supervision
 Detect entity
mentions from text
 Map candidate
mentions to KB
entities of target types
 Use confidently
mapped {mention,
type} to infer types of
remaining candidate
mentions
10
Problem Definition
 Distantly-supervised entity recognition in a domain-specific corpus
 Given:
 a domain-specific corpus D
 a knowledge base (e.g., Freebase)
 a set of target types (T) from a KB
 Detect candidate entity mentions from corpus D
 Categorize each candidate mention by target types or Not-Of-Interest
(NOI), with distant supervision
11
Challenge I: Domain Restriction
 Most existing work assume entity mentions are already extracted by existing
entity detection tools, e.g., noun phrase chunkers
 Usually trained on general-domain corpora like news articles (clean,
grammatical)
 Make use of various linguistics features (e.g., semantic parsing structures)
 Do not work well on specific, dynamic or emerging domains (e.g., tweets,
Yelp reviews)
 E.g, “in-and-out” from Yelp review may not be properly detected
12
Challenge II: Name Ambiguity
 Multiple entities may share the same surface name
 Previous methods simply output a single type/type distribution for each
surface name, instead of an exact type for each entity mention
While Griffin is not the part of Washington’s plan on Sunday’s game, … Sport team
…has concern that Kabul is an ally of Washington. U.S. government
He has office in Washington, Boston and San Francisco U.S. capital city
Washington State or Washington
Sport
team
Govern
-ment
State
…
While Griffin is not the part of
Washington’s plan on Sunday’s
game, …
… news from Washington indicates
that the congress is going to…
It is one of the best state parks in
Washington.
13
Challenge III: Context Sparsity
 A variety of contextual clues are leveraged to find sources of shared semantics
across different entities
 Keywords, Wiki concepts, linguistic patterns, textual relations, …
 There are often many ways to describe even the same relation between two
entities
 Previous methods have difficulties in handling entity mention with sparse
(infrequent) context
ID Sentence Freq
1 The magnitude 9.0 quake caused widespread devastation in [Kesennuma city] 12
2 … tsunami that ravaged [northeastern Japan] last Friday 31
3 The resulting tsunami devastate [Japan]’s northeast 244
14
Our Solution
Domain-agnostic phrase mining algorithm: Extracts candidate entity mentions with
minimal linguistic assumption  address domain restriction
 E.g., part-of-speech (POS) tagging << semantic parsing
Do not simply merge entity mentions with identical surface names
 Model each mention based on its surface name and context, in a scalable way
 address name ambiguity
Mine relation phrase co-occurring with entity mentions; infer synonymous relation
phrases
 Helps form connecting bridges among entities that do not share identical context,
but share synonymous relation phrases  address context sparsity
15
A Relation Phrase-Based Entity Recognition Framework
 POS-constrained phrase segmentation for mining candidate entity mentions and
relation phrases, simultaneously
 Construct a heterogeneous graph to represent available information in a unified
form
Entity mentions are kept as
individual objects to be
disambiguated
Linked to entity surface names
& relation phrases
16
Graph-Based Semi-Supervised Learning Framework
 With the constructed graph, formulate a graph-based semi-supervised
learning of two tasks jointly:
Type propagation on heterogeneous graph
Multi-view relation phrase clustering
Propagate type information
among entities bridges via
synonymous relation phrases
Derived entity argument types serve
as good feature for clustering
relation phrases
Mutually enhancing each other; leads to quality
recognition of unlinkable entity mentions
17
Framework Overview
1. Perform phrase mining on a POS-tagged corpus to extract candidate entity
mentions and relation phrases
2. Construct a heterogeneous graph to encode our insights on modeling the type
for each entity mention
3. Collect seed entity mentions as labels by linking extracted mentions to the KB
4. Estimate type indicator for unlinkable candidate mentions with the proposed
type propagation integrated with relation phrase clustering on the constructed
graph
18
Candidate Generation
 An efficient phrase mining algorithm incorporating both corpus-level statistics and
syntactic constraints
 Global significance score: Filter low-quality candidates; generic POS tag
patterns: remove phrases with improper syntactic structure
 By extending TopMine, the algorithm partitions corpus into segments which
meet both significance threshold and POS patterns  candidate entity mentions
& relation phrases
Algorithm workflow:
1. Mine frequent contiguous patterns
2. Performs greedy-agglomerative merging while
enforcing our syntactic constraints
• Entity mention: consecutive nouns
• Relation phrases: shown in the table
3. Terminates when the next highest-score merging
does not meet a pre-defined significance threshold
Relation phrase: phrase that denotes a unary
or binary relation in a sentence
19
Candidate Generation
 Example output of candidate generation on NYT news articles
 Entity detection performance comparison with an NP chunker
Recall is most critical for this step, since later we cannot detect the misses (i.e., false negatives)
20
Construction of Heterogeneous Graphs
 With three types of objects extracted from corpus: candidate entity mentions,
entity surface names, and relation phrases
 We can construct a heterogeneous graph to enforce several hypotheses for
modeling type of each entity mention (introduced in the following slides)
Basic idea for constructing the graph:
the more two objects are likely to share
the same label, the larger the weight will
be associated with their connecting edge
Three types of links:
1. Mention-name link: (many-to-one) mappings
between entity mention and surface names
2. Name-relation phrase links: corpus-level co-
occurrence between surface names and relation
phrases
3. Mention correlation links: distributional
similarity between entity mentions
21
Entity Mention-Surface Name Subgraph
 Directly modeling type indicator of each entity mention in label propagation
 Intractable size of parameter space
 Both the entity name and the surrounding relation phrases provide strong cues on
the types of a candidate entity mention
 Model the type of each entity mention by (1) type indicator of its surface name; (2) the
type signatures of its surrounding relation phrases (more details in the following slides)
…has concerns whether Kabul is an ally of Washington
Washington
Gover-
nment
State
is an ally of
…has concerns whether Kabul is an ally of Washington: GOVERNMENT
M candidate mentions;
n surface names
Use a bi-adjacency matrix to
represent the mapping
22
Entity Name-Relation Phrase Subgraph
 Aggregated co-occurrences between entity surface names and relation phrases across
corpus  weight importance of different relation phrases for surface names
 use connected edges as bridges to propagate type information
 Left/right entity argument of relation phrase: for each mention, assign it as the left
(right, resp.) argument to the closest relation phrase on its right (left, resp.) in a
sentence
 Type signature of relation phrase: Two type indicators for its left and right arguments
l different relation phrases, mapping between
mentions and relation phrases:
Two bi-adjacency matrices for the subgraph
23
Mention Correlation Subgraph
 An entity mention may have ambiguous name and ambiguous relation phrases
 E.g., “White house” and “felt” in the first sentence of Figure
 Other co-occurring mentions may provide good hints to the type of an entity mention
 E.g., “birth certificate” and “rose garden” in the Figure
Construct KNN graph based on the feature vector f-surface
names of co-occurring entity mentions
 Propagate type
information between
candidate mentions of
each surface name, based
on following hypothesis:
24
Relation Phrase Clustering
 Observation: many relation phrases have very few occurrences in the corpus
 ~37% relation phrases have <3 unique entity surface names (in right or left arguments)
 Hard to model their type signature based on aggregated co-occurrences with
entity surface names (i.e., Hypothesis 1)
 Softly clustering synonymous relation phrases:
 the type signatures of frequent relation phrases can help infer the type
signatures of infrequent (sparse) ones that have similar cluster memberships
25
Relation Phrase Clustering
 Existing work on relation phrase clustering utilizes strings; context words; entity argument
to cluster synonymous relation phrases
 String similarity and distribution similarity may be insufficient to resolve two relation
phrases; type information is particular helpful in such case
 We propose to leverage type signature of relation phrase,and proposed a general relation
phrase clustering method to incorporate different features
 further integrated with the graph-based type propagation in a mutually enhancing
framework, based on following hypothesis
Type signatures
String features
Context features
26
Type Inference: A Joint Optimization Problem
Mention modeling &
mention correlation (Hypo 2)
Multi-view relation phrases clustering (Hypo 3 & 4)
Type propagation between
entity surface names
and relation phrases
(Hypo 1)
27
The ClusType Algorithm
 Can be efficiently solved by alternate
minimization based on block coordinate
descent algorithm
 Algorithm complexity is linear to #entity
mentions, #relation phrases, #cluster,
#clustering features and #target types
Update type indicators and type signatures
For each view, performs single-view NMF until converges
The ClusType algorithm:
Update consensus matrix and relative weights of different views
Until the objective converges
28
29
Experiment Setting
 Datasets: 2013 New York Times news (~110k docs) [event, PER, LOC, ORG]; Yelp
Reviews (~230k) [Food, Job, …]; 2011 Tweets (~300k) [event, product, PER, LOC, …]
 Seed mention sets: < 7% extracted mentions are mapped to Freebase entities
 Evaluation sets: manually annotate mentions of target types for subsets of the corpora
 Evaluation metrics: Follows named entity recognition evaluation (Precision, Recall, F1)
 Compared methods
 Pattern: Stanford pattern-based learning; SemTagger:bootstrapping method which
trains contextual classifier based on seed mentions; FIGER: distantly-supervised
sequence labeling method trained on Wiki corpus; NNPLB: label propagation using
ReVerb assertion and seed mention; APOLLO: mention-level label propagation using
Wiki concepts and KB entities;
 ClusType-NoWm: ignore mention correlation; ClusType-NoClus: conducts only type
propagation; ClusType-TwpStep: first performs hard clustering then type
propagation
30
Comparing ClusType with Other Methods and Its Variants
 vs. FIGER: effectiveness of our candidate generation and proposed
hypotheses on type propagation
 vs. NNPLB and APOLLO: ClusType not only utilizes semantic-rich
relation phrase as type cues, but only cluster synonymous relation
phrases to tackle context sparsity
 vs. variants: (i) model mention correlation for name disambiguation;
(ii) integrates clustering in a mutually enhancing way
46.08% and 48.94%
improvement in F1 score
compared to the best
baseline on the Tweet
and the Yelp datasets
31
Comparing on Different Entity Types
Obtains larger gain on organization and
person (more entities with ambiguous
surface names)
Modeling types on entity mention level
is critical for name disambiguation
Superior performance on product and food mainly
comes from the domain independence of our method
Both NNPLB and SemTagger require sophisticated linguistic
feature generation which is hard to adapt to new types
32
Comparing on Trained NER System
 Compare with Stanford NER, which is trained on general-domain corpora including
ACE corpus and MUC corpus, on three types: PER, LOC, ORG
 ClusType and its variants outperform Stanford NER on both dynamic corpus (NYT)
and domain-specific corpus (Yelp)
 ClusType has lower precision but higher Recall and F1 score on Tweet  Superior
recall of ClusType mainly come from domain-independent candidate generation
33
Example Output and Relation Phrase Clusters
 Extracts more mentions and predicts types with
higher accuracy
 Not only synonymous relation
phrases, but also both sparse
and frequent relation phrase
can be clustered together
  boosts sparse relation
phrases with type information
of frequent relation phrases
34
Testing on Context Sparsity and Surface Name popularity
 Context sparsity:
 Group A: frequent relation phrases
 Group B: sparse relation phrases
 ClusType obtains superior performance over its variants on Group B
  clustering relation phrase is critical for sparse relation phrases
 Surface name popularity:
 Group A: high frequency
surface name
 Group B: infrequent surface
name
 ClusType outperforms its
variants on Group B
  Handles well mentions
with insufficient corpus
statistics
35
Conclusions and Future Work
 Study distantly-supervised entity recognition for domain-specific corpora and
propose a novel relation phrase-based framework
 A data-driven, domain-agnostic phrase mining algorithm for candidate entity
mentions and relation phrase generation
 Integrate relation phrase clustering with type propagation on heterogeneous
graphs, and solve it by a joint optimization problem.
Ongoing:
 Extend to role discovery for scientific concepts  paper profiling (research/demo)
 Study of relation phrase clustering, such as
 joint entity/relation clustering
 synonymous relation phrase canonicalization
 Study of joint entity and relation phrase extraction with phrase mining
36
37
References
 X. Ren, A. El-Kishky, C. Wang, F. Tao, C. R. Voss, H. Ji and J. Han. ClusType: Effective Entity Recognition
and Typing by Relation Phrase-Based Clustering. KDD’15.
 H. Huang, Y. Cao, X. Huang, H. Ji and C. Lin. Collective Tweet Wikification based on Semi-supervised
Graph Regularization. ACL’14
 T. Lin, O. Etzioni, et al. No noun phrase left behind: detecting and typing unlinkable entities. EMNLP’12
 N. Nakashole, T. Tylenda, and G. Weikum. Fine-grained semantic typing of emerging entities. ACL’13
 R. Huang and E. Riloff. Inducing domain-specific semantic class taggers from (almost) nothing. ACL’10
 X. Ling and D. S. Weld. Fine-grained entity recognition. AAAI’12
 W. Shen, J. Wang, P. Luo, and M. Wang. A graph-based approach for ontology population with named
entities. CIKM’12
 S. Gupta and C. D. Manning. Improved pattern learning for bootstrapped entity extraction. CONLL’14
 P. P. Talukdar and F. Pereira. Experiments in graph-based semi-supervised learning methods for class-
instance acquisition. ACL’10
 Z. Kozareva and E. Hovy. Not all seeds are equal: Measuring the quality of text mining seeds. NAACL’10
 L. Gal´arraga, G. Heitz, K. Murphy, and F. M. Suchanek. Canonicalizing open knowledge bases. CIKM’14

More Related Content

What's hot

Lecture 9 - Machine Learning and Support Vector Machines (SVM)
Lecture 9 - Machine Learning and Support Vector Machines (SVM)Lecture 9 - Machine Learning and Support Vector Machines (SVM)
Lecture 9 - Machine Learning and Support Vector Machines (SVM)Sean Golliher
 
Gleaning Types for Literals in RDF with Application to Entity Summarization
Gleaning Types for Literals in RDF with Application to Entity SummarizationGleaning Types for Literals in RDF with Application to Entity Summarization
Gleaning Types for Literals in RDF with Application to Entity SummarizationKalpa Gunaratna
 
Named entity recognition (ner) with nltk
Named entity recognition (ner) with nltkNamed entity recognition (ner) with nltk
Named entity recognition (ner) with nltkJanu Jahnavi
 
Text data mining1
Text data mining1Text data mining1
Text data mining1KU Leuven
 
Phrase Based Indexing
Phrase Based IndexingPhrase Based Indexing
Phrase Based Indexingbalaabirami
 
Phd_cristian_lai presentation
Phd_cristian_lai presentationPhd_cristian_lai presentation
Phd_cristian_lai presentationCristian Lai
 
Sherlock a deep learning approach to semantic data type dete
Sherlock a deep learning approach to semantic data type deteSherlock a deep learning approach to semantic data type dete
Sherlock a deep learning approach to semantic data type detemayank272369
 
IE: Named Entity Recognition (NER)
IE: Named Entity Recognition (NER)IE: Named Entity Recognition (NER)
IE: Named Entity Recognition (NER)Marina Santini
 
50 Ways to Makes Sense Of Natural Language
50 Ways to Makes Sense Of Natural Language50 Ways to Makes Sense Of Natural Language
50 Ways to Makes Sense Of Natural LanguageFlumes
 
The impact of domain-specific stop-word lists on ecommerce website search per...
The impact of domain-specific stop-word lists on ecommerce website search per...The impact of domain-specific stop-word lists on ecommerce website search per...
The impact of domain-specific stop-word lists on ecommerce website search per...greatsalvation813
 
In-text Citations - APA 6th ed
In-text Citations - APA 6th edIn-text Citations - APA 6th ed
In-text Citations - APA 6th edJanice Orcutt
 
#3 html in the real world
#3 html in the real world#3 html in the real world
#3 html in the real worldJean Michel
 
Detecting Ontological Conflicts in Protocols between Semantic Web Services
Detecting Ontological Conflicts in Protocols between Semantic Web ServicesDetecting Ontological Conflicts in Protocols between Semantic Web Services
Detecting Ontological Conflicts in Protocols between Semantic Web Servicesdannyijwest
 
Topic models, vector semantics and applications
Topic models, vector semantics and applicationsTopic models, vector semantics and applications
Topic models, vector semantics and applicationsVasileios Lampos
 
Deploying Semantic Technologies for Digital Publishing: A Case Study from Log...
Deploying Semantic Technologies for Digital Publishing: A Case Study from Log...Deploying Semantic Technologies for Digital Publishing: A Case Study from Log...
Deploying Semantic Technologies for Digital Publishing: A Case Study from Log...sboisen
 

What's hot (19)

Lecture 9 - Machine Learning and Support Vector Machines (SVM)
Lecture 9 - Machine Learning and Support Vector Machines (SVM)Lecture 9 - Machine Learning and Support Vector Machines (SVM)
Lecture 9 - Machine Learning and Support Vector Machines (SVM)
 
Gleaning Types for Literals in RDF with Application to Entity Summarization
Gleaning Types for Literals in RDF with Application to Entity SummarizationGleaning Types for Literals in RDF with Application to Entity Summarization
Gleaning Types for Literals in RDF with Application to Entity Summarization
 
Week12
Week12Week12
Week12
 
Named entity recognition (ner) with nltk
Named entity recognition (ner) with nltkNamed entity recognition (ner) with nltk
Named entity recognition (ner) with nltk
 
NLP todo
NLP todoNLP todo
NLP todo
 
Text data mining1
Text data mining1Text data mining1
Text data mining1
 
Phrase Based Indexing
Phrase Based IndexingPhrase Based Indexing
Phrase Based Indexing
 
Phd_cristian_lai presentation
Phd_cristian_lai presentationPhd_cristian_lai presentation
Phd_cristian_lai presentation
 
Sherlock a deep learning approach to semantic data type dete
Sherlock a deep learning approach to semantic data type deteSherlock a deep learning approach to semantic data type dete
Sherlock a deep learning approach to semantic data type dete
 
IE: Named Entity Recognition (NER)
IE: Named Entity Recognition (NER)IE: Named Entity Recognition (NER)
IE: Named Entity Recognition (NER)
 
50 Ways to Makes Sense Of Natural Language
50 Ways to Makes Sense Of Natural Language50 Ways to Makes Sense Of Natural Language
50 Ways to Makes Sense Of Natural Language
 
The impact of domain-specific stop-word lists on ecommerce website search per...
The impact of domain-specific stop-word lists on ecommerce website search per...The impact of domain-specific stop-word lists on ecommerce website search per...
The impact of domain-specific stop-word lists on ecommerce website search per...
 
In-text Citations - APA 6th ed
In-text Citations - APA 6th edIn-text Citations - APA 6th ed
In-text Citations - APA 6th ed
 
Information Extraction
Information ExtractionInformation Extraction
Information Extraction
 
#3 html in the real world
#3 html in the real world#3 html in the real world
#3 html in the real world
 
Tutorial 1-Ontologies
Tutorial 1-OntologiesTutorial 1-Ontologies
Tutorial 1-Ontologies
 
Detecting Ontological Conflicts in Protocols between Semantic Web Services
Detecting Ontological Conflicts in Protocols between Semantic Web ServicesDetecting Ontological Conflicts in Protocols between Semantic Web Services
Detecting Ontological Conflicts in Protocols between Semantic Web Services
 
Topic models, vector semantics and applications
Topic models, vector semantics and applicationsTopic models, vector semantics and applications
Topic models, vector semantics and applications
 
Deploying Semantic Technologies for Digital Publishing: A Case Study from Log...
Deploying Semantic Technologies for Digital Publishing: A Case Study from Log...Deploying Semantic Technologies for Digital Publishing: A Case Study from Log...
Deploying Semantic Technologies for Digital Publishing: A Case Study from Log...
 

Similar to Entity Extraction and Typing through Relational Graph Construction

Question Answering with Lydia
Question Answering with LydiaQuestion Answering with Lydia
Question Answering with LydiaJae Hong Kil
 
Eprints Special Session - DC-2006, Mexico
Eprints Special Session - DC-2006, MexicoEprints Special Session - DC-2006, Mexico
Eprints Special Session - DC-2006, MexicoEduserv Foundation
 
The Triplex Approach for Recognizing Semantic Relations from Noun Phrases, Ap...
The Triplex Approach for Recognizing Semantic Relations from Noun Phrases, Ap...The Triplex Approach for Recognizing Semantic Relations from Noun Phrases, Ap...
The Triplex Approach for Recognizing Semantic Relations from Noun Phrases, Ap...Iman Mirrezaei
 
Mining Opinion Features in Customer Reviews
Mining Opinion Features in Customer ReviewsMining Opinion Features in Customer Reviews
Mining Opinion Features in Customer ReviewsIJCERT JOURNAL
 
Entity linking with a knowledge base issues techniques and solutions
Entity linking with a knowledge base issues techniques and solutionsEntity linking with a knowledge base issues techniques and solutions
Entity linking with a knowledge base issues techniques and solutionsCloudTechnologies
 
Context-Enhanced Adaptive Entity Linking
Context-Enhanced Adaptive Entity LinkingContext-Enhanced Adaptive Entity Linking
Context-Enhanced Adaptive Entity LinkingGiuseppe Rizzo
 
Faceted Navigation of User-Generated Metadata (Calit2 Rescue Seminar Series 2...
Faceted Navigation of User-Generated Metadata (Calit2 Rescue Seminar Series 2...Faceted Navigation of User-Generated Metadata (Calit2 Rescue Seminar Series 2...
Faceted Navigation of User-Generated Metadata (Calit2 Rescue Seminar Series 2...Bradley Allen
 
Entity linking with a knowledge base issues,
Entity linking with a knowledge base issues,Entity linking with a knowledge base issues,
Entity linking with a knowledge base issues,Nexgen Technology
 
Tweet Segmentation and Its Application to Named Entity Recognition
Tweet Segmentation and Its Application to Named Entity RecognitionTweet Segmentation and Its Application to Named Entity Recognition
Tweet Segmentation and Its Application to Named Entity Recognition1crore projects
 
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...Leon Derczynski
 
The Eprints Application Profile: a FRBR approach to modelling repository meta...
The Eprints Application Profile: a FRBR approach to modelling repository meta...The Eprints Application Profile: a FRBR approach to modelling repository meta...
The Eprints Application Profile: a FRBR approach to modelling repository meta...Julie Allinson
 
Semantic Search Component
Semantic Search ComponentSemantic Search Component
Semantic Search ComponentMario Flecha
 
Automated Identification of Framing by Word Choice and Labeling to Reveal Med...
Automated Identification of Framing by Word Choice and Labeling to Reveal Med...Automated Identification of Framing by Word Choice and Labeling to Reveal Med...
Automated Identification of Framing by Word Choice and Labeling to Reveal Med...Anastasia Zhukova
 
Repositories thru the looking glass
Repositories thru the looking glassRepositories thru the looking glass
Repositories thru the looking glassEduserv Foundation
 
Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...
Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...
Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...CITE
 
Ontology Search: An Empirical Evaluation
Ontology Search: An Empirical EvaluationOntology Search: An Empirical Evaluation
Ontology Search: An Empirical EvaluationArmin Haller
 
Day2-Slides.ppt pppppppppppppppppppppppppp
Day2-Slides.ppt ppppppppppppppppppppppppppDay2-Slides.ppt pppppppppppppppppppppppppp
Day2-Slides.ppt ppppppppppppppppppppppppppratnapatil14
 
Combining Similarities and Regression for Entity Linking.
Combining Similarities and Regression for Entity Linking.Combining Similarities and Regression for Entity Linking.
Combining Similarities and Regression for Entity Linking.César de Pablo
 

Similar to Entity Extraction and Typing through Relational Graph Construction (20)

Knowledge acquisition using automated techniques
Knowledge acquisition using automated techniquesKnowledge acquisition using automated techniques
Knowledge acquisition using automated techniques
 
Question Answering with Lydia
Question Answering with LydiaQuestion Answering with Lydia
Question Answering with Lydia
 
Eprints Special Session - DC-2006, Mexico
Eprints Special Session - DC-2006, MexicoEprints Special Session - DC-2006, Mexico
Eprints Special Session - DC-2006, Mexico
 
The Triplex Approach for Recognizing Semantic Relations from Noun Phrases, Ap...
The Triplex Approach for Recognizing Semantic Relations from Noun Phrases, Ap...The Triplex Approach for Recognizing Semantic Relations from Noun Phrases, Ap...
The Triplex Approach for Recognizing Semantic Relations from Noun Phrases, Ap...
 
Mining Opinion Features in Customer Reviews
Mining Opinion Features in Customer ReviewsMining Opinion Features in Customer Reviews
Mining Opinion Features in Customer Reviews
 
Entity linking with a knowledge base issues techniques and solutions
Entity linking with a knowledge base issues techniques and solutionsEntity linking with a knowledge base issues techniques and solutions
Entity linking with a knowledge base issues techniques and solutions
 
Context-Enhanced Adaptive Entity Linking
Context-Enhanced Adaptive Entity LinkingContext-Enhanced Adaptive Entity Linking
Context-Enhanced Adaptive Entity Linking
 
Faceted Navigation of User-Generated Metadata (Calit2 Rescue Seminar Series 2...
Faceted Navigation of User-Generated Metadata (Calit2 Rescue Seminar Series 2...Faceted Navigation of User-Generated Metadata (Calit2 Rescue Seminar Series 2...
Faceted Navigation of User-Generated Metadata (Calit2 Rescue Seminar Series 2...
 
Entity linking with a knowledge base issues,
Entity linking with a knowledge base issues,Entity linking with a knowledge base issues,
Entity linking with a knowledge base issues,
 
Tweet Segmentation and Its Application to Named Entity Recognition
Tweet Segmentation and Its Application to Named Entity RecognitionTweet Segmentation and Its Application to Named Entity Recognition
Tweet Segmentation and Its Application to Named Entity Recognition
 
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
 
The Eprints Application Profile: a FRBR approach to modelling repository meta...
The Eprints Application Profile: a FRBR approach to modelling repository meta...The Eprints Application Profile: a FRBR approach to modelling repository meta...
The Eprints Application Profile: a FRBR approach to modelling repository meta...
 
Semantic Search Component
Semantic Search ComponentSemantic Search Component
Semantic Search Component
 
Automated Identification of Framing by Word Choice and Labeling to Reveal Med...
Automated Identification of Framing by Word Choice and Labeling to Reveal Med...Automated Identification of Framing by Word Choice and Labeling to Reveal Med...
Automated Identification of Framing by Word Choice and Labeling to Reveal Med...
 
Repositories thru the looking glass
Repositories thru the looking glassRepositories thru the looking glass
Repositories thru the looking glass
 
Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...
Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...
Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...
 
Ontology Search: An Empirical Evaluation
Ontology Search: An Empirical EvaluationOntology Search: An Empirical Evaluation
Ontology Search: An Empirical Evaluation
 
Day2-Slides.ppt pppppppppppppppppppppppppp
Day2-Slides.ppt ppppppppppppppppppppppppppDay2-Slides.ppt pppppppppppppppppppppppppp
Day2-Slides.ppt pppppppppppppppppppppppppp
 
Combining Similarities and Regression for Entity Linking.
Combining Similarities and Regression for Entity Linking.Combining Similarities and Regression for Entity Linking.
Combining Similarities and Regression for Entity Linking.
 
ppt
pptppt
ppt
 

More from jins0618

Machine Status Prediction for Dynamic and Heterogenous Cloud Environment
Machine Status Prediction for Dynamic and Heterogenous Cloud EnvironmentMachine Status Prediction for Dynamic and Heterogenous Cloud Environment
Machine Status Prediction for Dynamic and Heterogenous Cloud Environmentjins0618
 
Latent Interest and Topic Mining on User-item Bipartite Networks
Latent Interest and Topic Mining on User-item Bipartite NetworksLatent Interest and Topic Mining on User-item Bipartite Networks
Latent Interest and Topic Mining on User-item Bipartite Networksjins0618
 
Web Service QoS Prediction Approach in Mobile Internet Environments
Web Service QoS Prediction Approach in Mobile Internet EnvironmentsWeb Service QoS Prediction Approach in Mobile Internet Environments
Web Service QoS Prediction Approach in Mobile Internet Environmentsjins0618
 
吕潇 星环科技大数据技术探索与应用实践
吕潇 星环科技大数据技术探索与应用实践吕潇 星环科技大数据技术探索与应用实践
吕潇 星环科技大数据技术探索与应用实践jins0618
 
李战怀 大数据环境下数据存储与管理的研究
李战怀 大数据环境下数据存储与管理的研究李战怀 大数据环境下数据存储与管理的研究
李战怀 大数据环境下数据存储与管理的研究jins0618
 
2015 07-tuto0-courseoutline
2015 07-tuto0-courseoutline2015 07-tuto0-courseoutline
2015 07-tuto0-courseoutlinejins0618
 
Christian jensen advanced routing in spatial networks using big data
Christian jensen advanced routing in spatial networks using big dataChristian jensen advanced routing in spatial networks using big data
Christian jensen advanced routing in spatial networks using big datajins0618
 
Jeffrey xu yu large graph processing
Jeffrey xu yu large graph processingJeffrey xu yu large graph processing
Jeffrey xu yu large graph processingjins0618
 
Calton pu experimental methods on performance in cloud and accuracy in big da...
Calton pu experimental methods on performance in cloud and accuracy in big da...Calton pu experimental methods on performance in cloud and accuracy in big da...
Calton pu experimental methods on performance in cloud and accuracy in big da...jins0618
 
Ling liu part 02:big graph processing
Ling liu part 02:big graph processingLing liu part 02:big graph processing
Ling liu part 02:big graph processingjins0618
 
Ling liu part 01:big graph processing
Ling liu part 01:big graph processingLing liu part 01:big graph processing
Ling liu part 01:big graph processingjins0618
 
Wang ke mining revenue-maximizing bundling configuration
Wang ke mining revenue-maximizing bundling configurationWang ke mining revenue-maximizing bundling configuration
Wang ke mining revenue-maximizing bundling configurationjins0618
 
Wang ke classification by cut clearance under threshold
Wang ke classification by cut clearance under thresholdWang ke classification by cut clearance under threshold
Wang ke classification by cut clearance under thresholdjins0618
 
2015 07-tuto1-phrase mining
2015 07-tuto1-phrase mining2015 07-tuto1-phrase mining
2015 07-tuto1-phrase miningjins0618
 
2015 07-tuto3-mining hin
2015 07-tuto3-mining hin2015 07-tuto3-mining hin
2015 07-tuto3-mining hinjins0618
 
2015 07-tuto0-courseoutline
2015 07-tuto0-courseoutline2015 07-tuto0-courseoutline
2015 07-tuto0-courseoutlinejins0618
 
Weiyi meng web data truthfulness analysis
Weiyi meng web data truthfulness analysisWeiyi meng web data truthfulness analysis
Weiyi meng web data truthfulness analysisjins0618
 
Ke yi small summaries for big data
Ke yi small summaries for big dataKe yi small summaries for big data
Ke yi small summaries for big datajins0618
 
Gao cong geospatial social media data management and context-aware recommenda...
Gao cong geospatial social media data management and context-aware recommenda...Gao cong geospatial social media data management and context-aware recommenda...
Gao cong geospatial social media data management and context-aware recommenda...jins0618
 
Chengqi zhang graph processing and mining in the era of big data
Chengqi zhang graph processing and mining in the era of big dataChengqi zhang graph processing and mining in the era of big data
Chengqi zhang graph processing and mining in the era of big datajins0618
 

More from jins0618 (20)

Machine Status Prediction for Dynamic and Heterogenous Cloud Environment
Machine Status Prediction for Dynamic and Heterogenous Cloud EnvironmentMachine Status Prediction for Dynamic and Heterogenous Cloud Environment
Machine Status Prediction for Dynamic and Heterogenous Cloud Environment
 
Latent Interest and Topic Mining on User-item Bipartite Networks
Latent Interest and Topic Mining on User-item Bipartite NetworksLatent Interest and Topic Mining on User-item Bipartite Networks
Latent Interest and Topic Mining on User-item Bipartite Networks
 
Web Service QoS Prediction Approach in Mobile Internet Environments
Web Service QoS Prediction Approach in Mobile Internet EnvironmentsWeb Service QoS Prediction Approach in Mobile Internet Environments
Web Service QoS Prediction Approach in Mobile Internet Environments
 
吕潇 星环科技大数据技术探索与应用实践
吕潇 星环科技大数据技术探索与应用实践吕潇 星环科技大数据技术探索与应用实践
吕潇 星环科技大数据技术探索与应用实践
 
李战怀 大数据环境下数据存储与管理的研究
李战怀 大数据环境下数据存储与管理的研究李战怀 大数据环境下数据存储与管理的研究
李战怀 大数据环境下数据存储与管理的研究
 
2015 07-tuto0-courseoutline
2015 07-tuto0-courseoutline2015 07-tuto0-courseoutline
2015 07-tuto0-courseoutline
 
Christian jensen advanced routing in spatial networks using big data
Christian jensen advanced routing in spatial networks using big dataChristian jensen advanced routing in spatial networks using big data
Christian jensen advanced routing in spatial networks using big data
 
Jeffrey xu yu large graph processing
Jeffrey xu yu large graph processingJeffrey xu yu large graph processing
Jeffrey xu yu large graph processing
 
Calton pu experimental methods on performance in cloud and accuracy in big da...
Calton pu experimental methods on performance in cloud and accuracy in big da...Calton pu experimental methods on performance in cloud and accuracy in big da...
Calton pu experimental methods on performance in cloud and accuracy in big da...
 
Ling liu part 02:big graph processing
Ling liu part 02:big graph processingLing liu part 02:big graph processing
Ling liu part 02:big graph processing
 
Ling liu part 01:big graph processing
Ling liu part 01:big graph processingLing liu part 01:big graph processing
Ling liu part 01:big graph processing
 
Wang ke mining revenue-maximizing bundling configuration
Wang ke mining revenue-maximizing bundling configurationWang ke mining revenue-maximizing bundling configuration
Wang ke mining revenue-maximizing bundling configuration
 
Wang ke classification by cut clearance under threshold
Wang ke classification by cut clearance under thresholdWang ke classification by cut clearance under threshold
Wang ke classification by cut clearance under threshold
 
2015 07-tuto1-phrase mining
2015 07-tuto1-phrase mining2015 07-tuto1-phrase mining
2015 07-tuto1-phrase mining
 
2015 07-tuto3-mining hin
2015 07-tuto3-mining hin2015 07-tuto3-mining hin
2015 07-tuto3-mining hin
 
2015 07-tuto0-courseoutline
2015 07-tuto0-courseoutline2015 07-tuto0-courseoutline
2015 07-tuto0-courseoutline
 
Weiyi meng web data truthfulness analysis
Weiyi meng web data truthfulness analysisWeiyi meng web data truthfulness analysis
Weiyi meng web data truthfulness analysis
 
Ke yi small summaries for big data
Ke yi small summaries for big dataKe yi small summaries for big data
Ke yi small summaries for big data
 
Gao cong geospatial social media data management and context-aware recommenda...
Gao cong geospatial social media data management and context-aware recommenda...Gao cong geospatial social media data management and context-aware recommenda...
Gao cong geospatial social media data management and context-aware recommenda...
 
Chengqi zhang graph processing and mining in the era of big data
Chengqi zhang graph processing and mining in the era of big dataChengqi zhang graph processing and mining in the era of big data
Chengqi zhang graph processing and mining in the era of big data
 

Recently uploaded

Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Boston Institute of Analytics
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024Timothy Spann
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...ssuserf63bd7
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 

Recently uploaded (20)

Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 

Entity Extraction and Typing through Relational Graph Construction

  • 1. 1
  • 2. Entity Extraction and Typing by Relational Graph Construction and Propagation JIAWEI HAN COMPUTER SCIENCE UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN JULY 3, 2015 2
  • 3. 3 Recognizing Typed Entities The best BBQ I’ve tasted in Phoenix! I had the pulled pork sandwich with coleslaw and baked beans for lunch. ... The owner is very nice. … The best BBQ:Food I’ve tasted in Phoenix:LOC ! I had the [pulled pork sandwich]:Food with coleslaw:Food and [baked beans]:Food for lunch. … The owner:JOB_TITLE is very nice. … FOOD LOCATION JOB_TITLE EVENT ORGANIZATION … Target Types Identifying token span as entity mentions in documents and labeling their types Enabling structured analysis of unstructured text corpus Plain text Text with typed entities FOOD LOCATION EVENT
  • 4. 4  Extracting and linking entities can be used in a variety of ways:  serve as primitives for information extraction and knowledge base population  assist question answering,…  Traditional named entity recognition systems are designed for major types (e.g., PER, LOC, ORG) and general domains (e.g., news)  Require additional steps to adapt to new domains/types  Expensive human labor on annotation  500 documents for entity extraction; 20,000 queries for entity linking  Unsatisfying agreement due to various granularity levels and scopes of types  Entities obtained by entity linking techniques have limited coverage and freshness  >50% unlinkable entity mentions in Web corpus [Lin et al., EMNLP’12]  >90% in our experiment corpora: tweets, Yelp reviews, … Traditional NLP Approach: Data Annotation
  • 5. 5  Typical Entity Extraction Features (Li et al., 2012)  N-gram: Unigram, bigram and trigram token sequences in the context window  Part-of-Speech: POS tags of the context words  Gazetteers: person names, organizations, countries and cities, titles, idioms, etc.  Word clusters: word clusters / embeddings  Case and Shape: Capitalization and morphology analysis based features  Chunking: NP and VP Chunking tags  Global feature: Sentence level and document level structure/position features Traditional NLP Approach: Feature Engineering
  • 6. 6 Traditional NLP Approach: Feature Engineering Mention/Concept Attribute Description Name Spelling match Exact string match, acronym match, alias match, string matching… KB link mining Name pairs mined from KB text redirect and disambiguation pages Name Gazetteer Organization and geo-political entity abbreviation gazetteers Document surface Lexical Words in KB facts, KB text, mention name, mention text. Tf.idf of words and ngrams Position Mention name appears early in KB text Genre Genre of the mention text (newswire, blog, …) Local Context Lexical and part-of-speech tags of context words Entity Context Type Mention concept type, subtype Relation/Event Concepts co-occurred, attributes/relations/events with mention Coreference Co-reference links between the source document and the KB text Profiling Slot fills of the mention, concept attributes stored in KB infobox Concept Ontology extracted from KB text Topic Topics (identity and lexical similarity) for the mention text and KB text KB Link Mining Attributes extracted from hyperlink graphs of the KB text Popularity Web Top KB text ranked by search engine and its length Frequency Frequency in KB texts  Typical Entity Linking Features (Ji et al., 2011)
  • 7. 7 A New Data Mining Solution  Acquire labels for a small amount of instances  Construct a relational graph to connect labeled instances and unlabeled instances  Construct edges based on coarse-grained data-driven statistics instead of fine- grained linguistic similarity  mention correlation  text co-occurrence  semantic relatedness based on knowledge graph embeddings  social networks  Label propagation across the graph
  • 8. 8 Case Study 1: Entity Extraction  Goal: recognizing entity mentions of target types with minimal/no human supervision and with no requirement that entities can be found in a KB.  Two kinds of efforts towards this goal:  Weak supervision: relies on manually selected seed entities in applying pattern-based bootstrapping methods or label propagation methods to identify more entities  Both assume seeds are unambiguous and sufficiently frequent  requires careful seed selection by human  Distant supervision: leverages entity information in KBs to reduce human supervision (cont.)
  • 9. 9 Typical Workflow of Distant Supervision  Detect entity mentions from text  Map candidate mentions to KB entities of target types  Use confidently mapped {mention, type} to infer types of remaining candidate mentions
  • 10. 10 Problem Definition  Distantly-supervised entity recognition in a domain-specific corpus  Given:  a domain-specific corpus D  a knowledge base (e.g., Freebase)  a set of target types (T) from a KB  Detect candidate entity mentions from corpus D  Categorize each candidate mention by target types or Not-Of-Interest (NOI), with distant supervision
  • 11. 11 Challenge I: Domain Restriction  Most existing work assume entity mentions are already extracted by existing entity detection tools, e.g., noun phrase chunkers  Usually trained on general-domain corpora like news articles (clean, grammatical)  Make use of various linguistics features (e.g., semantic parsing structures)  Do not work well on specific, dynamic or emerging domains (e.g., tweets, Yelp reviews)  E.g, “in-and-out” from Yelp review may not be properly detected
  • 12. 12 Challenge II: Name Ambiguity  Multiple entities may share the same surface name  Previous methods simply output a single type/type distribution for each surface name, instead of an exact type for each entity mention While Griffin is not the part of Washington’s plan on Sunday’s game, … Sport team …has concern that Kabul is an ally of Washington. U.S. government He has office in Washington, Boston and San Francisco U.S. capital city Washington State or Washington Sport team Govern -ment State … While Griffin is not the part of Washington’s plan on Sunday’s game, … … news from Washington indicates that the congress is going to… It is one of the best state parks in Washington.
  • 13. 13 Challenge III: Context Sparsity  A variety of contextual clues are leveraged to find sources of shared semantics across different entities  Keywords, Wiki concepts, linguistic patterns, textual relations, …  There are often many ways to describe even the same relation between two entities  Previous methods have difficulties in handling entity mention with sparse (infrequent) context ID Sentence Freq 1 The magnitude 9.0 quake caused widespread devastation in [Kesennuma city] 12 2 … tsunami that ravaged [northeastern Japan] last Friday 31 3 The resulting tsunami devastate [Japan]’s northeast 244
  • 14. 14 Our Solution Domain-agnostic phrase mining algorithm: Extracts candidate entity mentions with minimal linguistic assumption  address domain restriction  E.g., part-of-speech (POS) tagging << semantic parsing Do not simply merge entity mentions with identical surface names  Model each mention based on its surface name and context, in a scalable way  address name ambiguity Mine relation phrase co-occurring with entity mentions; infer synonymous relation phrases  Helps form connecting bridges among entities that do not share identical context, but share synonymous relation phrases  address context sparsity
  • 15. 15 A Relation Phrase-Based Entity Recognition Framework  POS-constrained phrase segmentation for mining candidate entity mentions and relation phrases, simultaneously  Construct a heterogeneous graph to represent available information in a unified form Entity mentions are kept as individual objects to be disambiguated Linked to entity surface names & relation phrases
  • 16. 16 Graph-Based Semi-Supervised Learning Framework  With the constructed graph, formulate a graph-based semi-supervised learning of two tasks jointly: Type propagation on heterogeneous graph Multi-view relation phrase clustering Propagate type information among entities bridges via synonymous relation phrases Derived entity argument types serve as good feature for clustering relation phrases Mutually enhancing each other; leads to quality recognition of unlinkable entity mentions
  • 17. 17 Framework Overview 1. Perform phrase mining on a POS-tagged corpus to extract candidate entity mentions and relation phrases 2. Construct a heterogeneous graph to encode our insights on modeling the type for each entity mention 3. Collect seed entity mentions as labels by linking extracted mentions to the KB 4. Estimate type indicator for unlinkable candidate mentions with the proposed type propagation integrated with relation phrase clustering on the constructed graph
  • 18. 18 Candidate Generation  An efficient phrase mining algorithm incorporating both corpus-level statistics and syntactic constraints  Global significance score: Filter low-quality candidates; generic POS tag patterns: remove phrases with improper syntactic structure  By extending TopMine, the algorithm partitions corpus into segments which meet both significance threshold and POS patterns  candidate entity mentions & relation phrases Algorithm workflow: 1. Mine frequent contiguous patterns 2. Performs greedy-agglomerative merging while enforcing our syntactic constraints • Entity mention: consecutive nouns • Relation phrases: shown in the table 3. Terminates when the next highest-score merging does not meet a pre-defined significance threshold Relation phrase: phrase that denotes a unary or binary relation in a sentence
  • 19. 19 Candidate Generation  Example output of candidate generation on NYT news articles  Entity detection performance comparison with an NP chunker Recall is most critical for this step, since later we cannot detect the misses (i.e., false negatives)
  • 20. 20 Construction of Heterogeneous Graphs  With three types of objects extracted from corpus: candidate entity mentions, entity surface names, and relation phrases  We can construct a heterogeneous graph to enforce several hypotheses for modeling type of each entity mention (introduced in the following slides) Basic idea for constructing the graph: the more two objects are likely to share the same label, the larger the weight will be associated with their connecting edge Three types of links: 1. Mention-name link: (many-to-one) mappings between entity mention and surface names 2. Name-relation phrase links: corpus-level co- occurrence between surface names and relation phrases 3. Mention correlation links: distributional similarity between entity mentions
  • 21. 21 Entity Mention-Surface Name Subgraph  Directly modeling type indicator of each entity mention in label propagation  Intractable size of parameter space  Both the entity name and the surrounding relation phrases provide strong cues on the types of a candidate entity mention  Model the type of each entity mention by (1) type indicator of its surface name; (2) the type signatures of its surrounding relation phrases (more details in the following slides) …has concerns whether Kabul is an ally of Washington Washington Gover- nment State is an ally of …has concerns whether Kabul is an ally of Washington: GOVERNMENT M candidate mentions; n surface names Use a bi-adjacency matrix to represent the mapping
  • 22. 22 Entity Name-Relation Phrase Subgraph  Aggregated co-occurrences between entity surface names and relation phrases across corpus  weight importance of different relation phrases for surface names  use connected edges as bridges to propagate type information  Left/right entity argument of relation phrase: for each mention, assign it as the left (right, resp.) argument to the closest relation phrase on its right (left, resp.) in a sentence  Type signature of relation phrase: Two type indicators for its left and right arguments l different relation phrases, mapping between mentions and relation phrases: Two bi-adjacency matrices for the subgraph
  • 23. 23 Mention Correlation Subgraph  An entity mention may have ambiguous name and ambiguous relation phrases  E.g., “White house” and “felt” in the first sentence of Figure  Other co-occurring mentions may provide good hints to the type of an entity mention  E.g., “birth certificate” and “rose garden” in the Figure Construct KNN graph based on the feature vector f-surface names of co-occurring entity mentions  Propagate type information between candidate mentions of each surface name, based on following hypothesis:
  • 24. 24 Relation Phrase Clustering  Observation: many relation phrases have very few occurrences in the corpus  ~37% relation phrases have <3 unique entity surface names (in right or left arguments)  Hard to model their type signature based on aggregated co-occurrences with entity surface names (i.e., Hypothesis 1)  Softly clustering synonymous relation phrases:  the type signatures of frequent relation phrases can help infer the type signatures of infrequent (sparse) ones that have similar cluster memberships
  • 25. 25 Relation Phrase Clustering  Existing work on relation phrase clustering utilizes strings; context words; entity argument to cluster synonymous relation phrases  String similarity and distribution similarity may be insufficient to resolve two relation phrases; type information is particular helpful in such case  We propose to leverage type signature of relation phrase,and proposed a general relation phrase clustering method to incorporate different features  further integrated with the graph-based type propagation in a mutually enhancing framework, based on following hypothesis Type signatures String features Context features
  • 26. 26 Type Inference: A Joint Optimization Problem Mention modeling & mention correlation (Hypo 2) Multi-view relation phrases clustering (Hypo 3 & 4) Type propagation between entity surface names and relation phrases (Hypo 1)
  • 27. 27 The ClusType Algorithm  Can be efficiently solved by alternate minimization based on block coordinate descent algorithm  Algorithm complexity is linear to #entity mentions, #relation phrases, #cluster, #clustering features and #target types Update type indicators and type signatures For each view, performs single-view NMF until converges The ClusType algorithm: Update consensus matrix and relative weights of different views Until the objective converges
  • 28. 28
  • 29. 29 Experiment Setting  Datasets: 2013 New York Times news (~110k docs) [event, PER, LOC, ORG]; Yelp Reviews (~230k) [Food, Job, …]; 2011 Tweets (~300k) [event, product, PER, LOC, …]  Seed mention sets: < 7% extracted mentions are mapped to Freebase entities  Evaluation sets: manually annotate mentions of target types for subsets of the corpora  Evaluation metrics: Follows named entity recognition evaluation (Precision, Recall, F1)  Compared methods  Pattern: Stanford pattern-based learning; SemTagger:bootstrapping method which trains contextual classifier based on seed mentions; FIGER: distantly-supervised sequence labeling method trained on Wiki corpus; NNPLB: label propagation using ReVerb assertion and seed mention; APOLLO: mention-level label propagation using Wiki concepts and KB entities;  ClusType-NoWm: ignore mention correlation; ClusType-NoClus: conducts only type propagation; ClusType-TwpStep: first performs hard clustering then type propagation
  • 30. 30 Comparing ClusType with Other Methods and Its Variants  vs. FIGER: effectiveness of our candidate generation and proposed hypotheses on type propagation  vs. NNPLB and APOLLO: ClusType not only utilizes semantic-rich relation phrase as type cues, but only cluster synonymous relation phrases to tackle context sparsity  vs. variants: (i) model mention correlation for name disambiguation; (ii) integrates clustering in a mutually enhancing way 46.08% and 48.94% improvement in F1 score compared to the best baseline on the Tweet and the Yelp datasets
  • 31. 31 Comparing on Different Entity Types Obtains larger gain on organization and person (more entities with ambiguous surface names) Modeling types on entity mention level is critical for name disambiguation Superior performance on product and food mainly comes from the domain independence of our method Both NNPLB and SemTagger require sophisticated linguistic feature generation which is hard to adapt to new types
  • 32. 32 Comparing on Trained NER System  Compare with Stanford NER, which is trained on general-domain corpora including ACE corpus and MUC corpus, on three types: PER, LOC, ORG  ClusType and its variants outperform Stanford NER on both dynamic corpus (NYT) and domain-specific corpus (Yelp)  ClusType has lower precision but higher Recall and F1 score on Tweet  Superior recall of ClusType mainly come from domain-independent candidate generation
  • 33. 33 Example Output and Relation Phrase Clusters  Extracts more mentions and predicts types with higher accuracy  Not only synonymous relation phrases, but also both sparse and frequent relation phrase can be clustered together   boosts sparse relation phrases with type information of frequent relation phrases
  • 34. 34 Testing on Context Sparsity and Surface Name popularity  Context sparsity:  Group A: frequent relation phrases  Group B: sparse relation phrases  ClusType obtains superior performance over its variants on Group B   clustering relation phrase is critical for sparse relation phrases  Surface name popularity:  Group A: high frequency surface name  Group B: infrequent surface name  ClusType outperforms its variants on Group B   Handles well mentions with insufficient corpus statistics
  • 35. 35 Conclusions and Future Work  Study distantly-supervised entity recognition for domain-specific corpora and propose a novel relation phrase-based framework  A data-driven, domain-agnostic phrase mining algorithm for candidate entity mentions and relation phrase generation  Integrate relation phrase clustering with type propagation on heterogeneous graphs, and solve it by a joint optimization problem. Ongoing:  Extend to role discovery for scientific concepts  paper profiling (research/demo)  Study of relation phrase clustering, such as  joint entity/relation clustering  synonymous relation phrase canonicalization  Study of joint entity and relation phrase extraction with phrase mining
  • 36. 36
  • 37. 37 References  X. Ren, A. El-Kishky, C. Wang, F. Tao, C. R. Voss, H. Ji and J. Han. ClusType: Effective Entity Recognition and Typing by Relation Phrase-Based Clustering. KDD’15.  H. Huang, Y. Cao, X. Huang, H. Ji and C. Lin. Collective Tweet Wikification based on Semi-supervised Graph Regularization. ACL’14  T. Lin, O. Etzioni, et al. No noun phrase left behind: detecting and typing unlinkable entities. EMNLP’12  N. Nakashole, T. Tylenda, and G. Weikum. Fine-grained semantic typing of emerging entities. ACL’13  R. Huang and E. Riloff. Inducing domain-specific semantic class taggers from (almost) nothing. ACL’10  X. Ling and D. S. Weld. Fine-grained entity recognition. AAAI’12  W. Shen, J. Wang, P. Luo, and M. Wang. A graph-based approach for ontology population with named entities. CIKM’12  S. Gupta and C. D. Manning. Improved pattern learning for bootstrapped entity extraction. CONLL’14  P. P. Talukdar and F. Pereira. Experiments in graph-based semi-supervised learning methods for class- instance acquisition. ACL’10  Z. Kozareva and E. Hovy. Not all seeds are equal: Measuring the quality of text mining seeds. NAACL’10  L. Gal´arraga, G. Heitz, K. Murphy, and F. M. Suchanek. Canonicalizing open knowledge bases. CIKM’14