Qi He
Director of Engineering
qhe@linkedin.com
Jaewon Yang
Senior Staff Engineer
jeyang@linkedin.com
Baoxu Shi
Senior Engineer
dashi@linkedin.com
Constructing Knowledge Graph
for Social Networks
in a Deep and Holistic Way
Tutorial’s
Agenda
09:00 Introduction
09:05 Overview of LinkedIn’s Knowledge Graph and Applications
09:15 Named Entity Recognition
09:35 Populate Relationships between Social Network Entities
10:00 Scalable Relationship Extraction with Limited Data
10:30 Coffee Break
11:00 Scalable Graph Refinement via Multi-channel Data Ingestion
11:45 Automated Taxonomy Expansion
12:00 Conclusion and Q&A
Part 1: Construct high-quality knowledge graph for social networks
Part 2: Weakly-supervised, scalable social network knowledge graph construction
Introduction
Qi He
Director of Engineering,
LinkedIn
Overview
This tutorial will be successful if:
- You learn the problem statement of constructing a knowledge graph for social networks and its technical challenges
- You learn the opportunities for tackling those technical challenges
- You learn the state of the art and our experience with the solutions
Preliminary knowledge
Knowledge Graph Construction is the process of creating structured data: 1) a canonical
representation for every entity, 2) relationships between entities.
Methods: 1) Human curation, 2) AI modeling (ML/NLP), 3) Data ingestion
Problem Statement
- Knowledge Graph Construction for Social Networks
1) input data for each member in the social network is noisy, implicit, and multilingual
2) KG and the social network influence each other via multiple organic feedback loops
Opportunity
A deep and holistic way is the best strategy to tackle the technical challenges.
Deep: develop deep NLP models to deeply understand the input data
- noisy and implicit: train high-precision language understanding models by adding a small amount of clean data to the noisy data
- multilingual: expand a single-language KG to multilingual KGs by applying deep transfer
learning models
Holistic: grow social network and KG together via their model interactions
- refine KG by learning deep embeddings from the social network
- grow social network by learning deep embeddings from KG
- launch new products to get explicit feedback on KG from social network members
The Three Technical Questions inside Opportunity
Q1: How can we recognize existing entities and expand the taxonomy with new entities from noisy and multilingual text?
- The encoder-decoder NLU approach
- Pattern + deep learning based auto-taxonomy expansion
Q2: How can we construct entity relationships with limited input data?
- Unsupervised learning
- Semi-supervised learning
- Pre-trained deep learning models (BERT family)
- Cross-lingual transfer learning (Adversarial learning, Multilingual encoder)
Q3: How can we refine the KG by ingesting data from the social network?
- Embedding-based entity alignment between social network and KG
- Joint representation learning on social network and KG
- Probabilistic member feedback (label/answer) aggregation from social network
Tutorial’s
Agenda
09:00 Introduction
09:05 Overview of LinkedIn’s Knowledge Graph and Applications
09:15 Named Entity Recognition
09:35 Populate Relationships between Social Network Entities
10:00 Scalable Relationship Extraction with Limited Data
10:30 Coffee Break
11:00 Scalable Graph Refinement via Multi-channel Data Ingestion
11:45 Automated Taxonomy Expansion
12:00 Conclusion and Q&A
Part 1: Construct high-quality knowledge graph for social networks
Part 2: Weakly-supervised, scalable social network knowledge graph construction
Overview of
LinkedIn’s
Knowledge Graph
and Applications
Qi He
Director of Engineering,
LinkedIn
LinkedIn Knowledge Graph (aka Economic Graph)
Member input data:
● 675M members
● 50M+ orgs
● 400+ industries
● 200+ countries
● 50K+ skills
● 25K+ titles
● Certificates, degrees & more…
● Roles, occupations
● States, cities, postal codes…
● Tools, products, technologies…
● Specialty
LinkedIn skill taxonomy example
LinkedIn unique asset
Skill identity
Skill type
Relationships
ID:207
Definition:
http://en.wikipedia.org/wiki/Graphic_design
Canonical name:
EN: Graphic Design
Zh_CN: 平面设计
...
Aliases:
"fr_FR": [ "concepteurs graphiques", "editorial design"],
"en_US": …..
Skill type: hard → industry experience
Skill types: Soft skills, Hard skills, Tools & Technologies, Spoken languages, Industry experience
Example hierarchy: Design → Graphic Design → Adobe Photoshop
Skill or not a Skill?
Phrase: fear of flying | Skills: phobias; self-esteem; stress management; hypnotherapy | Titles: hypnotherapist; psychotherapist
Phrase: intelligence | Skills: national security; military operations; security clearance | Titles: Military Intelligence Officer, Tactical Intelligence Officer
Phrase: headaches | Skills: holistic health; sports injuries; neck pain; nutrition | Titles: chiropractor; massage therapist; acupuncturist
Understanding Member skills
● Exclude references to the company
● Include skills that relate to the member with a high confidence score
● Exclude skills with a lower confidence score
To power all LinkedIn products
Knowledge
Graph
Jobs Search
Job Recommendation
Recruiter Search
Talent Insights
Jobs SEO
Profile Page
People Search
SEO
New-member onboarding
PYMK
Premium
ProFinder
EGR
Notifications
GSO
Ads
Pages
Sales Navigator
LSI
Merlin
Courses
Unlock the full
potential of the
LinkedIn Economic
Graph
Enable a positive flywheel effect in the LinkedIn ecosystem:
Input signals → Graph construction → Deliver value → Engagement → (back to input signals)
Tutorial’s
Agenda
09:00 Introduction
09:05 Overview of LinkedIn’s Knowledge Graph and Applications
09:15 Named Entity Recognition
09:35 Populate Relationships between Social Network Entities
10:00 Scalable Relationship Extraction with Limited Data
10:30 Coffee Break
11:00 Scalable Graph Refinement via Multi-channel Data Ingestion
11:45 Automated Taxonomy Expansion
12:00 Conclusion and Q&A
Part 1: Construct high-quality knowledge graph for social networks
Part 2: Weakly-supervised, scalable social network knowledge graph construction
Named Entity
Recognition and
Disambiguation
Jaewon Yang,
LinkedIn
● Set of triples (Source entity, Relation, Target entity)
○ (Bill Gates, Founder, Microsoft)
○ (Microsoft, Located_In, Redmond)
○ ...
● Canonical representation of relations among entities
● Examples:
○ Google Knowledge Graph
○ Microsoft Satori
○ Freebase
○ LinkedIn Economic Graph
Knowledge Graph
Bill Gates --Founder--> Microsoft; Microsoft --Located_In--> Redmond
Knowledge Graph Construction
Tesla is an Electric vehicle company based in Palo Alto.
→ (Tesla Inc., Specialized_In, Electric Vehicle)
→ (Tesla Inc., Located_In, Palo Alto, CA)
Task 1: Named Entity Recognition
Tesla is an Electric vehicle company located in Palo Alto.
→ recognized entities: Tesla Inc., Electric Vehicle, Palo Alto, CA
Task 2: Relation Extraction [Next Section]
Tesla is an Electric vehicle company located in Palo Alto.
→ (Tesla Inc., Specialized_In, Electric Vehicle)
→ (Tesla Inc., Located_In, Palo Alto, CA)
Named Entity Recognition: Challenges
1. Name Variation
a. Tesla, Tesla Motors, Tesla Inc. … -> Tesla Inc.
2. Ambiguity
a. “Apple” -> Apple Corps vs. Apple Inc.
3. Incomplete Entity Dictionary [Later Section on Taxonomy Creation]
4. Multiple Languages [Later Section]
Web-Scale Entity Recognition
[Cucerzan 2007]
Preprocessing
Entity Recognition (Entity Tagging)
Entity Disambiguation
Tesla is an Electric vehicle company located in Palo Alto.
→ tagging marks "Tesla" as a mention; disambiguation chooses between the candidates Tesla Forecast and Tesla Inc.
Two Step Approach
Entity Recognition → Entity Disambiguation
Tesla is an Electric vehicle company located in Palo Alto.
→ recognition: "Tesla", "Electric vehicle", "Palo Alto" are entity mentions
→ disambiguation: Tesla → Tesla Forecast vs. Tesla Inc. → Tesla Inc.
Entity Recognition
1. Encoder: Generating features
2. Decoder: Doing classification
a. Classification results:
i. B-Com, B-Ind, B-Loc: Beginning of company, industry…
ii. I-Com, I-Ind, I-Loc: Inside of company, industry, …
iii. O: Outside (nothing important)
Tesla is an Electric vehicle company located in Palo Alto.
→ Encoding: [0.1, 0.3, -0.1, ….]
→ Decoding: Tesla [B-Com] is an Electric [B-Ind] vehicle [I-Ind] company located in Palo [B-Loc] Alto [I-Loc].
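To make the decoder's target concrete, here is a minimal Python sketch (a toy illustration, not LinkedIn's production code) that converts labeled entity spans into the B-/I-/O tag sequence described above:

```python
# Toy sketch: convert labeled entity spans into per-token B-/I-/O tags,
# the target sequence the decoder is trained to predict.
def bio_tags(tokens, spans):
    """spans: list of (start, end_exclusive, entity_type) over token indices."""
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"          # beginning of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"          # inside of the entity
    return tags

tokens = ["Tesla", "is", "an", "Electric", "vehicle", "company",
          "located", "in", "Palo", "Alto", "."]
spans = [(0, 1, "Com"), (3, 5, "Ind"), (8, 10, "Loc")]
print(list(zip(tokens, bio_tags(tokens, spans))))
# [('Tesla', 'B-Com'), ..., ('Electric', 'B-Ind'), ('vehicle', 'I-Ind'), ...]
```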
Encoder (Feature Generation)
● Traditional features
○ Bag-of-words: TF-IDF, BM25, ...
● Recent methods: Deep learning embedding
○ Model learns how to generate feature vector (embedding)
○ Word-level embedding
○ Sequence-of-word embedding
○ Sequence-of-character embedding
● IMPORTANT: These encoders are used in later sections as well!
Encoder: Word-level feature
● Word embedding: Learn a latent vector (embedding) w_v for each word v
● For each word in the input text, use its latent vector as an input feature
● How to learn embedding?
○ Based on which words co-occur under the same context
○ context: k-gram window
The clouds are in the sky
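As a concrete illustration of learning embeddings from context windows, here is a minimal sketch assuming the gensim library (version 4.x, an assumption; any word2vec implementation would do):

```python
# Minimal sketch: train skip-gram word2vec embeddings on a toy corpus.
# Words that share a context window ("clouds", "sky") get nearby vectors.
from gensim.models import Word2Vec

corpus = [
    ["the", "clouds", "are", "in", "the", "sky"],
    ["the", "sun", "is", "in", "the", "sky"],
    ["clouds", "bring", "rain"],
]
model = Word2Vec(sentences=corpus, vector_size=50, window=2,
                 min_count=1, sg=1, epochs=200, seed=0)
print(model.wv["sky"][:5])                   # first dims of the latent vector w_v
print(model.wv.similarity("clouds", "sky"))  # co-occurring words score higher
```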
Famous word embeddings
[Mikolov et al. 2013] [Pennington et al. 2014]
● Glove: embeddings approximate the number of co-occurrences
● Word2vec: embeddings approximate the probability of co-occurring
● How to choose?
○ Glove is a little bit simpler (e.g., no negative examples), but they are very similar
○ If you use public embedding, pick one with the best coverage
○ If you train on your own, either one would work
● Limitation of word embedding:
○ Does not consider ordering of words
○ Does not generalize to new words
Encoder: Sequence of Words
https://colah.github.io/posts/2015-08-Understanding-LSTMs/
RNN: The clouds are in the sky (short-range dependency)
LSTM: I grew up in France … I speak fluent French (long-range dependency)
Encoder: Sequence of words
https://colah.github.io/posts/2015-08-Understanding-LSTMs/
Encoder: Sequence of Characters
Akbik et al. 2018
Encoder: Attention-based
Bahdanau et al. 2015
● LSTM: Hidden state (encoding) depends mainly on the previous token
● Attention: Hidden state is computed using all tokens
○ Attention: Weights for each token
○ Worked really well in Machine translation
Encoder: Transformer
http://mlexplained.com/2017/12/29/attention-is-all-you-need-explained/
● Compute attention by Query, Key, Value
○ Query: what information are you looking for?
○ Key: how important is each token?
○ Value: the content of each token
● Encoding = sum (attention[i] * value[i] for each token i)
● Attention[i] ~ query * key[i]
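A minimal numpy sketch of this single-head computation (a toy illustration of the formulas above, not a full Transformer):

```python
# encoding_i = sum_j attention[i][j] * value[j],
# with attention[i][j] proportional to query_i . key_j (softmax-normalized).
import numpy as np

def attention(Q, K, V):
    """Q, K, V: (num_tokens, d) arrays of queries, keys, values."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])          # query * key, scaled
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over tokens
    return weights @ V                               # weighted sum of values

rng = np.random.default_rng(0)
Q = rng.normal(size=(6, 8)); K = rng.normal(size=(6, 8)); V = rng.normal(size=(6, 8))
print(attention(Q, K, V).shape)  # (6, 8): one encoding per token
```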
Encoder: Transformer
http://jalammar.github.io/illustrated-transformer/
Encoder: Transformer
● Multi-head attention?
○ Use multiple keys, values, queries
○ Helps understanding different relations among tokens
■ I am taller than Jim (comparison)
■ He is none other than Bill Gates (distinguishing)
● Labels must form a valid sequence
○ Beginning must happen before inside
○ If we predict each token independently, nonsensical sequences may occur, e.g.: Electric [I-Ind] vehicle [B-Ind]
● Use CRF to predict the entire sequence
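To make the constraint concrete, here is a tiny Python sketch of the validity check that a CRF's transition scores effectively encode:

```python
# An I-X tag is valid only if it directly follows B-X or I-X of the same type.
def is_valid_bio(tags):
    prev = "O"
    for tag in tags:
        if tag.startswith("I-"):
            etype = tag[2:]
            if prev not in (f"B-{etype}", f"I-{etype}"):
                return False
        prev = tag
    return True

print(is_valid_bio(["B-Ind", "I-Ind", "O"]))   # True
print(is_valid_bio(["I-Ind", "B-Ind"]))        # False: inside before beginning
```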
Classification: CRF
http://www.davidsbatista.net/blog/2017/11/13/Conditional_Random_Fields/
Tesla is Electric vehicle company
[Lample et al. 2016]
Putting it All Together: LSTM + CRF
● Variations
○ Word-LSTM + Char-LSTM + CRF [Akbik 2018]
○ Pretrained Transformer (BERT) + CRF
Named Entity Recognition: Our Experiences
● Having a good encoder (good features) is most important
○ Deep learning model is very powerful, but getting enough training data can be tricky
■ Later today, we discuss how to address this by pretraining
○ If available, domain-specific features are still very useful
■ e.g., if we have a list of famous companies, it can be used to generate features
● CRF is easy to add and boosts performance incrementally
Entity Disambiguation
● Problem Definition
○ Input: Text span
○ Output: Entity ID
Tesla is an Electric vehicle company located in Palo Alto.
→ "Tesla": Tesla Forecast vs. Tesla Inc.
Entity Disambiguation
● Feature generation (Encoder)
○ Encoder for text span: Text encoders (LSTM, Transformer, …)
○ Encoder for entity-related features
■ Text features (entity description): Text encoders
■ Graph features: Graph encoders [Later]
■ Numerical features (frequency statistics): No encoder needed
● Making prediction (Decoder)
○ Multiclass classification
Conclusion
● Two Problems:
○ Entity Recognition: Identify text spans
○ Entity disambiguation: If there are multiple matching entities, find the best match
● Modeling architecture: Encoder and Decoder
○ Encoder: Generate features
○ Decoder: Make classification using the features
● We will use the same encoders in later sections
Tutorial’s
Agenda
09:00 Introduction
09:05 Overview of LinkedIn’s Knowledge Graph and Applications
09:15 Named Entity Recognition
09:35 Populate Relationships between Social Network Entities
10:00 Scalable Relationship Extraction with Limited Data
10:30 Coffee Break
11:00 Scalable Graph Refinement via Multi-channel Data Ingestion
11:45 Automated Taxonomy Expansion
12:00 Conclusion and Q&A
Part 1: Construct high-quality knowledge graph for social networks
Part 2: Weakly-supervised, scalable social network knowledge graph construction
Populate
Relationships
between Social
Network Entities
LinkedIn
Relation Extraction
Tesla is an Electric vehicle company based in Palo Alto.
→ (Tesla Inc., Specialized_In, Electric Vehicle)
→ (Tesla Inc., Located_In, Palo Alto, CA)
● Input: (Source entity, Target entity, Sentences)
● Output: Relation
Relation Extraction: Challenges
● Challenges
○ Linguistic variation
■ {“Based in”, “Headquartered in”, …} -> Located_In
○ Ambiguity / implicitness
■ “Electric Vehicle company” -> Specialized_In
Relation Extraction: Machine Learning Methods
● Supervised method:
○ Learn classifier from training data
○ Features
■ Text features using text encoders (Transformer, LSTM, …) [Previous section]
■ Graph features [Later section]
○ Similar to entity disambiguation from the previous section
● Semi-supervised method [This section]
● Unsupervised method [This section]
Distant Supervision
● Downside of Supervised method: Labels are sparse, expensive to get
● Solution: Leverage other database (Freebase) to label the text corpus
○ Mintz et al. 2009 “If two entities belong to a certain relation, any sentence containing those
two entities is likely to express that relation”
Distant Supervision
[Mintz et al. 2009]
Microsoft --Located_In--> Redmond
Matching sentences: "Microsoft is based in Redmond", "Microsoft is headquartered in Redmond", "Microsoft has its main campus in Redmond"
Positive Example
Source (A): Microsoft
Target (B): Redmond
Sentences: (A is based in B, A is headquartered in B, …)
Relation: Located_In
Distant Supervision
[Mintz et al. 2009]
Larry Page, Microsoft: no KG relation
Matching sentences: "Larry Page said about Microsoft", "Larry Page commented on Microsoft"
Negative Example
Source (A): Larry Page
Target (B): Microsoft
Sentences: (A said about B, A commented on B, …)
Relation: Nothing
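A minimal sketch of the distant supervision heuristic with toy KG triples and sentences (illustrative data, not a real corpus):

```python
# Label every sentence that mentions both entities of a known KG triple
# with that triple's relation; co-occurring pairs with no KG relation
# become negative examples.
kb = {("Microsoft", "Redmond"): "Located_In"}

sentences = [
    "Microsoft is based in Redmond",
    "Microsoft has its main campus in Redmond",
    "Larry Page said about Microsoft",
]

def weak_label(src, tgt, sentence):
    if src in sentence and tgt in sentence:
        return kb.get((src, tgt), "Nothing")  # co-occur, no relation -> negative
    return None                               # pair not mentioned together

for s in sentences:
    print(s, "->", weak_label("Microsoft", "Redmond", s))
print(weak_label("Larry Page", "Microsoft", "Larry Page said about Microsoft"))
# Nothing: both entities co-occur but share no KG relation -> negative example
```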
Data Programming
Snorkel [Ratner et al. 2018]
Labeling functions (Distant Supervision, Rule-based Annotation, Crowdsourcing) → Aggregator → Training Data
● There are other ways to get weak labels
● Can we combine weak labels to get better labels?
Data Programming
● How to aggregate? If weak labels are 1, 0, 1:
○ Majority voting: label = 1
○ Generative Model (GM):
■ Label vector Λ = [1, 0, 1]; true label Y: unknown
■ Assume (Λ, Y) is generated by a model p_w(Λ, Y)
■ Learn w
■ Compute p_w(Y|Λ) using p_w(Λ, Y)
Data Programming: Takeaways
● The generative model is useful when there are ~10 labels per example
● In fact, we found that weighted voting (instead of plain majority voting) works pretty well
● The key part is to find reasonably good weak labels
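A minimal sketch of the two simple aggregation baselines above; the weights stand in for per-source accuracies, which Snorkel's generative model would instead learn:

```python
# Toy aggregation: plain majority voting vs. weighted voting.
from collections import defaultdict

def majority_vote(labels):                 # labels: e.g. [1, 0, 1]
    return max(set(labels), key=labels.count)

def weighted_vote(labels, weights):        # weights: per-source accuracies
    score = defaultdict(float)
    for label, w in zip(labels, weights):
        score[label] += w
    return max(score, key=score.get)

print(majority_vote([1, 0, 1]))                    # 1
print(weighted_vote([1, 0, 1], [0.6, 0.9, 0.55]))  # 1 (0.6 + 0.55 > 0.9)
```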
Open Information Extraction
● Unsupervised Method: Machine learning without labels
● Input: Entities and sentences
● Output: Relation phrase
Tesla is based in Palo Alto → (Tesla, Palo Alto, based in)
Tesla is headquartered in Palo Alto → (Tesla, Palo Alto, headquartered in)
Tesla has its main campus in Palo Alto → (Tesla, Palo Alto, has main campus in)
Open Information Extraction: ReVerb
[Fader et al. 2011]
1. From a sentence, take the longest phrase satisfying one of three forms:
a. a verb (e.g., invented)
b. a verb followed immediately by a preposition (e.g., located in)
c. a verb followed by nouns, adjectives, or adverbs + preposition (e.g., has atomic weight of)
2. If that phrase appears too few times, ignore
3. Apply a binary classifier to compute confidence score
a. Classification: Is the phrase a valid relation phrase?
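A rough sketch of this syntactic filter over POS tags (the tags are supplied by hand here; a real system would run a POS tagger, and ReVerb's actual pattern is richer):

```python
# Take the longest tag span matching: verb, optionally followed by
# nouns/adjectives/adverbs/possessives, ending in a preposition.
import re

words = ["Tesla", "has", "its", "main", "campus", "in", "Palo", "Alto"]
tags  = ["NNP", "VBZ", "PRP$", "JJ", "NN", "IN", "NNP", "NNP"]  # hand-tagged

tag_str = " ".join(tags)
m = re.search(r"V[A-Z]*( (?:NN[A-Z]*|JJ|RB|PRP\$))* IN", tag_str)
if m:
    first = tag_str[:m.start()].count(" ")
    last = first + m.group().count(" ")
    print(words[first:last + 1])   # ['has', 'its', 'main', 'campus', 'in']
```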
● Use ML models to extract text spans for a relation and its arguments
○ Same methods as NER models (e.g., BiLSTM + CRF, Transformer + CRF, …)
● RnnOIE [Stanovsky et al. 2018]: BiLSTM tagger
Open Information Extraction: Sequence Tagging
[Stanovsky et al. 2018 ]
Tesla is located in Palo Alto
→ Tesla [Arg0] is located in [relation] Palo Alto [Arg1].
Conclusion
● Key challenge: Get enough training examples to cover wide linguistic variations
● Semi-supervised methods: Come up with heuristics to get weak labels
● Unsupervised: Extract relation phrases
○ Drawback in industry: to map the phrases to relations, we need another ML model
● Rule of thumb to choose methods
○ Lots of training examples + Complete relation dictionary: Supervised method
○ Few examples + Complete relation dictionary: Semi-supervised method
○ Very incomplete relation dictionary: Unsupervised method
Tutorial’s
Agenda
09:00 Introduction
09:05 Overview of LinkedIn’s Knowledge Graph and Applications
09:15 Named Entity Recognition
09:35 Populate Relationships between Social Network Entities
10:00 Scalable Relationship Extraction with Limited Data
10:30 Coffee Break
11:00 Scalable Graph Refinement via Multi-channel Data Ingestion
11:45 Automated Taxonomy Expansion
12:00 Conclusion and Q&A
Part 1: Construct high-quality knowledge graph for social networks
Part 2: Weakly-supervised, scalable social network knowledge graph construction
Scalable Relationship
Extraction with
Limited Data
LinkedIn
Extending Knowledge Graph to New Datasets
● Task: Construct a knowledge graph (KG) for new data by leveraging existing KGs and data
● Examples
○ Domain adaptation: Build a KG from a domain-specific text corpus
■ Building a KG specialized for Healthcare industry
○ i18n: Build a knowledge graph for data in a new language
■ We have a KG for English users. Can we do the same thing for German users?
Challenges
● Domain-specific annotation is very time-consuming
○ Annotators need to have enough knowledge in the domain (or in the language)
○ Annotation tasks need to be clearly designed
○ If either is missing, data quality goes down significantly!
● Deep learning models require lots of data
○ Number of parameters in Transformer encoders: 100s of millions!
Solution: Transfer Learning
Supervised Learning: Data 1 → Model 1 and Data 2 → Model 2, trained independently.
Transfer Learning: Data 1 → Model 1, then knowledge transfer from Model 1 helps train Model 2 on Data 2.
Transfer Learning
● Cross-domain transfer learning
○ Train a model on general-domain data and then transfer knowledge
● Cross-lingual transfer learning
○ Train a model in English and then transfer knowledge to other languages
Cross-domain Transfer Learning: Pretrained Model
● Train a deep learning model with a very large text corpus (Wikipedia and so on)
○ Training is done without any labels
○ The model learns general patterns in natural language
● Update the model parameters using a small number of labels
○ Since the model knows natural language well, it needs a smaller number of labels
BERT (Bidirectional Transformer)
[Devlin et al. 2018]
● Train a Transformer encoder for two prediction tasks
● (1) Masked language modeling (predict masked words given the surrounding words)
BERT (Bidirectional Transformer)
[Devlin et al. 2018]
● Train a Transformer encoder for two prediction tasks
● (2) Next Sentence Prediction
■ Example:
■ [CLS] The man went to the store [SEP] He bought a gallon of milk [SEP]
■ Label: IsNext
■ [CLS] The man went to the store [SEP] Penguins cannot fly [SEP]
■ Label: NotNext
BERT: Fine-tuning
[Devlin et al. 2018]
● Fine-tuning: Making incremental updates
on model parameters for a given task
○ Sentence pair classification:
■ (Sentence 1, Sentence 2) -> True / False
○ Single sentence classification
■ (Sentence 1) -> True / False
○ Sequence Tagging (Entity Recognition)
■ (Sentence 1) -> Tags for each token
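A minimal sketch of setting up sequence-tagging fine-tuning, assuming the Hugging Face transformers library (an assumption; the original BERT release shipped with TensorFlow):

```python
# Load a pretrained BERT encoder with a fresh token-classification head;
# the label names follow the BIO scheme from the NER section.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

labels = ["O", "B-Com", "I-Com", "B-Ind", "I-Ind", "B-Loc", "I-Loc"]
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased", num_labels=len(labels))   # head is randomly initialized

inputs = tokenizer("Tesla is located in Palo Alto.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits              # (1, seq_len, num_labels)
print([labels[i] for i in logits.argmax(-1)[0].tolist()])
# Before fine-tuning the tags are noisy; fine-tuning = a few epochs of
# cross-entropy updates on labeled sentences (e.g., via a torch optimizer).
```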
BERT: Results
[Devlin et al. 2018]
● Pre-trained on ~3B words (Wikipedia, Books)
● After fine-tuning, outperformed other methods in 11 benchmark data sets
○ Fine-tuning works with ~3000 examples
○ Without any task-specific feature engineering
● For entity recognition, BERT works even without fine tuning
BERT: Implication
● Why does it work?
○ Context comes from both directions
○ Provides different ways to fine-tune the model
○ The model seems to learn syntactic structures [Hewitt and Manning 2019]
○ Language models seem to be correlated with multiple application tasks
● We can train sophisticated deep learning model with thousands of samples!
Pre-trained Deep Learning Models
[Sanh et al. 2019]
BERT: Limitations
● Slow serving
○ Distillation [Sanh et al. 2019]
○ Code optimization (ONNX Runtime)
● Handling 2+ sentences together?
○ XLNet [Yang et al. 2019]
● Nearest neighbor search is hard (modular scoring is impossible)
○ SentenceBERT [Reimers and Gurevych 2019]
● Handling very long text
○ Transformer-XL [Dai et al. 2019]
Cross-lingual Transfer Learning
● Assume: We have a training data in English, and developed a ML model (NER or Relation)
● Can we use the data (or the model) for other languages?
Multilingual Encoder
English: Tesla is located in Palo Alto. → Encoding → [0.1, 0.3, -0.1, ….] → Decoding → (Tesla, Located_In, Palo Alto)
German: Tesla befindet sich in Palo Alto. → Encoding → [0.1, 0.3, -0.1, ….] → Decoding → (Tesla, Located_In, Palo Alto)
● If the encoder gives the same feature values for sentences with the same meaning in different languages:
○ We can reuse the decoder (classifier)
○ The decoder can be trained with English training data
Multilingual Word Embedding
[Mikolov et al. 2013]
● Words have similar embedding if they mean the same thing
● How to get this? Apply “Translation matrix W”
○ W can be learned from parallel word dictionary X, Y
○ W: orthogonal matrix (Procrustes alignment)
X Y
Four Cuatro
Five Cinco
Horse Caballo
Dog Perro
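A minimal numpy sketch of the Procrustes solution: given dictionary pairs like the X, Y table above (as embedding rows), the orthogonal W minimizing ||XW - Y|| comes from an SVD:

```python
# Orthogonal Procrustes alignment on synthetic data: the source embeddings
# are a rotation of the target embeddings, and we recover that rotation.
import numpy as np

rng = np.random.default_rng(0)
d = 4
Y = rng.normal(size=(5, d))                    # target-language embeddings
true_W = np.linalg.qr(rng.normal(size=(d, d)))[0]
X = Y @ true_W.T                               # source embeddings = rotated targets

U, _, Vt = np.linalg.svd(X.T @ Y)              # Procrustes: W = U V^T
W = U @ Vt
print(np.allclose(X @ W, Y, atol=1e-8))        # True: dictionary pairs align
```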
● What if you do not have a parallel word dictionary (X, Y)? Two steps:
○ Learn the translation matrix W (without word pairs (X, Y)) by adversarial learning
○ Construct (X, Y) by best-matching words between WX and Y
Unsupervised Multilingual Embedding
● Adversarial learning: Train two models that compete with each other
○ Wx: “Translated” embedding in a source language, y: Embedding in a target language
○ Discriminator D: Detect which language the example comes from
■ e.g., Wx is from the source language
○ Translation W: Fool the discriminator to detect the wrong language
Unsupervised Multilingual Embedding
Multilingual Word Embedding: Our Experience
● Public multilingual embedding may have low coverage on the data set
● Unsupervised word alignment (adversarial learning) works
○ But if you have a good parallel word dictionary, Procrustes alignment is easier and better
● Our team’s approach:
○ Step 1: Train embedding on the target data set for each language separately
■ Why? to get good coverage
○ Step 2 (Optional): Run adversarial learning to get parallel word dictionary
○ Step 3: Align by solving Y ~ WX
● Pretrained models that cover multiple languages
○ Multilingual BERT
○ XLM: Masked language from parallel sentences
Multilingual Pretrained Model
[Lample and Conneau 2019]
Adversarial Learning: Sentence Classification
[Chen et al. 2018]
● Word embedding / pretrained models: General encoder
● Can we train multilingual, task-specific encoder?
● Yes. Adversarial learning again!
○ Train two models that compete with each other
■ Generator (Encoder): Generate features
■ Discriminator: Detect language from the encoder output
■ Generator has two purposes
● Help classification (sentiment classification)
● Fool the discriminator
● We have (text in source language, labels)
● Can we convert this to (text in target language, labels)?
○ Use translation to convert text
○ Use heuristics to convert labels
Other Approach: Training Data Augmentation
[Huang et al. 2019]
Cross-lingual Transfer Learning: Our Experience
● Game changers:
○ Cross-lingual encoder (word embedding or pretrained models)
○ Adding some (~500) hand-labelled examples for the target language
● Things that help incrementally
○ Data augmentation by Machine translation
○ Task-specific encoder by Adversarial learning
Conclusion
● Transfer learning: Leverage other datasets to transfer the knowledge
● Cross-domain transfer learning
○ Train a model in one domain and fine tune for the target domain
○ Pretrained deep NLP models are the state-of-the-art (BERT and its variants)
● Cross-lingual transfer learning
○ Multilingual encoder works
○ Use adversarial learning to achieve language invariance
Recap before the break
Task: Named Entity Recognition and Disambiguation | Key technologies: Natural Language Understanding models (LSTM, Transformer)
Task: Relation Extraction | Key technologies: semi-supervised methods (distant supervision, data programming); unsupervised methods (open information extraction)
Task: Scalable Relation Extraction with Limited Data | Key technologies: pretrained deep learning models (BERT and friends); cross-lingual transfer learning (adversarial learning, multilingual encoders)
Tutorial’s
Agenda
09:00 Introduction
09:05 Overview of LinkedIn’s Knowledge Graph and Applications
09:15 Named Entity Recognition
09:35 Populate Relationships between Social Network Entities
10:00 Scalable Relationship Extraction with Limited Data
10:30 Coffee Break
11:00 Scalable Graph Refinement via Multi-channel Data Ingestion
11:45 Automated Taxonomy Expansion
12:00 Conclusion and Q&A
Part 1: Construct high-quality knowledge graph for social networks
Part 2: Weakly-supervised, scalable social network knowledge graph construction
Scalable Graph
Refinement via
Multi-channel Data
Ingestion
Baoxu Shi
LinkedIn
Scalable Graph
Refinement via
Multi-channel Data
Ingestion
● Ingest large-scale, noisy crowd data
● Social network feedback loop
Definition of Graph Refinement
Graph Refinement is a task that aims to infer and add missing knowledge to the graph, or to identify erroneous information.
Example: {Accounting, Data Mining, Algorithms, Director of Engineering, Staff Researcher}
→ Graph Refinement →
{Data Mining, Algorithms, Director of Engineering, Staff Researcher, Machine Learning (Supervised Learning, Reinforcement Learning, ...)}
The erroneous skill "Accounting" is removed; the missing "Machine Learning" subtree is inferred.
How to Refine the Graph?
Method | Quality | Volume | Cost
Experts | High | Low | High
Crowd | Low to Medium | Medium | Medium
Machine Learning | Medium to High | High | Low to Medium
Scalable Graph Refinement
Scalable Graph Refinement aims at refining the graph at scale by ingesting large-scale data from the crowd and from machine learning.
Data for Scalable Graph Refinement
Structured Data, Crowd Labels, Social Network User Activity
Challenges for Data Ingestion
Ingest large-scale, noisy crowdsourced data
● Q1: How to leverage existing, large-scale structured data?
● Q2: How to leverage a large volume of noisy social network text data?
● Q3: How to aggregate accurate labels from crowd workers?
Social network user feedback loop
● Q4: How to validate knowledge via social signals?
● Q5: How to grow social network via constructed knowledge graph?
● Q6: How to improve social network and knowledge graph jointly?
Ingest Structured Data via Entity Alignment
Q1: How to leverage existing, large-scale structured data?
Entity Alignment between knowledge graphs aims to find entities in two graphs that represent the
same real-world entity.
Feature-based entity alignment
Q1: How to leverage existing, large-scale structured data?
The alignment score is determined by the average
string similarity between a node pair and their
neighbor pairs connected via the same edge type.
RDF-AI (Scharffe et al. 2009)
Sim(1, 9)=(StrSim(1,9) + StrSim(2,11) + StrSim(3,12))/3
Requires preprocessing (translation) and schema alignment.
Embedding-based Entity Alignment
Q1: How to leverage existing, large-scale structured data?
ITransE-SA (Zhu et al. 2017) Requires a seed set of aligned entities and schema alignment.
True edge
False edge
Triple loss
Optimize embedding of each graph individually.
Embedding-based Entity Alignment
Q1: How to leverage existing, large-scale structured data?
ITransE-SA (Zhu et al. 2017) Requires a seed set of aligned entities and schema alignment.
Alignment loss
Embedding-based Entity Alignment
Q1: How to leverage existing, large-scale structured data?
(Trisedya et al., 2019) Only requires schema alignment.
Example: Located_In is transitive (a rule), and predicates are unified across the two graphs.
Q1: How to leverage existing, large-scale structured data?
(Trisedya et al., 2019) Only requires schema alignment.
Address inconsistent attribute representations:
f_a("Barack Obama") ≈ f_a("Barack Hussein Obama")
f_a("50.9989") ≈ f_a("50.998888889")
Embedding-based Entity Alignment
Minimize the distance between the structure embedding and the attribute embedding.
Align entities by computing their cosine similarity.
Recap on Ingesting Structured Data
● Use RDF-AI as a proof of concept if all nodes have textual features in the same language.
● If the schemas are aligned and nodes have textual features, use Trisedya 2019.
● If the schemas are aligned and a seed set of aligned entity pairs exists, use ITransE-SA.
Ingest Crowdsourced Labels via Answer Aggregation
Q3: How to aggregate accurate labels from crowd workers?
Answer aggregation for crowdsourcing is a task that finds the hidden ground truth from a set of
answers given by the crowd workers.
Example: "Work with a team of high-performing analytics, data science professionals, and cross-functional teams to identify business opportunities and develop algorithms and methodologies to address them."
Does the above job description sentence require data science skill? Yes or No?
Workers' answers → Aggregation → inferred ground truth
Q3: How to aggregate accurate labels from crowd workers?
(Hung et al. 2013)
Answer Ingestion
● Trapping questions (ground truth): Social Honeypot (Lee et al. 2010), reCAPTCHA (Von Ahn et al. 2008)
Answer Filtering
● Remove workers who failed trapping questions
Answer Aggregation
● Majority vote (Kuncheva et al. 2003)
● Weight answers by worker expertise & question difficulty
○ Trapping question-based (Khattak and Salleb-Aouissi 2011)
○ Supervised EM (binary labels only) (Raykar et al. 2009)
● Snorkel (Ratner et al. 2017)
Generative Answer Aggregation -- Snorkel
Q3: How to aggregate accurate labels from crowd workers?
Label matrix Λ: m instances × n workers; entry Λ_ij is the human-provided label y, or ø if worker j gave no judgement on instance i.
Factor functions over (Λ, Y):
● Coverage: instance i has a label from worker j
● Accuracy: instance i has label y_i from worker j
● Pairwise correlation: workers j and k give the same label
Model: p_w(Λ, Y) = Z_w^{-1} exp(w^T φ(Λ, Y)), where φ is the concatenation of the three factor vectors and Z_w is the normalizing constant.
The true label Y is unknown: use contrastive divergence to solve for w without ground-truth labels, then output the probabilistic training label p_w(Y | Λ).
Recap on Ingesting Crowdsourced Labels
● Always use trapping questions to filter out low-quality answers/workers.
● Use majority vote as the baseline to aggregate answers.
● To further improve the answer quality, use Snorkel to aggregate the labels.
Knowledge Validation via Social Signals
Q4: How to validate knowledge via social signals?
Knowledge validation via social signals is a task that aims at validating factual knowledge graph
information by collecting signals from end-users directly.
Name Quality Cost Scale Setting
Crowd Workers Mid Mid to High Small to Mid Usually single task
Social Signals Mid to High Low Large Multi-task
Examples: Google Maps' venue questions; LinkedIn's skill validation
Social Signal Knowledge Validation in Google Maps
Q4: How to validate knowledge via social signals?
(Kobren et al. 2019)
Use user votes to construct a knowledge base for locations.
Collect social signals for each (location l, attribute a) pair: the count of yes votes over the count of all votes gives the yes-vote rate.
Model the members' voting behavior as a Beta distribution, which yields the expected yes rate and the certainty of that expected yes rate.
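A minimal sketch of the Beta-distribution vote model (the uniform prior here is an assumption; Kobren et al. learn theirs):

```python
# Beta posterior over the yes rate of a (location, attribute) pair:
# the mean is the expected yes rate, the variance its (un)certainty.
yes, total = 8, 10                      # count of yes votes, count of votes
alpha0, beta0 = 1.0, 1.0                # assumed uniform Beta(1, 1) prior
alpha, beta = alpha0 + yes, beta0 + (total - yes)

mean = alpha / (alpha + beta)                                 # expected yes rate
var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
print(f"expected yes rate {mean:.2f}, std {var ** 0.5:.2f}")
# More votes shrink the variance: the same 80% rate from 80/100 votes
# would give a much tighter posterior.
```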
Social Signal Knowledge Validation in Google Maps
Q4: How to validate knowledge via social signals?
(Kobren et al. 2019)
Inputs: location features (natural language text) and the aggregated votes on other attributes (raw counts, majority vote, etc.).
The model generates a location embedding w.r.t. the attribute and outputs the expected yes rate and the certainty of the expected yes rate (which determines the false-positive rate).
LinkedIn’s Social Skill Validation
Q4: How to validate knowledge via social signals?
(Yan et al. 2019)
Use member actions to learn the skill expertise of our members.
LinkedIn Skill Endorsement Product
Yes/yes question: users can act without judgement.
No anonymity: users endorse as a social gesture.
LinkedIn’s Social Skill Validation
Q4: How to validate knowledge via social signals?
(Yan et al. 2019)
Use member actions to learn the skill expertise of our members.
Compare a connection's skills within a certain category.
Normalize the score given by the user to correct for social gestures.
LinkedIn’s Social Skill Validation
Q4: How to validate knowledge via social signals?
(Yan et al. 2019)
Use member actions to learn the skill expertise of our members.
Ask the viewer to rank the skill level of candidates (candidate, skill, viewer).
An ML model provides the candidates.
LinkedIn’s Social Skill Validation
Q4: How to validate knowledge via social signals?
(Yan et al. 2019)
Use member actions to learn the skill expertise of our members.
Multi-task Model (member, skill, expertise score)
Recap on knowledge validation via social signals
● Social signals still require answer aggregation.
● The design of social signal collection is crucial for data quality.
Knowledge Graph guided Social Link Prediction
Q5: How to grow social network via constructed knowledge graph?
Given a social network and a knowledge graph, knowledge graph guided social link prediction aims to predict member-member connections using the knowledge graph.
Social Network
Knowledge Graph
Knowledge
Graph guided
Social Link
Prediction
Matrix Factorization for Social Link Prediction
(Menon and Elkan, 2011)
Factorize the n × n member adjacency matrix ("is i connected to j?") into node embeddings and node biases, with regularization.
Purely based on topological information.
Q5: How to grow social network via constructed knowledge graph?
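A minimal numpy sketch in the spirit of Menon and Elkan's model, with score(i, j) = u_i·u_j + b_i + b_j, fit with a logistic loss on observed edges and sampled non-edges (toy graph and simplified training, not the paper's exact formulation):

```python
import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (0, 2)]            # observed connections
non_edges = [(0, 3), (1, 3)]                        # sampled negatives
n, d, lr, reg = 4, 8, 0.05, 0.01
rng = np.random.default_rng(0)
U = 0.1 * rng.normal(size=(n, d))                   # node embeddings
b = np.zeros(n)                                     # node biases

def score(i, j):
    return U[i] @ U[j] + b[i] + b[j]

for _ in range(500):                                # logistic-loss SGD
    for (i, j), y in [(e, 1.0) for e in edges] + [(e, 0.0) for e in non_edges]:
        p = 1.0 / (1.0 + np.exp(-score(i, j)))
        g = p - y                                   # gradient of the loss
        U[i], U[j] = U[i] - lr * (g * U[j] + reg * U[i]), \
                     U[j] - lr * (g * U[i] + reg * U[j])
        b[i] -= lr * g; b[j] -= lr * g

print(round(score(0, 1), 2), ">", round(score(0, 3), 2))  # edges score higher
```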
Social Link Prediction using Member Attributes
(Zhang et al. 2018)
Input: the one-hot attribute vector of node v. The model f(x) reconstructs the weighted average of the neighbors' attributes and predicts the skip-gram neighbors' structural embeddings, learning attribute embeddings implicitly.
Q5: How to grow social network via constructed knowledge graph?
Social Link Prediction using Member Attributes
(Meng et al., 2019)
Inputs: the attribute-graph adjacency matrix, one-hot attribute vectors, and one-hot node vectors. The model reconstructs both the member-member graph and the member-attribute graph.
Attribute embeddings are a function of members, and hence serve social link prediction only.
Q5: How to grow social network via constructed knowledge graph?
Joint Representation Learning on Social and
Knowledge Graph
Q6: How to improve social network and knowledge graph jointly?
Given a social network and a knowledge graph, we want to learn node representations to refine both graphs jointly.
An example of social network + knowledge graph.
Ambiguous Social Connections: person-person connections are ambiguous (e.g., colleague vs. candidate-recruiter).
An illustration of LinkedIn's Heterogeneous Social Network
Q6: How to improve social network and knowledge graph jointly?
Joint Representation Learning on Social and
Knowledge Graph
(Shi et al., 2019)
Corrupted Higher-order Proximity: Cannot learn meaningful entity embeddings.
An illustration of LinkedIn’s Heterogeneous Social Network
Example: nodes linked by a candidate-recruiter edge are not similar, because the candidate-recruiter relationship does not indicate occupation similarity.
Q6: How to improve social network and knowledge graph jointly?
Joint Representation Learning on Social and
Knowledge Graph
(Shi et al., 2019)
Joint Representation Learning on Social and
Knowledge Graph
The learned embeddings can be used to predict connections between nodes of two arbitrary types.
(Shi et al., 2019)
Q6: How to improve social network and knowledge graph jointly?
Methods to refine a graph in a scalable way
● Use graph alignment to ingest large volumes of knowledge from external knowledge graphs.
● Use Snorkel to aggregate and denoise crowdsourced labeled data for graph refinement.
● Design social feedback loops to collect social signals, aggregate them and refine the graph.
● Use representation learning to refine knowledge graph and social network.
Tutorial’s
Agenda
09:00 Introduction
09:05 Overview of LinkedIn’s Knowledge Graph and Applications
09:15 Named Entity Recognition
09:35 Populate Relationships between Social Network Entities
10:00 Scalable Relationship Extraction with Limited Data
10:30 Coffee Break
11:00 Scalable Graph Refinement via Multi-channel Data Ingestion
11:45 Automated Taxonomy Expansion
12:00 Conclusion and Q&A
Part 1: Construct high-quality knowledge graph for social networks
Part 2: Weakly-supervised, scalable social network knowledge graph construction
Automatic
Taxonomy Expansion
Baoxu Shi,
LinkedIn
Taxonomy Examples
“Taxonomy is the practice and science of classification of things or
concepts, including the principles that underlie such classification”
Ciaramita, Massimiliano, et al. "Hierarchical preferences in a broad-coverage lexical taxonomy." Proceedings of the Annual Meeting of the Cognitive Science Society. Vol. 27. No. 27. 2005.
Wheeler, David L., et al. "Database resources of the national center for biotechnology information." Nucleic acids research 36.suppl_1 (2007): D13-D21.
WordNet NCBI Taxonomy Common Tree
Taxonomy Examples
“Taxonomy is the practice and science of classification of things or
concepts, including the principles that underlie such classification”
SOC, URL: https://www.bls.gov/soc/
Library card catalog picture credit: https://www.smithsonianmag.com/smart-news/card-catalog-dead-180956823/
US Bureau of Labor Statistics - SOC Library Card Catalog
Is Taxonomy a Knowledge Graph?
Knowledge Graph describes relationships between real-world entities, with no hierarchical information, e.g., {Machine Learning, Data Mining, Algorithms, Director of Engineering, Staff Researcher} linked by flat relations.
Taxonomy describes the classification of real-world entities and concepts, with hierarchical information, e.g., Analytics / Software Development / Computer Science (general) → Data Mining / Machine Learning / Algorithms → Scikit-Learn (specific).
Many taxonomies are constructed manually
Name | Creator/Organizer | Domain | Scale | Method
O*NET | US Department of Labor | Occupation | 1,167 | Manual
SOC | US Bureau of Labor Statistics | Occupation | 867 | Manual
WordNet | Princeton University | Nouns & Verbs | 155,327 | Manual
Global WordNet | VU University Amsterdam | Various | Various | Transfer & merge
NCBI Taxonomy | National Center for Biotechnology Information | Biology | 657,846 | Manual
LCC | Library of Congress | Library | 227 | Manual
Updating the O*NET-SOC Taxonomy URL: https://www.onetcenter.org/dl_files/UpdatingTaxonomy_Summary.pdf
Federal Register Notice, URL: https://www.bls.gov/soc/2018/soc2018final.pdf
Miller, George A. WordNet: An electronic lexical database. MIT press, 1998.
Fellbaum, Christiane. "A semantic network of english: the mother of all WordNets." EuroWordNet: A multilingual database with lexical semantic networks. Springer, Dordrecht, 1998. 137-148.
Bo Svensén. 2009. A Handbook of Lexicography. The Theory and Practice of Dictionary-Making. Cambridge University Press.
The NCBI Taxonomy database, URL: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3245000/
Library of Congress Classification Outline, URL: http://www.loc.gov/aba/cataloging/classification/lcco/
The majority of taxonomies are constructed manually by domain experts.
Updating taxonomy manually is time-consuming
On average, the update rate of O*NET is 1.6 occupations per day.
O*NET Occupation Update Summary, URL: https://www.onetcenter.org/dataUpdates.html
Automatic Taxonomy Construction (ATC)
Problem Definition: Given text corpus and/or auxiliary data, construct a directed taxonomy
graph G=(V,E), where V is a set of taxonomy entities, and E is a set of directed edges (u -> v).
Text Corpus and/or Auxiliary Data → ATC Model → Induced Taxonomy
Challenges of Automatic Taxonomy Construction
● Q1: How to ensure high precision of the constructed taxonomy?
● Q2: How to ensure high recall of the constructed taxonomy?
● Q3: How to reduce the need for large volumes of in-domain corpora?
Hearst Patterns model
Roller, Stephen, Douwe Kiela, and Maximilian Nickel. "Hearst Patterns Revisited: Automatic Hypernym Detection from Large Text Corpora." Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2018.
Hearst, Marti A. "Automatic acquisition of hyponyms from large text corpora." Proceedings of the 14th conference on Computational linguistics-Volume 2. Association for Computational Linguistics, 1992.
Text Corpus + manually generated patterns → Pattern Matching → extracted X-isA-Y relationships:
Hypernym(wound, injury)
Hypernym(broken bone, injury)
Hypernym(treasury, civic building)
Hypernym(England, common-law country)
...
(Hearst, 1992&1998)
Q1: How to ensure high precision of the constructed taxonomy?
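A minimal sketch of one classic Hearst pattern, "Y such as X (and X2)", applied with a regular expression on toy sentences; production systems use many patterns over parsed text:

```python
import re

# Toy sentences where the "such as" pattern applies cleanly.
corpus = [
    "He was treated for injuries such as wounds and broken bones.",
    "Common-law countries such as England and Wales.",
]
pat = re.compile(r"([\w\-]+) such as (\w+(?: \w+)?)(?: and (\w+(?: \w+)?))?[\.,]")

for sentence in corpus:
    m = pat.search(sentence)
    if m:
        hypernym = m.group(1)                  # crude: single head noun
        for hyponym in filter(None, m.groups()[1:]):
            print(f"Hypernym({hyponym}, {hypernym})")
# Hypernym(wounds, injuries), Hypernym(broken bones, injuries),
# Hypernym(England, countries), Hypernym(Wales, countries)
```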
Discover Hearst Patterns
Roller, Stephen, Douwe Kiela, and Maximilian Nickel. "Hearst Patterns Revisited: Automatic Hypernym Detection from Large Text Corpora." Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2018.
Hearst, Marti A. "Automatic acquisition of hyponyms from large text corpora." Proceedings of the 14th conference on Computational linguistics-Volume 2. Association for Computational Linguistics, 1992.
(Hearst, 1992&1998)
Q1: How to ensure high precision of the constructed taxonomy?
Four steps to manually create Hearst patterns (Hearst, 1992):
1. Decide on a lexical relation (e.g., is-A).
2. Gather a list of term pairs for which the relation is known to hold (e.g., [Java, Programming Language]).
3. Collect sentences where both terms from a term pair appear (e.g., "Java is a general-purpose programming language").
4. Find common patterns that indicate the relation of interest (e.g., X is a (adj.) Y).
Four steps to automatically create Hearst patterns (Snow et al., 2004):
1. Collect noun pairs from corpora, identifying hypernym pairs using WordNet.
2. For each noun pair, collect sentences in which both nouns occur.
3. Parse the sentences and extract patterns from the parse trees.
4. Train a hypernym classifier based on these features.
Limitation of Pattern-based models
● Rule-based methods have low recall.
● Performance relies on the completeness of the patterns.
● Creating patterns is time-consuming.
● Can only extract relationships between co-occurring entities.
Distributional models can identify relationships between unobserved entity pairs.
Distributional Methods
Q2: How to ensure high recall of the constructed taxonomy?
(Cederberg and Widdows, 2003)
Build a co-occurrence count matrix (rows: all phrases in the corpus; columns: top-1000 non-stop words), apply a singular value decomposition (U Σ V*), and take the reduced rows as word vectors h_x.
Score a pair (x, y) by cosine(h_x, h_y).
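A minimal numpy sketch of this LSA-style scoring: reduce a toy co-occurrence matrix with SVD and compare words by cosine similarity:

```python
import numpy as np

vocab = ["wound", "injury", "england", "treat", "country", "law"]
# Toy co-occurrence counts (rows: words, columns: context words).
C = np.array([[0, 4, 0, 3, 0, 0],
              [4, 0, 0, 5, 0, 0],
              [0, 0, 0, 0, 4, 3],
              [3, 5, 0, 0, 0, 0],
              [0, 0, 4, 0, 0, 2],
              [0, 0, 3, 0, 2, 0]], dtype=float)

U, S, Vt = np.linalg.svd(C)
H = U[:, :2] * S[:2]                    # rank-2 word vectors h_x

def cos(a, b):
    return H[a] @ H[b] / (np.linalg.norm(H[a]) * np.linalg.norm(H[b]))

idx = vocab.index
print(cos(idx("wound"), idx("injury")))    # higher: similar contexts
print(cos(idx("wound"), idx("england")))   # lower: unrelated contexts
```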
Hearst + Distributional Methods
Cederberg, Scott, and Dominic Widdows. "Using LSA and noun coordination information to improve the precision and recall of automatic hyponymy extraction." HLT-NAACL, 2003.
Roller, Stephen, Douwe Kiela, and Maximilian Nickel. "Hearst Patterns Revisited: Automatic Hypernym Detection from Large Text Corpora." ACL. 2018.
Q2: How to ensure high recall of the constructed taxonomy?
Score hypernym(x, y) by:
● p(x, y) = count of extracted hypernym(x, y) / total extractions
● Positive Pointwise Mutual Information (Roller et al., 2018)
Then apply a truncated singular value decomposition (U Σ_r V*) to the phrase-by-phrase PPMI matrix of Hearst-pattern extractions, and score a pair (x, y) by spmi(x, y) = u_x^T Σ_r v_y.
Further extensions:
● LSTM + Pattern + Word Embedding (Shwartz et al., 2016)
● + TF-IDF features, + Reinforcement Learning (Mao et al., 2018)
Limitation of Distributional methods
Roller, Stephen, Douwe Kiela, and Maximilian Nickel. "Hearst Patterns Revisited: Automatic Hypernym Detection from Large Text Corpora." Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2018.
● Require a large amount of in-domain text corpora.
● Can only extract relationships between entities that exist in the text corpora.
● Lexical memorization: memorizing that certain words correlate with certain labels (Levy et al. 2015)
Instead of requiring in-domain text corpora, one can induce a taxonomy from an existing taxonomy.
Q2: How to ensure high recall of the constructed taxonomy?
Keyword + General Purpose Taxonomy
Q3: How to reduce the need of large volume of in-domain text corpora?
(Liu et al., 2011)
Pipeline: domain-specific keywords (e.g., "Indiana cheap car insurance") → parent-concept probabilities from a general-purpose taxonomy + bag-of-words vectors from the search keywords' context → hierarchical clustering → domain-specific taxonomy.
Refine Taxonomy via Hyperbolic Embeddings
Q3: How to reduce the need of large volume of in-domain text corpora?
(Nickel and Kiela, 2017; Le et al., 2019)
Pipeline: existing taxonomy graph in the format of (u, v) edges → hyperbolic embedding model → learned taxonomy.
Hyperbolic embedding models infer a taxonomy from existing graphs instead of text corpora.
Steps to create a taxonomy
Roller, Stephen, Douwe Kiela, and Maximilian Nickel. "Hearst Patterns Revisited: Automatic Hypernym Detection from Large Text Corpora." Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2018.
● Collect a large amount of in-domain data
● Use Hearst rules to extract a high-precision taxonomy (Hearst 1998, Snow 2004)
● Use distributional methods to improve recall (Roller 2018, Mao 2018)
● Use hyperbolic embeddings to refine the taxonomy structure (Le 2019)
● Extend your taxonomy using new entities / keywords and a general-purpose taxonomy (Liu 2011)
Tutorial’s
Agenda
09:00 Introduction
09:05 Overview of LinkedIn’s Knowledge Graph and Applications
09:15 Named Entity Recognition
09:35 Populate Relationships between Social Network Entities
10:00 Scalable Relationship Extraction with Limited Data
10:30 Coffee Break
11:00 Scalable Graph Refinement via Multi-channel Data Ingestion
11:45 Automated Taxonomy Expansion
12:00 Conclusion and Q&A
Part 1: Construct high-quality knowledge graph for social networks
Part 2: Weakly-supervised, scalable social network knowledge graph construction
Questions?
Thank you
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 

Kürzlich hochgeladen (20)

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 

Constructing Knowledge Graph for Social Networks in a Deep and Holistic Way

  • 9. Tutorial’s Agenda 09:00 Introduction 11:45 Automated Taxonomy Expansion 09:05 Overview of LinkedIn’s Knowledge Graph and Applications 09:15 Named Entity Recognition 09:35 Populate Relationships between Social Network Entities 10:00 Scalable Relationship Extraction with Limited Data 10:30 Coffee Break 11:00 Scalable Graph Refinement via Multi-channel Data Ingestion 12:00 Conclusion and Q&A Part 1: Construct high-quality knowledge graph for social networks Part 2: Weakly-supervised, scalable social network knowledge graph construction
  • 10. Overview of LinkedIn’s Knowledge Graph and Applications Qi He Director of Engineering, LinkedIn
  • 11. LinkedIn Knowledge Graph (aka Economic Graph) Member input data 675M members 50M+ orgs 400+ industries Certificates, degrees & more… 200+ countries 50K+ skills 25K+ titles Roles, occupations States, cities, postal codes… Tools, products, technologies… Specialty
  • 12. LinkedIn skill taxonomy example LinkedIn unique asset Skill identity Skill type Relationships ID:207 Definition: http://en.wikipedia.org/wiki/Graphic_design Canonical name: EN: Graphic Design Zh_CN: 平面设计 ... Aliases: "fr_FR": [ "concepteurs graphiques", "editorial design"], "en_US": ….. Skill type: hard → industry experience Soft skills Hard skills Tools & Technologies Spoken languages Industry experience Design Graphic Design Adobe Photoshop
  • 13. Skill or not a Skill? “fear of flying” (Skills: phobias; self-esteem; stress management; hypnotherapy. Titles: hypnotherapist; psychotherapist). “intelligence” (Skills: national security; military operations; security clearance. Titles: Military Intelligence Officer, Tactical Intelligence Officer). “headaches” (Skills: holistic health; sports injuries; neck pain; nutrition. Titles: chiropractor; massage therapist; acupuncturist).
  • 14. Understanding Member skills Exclude references to the company Include skills that relate to the member with a high confidence score Exclude skills with a lower confidence score
  • 15. To power all LinkedIn products Knowledge Graph Jobs Search Job Recommendation Recruiter Search Talent Insights Jobs SEO Profile Page People Search SEO New-member onboarding PYMK Premium ProFinder EGR Notifications GSO Ads Pages Sales Navigator LSI Merlin Courses
  • 16. Unlock the full potential of the LinkedIn Economic Graph
  • 17. Enable positive flywheel effect in LinkedIn ecosystem Input signals Graph construction Deliver value Engagement
  • 18. Tutorial’s Agenda 09:00 Introduction 11:45 Automated Taxonomy Expansion 09:05 Overview of LinkedIn’s Knowledge Graph and Applications 09:15 Named Entity Recognition 09:35 Populate Relationships between Social Network Entities 10:00 Scalable Relationship Extraction with Limited Data 10:30 Coffee Break 11:00 Scalable Graph Refinement via Multi-channel Data Ingestion 12:00 Conclusion and Q&A Part 1: Construct high-quality knowledge graph for social networks Part 2: Weakly-supervised, scalable social network knowledge graph construction
  • 20. ● Set of triples (Source entity, Relation, Target entity) ○ (Bill Gates, Founder, Microsoft) ○ (Microsoft, Located_In, Redmond) ○ ... ● Canonical representation of relations among entities ● Examples: ○ Google Knowledge Graph ○ Microsoft Satori ○ Freebase ○ LinkedIn Economic Graph Knowledge Graph Microsoft Bill Gates Redmond Founder Located_In
  • 21. Knowledge Graph Construction Tesla is an Electric vehicle company based in Palo Alto. Tesla Inc. Electric Vehicle Palo Alto, CA Specialized_In Located_In
  • 22. Task 1: Named Entity Recognition Tesla is an Electric vehicle company located in Palo Alto. Tesla Inc. Electric Vehicle Palo Alto, CA
  • 23. Task 2: Relation Extraction [Next Section] Tesla is an Electric vehicle company located in Palo Alto. Tesla Inc. Electric Vehicle Palo Alto, CA Specialized_In Located_In
  • 24. Named Entity Recognition: Challenges 1. Name Variation a. Tesla, Tesla Motors, Tesla Inc. … -> Tesla Inc. 2. Ambiguity a. “Apple” -> Apple Corps VS Apple Inc. 3. Incomplete Entity Dictionary [Later Section on Taxonomy Creation] 4. Multiple Languages [Later Section]
  • 25. Web-Scale Entity Recognition [Cucerzan 2007] Preprocessing Entity Recognition (Entity Tagging) Entity Disambiguation Tesla is an Electric vehicle company located in Palo Alto. Tesla is an Electric vehicle company located in Palo Alto. Tesla Forecast Tesla Inc.
  • 26. Two Step Approach Entity Recognition Entity Disambiguation Tesla is an Electric vehicle company located in Palo Alto. Tesla is an Electric vehicle company located in Palo Alto. Tesla is an Electric vehicle company located in Palo Alto. Tesla Forecast Tesla Inc.
  • 27. Entity Recognition 1. Encoder: Generating features 2. Decoder: Doing classification a. Classification results: i. B-Com, B-Ind, B-Loc: Beginning of company, industry… ii. I-Com, I-Ind, I-Loc: Inside of company, industry, … iii. O: Outside (nothing important) Tesla is an Electric vehicle company located in Palo Alto. [0.1, 0.3. -0.1, ….] Tesla [B-Com] is an Electric [B-Ind] vehicle [I-Ind] company located in Palo [B-Loc] Alto [I-Loc]. Encoding Decoding
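To make the decoder output concrete, here is a minimal sketch (illustrative, not the presenters' production code) of how the B/I/O tags above are grouped back into typed entity spans; the bio_to_spans helper and its tag names are ours.

```python
# A toy BIO-decoding helper: group per-token tags into typed entity spans.
def bio_to_spans(tokens, tags):
    """Convert parallel token/BIO-tag lists into (entity_text, type) pairs."""
    spans, current, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):                 # a new entity begins
            if current:
                spans.append((" ".join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current:   # continue the open entity
            current.append(token)
        else:                                    # "O" closes any open entity
            if current:
                spans.append((" ".join(current), current_type))
            current, current_type = [], None
    if current:
        spans.append((" ".join(current), current_type))
    return spans

tokens = "Tesla is an Electric vehicle company located in Palo Alto .".split()
tags = ["B-Com", "O", "O", "B-Ind", "I-Ind", "O", "O", "O", "B-Loc", "I-Loc", "O"]
print(bio_to_spans(tokens, tags))
# [('Tesla', 'Com'), ('Electric vehicle', 'Ind'), ('Palo Alto', 'Loc')]
```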
  • 28. Encoder (Feature Generation) ● Traditional features ○ Bag-of-words: TF-IDF, BM25, ... ● Recent methods: Deep learning embedding ○ Model learns how to generate feature vector (embedding) ○ Word-level embedding ○ Sequence-of-word embedding ○ Sequence-of-character embedding ● IMPORTANT: These encoders are used in later sections as well!
  • 29. Encoder: Word-level feature ● Word embedding: Learn a latent vector (embedding) wv for each word v ● For each word in the input text, use its latent vector as an input feature ● How to learn embedding? ○ Based on which words co-occur under the same context ○ context: k-gram window The clouds are in the sky
  • 30. Famous word embeddings [Mikolov et al. 2013] [Pennington et al. 2014] ● Glove: embeddings approximate the number of co-occurrences ● Word2vec: embeddings approximate the probability of co-occurring ● How to choose? ○ Glove is a little bit simpler (e.g., no negative examples), but they are very similar ○ If you use public embedding, pick one with the best coverage ○ If you train on your own, either one would work ● Limitation of word embedding: ○ Does not consider ordering of words ○ Does not generalize to new words
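As a minimal sketch of training such embeddings (assuming the gensim package, version 4.x; the toy corpus is ours), skip-gram word2vec fits in a few lines; in practice you would train on a much larger corpus, or load public pretrained vectors with good coverage:

```python
from gensim.models import Word2Vec

corpus = [
    "the clouds are in the sky".split(),
    "i grew up in france and i speak fluent french".split(),
]
# sg=1 selects skip-gram; window is the k-gram context described above
model = Word2Vec(corpus, vector_size=50, window=3, min_count=1, sg=1, epochs=50)
print(model.wv["sky"][:5])           # the learned latent vector for "sky"
print(model.wv.most_similar("sky"))  # nearest neighbors in embedding space
```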
  • 31. Encoder: Sequence of Words https://colah.github.io/posts/2015-08-Understanding-LSTMs/ The clouds are in the sky I grew up in France … I speak fluent French RNN LSTM
  • 32. Encoder: Sequence of words https://colah.github.io/posts/2015-08-Understanding-LSTMs/
  • 33. Encoder: Sequence of Characters Akbik et al. 2018
  • 34. Encoder: Attention-based Bahdanau et al. 2015 ● LSTM: Hidden state (encoding) depends mainly on the previous token ● Attention: Hidden state is computed using all tokens ○ Attention: Weights for each token ○ Worked really well in Machine translation
  • 35. Encoder: Transformer http://mlexplained.com/2017/12/29/attention-is-all-you-need-explained/ ● Compute attention by Query, Key, Value ○ Query: what information are you looking for? ○ Key: how important is each token? ○ Value: the content of each token ● Encoding = sum (attention[i] * value[i] for each token i) ● Attention[i] ~ query * key[i]
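The query/key/value computation above fits in a few lines of numpy. This is a minimal single-head sketch of scaled dot-product attention (the 1/sqrt(d_k) scaling follows the Transformer paper), not a full Transformer layer:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q: (n, d_k), K: (m, d_k), V: (m, d_v) -> (n, d_v) encodings."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # attention[i] ~ query * key[i]
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over tokens
    return weights @ V                               # sum(attention[i] * value[i])

rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 8))                     # 5 tokens, dimension 8
# self-attention: queries, keys and values all come from the same tokens
print(scaled_dot_product_attention(tokens, tokens, tokens).shape)  # (5, 8)
```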
  • 37. Encoder: Transformer ● Multi-head attention? ○ Use multiple keys, values, queries ○ Helps understanding different relations among tokens ■ I am taller than Jim (comparison) ■ He is none other than Bill Gates (distinguishing)
  • 38. ● Labels must form a valid sequence ○ Beginning must happen before inside ○ If predict each token independently, nonsense may occur e.g.: Electric [I-Ind] vehicle [B-Ind] ● Use CRF to predict the entire sequence Classification: CRF http://www.davidsbatista.net/blog/2017/11/13/Conditional_Random_Fields/ Tesla is Electric vehicle company
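A minimal sketch of why sequence-level decoding helps (toy scores of ours, not a trained CRF): with transition scores that forbid invalid moves such as O -> I-Ind, Viterbi decoding repairs a per-token argmax that would emit a nonsense tag sequence.

```python
import numpy as np

tags = ["O", "B-Ind", "I-Ind"]
NEG = -1e9  # effectively minus infinity: forbidden

# transition[i][j] = score of moving from tag i to tag j
transition = np.zeros((3, 3))
transition[0, 2] = NEG                  # O -> I-Ind is invalid
start = np.array([0.0, 0.0, NEG])       # a sequence cannot begin with I-Ind

def viterbi(emissions, start, transition):
    """emissions: (seq_len, n_tags) per-token scores from the encoder."""
    n, k = emissions.shape
    score = start + emissions[0]
    back = np.zeros((n, k), dtype=int)
    for t in range(1, n):
        total = score[:, None] + transition + emissions[t][None, :]
        back[t] = total.argmax(axis=0)   # best previous tag for each tag
        score = total.max(axis=0)
    path = [int(score.argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return [tags[i] for i in reversed(path)]

# Independent per-token argmax here yields the nonsense ["I-Ind", "I-Ind"];
# joint decoding repairs it to a valid sequence.
emissions = np.array([[0.1, 0.5, 0.6],   # "Electric"
                      [0.1, 0.4, 0.5]])  # "vehicle"
print(viterbi(emissions, start, transition))  # ['B-Ind', 'I-Ind']
```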
  • 39. [Lample et al. 2016] Putting it All Together: LSTM + CRF ● Variations ○ Word-LSTM + Char-LSTM + CRF [Akbik 2018] ○ Pretrained Transformer (BERT) + CRF
  • 40. Named Entity Recognition: Our Experiences ● Having a good encoder (good features) is most important ○ Deep learning models are very powerful, but getting enough training data can be tricky ■ Later today, we discuss how to address this by pretraining ○ If available, domain-specific features are still very useful ■ e.g., If we have a list of famous companies, this can be used to generate features ● CRF is easy to add and boosts performance incrementally
  • 41. Entity Disambiguation ● Problem Definition ○ Input: Text span ○ Output: Entity ID Tesla is an Electric vehicle company located in Palo Alto. Tesla Forecast Tesla Inc.
  • 42. Entity Disambiguation ● Feature generation (Encoder) ○ Encoder for text span: Text encoders (LSTM, Transformer, …) ○ Encoder for entity-related features ■ Text features (entity description): Text encoders ■ Graph features: Graph encoders [Later] ■ Numerical features (frequency statistics): No encoder needed ● Making prediction (Decoder) ○ Multiclass classification
  • 43. Conclusion ● Two Problems: ○ Entity Recognition: Identify text spans ○ Entity disambiguation: If there are multiple matching entities, find the best match ● Modeling architecture: Encoder and Decoder ○ Encoder: Generate features ○ Decoder: Make classification using the features ● We will use same encoders in later sections
  • 44. Tutorial’s Agenda 09:00 Introduction 11:45 Automated Taxonomy Expansion 09:05 Overview of LinkedIn’s Knowledge Graph and Applications 09:15 Named Entity Recognition 09:35 Populate Relationships between Social Network Entities 10:00 Scalable Relationship Extraction with Limited Data 10:30 Coffee Break 11:00 Scalable Graph Refinement via Multi-channel Data Ingestion 12:00 Conclusion and Q&A Part 1: Construct high-quality knowledge graph for social networks Part 2: Weakly-supervised, scalable social network knowledge graph construction
  • 46. Relation Extraction Tesla is an Electric vehicle company based in Palo Alto. Tesla Inc. Electric Vehicle Palo Alto, CA Specialized_In Located_In ● Input: (Source entity, Target entity, Sentences) ● Output: Relation
  • 47. Relation Extraction: Challenges ● Challenges ○ Linguistic variation ■ {“Based in”, “Headquartered in”, …} -> Located_In ○ Ambiguity / Implicitness ■ “Electric Vehicle company” -> Specialized_In
  • 48. Relation Extraction: Machine Learning Methods ● Supervised method: ○ Learn classifier from training data ○ Features ■ Text features using text encoders (Transformer, LSTM, …) [Previous section] ■ Graph features [Later section] ○ Similar to entity disambiguation from the previous section ● Semi-supervised method [This section] ● Unsupervised method [This section]
  • 49. Distant Supervision ● Downside of the supervised method: labels are sparse and expensive to get ● Solution: Leverage another database (e.g., Freebase) to label the text corpus ○ Mintz et al. 2009 “If two entities belong to a certain relation, any sentence containing those two entities is likely to express that relation”
  • 50. Distant Supervision [Mintz et al. 2009] Microsoft Redmond Microsoft is based in Redmond Located_In Microsoft is headquartered in Redmond Microsoft has its main campus in Redmond Positive Example Source (A): Microsoft Target (B): Redmond Sentences: (A is based in B, A is headquartered in B, … ) Relation: Located_In
  • 51. Distant Supervision [Mintz et al. 2009] Microsoft Larry Page Larry Page said about Microsoft Larry Page commented on Microsoft Negative Example Source (A): Larry Page Target (B): Microsoft Sentences: (A said about B, A commented on B, … ) Relation: Nothing
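The heuristic is easy to sketch. In this toy version (the variable names and one-triple KB are ours), every sentence mentioning both entities of a known triple becomes a weak positive example, and entity pairs absent from the KB yield no label:

```python
# A minimal sketch of distant supervision: any sentence that mentions both
# entities of a known KB triple is weakly labeled with that relation.
kb = {("Microsoft", "Redmond"): "Located_In"}  # toy knowledge base

sentences = [
    "Microsoft is headquartered in Redmond",
    "Microsoft has its main campus in Redmond",
    "Larry Page commented on Microsoft",
]

def distant_labels(sentences, kb):
    examples = []
    for sent in sentences:
        for (src, tgt), relation in kb.items():
            if src in sent and tgt in sent:
                examples.append((src, tgt, sent, relation))
    return examples

for example in distant_labels(sentences, kb):
    print(example)
# Both Redmond sentences become positive Located_In examples; the Larry Page
# sentence matches no triple, so it contributes no (noisy) label.
```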
  • 52. Data Programming Snorkel [Ratner et al. 2018] Distant Supervision Aggregator Rule-based Annotation Crowdsourcing Training Data Labeling functions ● There are other ways to get weak labels ● Can we combine weak labels to get better labels?
  • 53. Data Programming ● How to aggregate? If weak labels are 1, 0, 1: ○ Majority voting: label = 1 ○ Generative Model (GM): ■ Label vector 𝛬: [1, 0, 1], True label Y: Unknown ■ Assume (𝛬, Y) is generated with pw (𝛬, Y) ■ Learn w ■ Compute pw (Y|𝛬) using pw (𝛬, Y)
  • 54. Data Programming: Takeaways ● The generative model is useful when there are ~10 labels per example ● In fact, we found that weighted voting (instead of plain majority voting) works pretty well ● The key part is to find reasonably good weak labels
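A minimal sketch of that weighted voting (the source names and weights are illustrative; in practice a weight might be each source's estimated accuracy):

```python
# Weighted-vote aggregation: each labeling source votes with a weight.
from collections import defaultdict

def weighted_vote(weak_labels, weights):
    """weak_labels: list of (source_name, label); weights: source -> weight."""
    totals = defaultdict(float)
    for source, label in weak_labels:
        totals[label] += weights[source]
    return max(totals, key=totals.get)

weights = {"distant_supervision": 0.6, "rule": 0.9, "crowd": 0.7}
votes = [("distant_supervision", 1), ("rule", 0), ("crowd", 1)]
print(weighted_vote(votes, weights))  # 1, since 0.6 + 0.7 outweighs 0.9
```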
  • 55. Open Information Extraction ● Unsupervised Method: Machine learning without labels ● Input: Entities and sentences ● Output: Relation phrase Tesla is based in Palo Alto Tesla is headquartered in Palo Alto Tesla has its main campus in Palo Alto (Tesla, Palo Alto, based in) (Tesla, Palo Alto, headquartered in) (Tesla, Palo Alto, has main campus in)
  • 56. Open Information Extraction: ReVerb [Fader et al. 2011] 1. From a sentence, take the longest phrase satisfying one of three patterns: a. a verb (e.g., invented) b. a verb followed immediately by a preposition (e.g., located in) c. a verb followed by nouns, adjectives, or adverbs + preposition (e.g., has atomic weight of) 2. If that phrase appears too few times, ignore it 3. Apply a binary classifier to compute a confidence score a. Classification: Is the phrase a valid relation phrase?
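A simplified sketch of the ReVerb-style POS pattern (we hand-supply one-letter POS tags N/V/P/A instead of running a real tagger, and compress the three cases into one regex):

```python
import re

# Roughly: one or more verbs, optionally followed by nouns/adjectives
# ending in a preposition (cases a-c above, simplified).
PATTERN = re.compile(r"V+([NA]*P)?")

tokens = ["Tesla", "is", "headquartered", "in", "Palo", "Alto"]
pos    = ["N",     "V",  "V",             "P",  "N",    "N"]

match = PATTERN.search("".join(pos))        # one character per token
phrase = tokens[match.start():match.end()]
print(" ".join(phrase))                     # "is headquartered in"
```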
  • 57. ● Use ML models to extract text spans for a relation and its arguments ○ Same methods as NER models (e.g., BiLSTM + CRF, Transformer + CRF, …) ● RnnOIE [Stanovsky et al. 2018]: BiLSTM tagger Open Information Extraction: Sequence Tagging [Stanovsky et al. 2018 ] Tesla is located in Palo Alto Tesla [Arg 0] is located in [relation] Palo Alto [Arg 1].
  • 58. Conclusion ● Key challenge: Get enough training examples to cover wide linguistic variations ● Semi-supervised methods: Come up with heuristics to get weak labels ● Unsupervised: Extract relation phrases ○ Drawback in industry: To map the phrases to the relations, we need another ML model ● Rule of thumb to choose methods ○ Lots of training examples + Complete relation dictionary: Supervised method ○ Few examples + Complete relation dictionary: Semi-supervised method ○ Very incomplete relation dictionary: Unsupervised method
  • 59. Tutorial’s Agenda 09:00 Introduction 11:45 Automated Taxonomy Expansion 09:05 Overview of LinkedIn’s Knowledge Graph and Applications 09:15 Named Entity Recognition 09:35 Populate Relationships between Social Network Entities 10:00 Scalable Relationship Extraction with Limited Data 10:30 Coffee Break 11:00 Scalable Graph Refinement via Multi-channel Data Ingestion 12:00 Conclusion and Q&A Part 1: Construct high-quality knowledge graph for social networks Part 2: Weakly-supervised, scalable social network knowledge graph construction
  • 61. Extending Knowledge Graph to New Datasets ● Task: Construct a knowledge graph (KG) for new data by leveraging existing KGs and data ● Examples ○ Domain adaptation: Build a KG from a domain-specific text corpus ■ Building a KG specialized for the Healthcare industry ○ i18n: Build a knowledge graph for data in a new language ■ We have a KG for English users. Can we do the same thing for German users?
  • 62. Challenges ● Domain specific annotation is very time consuming ○ Annotators need to have enough knowledge in the domain (or in the language) ○ Annotation tasks need to be clearly designed ○ If either is missing, data quality goes down significantly! ● Deep learning models require lots of data ○ Number of parameters in Transformer encoders: 100s of millions!
  • 63. Solution: Transfer Learning Supervised learning trains Model 1 from Data 1 and Model 2 from Data 2 independently; transfer learning additionally transfers the knowledge learned from Data 1 / Model 1 into Model 2.
  • 64. Transfer Learning ● Cross-domain transfer learning ○ Train a model on general-domain data and then transfer knowledge ● Cross-lingual transfer learning ○ Train a model in English and then transfer knowledge to other language
  • 65. Cross-domain Transfer Learning: Pretrained Model ● Train a deep learning model with a very large text corpus (Wikipedia and so on) ○ Training is done without any labels ○ The model learns general patterns in natural language ● Update the model parameters using a small number of labels ○ Since the model knows natural language well, it needs a smaller number of labels
  • 66. BERT (Bidirectional Transformer) [Devlin et al. 2018] ● Train a Transformer encoder for two prediction tasks ● (1) Masked language modeling (predict masked words given the surrounding words)
  • 67. BERT (Bidirectional Transformer) [Devlin et al. 2018] ● Train a Transformer encoder for two prediction tasks ● (2) Next Sentence Prediction ■ Example: ■ [CLS] The man went to the store [SEP] He bought a gallon of milk [SEP] ■ Label: IsNext ■ [CLS] The man went to the store [SEP] Penguins cannot fly [SEP] ■ Label: NotNext
  • 68. BERT: Fine-tuning [Devlin et al. 2018] ● Fine-tuning: Making incremental updates on model parameters for a given task ○ Sentence pair classification: ■ (Sentence 1, Sentence 2) -> True / False ○ Single sentence classification ■ (Sentence 1) -> True / False ○ Sequence Tagging (Entity Recognition) ■ (Sentence 1) -> Tags for each token
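A minimal sketch of the sequence-tagging setup (assumes the HuggingFace transformers package and network access to fetch weights; the label count is illustrative): load the pretrained encoder, attach a token-classification head, then fine-tune with a standard training loop.

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased", num_labels=7)  # e.g. B/I tags for Com/Ind/Loc, plus O

inputs = tokenizer("Tesla is based in Palo Alto", return_tensors="pt")
outputs = model(**inputs)             # per-token logits, shape (1, seq_len, 7)
print(outputs.logits.shape)
# Fine-tuning then runs an ordinary training loop (e.g. transformers.Trainer)
# over a few thousand labeled sentences, updating all parameters.
```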
  • 69. BERT: Results [Devlin et al. 2018] ● Pre-trained on ~3B words (Wikipedia, Books) ● After fine-tuning, outperformed other methods in 11 benchmark data sets ○ Fine-tuning works with ~3000 examples ○ Without any task-specific feature engineering ● For entity recognition, BERT works even without fine tuning
  • 70. BERT: Implication ● Why does it work? ○ Context comes from both directions ○ Provides different ways to fine-tune the model ○ The model seems to learn syntactic structures [Hewitt and Manning 2019] ○ Language models seem to be correlated with multiple application tasks ● We can train sophisticated deep learning models with thousands of samples!
  • 71. Pre-trained Deep Learning Models [Sanh et al. 2019]
  • 72. BERT: Limitations ● Slow serving ○ Distillation [Sanh et al. 2019] ○ Code optimization (ONNX Runtime) ● Handling 2+ sentences together? ○ XLNet [Yang et al. 2019] ● Nearest neighbor search is hard (modular scoring is impossible) ○ SentenceBERT [Reimers and Gurevych 2019] ● Handling very long text ○ Transformer-XL [Dai et al. 2019]
  • 73. Cross-lingual Transfer Learning ● Assume: We have a training data in English, and developed a ML model (NER or Relation) ● Can we use the data (or the model) for other languages?
  • 74. Multilingual Encoder Tesla is located in Palo Alto. [0.1, 0.3. -0.1, ….] Encoding Decoding Tesla befindet sich in Palo Alto. [0.1, 0.3. -0.1, ….] Encoding Decoding Tesla Palo Alto Located_In Tesla Palo Alto Located_In ● What if the encoder gives the same feature values for sentences with the same meaning in different languages? ○ We can reuse the decoder (classifier) ○ The decoder can be trained with English training data
  • 75. Multilingual Word Embedding [Mikolov et al. 2013] ● Words have similar embedding if they mean the same thing ● How to get this? Apply “Translation matrix W” ○ W can be learned from a parallel word dictionary X, Y ○ W: orthogonal matrix (Procrustes alignment) Example dictionary pairs (X → Y): Four → Cuatro, Five → Cinco, Horse → Caballo, Dog → Perro
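A minimal sketch of the Procrustes step on synthetic data (the closed form W = UVᵀ, where UΣVᵀ is the SVD of YXᵀ, is the standard orthogonal-Procrustes solution):

```python
import numpy as np

def procrustes(X, Y):
    """X, Y: (d, n) paired embeddings. Orthogonal W minimizing ||WX - Y||_F."""
    U, _, Vt = np.linalg.svd(Y @ X.T)
    return U @ Vt

rng = np.random.default_rng(0)
d, n = 50, 2000
X = rng.normal(size=(d, n))                        # source-language embeddings
W_true = np.linalg.qr(rng.normal(size=(d, d)))[0]  # a hidden orthogonal map
Y = W_true @ X                                     # target-language embeddings
W = procrustes(X, Y)
print(np.allclose(W, W_true))                      # True: the map is recovered
```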
  • 76. ● What if you do not have parallel word dictionary (X, Y)? We do these two steps ○ Learn translation matrix W (without word pairs (X, Y)) by adversarial learning ○ Construct (X, Y) by best-matching words between Wx and y Unsupervised Multilingual Embedding
  • 77. ● Adversarial learning: Train two models that compete with each other ○ Wx: “Translated” embedding in a source language, y: Embedding in a target language ○ Discriminator D: Detect which language the example comes from ■ e.g., Wx is from the source language ○ Translation W: Fool the discriminator into detecting the wrong language Unsupervised Multilingual Embedding
  • 78. Multilingual Word Embedding: Our Experience ● Public multilingual embedding may have low coverage on the data set ● Unsupervised word alignment (adversarial learning) works ○ But if you have good parallel word dictionary, Procrustes alignment is easier and better ● Our team’s approach: ○ Step 1: Train embedding on the target data set for each language separately ■ Why? to get good coverage ○ Step 2 (Optional): Run adversarial learning to get parallel word dictionary ○ Step 3: Align by solving Y ~ WX
  • 79. ● Pretraining models that cover multiple languages ○ Multilingual BERT ○ XLM: Masked language from parallel sentences Multilingual Pretrained Model [Lample and Conneau 2019]
  • 80. Adversarial Learning: Sentence Classification [Chen et al. 2018] ● Word embedding / pretrained models: General encoder ● Can we train a multilingual, task-specific encoder? ● Yes. Adversarial learning again! ○ Train these two models that compete with each other ■ Generator (Encoder): Generate features ■ Discriminator: Detect the language from the encoder output ■ The generator has two purposes ● Help classification (sentiment classification) ● Fool the discriminator
  • 81. ● We have (text in source language, labels) ● Can we convert this to (text in target language, labels)? ○ Use translation to convert text ○ Use heuristics to convert labels Other Approach: Training Data Augmentation [Huang et al. 2019]
  • 82. Cross-lingual Transfer Learning: Our Experience ● Game changers: ○ Cross-lingual encoder (word embedding or pretrained models) ○ Adding some (~500) hand-labelled examples for the target language ● Things that help incrementally ○ Data augmentation by Machine translation ○ Task-specific encoder by Adversarial learning
  • 83. Conclusion ● Transfer learning: Leverage other datasets to transfer the knowledge ● Cross-domain transfer learning ○ Train a model in one domain and fine tune for the target domain ○ Pretrained deep NLP models are the state-of-the-art (BERT and its variants) ● Cross-lingual transfer learning ○ Multilingual encoder works ○ Use adversarial learning to achieve language invariance
  • 84. Recap before the break Task Key technologies to discuss Named Entity Recognition and Disambiguation ● Natural Language Understanding Models (LSTM, Transformer) Relation Extraction ● Semi-supervised method (Distant supervision, Data programming) ● Unsupervised method (Open information extraction) Scalable Relation Extraction with Limited Data ● Pretrained deep learning models (BERT and friends) ● Cross-lingual transfer learning (Adversarial learning, Multilingual encoder)
  • 85. Tutorial’s Agenda 09:00 Introduction 11:45 Automated Taxonomy Expansion 09:05 Overview of LinkedIn’s Knowledge Graph and Applications 09:15 Named Entity Recognition 09:35 Populate Relationships between Social Network Entities 10:00 Scalable Relationship Extraction with Limited Data 10:30 Coffee Break 11:00 Scalable Graph Refinement via Multi-channel Data Ingestion 12:00 Conclusion and Q&A Part 1: Construct high-quality knowledge graph for social networks Part 2: Weakly-supervised, scalable social network knowledge graph construction
  • 86. Tutorial’s Agenda 09:00 Introduction 11:45 Automated Taxonomy Expansion 09:05 Overview of LinkedIn’s Knowledge Graph and Applications 09:15 Named Entity Recognition 09:35 Populate Relationships between Social Network Entities 10:00 Scalable Relationship Extraction with Limited Data 10:30 Coffee Break 11:00 Scalable Graph Refinement via Multi-channel Data Ingestion 12:00 Conclusion and Q&A Part 1: Construct high-quality knowledge graph for social networks Part 2: Weakly-supervised, scalable social network knowledge graph construction
  • 87. Scalable Graph Refinement via Multi-channel Data Ingestion Baoxu Shi LinkedIn
  • 88. Scalable Graph Refinement via Multi-channel Data Ingestion ● Ingest large-scale, noisy crowd data ● Social network feedback loop
  • 89. Definition of Graph Refinement Graph Refinement is a task that aims to infer and add missing knowledge to the graph, or identify erroneous information. Accounting Data Mining Algorithms Director of Engineering Staff Researcher Graph Refinement Data Mining Algorithms Director of Engineering Staff Researcher Supervised Learning Reinforcement Learning... Machine Learning
  • 90. How to Refine the Graph? Experts: Quality High, Volume Low, Cost High. Crowd: Quality Low to Medium, Volume Medium, Cost Medium. Machine Learning: Quality Medium to High, Volume High, Cost Low to Medium.
  • 91. Scalable Graph Refinement Scalable Graph Refinement aims at refining the graph at scale by ingesting large-scale data. Crowd Machine Learning
  • 92. Data for Scalable Graph Refinement Structured Data Crowd Labels Social Network User Activity
  • 93. Challenges for Data Ingestion Ingest large-scale, noisy crowdsourced data ● Q1: How to leverage existing, large-scale structured data? ● Q2: How to leverage large volume of noisy social network text data? ● Q3: How to aggregate accurate labels from crowd workers? Social network user feedback loop ● Q4: How to validate knowledge via social signals? ● Q5: How to grow social network via constructed knowledge graph? ● Q6: How to improve social network and knowledge graph jointly?
  • 94. Ingest Structured Data via Entity Alignment Q1: How to leverage existing, large-scale structured data? Entity Alignment between knowledge graphs aims to find entities in two graphs that represent the same real-world entity. ITransE-SA (Zhu et al. 2017)
  • 95. Feature-based entity alignment Q1: How to leverage existing, large-scale structured data? The alignment score is determined by the average string similarity between a node pair and their neighbor pairs connected via the same edge type. RDF-AI (Scharffe et al. 2009) Sim(1, 9)=(StrSim(1,9) + StrSim(2,11) + StrSim(3,12))/3 Requires preprocessing (translation) and schema alignment.
  • 96. Embedding-based Entity Alignment Q1: How to leverage existing, large-scale structured data? ITransE-SA (Zhu et al. 2017) Requires a set of aligned entity seed and schema alignment. True edge False edge Triple loss Optimize embedding of each graph individually.
  • 97. Embedding-based Entity Alignment Q1: How to leverage existing, large-scale structured data? ITransE-SA (Zhu et al. 2017) Requires a set of aligned entity seed and schema alignment. Alignment loss
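A minimal sketch of the TransE-style scoring and margin ("triple") loss these alignment models build on (random vectors stand in for learned embeddings; the function names are ours):

```python
import numpy as np

def transe_score(h, r, t):
    """Lower is better: a true edge (h, r, t) should satisfy h + r ≈ t."""
    return np.linalg.norm(h + r - t)

def margin_loss(pos, neg, margin=1.0):
    """Push false edges at least `margin` further than true edges."""
    return max(0.0, margin + pos - neg)

rng = np.random.default_rng(0)
h, r, t = rng.normal(size=(3, 32))       # head, relation, tail embeddings
t_corrupt = rng.normal(size=32)          # a random tail makes a false edge
pos, neg = transe_score(h, r, t), transe_score(h, r, t_corrupt)
print(margin_loss(pos, neg))             # minimizing this trains the embeddings
```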
  • 98. Embedding-based Entity Alignment Q1: How to leverage existing, large-scale structured data? (Trisedya et al, 2019) Only requires schema alignment. Located in (transitive rule) Located in Located in Predicates are unified.
  • 99. Q1: How to leverage existing, large-scale structured data? (Trisedya et al, 2019) Only requires schema alignment. Address inconsistent attribute representations: fa (“Barack Obama”) ~ fa (“Barack Hussein Obama”) fa (“50.9989”)~ fa (“50.998888889”) Embedding-based Entity Alignment Minimize the distance between structure embedding and attribute embedding Align entities by computing their cosine similarity
  • 100. Recap on Ingesting Structured Data ● Use RDF-AI as a proof of concept if all nodes have textual features in the same language. ● If the schemas are aligned and nodes have textual features, use Trisedya 2019. ● If the schemas are aligned and existing aligned entity pairs exist, use ITransE-SA.
  • 101. Ingest Crowdsourced Labels via Answer Aggregation Q3: How to aggregate accurate labels from crowd workers? Answer aggregation for crowdsourcing is a task that finds the hidden ground truth from a set of answers given by the crowd workers. Work with a team of high-performing analytics, data science professionals, and cross-functional teams to identify business opportunities and develop algorithms and methodologies to address them. Does the following job description sentence require a data science skill? Yes or No? Aggregation
  • 102. Q3: How to aggregate accurate labels from crowd workers? Answer Ingestion Social Honeypot (Lee et al, 2010) reCAPTCHA (Von Ahn et al. 2008) Answer Filtering (Remove workers who failed trapping questions) Trapping question (ground truth) ● Majority vote (Kuncheva et al. 2003) ● Weight answers by worker expertise & question difficulty ○ Trapping question-based (Khattak and Salleb-Aouissi 2011) ○ Supervised EM (binary label only) (Raykar et al, 2009) ● Snorkel (Ratner et al., 2017) Answer Aggregation (Hung et al. 2013)
  • 103. Generative Answer Aggregation -- Snorkel Q3: How to aggregate accurate labels from crowd workers? Setup: m instances, n workers; the label matrix Λ holds each human-provided label y (or ø if the worker gave no judgement). The generative model p_w(Λ, Y) = Z_w⁻¹ exp(wᵀ φ(Λ, Y)) uses the concatenation of three feature vectors: whether instance i has a label from worker j, whether instance i has label y_i from worker j, and whether workers j and k gave the same label (Z_w is the normalizing constant). Because the true label is unknown, contrastive divergence is used to solve for w without ground-truth labels, producing probabilistic training labels.
  • 104. Recap on Ingesting CrowdSourced Labels ● Always use trapping questions to filter out low quality answers/workers. ● Use majority vote as the baseline to aggregate answers. ● To further improve the answer quality, use Snorkel to aggregate the labels.
  • 105. Knowledge Validation via Social Signals Q4: How to validate knowledge via social signals? Knowledge validation via social signals is a task that aims at validating factual knowledge graph information by collecting signals from end-users directly. Crowd Workers: Quality Mid, Cost Mid to High, Scale Small to Mid, Setting usually single task. Social Signals: Quality Mid to High, Cost Low, Scale Large, Setting multi-task. Examples: LinkedIn’s skill validation, Google Map’s Venue Questions.
  • 106. Social Signal Knowledge Validation in Google Map Q4: How to validate knowledge via social signals? Social signal collection for each (location l, attribute a) pair (Kobren et al. 2019): track the count of yes votes and the total count of votes, giving a yes-vote rate; model the members’ voting behavior as a beta distribution over the expected yes rate and the certainty of that expected yes rate. Use user votes to construct a knowledge base for locations.
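A minimal sketch of the beta-distribution vote model (variable names are ours; a uniform Beta(1, 1) prior is assumed for illustration): the posterior mean gives the expected yes rate, and the posterior variance expresses its certainty.

```python
alpha0, beta0 = 1.0, 1.0        # uniform prior over the yes rate (assumption)
yes_votes, total_votes = 8, 10  # observed votes for one (location, attribute)

alpha = alpha0 + yes_votes
beta = beta0 + (total_votes - yes_votes)
mean = alpha / (alpha + beta)                                    # expected yes rate
var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))  # its certainty
print(f"expected yes rate: {mean:.3f}, variance: {var:.5f}")
```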
  • 107. Social Signal Knowledge Validation in Google Map Q4: How to validate knowledge via social signals? (Kobren et al. 2019) Model inputs: location features (natural language text) and the aggregated votes on other attributes (raw count, majority vote, etc.), from which a location embedding w.r.t. the attribute is generated. Outputs: the expected yes rate and the certainty of the expected yes rate (which determines the false positive rate).
  • 108. LinkedIn’s Social Skill Validation Q4: How to validate knowledge via social signals? (Yan et al. 2019) Use member actions to learn the skill expertise of our members. LinkedIn Skill Endorsement Product Yes/yes question: Users can act without judgement. No anonymity: Users use it as a social gesture.
  • 109. LinkedIn’s Social Skill Validation Q4: How to validate knowledge via social signals? (Yan et al. 2019) Use member actions to learn the skill expertise of our members. Compare a connection’s skills within a certain category; normalize the score given by the user to remedy the social-gesture bias.
  • 110. LinkedIn’s Social Skill Validation Q4: How to validate knowledge via social signals? (Yan et al. 2019) Use member actions to learn the skill expertise of our members. Ask the viewer to rank the skill level of candidates (candidate, skill, viewer); an ML model provides the candidates.
  • 111. LinkedIn’s Social Skill Validation Q4: How to validate knowledge via social signals? (Yan et al. 2019) Use member actions to learn the skill expertise of our members. Multi-task Model (member, skill, expertise score)
  • 112. Recap on knowledge validation via social signals ● Social signals still require answer aggregation. ● The design of social signal collection is crucial for data quality.
  • 113. Knowledge Graph guided Social Link Prediction Q5: How to grow social network via constructed knowledge graph? Given a social network and a knowledge graph, Knowledge graph guided social link prediction aims to predict member - member connections using the knowledge graph. Social Network Knowledge Graph Knowledge Graph guided Social Link Prediction
  • 114. Matrix Factorization for Social Link Prediction (Menon and Elkan, 2011) Given the n-member × n-member adjacency matrix, predict whether i and j are connected using a score built purely from topological information: each node gets a latent embedding and a bias term (learned with regularization), and the link score combines them.
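A minimal sketch of that scoring function (random parameters stand in for learned ones; the regularized training loop is omitted):

```python
import numpy as np

def link_score(U, b, i, j):
    """U: (n, d) node embeddings, b: (n,) node biases -> P(i-j connected)."""
    logit = U[i] @ U[j] + b[i] + b[j]
    return 1.0 / (1.0 + np.exp(-logit))   # sigmoid -> connection probability

rng = np.random.default_rng(0)
n, d = 100, 16
U, b = rng.normal(size=(n, d)), rng.normal(size=n)
print(link_score(U, b, 3, 42))
# Training fits U and b so observed edges score high and sampled non-edges
# score low (e.g. logistic loss plus L2 regularization).
```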
  • 115. Social Link Prediction using Member Attributes (Zhang et al. 2018) The model feeds a one-hot attribute vector v through an encoder f(x); it reconstructs a weighted average of the neighbors’ attributes and predicts the skip-gram neighbors’ structural embeddings (skip-gram: predict context nodes from the center node), learning attribute embeddings implicitly. Q5: How to grow social network via constructed knowledge graph?
  • 116. Social Link Prediction using Member Attributes (Meng et al, 2019) Inputs: the attribute-graph adjacency matrix, a one-hot attribute vector, and a one-hot node vector; the model reconstructs both the member-member graph and the member-attribute graph. Attribute embeddings are a function of members and hence serve social link prediction only. Q5: How to grow social network via constructed knowledge graph?
  • 117. Joint Representation Learning on Social and Knowledge Graph Q6: How to improve social network and knowledge graph jointly? Given a social network and a knowledge graph, we want to learn node representations to refine both graphs jointly. An example of social network + knowledge graph.
  • 118. Ambiguous Social Connections: Person-Person connections are ambiguous. An illustration of LinkedIn’s Heterogeneous Social Network Colleague candidate-recruiter Q6: How to improve social network and knowledge graph jointly? Joint Representation Learning on Social and Knowledge Graph (Shi et al, 2019)
  • 119. Corrupted Higher-order Proximity: Cannot learn meaningful entity embeddings. An illustration of LinkedIn’s Heterogeneous Social Network candidate-recruiter Not similar because candidate-recruiter relationship does not indicate occupation similarity. Q6: How to improve social network and knowledge graph jointly? Joint Representation Learning on Social and Knowledge Graph (Shi et al, 2019)
  • 120. Joint Representation Learning on Social and Knowledge Graph The learned embeddings can be used to predict connections between two arbitrary types. (Shi et al, 2019) Q6: How to improve social network and knowledge graph jointly?
  • 121. Methods to refine a graph in a scalable way ● Use graph alignment to ingest a large volume of knowledge from external knowledge graphs ● Use Snorkel to aggregate and denoise crowdsourced labeled data for graph refinement ● Design social feedback loops to collect social signals, aggregate them and refine the graph ● Use representation learning to refine the knowledge graph and social network jointly
  • 122. Tutorial’s Agenda 09:00 Introduction 11:45 Automated Taxonomy Expansion 09:05 Overview of LinkedIn’s Knowledge Graph and Applications 09:15 Named Entity Recognition 09:35 Populate Relationships between Social Network Entities 10:00 Scalable Relationship Extraction with Limited Data 10:30 Coffee Break 11:00 Scalable Graph Refinement via Multi-channel Data Ingestion 12:00 Conclusion and Q&A Part 1: Construct high-quality knowledge graph for social networks Part 2: Weakly-supervised, scalable social network knowledge graph construction
  • 124. Taxonomy Examples “Taxonomy is the practice and science of classification of things or concepts, including the principles that underlie such classification” Ciaramita, Massimiliano, et al. "Hierarchical preferences in a broad-coverage lexical taxonomy." Proceedings of the Annual Meeting of the Cognitive Science Society. Vol. 27. No. 27. 2005. Wheeler, David L., et al. "Database resources of the national center for biotechnology information." Nucleic acids research 36.suppl_1 (2007): D13-D21. WordNet NCBI Taxonomy Common Tree
  • 125. Taxonomy Examples “Taxonomy is the practice and science of classification of things or concepts, including the principles that underlie such classification” SOC, URL: https://www.bls.gov/soc/ Library card catalog picture credit: https://www.smithsonianmag.com/smart-news/card-catalog-dead-180956823/ US Bureau of Labor Statistics - SOC Library Card Catalog
  • 126. Is Taxonomy a Knowledge Graph? Machine Learning Data Mining Algorithms Director of Engineering Staff Researcher Knowledge Graph describes relationships between real-world entities. (No hierarchical information) Taxonomy describes the classification of real-world entities and concepts. (Has hierarchical information) Analytics Software Development Computer Science Data Mining Machine Learning Algorithms Specific<-General Scikit-Learn
  • 127. Many taxonomies are constructed manually (name | creator/organizer | domain | scale | method):
O*NET | US Department of Labor | Occupation | 1,167 | Manual
SOC | US Bureau of Labor Statistics | Occupation | 867 | Manual
WordNet | Princeton University | Nouns & Verbs | 155,327 | Manual
Global WordNet | VU University Amsterdam | various | various | Transfer & merge
NCBI Taxonomy | National Center for Biotechnology Information | Biology | 657,846 | Manual
LCC | Library of Congress | Library | 227 | Manual
The majority of taxonomies are constructed manually by domain experts.
References: Updating the O*NET-SOC Taxonomy, URL: https://www.onetcenter.org/dl_files/UpdatingTaxonomy_Summary.pdf; Federal Register Notice, URL: https://www.bls.gov/soc/2018/soc2018final.pdf; Miller, George A. WordNet: An electronic lexical database. MIT Press, 1998; Fellbaum, Christiane. "A semantic network of English: the mother of all WordNets." EuroWordNet: A multilingual database with lexical semantic networks. Springer, Dordrecht, 1998. 137-148; Bo Svensén. 2009. A Handbook of Lexicography. The Theory and Practice of Dictionary-Making. Cambridge University Press; The NCBI Taxonomy database, URL: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3245000/; Library of Congress Classification Outline, URL: http://www.loc.gov/aba/cataloging/classification/lcco/
  • 128. Updating taxonomy manually is time-consuming On average the update rate of O*NET is 1.6 occupations per day. O*NET Occupation Update Summary, URL: https://www.onetcenter.org/dataUpdates.html
  • 129. Automatic Taxonomy Construction (ATC) Problem Definition: Given a text corpus and/or auxiliary data, construct a directed taxonomy graph G=(V,E), where V is a set of taxonomy entities, and E is a set of directed edges (u -> v). Pipeline: Text Corpus and/or Auxiliary Data → ATC Model → Induced Taxonomy
  • 130. Challenges of Automatic Taxonomy Construction ● Q1: How to ensure high precision of the constructed taxonomy? ● Q2: How to ensure high recall of the constructed taxonomy? ● Q3: How to reduce the need of large volume in-domain corpora?
  • 131. Hearst Patterns model Roller, Stephen, Douwe Kiela, and Maximilian Nickel. "Hearst Patterns Revisited: Automatic Hypernym Detection from Large Text Corpora." Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2018. Hearst, Marti A. "Automatic acquisition of hyponyms from large text corpora." Proceedings of the 14th conference on Computational linguistics-Volume 2. Association for Computational Linguistics, 1992. Text Corpus Patterns generated manually Pattern Matching Hypernym(wound, injury) Hypernym(broken bone, injury) Hypernym(treasury, civic building) Hypernym(England, common-law country) ... Extracted X-isA-Y relationships (Hearst, 1992&1998) Q1: How to ensure high precision of the constructed taxonomy?
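A minimal sketch of one classic Hearst pattern, "Y such as X", over raw text (single-word matching only; a real extractor uses noun-phrase chunking and handles conjunctions such as "and broken bones"):

```python
import re

# "Y such as X" -> Hypernym(X, Y)
PATTERN = re.compile(r"(\w+) such as (\w+)")

text = ("He suffered injuries such as wounds and broken bones. "
        "Countries such as England follow common law.")

for m in PATTERN.finditer(text):
    hypernym, hyponym = m.group(1), m.group(2)
    print(f"Hypernym({hyponym}, {hypernym})")
# Hypernym(wounds, injuries)
# Hypernym(England, Countries)
```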
  • 132. Discover Hearst Patterns Roller, Stephen, Douwe Kiela, and Maximilian Nickel. "Hearst Patterns Revisited: Automatic Hypernym Detection from Large Text Corpora." Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2018. Hearst, Marti A. "Automatic acquisition of hyponyms from large text corpora." Proceedings of the 14th conference on Computational linguistics-Volume 2. Association for Computational Linguistics, 1992. (Hearst, 1992&1998) Q1: How to ensure high precision of the constructed taxonomy? Four steps to manually create Hearst patterns: 1. Decide a lexical relation (e.g. is-A). 2. Gather a list of term pairs for which the relation is known to hold (e.g. [Java, Programming Language]). 3. Collect sentences where both terms from a term pair appear (e.g. Java is a general-purpose programming language). 4. Find common patterns that indicate the relation of interest (e.g. X is a (adj.) Y). Four steps to automatically create Hearst patterns (Snow et al., 2004): 1. Collect noun pairs from corpora, identifying hypernym pairs using WordNet. 2. For each noun pair, collect sentences in which both nouns occur. 3. Parse the sentences and extract patterns from the parse tree. 4. Train a hypernym classifier based on these features.
  • 133. Limitation of Pattern-based models ● Rule-based methods have low recall. ● Performance relies on the completeness of the patterns. ● Creating patterns is time-consuming. ● Can only extract relationships between co-occurring entities. Distributional models can identify relationships between unobserved entity pairs.
  • 134. Distributional Methods Q2: How to ensure high recall of the constructed taxonomy? (Cederberg and Widdows, 2003) Build a co-occurrence count matrix whose rows are all phrases in the corpus and whose columns are the top-1000 non-stop words; apply singular value decomposition (U ∑ V*) to obtain a word vector h_x for each phrase; score a pair x, y by cosine(h_x, h_y).
  • 135. Hearst + Distributional Methods Q2: How to ensure high recall of the constructed taxonomy? (Cederberg and Widdows, 2003; Roller et al., 2018) Score hypernym(x, y) by ● P(x, y) = count of extracted hypernym(x, y) / total extractions ● Positive Pointwise Mutual Information (PPMI). Then apply a truncated singular value decomposition (U ∑r V*) to the phrase-by-phrase score matrix over all phrases in the corpus, and smooth scores with spmi(x, y) = ux ∑r vy (Roller et al., 2018).
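A minimal sketch of the PPMI computation on a toy count matrix (ours); a truncated SVD of this matrix then yields the u_x, ∑r, v_y factors used in the smoothed spmi score:

```python
import numpy as np

counts = np.array([[10., 0., 2.],    # counts[x, y] = number of times
                   [ 3., 5., 0.],    # hypernym(x, y) was extracted
                   [ 0., 1., 4.]])

p_xy = counts / counts.sum()
p_x = p_xy.sum(axis=1, keepdims=True)
p_y = p_xy.sum(axis=0, keepdims=True)

with np.errstate(divide="ignore"):
    pmi = np.log(p_xy / (p_x * p_y))   # log(0) -> -inf for unseen pairs
ppmi = np.maximum(pmi, 0.0)            # clip negatives (and -inf) to zero
print(ppmi.round(2))
# Low-rank SVD of this matrix generalizes scores to unseen (x, y) pairs.
```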
  • 136. Hearst + Distributional Methods LSTM + Pattern + Word Embedding (Shwartz et al., 2016) + TF-IDF features + Reinforcement Learning (Mao et al., 2018)
  • 137. Limitation of Distributional methods ● Requires large amounts of in-domain text corpora. ● Can only extract relationships between entities that exist in the text corpora. ● Lexical memorization (memorizing that certain words correlate with a certain label) (Levy et al. 2015) Instead of requiring in-domain text corpora, one can induce a taxonomy from an existing taxonomy. Q2: How to ensure high recall of the constructed taxonomy?
  • 138. Keyword + General Purpose Taxonomy Q3: How to reduce the need of large volume of in-domain text corpora? (Liu et al., 2011) Start from domain-specific keywords (e.g., “Indiana cheap car insurance”); represent each keyword by its search context as a bag-of-words vector together with parent-concept probabilities from a general-purpose taxonomy; run hierarchical clustering to produce a domain-specific taxonomy.
  • 139. Refine Taxonomy via Hyperbolic Embeddings Q3: How to reduce the need of large volume of in-domain text corpora? (Nickel and Kiela, 2017; Le et al., 2019) Pipeline: existing taxonomy graph in the format of (u, v) edges → Hyperbolic Embedding Model → learned taxonomy. Hyperbolic embedding models infer a taxonomy from existing graphs instead of text corpora.
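A minimal sketch of the Poincaré distance these models optimize (points live inside the unit ball; general concepts end up near the origin and specific ones near the boundary; the two example points are ours):

```python
import numpy as np

def poincare_distance(u, v):
    """Distance between two points inside the unit ball."""
    sq = np.sum((u - v) ** 2)
    denom = (1 - np.sum(u ** 2)) * (1 - np.sum(v ** 2))
    return np.arccosh(1 + 2 * sq / denom)

root = np.array([0.05, 0.0])   # a general concept, near the origin
leaf = np.array([0.85, 0.3])   # a specific concept, near the boundary
print(poincare_distance(root, leaf))
```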
  • 140. Steps to create a taxonomy ● Collect a large amount of in-domain data ● Use Hearst rules to extract a high-precision taxonomy (Hearst 1998, Snow 2004) ● Use distributional methods to improve recall (Roller 2018, Mao 2018) ● Use hyperbolic embedding to refine the taxonomy structure (Le 2019) ● Extend your taxonomy using new entities / keywords and a general-purpose taxonomy (Liu 2011)
  • 141. Tutorial’s Agenda 09:00 Introduction 11:45 Automated Taxonomy Expansion 09:05 Overview of LinkedIn’s Knowledge Graph and Applications 09:15 Named Entity Recognition 09:35 Populate Relationships between Social Network Entities 10:00 Scalable Relationship Extraction with Limited Data 10:30 Coffee Break 11:00 Scalable Graph Refinement via Multi-channel Data Ingestion 12:00 Conclusion and Q&A Part 1: Construct high-quality knowledge graph for social networks Part 2: Weakly-supervised, scalable social network knowledge graph construction