LinkedIn Skills: RecSys Conference 2014

LinkedIn Skills: Large-Scale Topic Extraction
and Inference
Mathieu Bastian
LinkedIn Corporation ©2014 All Rights Reserved

The World’s Largest Professional Network
313M+ Members Worldwide
2 new Members Per Second
100M+ Monthly Unique Visitors
3M+ Company Pages
Connecting Talent with Opportunity. At scale…

LinkedIn Profile
 313M+ profiles in 200+ countries
 Organized into sections
– Standardized: Companies, Titles, Industry,
Location etc.
– Unstandardized: Text (Summary, Position
description, specialties)
 Skills & Endorsements section
– Introduced in 2011
– Limited to 50 skills per profile

Skills at LinkedIn
 Key component of the
professional identity
 Dictionary of 45k+ skills in
English
 Members have diverse skills
– Java Programming
– Ballet
– Politics
– Bow Hunting
 Many of these are long-tail
Figure: Example of a Skills section on a LinkedIn profile

Folksonomy creation

 Create a folksonomy of skills based on LinkedIn profiles
 Leverage the “specialties” section
 Detect comma-separated lists and extract skill phrases
 Use stop-list and exclude other entities (e.g. companies, titles,
degrees)
 150k skill phrases extracted after removing long-tail noise
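The extraction steps above can be sketched roughly as follows. This is a minimal illustration, not the pipeline used at LinkedIn: the stop-list, the entity list, the thresholds (`min_items`, `min_count`, the four-word limit) and the sample profiles are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stop-list and entity list; the real lists are not given in the slides.
STOP_PHRASES = {"etc", "and more", "other"}
KNOWN_ENTITIES = {"linkedin", "mba", "product manager"}  # companies, degrees, titles


def extract_phrases(specialties, min_items=3):
    """Return candidate skill phrases if the text looks like a comma-separated list."""
    items = [re.sub(r"\s+", " ", part).strip(" .").lower()
             for part in specialties.split(",")]
    items = [p for p in items if p]
    # Treat the text as a list only if it has enough items and every item is short.
    if len(items) < min_items or any(len(p.split()) > 4 for p in items):
        return []
    return [p for p in items if p not in STOP_PHRASES and p not in KNOWN_ENTITIES]


def build_candidates(profiles, min_count=2):
    """Count each phrase once per profile and drop the long tail below min_count."""
    counts = Counter()
    for text in profiles:
        counts.update(set(extract_phrases(text)))
    return {p: c for p, c in counts.items() if c >= min_count}


profiles = [
    "Java, Hadoop, Machine Learning, MBA",
    "Hadoop, Java, Data Mining",
    "I am a passionate leader who enjoys building teams.",
    "Ballet, Java, Bow Hunting",
]
candidates = build_candidates(profiles)
```

On this toy input only "java" and "hadoop" survive the frequency cut; the free-text sentence is rejected because it is not a list, and "MBA" is dropped as a known entity.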

Disambiguation
 Need to add context to differentiate skill phrases with multiple
meanings (e.g. NLP = Natural Language Processing,
NLP = Neuro-linguistic programming)
 Different meanings have different sets of related phrases
 Use Jaccard similarity on LinkedIn profiles to find related phrases, then
SVD + KMeans to identify clusters of phrases
References: R. Baeza-Yates, B. Ribeiro-Neto, et al. Modern information retrieval, volume 463
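As an illustration of the related-phrases step, the sketch below computes Jaccard similarity between phrases from the sets of profiles that mention them. The profile ids, the phrases and the 0.3 threshold are made up for the example, and the SVD + KMeans clustering step is not shown.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical toy index: skill phrase -> ids of profiles that list it.
phrase_profiles = {
    "nlp": {1, 2, 3, 4},
    "machine learning": {1, 2, 5},
    "text mining": {1, 2},
    "hypnosis": {3, 4},
    "coaching": {3, 4, 6},
}


def related_phrases(target, index, threshold=0.3):
    """Phrases whose profile sets overlap with the target's, best match first."""
    scores = {
        other: jaccard(index[target], ids)
        for other, ids in index.items()
        if other != target
    }
    kept = [(p, s) for p, s in scores.items() if s >= threshold]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


related = related_phrases("nlp", phrase_profiles)
# The two senses of "nlp" show up as groups that do not overlap with each other.
cross_sense = jaccard(phrase_profiles["text mining"], phrase_profiles["hypnosis"])
```

Here "nlp" is related to both "text mining" and "hypnosis", while those two phrases share no profiles, which is the kind of signal a clustering step could use to split the ambiguous phrase into two senses.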

De-duplication
 Need to group phrases with similar meaning together. Examples:
– Acronyms: B2B, Business to Business
– Synonyms: Java Programming, Java Development
– Typos: Government Liason
 Many of the skill phrases could be tied to a Wikipedia page
 Built Mechanical Turk (www.mturk.com) task to find the Wikipedia
page associated with a skill phrase
Figure: the cluster of phrases “Java programming”, “Java development” and “Java”
maps to http://en.wikipedia.org/wiki/Java_(programming_language)
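Once each phrase has been tied to a Wikipedia page, de-duplication amounts to grouping phrases that share a page. A minimal sketch, assuming the crowd-sourced answers are available as a phrase-to-URL mapping (the mapping below is illustrative data, not actual MTurk output):

```python
from collections import defaultdict

JAVA_PAGE = "http://en.wikipedia.org/wiki/Java_(programming_language)"
B2B_PAGE = "http://en.wikipedia.org/wiki/Business-to-business"

# Hypothetical task results: skill phrase -> Wikipedia page chosen by workers.
phrase_to_page = {
    "Java programming": JAVA_PAGE,
    "Java development": JAVA_PAGE,
    "Java": JAVA_PAGE,
    "B2B": B2B_PAGE,
    "Business to Business": B2B_PAGE,
}


def group_by_page(mapping):
    """Group phrases that point to the same page into one cluster."""
    clusters = defaultdict(list)
    for phrase, page in mapping.items():
        clusters[page].append(phrase)
    return {page: sorted(phrases) for page, phrases in clusters.items()}


clusters = group_by_page(phrase_to_page)
```

Each resulting cluster becomes one entry in the master list, with the member phrases kept as its synonyms.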

Folksonomy creation summary
 Extraction based on 12M LinkedIn profiles with “specialties”
 Extracted 150k skill phrases
 Clustered related phrases, adding industry context to ambiguous
phrases
 De-duplication using MTurk
 Final master list contains 50k skills
Figure: Examples of synonyms of “Microsoft Office”

Inference and Recommendation

Skills Inference and Recommendation
 Goal: boost skills adoption with a recommender system,
“suggested skills”
 Infer the skills members have, similar to discovering latent
attributes in profiles
 Develop a collaborative filtering solution using profile attributes
Figures: Skills Typeahead on LinkedIn; Suggested Skills
References: A. Mislove et al. You are who you know: Inferring user profiles in online social networks.
R. Jäschke et al. Tag recommendations in folksonomies.

Features
 Large number of standardized profile attributes (i.e. attributes that can be
represented by a unique identifier)
 Members with similar profile attributes are likely to have similar
skills (e.g. if you work at Apple, you probably know “Mac OS”)

Type                          Example                    Cardinality
Title (Headline)              Product Manager            Thousands
Function                      Engineering                Dozens
Industry                      Healthcare                 Dozens
Title (Employment Position)   Product Manager            Thousands
Company                       LinkedIn                   Millions
Group membership              Healthcare Professionals   Millions
Skills                        Matlab                     Thousands

Problem
 Calculate the likelihood that a member has a given
skill, given their profile attributes
 No direct user similarity metric
 Large number of features (e.g. 3M companies) and 50k classes
Notation (symbols not recoverable from the extracted text): the set of profile
attributes; the folksonomy of skills
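One way to write down the quantity described on this slide is given below. The symbols are assumptions chosen for illustration, since the slide's own notation did not survive extraction: $A$ stands for the set of profile attributes, $S$ for the folksonomy of skills, and $a_1, \dots, a_n$ for the attributes on one member's profile.

```latex
\[
  \operatorname{score}(s) \;=\; P(s \mid a_1, \dots, a_n),
  \qquad a_i \in A,\quad s \in S,\quad |S| \approx 50\text{k}
\]
```

Ranking all $s \in S$ by this score for a given member yields the list of inferred skills.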

Naïve Bayes Classifier
 Used a Naïve Bayes classifier to produce inferred skills
 Training data based on members who already have skills
 Result is a ranking of inferred skills, which can directly be used in
 Evaluation methodology
– AUC for each skill
– P@k and Recall for evaluating the recommendations
[Equation: Naïve Bayes scoring formula, not recoverable from the extracted text]
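To make the idea concrete, here is a small sketch of a Naïve Bayes ranker over categorical profile attributes. It is a textbook multinomial formulation with Laplace smoothing, not the model described in the talk: the class name `SkillNaiveBayes`, the smoothing constant `alpha`, the per-skill prior (fraction of training members with the skill) and the toy training data are all assumptions.

```python
import math
from collections import Counter, defaultdict


class SkillNaiveBayes:
    """Ranks skills by log P(skill) + sum of log P(attribute | skill)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing constant (assumed)
        self.skill_counts = Counter()            # skill -> number of members with it
        self.pair_counts = defaultdict(Counter)  # skill -> attribute -> count
        self.token_totals = Counter()            # skill -> total attribute count
        self.attributes = set()                  # attribute vocabulary
        self.n_members = 0

    def fit(self, members):
        """members: iterable of (attributes, skills) pairs for members with skills."""
        for attrs, skills in members:
            self.n_members += 1
            self.attributes.update(attrs)
            for skill in skills:
                self.skill_counts[skill] += 1
                self.pair_counts[skill].update(attrs)
                self.token_totals[skill] += len(attrs)
        return self

    def log_score(self, skill, attrs):
        vocab = len(self.attributes)
        score = math.log(self.skill_counts[skill] / self.n_members)
        denom = self.token_totals[skill] + self.alpha * vocab
        for attr in attrs:
            score += math.log((self.pair_counts[skill][attr] + self.alpha) / denom)
        return score

    def rank(self, attrs, k=3):
        """Return the k highest-scoring skills for a profile's attributes."""
        scored = [(s, self.log_score(s, attrs)) for s in self.skill_counts]
        scored.sort(key=lambda item: (-item[1], item[0]))
        return [s for s, _ in scored[:k]]


# Hypothetical training data: (profile attributes, skills already on the profile).
members = [
    ({("company", "Apple"), ("title", "Engineer")}, {"Mac OS", "Objective-C"}),
    ({("company", "Apple"), ("title", "Designer")}, {"Mac OS", "Photoshop"}),
    ({("company", "Cloudera"), ("title", "Engineer")}, {"Hadoop", "Java"}),
    ({("company", "Cloudera"), ("title", "Analyst")}, {"Hadoop", "SQL"}),
]
model = SkillNaiveBayes().fit(members)
top_for_apple = model.rank({("company", "Apple")}, k=1)
```

With this toy data, a member whose only known attribute is the company Apple gets "Mac OS" as the top inferred skill. Because a member can hold several skills, the per-skill priors do not sum to one; the scores are meant for ranking only.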

Evaluation
 Evaluate how well we can predict the skills members have
Figures: ROC of skill “Hadoop”; Distribution of ROC across all skills
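The metrics named in the evaluation methodology can be computed as in the sketch below. The function names and the sample data are illustrative; the AUC is computed as the fraction of (positive, negative) pairs ranked correctly, with ties counted as half.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k recommended skills that the member actually has."""
    return sum(1 for s in ranked[:k] if s in relevant) / k


def recall_at_k(ranked, relevant, k):
    """Fraction of the member's skills that appear in the top-k recommendations."""
    if not relevant:
        return 0.0
    return sum(1 for s in ranked[:k] if s in relevant) / len(relevant)


def auc(scores, labels):
    """Area under the ROC curve for one skill, via pairwise comparison."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical example: one member's ranked recommendations and true skills.
ranked = ["Hadoop", "Java", "SQL", "Ballet"]
relevant = {"Hadoop", "SQL", "Pig"}
p_at_2 = precision_at_k(ranked, relevant, 2)
r_at_2 = recall_at_k(ranked, relevant, 2)

# Hypothetical example: model scores for one skill across four members.
skill_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

The pairwise AUC is quadratic in the number of examples, which is fine for a toy check; a rank-based computation would be the usual choice at scale.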

Results
 12X improvement in conversion using “suggested skills”
Figure panels: Without; With

Our Contributions
 End-to-end creation of a skills folksonomy based on the free-text
specialties section
 Efficient inferred skills model with good offline performance
 Skills recommender system based on profile attributes
