Using Deep Learning To Predict Performance From Resumes
Ben Taylor, Chief Data Scientist
INTRODUCTIONS
Ben Taylor @bentaylordata
Background Personal
• Sequoia Capital
• Largest Video Interviewing Platform
• Forbes #10 most promising companies
• Global: 189 countries
NATURAL LANGUAGE
PROCESSING (NLP)
GRIT MOTIVATION ENGAGEMENT PERFORMANCE
1 55 80 95%
0 75 10 22%
0 50 20 57%
1 20 90 91%
0 40 60 11%
Basic Tutorial On How To Build A Numeric Feature Model
BUILDING A MODEL
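A minimal sketch of the numeric-feature case, assuming an ordinary least-squares fit is an acceptable stand-in for "a type of regression". The five rows are the toy values from the slide; the helper names (`solve`, `fit_linear`, `predict`) are hypothetical and written in plain Python so nothing beyond the standard library is needed.

```python
# Toy data copied from the slide: (grit, motivation, engagement, performance).
rows = [
    (1, 55, 80, 0.95),
    (0, 75, 10, 0.22),
    (0, 50, 20, 0.57),
    (1, 20, 90, 0.91),
    (0, 40, 60, 0.11),
]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_linear(features, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    X = [[1.0] + [float(v) for v in f] for f in features]  # leading 1.0 = intercept
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(X, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights, feature_row):
    """Intercept plus the weighted sum of the feature values."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], feature_row))


features = [r[:3] for r in rows]
targets = [r[3] for r in rows]
weights = fit_linear(features, targets)
predictions = [predict(weights, f) for f in features]
residuals = [t - p for t, p in zip(targets, predictions)]
```

With only five rows this is purely illustrative; a random forest or gradient boosting model would slot into the same features-in, performance-out shape.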
ESSAY GRIT MOTIVATION ENGAGEMENT PERFORMANCE
I want to work here 1 55 80 95%
I have great teamwork 0 75 10 22%
Synergy 0 50 20 57%
I have so much grit 1 20 90 91%
They fired that individual 0 40 60 11%
Now what?!?
BUILDING A MODEL
ESSAY PERFORMANCE
I want to work here 95%
I have great teamwork 22%
Synergy 57%
I have so much grit 91%
They fired that individual 11%
There are really two different options, mapping or tokenizing
BUILDING A MODEL
Map:
Bad = 0
Good = 1
Better = 2
Best = 3
Tokenize:
Female = 1
Male = 1
Female Male
1 0
0 1
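The two options can be sketched as follows. This is an illustration only: `ORDINAL`, `map_ordinal`, and `one_hot` are hypothetical names, and the category values are the ones shown on the slide.

```python
# Option 1, mapping: categories with a natural order become increasing integers.
ORDINAL = {"Bad": 0, "Good": 1, "Better": 2, "Best": 3}


def map_ordinal(values):
    """Replace each ordered category with its integer rank."""
    return [ORDINAL[v] for v in values]


# Option 2, tokenizing: categories with no order each get their own 0/1 column.
def one_hot(values):
    """Return the column names and one 0/1 row per input value."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


ranks = map_ordinal(["Good", "Best", "Bad"])
columns, encoded = one_hot(["Female", "Male"])
```

Mapping suits ordered values because the integers preserve the ranking; tokenizing avoids implying an order that is not there.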
I want to work here have great PERF.
1 1 1 1 1 0 0 95%
1 0 0 0 0 1 1 22%
0 0 0 0 0 0 0 57%
1 0 0 0 0 1 0 91%
0 0 0 0 0 0 0 11%
Tokenize the text into unique word columns
BUILDING A MODEL
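A rough sketch of tokenizing the essays into unique word columns. The function names are hypothetical, and lowercasing plus whitespace splitting is an assumed tokenizer. Unlike the slide, which shows only the first seven columns, this builds a column for every word seen.

```python
essays = [
    "I want to work here",
    "I have great teamwork",
    "Synergy",
    "I have so much grit",
    "They fired that individual",
]


def tokenize(text):
    """Assumed tokenizer: lowercase, split on whitespace."""
    return text.lower().split()


def build_vocabulary(texts):
    """Assign each unique word a column index in order of first appearance."""
    vocab = {}
    for text in texts:
        for word in tokenize(text):
            vocab.setdefault(word, len(vocab))
    return vocab


def to_binary_rows(texts, vocab):
    """One row per text; a 1 marks that the column's word occurs in the text."""
    rows = []
    for text in texts:
        row = [0] * len(vocab)
        for word in tokenize(text):
            if word in vocab:
                row[vocab[word]] = 1
        rows.append(row)
    return rows


vocab = build_vocabulary(essays)
X = to_binary_rows(essays, vocab)
```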
ESSAY PERFORMANCE
I want to work here 95%
I have great teamwork 22%
Synergy 57%
I have so much grit 91%
They fired that individual 11%
I want to work here have great PERF.
1 1 1 1 1 0 0 95%
1 0 0 0 0 1 1 22%
0 0 0 0 0 0 0 57%
1 0 0 0 0 1 0 91%
0 0 0 0 0 0 0 11%
Bag of words modeling, sequence and ordering is lost
BUILDING A MODEL
I want Want to to go work here PERF.
1 1 1 1 1 95%
1 0 0 0 0 22%
0 0 0 0 0 57%
1 0 0 0 0 91%
0 0 0 0 0 11%
Band-Aid: Concept of n-grams
BUILDING A MODEL
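A small sketch of the n-gram idea, assuming the same lowercase, whitespace-split tokens; `ngrams` is a hypothetical helper. Unigrams cannot tell two orderings of the same words apart, while bigrams recover some of the local order.

```python
def ngrams(text, n):
    """Return the list of n consecutive-word phrases in the text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


bigrams = ngrams("I want to work here", 2)

# Same words, different order: identical unigrams, different bigrams.
a, b = "dog bites man", "man bites dog"
same_unigrams = sorted(ngrams(a, 1)) == sorted(ngrams(b, 1))
same_bigrams = sorted(ngrams(a, 2)) == sorted(ngrams(b, 2))
```

The cost is that the number of columns grows quickly with n, which is one reason the slide calls this a band-aid.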
SENTIMENT EXAMPLE
(multiclass)
We need a labeled dataset. Sometimes getting one with labels is the biggest challenge of all.
SENTIMENT DATASET, 1.5M TWEETS
label text
neg @Christian_Rocha i miss u!!!!!
pos @llanitos there's still some St Werburghs hone...
pos @Ashley96 it's me
neg @Phillykidd we use to be like bestfriends
neg Just got back from Manchester. I went to the T...
pos @LauraDark thnks x el rt
neg "Ughh it's so hot & the singing lady is st...
neg @hnprashanth @dkris I was out to my native for...
pos Girls night with the bests Wish you were here J!
neg Just watched @paulkehler rock the crap out of ...
pos i got the gurl! i got the ride! now im just on...
pos @ninthspace how is the table building going?
pos by d way guyz I must log out na see u again to...
neg @dreday11 its only 20 mins...
Sentiment140
cs.stanford.edu
:(:)
Before we can process this, we need to format it properly.
SENTIMENT DATASET - FORMATTING
text
@Christian_Rocha i miss u!!!!!
@llanitos there's still some St Werburghs hone...
@Ashley96 it's me
@Phillykidd we use to be like bestfriends
Just got back from Manchester. I went to the T...
@LauraDark thnks x el rt
"Ughh it's so hot & the singing lady is st...
@hnprashanth @dkris I was out to my native for...
Girls night with the bests Wish you were here J!
Just watched @paulkehler rock the crap out of ...
i got the gurl! i got the ride! now im just on...
@ninthspace how is the table building going?
by d way guyz I must log out na see u again to...
@dreday11 its only 20 mins...
Python list
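One way the formatting step might look, as a sketch. It assumes a two-column `label,text` CSV like the table on the slide; the real Sentiment140 download has a different column layout, so `SAMPLE` and `load_rows` are illustrative only.

```python
import csv
import io

# Hypothetical two-column sample that mirrors the slide's table.
SAMPLE = """label,text
neg,@Christian_Rocha i miss u!!!!!
pos,@Ashley96 it's me
neg,@dreday11 its only 20 mins...
"""


def load_rows(handle):
    """Read label/text pairs from a CSV file handle into two Python lists."""
    labels, texts = [], []
    for row in csv.DictReader(handle):
        labels.append(row["label"])
        texts.append(row["text"].strip())
    return labels, texts


labels, text_data = load_rows(io.StringIO(SAMPLE))
```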
Now we can go all the way to model training and prediction
SENTIMENT DATASET – UNIGRAM
y
[0,1,0,1,1]
text_data
[['this is a tweet'],
['sounds good'],
['not really']]
I want to work here have great
1 1 1 1 1 0 0
1 0 0 0 0 1 1
0 0 0 0 0 0 0
1 0 0 0 0 1 0
0 0 0 0 0 0 0
Now we can go all the way to model training and prediction
SENTIMENT DATASET – BIGRAM
I want Want to to go work here
1 1 1 1 1
1 0 0 0 0
0 0 0 0 0
1 0 0 0 0
0 0 0 0 0
text_data
[['this is a tweet'],
['sounds good'],
['not really']]
y
[0,1,0,1,1]
BUILDING A MODEL
Convert labels to integers
SENTIMENT DATASET - FORMATTING
Python int array
label
neg
pos
pos
neg
neg
pos
neg
neg
pos
neg
pos
pos
pos
neg
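Converting the labels is a one-line lookup. In this sketch the choice of 0 for "neg" and 1 for "pos" is an assumption, and `LABEL_TO_INT` is a hypothetical name; the label column is the one shown on the slide.

```python
# Assumed convention: negative -> 0, positive -> 1.
LABEL_TO_INT = {"neg": 0, "pos": 1}

labels = ["neg", "pos", "pos", "neg", "neg", "pos", "neg",
          "neg", "pos", "neg", "pos", "pos", "pos", "neg"]

# A KeyError here would flag an unexpected label instead of silently passing it on.
y = [LABEL_TO_INT[label] for label in labels]
```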
Convert labels to integers
SENTIMENT DATASET - FORMATTING
model.fit(X,Y)
X
[4,0,0,0,0,7,0,0,1]
[0,0,0,0,9,0,0,0,2]
Now we can go all the way to model training and prediction
SENTIMENT DATASET – BUILD A MODEL
y
[0,1,0,1,1]
X
[4,0,0,0,0,7,0,0,1]
[0,0,0,0,9,0,0,0,2]
PERFORMANCE?
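The slides do not say which model sits behind `model.fit(X,Y)`. As a stand-in, here is a minimal multinomial Naive Bayes over word-count rows, written in plain Python; the class name and the toy `X`, `y` below are hypothetical, not the talk's data or model.

```python
import math


class TinyNaiveBayes:
    """Multinomial Naive Bayes over count vectors, with add-one smoothing."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.log_prior = {}
        self.log_likelihood = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            # Per-word totals for this class, plus one so no word has zero probability.
            totals = [sum(col) + 1 for col in zip(*rows)]
            denom = sum(totals)
            self.log_likelihood[c] = [math.log(t / denom) for t in totals]
        return self

    def predict(self, X):
        predictions = []
        for x in X:
            scores = {
                c: self.log_prior[c]
                + sum(n * w for n, w in zip(x, self.log_likelihood[c]))
                for c in self.classes
            }
            predictions.append(max(scores, key=scores.get))
        return predictions


# Toy counts for three hypothetical word columns: "good", "bad", "love".
X = [[2, 0, 1], [0, 3, 0], [1, 0, 2], [0, 1, 0]]
y = [1, 0, 1, 0]

model = TinyNaiveBayes().fit(X, y)
y_pred = model.predict([[1, 0, 0], [0, 2, 0]])
```

Any classifier with the same fit and predict shape could be swapped in here.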
DON’T CHEAT!
PROPER MODEL VALIDATION
We need to hold out data we can test against. This is called your validation set.
SENTIMENT DATASET – VALIDATION
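A sketch of holding out a validation set, assuming a simple shuffled split; `train_test_split` here is a hypothetical plain-Python helper, not a library call, and the 20% default is only an example.

```python
import random


def train_test_split(X, y, test_fraction=0.2, seed=0):
    """Shuffle the row indices once, then hold out a fraction for testing."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(round(len(X) * test_fraction)))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


X = list(range(10))
y = [value % 2 for value in X]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_fraction=0.2)
```

The model is fit on the training part only, and the score reported is the one on the held-out part.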
Train on 20%, test on 80%
SENTIMENT DATASET – VALIDATION
20% 80%
Best score yet
SENTIMENT DATASET – VALIDATION
60% 40%
Best score yet
SENTIMENT DATASET – VALIDATION
70% 30%
Best score yet
SENTIMENT DATASET – VALIDATION
80% 20%
Best score yet
SENTIMENT DATASET – VALIDATION
99% 1%
Perfect scores
SENTIMENT DATASET – VALIDATION
99.9999% 2
Predict Every Point, k-folding
Folds = 9 Fold = 1 Fold = 2… Y_pred
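K-folding can be sketched like this: every sample lands in exactly one test fold, so `Y_pred` ends up with a held-out prediction for every point. The helper names and the majority-class baseline are hypothetical placeholders for a real model.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    folds = [list(range(start, n_samples, k)) for start in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def out_of_fold_predictions(X, y, k, fit, predict):
    """Predict every point using a model that never saw that point."""
    y_pred = [None] * len(X)
    for train_idx, test_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        fold_pred = predict(model, [X[i] for i in test_idx])
        for i, p in zip(test_idx, fold_pred):
            y_pred[i] = p
    return y_pred


# Placeholder model: always predict the most common training label.
def fit_majority(X_train, y_train):
    return max(sorted(set(y_train)), key=y_train.count)


def predict_majority(model, X_test):
    return [model for _ in X_test]


X = [[i] for i in range(10)]
y = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]
Y_pred = out_of_fold_predictions(X, y, 3, fit_majority, predict_majority)
```

More folds mean more training data per model and more models to train, which is the trade-off between 10 and 100 folds.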
SENTIMENT DATASET – Validation
10 folds
SENTIMENT DATASET – Validation
100 folds
BIGRAM BOOST
acc: 0.8015
r: 0.2061
AUROC: 0.8738
acc: 0.7809
r: 0.1238
AUROC: 0.8554
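For reference, two of the reported metrics can be computed as below. This is a sketch with hypothetical function names and toy scores; the slides do not define "r", so it is left out rather than guessed.

```python
def accuracy(y_true, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == t for p, t in zip(predictions, y_true)) / len(y_true)


def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and model scores, not the talk's data.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
acc = accuracy(y_true, scores)
auc = auroc(y_true, scores)
```

The pairwise AUROC above is quadratic in the number of samples, so for 1.5M tweets a rank-based implementation would be the practical choice.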
Feature Creation
Model Selection
Feature Reduction
BETTER MODELS
Was:
acc: 0.8015
r: 0.2061
AUROC: 0.8739
Now: (10x average)
acc: 0.8208
r: 0.2832
AUROC: 0.8939
EMAIL CLASSIFICATION
(multiclass)
EMAIL MULTICLASS DATASET (20 classes)
alt.atheism
comp.graphics
comp.os.ms-windows.misc
comp.sys.ibm.pc.hardware
comp.sys.mac.hardware
comp.windows.x
misc.forsale
rec.autos
rec.motorcycles
rec.sport.baseball
rec.sport.hockey
sci.crypt
sci.electronics
sci.med
sci.space
soc.religion.christian
talk.politics.guns
talk.politics.mideast
talk.politics.misc
talk.religion.misc
EMAIL MULTICLASS DATASET (20 classes)
From: lerxst@wam.umd.edu (where's my thing)
Subject: WHAT car is this!?
Nntp-Posting-Host: rac3.wam.umd.edu
Organization: University of Maryland, College Park
Lines: 15
MSG: I was wondering if anyone out there could enlighten me on this car I saw the other day. It was a 2-door
sports car, looked to be from the late 60s/early 70s. It was called a Bricklin. The doors were really small. In
addition, the front bumper was separate from the rest of the body. This is all I know. If anyone can tellme a
model name, engine specs, years of production, where this car is made, history, or whatever info you have on
this funky looking car, please e-mail.
Thanks,
- IL
---- brought to you by your neighborhood Lerxst ----
rec.autos
EMAIL MULTICLASS DATASET (20 classes)
From: guykuo@carson.u.washington.edu (Guy Kuo)
Subject: SI Clock Poll - Final Call
Summary: Final call for SI clock reports
Keywords: SI,acceleration,clock,upgrade
Article-I.D.: shelley.1qvfo9INNc3s
Organization: University of Washington
Lines: 11
NNTP-Posting-Host: carson.u.washington.edu
MSG: A fair number of brave souls who upgraded their SI clock oscillator have shared their experiences for
this poll. Please send a brief message detailing your experiences with the procedure. Top speed attained, CPU
rated speed, add on cards and adapters, heat sinks, hour of usage per day, floppy disk functionality with 800
and 1.4 m floppies are especially requested.
I will be summarizing in the next two days, so please add to the network knowledge base if you have done the
clock upgrade and haven't answered this poll. Thanks.
Guy Kuo <guykuo@u.washington.edu>
comp.sys.mac.hardware
EMAIL MULTICLASS DATASET (20 classes)
From: jgreen@amber (Joe Green)
Subject: Re: Weitek P9000 ?
Organization: Harris Computer Systems Division
Lines: 14
Distribution: world
NNTP-Posting-Host: amber.ssd.csd.harris.com
X-Newsreader: TIN [version 1.1 PL9]
MSG: Robert J.C. Kyanko (rob@rjck.UUCP) wrote:
> abraxis@iastate.edu writes in article <abraxis.734340159@class1.iastate.edu>:
> > Anyone know about the Weitek P9000 graphics chip?
> As far as the low-level stuff goes, it looks pretty nice. It's got this
> quadrilateral fill command that requires just the four points.
Do you have Weitek's address/phone number? I'd like to get some information about this chip.
--
Joe Green, Harris Corporation
jgreen@csd.harris.com, Computer Systems Division
"The only thing that really scares me is a person with no sense of humor." -- Jonathan Winters
comp.graphics
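Posts like the ones above can be separated into header fields and message text before tokenizing. The sketch below uses Python's standard `email` module and assumes each raw post is stored with its headers, a blank line, then the body; `RAW_POST` is a shortened, hypothetical copy of the first example.

```python
import email

# Shortened copy of the rec.autos example, in header / blank line / body form.
RAW_POST = (
    "From: lerxst@wam.umd.edu (where's my thing)\n"
    "Subject: WHAT car is this!?\n"
    "Organization: University of Maryland, College Park\n"
    "Lines: 15\n"
    "\n"
    "I was wondering if anyone out there could enlighten me on this car I saw\n"
    "the other day.\n"
)


def split_post(raw):
    """Return (subject, organization, body) from one raw post."""
    message = email.message_from_string(raw)
    return message["Subject"], message["Organization"], message.get_payload()


subject, organization, body = split_post(RAW_POST)
```

Whether header fields such as Organization should be fed to the model is a judgment call, since they can leak the class label.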
EMAIL MULTICLASS DATASET (20 classes)
RESUME MODELING
(binary)
Upload Your Resume
Now painstakingly fill out this form containing all of the exact same information
Document modeling review
UNSTRUCTURED
STRUCTURED
MUNGED
Resume Extension
Resume format consolidation
GPA Inclusion (18%)
GPA Replacement
Mimicking the human recruiter
Feature Hunt
ONE FEATURE AT A TIME
INCREMENTAL GAINS
DEEP LEARNING
Unstructured
ENGINEERS AND MANUAL FEATURES ARE EXPENSIVE, USING DEEP LEARNING TO AUTOMATE
AUTOMATIC FEATURE GENERATION
Structured
I want Want to to go work here PERF.
1 1 1 1 1 95%
1 0 0 0 0 22%
0 0 0 0 0 57%
1 0 0 0 0 91%
0 0 0 0 0 11%
ESSAY
I want to work here
I have great teamwork
Synergy
I have so much grit
They fired that individual
ENGINEERS AND MANUAL FEATURES ARE EXPENSIVE, USING DEEP LEARNING TO AUTOMATE
AUTOMATIC FEATURE GENERATION
ESSAY
I want to work here
I have great teamwork
Synergy
I have so much grit
They fired that individual
ESSAY
3 2 1 4 5
3 7 67 345
54
3 7 99 10234
78 203 501 14
1 2 3 4 5
0 0 0 1 0
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
LSTM
RAW TEXT WORD SEQUENCE
ENCODING
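The encoding step ahead of the LSTM can be sketched as follows: each word becomes an integer id, sequences are padded to a common length, and an id can be expanded to a one-hot vector. The helper names, the id assignment by first appearance, and reserving 0 for padding are all assumptions; the LSTM itself is not shown.

```python
essays = [
    "I want to work here",
    "I have great teamwork",
    "Synergy",
    "I have so much grit",
    "They fired that individual",
]


def build_word_index(texts):
    """Give each unique word an integer id, starting at 1 (0 is kept for padding)."""
    index = {}
    for text in texts:
        for word in text.lower().split():
            index.setdefault(word, len(index) + 1)
    return index


def encode(text, index):
    """Turn raw text into its word-id sequence, preserving word order."""
    return [index[w] for w in text.lower().split() if w in index]


def pad(sequence, length, pad_id=0):
    """Left-pad (or truncate from the left) so every sequence has the same length."""
    return ([pad_id] * (length - len(sequence)) + sequence)[-length:]


def one_hot(token_id, size):
    """Expand one word id into a 0/1 vector of the given size."""
    return [1 if i == token_id else 0 for i in range(size)]


word_index = build_word_index(essays)
sequences = [pad(encode(text, word_index), 5) for text in essays]
```

Unlike the bag-of-words columns, these sequences keep word order, which is what lets a sequence model learn from it.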
AUTOMATIC FEATURE GENERATION
BEGIN SCRATCHING AT LAYOUT
AUTOMATIC FEATURE GENERATION (LAYOUT)
CNN:
bit.ly/pacon
INTERVIEW MODELING
Would you ever hire from just a resume?
INTERVIEW MODELING
SOFT/TECHNICAL COMPETENCIES
Resume can overstate and understate
Audio Video Text
QUESTIONS

Weitere ähnliche Inhalte

Ähnlich wie Using Deep Learning And NLP To Predict Performance From Resumes

Lec1cgu13updated.ppt
Lec1cgu13updated.pptLec1cgu13updated.ppt
Lec1cgu13updated.ppt
kalai75
 
Interop - Exploring Machine Data
Interop - Exploring Machine DataInterop - Exploring Machine Data
Interop - Exploring Machine Data
Michael Wilde
 

Ähnlich wie Using Deep Learning And NLP To Predict Performance From Resumes (20)

Lec1cgu13updated.ppt
Lec1cgu13updated.pptLec1cgu13updated.ppt
Lec1cgu13updated.ppt
 
Data science programming .ppt
Data science programming .pptData science programming .ppt
Data science programming .ppt
 
Lec1cgu13updated.ppt
Lec1cgu13updated.pptLec1cgu13updated.ppt
Lec1cgu13updated.ppt
 
Lec1cgu13updated.ppt
Lec1cgu13updated.pptLec1cgu13updated.ppt
Lec1cgu13updated.ppt
 
7 Dangerous Myths DBAs Believe about Data Modeling
7 Dangerous Myths DBAs Believe about Data Modeling7 Dangerous Myths DBAs Believe about Data Modeling
7 Dangerous Myths DBAs Believe about Data Modeling
 
Duplicates everywhere (Kiev)
Duplicates everywhere (Kiev)Duplicates everywhere (Kiev)
Duplicates everywhere (Kiev)
 
Sww 2006 Redesigning Processes For Solid Works
Sww 2006   Redesigning Processes For Solid WorksSww 2006   Redesigning Processes For Solid Works
Sww 2006 Redesigning Processes For Solid Works
 
Introducing HOSTING Labs - Ed Schaefer
Introducing HOSTING Labs - Ed Schaefer Introducing HOSTING Labs - Ed Schaefer
Introducing HOSTING Labs - Ed Schaefer
 
Excel Power-ups for Going Beast-mode in Local SEO
Excel Power-ups for Going Beast-mode in Local SEOExcel Power-ups for Going Beast-mode in Local SEO
Excel Power-ups for Going Beast-mode in Local SEO
 
Use atomic design ftw
Use atomic design ftwUse atomic design ftw
Use atomic design ftw
 
Interop - Exploring Machine Data
Interop - Exploring Machine DataInterop - Exploring Machine Data
Interop - Exploring Machine Data
 
Machine Learning for dummies!
Machine Learning for dummies!Machine Learning for dummies!
Machine Learning for dummies!
 
Approaching (almost) Any NLP Problem
Approaching (almost) Any NLP ProblemApproaching (almost) Any NLP Problem
Approaching (almost) Any NLP Problem
 
Progressing and enhancing
Progressing and enhancingProgressing and enhancing
Progressing and enhancing
 
Is your excel production code?
Is your excel production code?Is your excel production code?
Is your excel production code?
 
Nagios Conference 2014 - David Josephsen - Graphing Nagios
Nagios Conference 2014 - David Josephsen - Graphing NagiosNagios Conference 2014 - David Josephsen - Graphing Nagios
Nagios Conference 2014 - David Josephsen - Graphing Nagios
 
Machine Learning in Marketing - Jim Sterne @ Digital Analytics Forum 2018
Machine Learning in Marketing - Jim Sterne @ Digital Analytics Forum 2018Machine Learning in Marketing - Jim Sterne @ Digital Analytics Forum 2018
Machine Learning in Marketing - Jim Sterne @ Digital Analytics Forum 2018
 
Test Automation Day 2018
Test Automation Day 2018Test Automation Day 2018
Test Automation Day 2018
 
Design Patterns - IA Summit 2006
Design Patterns - IA Summit 2006Design Patterns - IA Summit 2006
Design Patterns - IA Summit 2006
 
Pandas, Data Wrangling & Data Science
Pandas, Data Wrangling & Data SciencePandas, Data Wrangling & Data Science
Pandas, Data Wrangling & Data Science
 

Mehr von Benjamin Taylor

#SIOP15 Presentation on
#SIOP15 Presentation on #SIOP15 Presentation on
#SIOP15 Presentation on
Benjamin Taylor
 

Mehr von Benjamin Taylor (10)

Deep learning for_devs
Deep learning for_devsDeep learning for_devs
Deep learning for_devs
 
Python genetics
Python geneticsPython genetics
Python genetics
 
Homeless story
Homeless storyHomeless story
Homeless story
 
#SIOP15 Presentation On Performance Sorting Using Video Interviews
#SIOP15 Presentation On Performance Sorting Using Video Interviews#SIOP15 Presentation On Performance Sorting Using Video Interviews
#SIOP15 Presentation On Performance Sorting Using Video Interviews
 
#SIOP15 Presentation on
#SIOP15 Presentation on #SIOP15 Presentation on
#SIOP15 Presentation on
 
How To Model Text Like A Rockstar
How To Model Text Like A RockstarHow To Model Text Like A Rockstar
How To Model Text Like A Rockstar
 
Predictive analytics and big data tutorial
Predictive analytics and big data tutorial Predictive analytics and big data tutorial
Predictive analytics and big data tutorial
 
How to simulate semiconductor yield
How to simulate semiconductor yieldHow to simulate semiconductor yield
How to simulate semiconductor yield
 
Text analytics intro
Text analytics introText analytics intro
Text analytics intro
 
Utah, the greatest SMOG on earth. Harvesting data for air quality prediction
Utah, the greatest SMOG on earth. Harvesting data for air quality predictionUtah, the greatest SMOG on earth. Harvesting data for air quality prediction
Utah, the greatest SMOG on earth. Harvesting data for air quality prediction
 

Kürzlich hochgeladen

Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
amitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
amitlee9823
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
amitlee9823
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
amitlee9823
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
JoseMangaJr1
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
only4webmaster01
 

Kürzlich hochgeladen (20)

Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 

Using Deep Learning And NLP To Predict Performance From Resumes

  • 1. Using Deep Learning To Predict Performance From Resumes Ben Taylor, Chief Data Scientist
  • 4. • Sequoia Capital • Largest Video Interviewing Platform • Forbes #10 most promising companies • Global: 189 countries
  • 6. GRIT MOTIVATION ENGAGEMENT PERFORMANCE 1 55 80 95% 0 75 10 22% 0 50 20 57% 1 20 90 91% 0 40 60 11% Basic Tutorial On How To Build A Numeric Feature Model BUILDING A MODEL
  • 7. ESSAY GRIT MOTIVATION ENGAGEMENT PERFORMANCE I want to work here 1 55 80 95% I have great teamwork 0 75 10 22% Synergy 0 50 20 57% I have so much grit 1 20 90 91% They fired that individual 0 40 60 11% Now what?!? BUILDING A MODEL
  • 8. ESSAY PERFORMANCE I want to work here 95% I have great teamwork 22% Synergy 57% I have so much grit 91% They fired that individual 11% There are really two different options, mapping or tokenizing BUILDING A MODEL Map: Bad = 0 Good = 1 Better = 2 Best = 3 Tokenize: Female = 1 Male = 1 Female Male 1 0 0 1
  • 9. I want to work here have great PERF. 1 1 1 1 1 0 0 95% 1 0 0 0 0 1 1 22% 0 0 0 0 0 0 0 57% 1 0 0 0 0 1 0 91% 0 0 0 0 0 0 0 11% Tokenize the text into unique word columns BUILDING A MODEL ESSAY PERFORMANCE I want to work here 95% I have great teamwork 22% Synergy 57% I have so much grit 91% They fired that individual 11%
  • 10. I want to work here have great PERF. 1 1 1 1 1 0 0 95% 1 0 0 0 0 1 1 22% 0 0 0 0 0 0 0 57% 1 0 0 0 0 1 0 91% 0 0 0 0 0 0 0 11% Bag of words modeling, sequence and ordering is lost BUILDING A MODEL
  • 11. Bag of words modeling, sequence and ordering is lost BUILDING A MODEL
  • 12. I want Want to to go work here PERF. 1 1 1 1 1 95% 1 0 0 0 0 22% 0 0 0 0 0 57% 1 0 0 0 0 91% 0 0 0 0 0 11% Band-Aid: Concept of n-grams BUILDING A MODEL
  • 14. We need a labeled dataset, sometimes getting one with labels is the biggest challenge of all. SENTIMENT DATASET, 1.5M TWEETS label text neg @Christian_Rocha i miss u!!!!! pos @llanitos there's still some St Werburghs hone... pos @Ashley96 it's me neg @Phillykidd we use to be like bestfriends neg Just got back from Manchester. I went to the T... pos @LauraDark thnks x el rt neg "Ughh it's so hot &amp; the singing lady is st... neg @hnprashanth @dkris I was out to my native for... pos Girls night with the bests Wish you were here J! neg Just watched @paulkehler rock the crap out of ... pos i got the gurl! i got the ride! now im just on... pos @ninthspace how is the table building going? pos by d way guyz I must log out na see u again to... neg @dreday11 its only 20 mins... Sentiment140 cs.stanford.edu :(:)
  • 15. Before we can process this we need to do the proper formatting to get it ready SENTIMENT DATASET - FORMATTING text @Christian_Rocha i miss u!!!!! @llanitos there's still some St Werburghs hone... @Ashley96 it's me @Phillykidd we use to be like bestfriends Just got back from Manchester. I went to the T... @LauraDark thnks x el rt "Ughh it's so hot &amp; the singing lady is st... @hnprashanth @dkris I was out to my native for... Girls night with the bests Wish you were here J! Just watched @paulkehler rock the crap out of ... i got the gurl! i got the ride! now im just on... @ninthspace how is the table building going? by d way guyz I must log out na see u again to... @dreday11 its only 20 mins... Python list
  • 16. Now we can go all the way to model training and prediction SENTIMENT DATASET – UNIGRAM y [0,1,0,1,1] text_data [[‘this is a tweet’] [‘sounds good’] [‘not really’]] I want to work here have great 1 1 1 1 1 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0
  • 17. Now we can go all the way to model training and prediction SENTIMENT DATASET – BIGRAM I want Want to to go work here 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 text_data [[‘this is a tweet’] [‘sounds good’] [‘not really’]] y [0,1,0,1,1]
  • 19. Convert labels to integers SENTIMENT DATASET - FORMATTING Python int array label neg pos pos neg neg pos neg neg pos neg pos pos pos neg
  • 20. Convert labels to integers SENTIMENT DATASET - FORMATTING model.fit(X,Y) X [4,0,0,0,0,7,0,0,1] [0,0,0,0,9,0,0,0,2]
  • 21. Now we can go all the way to model training and prediction SENTIMENT DATASET – BUILD A MODEL y [0,1,0,1,1] X [4,0,0,0,0,7,0,0,1] [0,0,0,0,9,0,0,0,2] PERFORMANCE?
  • 24. We need to hold out data we can test against, this is called your validation set SENTIMENT DATASET – VALIDATION
  • 25. Train on 20%, test on 80% SENTIMENT DATASET – VALIDATION 20% 80%
  • 26. Best score yet SENTIMENT DATASET – VALIDATION 60% 40%
  • 27. Best score yet SENTIMENT DATASET – VALIDATION 70% 30%
  • 28. Best score yet SENTIMENT DATASET – VALIDATION 80% 20%
  • 29. Best score yet SENTIMENT DATASET – VALIDATION 99% 1%
  • 30. Perfect scores SENTIMENT DATASET – VALIDATION 99.9999% 2
  • 31. Predict Every Point, k-folding Folds = 9 Fold = 1 Fold = 2… Y_pred
  • 32. SENTIMENT DATASET – Validation 10 folds
  • 33. SENTIMENT DATASET – Validation 100 folds
  • 34. BIGRAM BOOST acc: 0.8015 r: 0.2061 AUROC: 0.8738 acc: 0.7809 r: 0.1238 AUROC: 0.8554
  • 36. BETTER MODELS acc: 0.8208 r: 0.2832 AUROC: 0.8939 acc: 0.8015 r: 0.2061 AUROC: 0.8739 Was: Now: (10x average)
  • 38. EMAIL MULTICLASS DATASET (20 classes) alt.atheism comp.graphics comp.os.ms-windows.misc comp.sys.ibm.pc.hardware comp.sys.mac.hardware comp.windows.x misc.forsale rec.autos rec.motorcycles rec.sport.baseball rec.sport.hockey sci.crypt sci.electronics sci.med sci.space soc.religion.christian talk.politics.guns talk.politics.mideast talk.politics.misc talk.religion.misc
  • 39. EMAIL MULTICLASS DATASET (20 classes) From: lerxst@wam.umd.edu (where's my thing) Subject: WHAT car is this!? Nntp-Posting-Host: rac3.wam.umd.edu Organization: University of Maryland, College Park Lines: 15 MSG: I was wondering if anyone out there could enlighten me on this car I sawnthe other day. It was a 2-door sports car, looked to be from the late 60s/nearly 70s. It was called a Bricklin. The doors were really small. In addition,nthe front bumper was separate from the rest of the body. This is nall I know. If anyone can tellme a model name, engine specs, yearsnof production, where this car is made, history, or whatever info younhave on this funky looking car, please e-mail.nnThanks,n- ILn ---- brought to you by your neighborhood Lerxst ---- nnnnn" rec.autos
  • 40. EMAIL MULTICLASS DATASET (20 classes) From: guykuo@carson.u.washington.edu (Guy Kuo) Subject: SI Clock Poll - Final Call Summary: Final call for SI clock reports Keywords: SI,acceleration,clock,upgrade Article-I.D.: shelley.1qvfo9INNc3s Organization: University of Washington Lines: 11 NNTP-Posting-Host: carson.u.washington.edu MSG: A fair number of brave souls who upgraded their SI clock oscillator havenshared their experiences for this poll. Please send a brief message detailingnyour experiences with the procedure. Top speed attained, CPU rated speed,nadd on cards and adapters, heat sinks, hour of usage per day, floppy disknfunctionality with 800 and 1.4 m floppies are especially requested.nnI will be summarizing in the next two days, so please add to the networknknowledge base if you have done the clock upgrade and haven't answered thisnpoll. Thanks.nnGuy Kuo <guykuo@u.washington.edu>n" comp.sys.mac.hardware
  • 41. EMAIL MULTICLASS DATASET (20 classes) From: jgreen@amber (Joe Green) Subject: Re: Weitek P9000 ? Organization: Harris Computer Systems Division Lines: 14 Distribution: world NNTP-Posting-Host: amber.ssd.csd.harris.com X-Newsreader: TIN [version 1.1 PL9] MSG: Robert J.C. Kyanko (rob@rjck.UUCP) wrote:n> abraxis@iastate.edu writes in article <abraxis.734340159@class1.iastate.edu>:n> > Anyone know about the Weitek P9000 graphics chip?n> As far as the low-level stuff goes, it looks pretty nice. It's got thisn> quadrilateral fill command that requires just the four points.nnDo you have Weitek's address/phone number? I'd like to get some informationnabout this chip.nn--nJoe GreenttttHarris Corporationnjgreen@csd.harris.comtttComputer Systems Divisionn"The only thing that really scares me is a person with no sense of humor."ntttttt-- Jonathan Wintersn’ comp.graphics
  • 42. EMAIL MULTICLASS DATASET (20 classes)
  • 44. Upload Your Resume Now painstakingly fill out this form containing all of the exact same information
  • 50. Mimicking the human recruiter Feature Hunt ONE FEATURE AT A TIME INCREMENTAL GAINS
  • 52. Unstructured ENGINEERS AND MANUAL FEATURES ARE EXPENSIVE, USING DEEP LEARNING TO AUTOMATE AUTOMATIC FEATURE GENERATION Structured I want Want to to go work here PERF. 1 1 1 1 1 95% 1 0 0 0 0 22% 0 0 0 0 0 57% 1 0 0 0 0 91% 0 0 0 0 0 11% ESSAY I want to work here I have great teamwork Synergy I have so much grit They fired that individual
  • 53. ENGINEERS AND MANUAL FEATURES ARE EXPENSIVE, USING DEEP LEARNING TO AUTOMATE AUTOMATIC FEATURE GENERATION ESSAY I want to work here I have great teamwork Synergy I have so much grit They fired that individual ESSAY 3 2 1 4 5 3 7 67 345 54 3 7 99 10234 78 203 501 14 1 2 3 4 5 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 LSTM RAW TEXT WORD SEQUENCE ENCODING
  • 57. BEGIN SCRATCHING AT LAYOUT AUTOMATIC FEATURE GENERATION (LAYOUT) CNN: bit.ly/pacon
  • 59. 59 Would you ever hire from just a resume? INTERVIEW MODELING SOFT/TECHNICAL COMPETENCIES Resume can overstate and understate
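
Slide 53 pairs each essay with a sequence of integers before it is handed to an LSTM. A minimal sketch of that kind of word sequence encoding follows. The helper names (`tokenize`, `build_vocab`, `encode`, `pad`), the frequency-ranked ids, the out-of-vocabulary id, and left padding with 0 are all illustrative assumptions rather than details from the talk, so the ids will not match the numbers shown on the slide.

```python
from collections import Counter

# The five toy essays from the slides.
essays = [
    "I want to work here",
    "I have great teamwork",
    "Synergy",
    "I have so much grit",
    "They fired that individual",
]

def tokenize(text):
    # Lowercase and split on whitespace; a real pipeline would also handle punctuation.
    return text.lower().split()

def build_vocab(texts):
    # Give each word an integer id, most frequent first. Id 0 is reserved for padding.
    counts = Counter(tok for t in texts for tok in tokenize(t))
    # sorted() is stable, so words with equal counts keep their first-seen order.
    ordered = sorted(counts, key=lambda w: -counts[w])
    return {word: i for i, word in enumerate(ordered, start=1)}

def encode(text, vocab):
    # Words never seen while building the vocabulary share one extra id (assumption).
    oov_id = len(vocab) + 1
    return [vocab.get(tok, oov_id) for tok in tokenize(text)]

def pad(seq, length):
    # Keep the last `length` ids if the sequence is too long, then left-pad with 0.
    if len(seq) > length:
        seq = seq[len(seq) - length:]
    return [0] * (length - len(seq)) + seq

vocab = build_vocab(essays)
max_len = max(len(tokenize(e)) for e in essays)
sequences = [pad(encode(e, vocab), max_len) for e in essays]

for essay, seq in zip(essays, sequences):
    print(seq, essay)
```

Unlike the one-column-per-word table, each row here keeps the words in order, which is what lets a sequence model use ordering instead of discarding it.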

Editor's notes

  1. My name is Ben Taylor, and I’m the Chief Data Scientist for a great startup called HireVue. Today I will be talking about NLP as well as deep learning. This talk is meant to be an introduction for those who are less familiar with NLP.
  2. HR has seen a great cross-pollination from other industries, and I am an example of that. I studied chemical engineering through both undergrad and graduate programs. I then went and worked as a quant for a Manhattan hedge fund manager on a 600 GPU cluster. And… now I’m in HR. Oh, and I LOVE love love backcountry snowboarding. I took this photo last week and I go 2-3 times a week before work. I will never work anywhere besides Utah because of this.
  3. What is HireVue? We are a digital interviewing and interaction company. We are backed by Sequoia Capital. In 2014 we were #10 on the Forbes most promising companies list. We are global, supporting digital interviews in 189 countries.
  4. Building predictive models from competencies or other numeric features is straightforward. You take the columns or features of interest on the left and the performance labels on the right, and you pass them through a type of regression. Excel will do this; many programs will do this just fine. If you are MORE advanced you can use tools like R or Python to do more advanced regressions like random forest, gradient boosting regression, or others… Raise your hand if you know how to build a model from this data.
  5. Now to throw a wrench in your process, I have decided to inject open-ended essay responses into my feature set. Raise your hand again if you know how to build a predictive model with this. Most classical statisticians, mathematicians, and analysts are justifiably confused by this.
  6. Like most data science or machine learning tricks, once they are explained at a 5th grade level, we tend to be underwhelmed. The computer can’t understand raw text in its native format; it must be converted to numbers. One way to accomplish this is to map the text to numeric replacements. Good, better, and best can become 1, 2, and 3. What would you do if you had something like male or female? You can’t map these, because if you made male 2 and female 1, are you being sexist? They are completely different categories and can’t be directly compared. Therefore they must be tokenized, where each value gets its own column, so new columns are created.
  7. In the case of text you can have a LOT of columns. In some cases you may exceed 10,000, 100,000, or even 10M columns. Imagine attempting to open a dataset like this in Excel, with over 1M columns. You have to use special software in R or Python that can handle these types of data objects in a compressed sparse format. Can anyone see what the problems are with this approach? There is a major drawback. [sequence loss]
  8. Bag of words! This is called bag of words because you can visualize the words as if they are picked up by a paper bag. All sequence and ordering is lost. Is that a problem? Maybe.
  9. I analyzed some Twitter data for Skullcandy a few years ago. When we presented our results to the engineering team we asked them, “If someone says the F word in a tweet and tags your company… is that a bad thing?” Think about it: for most of us in the room, with the companies we work for and represent, does thinking about that give you anxiety? It gives us anxiety because we know it would be really bad. Skullcandy knew their customer base well enough, they said they were sure. And sure enough the data showed that half of the people saying the F word on Twitter and tagging Skullcandy said nice things, and the other half said mean things. So a word that is typically polarizing had no impact. Bad... Bad is a bad word. Ass... Ass is a bad word. But... if I say “bad ass”, my bag of words method is going to see that as a very, very bad thing, when in fact it is a very nice thing. How do we fix that?
  10. Introduction to n-gram tuples. Not only can we create unique placeholders for words, but we can also do it for word pairs. We can do it for single words, two-word pairs, three-word groups, as long as we want. But as we do that the column count explodes exponentially and… the recurrence of each observation goes down... Both of these are bad, so you have rapidly diminishing returns. Also, if you throw a single adjective or other word in between your expected bi-gram, it won’t be found.
  13. Excel 2002: 256 columns; Excel 2003: 256 columns; Excel 2016: 1,048,576 rows by 16,384 columns.
  26. One point to bring up is proper model validation. I used to be confused when someone said train on 70, test on 30. Or train on 80 test on 20. Who was right? The answer I have settled on now is their of them are right. Explain the conflict.
  29. Now that we have some basic NLP background we will change gears to RESUME modeling. Who hates looking through stacks of resumes? Not sure; sometimes it can be fun, but depending on the stack size you might be spending 30 seconds on a resume, or 7 seconds. How quickly can you screen a resume? Think about what you are doing when you screen a resume. Where are your eyes looking? School? GPA? Skills? Work history? Name? Hopefully you don’t look at the name.
  30. To review a possible flow using NLP: first we have an unstructured resume, we are forced to structure it somehow, and then we tokenize or munge the data into numeric form.
  31. Sometimes we can predict things without opening up the resume. Check out these file extensions. It is hard to see, but statistically someone who uploads a DOC resume is more likely to interview well than someone who uploads an RTF. Likewise, DOCX beats DOC, and PDF beats DOCX.
  32. What do we do with ALL of these formats? DOCX, txt, pdf? This is actually a big problem, we can’t do anything cool until we standardize the formats. Luckily there is a free open source office platform that can do the conversion for us. I recommend converting it to either txt or html.
  33. Now that we have text we can write specific feature grabbers, such as one for GPA. For the resumes we analyzed we noticed that a GPA was included in only 1 in 5 resumes. Also, this is where the distributions fell: not very many people below a 3.00 GPA are reporting it. What do you do if someone does not include a GPA? When a feature is missing you MUST replace it. Do you replace the GPA with a 0? That’s harsh. A 2.0? 4.0? The average? It depends.
  34. Testing prediction quality, we found that optimal prediction quality comes when we replace the missing GPA with a 3.6. What does that mean? It means that if you have less than a 3.6 GPA, as far as the computer is concerned, including it doesn’t help you.
  35. There are so many features to create for a resume model that you can save yourself a lot of time by using a resume parsing service. The majority of the value comes from bag of words (BOW). You quickly reach diminishing returns, where a LOT of effort results in marginal gain. Malicious resume
  36. The biggest value that deep learning offers is automatic feature discovery. This has been incredibly valuable with images, hitting new high points. It can also be valuable for text, allowing you to forget the concept of a tuple or n-gram.
  37. In the end the computer always needs a number, but in this case it is looking at very large sequences of numbers (100-300 word windows). Run it on entire resume: What is the prediction?
  41. Fun tangent: does resume formatting matter? Margins? Font size? Layout?
  42. Would you ever hire from just a resume? Why not?
  43. For interview modeling we use spoken text, which is more difficult because of the limits of transcription accuracy. Raw audio (utterance, repetition). Video, micro expressions (Lie to Me).
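
The bag-of-words and n-gram encodings described in the notes above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper names (`tokenize`, `ngrams`, `vectorize`), the whitespace tokenizer, and the binary 0/1 cell values are hypothetical choices, and a production pipeline would normally use a sparse matrix from an NLP library instead of nested lists.

```python
def tokenize(text):
    # Lowercase and split on whitespace (assumption: no punctuation handling).
    return text.lower().split()

def ngrams(tokens, n):
    # Every run of n consecutive tokens, joined into one string.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def vectorize(texts, n_values=(1,)):
    # One column per distinct n-gram; a cell is 1 if the n-gram occurs in the text.
    docs = []
    for text in texts:
        tokens = tokenize(text)
        features = set()
        for n in n_values:
            features.update(ngrams(tokens, n))
        docs.append(features)
    columns = sorted(set().union(*docs))
    rows = [[1 if col in doc else 0 for col in columns] for doc in docs]
    return columns, rows

texts = ["bad", "ass", "bad ass"]

# Unigrams only: the third text looks like the sum of two negative words.
uni_columns, uni_rows = vectorize(texts, n_values=(1,))

# Unigrams plus bigrams: "bad ass" gets its own column, so a model can
# learn a separate weight for the pair.
bi_columns, bi_rows = vectorize(texts, n_values=(1, 2))

print(uni_columns, uni_rows)
print(bi_columns, bi_rows)

# The weakness mentioned in the notes: one extra word breaks the bigram.
print("great teamwork" in ngrams(tokenize("I have great teamwork"), 2))
print("great teamwork" in ngrams(tokenize("I have great remote teamwork"), 2))
```

Adding bigrams grows the column list even on this three-row toy input, which is the same trade-off the notes describe at scale: more columns, each observed less often.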