This is a presentation I gave at SIOP 2015 in Philadelphia. It shows how you can predict performance from a video interview using feature extraction from unstructured data and supervised learning. It also discusses k-fold cross-validation, which is less commonly known within the IO community but preferred within the data science community.
#SIOP15 Presentation On Performance Sorting Using Video Interviews
1. Model Driven Candidate Sorting Based On Video Interview Cues
Benjamin Taylor
Chief Data Scientist
2. Outline
• Introduction
• Case study objective
• Big data landscape
• Problem setup
• Results/Conclusion
• Future work
@bentaylordata
3. Introduction
• Chemical Engineering (BS/MS/PhD Candidate)
• 5 years Intel/Micron
– Photolithography, process control, yield modeling
• AIQ Hedge fund
– 600 GPU chip cluster, algorithmic stock modeling
– distributed metaheuristic algorithms
• HireVue, Chief Data Scientist
– HR analytics, interview modeling
4. Case Study Objective
• Given 400 recorded video interviews for sales positions and post-hire performance data, can improved sorting efficiency be demonstrated out-of-sample?
Input Data Set: V=400
Target Data Set, n=400:
Personal Email | Perf
rich.taylor@gmail.com | Exceeds
wasatch@aol.com | Meets
tradmonkey@mx.com | Below
hsommer@gmail.com | Meets
5. Big data landscape
Image labels: bigdata, hadoop
• Big data platforms have motivated innovations around unstructured data handling. These innovations have involved new algorithms and better methods for wrangling unstructured data.
6. Big data landscape
• Unstructured data
– Data that does not have a predefined data model or schema, e.g. tool logs, resumes, cover letters, images, audio, video, Twitter, LinkedIn
• Structured data
– Data that fits within a predefined data model. The most common structured data formats involve a column/row architecture. The most familiar examples include spreadsheet software such as Excel.
7. Problem setup
• Unstructured data challenge
– How do we convert the video into a manageable, machine-ready format? AKA unstructured > structured data.
1D Vector representation: 0.23,0.15,0.98,0.63,0.45,0.36…
Method?
8. Problem Setup
• What is done for text modeling?
UNSTRUCTURED
Name: Sally Taylor
GPA: 3.95
Previous Job: Data Scientist
School: Yale
Hobbies: Sky diving
STRUCTURED
F | 3.95 | Data Scientist | Yale | Sky diving
M | 2.93 | HR Analyst | SLCC | Poetry
F | 3.41 | Data Munger | Harvard | Cycling
TOKENIZED
1 | 3.95 | 5 | 310 | 56
0 | 2.93 | 7 | 520 | 91
1 | 3.41 | 6 | 240 | 56
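The structured-to-tokenized step on this slide can be sketched roughly as follows. This is a minimal illustration, not the method used in the talk: the helper names `build_vocab` and `tokenize` are hypothetical, and the integer codes are assigned in order of first appearance, so they will not match the codes shown on the slide.

```python
# Records mirroring the STRUCTURED rows on the slide.
records = [
    ("F", 3.95, "Data Scientist", "Yale", "Sky diving"),
    ("M", 2.93, "HR Analyst", "SLCC", "Poetry"),
    ("F", 3.41, "Data Munger", "Harvard", "Cycling"),
]


def build_vocab(values):
    """Map each distinct string to an integer code, in order of first appearance."""
    vocab = {}
    for value in values:
        if value not in vocab:
            vocab[value] = len(vocab)
    return vocab


def tokenize(rows):
    """Replace string columns with integer codes; leave numeric columns unchanged."""
    columns = list(zip(*rows))
    encoded_columns = []
    for column in columns:
        if all(isinstance(v, (int, float)) for v in column):
            encoded_columns.append(list(column))
        else:
            vocab = build_vocab(column)
            encoded_columns.append([vocab[v] for v in column])
    # Transpose back so each inner list is one candidate.
    return [list(row) for row in zip(*encoded_columns)]


tokenized = tokenize(records)
for row in tokenized:
    print(row)
```

Each output row is a purely numeric vector, which is the property the TOKENIZED block on the slide illustrates.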
9. Problem Setup
• Piecemeal the structuring: final outputs are scalars
Diagram labels: Audio, Video, Text; Signal Processing, Personality, Expression, Signal Processing
Legend: us = unstructured data, ts = time series data, s = scalar data
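One way to picture the last step of that legend, going from a time series (ts) to scalars (s), is to summarize each series with a few statistics. The sketch below is an assumption for illustration only: the talk does not say which summaries were used, and `summarize_series` and the sample values are made up.

```python
import statistics


def summarize_series(ts):
    """Collapse one time series into a handful of scalar features."""
    return {
        "mean": statistics.fmean(ts),
        "std": statistics.pstdev(ts),  # population standard deviation
        "min": min(ts),
        "max": max(ts),
        "range": max(ts) - min(ts),
    }


# Made-up series standing in for a per-frame or per-window measurement.
series = [1.0, 2.0, 3.0, 4.0]
features = summarize_series(series)
print(features)
```

Each key in the returned dictionary would become one scalar column for that interview.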
12. Feature Gen
Video Indicators
Diagram labels: Signal Processing; F989, F990, F991; scalar
13. Combining All Features
Feature Mapping: As the features are produced, they are stored in a matrix where each column represents a feature and each row represents an interview.
X
56.341 -200.45 0 1
2 4 60.71 12 52.15 -350.12 1 1
2 4 60.71 12 52.15 -350.12 1 0
2 3 16.16 21 25.51 -105.21 0 0
NA NA NA NA NA
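The feature mapping described on this slide can be sketched as below. The feature names and values are hypothetical, `None` stands in for the NA entries, and keeping only complete rows is just one possible way to handle missing values; the talk does not state how NAs were treated.

```python
# Hypothetical feature names; the real feature set is not given in the slides.
feature_names = ["speech_rate", "pause_count", "smile_flag"]

# One dictionary of extracted features per interview (made-up values).
interviews = [
    {"speech_rate": 60.71, "pause_count": 12, "smile_flag": 1},
    {"speech_rate": 16.16, "pause_count": 21, "smile_flag": 0},
    {},  # feature extraction produced nothing for this interview
]


def build_feature_matrix(rows, names):
    """One row per interview, one column per feature; missing values become None."""
    return [[row.get(name) for name in names] for row in rows]


def complete_rows(matrix):
    """Indices of rows with no missing values (an assumed NA policy)."""
    return [i for i, row in enumerate(matrix) if all(v is not None for v in row)]


X = build_feature_matrix(interviews, feature_names)
usable = complete_rows(X)
print(X)
print(usable)
```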
14. How To Build A Model
Diagram labels: Model; Best Fitness?
15. A Lesson On K-folding
Folds = 9
Cut your data up into fixed folds
16. A Lesson On K-folding
Folds = 9
Fold = 1, Fold = 2…
Y_pred
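The k-folding idea on these two slides can be sketched as follows: cut the data into fixed folds, hold each fold out in turn, train on the rest, and collect the held-out predictions into a single Y_pred. This is a minimal sketch under stated assumptions, not the pipeline from the talk. The names `kfold_indices` and `cross_val_predict` are hypothetical, the folds are contiguous with no shuffling, and the "model" is a toy that predicts the mean of the training labels.

```python
def kfold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_val_predict(X, y, k, fit, predict):
    """Return one out-of-sample prediction per row by holding out each fold once."""
    n = len(X)
    y_pred = [None] * n
    for test_idx in kfold_indices(n, k):
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in test_idx:
            y_pred[i] = predict(model, X[i])
    return y_pred


# Toy stand-in model: always predicts the mean of the training labels.
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)


def predict_mean(model, x):
    return model


X = [[float(i)] for i in range(18)]  # made-up feature rows
y = [i % 2 for i in range(18)]       # made-up binary labels
y_pred = cross_val_predict(X, y, k=9, fit=fit_mean, predict=predict_mean)
print(y_pred)
```

Because every prediction comes from a model that never saw that row, Y_pred can be scored as if it were out-of-sample data. In practice the rows are often shuffled before the folds are cut; that step is omitted here.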
18. Results:
Model | AUC score
Bernoulli NB | 0.75
Other | 0.79
67.50% reduction in interview evaluation
>300% increase in concentration
Conclusion: Using structured features from audio and video, we are able to show predictive sorting value in our out-of-sample interviews.
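For readers unfamiliar with the AUC score reported above, it can be read as the probability that a randomly chosen positive case is scored higher than a randomly chosen negative case. The sketch below computes it by direct pairwise comparison. It assumes binary labels (1 for the class of interest, 0 otherwise); how the Exceeds/Meets/Below ratings were reduced to two classes is not stated in the slides, and the labels and scores here are made up.

```python
def auc_score(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels and model scores for five candidates.
y_true = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.1, 0.3]
auc = auc_score(y_true, scores)
print(round(auc, 4))
```

A score of 0.5 corresponds to random ordering and 1.0 to a perfect sort, which gives some context for the 0.75 and 0.79 values in the table.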
19. Feature Engineering
Auto Feature Engineering
Future Work: Future work involves offloading the feature engineering tasks to a more automated process, such as deep learning or more advanced ensemble modeling methods.
My Contact Info:
Twitter: @bentaylordata
Email: btaylor@hirevue.com
LinkedIn: bentaylordata