#SIOP15 Presentation on
1. Model Driven Candidate Sorting
Based On Video Interview Cues
Benjamin Taylor
Chief Data Scientist
2. Outline
• Introduction
• Case study objective
• Big data landscape
• Problem setup
• Results/Conclusion
• Future work
@bentaylordata
3. Introduction
• Chemical Engineering (BS/MS/PhD Candidate)
• 5 years Intel/Micron
– Photolithography, process control, yield modeling
• AIQ Hedge fund
– 600 GPU chip cluster, algorithmic stock modeling, distributed metaheuristic
algorithms
• HireVue, Chief Data Scientist
– HR analytics, interview modeling
4. Case Study Objective
• Given 400 recorded video interviews for sales positions
and post-hire performance data, can improved sorting
efficiency be demonstrated out-of-sample?
Input Data Set: V=400
Target Data Set: n=400
Personal Email           Perf
rich.taylor@gmail.com    Exceeds
wasatch@aol.com          Meets
tradmonkey@mx.com        Below
hsommer@gmail.com        Meets
5. Big data landscape
• Big data platforms have motivated innovations around
unstructured data handling. These innovations have
involved new algorithms and better unstructured
wrangling methods.
6. Big data landscape
• Unstructured data
– Data that does not have a predefined data model or schema, e.g.
tool logs, resumes, cover letters, images, audio, video, Twitter,
LinkedIn
• Structured data
– Data that fits within a predefined data model. Most common
structured data formats involve a column/row architecture.
Most familiar examples include spreadsheet software such as
Excel.
7. Problem setup
• Unstructured data challenge
– How do we convert the video into a manageable, machine-
ready format? AKA unstructured > structured data.
0.23,0.15,0.98,0.63,0.45,0.36…
1D Vector representation
Method?
9. Problem Setup
• Piecemeal the structuring: final outputs are scalars
[Diagram: Audio, Video, and Text inputs pass through stages labeled Signal Processing, Personality, and Expression; each arrow is labeled us, ts, or s]
us = unstructured data
ts = time series data
s = scalar data
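The ts-to-s step in the legend above can be illustrated with a minimal sketch. It assumes that a time series is reduced to scalars through simple summary statistics; the slides do not say which reductions were actually used, and the function name `summarize_series` and the `energy` trace are hypothetical.

```python
from statistics import mean, pstdev

def summarize_series(series):
    """Collapse one time series (ts) into a few scalar (s) features.

    The choice of statistics here is an assumption for illustration only.
    """
    return {
        "mean": mean(series),
        "std": pstdev(series),  # population standard deviation
        "min": min(series),
        "max": max(series),
    }

# Hypothetical per-frame signal, e.g. an audio energy trace.
energy = [0.2, 0.4, 0.4, 0.6]
features = summarize_series(energy)
```

Each scalar produced this way could become one column of the feature matrix.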
13. Combining All Features
Feature Mapping:
As the features are produced they are stored in a matrix X where each
column represents a feature and each row represents an interview.
Example rows of X:
56.341 -200.45 0 1
2 4 60.71 12 52.15 -350.12 1 1
2 4 60.71 12 52.15 -350.12 1 0
2 3 16.16 21 25.51 -105.21 0 0
NA NA NA NA NA
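A minimal sketch of the feature mapping described on this slide, with one row per interview and one column per feature. The feature names and values are hypothetical, and representing a missing value (NA) as `None` is an assumption, not something the slides specify.

```python
def build_feature_matrix(interviews, feature_names):
    """Return one row per interview and one column per feature.

    A feature that was not produced for an interview becomes None,
    standing in for the NA entries (an assumption for illustration).
    """
    return [[feats.get(name) for name in feature_names] for feats in interviews]

# Hypothetical feature names; the slides do not list the real ones.
feature_names = ["audio_mean", "audio_std", "smile_rate"]
interviews = [
    {"audio_mean": 60.71, "audio_std": 12.0, "smile_rate": 1},
    {"audio_mean": 16.16, "audio_std": 21.0},  # smile_rate is missing
]
X = build_feature_matrix(interviews, feature_names)
```

Fixing the column order in `feature_names` keeps every row aligned, so column j always refers to the same feature.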
14. How To Build A Model
Model
Best
Fitness?
15. A Lesson On K-folding
Folds = 9
Cut your data up
into fixed folds
16. A Lesson On K-folding
Folds = 9 Fold = 1 Fold = 2… Y_pred
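The K-folding idea on these two slides can be sketched as follows: the data is cut into 9 fixed folds, each fold is held out once, and the held-out predictions are collected into Y_pred. This is an illustrative sketch, not the presenter's code. The round-robin fold assignment and the placeholder model that predicts the training mean are assumptions; in practice the rows would usually be shuffled first and a real classifier would take the place of `fit_mean` and `predict_mean`.

```python
def kfold_indices(n_samples, n_folds):
    """Assign every sample index to one of n_folds fixed folds (round-robin)."""
    folds = [[] for _ in range(n_folds)]
    for i in range(n_samples):
        folds[i % n_folds].append(i)
    return folds

def out_of_fold_predictions(X, y, n_folds, fit, predict):
    """Predict every sample with a model that never saw it during training."""
    y_pred = [None] * len(y)
    for held_out in kfold_indices(len(y), n_folds):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in held_out:
            y_pred[i] = predict(model, X[i])
    return y_pred

# Placeholder model (hypothetical): always predicts the mean training label.
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)

def predict_mean(model, x):
    return model

X = [[float(i)] for i in range(18)]
y = [i % 2 for i in range(18)]
y_pred = out_of_fold_predictions(X, y, n_folds=9, fit=fit_mean, predict=predict_mean)
```

Because every entry of `y_pred` comes from a model trained without that sample, scoring `y_pred` against `y` gives an out-of-sample estimate.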
18. Results:
Conclusion:
Using structured features
from audio and video we
are able to show predictive
sorting value in our out-of-
sample interviews.
Model AUC score
Bernoulli NB 0.75
Other 0.79
67.50% reduction in interview evaluation
>300% increase in concentration
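For readers unfamiliar with the AUC score reported above, here is a minimal sketch of how it can be computed from out-of-sample predictions. It uses the pairwise definition: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counting half. The labels and scores below are made up for illustration and are not the study's data.

```python
def auc_score(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(pos) * len(neg))

# Made-up labels (1 = top performer) and model scores.
y_true = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.1]
auc = auc_score(y_true, scores)  # 3 of the 4 pairs are ordered correctly
```

An AUC of 0.5 corresponds to random sorting and 1.0 to a perfect ranking.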
19. Future Work:
Feature Engineering > Auto Feature Engineering
Future work involves offloading the feature engineering tasks to a more automated
process, such as deep learning or more advanced ensemble modeling methods.
My Contact Info:
Twitter: @bentaylordata
Email: btaylor@hirevue.com
LinkedIn: bentaylordata
Editor's notes
Hadoop story:
Why is it called Hadoop?
Google paper?