#SIOP15 Presentation On Performance Sorting Using Video Interviews

  • 1. Model Driven Candidate Sorting Based On Video Interview Cues
      Benjamin Taylor, Chief Data Scientist
  • 2. Outline
      •  Introduction
      •  Case study objective
      •  Big data landscape
      •  Problem setup
      •  Results/Conclusion
      •  Future work
      @bentaylordata
  • 3. Introduction
      •  Chemical Engineering (BS/MS/PhD Candidate)
      •  5 years Intel/Micron
         –  Photolithography, process control, yield modeling
      •  AIQ Hedge fund
         –  600 GPU chip cluster, algorithmic stock modeling
         –  Distributed metaheuristic algorithms
      •  HireVue, Chief Data Scientist
         –  HR analytics, interview modeling
  • 4. Case Study Objective
      •  Given 400 recorded video interviews for sales positions and post-hire performance data, can improved sorting efficiency be demonstrated out-of-sample?
      Input Data Set: V=400
      Target Data Set, n=400:
        Personal Email          Perf
        rich.taylor@gmail.com   Exceeds
        wasatch@aol.com         Meets
        tradmonkey@mx.com       Below
        hsommer@gmail.com       Meets
  • 5. Big data landscape
      •  Big data platforms have motivated innovations around unstructured data handling. These innovations have involved new algorithms and better methods for wrangling unstructured data.
  • 6. Big data landscape
      •  Unstructured data
         –  Data that does not have a predefined data model or schema, e.g. tool logs, resumes, cover letters, images, audio, video, Twitter, LinkedIn
      •  Structured data
         –  Data that fits within a predefined data model. The most common structured data formats involve a column/row architecture. The most familiar examples include spreadsheet software such as Excel.
  • 7. Problem setup
      •  Unstructured data challenge
         –  How do we convert the video into a manageable, machine-ready format? In other words, unstructured > structured data.
      Method? > 1D vector representation: 0.23,0.15,0.98,0.63,0.45,0.36…
  • 8. Problem Setup
      •  What is done for text modeling?
      UNSTRUCTURED
        Name: Sally Taylor
        GPA: 3.95
        Previous Job: Data Scientist
        School: Yale
        Hobbies: Sky diving
      STRUCTURED
        F   3.95   Data Scientist   Yale      Sky diving
        M   2.93   HR Analyst       SLCC      Poetry
        F   3.41   Data Munger      Harvard   Cycling
      TOKENIZED
        1   3.95   5   310   56
        0   2.93   7   520   91
        1   3.41   6   240   56
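The structured-to-tokenized step on this slide can be sketched roughly as follows. This is a minimal illustration, assuming each categorical field gets its own vocabulary with integer codes assigned in order of first appearance. The helper `tokenize_records` and the field names (`sex`, `gpa`, `job`, `school`, `hobby`) are hypothetical, and the resulting codes will not match the ones on the slide (5, 310, 56, ...), whose encoding scheme is not stated.

```python
def tokenize_records(records, categorical_fields):
    """Replace categorical string fields with integer codes.

    Assumption: one vocabulary per field, codes assigned in order of
    first appearance. The slide does not say how its codes were chosen.
    """
    vocabularies = {field: {} for field in categorical_fields}
    tokenized = []
    for record in records:
        row = []
        for field, value in record.items():
            if field in vocabularies:
                vocab = vocabularies[field]
                # setdefault gives unseen values the next free code
                row.append(vocab.setdefault(value, len(vocab)))
            else:
                row.append(value)  # numeric fields pass through unchanged
        tokenized.append(row)
    return tokenized, vocabularies


# The three structured rows from the slide, with hypothetical field names
records = [
    {"sex": "F", "gpa": 3.95, "job": "Data Scientist", "school": "Yale", "hobby": "Sky diving"},
    {"sex": "M", "gpa": 2.93, "job": "HR Analyst", "school": "SLCC", "hobby": "Poetry"},
    {"sex": "F", "gpa": 3.41, "job": "Data Munger", "school": "Harvard", "hobby": "Cycling"},
]
rows, vocabs = tokenize_records(records, ["sex", "job", "school", "hobby"])
```

Each row of `rows` is now purely numeric, which is the property the slide is after. Integer codes impose an arbitrary ordering on categories, so a real pipeline might prefer one-hot columns.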
  • 9. Problem Setup
      •  Piecemeal the structuring: final outputs are scalars
      Inputs: Audio, Video, Text
      Processing blocks: Signal Processing, Personality, Expression, Signal Processing
      Legend: us = unstructured data, ts = time series data, s = scalar data
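One way to read "final outputs are scalars" is that each intermediate time series (ts) is collapsed into a handful of summary numbers (s). The sketch below assumes simple summary statistics as the reduction. The slides do not say which reductions were actually used, and the `loudness` signal is made-up illustration data.

```python
import statistics


def summarize_series(series):
    """Collapse one time series (ts) into a few scalar features (s)."""
    lowest, highest = min(series), max(series)
    return {
        "mean": statistics.fmean(series),
        "stdev": statistics.pstdev(series),  # population standard deviation
        "min": lowest,
        "max": highest,
        "range": highest - lowest,
    }


# Hypothetical signal, e.g. loudness sampled over one interview answer
loudness = [0.2, 0.4, 0.4, 0.6, 0.9]
features = summarize_series(loudness)
```

Every scalar in `features` can become one column of the feature matrix, regardless of how long the original recording was.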
  • 10. Raw Audio Indicators
      Diagram label: Feature Gen
  • 11. Personality Models
      •  Engagement
      •  Motivation
      •  Distress
      •  Aggression
      Diagram label: Model
  • 12. Video Indicators
      Diagram labels: Feature Gen; Signal Processing; F989, F990, F991; scalar
  • 13. Combining All Features
      Feature Mapping: As the features are produced, they are stored in a matrix X where each column represents a feature and each row represents an interview.
      X (values as they appear on the slide; some entries are NA):
        56.341   -200.45   0   1
        2   4   60.71   12   52.15   -350.12   1   1
        2   4   60.71   12   52.15   -350.12   1   0
        2   3   16.16   21   25.51   -105.21   0   0
        NA   NA   NA   NA   NA
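The feature mapping described on this slide can be sketched as below: one row per interview, one column per feature, with `None` standing in for NA when a feature could not be produced. The feature names and values are hypothetical placeholders, not the author's actual feature set.

```python
def build_feature_matrix(interviews, feature_names):
    """Rows are interviews, columns are features; missing values become None (NA)."""
    return [[features.get(name) for name in feature_names] for features in interviews]


feature_names = ["feature_a", "feature_b", "feature_c"]  # hypothetical names
interviews = [
    {"feature_a": 56.341, "feature_b": -200.45, "feature_c": 1},
    {"feature_a": 52.15, "feature_b": -350.12},   # feature_c was not produced
    {"feature_b": -105.21, "feature_c": 0},       # feature_a was not produced
]

X = build_feature_matrix(interviews, feature_names)

# Count NA entries per column to see which features are often missing
na_per_column = [sum(row[j] is None for row in X) for j in range(len(feature_names))]
```

Fixing the column order in `feature_names` keeps every interview aligned to the same columns, even when some feature generators fail for a given video.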
  • 14. How To Build A Model
      Diagram labels: Model; Best Fitness?
  • 15. A Lesson On K-folding
      Folds = 9
      Cut your data up into fixed folds
  • 16. A Lesson On K-folding
      Folds = 9: Fold = 1, Fold = 2… > Y_pred
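The K-folding idea on these two slides can be sketched as follows: every interview is assigned to one fixed fold, and its prediction in `y_pred` comes from a model trained only on the other folds. This is a minimal sketch, not the author's pipeline. The round-robin fold assignment and the mean-predicting stand-in model (`fit_mean`, `predict_mean`) are assumptions chosen to keep the example short.

```python
def k_fold_indices(n_samples, n_folds):
    """Assign each sample index to exactly one of n_folds fixed folds."""
    folds = [[] for _ in range(n_folds)]
    for i in range(n_samples):
        folds[i % n_folds].append(i)  # round-robin assignment
    return folds


def out_of_sample_predictions(X, y, n_folds, fit, predict):
    """Predict every sample with a model that never saw it during training."""
    y_pred = [None] * len(y)
    for held_out in k_fold_indices(len(y), n_folds):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in held_out:
            y_pred[i] = predict(model, X[i])
    return y_pred


# Hypothetical stand-in model: always predicts the training-set mean of y
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)


def predict_mean(model, x):
    return model


X = [[float(i)] for i in range(18)]
y = [i % 2 for i in range(18)]
y_pred = out_of_sample_predictions(X, y, n_folds=9, fit=fit_mean, predict=predict_mean)
```

In practice the rows would usually be shuffled before folds are assigned, so that any ordering in the data does not leak into the folds.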
  • 17. Fitness Metric?
      Top Performer Accuracy
      AUC
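For the AUC option, a small sketch may help make the metric concrete. AUC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The code below uses the direct pairwise definition and assumes top performers are labeled 1 and everyone else 0. That labeling and the example scores are assumptions for illustration.

```python
def auc_score(y_true, y_score):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win. This is the O(P * N) pairwise form, which is
    fine for a few hundred interviews.
    """
    positives = [s for t, s in zip(y_true, y_score) if t == 1]
    negatives = [s for t, s in zip(y_true, y_score) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical out-of-sample scores; 1 marks a top performer
y_true = [1, 0, 1, 0, 1]
y_score = [0.9, 0.8, 0.7, 0.3, 0.6]
auc = auc_score(y_true, y_score)
```

An AUC of 0.5 corresponds to random ordering and 1.0 to a perfect sort, which makes it a natural fitness metric for a candidate-sorting model.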
  • 18. Results
      Model          AUC score
      Bernoulli NB   0.75
      Other          0.79
      67.50% reduction in interview evaluation
      >300% increase in concentration
      Conclusion: Using structured features from audio and video, we are able to show predictive sorting value in our out-of-sample interviews.
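For readers unfamiliar with the Bernoulli NB model named in the results table, here is a minimal sketch of Bernoulli naive Bayes on binary features with Laplace smoothing. It is not the author's implementation and does not reproduce the reported AUC scores. The binarized features, the labels (1 = top performer, 0 = other), and the function names are assumptions for illustration.

```python
import math


def fit_bernoulli_nb(X, y, alpha=1.0):
    """Fit Bernoulli naive Bayes on 0/1 features with Laplace smoothing."""
    n_features = len(X[0])
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        # Smoothed estimate of P(feature_j = 1 | label)
        p_on = [
            (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(n_features)
        ]
        model[label] = (math.log(len(rows) / len(y)), p_on)
    return model


def predict_log_odds(model, x):
    """Log-odds of class 1 versus class 0; higher means ranked earlier."""
    scores = {}
    for label, (log_prior, p_on) in model.items():
        scores[label] = log_prior + sum(
            math.log(p if xj == 1 else 1.0 - p) for xj, p in zip(x, p_on)
        )
    return scores[1] - scores[0]


# Hypothetical binarized features; label 1 marks a top performer
X = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 1], [0, 1, 1], [0, 0, 0]]
y = [1, 1, 1, 0, 0, 0]
model = fit_bernoulli_nb(X, y)

# Sort candidates so the most likely top performers come first
ranked = sorted(range(len(X)), key=lambda i: predict_log_odds(model, X[i]), reverse=True)
```

The ranking here is computed in-sample purely for illustration. An out-of-sample evaluation would fit the model inside each cross-validation fold and score only the held-out interviews.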
  • 19. Future Work
      Future work involves offloading the feature engineering tasks to a more automated process, such as deep learning or more advanced ensemble modeling methods.
      Diagram labels: Feature Engineering; Auto Feature Engineering
      My Contact Info:
        Twitter: @bentaylordata
        Email: btaylor@hirevue.com
        LinkedIn: bentaylordata